Reference Citation Analysis

For: Reese MG, Hartzell G, Harris NL, Ohler U, Abril JF, Lewis SE. Genome annotation assessment in Drosophila melanogaster. Genome Res 2000;10:483-501. [PMID: 10779488 PMCID: PMC310877 DOI: 10.1101/gr.10.4.483] [Citation(s) in RCA: 125] [Impact Index Per Article: 5.2] [Received: 02/09/2000] [Accepted: 02/29/2000] [Indexed: 11/24/2022]

Cited by Other Article(s)

Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G, Behera AK, Gonzalez Martinez JM, Hunt T, Lagarde J, Liang CE, Li H, Meade MJ, Moraga Amador DA, Prjibelski AD, Birol I, Bostan H, Brooks AM, Çelik MH, Chen Y, Du MRM, Felton C, Göke J, Hafezqorani S, Herwig R, Kawaji H, Lee J, Li JL, Lienhard M, Mikheenko A, Mulligan D, Nip KM, Pertea M, Ritchie ME, Sim AD, Tang AD, Wan YK, Wang C, Wong BY, Yang C, Barnes I, Berry AE, Capella-Gutierrez S, Cousineau A, Dhillon N, Fernandez-Gonzalez JM, Ferrández-Peral L, Garcia-Reyero N, Götz S, Hernández-Ferrer C, Kondratova L, Liu T, Martinez-Martin A, Menor C, Mestre-Tomás J, Mudge JM, Panayotova NG, Paniagua A, Repchevsky D, Ren X, Rouchka E, Saint-John B, Sapena E, Sheynkman L, Smith ML, Suner MM, Takahashi H, Youngworth IA, Carninci P, Denslow ND, Guigó R, Hunter ME, Maehr R, Shen Y, Tilgner HU, Wold BJ, Vollmers C, Frankish A, Au KF, Sheynkman GM, Mortazavi A, Conesa A, Brooks AN. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nat Methods 2024:10.1038/s41592-024-02298-3. [PMID: 38849569 DOI: 10.1038/s41592-024-02298-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Accepted: 05/03/2024] [Indexed: 06/09/2024]

Affiliation(s)

Francisco J Pardo-Palacios Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Dingjie Wang Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Fairlie Reese Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
Mark Diekhans UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
Sílvia Carbonell-Sala Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
Brian Williams Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
Jane E Loveland European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
Maite De María Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, FL, USA; Cherokee Nation System Solutions, contractor to the US Geological Survey-Wetland and Aquatic Research Center, Gainesville, FL, USA
Matthew S Adams Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
Gabriela Balderrama-Gutierrez Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
Amit K Behera Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
Jose M Gonzalez Martinez European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
Toby Hunt European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
Julien Lagarde Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Flomics Biotech, SL, Barcelona, Spain
Cindy E Liang Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
Haoran Li Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Marcus Jerryd Meade Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
David A Moraga Amador Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
Andrey D Prjibelski Department of Computer Science, University of Helsinki, Helsinki, Finland; Center for Bioinformatics and Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
Inanc Birol Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
Hamed Bostan Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, USA
Ashley M Brooks Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, USA
Muhammed Hasan Çelik Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
Ying Chen Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Mei R M Du Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
Colette Felton Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
Jonathan Göke Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
Saber Hafezqorani Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
Ralf Herwig Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
Hideya Kawaji Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
Joseph Lee Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Jian-Liang Li Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC, USA
Matthias Lienhard Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
Alla Mikheenko Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
Dennis Mulligan Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
Ka Ming Nip Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
Mihaela Pertea Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
Matthew E Ritchie Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
Andre D Sim Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Alison D Tang Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
Yuk Kei Wan Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Changqing Wang Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
Brandon Y Wong Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
Chen Yang Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, British Columbia, Canada
If Barnes European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
Andrew E Berry European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
Salvador Capella-Gutierrez Barcelona Supercomputing Center, Barcelona, Spain
Alyssa Cousineau Program in Molecular Medicine, Diabetes Center of Excellence, University of Massachusetts Chan Medical School, Worcester, MA, USA
Namrita Dhillon Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
Jose M Fernandez-Gonzalez Barcelona Supercomputing Center, Barcelona, Spain
Luis Ferrández-Peral Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Natàlia Garcia-Reyero Energy, Installations & Environment, Office of the Assistant Secretary of Defense, Washington, DC, USA
Stefan Götz Biobam Bioinformatics, Valencia, Spain
Carles Hernández-Ferrer Barcelona Supercomputing Center, Barcelona, Spain
Liudmyla Kondratova Genetics Institute, University of Florida, Gainesville, FL, USA
Tianyuan Liu Cardiff University, Cardiff, UK
Alessandra Martinez-Martin Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Carlos Menor Biobam Bioinformatics, Valencia, Spain
Jorge Mestre-Tomás Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Jonathan M Mudge European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
Nedka G Panayotova Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
Alejandro Paniagua Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Dmitry Repchevsky Barcelona Supercomputing Center, Barcelona, Spain
Xingjie Ren Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
Eric Rouchka Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, KY, USA
Brandon Saint-John Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
Enrique Sapena European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Leon Sheynkman Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
Melissa Laird Smith Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, KY, USA
Marie-Marthe Suner European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
Hazuki Takahashi Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
Ingrid A Youngworth Department of Genetics, Stanford University, Palo Alto, CA, USA
Piero Carninci Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan; Human Technopole, Milano, Italy
Nancy D Denslow Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, FL, USA; Center for Environmental and Human Toxicology, Department of Physiological Sciences, University of Florida, Gainesville, FL, USA
Roderic Guigó Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
Margaret E Hunter US Geological Survey, Wetland and Aquatic Research Center, Gainesville, FL, USA
Rene Maehr Program in Molecular Medicine, Diabetes Center of Excellence, University of Massachusetts Chan Medical School, Worcester, MA, USA
Yin Shen Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
Hagen U Tilgner Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York City, NY, USA
Barbara J Wold Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
Christopher Vollmers Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
Adam Frankish European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge, UK
Kin Fai Au Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Gloria M Sheynkman Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA; UVA Cancer Center, University of Virginia, Charlottesville, VA, USA
Ali Mortazavi Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
Ana Conesa Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain; Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
Angela N Brooks UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA

Sullivan DK, Min KHJ, Hjörleifsson KE, Luebbert L, Holley G, Moses L, Gustafsson J, Bray NL, Pimentel H, Booeshaghi AS, Melsted P, Pachter L. kallisto, bustools, and kb-python for quantifying bulk, single-cell, and single-nucleus RNA-seq. bioRxiv 2024:2023.11.21.568164. [PMID: 38045414 PMCID: PMC10690192 DOI: 10.1101/2023.11.21.568164] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/05/2023]

Affiliation(s)

Delaney K Sullivan Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA; UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Kyung Hoi Joseph Min Ginkgo Bioworks, Boston, MA, 02210, USA
Kristján Eldjárn Hjörleifsson Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA
Laura Luebbert Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
Guillaume Holley deCODE Genetics/Amgen Inc., Reykjavik, Iceland
Lambda Moses Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
Johan Gustafsson Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
Nicolas L Bray Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
Harold Pimentel Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA; Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
A Sina Booeshaghi Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA; School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
Páll Melsted deCODE Genetics/Amgen Inc., Reykjavik, Iceland; School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
Lior Pachter Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA; Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91125, USA

Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G, Behera AK, Gonzalez JM, Hunt T, Lagarde J, Liang CE, Li H, Jerryd Meade M, Moraga Amador DA, Prjibelski AD, Birol I, Bostan H, Brooks AM, Hasan Çelik M, Chen Y, Du MR, Felton C, Göke J, Hafezqorani S, Herwig R, Kawaji H, Lee J, Liang Li J, Lienhard M, Mikheenko A, Mulligan D, Ming Nip K, Pertea M, Ritchie ME, Sim AD, Tang AD, Kei Wan Y, Wang C, Wong BY, Yang C, Barnes I, Berry A, Capella S, Dhillon N, Fernandez-Gonzalez JM, Ferrández-Peral L, Garcia-Reyero N, Goetz S, Hernández-Ferrer C, Kondratova L, Liu T, Martinez-Martin A, Menor C, Mestre-Tomás J, Mudge JM, Panayotova NG, Paniagua A, Repchevsky D, Rouchka E, Saint-John B, Sapena E, Sheynkman L, Laird Smith M, Suner MM, Takahashi H, Youngworth IA, Carninci P, Denslow ND, Guigó R, Hunter ME, Tilgner HU, Wold BJ, Vollmers C, Frankish A, Fai Au K, Sheynkman GM, Mortazavi A, Conesa A, Brooks AN. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. bioRxiv 2023:2023.07.25.550582. [PMID: 37546854 PMCID: PMC10402094 DOI: 10.1101/2023.07.25.550582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/08/2023]

Affiliation(s)

Francisco J. Pardo-Palacios Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain (these authors contributed equally to this work)
Dingjie Wang Department of Biomedical Informatics, The Ohio State University, Columbus, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA (these authors contributed equally to this work)
Fairlie Reese Developmental and Cell Biology, University of California, Irvine, Irvine, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, USA (these authors contributed equally to this work)
Mark Diekhans UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA (these authors contributed equally to this work)
Sílvia Carbonell-Sala Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain (these authors contributed equally to this work)
Brian Williams Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA (these authors contributed equally to this work)
Jane E. Loveland European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK (these authors contributed equally to this work)
Maite De María Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA; Center for Environmental and Human Toxicology, University of Florida, Gainesville, USA (these authors contributed equally to this work)
Matthew S. Adams Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA (these authors contributed equally to this work)
Gabriela Balderrama-Gutierrez Developmental and Cell Biology, University of California, Irvine, Irvine, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, USA (these authors contributed equally to this work)
Amit K. Behera Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA (these authors contributed equally to this work)
Jose M. Gonzalez European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK (these authors contributed equally to this work)
Toby Hunt European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK (these authors contributed equally to this work)
Julien Lagarde Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain; Flomics Biotech, Dr. Aiguader 88, Barcelona 08003, Spain (these authors contributed equally to this work)
Cindy E. Liang Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA (these authors contributed equally to this work)
Haoran Li Department of Biomedical Informatics, The Ohio State University, Columbus, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA (these authors contributed equally to this work)
Marcus Jerryd Meade Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA (these authors contributed equally to this work)
David A. Moraga Amador Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA (these authors contributed equally to this work)
Andrey D. Prjibelski Department of Computer Science, University of Helsinki, Helsinki, Finland; Center for Bioinformatics and Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia (these authors contributed equally to this work)
Inanc Birol Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
Hamed Bostan Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
Ashley M. Brooks Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
Muhammed Hasan Çelik Developmental and Cell Biology, University of California, Irvine, Irvine, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
Ying Chen Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Mei R.M. Du Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
Colette Felton Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
Jonathan Göke Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore; Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
Saber Hafezqorani Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
Ralf Herwig Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
Hideya Kawaji Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
Joseph Lee Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Jian Liang Li Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
Matthias Lienhard Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
Alla Mikheenko Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
Dennis Mulligan Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
Ka Ming Nip Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
Mihaela Pertea Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, USA
Matthew E. Ritchie Walter and Eliza Hall Institute of Medical Research, Parkville, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Australia
Andre D. Sim Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Alison D. Tang Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
Yuk Kei Wan Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Changqing Wang Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
Brandon Y. Wong Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, USA
Chen Yang Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
If Barnes European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Andrew Berry European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Salvador Capella Barcelona Supercomputing Center, Barcelona, Spain
Namrita Dhillon Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
Jose M. Fernandez-Gonzalez Barcelona Supercomputing Center, Barcelona, Spain
Luis Ferrández-Peral Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Natàlia Garcia-Reyero Environmental Laboratory, US Army Engineer Research & Development Center, Vicksburg, USA
Stefan Goetz Biobam Bioinformatics SL, Valencia, Spain
Carles Hernández-Ferrer Barcelona Supercomputing Center, Barcelona, Spain
Liudmyla Kondratova Genetics Institute, University of Florida, Gainesville, USA
Tianyuan Liu Cardiff University, Cardiff, UK
Alessandra Martinez-Martin Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Carlos Menor Biobam Bioinformatics SL, Valencia, Spain
Jorge Mestre-Tomás Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Jonathan M. Mudge European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Nedka G. Panayotova Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
Alejandro Paniagua Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Dmitry Repchevsky Barcelona Supercomputing Center, Barcelona, Spain
Eric Rouchka Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
Brandon Saint-John Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
Enrique Sapena European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Leon Sheynkman Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
Melissa Laird Smith Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
Marie-Marthe Suner European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Hazuki Takahashi Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
Ingrid Ashley Youngworth Department of Genetics, Stanford University, Palo Alto, USA
Piero Carninci Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan; Human Technopole, Milano, Italy
Nancy D. Denslow Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA; Center for Environmental and Human Toxicology, Department of Physiological Sciences, University of Florida, Gainesville, USA
Roderic Guigó Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
Margaret E. Hunter U.S. Geological Survey, Wetland and Aquatic Research Center, Gainesville, USA
Hagen U. Tilgner Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York City, USA
Barbara J. Wold Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
Christopher Vollmers Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
Adam Frankish European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Kin Fai Au Department of Biomedical Informatics, The Ohio State University, Columbus, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
Gloria M. Sheynkman Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, USA; UVA Cancer Center, University of Virginia, Charlottesville, USA
Ali Mortazavi Developmental and Cell Biology, University of California, Irvine, Irvine, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
Ana Conesa Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain; Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, USA
Angela N. Brooks UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA

Wilbrandt J, Misof B, Panfilio KA, Niehuis O. Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models. BMC Genomics 2019;20:753. [PMID: 31623555 PMCID: PMC6798390 DOI: 10.1186/s12864-019-6064-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/18/2018] [Accepted: 08/27/2019] [Indexed: 02/06/2023]

Abstract

Background

The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative.

Results

Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species’ gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred equally well from the automatically predicted and the manually annotated gene models. Conversely, some previously reported trends did not appear in either the automatic or the manually annotated gene sets, pointing towards insect-specific gene structural peculiarities.

Conclusions

In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.

Ye X, Tang X, Wang X, Che J, Wu M, Liang J, Ye L, Qian Q, Li J, You Z, Zhang Y, Wang S, Zhong B. Improving Silkworm Genome Annotation Using a Proteogenomics Approach. J Proteome Res 2019;18:3009-3019. [PMID: 31250652 DOI: 10.1021/acs.jproteome.8b00965] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022]

Yi F, Jia Z, Xiao Y, Ma W, Wang J. SPTEdb: a database for transposable elements in salicaceous plants. Database (Oxford) 2018;2018:4925802. [PMID: 29688371 PMCID: PMC5846285 DOI: 10.1093/database/bay024] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/30/2017] [Accepted: 02/12/2018] [Indexed: 01/10/2023]

Reid I. Evaluating Programs for Predicting Genes and Transcripts with RNA-Seq Support in Fungal Genomes. Methods Mol Biol 2018;1775:209-227. [PMID: 29876820 DOI: 10.1007/978-1-4939-7804-5_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]

Next Generation Sequencing Data and Proteogenomics. Adv Exp Med Biol 2016;926:11-19. [DOI: 10.1007/978-3-319-42316-6_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022]

Buisine N, Ruan X, Bilesimo P, Grimaldi A, Alfama G, Ariyaratne P, Mulawadi F, Chen J, Sung WK, Liu ET, Demeneix BA, Ruan Y, Sachs LM. Xenopus tropicalis Genome Re-Scaffolding and Re-Annotation Reach the Resolution Required for In Vivo ChIA-PET Analysis. PLoS One 2015;10:e0137526. [PMID: 26348928 PMCID: PMC4562602 DOI: 10.1371/journal.pone.0137526] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 01/20/2015] [Accepted: 08/19/2015] [Indexed: 12/11/2022]

Reid I, O’Toole N, Zabaneh O, Nourzadeh R, Dahdouli M, Abdellateef M, Gordon PMK, Soh J, Butler G, Sensen CW, Tsang A. SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models. BMC Bioinformatics 2014;15:229. [PMID: 24980894 PMCID: PMC4084796 DOI: 10.1186/1471-2105-15-229] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 08/15/2013] [Accepted: 06/17/2014] [Indexed: 12/02/2022]

Ashkenazi S, Snir R, Ofran Y. Assessing the relationship between conservation of function and conservation of sequence using photosynthetic proteins. Bioinformatics 2012;28:3203-10. [PMID: 23080118 DOI: 10.1093/bioinformatics/bts608] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]

Xu HE, Zhang HH, Han MJ, Shen YH, Huang XZ, Xiang ZH, Zhang Z. [Computational approaches for identification and classification of transposable elements in eukaryotic genomes]. Yi Chuan 2012;34:1009-1019. [PMID: 22917906 DOI: 10.3724/sp.j.1005.2012.01009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]

Alioto T. Gene prediction. Methods Mol Biol 2012;855:175-201. [PMID: 22407709 DOI: 10.1007/978-1-61779-582-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 05/31/2023]

Grassa CJ, Kulathinal RJ. Elevated Evolutionary Rates among Functionally Diverged Reproductive Genes across Deep Vertebrate Lineages. Int J Evol Biol 2011;2011:274975. [PMID: 21811675 PMCID: PMC3147129 DOI: 10.4061/2011/274975] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 02/01/2011] [Revised: 05/17/2011] [Accepted: 05/23/2011] [Indexed: 11/24/2022]

Renuse S, Chaerkady R, Pandey A. Proteogenomics. Proteomics 2011;11:620-30. [DOI: 10.1002/pmic.201000615] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Received: 09/29/2010] [Revised: 11/14/2010] [Accepted: 11/16/2010] [Indexed: 12/13/2022]

Orthopoxvirus genome evolution: the role of gene loss. Viruses 2010;2:1933-1967. [PMID: 21994715 PMCID: PMC3185746 DOI: 10.3390/v2091933] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.9] [Received: 07/14/2010] [Revised: 08/25/2010] [Accepted: 09/01/2010] [Indexed: 12/26/2022]

Reese MG, Moore B, Batchelor C, Salas F, Cunningham F, Marth GT, Stein L, Flicek P, Yandell M, Eilbeck K. A standard variation file format for human genome sequences. Genome Biol 2010;11:R88. [PMID: 20796305 PMCID: PMC2945790 DOI: 10.1186/gb-2010-11-8-r88] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.2] [Received: 04/29/2010] [Revised: 07/26/2010] [Accepted: 08/26/2010] [Indexed: 12/03/2022]

Bodian DL, Klein TE. COLdb, a database linking genetic data to molecular function in fibrillar collagens. Hum Mutat 2009;30:946-51. [PMID: 19370761 DOI: 10.1002/humu.20978] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]

Schweikert G, Zien A, Zeller G, Behr J, Dieterich C, Ong CS, Philips P, De Bona F, Hartmann L, Bohlen A, Krüger N, Sonnenburg S, Rätsch G. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res 2009;19:2133-43. [PMID: 19564452 DOI: 10.1101/gr.090597.108] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Indexed: 11/25/2022]

Liang C, Mao L, Ware D, Stein L. Evidence-based gene predictions in plant genomes. Genome Res 2009;19:1912-23. [PMID: 19541913 DOI: 10.1101/gr.088997.108] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 11/25/2022]

Commins J, Toft C, Fares MA. Computational biology methods and their application to the comparative genomics of endocellular symbiotic bacteria of insects. Biol Proced Online 2009;11:52-78. [PMID: 19495914 PMCID: PMC3055744 DOI: 10.1007/s12575-009-9004-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 01/23/2009] [Accepted: 02/17/2009] [Indexed: 12/02/2022]

Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 2009;10:67. [PMID: 19236712 PMCID: PMC2653490 DOI: 10.1186/1471-2105-10-67] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.1] [Received: 10/20/2008] [Accepted: 02/23/2009] [Indexed: 11/22/2022]

Blanco E, Abril JF. Computational gene annotation in new genome assemblies using GeneID. Methods Mol Biol 2009;537:243-61. [PMID: 19378148 DOI: 10.1007/978-1-59745-251-9_12] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 12/05/2022]

Coghlan A, Fiedler TJ, McKay SJ, Flicek P, Harris TW, Blasiar D, Stein LD. nGASP--the nematode genome annotation assessment project. BMC Bioinformatics 2008;9:549. [PMID: 19099578 PMCID: PMC2651883 DOI: 10.1186/1471-2105-9-549] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Received: 08/12/2008] [Accepted: 12/19/2008] [Indexed: 11/15/2022]

Abstract

Background

While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase.

Results

The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders.

Conclusion

This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.

Reinhardt JA, Baltrus DA, Nishimura MT, Jeck WR, Jones CD, Dangl JL. De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res 2008;19:294-305. [PMID: 19015323 DOI: 10.1101/gr.083311.108] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.6] [Indexed: 11/25/2022]

Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res 2008;18:1979-90. [PMID: 18757608 DOI: 10.1101/gr.081612.108] [Citation(s) in RCA: 638] [Impact Index Per Article: 39.9] [Indexed: 11/25/2022]

Advances in the sequencing of the genome of the adenophorean nematode Trichinella spiralis. Parasitology 2008;135:869-80. [PMID: 18598573 DOI: 10.1017/s0031182008004472] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]

Zou J, Hallen MA, Yankel CD, Endow SA. A microtubule-destabilizing kinesin motor regulates spindle length and anchoring in oocytes. J Cell Biol 2008;180:459-66. [PMID: 18250200 PMCID: PMC2234233 DOI: 10.1083/jcb.200711031] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]

Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 2007;450:219-32. [PMID: 17994088 DOI: 10.1038/nature06340] [Citation(s) in RCA: 462] [Impact Index Per Article: 28.9] [Received: 07/21/2007] [Accepted: 10/04/2007] [Indexed: 12/25/2022]

Díaz-Pérez C, Cervantes C, Campos-García J, Julián-Sánchez A, Riveros-Rosas H. Phylogenetic analysis of the chromate ion transporter (CHR) superfamily. FEBS J 2007;274:6215-27. [DOI: 10.1111/j.1742-4658.2007.06141.x] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]

Bowser PRF, Tobe SS. Comparative genomic analysis of allatostatin-encoding (Ast) genes in Drosophila species and prediction of regulatory elements by phylogenetic footprinting. Peptides 2007;28:83-93. [PMID: 17175069 DOI: 10.1016/j.peptides.2006.08.033] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/27/2006] [Revised: 08/04/2006] [Accepted: 08/04/2006] [Indexed: 01/02/2023]

Christoffels A, Bartfai R, Srinivasan H, Komen H, Orban L. Comparative genomics in cyprinids: common carp ESTs help the annotation of the zebrafish genome. BMC Bioinformatics 2006;7 Suppl 5:S2. [PMID: 17254304 PMCID: PMC1764476 DOI: 10.1186/1471-2105-7-s5-s2] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]

Abstract

Background

Automatic annotation of sequenced eukaryotic genomes integrates a combination of methodologies such as ab-initio methods and alignment of homologous genes and/or proteins. For example, annotation of the zebrafish genome within Ensembl relies heavily on available cDNA and protein sequences from two distantly related fish species and other vertebrates that have diverged several hundred million years ago. The scarcity of genomic information from other cyprinids provides the impetus to leverage EST collections to understand gene structures in this diverse teleost group.

Results

We have generated 6,050 ESTs from the differentiating testis of common carp (Cyprinus carpio) and clustered them with 9,303 non-gonadal ESTs from CarpBase as well as 1,317 ESTs and 652 common carp mRNAs from GenBank. Over 28% of the resulting 8,663 unique transcripts are exclusively testis-derived ESTs. Moreover, 974 of these transcripts did not match any sequence in the zebrafish or fathead minnow EST collections.

A total of 1,843 unique common carp sequences could be stringently mapped to the zebrafish genome (version 5), of which 1,752 matched coding sequences of zebrafish genes with or without potential splice variants. We show that 91 common carp transcripts map to intergenic and intronic regions on the zebrafish genome assembly and regions annotated with non-teleost sequences. Interestingly, an additional 42 common carp transcripts indicate the potential presence of new splicing variants not found in zebrafish databases so far. The fact that common carp transcripts help the identification or confirmation of these coding regions in zebrafish exemplifies the usefulness of sequences from closely related species for the annotation of model genomes.

We also demonstrate that 5' UTR sequences of common carp and zebrafish orthologs share a significant level of similarity based on preservation of motif arrangements for as many as 10 ab-initio motifs.

Conclusion

Our data show that there is sufficient homology between the transcribed sequences of common carp and zebrafish to warrant an even deeper cyprinid transcriptome comparison. In addition, the comparative analysis illustrates the value of utilizing partially sequenced transcriptomes to understand gene structure in this diverse teleost group. We highlight the need for integrated resources to leverage the wealth of fragmented genomic data.

Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase). BMC Genomics 2006;7:300. [PMID: 17134497 PMCID: PMC1684263 DOI: 10.1186/1471-2164-7-300] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 06/27/2006] [Accepted: 11/29/2006] [Indexed: 11/10/2022]

Ohler U. Identification of core promoter modules in Drosophila and their application in accurate transcription start site prediction. Nucleic Acids Res 2006;34:5943-50. [PMID: 17068082 PMCID: PMC1635271 DOI: 10.1093/nar/gkl608] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.6] [Indexed: 11/12/2022]

Bandyopadhyay S, Sharan R, Ideker T. Systematic identification of functional orthologs based on protein network comparison. Genome Res 2006;16:428-35. [PMID: 16510899 PMCID: PMC1415213 DOI: 10.1101/gr.4526006] [Citation(s) in RCA: 148] [Impact Index Per Article: 8.2] [Indexed: 01/12/2023]

Reese MG, Guigó R. EGASP: Introduction. Genome Biol 2006;7 Suppl 1:S1.1-3. [PMID: 16925831 PMCID: PMC1810546 DOI: 10.1186/gb-2006-7-s1-s1] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]

Bajic VB, Brent MR, Brown RH, Frankish A, Harrow J, Ohler U, Solovyev VV, Tan SL. Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment. Genome Biol 2006;7 Suppl 1:S3.1-13. [PMID: 16925837 PMCID: PMC1810552 DOI: 10.1186/gb-2006-7-s1-s3] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022]

Abstract

BACKGROUND

This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends.

RESULTS

The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions.
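
The distance-based evaluation summarized above can be illustrated with a short Python sketch. It is not the assessment code used in EGASP, and its matching rules are assumptions made here for illustration. A reference TSS counts as detected if any prediction on the same chromosome and strand lies within the maximum allowed distance (1,000 nt in the experiment). A prediction counts as a true positive if it lies within that distance of some reference TSS. The mean distance is taken over those true-positive predictions. The `Site` type and the `evaluate_tss` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical TSS record: chromosome, strand ('+' or '-'), position."""
    chrom: str
    strand: str
    pos: int


def _nearest(site, candidates):
    """Distance to the closest candidate on the same chromosome and strand, or None."""
    dists = [abs(site.pos - c.pos) for c in candidates
             if c.chrom == site.chrom and c.strand == site.strand]
    return min(dists) if dists else None


def evaluate_tss(predicted, reference, max_dist=1000):
    """Sketch of distance-based TSS assessment (assumed matching rules).

    Returns (sensitivity, positive predictive value, mean distance of
    true-positive predictions to their nearest reference TSS).
    """
    # Sensitivity: fraction of reference TSSs with a prediction within max_dist.
    found = 0
    for r in reference:
        d = _nearest(r, predicted)
        if d is not None and d <= max_dist:
            found += 1

    # PPV: fraction of predictions lying within max_dist of some reference TSS.
    tp_dists = []
    for p in predicted:
        d = _nearest(p, reference)
        if d is not None and d <= max_dist:
            tp_dists.append(d)

    sensitivity = found / len(reference) if reference else 0.0
    ppv = len(tp_dists) / len(predicted) if predicted else 0.0
    mean_dist = sum(tp_dists) / len(tp_dists) if tp_dists else None
    return sensitivity, ppv, mean_dist
```

Consider reference sites at chr1:+:10,000 and chr1:+:50,000 and predictions at chr1:+:10,300, chr1:-:50,100 and chr2:+:5,000. For this input the sketch reports a sensitivity of 0.5, a positive predictive value of 1/3 and a mean distance of 300 nt. The all-pairs search is quadratic. An evaluation over whole genomes would instead sort sites by position or use an interval index.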

CONCLUSION

The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.

Guigó R, Flicek P, Abril JF, Reymond A, Lagarde J, Denoeud F, Antonarakis S, Ashburner M, Bajic VB, Birney E, Castelo R, Eyras E, Ucla C, Gingeras TR, Harrow J, Hubbard T, Lewis SE, Reese MG. EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol 2006;7 Suppl 1:S2.1-31. [PMID: 16925836 PMCID: PMC1810551 DOI: 10.1186/gb-2006-7-s1-s2] [Citation(s) in RCA: 198] [Impact Index Per Article: 11.0] [Indexed: 01/28/2023]

Abstract

BACKGROUND

We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment.

RESULTS

The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified.
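
In the gene-prediction literature, nucleotide-level sensitivity and specificity are conventionally defined as Sn = TP/(TP+FN) and Sp = TP/(TP+FP). Here TP counts the coding bases shared by prediction and reference. This "specificity" is what other fields call precision. The Python sketch below computes both measures for a single sequence and strand under those definitions. It is not the EGASP evaluation software. The function names and the half-open [start, end) interval convention are hypothetical choices made for illustration.

```python
def coding_positions(exons):
    """Set of base positions covered by half-open [start, end) exon intervals."""
    covered = set()
    for start, end in exons:
        covered.update(range(start, end))
    return covered


def nucleotide_accuracy(predicted_exons, reference_exons):
    """Sketch of nucleotide-level Sn and Sp for one sequence and strand.

    Sn = TP / (TP + FN): fraction of reference coding bases that are predicted.
    Sp = TP / (TP + FP): fraction of predicted coding bases that are in the
    reference (gene-prediction convention; elsewhere called precision).
    """
    pred = coding_positions(predicted_exons)
    ref = coding_positions(reference_exons)
    tp = len(pred & ref)  # bases predicted as coding that are also reference coding
    sn = tp / len(ref) if ref else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp
```

Take reference exons [100, 200) and [300, 400) and a single predicted exon [100, 200). The sketch returns Sn = 0.5 and Sp = 1.0, because every predicted base is correct but half of the reference coding bases are missed. Expanding intervals into sets of positions keeps the code short but uses memory proportional to the coding length. Sorted interval arithmetic scales better.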

CONCLUSION

This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.

Affiliation(s)

Roderic Guigó, Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain; Member of the EGASP Organizing Committee
Paul Flicek, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Josep F Abril, Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
Alexandre Reymond, Center for Integrative Genomics, University of Lausanne, Switzerland
Julien Lagarde, Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
France Denoeud, Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
Stylianos Antonarakis, University of Geneva Medical School and University Hospitals of Geneva, 1211 Geneva, Switzerland
Michael Ashburner, Department of Genetics, University of Cambridge, Cambridge CB3 2EH, UK; Member of the EGASP Advisory Board
Vladimir B Bajic, South African National Bioinformatics Institute (SANBI), University of Western Cape, Bellville 7535, South Africa; Member of the EGASP Advisory Board
Ewan Birney, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Member of the EGASP Organizing Committee
Robert Castelo, Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
Eduardo Eyras, Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica-Universitat Pompeu Fabra, E08003 Barcelona, Catalonia, Spain
Catherine Ucla, University of Geneva Medical School and University Hospitals of Geneva, 1211 Geneva, Switzerland
Thomas R Gingeras, Affymetrix Inc., Santa Clara, California 95051, USA; Member of the EGASP Advisory Board
Jennifer Harrow, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Member of the EGASP Organizing Committee
Tim Hubbard, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Member of the EGASP Organizing Committee
Suzanna E Lewis, Department of Molecular and Cellular Biology, University of California, Berkeley, California 94792, USA; Member of the EGASP Advisory Board
Martin G Reese, Omicia Inc., Christie Ave., Emeryville, California 94608, USA; Member of the EGASP Advisory Board

Moult J. Rigorous performance evaluation in protein structure modelling and implications for computational biology. Philos Trans R Soc Lond B Biol Sci 2006;361:453-8. [PMID: 16524833 PMCID: PMC1609338 DOI: 10.1098/rstb.2005.1810] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]

Li J, Riehle MM, Zhang Y, Xu J, Oduol F, Gomez SM, Eiglmeier K, Ueberheide BM, Shabanowitz J, Hunt DF, Ribeiro JMC, Vernick KD. Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms. Genome Biol 2006;7:R24. [PMID: 16569258 PMCID: PMC1557760 DOI: 10.1186/gb-2006-7-3-r24] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 10/19/2005] [Revised: 01/19/2006] [Accepted: 02/23/2006] [Indexed: 11/10/2022]

Guigó R, Reese MG. EGASP: collaboration through competition to find human genes. Nat Methods 2005;2:575-7. [PMID: 16094379 DOI: 10.1038/nmeth0805-575] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]

O'Neill B. Prices for Ingenuity. PLoS Biol 2005;3:e288. [PMID: 16089506 PMCID: PMC1187858 DOI: 10.1371/journal.pbio.0030288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

Yu L, Haverty PM, Mariani J, Wang Y, Shen HY, Schwarzschild MA, Weng Z, Chen JF. Genetic and pharmacological inactivation of adenosine A2A receptor reveals an Egr-2-mediated transcriptional regulatory network in the mouse striatum. Physiol Genomics 2005;23:89-102. [PMID: 16046619 DOI: 10.1152/physiolgenomics.00068.2005] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]

Ren D, Nedialkov YA, Li F, Xu D, Reimers S, Finkelstein A, Burton ZF. Spacing requirements for simultaneous recognition of the adenovirus major late promoter TATAAAAG box and initiator element. Arch Biochem Biophys 2005;435:347-62. [PMID: 15708378 DOI: 10.1016/j.abb.2004.12.028] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/28/2004] [Revised: 12/28/2004] [Indexed: 11/18/2022]

Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Régnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 2005;23:137-44. [PMID: 15637633 DOI: 10.1038/nbt1053] [Citation(s) in RCA: 691] [Impact Index Per Article: 36.4] [Indexed: 11/09/2022]

Blaschke C, Leon EA, Krallinger M, Valencia A. Evaluation of BioCreAtIvE assessment of task 2. BMC Bioinformatics 2005;6 Suppl 1:S16. [PMID: 15960828 PMCID: PMC1869008 DOI: 10.1186/1471-2105-6-s1-s16] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.2] [Indexed: 11/29/2022]

Abstract

Background

Molecular biology has accumulated substantial amounts of data on the functions of genes and proteins. Functional descriptions are generally extracted manually from textual data and stored in biological databases to build up annotations for large collections of gene products. These annotation databases are crucial for interpreting large-scale analyses that use bioinformatics or experimental techniques. Because functional descriptions keep accumulating in the biomedical literature, text mining tools that facilitate the extraction of such annotations are urgently needed. To make these tools usable in real-world scenarios, for instance to assist database curators during annotation of protein function, different approaches must be compared and evaluated on full-text articles.

Results

The Critical Assessment for Information Extraction in Biology (BioCreAtIvE) contest is a community-wide competition that evaluates different text mining strategies as applied to the biomedical literature. We report on task 2, which addressed the automatic extraction and assignment of Gene Ontology (GO) annotations of human proteins using full-text articles. The predictions of task 2 are triplets of protein, GO term and article passage. Participants returned the annotation-relevant text passages, which were evaluated by expert curators of the GO annotation (GOA) team at the European Bioinformatics Institute (EBI). Each participant could submit up to three results for each sub-task of task 2. In total, the participants provided more than 15,000 individual results. In addition to the annotation itself, the curators evaluated whether the protein and the GO term were correctly predicted and traceable through the submitted text fragment.

Conclusion

GO concepts are currently the most widely used set of terms for annotating gene products, so they were used to assess how effectively text mining tools can extract such annotations automatically. Although the results are promising, they still fall well short of the performance required by real-world applications. The principal difficulties in addressing the task were the complex nature of GO terms and protein names (the wide range of variants used to express proteins, and especially GO terms, in free text) and the lack of a standard training set. Participants tackled the task with a range of very different strategies. The dataset generated for the BioCreAtIvE challenge is publicly available and opens new possibilities for training information extraction methods in molecular biology.


Szafranski K, Lehmann R, Parra G, Guigo R, Glöckner G. Gene organization features in A/T-rich organisms. J Mol Evol 2005;60:90-8. [PMID: 15696371 DOI: 10.1007/s00239-004-0201-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 12/19/2003] [Accepted: 08/18/2004] [Indexed: 10/25/2022]

Martin RE, Henry RI, Abbey JL, Clements JD, Kirk K. The 'permeome' of the malaria parasite: an overview of the membrane transport proteins of Plasmodium falciparum. Genome Biol 2005;6:R26. [PMID: 15774027 PMCID: PMC1088945 DOI: 10.1186/gb-2005-6-3-r26] [Citation(s) in RCA: 129] [Impact Index Per Article: 6.8] [Received: 11/11/2004] [Revised: 12/31/2004] [Accepted: 01/28/2005] [Indexed: 11/24/2022]

Abstract

Bioinformatic and expression analyses attribute putative functions to transporters and channels encoded by the Plasmodium falciparum genome. The malaria parasite has substantially more membrane transport proteins than previously thought.

Background

The uptake of nutrients, expulsion of metabolic wastes and maintenance of ion homeostasis by the intraerythrocytic malaria parasite are mediated by membrane transport proteins. Proteins of this type are also implicated in antimalarial drug resistance. However, the initial annotation of the genome of the human malaria parasite Plasmodium falciparum identified only a limited number of transporters and no channels. In this study we used a combination of bioinformatic approaches to identify transporters and channels encoded by the malaria parasite and attribute putative functions to them, and we compared expression patterns for a subset of these.

Results

A computer program that searches a genome database on the basis of the hydropathy plots of the encoded proteins was used to identify more than 100 transport proteins encoded by P. falciparum. These include all the transporters previously annotated as such, as well as a similar number of candidate transport proteins that had escaped detection. Detailed sequence analysis enabled putative substrate specificities and/or transport mechanisms to be assigned to all the putative transport proteins that previously lacked them. The newly identified transport proteins include candidate transporters for a range of organic and inorganic nutrients (including sugars, amino acids, nucleosides and vitamins) and several putative ion channels. The stage-dependent expression of RNAs for 34 candidate transport proteins of particular interest is compared.

Conclusion

The malaria parasite possesses substantially more membrane transport proteins than was originally thought, and the analyses presented here provide a range of novel insights into the physiology of this important human pathogen.
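The hydropathy-plot search mentioned in the Results above can be illustrated with a minimal sketch. The abstract does not describe the authors' program, so this is not their method: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window and a 1.6 threshold (values commonly used to flag candidate membrane-spanning segments). The function names `hydropathy_profile` and `count_hydrophobic_segments` are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along seq (a hydropathy plot).

    Assumption: non-standard residues (e.g. 'X') score 0.0.
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def count_hydrophobic_segments(seq, window=19, threshold=1.6):
    """Count contiguous runs of windows whose mean exceeds threshold.

    Hypothetical heuristic: each run is treated as one candidate
    transmembrane segment; many such runs suggest a transport protein.
    """
    segments = 0
    inside = False
    for value in hydropathy_profile(seq, window):
        if value > threshold and not inside:
            segments += 1
            inside = True
        elif value <= threshold:
            inside = False
    return segments
```

For example, a toy sequence made of two 25-residue leucine stretches separated and flanked by lysine runs yields two hydrophobic segments. A real screen would apply such a count to every predicted protein in the genome and follow up candidates with sequence analysis, as the abstract describes.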


Nelson DR, Nebert DW. The truth about mouse, human, worms and yeast. Hum Genomics 2005;1:146-9. [PMID: 15601543 PMCID: PMC3525071 DOI: 10.1186/1479-7364-1-2-146] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]

Identification of true EST alignments and exon regions of gene sequences. ACTA ACUST UNITED AC 2004. [DOI: 10.1007/bf03183715] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]