Reference Citation Analysis

For: Nagy A, Hegyi H, Farkas K, Tordai H, Kozma E, Bányai L, Patthy L. Identification and correction of abnormal, incomplete and mispredicted proteins in public databases. BMC Bioinformatics 2008;9:353. [PMID: 18752676 PMCID: PMC2542381 DOI: 10.1186/1471-2105-9-353] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 04/23/2008] [Accepted: 08/27/2008] [Indexed: 01/21/2023] Open


Cited by Other Article(s)

Khodji H, Collet P, Thompson JD, Jeannin-Girardon A. De-MISTED: Image-based classification of erroneous multiple sequence alignments using convolutional neural networks. Appl Intell 2023. [DOI: 10.1007/s10489-022-04390-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]

Bagheri H, Severin AJ, Rajan H. Detecting and correcting misclassified sequences in the large-scale public databases. Bioinformatics 2020;36:4699-4705. [PMID: 32579213 PMCID: PMC7821992 DOI: 10.1093/bioinformatics/btaa586] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/02/2020] [Revised: 06/10/2020] [Accepted: 06/16/2020] [Indexed: 11/21/2022] Open

Soluri MF, Puccio S, Caredda G, Grillo G, Licciulli VF, Consiglio A, Edomi P, Santoro C, Sblattero D, Peano C. Interactome-Seq: A Protocol for Domainome Library Construction, Validation and Selection by Phage Display and Next Generation Sequencing. J Vis Exp 2018. [PMID: 30346377 DOI: 10.3791/56981] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023] Open

Bányai L, Kerekes K, Trexler M, Patthy L. Morphological Stasis and Proteome Innovation in Cephalochordates. Genes (Basel) 2018;9:genes9070353. [PMID: 30013013 PMCID: PMC6071037 DOI: 10.3390/genes9070353] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/24/2018] [Revised: 07/11/2018] [Accepted: 07/11/2018] [Indexed: 11/16/2022] Open

Stroehlein AJ, Young ND, Gasser RB. Improved strategy for the curation and classification of kinases, with broad applicability to other eukaryotic protein groups. Sci Rep 2018;8:6808. [PMID: 29717207 PMCID: PMC5931623 DOI: 10.1038/s41598-018-25020-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/16/2018] [Accepted: 04/12/2018] [Indexed: 12/20/2022] Open

VirusSeeker, a computational pipeline for virus discovery and virome composition analysis. Virology 2017;503:21-30. [PMID: 28110145 DOI: 10.1016/j.virol.2017.01.005] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 11/18/2016] [Revised: 01/07/2017] [Accepted: 01/10/2017] [Indexed: 01/21/2023]

Gradnigo JS, Majumdar A, Norgren RB, Moriyama EN. Advantages of an Improved Rhesus Macaque Genome for Evolutionary Analyses. PLoS One 2016;11:e0167376. [PMID: 27911958 PMCID: PMC5135103 DOI: 10.1371/journal.pone.0167376] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/01/2015] [Accepted: 11/14/2016] [Indexed: 01/12/2023] Open

Leuthaeuser JB, Morris JH, Harper AF, Ferrin TE, Babbitt PC, Fetrow JS. DASP3: identification of protein sequences belonging to functionally relevant groups. BMC Bioinformatics 2016;17:458. [PMID: 27835946 PMCID: PMC5106842 DOI: 10.1186/s12859-016-1295-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/22/2016] [Accepted: 10/20/2016] [Indexed: 01/26/2023] Open

Abstract

Background

Development of automatable processes for clustering proteins into functionally relevant groups is a critical hurdle as an increasing number of sequences are deposited into databases. Experimental function determination is exceptionally time-consuming and cannot keep pace with the identification of protein sequences. A tool, DASP (Deacon Active Site Profiler), was previously developed to identify protein sequences with active site similarity to a query set. Development of two iterative, automatable methods for clustering proteins into functionally relevant groups exposed algorithmic limitations of DASP.

Results

The accuracy and efficiency of DASP were significantly improved through six algorithmic enhancements implemented in two stages: DASP2 and DASP3. Validation demonstrated that DASP3 provides greater score separation between true positives and false positives than earlier versions. In addition, DASP3 shows similar performance to previous versions in clustering protein structures into isofunctional groups (validated against manual curation), but DASP3 gathers and clusters protein sequences into isofunctional groups more efficiently than DASP and DASP2.

Conclusions

DASP algorithmic enhancements resulted in improved efficiency and accuracy of identifying proteins that contain active site features similar to those of the query set. These enhancements provide incremental improvement in structure database searches and initial sequence database searches; however, the enhancements show significant improvement in iterative sequence searches, suggesting DASP3 is an appropriate tool for the iterative processes required for clustering proteins into isofunctional groups.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1295-z) contains supplementary material, which is available to authorized users.

Putative extremely high rate of proteome innovation in lancelets might be explained by high rate of gene prediction errors. Sci Rep 2016;6:30700. [PMID: 27476717 PMCID: PMC4967905 DOI: 10.1038/srep30700] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/26/2016] [Accepted: 07/06/2016] [Indexed: 01/17/2023] Open

Meng X, Li C, Xiu C, Zhang J, Li J, Huang L, Zhang Y, Liu Z. Identification and Biochemical Properties of Two New Acetylcholinesterases in the Pond Wolf Spider (Pardosa pseudoannulata). PLoS One 2016;11:e0158011. [PMID: 27337188 PMCID: PMC4919072 DOI: 10.1371/journal.pone.0158011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/16/2016] [Accepted: 06/08/2016] [Indexed: 01/17/2023] Open

Abstract

Acetylcholinesterase (AChE), an important neurotransmitter hydrolase in both invertebrates and vertebrates, is targeted by organophosphorus and carbamate insecticides. In this study, two new AChEs were identified in the pond wolf spider Pardosa pseudoannulata, an important predatory natural enemy of several insect pests. In total, four AChEs were found in P. pseudoannulata (including two AChEs previously identified in our laboratory). The new putative AChEs PpAChE3 and PpAChE4 contain most of the common features of the AChE family, including cysteine residues, choline binding sites, the conserved sequence 'FGESAG' and conserved aromatic residues but with a catalytic triad of 'SDH' rather than 'SEH'. Recombinant enzymes expressed in Sf9 cells showed significant differences in biochemical properties compared to other AChEs, such as the optimal pH, substrate specificity, and catalytic efficiency. Among three test substrates, PpAChE1, PpAChE3 and PpAChE4 showed the highest catalytic efficiency (Vmax/KM) for ATC (acetylthiocholine iodide), with PpAChE3 exhibiting a clear preference for ATC based on the VmaxATC/VmaxBTC ratio. In addition, the four PpAChEs were more sensitive to the AChE-specific inhibitor BW284C51, which acts against ATC hydrolysis, than to the BChE-specific inhibitor ISO-OMPA, which acts against BTC hydrolysis, with at least an 8.5-fold difference in IC50 values for each PpAChE. PpAChE3, PpAChE4, and PpAChE1 were more sensitive than PpAChE2 to the tested Carb insecticides, and PpAChE3 was more sensitive than the other three AChEs to the tested OP insecticides. Based on all the results, two new functional AChEs were identified from P. pseudoannulata. The differences in AChE sequence between this spider and insects enrich our knowledge of invertebrate AChE diversity, and our findings will be helpful for understanding the selectivity of insecticides between insects and natural enemy spiders.

Holliday GL, Bairoch A, Bagos PG, Chatonnet A, Craik DJ, Finn RD, Henrissat B, Landsman D, Manning G, Nagano N, O’Donovan C, Pruitt KD, Rawlings ND, Saier M, Sowdhamini R, Spedding M, Srinivasan N, Vriend G, Babbitt PC, Bateman A. Key challenges for the creation and maintenance of specialist protein resources. Proteins 2015;83:1005-13. [PMID: 25820941 PMCID: PMC4446195 DOI: 10.1002/prot.24803] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 01/12/2015] [Revised: 03/06/2015] [Accepted: 03/20/2015] [Indexed: 11/12/2022]

Affiliation(s)

Gemma L Holliday: Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, 94158
Amos Bairoch: SIB Swiss Institute of Bioinformatics, University of Geneva, Geneva, Switzerland
Pantelis G Bagos: Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, 35100, Greece
Arnaud Chatonnet: INRA, UMR866 Dynamique Musculaire et Métabolisme, Montpellier, F-34000, France; Université Montpellier, Montpellier, F-34000, France
David J Craik: Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, 4072, Australia
Robert D Finn: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
Bernard Henrissat: Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille Université, Marseille, 13288, France; Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
David Landsman: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, 20892
Gerard Manning: Department of Bioinformatics & Computational Biology, Genentech, 1 DNA Way, South San Francisco, California, 98010
Nozomi Nagano: Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, 135-0064, Japan
Claire O’Donovan: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
Kim D Pruitt: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, 20892
Neil D Rawlings: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
Milton Saier: Department of Molecular Biology, University of California at San Diego, La Jolla, California, 92093
Ramanathan Sowdhamini: National Centre for Biological Sciences, TIFR, GKVK Campus, Bellary Road, Bangalore, 560065, India
Michael Spedding: Chair NC-IUPHAR, Spedding Research Solutions SARL, 6 Rue Ampere, Le Vesinet, 78110, France
Narayanaswamy Srinivasan: Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
Gert Vriend: Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center, Geert Grooteplein Zuid 26-28, 6525 GA Nijmegen, The Netherlands
Patricia C Babbitt: Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, 94158
Alex Bateman: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom

Triant DA, Pearson WR. Most partial domains in proteins are alignment and annotation artifacts. Genome Biol 2015;16:99. [PMID: 25976240 PMCID: PMC4443539 DOI: 10.1186/s13059-015-0656-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 11/20/2014] [Accepted: 04/15/2015] [Indexed: 12/19/2022] Open

Expression and functional activity of neurotransmitter system components in sea urchins' early development. Zygote 2015;24:206-18. [PMID: 25920999 DOI: 10.1017/s0967199415000040] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]

Yu JF, Guo J, Liu QB, Hou Y, Xiao K, Chen QL, Wang JH, Sun X. A hybrid strategy for comprehensive annotation of the protein coding genes in prokaryotic genome. Genes Genomics 2015. [DOI: 10.1007/s13258-014-0263-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]

Chowanadisai W. Comparative genomic analysis of slc39a12/ZIP12: insight into a zinc transporter required for vertebrate nervous system development. PLoS One 2014;9:e111535. [PMID: 25375179 PMCID: PMC4222902 DOI: 10.1371/journal.pone.0111535] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 11/15/2013] [Accepted: 10/04/2014] [Indexed: 01/23/2023] Open

Abstract

The zinc transporter ZIP12, which is encoded by the gene slc39a12, has previously been shown to be important for neuronal differentiation in mouse Neuro-2a neuroblastoma cells and primary mouse neurons and necessary for neurulation during Xenopus tropicalis embryogenesis. However, relatively little is known about the biochemical properties, cellular regulation, or the physiological role of this gene. The hypothesis that ZIP12 is a zinc transporter important for nervous system function and development guided a comparative genetics approach to uncover the presence of ZIP12 in various genomes and identify conserved sequences and expression patterns associated with ZIP12. Ortholog detection of slc39a12 was conducted with reciprocal BLAST hits with the amino acid sequence of human ZIP12 in comparison to the human paralog ZIP4 and conserved local synteny between genomes. ZIP12 is present in the genomes of almost all vertebrates examined, from humans and other mammals to most teleost fish. However, ZIP12 appears to be absent from the zebrafish genome. The discrimination of ZIP12 compared to ZIP4 was unsuccessful or inconclusive in other invertebrate chordates and deuterostomes. Splice variation, due to the inclusion or exclusion of a conserved exon, is present in humans, rats, and cows and likely has biological significance. ZIP12 also possesses many putative di-leucine and tyrosine motifs often associated with intracellular trafficking, which may control cellular zinc uptake activity through the localization of ZIP12 within the cell. These findings highlight multiple aspects of ZIP12 at the biochemical, cellular, and physiological levels with likely biological significance. ZIP12 appears to have conserved function as a zinc uptake transporter in vertebrate nervous system development. Consequently, the role of ZIP12 may be an important link to reported congenital malformations in numerous animal models and humans that are caused by zinc deficiency.

Zimin AV, Cornish AS, Maudhoo MD, Gibbs RM, Zhang X, Pandey S, Meehan DT, Wipfler K, Bosinger SE, Johnson ZP, Tharp GK, Marçais G, Roberts M, Ferguson B, Fox HS, Treangen T, Salzberg SL, Yorke JA, Norgren RB. A new rhesus macaque assembly and annotation for next-generation sequencing analyses. Biol Direct 2014;9:20. [PMID: 25319552 PMCID: PMC4214606 DOI: 10.1186/1745-6150-9-20] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Received: 07/18/2014] [Accepted: 10/03/2014] [Indexed: 12/13/2022] Open

Abstract

Background

The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses.

Results

We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.

Conclusions

The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.

Reviewers

This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
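
The abstract above compares assemblies by N50 contig size. As a rough illustration only, the sketch below computes N50 under its usual definition: the contig length at which contigs of that length or longer account for at least half of all assembled bases. The function name `n50` and the example lengths are hypothetical and are not taken from the article.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumes the usual definition: walk the contigs from longest to
    shortest and report the length of the contig at which the running
    total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths in kilobases (not data from the article).
example_contigs = [80, 70, 50, 40, 30, 20]
print(n50(example_contigs))
```

Because long contigs dominate the running total, a few large contigs raise N50 far more than many small ones, which is why the statistic is often used alongside completeness and accuracy checks rather than alone.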

Yoder AD, Chan LM, dos Reis M, Larsen PA, Campbell CR, Rasoloarison R, Barrett M, Roos C, Kappeler P, Bielawski J, Yang Z. Molecular evolutionary characterization of a V1R subfamily unique to strepsirrhine primates. Genome Biol Evol 2014;6:213-27. [PMID: 24398377 PMCID: PMC3914689 DOI: 10.1093/gbe/evu006] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Indexed: 11/14/2022] Open

Gotoh O, Morita M, Nelson DR. Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment. BMC Bioinformatics 2014;15:189. [PMID: 24927652 PMCID: PMC4065584 DOI: 10.1186/1471-2105-15-189] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 03/19/2014] [Accepted: 06/09/2014] [Indexed: 03/29/2024] Open

Guo FB, Lin Y, Chen LL. Recognition of Protein-coding Genes Based on Z-curve Algorithms. Curr Genomics 2014;15:95-103. [PMID: 24822027 PMCID: PMC4009845 DOI: 10.2174/1389202915999140328162724] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/15/2013] [Revised: 11/19/2013] [Accepted: 11/20/2013] [Indexed: 01/18/2023] Open

Khenoussi W, Vanhoutrève R, Poch O, Thompson JD. SIBIS: a Bayesian model for inconsistent protein sequence estimation. Bioinformatics 2014;30:2432-9. [PMID: 24825613 DOI: 10.1093/bioinformatics/btu329] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open

Nagy A, Patthy L. FixPred: a resource for correction of erroneous protein sequences. Database (Oxford) 2014;2014:bau032. [PMID: 24705206 PMCID: PMC3975993 DOI: 10.1093/database/bau032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]

Nagy A, Patthy L. MisPred: a resource for identification of erroneous protein sequences in public databases. Database (Oxford) 2013;2013:bat053. [PMID: 23864220 PMCID: PMC3713709 DOI: 10.1093/database/bat053] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]

Light S, Elofsson A. The impact of splicing on protein domain architecture. Curr Opin Struct Biol 2013;23:451-8. [PMID: 23562110 DOI: 10.1016/j.sbi.2013.02.013] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 01/14/2013] [Revised: 02/22/2013] [Accepted: 02/28/2013] [Indexed: 10/27/2022]

Abrusán G, Szilágyi A, Zhang Y, Papp B. Turning gold into 'junk': transposable elements utilize central proteins of cellular networks. Nucleic Acids Res 2013;41:3190-200. [PMID: 23341038 PMCID: PMC3597677 DOI: 10.1093/nar/gkt011] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/19/2023] Open

Norgren RB. Improving genome assemblies and annotations for nonhuman primates. ILAR J 2013;54:144-53. [PMID: 24174438 PMCID: PMC3814395 DOI: 10.1093/ilar/ilt037] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 02/07/2023] Open

Doolittle RF, McNamara K, Lin K. Correlating structure and function during the evolution of fibrinogen-related domains. Protein Sci 2012;21:1808-23. [PMID: 23076991 PMCID: PMC3575912 DOI: 10.1002/pro.2177] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 08/30/2012] [Revised: 10/04/2012] [Accepted: 10/05/2012] [Indexed: 12/29/2022]

Wang Q, Lei Y, Xu X, Wang G, Chen LL. Theoretical prediction and experimental verification of protein-coding genes in plant pathogen genome Agrobacterium tumefaciens strain C58. PLoS One 2012;7:e43176. [PMID: 22984411 PMCID: PMC3439454 DOI: 10.1371/journal.pone.0043176] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 02/12/2012] [Accepted: 07/18/2012] [Indexed: 11/19/2022] Open

Zhang X, Goodsell J, Norgren RB. Limitations of the rhesus macaque draft genome assembly and annotation. BMC Genomics 2012;13:206. [PMID: 22646658 PMCID: PMC3426473 DOI: 10.1186/1471-2164-13-206] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 12/06/2011] [Accepted: 05/30/2012] [Indexed: 11/30/2022] Open

Guo B, Zou M, Wagner A. Pervasive indels and their evolutionary dynamics after the fish-specific genome duplication. Mol Biol Evol 2012;29:3005-22. [PMID: 22490820 DOI: 10.1093/molbev/mss108] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 01/18/2023] Open

Prosdocimi F, Linard B, Pontarotti P, Poch O, Thompson JD. Controversies in modern evolutionary biology: the imperative for error detection and quality control. BMC Genomics 2012;13:5. [PMID: 22217008 PMCID: PMC3311146 DOI: 10.1186/1471-2164-13-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 06/30/2011] [Accepted: 01/04/2012] [Indexed: 12/03/2022] Open

Polymorphisms in Ly6 genes in Msq1 encoding susceptibility to mouse adenovirus type 1. Mamm Genome 2011;23:250-8. [PMID: 22101863 DOI: 10.1007/s00335-011-9368-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/04/2011] [Accepted: 10/20/2011] [Indexed: 12/17/2022]

Yu JF, Xiao K, Jiang DK, Guo J, Wang JH, Sun X. An integrative method for identifying the over-annotated protein-coding genes in microbial genomes. DNA Res 2011;18:435-49. [PMID: 21903723 PMCID: PMC3223076 DOI: 10.1093/dnares/dsr030] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022] Open

Doolittle RF. The protochordate Ciona intestinalis has a protein like full-length vertebrate fibrinogen. J Innate Immun 2011;4:219-22. [PMID: 21860218 DOI: 10.1159/000329823] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/05/2011] [Accepted: 05/31/2011] [Indexed: 11/19/2022] Open

Reassessing domain architecture evolution of metazoan proteins: major impact of gene prediction errors. Genes (Basel) 2011;2:449-501. [PMID: 24710207 PMCID: PMC3927609 DOI: 10.3390/genes2030449] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/24/2011] [Revised: 06/14/2011] [Accepted: 06/20/2011] [Indexed: 11/17/2022] Open

Abstract

Because the appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is a growing interest in the genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. In such analyses, however, it is usually ignored that a significant proportion of the Metazoan sequences analyzed is mispredicted and that this may seriously affect the validity of the conclusions. To estimate the contribution of errors in gene prediction to differences in DA of predicted proteins, we have used the high quality manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of domain architectures of predicted proteins we focused on RefSeq, EnsEMBL and NCBI's GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chick, mouse, rat and orangutan with those of human Swiss-Prot entries has identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with those of the corresponding/orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than the comparison of Swiss-Prot entries. Analysis of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with DAs different from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol. We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences introduces a bias in DA differences inasmuch as it increases the proportion of terminal over internal DA differences. Here we have shown that in the case of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences of Metazoan species, the contribution of gene prediction errors to domain architecture differences of orthologs is comparable to or greater than that due to true gene rearrangements. We have also demonstrated that domain architecture comparison may serve as a useful tool for the quality control of gene predictions and may thus guide the correction of sequence errors. Our findings caution that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of orthologous and paralogous proteins is presented in an accompanying paper [1].

D'Angelo S, Velappan N, Mignone F, Santoro C, Sblattero D, Kiss C, Bradbury ARM. Filtering "genic" open reading frames from genomic DNA samples for advanced annotation. BMC Genomics 2011;12 Suppl 1:S5. [PMID: 21810207 PMCID: PMC3223728 DOI: 10.1186/1471-2164-12-s1-s5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022] Open

Abstract

Background

In order to carry out experimental gene annotation, DNA encoding open reading frames (ORFs) derived from real genes (termed "genic") in the correct frame is required. When genes are correctly assigned, isolation of genic DNA for functional annotation can be carried out by PCR. However, not all genes are correctly assigned, and even when correctly assigned, gene products are often incorrectly folded when expressed in heterologous hosts. This is a problem that can sometimes be overcome by the expression of protein fragments encoding domains, rather than full-length proteins. One possible method to isolate DNA encoding such domains would be to "filter" complex DNA (cDNA libraries, genomic and metagenomic DNA) for gene fragments that confer a selectable phenotype relying on correct folding, with all such domains present in a complex DNA sample termed the "domainome".

Results

In this paper we discuss the preparation of diverse genic ORF libraries from randomly fragmented genomic DNA using β-lactamase to filter out the open reading frames. By cloning DNA fragments between leader sequences and the mature β-lactamase gene, colonies can be selected for resistance to ampicillin, conferred by correct folding of the lactamase gene. Our experiments demonstrate that the majority of surviving colonies contain genic open reading frames, suggesting that β-lactamase is acting as a selectable folding reporter. Furthermore, different leaders (Sec, TAT and SRP), normally translocating different protein classes, filter different genic fragment subsets, indicating that their use increases the fraction of the "domainome" that is accessible.

Conclusions

The availability of ORF libraries, obtained with the filtering method described here, combined with screening methods such as phage display and protein-protein interaction studies, or with protein structure determination projects, can lead to the identification and structural determination of functional genic ORFs. ORF libraries represent, moreover, a useful tool to proceed towards high-throughput functional annotation of newly sequenced genomes.

Williams GW, Davis PA, Rogers AS, Bieri T, Ozersky P, Spieth J. Methods and strategies for gene structure curation in WormBase. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2011;2011:baq039. [PMID: 21543339 PMCID: PMC3092607 DOI: 10.1093/database/baq039] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Hegyi H, Kalmar L, Horvath T, Tompa P. Verification of alternative splicing variants based on domain integrity, truncation length and intrinsic protein disorder. Nucleic Acids Res 2010;39:1208-19. [PMID: 20972208 PMCID: PMC3045584 DOI: 10.1093/nar/gkq843] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open

Assigning biological functions to rice genes by genome annotation, expression analysis and mutagenesis. Biotechnol Lett 2010;32:1753-63. [PMID: 20703802 DOI: 10.1007/s10529-010-0377-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2010] [Accepted: 07/28/2010] [Indexed: 12/17/2022]

Temeyer KB, Pruett JH, Olafson PU. Baculovirus expression, biochemical characterization and organophosphate sensitivity of rBmAChE1, rBmAChE2, and rBmAChE3 of Rhipicephalus (Boophilus) microplus. Vet Parasitol 2010;172:114-21. [DOI: 10.1016/j.vetpar.2010.04.016] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2010] [Revised: 04/08/2010] [Accepted: 04/09/2010] [Indexed: 01/31/2023]

Poptsova MS, Gogarten JP. Using comparative genome analysis to identify problems in annotated microbial genomes. Microbiology (Reading) 2010;156:1909-1917. [DOI: 10.1099/mic.0.033811-0] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Rouchka EC. Database of exact tandem repeats in the Zebrafish genome. BMC Genomics 2010;11:347. [PMID: 20515480 PMCID: PMC2901318 DOI: 10.1186/1471-2164-11-347] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2009] [Accepted: 06/01/2010] [Indexed: 11/23/2022] Open

Bányai L, Sonderegger P, Patthy L. Agrin binds BMP2, BMP4 and TGFbeta1. PLoS One 2010;5:e10758. [PMID: 20505824 PMCID: PMC2874008 DOI: 10.1371/journal.pone.0010758] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2009] [Accepted: 05/03/2010] [Indexed: 01/13/2023] Open

GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 2010;7:455-7. [PMID: 20436475 DOI: 10.1038/nmeth.1457] [Citation(s) in RCA: 450] [Impact Index Per Article: 32.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2010] [Accepted: 03/26/2010] [Indexed: 11/09/2022]

Goudenège D, Avner S, Lucchetti-Miganeh C, Barloy-Hubler F. CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources. BMC Microbiol 2010;10:88. [PMID: 20331850 PMCID: PMC2850352 DOI: 10.1186/1471-2180-10-88] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2009] [Accepted: 03/23/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The functions of proteins are strongly related to their localization in cell compartments (for example the cytoplasm or membranes) but the experimental determination of the sub-cellular localization of proteomes is laborious and expensive. A fast and low-cost alternative approach is in silico prediction, based on features of the protein primary sequences. However, biologists are confronted with a very large number of computational tools that use different methods that address various localization features with diverse specificities and sensitivities. As a result, exploiting these computer resources to predict protein localization accurately involves querying all tools and comparing every prediction output; this is a painstaking task. Therefore, we developed a comprehensive database, called CoBaltDB, that gathers all prediction outputs concerning complete prokaryotic proteomes.

DESCRIPTION

The current version of CoBaltDB integrates the results of 43 localization predictors for 784 complete bacterial and archaeal proteomes (2,548,292 proteins in total). CoBaltDB supplies a simple user-friendly interface for retrieving and exploring relevant information about predicted features (such as signal peptide cleavage sites and transmembrane segments). Data are organized into three work-sets ("specialized tools", "meta-tools" and "additional tools"). The database can be queried using the organism name, a locus tag or a list of locus tags and may be browsed using numerous graphical and text displays.

CONCLUSIONS

With its new functionalities, CoBaltDB is a novel powerful platform that provides easy access to the results of multiple localization tools and support for predicting prokaryotic protein localizations with higher confidence than previously possible. CoBaltDB is available at http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten.

Eisenhaber B, Eisenhaber F. Prediction of posttranslational modification of proteins from their amino acid sequence. Methods Mol Biol 2010;609:365-84. [PMID: 20221930 DOI: 10.1007/978-1-60327-241-4_21] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Kim W, Silby MW, Purvine SO, Nicoll JS, Hixson KK, Monroe M, Nicora CD, Lipton MS, Levy SB. Proteomic detection of non-annotated protein-coding genes in Pseudomonas fluorescens Pf0-1. PLoS One 2009;4:e8455. [PMID: 20041161 PMCID: PMC2794547 DOI: 10.1371/journal.pone.0008455] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2009] [Accepted: 12/02/2009] [Indexed: 11/18/2022] Open

Yang Y, Gilbert D, Kim S. Annotation confidence score for genome annotation: a genome comparison approach. Bioinformatics 2009;26:22-9. [PMID: 19855104 DOI: 10.1093/bioinformatics/btp613] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Vallender EJ. Bioinformatic approaches to identifying orthologs and assessing evolutionary relationships. Methods 2009;49:50-5. [PMID: 19467333 PMCID: PMC2732758 DOI: 10.1016/j.ymeth.2009.05.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2009] [Revised: 04/27/2009] [Accepted: 05/18/2009] [Indexed: 01/26/2023] Open

Alternative splicing of transcription factors' genes: beyond the increase of proteome diversity. Comp Funct Genomics 2009:905894. [PMID: 19609452 PMCID: PMC2709715 DOI: 10.1155/2009/905894] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2008] [Revised: 04/06/2009] [Accepted: 05/18/2009] [Indexed: 11/29/2022] Open

Tipney HJ, Schuyler RP, Hunter L. Consistent visualizations of changing knowledge. SUMMIT ON TRANSLATIONAL BIOINFORMATICS 2009;2009:129-32. [PMID: 21347184 PMCID: PMC3041575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]