Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rigoutsos I, Huynh T, Floratos A, Parida L, Platt D. Dictionary-driven protein annotation. Nucleic Acids Res 2002;30:3901-16. [PMID: 12202776 PMCID: PMC137405 DOI: 10.1093/nar/gkf464] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Received: 01/17/2002] [Revised: 06/04/2002] [Accepted: 06/04/2002] [Indexed: 11/14/2022] Open



Cited by Other Article(s)

Chantzi N, Mareboina M, Konnaris MA, Montgomery A, Patsakis M, Mouratidis I, Georgakopoulos-Soares I. The determinants of the rarity of nucleic and peptide short sequences in nature. NAR Genom Bioinform 2024;6:lqae029. [PMID: 38584871 PMCID: PMC10993293 DOI: 10.1093/nargab/lqae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/09/2024] Open

Bernard G, Chan CX, Chan YB, Chua XY, Cong Y, Hogan JM, Maetschke SR, Ragan MA. Alignment-free inference of hierarchical and reticulate phylogenomic relationships. Brief Bioinform 2019;20:426-435. [PMID: 28673025 PMCID: PMC6433738 DOI: 10.1093/bib/bbx067] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 02/08/2017] [Revised: 05/04/2017] [Indexed: 11/22/2022] Open

Dong Q, Wang K, Liu X. Identifying the missing proteins in human proteome by biological language model. BMC Syst Biol 2016;10:113. [PMID: 28155671 PMCID: PMC5259966 DOI: 10.1186/s12918-016-0352-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022]

Cunial F, Apostolico A. Phylogeny Construction with Rigid Gapped Motifs. J Comput Biol 2012;19:911-27. [DOI: 10.1089/cmb.2012.0060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open

Rajasekaran S, Balla S, Gradie P, Gryk MR, Kadaveru K, Kundeti V, Maciejewski MW, Mi T, Rubino N, Vyas J, Schiller MR. Minimotif miner 2nd release: a database and web system for motif search. Nucleic Acids Res 2009;37:D185-90. [PMID: 18978024 PMCID: PMC2686579 DOI: 10.1093/nar/gkn865] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 08/15/2008] [Accepted: 10/16/2008] [Indexed: 11/24/2022] Open

Affiliation(s)

Sanguthevar Rajasekaran, Sudha Balla, Patrick Gradie, Michael R. Gryk, Krishna Kadaveru, Vamsi Kundeti, Mark W. Maciejewski, Tian Mi, Nicholas Rubino, Jay Vyas, Martin R. Schiller: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06029-2155; Department of Molecular, Microbial, and Structural Biology, Biological System Modeling Group, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT 06030-3305; and Memorial Sloan-Kettering Cancer Center, NY 10021, USA


Wang A, Ren L, Abenes G, Hai R. Genome sequence divergences and functional variations in human cytomegalovirus strains. FEMS Immunol Med Microbiol 2008;55:23-33. [PMID: 19076227 DOI: 10.1111/j.1574-695x.2008.00489.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]

Miranda KC, Huynh T, Tay Y, Ang YS, Tam WL, Thomson AM, Lim B, Rigoutsos I. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell 2006;126:1203-17. [PMID: 16990141 DOI: 10.1016/j.cell.2006.07.031] [Citation(s) in RCA: 1499] [Impact Index Per Article: 83.3] [Received: 04/05/2006] [Revised: 06/16/2006] [Accepted: 07/26/2006] [Indexed: 12/12/2022]

Thompson JD, Muller A, Waterhouse A, Procter J, Barton GJ, Plewniak F, Poch O. MACSIMS: multiple alignment of complete sequences information management system. BMC Bioinformatics 2006;7:318. [PMID: 16792820 PMCID: PMC1539025 DOI: 10.1186/1471-2105-7-318] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 04/18/2006] [Accepted: 06/23/2006] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family.

RESULTS

MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences within these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis.

CONCLUSION

MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at http://bips.u-strasbg.fr/MACSIMS/.


Darzentas N, Rigoutsos I, Ouzounis CA. Sensitive detection of sequence similarity using combinatorial pattern discovery: A challenging study of two distantly related protein families. Proteins 2005;61:926-37. [PMID: 16224785 DOI: 10.1002/prot.20608] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]

Pal D, Eisenberg D. Inference of Protein Function from Protein Structure. Structure 2005;13:121-30. [PMID: 15642267 DOI: 10.1016/j.str.2004.10.015] [Citation(s) in RCA: 152] [Impact Index Per Article: 8.0] [Received: 08/05/2004] [Revised: 10/18/2004] [Accepted: 10/20/2004] [Indexed: 11/28/2022]

Lu X, Zhai C, Gopalakrishnan V, Buchanan BG. Automatic annotation of protein motif function with Gene Ontology terms. BMC Bioinformatics 2004;5:122. [PMID: 15345032 PMCID: PMC517493 DOI: 10.1186/1471-2105-5-122] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 10/16/2003] [Accepted: 09/02/2004] [Indexed: 11/15/2022] Open

Abstract

Background

Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base.

Results

This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria.
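As an illustration of the mutual-information feature mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes the standard definition of mutual information between two binary events (a sequence matches the motif; a sequence carries the GO term) estimated from a 2x2 table of sequence counts. The function name and the count arguments are hypothetical; the abstract does not specify how the feature was actually computed.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) of two binary events from a 2x2 count table.

    Hypothetical argument meanings (assumptions, not from the paper):
      n11: sequences that match the motif AND carry the GO term
      n10: sequences that match the motif only
      n01: sequences that carry the GO term only
      n00: sequences with neither
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("empty contingency table")

    # Each cell: (joint count, motif-marginal count, GO-term-marginal count).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]

    mi = 0.0
    for joint, motif_marginal, term_marginal in cells:
        if joint > 0:  # a zero cell contributes nothing (0 * log 0 := 0)
            ratio = joint * total / (motif_marginal * term_marginal)
            mi += (joint / total) * math.log2(ratio)
    return mi
```

Under these assumptions, a motif and a GO term that co-occur no more often than chance score 0 bits, while a motif that appears in exactly the sequences carrying the term (with both present in half of all sequences) scores 1 bit. A value like this could then be one input, alongside frequency-based features, to a logistic regression classifier of the kind the abstract describes.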

Conclusions

In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrated that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.


Huynh T, Rigoutsos I. The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update. Nucleic Acids Res 2004;32:W10-5. [PMID: 15215340 PMCID: PMC441505 DOI: 10.1093/nar/gkh367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open

Bahr U, Darai G. Re-evaluation and in silico annotation of the Tupaia herpesvirus proteins. Virus Genes 2004;28:99-120. [PMID: 14739655 DOI: 10.1023/b:viru.0000012267.97659.e0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]

Abstract

Herpesviruses represent an exceptionally suitable model to analyze evolutionarily old pathogens, their competency to adapt to existing and changing molecular niches in host species, and the modulation of the gene content and function to comply with the requirements of life. The basis for numerous studies dealing with these questions are reliable statements about the gene content of herpesviral genomes and the functions of viral proteins. The recent determination of the coding strategy of the chimpanzee cytomegalovirus genome and the re-evaluation of the gene content of the human cytomegalovirus genome also made it necessary to restructure the putative transcription map of the Tupaia herpesvirus (THV) genome. Twenty-three THV-specific ORFs formerly predicted to be coding for viral proteins were deleted from the THV transcription map, resulting in a gene layout that is now characterized by the presence of conserved genes in the genome center, which probably reflect the genome structure of common herpesviral ancestors, and species-specific genes at the termini. The conserved regions in the THV genome are characterized by high G + C contents between 60% and 80%, a high CpG dinucleotide frequency, and the presence of densely packed putative CpG islands. The genome termini seem to provide the requirements of large scale rearrangements and complements of the gene content to adapt to new environmental demands. With the help of the recently designed method of dictionary-driven, pattern-based protein annotation it was possible to assign putative functions to almost all potential THV proteins, e.g. 123 were found to be putative membrane or secreted proteins, putative signal domains were identified in 69, and 29 proteins were predicted to be glycosylated.
The present study adds new aspects to the knowledge about the precise gene composition of herpesvirus genomes and viral protein functions that are of exceptional importance for studies dealing with the phylogeny, the evolution, vaccine vector development, virus-host interactions, pathogenesis and the determination of protein functions of herpesviruses.


Rigoutsos I, Riek P, Graham RM, Novotny J. Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors. Nucleic Acids Res 2003;31:4625-31. [PMID: 12888523 PMCID: PMC169910 DOI: 10.1093/nar/gkg639] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022] Open

Ouzounis CA, Coulson RMR, Enright AJ, Kunin V, Pereira-Leal JB. Classification schemes for protein structure and function. Nat Rev Genet 2003;4:508-19. [PMID: 12838343 DOI: 10.1038/nrg1113] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.6] [Indexed: 11/09/2022]

Huynh T, Rigoutsos I, Parida L, Platt D, Shibuya T. The web server of IBM's Bioinformatics and Pattern Discovery group. Nucleic Acids Res 2003;31:3645-50. [PMID: 12824385 PMCID: PMC169027 DOI: 10.1093/nar/gkg621] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/13/2003] [Revised: 04/08/2003] [Accepted: 04/08/2003] [Indexed: 11/12/2022] Open

Linial M. How incorrect annotations evolve--the case of short ORFs. Trends Biotechnol 2003;21:298-300. [PMID: 12837613 DOI: 10.1016/s0167-7799(03)00139-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]

Mondal S, Jaishankar SP, Ramakumar S. Role of context in the relationship between form and function: structural plasticity of some PROSITE patterns. Biochem Biophys Res Commun 2003;305:1078-84. [PMID: 12767941 DOI: 10.1016/s0006-291x(03)00882-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 10/27/2022]

Rigoutsos I, Novotny J, Huynh T, Chin-Bow ST, Parida L, Platt D, Coleman D, Shenk T. In silico pattern-based analysis of the human cytomegalovirus genome. J Virol 2003;77:4326-44. [PMID: 12634390 PMCID: PMC150618 DOI: 10.1128/jvi.77.7.4326-4344.2003] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.5] [Received: 07/10/2002] [Accepted: 12/23/2002] [Indexed: 11/20/2022] Open