
Total Articles: 159 (from Reference Citation Analysis)
Article PDFs: 50
Cited by ≥ 1: 108
Searched Name: Proteome/classification

Journal Articles

Number	Citation Analysis
101	Li S, Wu L, Zhang Z. Constructing biological networks through combined literature mining and microarray analysis: a LMMA approach. Bioinformatics 2006;22:2143-50. [PMID: 16820422 DOI: 10.1093/bioinformatics/btl363] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022]
Abstract: MOTIVATION: Network reconstruction of biological entities is very important for understanding biological processes and the organizational principles of biological systems. This work focuses on integrating both the literature and microarray gene-expression data, and a combined literature mining and microarray analysis (LMMA) approach is developed to construct gene networks of a specific biological system. RESULTS: In the LMMA approach, a global network is first constructed using the literature-based co-occurrence method. It is then refined using microarray data through a multivariate selection procedure. An application of LMMA to angiogenesis is presented. Our results show that the LMMA-based network is more reliable than the co-occurrence-based network in dealing with multiple levels of KEGG gene, KEGG Orthology and pathway. AVAILABILITY: The LMMA program is available upon request.
MeSH Headings: Abstracting and Indexing/methods; Database Management Systems; Databases, Protein; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; MEDLINE; Natural Language Processing; Oligonucleotide Array Sequence Analysis/methods; Periodicals as Topic; Proteome/classification; Proteome/genetics; Proteome/metabolism; Signal Transduction/physiology; Systems Integration
102	Malik R, Franke L, Siebes A. Combination of text-mining algorithms increases the performance. Bioinformatics 2006;22:2151-7. [PMID: 16766558 DOI: 10.1093/bioinformatics/btl281] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract: MOTIVATION: Recently, several information extraction systems have been developed to retrieve relevant information out of biomedical text. However, these methods represent individual efforts. In this paper, we show that by combining different algorithms and their outcome, the results improve significantly. For this reason, CONAN has been created, a system which combines different programs and their outcome. Its methods include tagging of gene/protein names, finding interaction and mutation data, tagging of biological concepts and linking to MeSH and Gene Ontology terms. RESULTS: In this paper, we will present data that show that combining different text-mining algorithms significantly improves the results. Not only is CONAN a full-scale approach that will ultimately cover all of PubMed/MEDLINE, we also show that this universality has no effect on quality: our system performs as well as or better than existing systems. AVAILABILITY: The LDD corpus presented is available by request to the author. The system will be available shortly. For information and updates on CONAN please visit http://www.cs.uu.nl/people/rainer/conan.html.
MeSH Headings: Abstracting and Indexing/methods; Database Management Systems; Databases, Protein; Information Storage and Retrieval/methods; MEDLINE; Natural Language Processing; Periodicals as Topic; Proteome/classification; Proteome/genetics; Proteome/metabolism; Systems Integration
103	McGuffin LJ, Smith RT, Bryson K, Sørensen SA, Jones DT. High throughput profile-profile based fold recognition for the entire human proteome. BMC Bioinformatics 2006;7:288. [PMID: 16759376 PMCID: PMC1513610 DOI: 10.1186/1471-2105-7-288] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 02/22/2006] [Accepted: 06/07/2006] [Indexed: 11/10/2022]
Abstract: BACKGROUND: In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. RESULTS: We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. CONCLUSION: This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.
MeSH Headings: Algorithms; Amino Acid Sequence; Artificial Intelligence; Humans; Molecular Sequence Data; Pattern Recognition, Automated/methods; Protein Folding; Proteome/chemistry; Proteome/classification; Sequence Alignment/methods; Sequence Analysis, Protein
Grants: BEP17014 Biotechnology and Biological Sciences Research Council
104	Stein KK, Go JC, Lane WS, Primakoff P, Myles DG. Proteomic analysis of sperm regions that mediate sperm-egg interactions. Proteomics 2006;6:3533-43. [PMID: 16758446 DOI: 10.1002/pmic.200500845] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.7] [Indexed: 11/11/2022]
Abstract: The sperm interacts with three oocyte-associated structures during fertilization: the cumulus cell layer surrounding the oocyte, the egg extracellular matrix (the zona pellucida), and the oocyte plasma membrane. Each of these interactions is mediated by the sperm head, probably through proteins both on the sperm surface and within the acrosome, a specialized secretory granule. In this study, we have used subcellular fractionation in order to generate a proteome of the sperm head subcellular compartments that interact with oocytes. Of the proteins we identified for which a gene knockout has been tested, a third have been shown to be essential for efficient reproduction in vivo. Many of the other presently untested proteins are likely to have a similarly important role. Twenty-five percent of the cell surface fraction proteins are previously uncharacterized. We have shown that at least two of these novel proteins are localized to the sperm head. In summary, we have identified over 100 proteins that are expressed on mature sperm at the site of sperm-oocyte interactions.
MeSH Headings: Acrosome/chemistry; Acrosome/metabolism; Acrosome/ultrastructure; Animals; Biotinylation; Male; Mice; Mice, Inbred C57BL; Proteome/analysis; Proteome/classification; Proteomics/methods; Reproducibility of Results; Silver Staining; Sperm Head/chemistry; Sperm Head/metabolism; Sperm Head/ultrastructure; Sperm-Ovum Interactions; Spermatozoa/chemistry; Spermatozoa/metabolism; Spermatozoa/physiology; Spermatozoa/ultrastructure; Subcellular Fractions/chemistry; Subcellular Fractions/metabolism
Grants: R0-1 HD 16580 NICHD NIH HHS; U54-29125 PHS HHS
105	García-Serna R, Opatowski L, Mestres J. FCP: functional coverage of the proteome by structures. Bioinformatics 2006;22:1792-3. [PMID: 16705012 DOI: 10.1093/bioinformatics/btl188] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract: MOTIVATION: Tools and resources for translating the remarkable growth witnessed in recent years in the number of protein structures determined experimentally into actual gain in the functional coverage of the proteome are becoming increasingly necessary. We introduce FCP, a publicly accessible web tool dedicated to analyzing the current state and trends of the population of structures within protein families. FCP offers both graphical and quantitative data on the degree of functional coverage of enzymes and nuclear receptors by existing structures, as well as on the bias observed in the distribution of structures along their respective functional classification schemes. AVAILABILITY: http://cgl.imim.es/fcp CONTACT: jmestres@imim.es.
MeSH Headings: Algorithms; Amino Acid Sequence; Database Management Systems; Databases, Protein; Information Storage and Retrieval/methods; Internet; Molecular Sequence Data; Proteome/chemistry; Proteome/classification; Proteome/metabolism; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software; Structure-Activity Relationship; User-Computer Interface
106	Luz H, Vingron M. Family specific rates of protein evolution. Bioinformatics 2006;22:1166-71. [PMID: 16510497 DOI: 10.1093/bioinformatics/btl073] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract: MOTIVATION: Amino acid changing mutations in proteins are constrained by purifying selection and accumulate at different rates. We estimate evolutionary rates on multiple alignments of eukaryotic protein families in a maximum likelihood framework and spot sets of slow and fast evolving proteins. RESULTS: We find that the evolution of indispensable proteins is constrained by selection and that protein secretion is coupled to an increased evolutionary rate.
MeSH Headings: Algorithms; Amino Acid Sequence; Animals; Computer Simulation; Conserved Sequence; Evolution, Molecular; Humans; Models, Genetic; Molecular Sequence Data; Proteome/classification; Proteome/genetics; Proteome/metabolism; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Sequence Homology, Amino Acid; Species Specificity
107	Palidwor G, Reynaud EG, Andrade-Navarro MA. Taxonomic colouring of phylogenetic trees of protein sequences. BMC Bioinformatics 2006;7:79. [PMID: 16503967 PMCID: PMC1386715 DOI: 10.1186/1471-2105-7-79] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/10/2005] [Accepted: 02/17/2006] [Indexed: 11/10/2022]
Abstract: BACKGROUND: Phylogenetic analyses of protein families are used to define the evolutionary relationships between homologous proteins. The interpretation of protein-sequence phylogenetic trees requires the examination of the taxonomic properties of the species associated with those sequences. However, there is no online tool to facilitate this interpretation, for example, by automatically attaching taxonomic information to the nodes of a tree, or by interactively colouring the branches of a tree according to any combination of taxonomic divisions. This is especially problematic if the tree contains on the order of hundreds of sequences, which, given the accelerated increase in the size of the protein sequence databases, is a situation that is becoming common. RESULTS: We have developed PhyloView, a web based tool for colouring phylogenetic trees upon arbitrary taxonomic properties of the species represented in a protein sequence phylogenetic tree. Provided that the tree contains SwissProt, SpTrembl, or GenBank protein identifiers, the tool retrieves the taxonomic information from the corresponding database. A colour picker displays a summary of the findings and allows the user to associate colours to the leaves of the tree according to any number of taxonomic partitions. Then, the colours are propagated to the branches of the tree. CONCLUSION: PhyloView can be used at . A tutorial, the software with documentation, and GPL licensed source code, can be accessed at the same web address.
MeSH Headings: Algorithms; Amino Acid Sequence; Color; Evolution, Molecular; Molecular Sequence Data; Phylogeny; Proteome/classification; Proteome/genetics; Sequence Analysis, Protein/methods; Software; User-Computer Interface
108	Yang C, Zeng E, Li T, Narasimhan G. Clustering genes using gene expression and text literature data. Proceedings. IEEE Computational Systems Bioinformatics Conference 2006:329-40. [PMID: 16447990 DOI: 10.1109/csb.2005.23] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract: Clustering of gene expression data is a standard technique used to identify closely related genes. In this paper, we develop a new clustering algorithm, MSC (Multi-Source Clustering), to perform exploratory analysis using two or more diverse sources of data. In particular, we investigate the problem of improving the clustering by integrating information obtained from gene expression data with knowledge extracted from biomedical text literature. In each iteration of algorithm MSC, an EM-type procedure is employed to bootstrap the model obtained from one data source by starting with the cluster assignments obtained in the previous iteration using the other data sources. Upon convergence, the two individual models are used to construct the final cluster assignment. We compare the results of algorithm MSC for two data sources with the results obtained when the clustering is applied on the two sources of data separately. We also compare them with those obtained using the feature level integration method that performs the clustering after simply concatenating the features obtained from the two data sources. We show that the z-scores of the clustering results from MSC are better than those from the other methods. To evaluate our clusters better, function enrichment results are presented using terms from the Gene Ontology database. Finally, by investigating the success of motif detection programs that use the clusters, we show that our approach integrating gene expression data and text data reveals clusters that are biologically more meaningful than those identified using gene expression data alone.
MeSH Headings: Artificial Intelligence; Cluster Analysis; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Multigene Family/physiology; Natural Language Processing; Oligonucleotide Array Sequence Analysis/methods; Periodicals as Topic; Proteome/classification; Proteome/metabolism
Grants: P01 DA15027-01 NIDA NIH HHS
109	Li H, Li J, Wong L. Discovering motif pairs at interaction sites from protein sequences on a proteome-wide scale. Bioinformatics 2006;22:989-96. [PMID: 16446278 DOI: 10.1093/bioinformatics/btl020] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract: MOTIVATION: Protein-protein interaction, mediated by protein interaction sites, is intrinsic to many functional processes in the cell. In this paper, we propose a novel method to discover patterns in protein interaction sites. We observed from protein interaction networks that there exists a kind of significant substructure called interacting protein group pairs, which exhibit an all-versus-all interaction between the two protein-sets in such a pair. The full-interaction between the pair indicates a common interaction mechanism shared by the proteins in the pair, which can be referred to as an interaction type. Motif pairs at the interaction sites of the protein group pairs can be used to represent such interaction type, with each motif derived from the sequences of a protein group by standard motif discovery algorithms. The systematic discovery of all pairs of interacting protein groups from large protein interaction networks is a computationally challenging problem. By a careful and sophisticated problem transformation, the problem is solved using efficient algorithms for mining frequent patterns, a problem extensively studied in data mining. RESULTS: We found 5349 pairs of interacting protein groups from a yeast interaction dataset. The expected value of sequence identity within the groups is only 7.48%, indicating non-homology within these protein groups. We derived 5343 motif pairs from these group pairs, represented in the form of blocks. Comparing our motifs with domains in the BLOCKS and PRINTS databases, we found that our blocks could be mapped to an average of 3.08 correlated blocks in these two databases. The mapped blocks occur in 4221 out of a total of 6794 domains (protein groups) in these two databases. Comparing our motif pairs with iPfam consisting of 3045 interacting domain pairs derived from PDB, we found 47 matches occurring in 105 distinct PDB complexes. Comparing with another putative domain interaction database InterDom, we found 203 matches. AVAILABILITY: http://research.i2r.a-star.edu.sg/BindingMotifPairs/resources. SUPPLEMENTARY INFORMATION: http://research.i2r.a-star.edu.sg/BindingMotifPairs and Bioinformatics online.
MeSH Headings: Algorithms; Amino Acid Motifs; Binding Sites; Databases, Protein; Information Storage and Retrieval/methods; Protein Binding; Protein Interaction Mapping/methods; Proteins/analysis; Proteins/chemistry; Proteome/chemistry; Proteome/classification; Sequence Analysis, Protein/methods
110	Höglund A, Dönnes P, Blum T, Adolph HW, Kohlbacher O. MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition. Bioinformatics 2006;22:1158-65. [PMID: 16428265 DOI: 10.1093/bioinformatics/btl002] [Citation(s) in RCA: 213] [Impact Index Per Article: 11.8] [Indexed: 11/13/2022]
Abstract: MOTIVATION: Functional annotation of unknown proteins is a major goal in proteomics. A key annotation is the prediction of a protein's subcellular localization. Numerous prediction techniques have been developed, typically focusing on a single underlying biological aspect or predicting a subset of all possible localizations. An important step is taken towards emulating the protein sorting process by capturing and bringing together biologically relevant information, and addressing the clear need to improve prediction accuracy and localization coverage. RESULTS: Here we present a novel SVM-based approach for predicting subcellular localization, which integrates N-terminal targeting sequences, amino acid composition and protein sequence motifs. We show how this approach improves the prediction based on N-terminal targeting sequences, by comparing our method TargetLoc against existing methods. Furthermore, MultiLoc performs considerably better than comparable methods predicting all major eukaryotic subcellular localizations, and shows better or comparable results to methods that are specialized on fewer localizations or for one organism. AVAILABILITY: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc/
MeSH Headings: Algorithms; Amino Acid Motifs; Amino Acid Sequence; Artificial Intelligence; Binding Sites; Computer Simulation; Models, Biological; Models, Chemical; Molecular Sequence Data; Pattern Recognition, Automated; Protein Binding; Proteome/chemistry; Proteome/classification; Proteome/metabolism; Sequence Analysis, Protein/methods; Software; Subcellular Fractions/chemistry; Subcellular Fractions/metabolism
111	Chen F, Mackey AJ, Stoeckert CJ, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res 2006;34:D363-8. [PMID: 16381887 PMCID: PMC1347485 DOI: 10.1093/nar/gkj123] [Citation(s) in RCA: 647] [Impact Index Per Article: 35.9] [Received: 08/15/2005] [Revised: 10/20/2005] [Accepted: 10/20/2005] [Indexed: 11/12/2022]
Abstract: The OrthoMCL database (http://orthomcl.cbil.upenn.edu) houses ortholog group predictions for 55 species, including 16 bacterial and 4 archaeal genomes representing phylogenetically diverse lineages, and most currently available complete eukaryotic genomes: 24 unikonts (12 animals, 9 fungi, microsporidium, Dictyostelium, Entamoeba), 4 plants/algae and 7 apicomplexan parasites. OrthoMCL software was used to cluster proteins based on sequence similarity, using an all-against-all BLAST search of each species' proteome, followed by normalization of inter-species differences, and Markov clustering. A total of 511,797 proteins (81.6% of the total dataset) were clustered into 70,388 ortholog groups. The ortholog database may be queried based on protein or group accession numbers, keyword descriptions or BLAST similarity. Ortholog groups exhibiting specific phyletic patterns may also be identified, using either a graphical interface or a text-based Phyletic Pattern Expression grammar. Information for ortholog groups includes the phyletic profile, the list of member proteins and a multiple sequence alignment, a statistical summary and graphical view of similarities, and a graphical representation of domain architecture. OrthoMCL software, the entire FASTA dataset employed and clustering results are available for download. OrthoMCL-DB provides a centralized warehouse for orthology prediction among multiple species, and will be updated and expanded as additional genome sequence data become available.
MeSH Headings: Animals; Cluster Analysis; Databases, Protein; Genomics; Internet; Phylogeny; Proteome/classification; Proteome/genetics; Sequence Homology, Amino Acid; User-Computer Interface
Grants: HHSN266200400037C NIAID NIH HHS; R01 AI058515 NIAID NIH HHS; HHSN266200400037C AHRQ HHS; R01-AI058515 NIAID NIH HHS
112	Pérès S, Beurton-Aimar M, Mazat JP. Pathway classification of TCA cycle. Syst Biol (Stevenage) 2006;153:369-71. [PMID: 16986319 DOI: 10.1049/ip-syb:20060013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
Abstract: The structural analysis of large metabolic networks exhibits a combinatorial explosion of elementary modes. A new method of classification has been developed, called aggregation around common motif (ACoM), which groups elementary modes into classes with similar substructures. This method is applied to the tricarboxylic acid cycle and metabolite carriers. The analysis of this network reveals a large number of elementary flux modes (204) despite the low number of reactions (23). The ACoM is used to classify these elementary modes into a small number of sets (8) with biological meaning.
MeSH Headings: Adaptation, Physiological/physiology; Algorithms; Animals; Cell Physiological Phenomena; Citric Acid Cycle/physiology; Computer Simulation; Feedback/physiology; Homeostasis/physiology; Humans; Kinetics; Models, Biological; Proteome/classification; Proteome/metabolism; Signal Transduction/physiology
113	Zhu J, Chen S, Alvarez S, Asirvatham VS, Schachtman DP, Wu Y, Sharp RE. Cell wall proteome in the maize primary root elongation zone. I. Extraction and identification of water-soluble and lightly ionically bound proteins. Plant Physiology 2006;140:311-25. [PMID: 16377746 PMCID: PMC1326053 DOI: 10.1104/pp.105.070219] [Citation(s) in RCA: 108] [Impact Index Per Article: 6.0] [Received: 08/19/2005] [Revised: 10/04/2005] [Accepted: 11/07/2005] [Indexed: 05/05/2023]
Abstract: Cell wall proteins (CWPs) play important roles in various processes, including cell elongation. However, relatively little is known about the composition of CWPs in growing regions. We are using a proteomics approach to gain a comprehensive understanding of the identity of CWPs in the maize (Zea mays) primary root elongation zone. As the first step, we examined the effectiveness of a vacuum infiltration-centrifugation technique for extracting water-soluble and loosely ionically bound (fraction 1) CWPs from the root elongation zone. The purity of the CWP extract was evaluated by comparing with total soluble proteins extracted from homogenized tissue. Several lines of evidence indicated that the vacuum infiltration-centrifugation technique effectively enriched for CWPs. Protein identification revealed that 84% of the CWPs were different from the total soluble proteins. About 40% of the fraction 1 CWPs had traditional signal peptides and 33% were predicted to be nonclassical secretory proteins, whereas only 3% and 11%, respectively, of the total soluble proteins were in these categories. Many of the CWPs have previously been shown to be involved in cell wall metabolism and cell elongation. In addition, maize has type II cell walls, and several of the CWPs identified in this study have not been identified in previous cell wall proteomics studies that have focused only on type I walls. These proteins include endo-1,3;1,4-beta-D-glucanase and alpha-L-arabinofuranosidase, which act on the major polysaccharides only or mainly present in type II cell walls.
MeSH Headings: Cell Enlargement; Cell Wall/chemistry; Cell Wall/classification; Electrophoresis, Gel, Two-Dimensional; Mass Spectrometry; Plant Proteins/chemistry; Plant Proteins/classification; Plant Proteins/isolation & purification; Plant Roots/chemistry; Plant Roots/growth & development; Plant Roots/ultrastructure; Proteome/chemistry; Proteome/classification; Proteome/isolation & purification; Proteomics/methods; Solubility; Zea mays/chemistry; Zea mays/growth & development; Zea mays/ultrastructure
114	Persico M, Ceol A, Gavrila C, Hoffmann R, Florio A, Cesareni G. HomoMINT: an inferred human network based on orthology mapping of protein interactions discovered in model organisms. BMC Bioinformatics 2005;6 Suppl 4:S21. [PMID: 16351748 PMCID: PMC1866386 DOI: 10.1186/1471-2105-6-s4-s21] [Citation(s) in RCA: 108] [Impact Index Per Article: 5.7] [Indexed: 01/20/2023]
Abstract: BACKGROUND: The application of high throughput approaches to the identification of protein interactions has offered for the first time a glimpse of the global interactome of some model organisms. Until now, however, such genome-wide approaches have not been applied to the human proteome. RESULTS: In order to fill this gap we have assembled an inferred human protein interaction network where interactions discovered in model organisms are mapped onto the corresponding human orthologs. In addition to a stringent assignment to orthology classes based on the InParanoid algorithm, we have implemented a string matching algorithm to filter out orthology assignments of proteins whose global domain organization is not conserved. Finally, we have assessed the accuracy of our own, and related, inferred networks by benchmarking them against i) an assembled experimental interactome, ii) a network derived by mining of the scientific literature and iii) by measuring the enrichment of interacting protein pairs sharing common Gene Ontology annotation. CONCLUSION: The resulting networks are named HomoMINT and HomoMINT_filtered, the latter being based on the orthology table filtered by the domain architecture matching algorithm. They contain 9749 and 5203 interactions respectively and can be analyzed and viewed in the context of the experimentally verified interactions between human proteins stored in the MINT database. HomoMINT is constantly updated to take into account the growing information in the MINT database.
MeSH Headings: Algorithms; Computational Biology/methods; Genome; Humans; Information Storage and Retrieval/methods; Internet; Models, Statistical; Pattern Recognition, Automated; Protein Binding; Protein Structure, Tertiary; Proteins/chemistry; Proteome/classification; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Software
Grants: GTF02011 Telethon
115	López-Bigas N, Blencowe BJ, Ouzounis CA. Highly consistent patterns for inherited human diseases at the molecular level. ACTA ACUST UNITED AC 2005;22:269-77. [PMID: 16287936 DOI: 10.1093/bioinformatics/bti781] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Abstract Over 1600 mammalian genes are known to cause an inherited disorder, when subjected to one or more mutations. These disease genes represent a unique resource for the identification and quantification of relationships between phenotypic attributes of a disease and the molecular features of the associated disease genes, including their ascribed annotated functional classes and expression patterns. Such analyses can provide a more global perspective and a deeper understanding of the probable causes underlying human hereditary diseases. In this perspective and critical view of disease genomics, we present a comparative analysis of genes reported to cause inherited diseases in humans in terms of their causative effects on physiology, their genetics and inheritance modes, the functional processes they are involved in and their expression profiles across a wide spectrum of tissues. Our analysis reveals that there are more extensive correlations between these attributes of genetic disease genes than previously appreciated. For instance, the functional pattern of genes causing dominant and recessive diseases is markedly different. Also, the function of the genes and their expression correlate with the type of disease they cause when mutated. The results further indicate that a comparative genomics approach for the analysis of genes linked to human genetic diseases will facilitate the elucidation of the underlying molecular and cellular mechanisms. 
MESH Headings: Biomarkers/analysis; Gene Expression Profiling/methods; Genetic Diseases, Inborn/genetics; Genetic Predisposition to Disease/genetics; Humans; Models, Genetic; Molecular Biology/methods; Pattern Recognition, Automated; Periodicals as Topic; Proteome/classification; Proteome/genetics; Statistics as Topic.
116	Sharabiani MTA, Siermala M, Lehtinen TO, Vihinen M. Dynamic covariation between gene expression and proteome characteristics. BMC Bioinformatics 2005;6:215. [PMID: 16131395 PMCID: PMC1236912 DOI: 10.1186/1471-2105-6-215] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/30/2004] [Accepted: 08/30/2005] [Indexed: 02/07/2023] Abstract: Background: Cells react to changing intra- and extracellular signals by dynamically modulating complex biochemical networks. Cellular responses to extracellular signals lead to changes in gene and protein expression. Since the majority of genes encode proteins, we investigated possible correlations between protein parameters and gene expression patterns to identify proteome-wide characteristics indicative of trends common to expressed proteins. Results: Numerous bioinformatics methods were used to filter and merge information regarding gene and protein annotations. A new statistical time point-oriented analysis was developed for the study of dynamic correlations in large time series data. The method was applied to investigate microarray datasets for different cell types, organisms and processes, including human B and T cell stimulation, Drosophila melanogaster life span, and Saccharomyces cerevisiae cell cycle. Conclusion: We show that the properties of proteins synthesized correlate dynamically with the gene expression profile, indicating that not only is the actual identity and function of expressed proteins important for cellular responses but that several physicochemical and other protein properties correlate with gene expression as well. Gene expression correlates strongly with amino acid composition, composition- and sequence-derived variables, functional, structural, localization and gene ontology parameters.
Thus, our results suggest that a dynamic relationship exists between proteome properties and gene expression in many biological systems, and therefore this relationship is fundamental to understanding cellular mechanisms in health and disease.
MESH Headings: Animals; B-Lymphocytes/physiology; Cell Cycle/genetics; Computational Biology/methods; Data Display; Drosophila melanogaster/genetics; Electronic Data Processing; Gene Expression Profiling/methods; Gene Frequency; Humans; Information Storage and Retrieval/methods; Lymphocyte Activation/genetics; Markov Chains; Models, Biological; Oligonucleotide Array Sequence Analysis/methods; Proteome/classification; Saccharomyces cerevisiae/genetics; Sequence Analysis, Protein/methods; Signal Transduction/genetics; Software; T-Lymphocytes/physiology.
117	Brocchieri L, Karlin S. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res 2005;33:3390-400. [PMID: 15951512 PMCID: PMC1150220 DOI: 10.1093/nar/gki615] [Citation(s) in RCA: 241] [Impact Index Per Article: 12.7] [Indexed: 11/22/2022] Abstract: We analyzed length differences of eukaryotic, bacterial and archaeal proteins in relation to function, conservation and environmental factors. Comparing Eukaryotes and Prokaryotes, we found that the greater length of eukaryotic proteins is pervasive over all functional categories and involves the vast majority of protein families. The magnitude of these differences suggests that the evolution of eukaryotic proteins was influenced by processes of fusion of single-function proteins into extended multi-functional and multi-domain proteins. Comparing Bacteria and Archaea, we determined that the small but significant length difference observed between their proteins results from a combination of three factors: (i) bacterial proteomes include a greater proportion than archaeal proteomes of longer proteins involved in metabolism or cellular processes, (ii) within most functional classes, protein families unique to Bacteria are generally longer than protein families unique to Archaea and (iii) within the same protein family, homologs from Bacteria tend to be longer than the corresponding homologs from Archaea. These differences are interpreted with respect to evolutionary trends and prevailing environmental conditions within the two prokaryotic groups.
MESH Headings: Amino Acid Sequence; Animals; Archaeal Proteins/chemistry; Archaeal Proteins/classification; Bacterial Proteins/chemistry; Bacterial Proteins/classification; Eukaryotic Cells/metabolism; Evolution, Molecular; Humans; Protein Structure, Tertiary; Proteome/chemistry; Proteome/classification; Proteomics. Grants: R01 GM010452 NIGMS NIH HHS; 2 R01 GM010452 NIGMS NIH HHS.
118	Mao X, Cai T, Olyarchuk JG, Wei L. Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics 2005;21:3787-93. [PMID: 15817693 DOI: 10.1093/bioinformatics/bti430] [Citation(s) in RCA: 2257] [Impact Index Per Article: 118.8] [Indexed: 12/25/2022] Abstract: MOTIVATION: High-throughput technologies such as DNA sequencing and microarrays have created the need for automated annotation of large sets of genes, including whole genomes, and automated identification of pathways. Ontologies, such as the popular Gene Ontology (GO), provide a common controlled vocabulary for these types of automated analysis. Yet, while GO offers tremendous value, it also has certain limitations such as the lack of direct association with pathways. RESULTS: We demonstrated the use of the KEGG Orthology (KO), part of the KEGG suite of resources, as an alternative controlled vocabulary for automated annotation and pathway identification. We developed a KO-Based Annotation System (KOBAS) that can automatically annotate a set of sequences with KO terms and identify both the most frequent and the statistically significantly enriched pathways. Results from both whole genome and microarray gene cluster annotations with KOBAS are comparable and complementary to known annotations. KOBAS is a freely available stand-alone Python program that can contribute significantly to genome annotation and microarray analysis.
MESH Headings: Artificial Intelligence; Chromosome Mapping/methods; Database Management Systems; Documentation/methods; Information Storage and Retrieval/methods; Natural Language Processing; Proteome/classification; Proteome/metabolism; Sequence Analysis/methods; Signal Transduction/physiology; Vocabulary, Controlled.
119	Pedrioli PGA, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R. A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol 2005;22:1459-66. [PMID: 15529173 DOI: 10.1038/nbt1031] [Citation(s) in RCA: 570] [Impact Index Per Article: 30.0] [Indexed: 12/31/2022] Abstract: A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.
MESH Headings: Database Management Systems; Databases, Factual; Information Dissemination/methods; Information Storage and Retrieval/methods; Information Storage and Retrieval/standards; Mass Spectrometry/methods; Mass Spectrometry/standards; Proteome/analysis; Proteome/chemistry; Proteome/classification; Proteomics/methods; Proteomics/standards; Software; User-Computer Interface. Grants: BBS/B/12407 Biotechnology and Biological Sciences Research Council; N01-HV-28179 NHLBI NIH HHS; 1R33CA93302 NCI NIH HHS.
120	Grønborg M, Bunkenborg J, Kristiansen TZ, Jensen ON, Yeo CJ, Hruban RH, Maitra A, Goggins MG, Pandey A. Comprehensive proteomic analysis of human pancreatic juice. J Proteome Res 2005;3:1042-55. [PMID: 15473694 DOI: 10.1021/pr0499085] [Citation(s) in RCA: 173] [Impact Index Per Article: 9.1] [Indexed: 12/16/2022] Abstract: Proteomic technologies provide an excellent means for analysis of body fluids for cataloging protein constituents and identifying biomarkers for early detection of cancers. The biomarkers currently available for pancreatic cancer, such as CA19-9, lack adequate sensitivity and specificity contributing to late diagnosis of this deadly disease. In this study, we carried out a comprehensive characterization of the "pancreatic juice proteome" in patients with pancreatic adenocarcinoma. Pancreatic juice was first fractionated by 1-dimensional gel electrophoresis and subsequently analyzed by liquid chromatography tandem mass spectrometry (LC-MS/MS). A total of 170 unique proteins were identified including known pancreatic cancer tumor markers (e.g., CEA, MUC1) and proteins overexpressed in pancreatic cancers (e.g., hepatocarcinoma-intestine-pancreas/pancreatitis-associated protein (HIP/PAP) and lipocalin 2). In addition, we identified a number of proteins that have not been previously described in pancreatic juice (e.g., tumor rejection antigen (pg96) and azurocidin). Interestingly, a novel protein that is 85% identical to HIP/PAP was identified, which we have designated as PAP-2. The proteins identified in this study could be directly assessed for their potential as biomarkers for pancreatic cancer by quantitative proteomics methods or immunoassays.
MESH Headings: Agglutinins/analysis; Agglutinins/genetics; Agglutinins/metabolism; Amino Acid Sequence; Antigens, Neoplasm/analysis; Antigens, Neoplasm/genetics; Antigens, Neoplasm/metabolism; Antimicrobial Cationic Peptides; Biomarkers, Tumor/analysis; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; Blood Proteins/analysis; Blood Proteins/metabolism; Calcium-Binding Proteins/genetics; Carrier Proteins/analysis; Carrier Proteins/genetics; Carrier Proteins/metabolism; Cell Adhesion Molecules/analysis; Cell Adhesion Molecules/genetics; Cell Adhesion Molecules/metabolism; Chromatography, Liquid; DNA-Binding Proteins; Electrophoresis, Polyacrylamide Gel; Gene Expression/genetics; Glycoproteins/genetics; Humans; Lectins, C-Type/analysis; Lectins, C-Type/genetics; Lectins, C-Type/metabolism; Lithostathine; Mass Spectrometry; Membrane Proteins/analysis; Membrane Proteins/genetics; Membrane Proteins/metabolism; Molecular Sequence Data; Pancreatic Juice/chemistry; Pancreatic Juice/metabolism; Pancreatic Neoplasms/metabolism; Pancreatitis-Associated Proteins; Peptide Fragments/analysis; Phylogeny; Proteome/analysis; Proteome/classification; Proteome/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism; Receptors, Cell Surface/analysis; Receptors, Cell Surface/genetics; Receptors, Cell Surface/metabolism; Sequence Alignment; Sequence Homology, Amino Acid; Trypsin/metabolism; Tumor Suppressor Proteins; alpha-Defensins/analysis; alpha-Defensins/genetics; alpha-Defensins/metabolism. Grants: P50 CA 62924 NCI NIH HHS.
121	Aloy P, Russell RB. Ten thousand interactions for the molecular biologist. Nat Biotechnol 2005;22:1317-21. [PMID: 15470473 DOI: 10.1038/nbt1018] [Citation(s) in RCA: 161] [Impact Index Per Article: 8.5] [Indexed: 11/08/2022] Abstract: Previous studies have suggested that nature is restricted to about 1,000 protein folds to perform a great diversity of functions. Here, we use protein interaction data from different sources and three-dimensional structures to suggest that the total number of interaction types is also limited, and estimate that most interactions in nature will conform to one of about 10,000 types. We currently know fewer than 2,000, and at the present rate of structure determination, it will be more than 20 years before we know a full representative set.
MESH Headings: Cell Physiological Phenomena; Evolution, Molecular; Models, Biological; Models, Chemical; Molecular Biology/methods; Protein Folding; Protein Interaction Mapping/methods; Proteins/chemistry; Proteins/classification; Proteins/metabolism; Proteome/classification; Proteome/metabolism; Species Specificity.
122	Andersen JS, Lam YW, Leung AKL, Ong SE, Lyon CE, Lamond AI, Mann M. Nucleolar proteome dynamics. Nature 2005;433:77-83. [PMID: 15635413 DOI: 10.1038/nature03207] [Citation(s) in RCA: 890] [Impact Index Per Article: 46.8] [Received: 10/18/2004] [Accepted: 11/16/2004] [Indexed: 01/17/2023] Abstract: The nucleolus is a key organelle that coordinates the synthesis and assembly of ribosomal subunits and forms in the nucleus around the repeated ribosomal gene clusters. Because the production of ribosomes is a major metabolic activity, the function of the nucleolus is tightly linked to cell growth and proliferation, and recent data suggest that the nucleolus also plays an important role in cell-cycle regulation, senescence and stress responses. Here, using mass-spectrometry-based organellar proteomics and stable isotope labelling, we perform a quantitative analysis of the proteome of human nucleoli. In vivo fluorescent imaging techniques are directly compared to endogenous protein changes measured by proteomics. We characterize the flux of 489 endogenous nucleolar proteins in response to three different metabolic inhibitors that each affect nucleolar morphology. Proteins that are stably associated, such as RNA polymerase I subunits and small nuclear ribonucleoprotein particle complexes, exit from or accumulate in the nucleolus with similar kinetics, whereas protein components of the large and small ribosomal subunits leave the nucleolus with markedly different kinetics. The data establish a quantitative proteomic approach for the temporal characterization of protein flux through cellular organelles and demonstrate that the nucleolar proteome changes significantly over time in response to changes in cellular growth conditions.
MESH Headings: Amino Acid Sequence; Cell Nucleolus/drug effects; Cell Nucleolus/metabolism; Cell Survival; Dactinomycin/pharmacology; HeLa Cells; Humans; Kinetics; Mass Spectrometry; Nuclear Proteins/analysis; Nuclear Proteins/chemistry; Nuclear Proteins/classification; Nuclear Proteins/metabolism; Proteome/analysis; Proteome/chemistry; Proteome/classification; Proteome/metabolism; Proteomics; RNA, Messenger/analysis; RNA, Messenger/genetics; Transcription, Genetic/drug effects; Transcription, Genetic/genetics. Grants: 073980 Wellcome Trust.
123	Kopka J, Schauer N, Krueger S, Birkemeyer C, Usadel B, Bergmüller E, Dörmann P, Weckwerth W, Gibon Y, Stitt M, Willmitzer L, Fernie AR, Steinhauser D. GMD@CSB.DB: the Golm Metabolome Database. Bioinformatics 2004;21:1635-8. [PMID: 15613389 DOI: 10.1093/bioinformatics/bti236] [Citation(s) in RCA: 875] [Impact Index Per Article: 43.8] [Indexed: 01/17/2023] Abstract: Metabolomics, in particular gas chromatography-mass spectrometry (GC-MS) based metabolite profiling of biological extracts, is rapidly becoming one of the cornerstones of functional genomics and systems biology. Metabolite profiling has profound applications in discovering the mode of action of drugs or herbicides, and in unravelling the effect of altered gene expression on metabolism and organism performance in biotechnological applications. As such the technology needs to be available to many laboratories. For this, an open exchange of information is required, like that already achieved for transcript and protein data. One of the key-steps in metabolite profiling is the unambiguous identification of metabolites in highly complex metabolite preparations from biological samples. Collections of mass spectra, which comprise frequently observed metabolites of either known or unknown exact chemical structure, represent the most effective means to pool the identification efforts currently performed in many laboratories around the world. Here we present GMD, The Golm Metabolome Database, an open access metabolome database, which should enable these processes. GMD provides public access to custom mass spectral libraries, metabolite profiling experiments as well as additional information and tools, e.g. with regard to methods, spectral information or compounds.
The main goal will be the representation of an exchange platform for experimental research activities and bioinformatics to develop and improve metabolomics by multidisciplinary cooperation. AVAILABILITY: http://csbdb.mpimp-golm.mpg.de/gmd.html CONTACT: Steinhauser@mpimp-golm.mpg.de SUPPLEMENTARY INFORMATION: http://csbdb.mpimp-golm.mpg.de/
MESH Headings: Database Management Systems; Databases, Protein; Documentation/methods; Gas Chromatography-Mass Spectrometry/methods; Gene Expression Profiling/methods; Information Dissemination/methods; Information Storage and Retrieval/methods; Internet; Natural Language Processing; Proteome/analysis; Proteome/chemistry; Proteome/classification; Proteome/metabolism; Publications; Structure-Activity Relationship; Vocabulary, Controlled.
124	Halperin E, Buhler J, Karp R, Krauthgamer R, Westover B. Detecting protein sequence conservation via metric embeddings. Bioinformatics 2004;19 Suppl 1:i122-9. [PMID: 12855448 DOI: 10.1093/bioinformatics/btg1016] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Abstract: MOTIVATION: Comparing two protein databases is a fundamental task in biosequence annotation. Given two databases, one must find all pairs of proteins that align with high score under a biologically meaningful substitution score matrix, such as a BLOSUM matrix (Henikoff and Henikoff, 1992). Distance-based approaches to this problem map each peptide in the database to a point in a metric space, such that peptides aligning with higher scores are mapped to closer points. Many techniques exist to discover close pairs of points in a metric space efficiently, but the challenge in applying this work to proteomic comparison is to find a distance mapping that accurately encodes all the distinctions among residue pairs made by a proteomic score matrix. Buhler (2002) proposed one such mapping but found that it led to a relatively inefficient algorithm for protein-protein comparison. RESULTS: This work proposes a new distance mapping for peptides under the BLOSUM matrices that permits more efficient similarity search. We first propose a new distance function on peptides derived from a given score matrix. We then show how to map peptides to bit vectors such that the distance between any two peptides is closely approximated by the Hamming distance (i.e. number of mismatches) between their corresponding bit vectors. We combine these two results with the LSH-ALL-PAIRS-SIM algorithm of Buhler (2002) to produce an improved distance-based algorithm for proteomic comparison.
An initial implementation of the improved algorithm exhibits sensitivity within 5% of that of the original LSH-ALL-PAIRS-SIM, while running up to eight times faster.
MESH Headings: Algorithms; Amino Acid Sequence; Conserved Sequence; Databases, Protein; Molecular Sequence Data; Proteins/chemistry; Proteins/classification; Proteome/chemistry; Proteome/classification; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Sequence Homology, Amino Acid.
125	Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, Tetko I, Güldener U, Mannhaupt G, Münsterkötter M, Mewes HW. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res 2004;32:5539-45. [PMID: 15486203 PMCID: PMC524302 DOI: 10.1093/nar/gkh894] [Citation(s) in RCA: 757] [Impact Index Per Article: 37.9] [Indexed: 11/12/2022] Abstract: In this paper, we present the Functional Catalogue (FunCat), a hierarchically structured, organism-independent, flexible and scalable controlled classification system enabling the functional description of proteins from any organism. FunCat has been applied for the manual annotation of prokaryotes, fungi, plants and animals. We describe how FunCat is implemented as a highly efficient and robust tool for the manual and automatic annotation of genomic sequences. Owing to its hierarchical architecture, FunCat has also proved to be useful for many subsequent downstream bioinformatic applications. This is illustrated by the analysis of large-scale experiments from various investigations in transcriptomics and proteomics, where FunCat was used to project experimental data into functional units, as 'gold standard' for functional classification methods, and also served to compare the significance of different experimental methods. Over the last decade, the FunCat has been established as a robust and stable annotation scheme that offers both, meaningful and manageable functional classification as well as ease of perception.
MESH Headings: Abstracting and Indexing; Animals; Automation/instrumentation; Automation/methods; Computational Biology/instrumentation; Computational Biology/methods; Genome; Genomics/instrumentation; Genomics/methods; Internet; Protein Binding; Proteins/classification; Proteins/genetics; Proteins/metabolism; Proteome/classification; Proteome/genetics; Proteome/metabolism; Proteomics/instrumentation; Proteomics/methods; Reproducibility of Results; Saccharomyces cerevisiae/chemistry; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/classification; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Software; Terminology as Topic; Transcription, Genetic/genetics.