351
Hinsby AM, Kiemer L, Karlberg EO, Lage K, Fausbøll A, Juncker AS, Andersen JS, Mann M, Brunak S. A Wiring of the Human Nucleolus. Mol Cell 2006; 22:285-95. [PMID: 16630896 DOI: 10.1016/j.molcel.2006.03.012] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 11/30/2005] [Revised: 01/31/2006] [Accepted: 03/07/2006] [Indexed: 11/22/2022]
Abstract
Recent proteomic efforts have created an extensive inventory of the human nucleolar proteome. However, approximately 30% of the identified proteins lack functional annotation. We present an approach for assigning function to uncharacterized nucleolar proteins by data integration coupled to a machine-learning method. By assembling protein complexes, we present a first draft of the human ribosome biogenesis pathway encompassing 74 proteins and thereby assign function to 49 previously uncharacterized proteins. Moreover, the functional diversity of the nucleolus is underlined by the identification of a number of protein complexes with functions beyond ribosome biogenesis. Finally, we obtained experimental evidence of nucleolar localization for 11 proteins that our platform predicted to be associates of nucleolar complexes. We believe other organelles or biological systems could be "wired" in a similar fashion by integrating different types of data with high-throughput proteomics, followed by detailed biological analysis and experimental validation.
Affiliation(s)
- Anders M Hinsby
- Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby
352
Yue P, Melamud E, Moult J. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics 2006; 7:166. [PMID: 16551372 PMCID: PMC1435944 DOI: 10.1186/1471-2105-7-166] [Citation(s) in RCA: 341] [Impact Index Per Article: 17.9] [Received: 11/03/2005] [Accepted: 03/22/2006] [Indexed: 11/25/2022]
Abstract
Background The relationship between disease susceptibility and genetic variation is complex, and many different types of data are relevant. We describe a web resource and database that provides and integrates as much information as possible on disease/gene relationships at the molecular level. Description The resource has three primary modules. One module identifies which genes are candidates for involvement in a specified disease. A second module provides information about the relationships between sets of candidate genes. The third module analyzes the likely impact of non-synonymous SNPs on protein function. Disease/candidate gene relationships and gene-gene relationships are derived from the literature using simple but effective text profiling. SNP/protein function relationships are derived by two methods, one using principles of protein structure and stability, the other based on sequence conservation. Entries for each gene include a number of links to other data, such as expression profiles, pathway context, mouse knockout information and papers. Gene-gene interactions are presented in an interactive graphical interface, providing rapid access to the underlying information as well as convenient navigation through the network. Use of the resource is illustrated with aspects of the inflammatory response and hypertension. Conclusion The combination of SNP impact analysis, a knowledge-based network of gene relationships and candidate genes, and access to a wide range of data and literature allows a user to quickly assimilate available information, and so develop models of gene-pathway-disease interaction.
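The abstract says one of the two SNP-impact methods is based on sequence conservation, but it does not describe that method's scoring. The following is a generic illustration only and is not SNPs3D's model. A common way to quantify conservation at a position is the Shannon entropy of the corresponding alignment column. A substitution at a low-entropy (highly conserved) position, to a residue never seen there, is more likely to matter. The toy alignment and the 0.5-bit cut-off are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column; gaps ('-') are ignored."""
    residues = [r for r in column if r != "-"]
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def flag_nssnp(alignment: list[str], position: int, alt: str,
               max_entropy: float = 0.5) -> bool:
    """Flag a substitution as likely damaging if its column is conserved
    (entropy <= max_entropy, a hypothetical cut-off) and the alternative
    residue never occurs in that column."""
    column = "".join(seq[position] for seq in alignment)
    return column_entropy(column) <= max_entropy and alt not in column

# Hypothetical toy alignment of four homologous sequences.
aln = ["MKVLA", "MKILA", "MRVLG", "MKVLS"]
print(flag_nssnp(aln, 3, "P"))  # column 3 is 'LLLL' (entropy 0) -> True
print(flag_nssnp(aln, 4, "T"))  # column 4 is 'AAGS' (entropy 1.5) -> False
```

Real conservation-based predictors usually add substitution matrices and sequence weighting. The entropy test is kept bare here to show the core idea.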
Affiliation(s)
- Peng Yue
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA
- Molecular and Cellular Biology Program, University of Maryland, College Park, MD 20742, USA
- Eugene Melamud
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA
- Molecular and Cellular Biology Program, University of Maryland, College Park, MD 20742, USA
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA
353
Snyder KA, Feldman HJ, Dumontier M, Salama JJ, Hogue CWV. Domain-based small molecule binding site annotation. BMC Bioinformatics 2006; 7:152. [PMID: 16545112 PMCID: PMC1435939 DOI: 10.1186/1471-2105-7-152] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 07/29/2005] [Accepted: 03/17/2006] [Indexed: 01/01/2023]
Abstract
BACKGROUND Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but annotation of small molecule binding sites on protein sequences is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly, it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and, unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites. DESCRIPTION Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at http://smid.blueprint.org. The SMID-BLAST tool provides accurate transitive annotation of small-molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated from the E-value, ligand residue identity and domain entropy to assign a level of confidence to each hit. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB. For 472 (60%) of these, the predicted ligand identically matched the experimental small molecule, and of these, 344 had more than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives.
CONCLUSION By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, with the only limitation being that the small molecule must exist in the PDB. Validation results and specific examples within illustrate that SMID-BLAST has a high degree of accuracy in terms of predicting both the small molecule ligand and binding site residue positions for a query protein.
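As a quick arithmetic check, here are the SMID-BLAST validation figures quoted above: 793 PDB interactions, 472 exact ligand matches, and 344 of those with more than 80% of binding-site residues recovered. The 344/472 and 344/793 ratios are derived here and are not stated in the abstract.

```python
total_interactions = 793    # experimental PDB interactions in the validation set
exact_ligand_matches = 472  # predictions whose ligand matched the experimental one
good_binding_sites = 344    # of those, >80% of binding-site residues correct

ligand_rate = exact_ligand_matches / total_interactions
site_rate_among_matches = good_binding_sites / exact_ligand_matches
site_rate_overall = good_binding_sites / total_interactions

print(f"ligand matched: {ligand_rate:.1%}")  # 59.5%, reported as 60%
print(f">80% of site recovered, among matches: {site_rate_among_matches:.1%}")
print(f">80% of site recovered, whole set: {site_rate_overall:.1%}")
```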
Affiliation(s)
- Kevin A Snyder
- The Blueprint Initiative, 200 Elm St., Suite 101, Toronto ON, M5T 1K4, Canada
- Howard J Feldman
- The Blueprint Initiative, 200 Elm St., Suite 101, Toronto ON, M5T 1K4, Canada
- Michel Dumontier
- The Blueprint Initiative, 200 Elm St., Suite 101, Toronto ON, M5T 1K4, Canada
- Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa ON, K1S 5B6, Canada
- John J Salama
- The Blueprint Initiative, 200 Elm St., Suite 101, Toronto ON, M5T 1K4, Canada
- Christopher WV Hogue
- The Blueprint Initiative, 200 Elm St., Suite 101, Toronto ON, M5T 1K4, Canada
- Samuel Lunenfeld Research Institute, Room 1060, Mount Sinai Hospital, 600 University Ave., Toronto, Ontario, M5G 1X5, Canada
354
Kim JJ, Park JC. Extracting contrastive information from negation patterns in biomedical literature. ACTA ACUST UNITED AC 2006. [DOI: 10.1145/1131348.1131352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
Abstract
Expressions of negation in the biomedical literature often encode contrastive information that explains significant differences between the objects being contrasted. We show that such information gives additional insight into the structures and/or biological functions of these objects, yielding valuable knowledge for subcategorizing protein families by the properties that their member proteins do not share. Based on the observation that expressions of negation mostly employ predictable syntactic structures, characterized by subclausal coordination and by clause-level parallelism, we present a system that extracts such contrastive information by identifying those syntactic structures with natural language processing techniques and additional linguistic resources for semantics. The implemented system achieves 85.7% precision and 61.5% recall, including 7.7% partial recall, or an F score of 76.6. We apply the system to the biological interactions extracted by our biomedical information-extraction system in order to enrich proteome databases with contrastive information.
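This sketch assumes the reported F score is the standard balanced F1, the harmonic mean of precision and recall, and recomputes it from the quoted figures. With recall taken as 61.5%, F1 is about 71.6. Adding the 7.7% partial recall to 61.5% (69.2%) reproduces the reported 76.6. The abstract does not spell out exactly how partial matches are counted, so this is only a consistency check.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(100 * f_score(0.857, 0.615), 1))          # recall 61.5% -> 71.6
print(round(100 * f_score(0.857, 0.615 + 0.077), 1))  # recall 69.2% -> 76.6
```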
Affiliation(s)
- Jung-Jae Kim
- Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Jong C. Park
- Korea Advanced Institute of Science and Technology, Daejeon, South Korea
355
Abstract
We present a software framework and tool called Protein Interactions And Network Analysis (PIANA) that facilitates working with protein interaction networks by (1) integrating data from multiple sources, (2) providing a library that handles graph-related tasks and (3) automating the analysis of protein-protein interaction networks. PIANA can also be used as a stand-alone application to create protein interaction networks and perform tasks such as predicting protein interactions and helping to identify spots in a 2D electrophoresis gel. AVAILABILITY PIANA is under the GNU GPL. Source code, database and detailed documentation may be freely downloaded from http://sbi.imim.es/piana.
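PIANA's own library API is not described in this abstract. What follows is therefore only a generic sketch of the first task listed, integrating interaction data from multiple sources into one network. Undirected edges are normalized so that A-B and B-A coincide, and each edge records which sources reported it. The source names and protein identifiers are hypothetical.

```python
from collections import defaultdict

def integrate(sources):
    """Merge interaction lists from several named sources into one undirected
    network keyed by a sorted protein pair; each edge records its sources."""
    network = defaultdict(set)
    for source, pairs in sources.items():
        for a, b in pairs:
            edge = tuple(sorted((a, b)))  # A-B and B-A become the same edge
            network[edge].add(source)
    return dict(network)

def neighbours(network, protein):
    """Proteins directly connected to `protein` in the merged network."""
    return {b if a == protein else a for (a, b) in network if protein in (a, b)}

# Hypothetical source names and protein identifiers.
net = integrate({
    "sourceA": [("P1", "P2"), ("P2", "P3")],
    "sourceB": [("P2", "P1"), ("P3", "P4")],
})
print(sorted(net[("P1", "P2")]))      # ['sourceA', 'sourceB']
print(sorted(neighbours(net, "P2")))  # ['P1', 'P3']
```

Keeping the set of supporting sources per edge makes it easy to filter later, for example to edges confirmed by at least two sources.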
Affiliation(s)
- Ramon Aragues
- Structural Bioinformatics Group (GRIB-IMIM), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, C/Doctor Aiguader, 83, Barcelona 08003, Catalonia, Spain.
356
Abstract
Much of systems biology aims to predict the behaviour of biological systems on the basis of the set of molecules involved. Understanding the interactions between these molecules is therefore crucial to such efforts. Although many thousands of interactions are known, precise molecular details are available for only a tiny fraction of them. The difficulties that are involved in experimentally determining atomic structures for interacting proteins make predictive methods essential for progress. Structural details can ultimately turn abstract system representations into models that more accurately reflect biological reality.
Affiliation(s)
- Patrick Aloy
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
357
Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006; 34:D535-9. [PMID: 16381927 PMCID: PMC1347471 DOI: 10.1093/nar/gkj109] [Citation(s) in RCA: 2653] [Impact Index Per Article: 139.6] [Indexed: 12/19/2022]
Abstract
Access to unified datasets of protein and genetic interactions is critical for interrogation of gene/protein function and analysis of global network properties. BioGRID is a freely accessible database of physical and genetic interactions available at . BioGRID release version 2.0 includes >116 000 interactions from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens. Over 30 000 interactions have recently been added from 5778 sources through exhaustive curation of the Saccharomyces cerevisiae primary literature. An internally hyper-linked web interface allows for rapid search and retrieval of interaction data. Full or user-defined datasets are freely downloadable as tab-delimited text files and PSI-MI XML. Pre-computed graphical layouts of interactions are available in a variety of file formats. User-customized graphs with embedded protein, gene and interaction attributes can be constructed with a visualization system called Osprey that is dynamically linked to the BioGRID.
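The abstract notes that BioGRID datasets can be downloaded as tab-delimited text. The exact column layout of a release is not given here, so this sketch assumes a simplified, hypothetical file. It has a header row and columns `interactor_a`, `interactor_b` and `interaction_type`, and the gene names are placeholders. The sketch loads the file into an adjacency map with Python's standard `csv` module and can optionally keep only physical or only genetic interactions.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt; real BioGRID files define their own column names.
SAMPLE = (
    "interactor_a\tinteractor_b\tinteraction_type\n"
    "geneA\tgeneB\tphysical\n"
    "geneA\tgeneC\tphysical\n"
    "geneC\tgeneD\tgenetic\n"
)

def load_interactions(handle, interaction_type=None):
    """Build an undirected adjacency map, optionally keeping one interaction type."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        if interaction_type is not None and row["interaction_type"] != interaction_type:
            continue
        a, b = row["interactor_a"], row["interactor_b"]
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

graph = load_interactions(io.StringIO(SAMPLE))
print(sorted(graph["geneA"]))  # ['geneB', 'geneC']
print(len(graph["geneC"]))     # 2
```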
Affiliation(s)
- Bobby-Joe Breitkreutz
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada M5G 1X5
- Teresa Reguly
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada M5G 1X5
- Lorrie Boucher
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada M5G 1X5
- Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Ashton Breitkreutz
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada M5G 1X5
- Mike Tyers
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada M5G 1X5
- Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- To whom correspondence should be addressed. Tel: +416 586 8371; Fax: +416 586 8869;
358
Yeats C, Maibaum M, Marsden R, Dibley M, Lee D, Addou S, Orengo CA. Gene3D: modelling protein structure, function and evolution. Nucleic Acids Res 2006; 34:D281-4. [PMID: 16381865 PMCID: PMC1347420 DOI: 10.1093/nar/gkj057] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 12/02/2022]
Abstract
The Gene3D release 4 database and web portal () provide a combined structural, functional and evolutionary view of the protein world. It is focussed on providing structural annotation for protein sequences without structural representatives, including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families so as to aid functional prediction. The structural annotation is generated using hidden Markov models (HMMs) based on the CATH domain families; CATH is a repository for manually deduced protein domains. Amongst the changes from the last publication are: the addition of over 100 genomes and the UniProt sequence database, domain data from Pfam, metabolic pathway and functional data from COGs, KEGG and GO, and protein–protein interaction data from MINT and BIND. The website has been rebuilt to allow more sophisticated querying, and the data returned are presented in a clearer format with greater functionality. Furthermore, all data can be downloaded in a simple XML format, allowing users to carry out complex investigations on their own computers.
Affiliation(s)
- Corin Yeats
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK.
359
Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, Rossi A, Marti-Renom M, Karchin R, Webb BM, Eramian D, Shen MY, Kelly L, Melo F, Sali A. MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 2006; 34:D291-5. [PMID: 16381869 PMCID: PMC1347422 DOI: 10.1093/nar/gkj059] [Citation(s) in RCA: 209] [Impact Index Per Article: 11.0] [Indexed: 11/24/2022]
Abstract
MODBASE () is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment (). MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB (). Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli, ), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE, ) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP, ).
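The MODBASE counts quoted above (July 5, 2005) imply roughly 60% sequence coverage and close to three domain models per covered sequence. These ratios are simple arithmetic on the reported numbers and are not figures stated in the abstract.

```python
models = 3_094_524       # reliable domain models
covered = 1_094_750      # UniProt sequences with at least one model
unique_seqs = 1_817_889  # unique UniProt sequences

coverage = covered / unique_seqs
models_per_covered = models / covered
print(f"sequence coverage: {coverage:.1%}")                      # 60.2%
print(f"models per covered sequence: {models_per_covered:.2f}")  # 2.83
```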
Affiliation(s)
- Ursula Pieper
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Narayanan Eswar
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Fred P. Davis
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Hannes Braberg
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- M. S. Madhusudhan
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Andrea Rossi
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Marc Marti-Renom
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Rachel Karchin
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Ben M. Webb
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- David Eramian
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Graduate Group in Biophysics, University of California, San Francisco, CA, USA
- Min-Yi Shen
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Libusha Kelly
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Graduate Group in Biological and Medical Informatics, University of California, San Francisco, CA, USA
- Francisco Melo
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
- Andrej Sali
- Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
- To whom correspondence should be addressed. Tel: +1 415 514 4227; Fax: +1 415 514 4231;
360
Abstract
The first release of the Protein-protein Interactions Thermodynamic Database (PINT) contains more than 1500 data points for several thermodynamic parameters, along with sequence and structural information, experimental conditions and literature information. Each entry contains numerical data for the free energy change, dissociation constant, association constant, enthalpy change, heat capacity change and so on of the interacting proteins upon binding, which are important for understanding the mechanism of protein-protein interactions. PINT also includes the name and source of the proteins involved in binding, their Protein Information Resource, SWISS-PROT and Protein Data Bank (PDB) codes, the secondary structure and solvent accessibility of residues at mutant positions, measuring methods, experimental conditions such as buffers, ions and additives, and literature information. A WWW interface allows users to search the data using various conditions, select the terms to output and choose among different sorting options. Further, PINT is cross-linked with other related databases: PIR, SWISS-PROT, PDB and the NCBI PubMed literature database. The database is freely available at http://www.bioinfodatabase.com/pint/index.html.
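Several of the parameters PINT stores are linked by standard thermodynamics. The association constant is the reciprocal of the dissociation constant, Ka = 1/Kd. The standard binding free energy is ΔG° = RT ln(Kd / 1 M) = −RT ln(Ka · 1 M), with 1 M as the standard state. The sketch below converts a dissociation constant to ΔG°; the 1 nM input and the 25 °C temperature are illustrative choices.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from a dissociation constant in M,
    using dG = RT ln(Kd / 1 M); negative values indicate favourable binding."""
    return R * temperature_k * math.log(kd_molar) / 1000.0

def association_constant(kd_molar: float) -> float:
    """Ka (M^-1) is the reciprocal of Kd (M)."""
    return 1.0 / kd_molar

dg = binding_free_energy(1e-9)
print(f"{dg:.1f} kJ/mol ({dg / 4.184:.1f} kcal/mol)")  # -51.4 kJ/mol (-12.3 kcal/mol)
```

Each tenfold drop in Kd lowers ΔG° by RT ln 10, about 5.7 kJ/mol at 25 °C, which is why tight binders span only a few tens of kJ/mol.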
Affiliation(s)
- M D Shaji Kumar
- Department of Biochemical Engineering and Science, Kyushu Institute of Technology Iizuka 820-8502, Fukuoka, Japan.
361
Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P. SMART 5: domains in the context of genomes and networks. Nucleic Acids Res 2006; 34:D257-60. [PMID: 16381859 PMCID: PMC1347442 DOI: 10.1093/nar/gkj079] [Citation(s) in RCA: 751] [Impact Index Per Article: 39.5] [Indexed: 11/16/2022]
Abstract
The Simple Modular Architecture Research Tool (SMART) is an online resource () used for protein domain identification and the analysis of protein domain architectures. Many new features were implemented to make SMART more accessible to scientists from different fields. The new ‘Genomic’ mode in SMART makes it easy to analyze domain architectures in completely sequenced genomes. Domain annotation has been updated with a detailed taxonomic breakdown and a prediction of the catalytic activity for 50 SMART domains is now available, based on the presence of essential amino acids. Furthermore, intrinsically disordered protein regions can be identified and displayed. The network context is now displayed in the results page for more than 350 000 proteins, enabling easy analyses of domain interactions.
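The abstract states that catalytic activity is predicted for 50 SMART domains from the presence of essential amino acids. This is a minimal sketch of that idea. It assumes the domain alignment is already available and that the essential positions and accepted residues are known. The positions and residues below are hypothetical, not SMART's actual rules. A domain is called catalytically active only if every essential position carries an accepted residue.

```python
# Hypothetical rule: alignment column (0-based) -> residues accepted as catalytic there.
ESSENTIAL = {10: {"H"}, 42: {"D"}, 57: {"C", "S"}}

def predicted_active(aligned_domain: str, essential=ESSENTIAL) -> bool:
    """True if every essential alignment column holds an accepted residue."""
    for pos, allowed in essential.items():
        if pos >= len(aligned_domain) or aligned_domain[pos] not in allowed:
            return False
    return True

# Toy aligned domain: gaps everywhere except the three essential columns.
cols = ["-"] * 60
cols[10], cols[42], cols[57] = "H", "D", "S"
active = "".join(cols)
inactive = active[:42] + "N" + active[43:]  # essential Asp replaced by Asn

print(predicted_active(active))    # True
print(predicted_active(inactive))  # False
```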
Affiliation(s)
- Richard R. Copley
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK
- Birgit Pils
- Bioinformatik, Biozentrum, Am Hubland, University of Wuerzburg, 97074 Wuerzburg, Germany
- Stefan Pinkert
- Bioinformatik, Biozentrum, Am Hubland, University of Wuerzburg, 97074 Wuerzburg, Germany
- Jörg Schultz
- Bioinformatik, Biozentrum, Am Hubland, University of Wuerzburg, 97074 Wuerzburg, Germany
- Peer Bork
- To whom correspondence should be addressed. Tel: +49 6221 387 8526; Fax: +49 6221 387 517;
362
Ng A, Bursteinas B, Gao Q, Mollison E, Zvelebil M. pSTIING: a 'systems' approach towards integrating signalling pathways, interaction and transcriptional regulatory networks in inflammation and cancer. Nucleic Acids Res 2006; 34:D527-34. [PMID: 16381926 PMCID: PMC1347407 DOI: 10.1093/nar/gkj044] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 01/01/2023]
Abstract
pSTIING (http://pstiing.licr.org) is a new publicly accessible web-based application and knowledgebase featuring 65 228 distinct molecular associations (comprising protein-protein, protein-lipid and protein-small molecule interactions and transcriptional regulatory associations), ligand-receptor-cell type information and signal transduction modules. It has a particular focus on regulatory networks relevant to chronic inflammation, cell migration and cancer. The web application and interface provide graphical representations of networks, allowing users to combine and extend transcriptional regulatory and signalling modules, infer molecular interactions across species and explore networks via protein domains/motifs, gene ontology annotations and human diseases. pSTIING also supports the direct cross-correlation of experimental results with interaction information in the knowledgebase via the associated CLADIST tool, which currently analyses and clusters gene expression, proteomic and phenotypic datasets. This allows co-expression patterns to be projected onto prior network information, facilitating the identification of functional modules in physiologically relevant systems.
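pSTIING can infer molecular interactions across species, but the abstract does not say how. A common technique for this is interolog mapping: if two proteins interact in one species and both have orthologues in another, the orthologous pair becomes a candidate interaction. The sketch below shows that technique under the assumption of a simple one-to-one orthology table. It is not pSTIING's documented method, and the mouse and human identifiers are hypothetical.

```python
def transfer_interactions(interactions, orthologs):
    """Map interactions from a source species to a target species through an
    ortholog table (source id -> target id). Pairs with an unmapped partner
    are skipped. Returns sorted, de-duplicated target-species pairs."""
    predicted = set()
    for a, b in interactions:
        if a in orthologs and b in orthologs:
            predicted.add(tuple(sorted((orthologs[a], orthologs[b]))))
    return sorted(predicted)

mouse_ppis = [("mA", "mB"), ("mB", "mC"), ("mC", "mD")]
mouse_to_human = {"mA": "hA", "mB": "hB", "mC": "hC"}  # mD has no orthologue
print(transfer_interactions(mouse_ppis, mouse_to_human))
# [('hA', 'hB'), ('hB', 'hC')]
```

Transferred pairs are predictions only. Real pipelines usually weight them by orthology confidence and by the evidence behind the source interaction.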
Affiliation(s)
- Marketa Zvelebil
- To whom correspondence should be addressed. Tel: +44 20 7878 4012; Fax: +44 20 7878 4040;
363
Feldman HJ, Snyder KA, Ticoll A, Pintilie G, Hogue CWV. A complete small molecule dataset from the protein data bank. FEBS Lett 2006; 580:1649-53. [PMID: 16494871 DOI: 10.1016/j.febslet.2006.02.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 11/28/2005] [Revised: 01/19/2006] [Accepted: 02/07/2006] [Indexed: 10/25/2022]
Abstract
A complete set of 6300 small molecule ligands was extracted from the protein data bank, and deposited online in PubChem as data source 'SMID'. This set's major improvement over prior methods is the inclusion of cyclic polypeptides and branched polysaccharides, including an unambiguous nomenclature, in addition to normal monomeric ligands. Only the best available example of each ligand structure is retained, and an additional dataset is maintained containing co-ordinates for all examples of each structure. Attempts are made to correct ambiguous atomic elements and other common errors, and a perception algorithm was used to determine bond order and aromaticity when no other information was available.
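The abstract says only the best available example of each ligand structure is retained, with a second dataset holding all examples, but it does not define "best". As an illustration, assume that "best" means the occurrence from the entry with the best (numerically lowest) resolution. Deduplication then reduces to keeping a per-ligand minimum while also grouping every instance. The ligand identifiers, entry names and resolutions below are hypothetical.

```python
from typing import NamedTuple

class LigandInstance(NamedTuple):
    ligand: str          # ligand identifier
    entry: str           # structure entry the instance came from
    resolution: float    # resolution in angstroms (lower is better)

def best_examples(instances):
    """Return (best, grouped): one lowest-resolution instance per ligand,
    plus every instance grouped by ligand (the 'all examples' dataset)."""
    best, grouped = {}, {}
    for inst in instances:
        grouped.setdefault(inst.ligand, []).append(inst)
        current = best.get(inst.ligand)
        if current is None or inst.resolution < current.resolution:
            best[inst.ligand] = inst
    return best, grouped

data = [
    LigandInstance("LIG1", "entry1", 2.5),
    LigandInstance("LIG1", "entry2", 1.8),
    LigandInstance("LIG2", "entry3", 3.0),
]
best, grouped = best_examples(data)
print(best["LIG1"].entry)    # entry2
print(len(grouped["LIG1"]))  # 2
```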
Affiliation(s)
- Howard J Feldman
- The Blueprint Initiative, Suite 101, 200 Elm Street, Toronto, Ont., Canada M5T 1K4
364
Brinkworth RI, Munn AL, Kobe B. Protein kinases associated with the yeast phosphoproteome. BMC Bioinformatics 2006; 7:47. [PMID: 16445868 PMCID: PMC1373605 DOI: 10.1186/1471-2105-7-47] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 09/14/2005] [Accepted: 01/31/2006] [Indexed: 02/08/2023]
Abstract
Background Protein phosphorylation is an extremely important mechanism of cellular regulation. A large-scale study of phosphoproteins in a whole-cell lysate of Saccharomyces cerevisiae has previously identified 383 phosphorylation sites in 216 peptide sequences. However, the protein kinases responsible for the phosphorylation of the identified proteins have not previously been assigned. Results We used Predikin in combination with other bioinformatic tools to predict which of 116 unique protein kinases in yeast phosphorylates each experimentally determined site in the phosphoproteome. The prediction was based on the match between the phosphorylated 7-residue sequence and the predicted substrate specificity of each kinase, with the highest weight applied to the residues or positions that contribute most to the substrate specificity. We estimated the reliability of the predictions by performing a parallel prediction on phosphopeptides for which the kinase has been experimentally determined. Conclusion The results reveal that the functions of the protein kinases and their predicted phosphoprotein substrates are often correlated, for example in endocytosis, cytokinesis, transcription, replication, carbohydrate metabolism and stress response. The predictions link phosphoproteins of unknown function with protein kinases of known function and vice versa, suggesting functions for the uncharacterized proteins. The study indicates that the phosphoproteins and associated protein kinases represented in our dataset have housekeeping cellular roles; certain kinases are not represented because they may only be activated during specific cellular responses. Our results demonstrate the utility of our previously reported protein kinase substrate prediction approach (Predikin) as a tool for establishing links between kinases and phosphoproteins that can subsequently be tested experimentally.
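The abstract describes scoring each phosphorylated 7-residue sequence against each kinase's predicted substrate specificity, with the heaviest weight on the positions that matter most. The sketch below is a generic weighted position match in that spirit, not Predikin's published scoring scheme. The specificity profiles, the position weights and the example peptide are hypothetical. The phosphoacceptor is assumed to sit at the central position (index 3) and is not itself scored.

```python
# Positions -3..+3 around the phosphosite, stored as indices 0..6.
# Hypothetical profile: residues preferred at each position by one kinase.
PROFILE = [{"R", "K"}, {"R"}, {"A", "G"}, {"S", "T"}, {"P"}, {"L", "I"}, {"A"}]
# Hypothetical weights: larger where the position is assumed to drive specificity.
WEIGHTS = [1.0, 3.0, 0.5, 0.0, 3.0, 1.0, 0.5]

def specificity_score(peptide: str, profile=PROFILE, weights=WEIGHTS) -> float:
    """Weighted fraction of positions whose residue matches the kinase profile."""
    if len(peptide) != 7:
        raise ValueError("expected a 7-residue peptide centred on the phosphosite")
    matched = sum(w for aa, allowed, w in zip(peptide, profile, weights) if aa in allowed)
    return matched / sum(weights)

def rank_kinases(peptide: str, kinases: dict) -> list:
    """Rank kinases (name -> (profile, weights)) by score for one phosphopeptide."""
    scores = {name: specificity_score(peptide, p, w) for name, (p, w) in kinases.items()}
    return sorted(scores, key=scores.get, reverse=True)

kinases = {
    "kinaseX": (PROFILE, WEIGHTS),
    "kinaseY": ([{"E"}] * 7, [1.0] * 7),  # hypothetical acidophilic kinase
}
print(specificity_score("KRASPLA"))        # 1.0
print(rank_kinases("KRASPLA", kinases))    # ['kinaseX', 'kinaseY']
```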
Affiliation(s)
- Ross I Brinkworth
  - School of Molecular and Microbial Sciences, University of Queensland, Brisbane 4072, Australia
- Alan L Munn
  - Institute for Molecular Bioscience and Special Research Centre for Functional and Applied Genomics, University of Queensland, Brisbane 4072, Australia
  - School of Biomedical Sciences, University of Queensland, Brisbane 4072, Australia
- Boštjan Kobe
  - School of Molecular and Microbial Sciences, University of Queensland, Brisbane 4072, Australia
  - Institute for Molecular Bioscience and Special Research Centre for Functional and Applied Genomics, University of Queensland, Brisbane 4072, Australia

365
Teber ET, Crawford E, Bolton KB, Van Dyk D, Schofield PR, Kapoor V, Church WB. Djinn Lite: a tool for customised gene transcript modelling, annotation-data enrichment and exploration. BMC Bioinformatics 2006; 7:33. [PMID: 16426464 PMCID: PMC1397871 DOI: 10.1186/1471-2105-7-33] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/10/2005] [Accepted: 01/23/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Data on genetic variation, transcriptomes and proteomes are being made available at an ever-increasing rate. Similarly, a growing variety of bioinformatic programs, designed to identify a myriad of sequence patterns of potential biological importance within intergenic regions, genes, transcripts and proteins, is becoming available from many diverse sources. However, biologists require easy-to-use, uncomplicated tools to integrate this information and to visualise and print gene annotations. Integrating this information usually requires considerable informatics skills and comprehensive knowledge of the data formats. Tools are also needed to explore gene model variants by allowing users to create alternative transcript models from novel combinations of exons not necessarily represented in current database deposits of mRNA/cDNA sequences. RESULTS Djinn Lite is designed to be an intuitive program for storing and visually exploring custom annotations relating to a eukaryotic gene sequence and its modelled gene products. In particular, it helps in developing hypotheses regarding alternative splicing of transcripts by allowing the construction of model transcripts and inspection of their resulting translations. It displays a gene and its gene products in one synchronised graphical view, allowing the user to drill down into sequence-related data. Colour highlighting of selected sequences and added annotations further supports exploration and visualisation of sequence regions and motifs known or predicted to be biologically significant. CONCLUSION Gene annotation remains an ongoing and challenging task that will continue as gene structures, gene transcription repertoires, disease loci, protein products and their interactions become more precisely defined.
Djinn Lite offers an accessible interface to help accumulate, enrich and individualize sequence annotations relating to a gene, its transcripts and translations. The mechanism for defining and creating transcripts, and the subsequent navigation and exploration of features, are very intuitive and involve only a short learning curve. Ultimately, Djinn Lite can provide valuable clues for planning new experiments and can store sequences and annotations for customised projects. The application runs on the Windows 98, ME, 2000, XP and 2003 operating systems.
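The idea of building a model transcript from a chosen exon combination and inspecting its translation can be sketched in Python as follows. This is not Djinn Lite's code: the toy genomic sequence, exon coordinates and function names are hypothetical, and exons are assumed to lie on the forward strand with 0-based, end-exclusive coordinates.

```python
# Minimal sketch: splice a chosen exon combination and translate the result.
# Hypothetical: the toy gene, exon coordinates and function names are made up;
# exons are assumed to lie on the forward strand, 0-based and end-exclusive.
from itertools import product

# Standard genetic code, generated in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Toy gene: three exons separated by two introns.
GENOMIC = "ATGGCT" + "GTAAGTTTAG" + "GAAAAA" + "GTAAGTTTAG" + "TGGTAA"
EXONS = {"exon1": (0, 6), "exon2": (16, 22), "exon3": (32, 38)}


def splice(genomic, exons):
    """Concatenate the selected exons in genomic order."""
    return "".join(genomic[start:end] for start, end in sorted(exons))


def translate(mrna):
    """Translate from the first ATG up to, but not including, the first stop."""
    start = mrna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


full = splice(GENOMIC, [EXONS["exon1"], EXONS["exon2"], EXONS["exon3"]])
skipped = splice(GENOMIC, [EXONS["exon1"], EXONS["exon3"]])  # exon 2 skipped
print(translate(full), translate(skipped))  # → MAEKW MAW
```

Comparing the two translations shows how an exon-skipping hypothesis changes the predicted protein, which is the kind of inspection the abstract describes.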
Affiliation(s)
- Erdahl T Teber
  - School of Medical Sciences, University of New South Wales NSW 2052, Australia
  - Neurobiology Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
  - Faculty of Pharmacy, University of Sydney NSW 2006, Australia
- Edward Crawford
  - School of Medical Sciences, University of New South Wales NSW 2052, Australia
- Kent B Bolton
  - EBM Pty Ltd, Level 6, 110 Sussex Street, Sydney, NSW 2000, Australia
- Derek Van Dyk
  - NSW Ministry for Science and Medical Research, GPO Box 5341, Sydney NSW 2001, Australia
- Peter R Schofield
  - Neurobiology Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
  - Prince of Wales Medical Research Institute, Sydney NSW 2031, Australia
- Vimal Kapoor
  - School of Medical Sciences, University of New South Wales NSW 2052, Australia
  - Department of Medicine and Pharmacology, University of Western Australia, Crawley WA 6009, Australia
- W Bret Church
  - School of Medical Sciences, University of New South Wales NSW 2052, Australia
  - Neurobiology Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
  - Faculty of Pharmacy, University of Sydney NSW 2006, Australia

366
Brown JA, Sherlock G, Myers CL, Burrows NM, Deng C, Wu HI, McCann KE, Troyanskaya OG, Brown JM. Global analysis of gene function in yeast by quantitative phenotypic profiling. Mol Syst Biol 2006; 2:2006.0001. [PMID: 16738548 PMCID: PMC1681475 DOI: 10.1038/msb4100043] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.3] [Received: 06/09/2005] [Accepted: 12/01/2005] [Indexed: 11/09/2022]
Abstract
We present a method for the global analysis of the function of genes in budding yeast based on hierarchical clustering of the quantitative sensitivity profiles of the 4756 strains with individual homozygous deletion of nonessential genes to a broad range of cytotoxic or cytostatic agents. This method is superior to other global methods of identifying the function of genes involved in the various DNA repair and damage checkpoint pathways as well as other interrogated functions. Analysis of the phenotypic profiles of the 51 diverse treatments places a total of 860 genes of unknown function in clusters with genes of known function. We demonstrate that this can not only identify the function of unknown genes but can also suggest the mechanism of action of the agents used. This method will be useful when used alone and in conjunction with other global approaches to identify gene function in yeast.
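The clustering idea above can be sketched in Python as average-linkage hierarchical clustering of sensitivity profiles, followed by letting unannotated genes inherit candidate functions from annotated genes in the same cluster. The distance (1 minus Pearson correlation), the cut threshold, the gene names, the profile values and the annotations are all assumptions for illustration; the paper's own metric and linkage may differ.

```python
# Minimal sketch: average-linkage clustering of drug-sensitivity profiles,
# then function suggestions for unannotated genes by cluster membership.
# Hypothetical: gene names, profile values, annotations and the distance
# threshold; 1 - Pearson correlation is an assumed distance measure.
import math

PROFILES = {  # sensitivity to four hypothetical treatments
    "repairA": [5.0, 4.0, 0.5, 0.2],
    "repairB": [4.5, 4.2, 0.4, 0.1],
    "unknown1": [4.8, 3.9, 0.6, 0.3],
    "sterolX": [0.2, 0.3, 4.0, 5.0],
    "unknown2": [0.1, 0.5, 4.2, 4.6],
}
ANNOTATIONS = {"repairA": "DNA repair", "repairB": "DNA repair", "sterolX": "sterol synthesis"}


def distance(a, b):
    """1 - Pearson correlation between two profiles (0 = identical shape)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / norm if norm else 1.0


def average_linkage(profiles, threshold):
    """Merge the closest clusters until every remaining pair is farther than threshold."""
    clusters = [[gene] for gene in profiles]

    def linkage(c1, c2):
        total = sum(distance(profiles[g], profiles[h]) for g in c1 for h in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected
    return clusters


def suggest_functions(clusters, annotations):
    """Give each unannotated gene the known functions found in its cluster."""
    suggestions = {}
    for members in clusters:
        known = sorted({annotations[g] for g in members if g in annotations})
        for gene in members:
            if gene not in annotations and known:
                suggestions[gene] = known
    return suggestions


clusters = average_linkage(PROFILES, threshold=0.5)
print(suggest_functions(clusters, ANNOTATIONS))
# → {'unknown1': ['DNA repair'], 'unknown2': ['sterol synthesis']}
```

A correlation-based distance groups genes by the shape of their response across treatments rather than by overall sensitivity, which suits the "guilt by association" use described in the abstract.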
Affiliation(s)
- James A Brown
  - Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA 94305-5152, USA.

367
Abstract
Bioinformatics plays an essential role in today's plant science. As the amount of data grows exponentially, there is a parallel growth in the demand for tools and methods in data management, visualization, integration, analysis, modeling, and prediction. At the same time, many researchers in biology are unfamiliar with available bioinformatics methods, tools, and databases, which could lead to missed opportunities or misinterpretation of the information. In this review, we describe some of the key concepts, methods, software packages, and databases used in bioinformatics, with an emphasis on those relevant to plant science. We also cover some fundamental issues related to biological sequence analyses, transcriptome analyses, computational proteomics, computational metabolomics, bio-ontologies, and biological databases. Finally, we explore a few emerging research topics in bioinformatics.
Affiliation(s)
- Seung Yon Rhee
  - Department of Plant Biology, Carnegie Institution, Stanford, California 94305, USA.

368
Tobita M, Horiuchi K, Araki K, Nemoto M, Shimada H, Nishikawa T. BirdsAnts: A protein-small molecule interaction viewer. Chem-Bio Informatics Journal 2006. [DOI: 10.1273/cbij.6.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- Motoi Tobita
  - Informatics Department, Reverse Proteomics Research Institute Co., Ltd.,
  - Hitachi, Ltd., Advanced Research Laboratory
- Ken Horiuchi
  - Informatics Department, Reverse Proteomics Research Institute Co., Ltd.,
- Kenji Araki
  - Informatics Department, Reverse Proteomics Research Institute Co., Ltd.,
- Masashi Nemoto
  - Informatics Department, Reverse Proteomics Research Institute Co., Ltd.,
- Hiroyasu Shimada
  - Informatics Department, Reverse Proteomics Research Institute Co., Ltd.,
- Tetsuo Nishikawa
  - Informatics Department, Reverse Proteomics Research Institute Co., Ltd.,

369
Lee BT, Song CM, Yeo BH, Chung CW, Chan YL, Lim TT, Chua YB, Loh MC, Ang BK, Vijayakumar P, Liew L, Lim J, Lim YP, Wong CH, Chuon D, Rajagopal G, Hill J. Gastric Cancer (Biomarkers) Knowledgebase (GCBKB): A Curated and Fully Integrated Knowledgebase of Putative Biomarkers Related to Gastric Cancer. Biomark Insights 2006. [DOI: 10.1177/117727190600100005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
The Gastric Cancer (Biomarkers) Knowledgebase (GCBKB) (http://biomarkers.bii.a-star.edu.sg/background/gastricCancerBiomarkersKb.php) is a curated and fully integrated knowledgebase that provides data on putative biomarkers that may be used in the diagnosis and prognosis of gastric cancer. It is freely available to all users. The data in the knowledgebase were derived from a large literature source, and the putative biomarkers therein have been annotated with data from the public domain. The knowledgebase is maintained by a curation team who update the data from a defined source. In addition to data mined from the literature, the knowledgebase will also be populated with unpublished experimental data from investigators working in the field of gastric cancer biomarker discovery. Users can search for potential markers by experiment type, tissue type and disease state. Search results may be saved, manipulated and retrieved at a later date. As far as the authors are aware, this is the first open-access database dedicated to the discovery and investigation of gastric cancer biomarkers.
Affiliation(s)
- Bernett T.K. Lee
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Chun Meng Song
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Boon Huat Yeo
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Cheuk Wang Chung
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Ying Leong Chan
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Teng Ting Lim
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Yen Bing Chua
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Marie C.S. Loh
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Boon Keong Ang
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Praveen Vijayakumar
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Lailing Liew
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Jiahao Lim
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Yun Ping Lim
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Chee Hong Wong
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Danny Chuon
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Gunaretnam Rajagopal
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore
- Jeffrey Hill
  - Bioinformatics Institute, 30 Biopolis Street, #07–01 Matrix, Singapore 138671, Singapore

370
Green RF, Moore C. Incorporating genetic analyses into birth defects cluster investigations: Strategies for identifying candidate genes. Birth Defects Res A Clin Mol Teratol 2006; 76:798-810. [PMID: 17036308 DOI: 10.1002/bdra.20280] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
BACKGROUND Incorporating genetic analyses into birth defect cluster investigations may increase understanding of both genetic and environmental risk factors for the defect. Current constraints of most birth defect cluster investigations make candidate gene selection the most feasible approach. Here, we describe strategies for choosing candidate genes for such investigations, which will also be applicable to more general gene-environment studies. METHODS We reviewed publicly available web-based resources for selection of candidate genes and identification of risk factors, as well as publications on different strategies for candidate gene selection. RESULTS Candidate gene selection requires consideration of available gene-disease databases, previous epidemiological studies, animal model research, linkage and expression studies, and other resources. We describe general considerations for utilizing available resources, as well as provide an example of a search for candidate genes related to gastroschisis. CONCLUSIONS Available web resources could facilitate selection of candidate genes, but selection of optimal candidates will still require a strong understanding of genetics and the pathogenesis of the defect, as well as careful consideration of previous epidemiological studies.
Affiliation(s)
- Ridgely Fisk Green
  - National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia 30333, USA.

371
Avruch J, Praskova M, Ortiz-Vega S, Liu M, Zhang XF. Nore1 and RASSF1 Regulation of Cell Proliferation and of the MST1/2 Kinases. Methods Enzymol 2006; 407:290-310. [PMID: 16757333 DOI: 10.1016/s0076-6879(05)07025-4] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.5] [Indexed: 12/13/2022]
Abstract
The six human Nore1/RASSF genes encode a family of putative tumor suppressor proteins, each expressed as multiple mRNA splice variants. The predominant isoforms of these noncatalytic polypeptides are characterized by the presence in their carboxyterminal segments of a Ras-Association (RA) domain followed by a SARAH domain. The expression of the RASSF1A and Nore1A isoforms is extinguished selectively by gene loss and/or epigenetic mechanisms in a considerable fraction of epithelial cancers and cell lines derived therefrom, and reexpression usually suppresses the proliferation and tumorigenicity of these cells. RASSF1A/Nore1A can cause cell cycle delay in G1 and/or M and may promote apoptosis. The founding member, Nore1A, binds preferentially through its RA domain to the GTP-charged forms of Ras, Rap-1, and several other Ras subfamily GTPases with high affinity. By contrast, RASSF1, despite an RA domain 50% identical to Nore1, exhibits relatively low affinity for Ras-like GTPases but may associate with Ras-GTP indirectly. Each of the RASSF polypeptides, including the C. elegans ortholog encoded by T24F1.3, binds to the Ste20-related protein kinases MST1 and MST2 through the SARAH domains of each partner. The recombinant MST1/2 kinases, spontaneous dimers, autoactivate in vitro through an intradimer transphosphorylation of the activation loop, and the Nore1/RASSF1 polypeptides inhibit this process. Recombinant MST1 is strongly activated in vivo by recruitment to the membrane; the recombinant MST1 that is bound to RasG12V through Nore1A is activated; however, the bulk of MST1 is not. Endogenous complexes of MST1 with both Nore1A and RASSF1A are detectable, and Nore1A/MST1 can associate with endogenous Ras in response to serum addition. Nevertheless, the physiological functions of the Nore1/RASSF polypeptides in mammalian cells, as well as the role of the MST1/2 kinases in their growth-suppressive actions, remain to be established. 
The Drosophila MST1/2 ortholog hippo is a negative regulator of cell cycle progression and is necessary for developmental apoptosis. Overexpression of mammalian MST1 or MST2 promotes apoptosis, as does overexpression of mutant active Ki-Ras. Interference with the ability of endogenous MST1/2 to associate with the Nore1/RASSF polypeptides inhibits Ras-induced apoptosis. At present, however, the relevance of Ki-Ras-induced apoptosis to the physiological functions of c-Ras and to the growth-regulating actions of spontaneously occurring oncogenic Ras mutants is not known.
Affiliation(s)
- Joseph Avruch
  - Department of Molecular Biology and Diabetes Unit, Medical Services, Massachusetts General Hospital, USA

372
Fink JL, Aturaliya RN, Davis MJ, Zhang F, Hanson K, Teasdale MS, Kai C, Kawai J, Carninci P, Hayashizaki Y, Teasdale RD. LOCATE: a mouse protein subcellular localization database. Nucleic Acids Res 2006; 34:D213-7. [PMID: 16381849 PMCID: PMC1347432 DOI: 10.1093/nar/gkj069] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.0] [Received: 08/15/2005] [Revised: 10/08/2005] [Accepted: 10/08/2005] [Indexed: 11/14/2022]
Abstract
We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for approximately 40% of the mouse proteome. It is available at http://locate.imb.uq.edu.au.
Affiliation(s)
- J Lynn Fink
  - ARC Centre in Bioinformatics, University of Queensland, St Lucia, Queensland 4072, Australia.

373
Kim JJ, Zhang Z, Park JC, Ng SK. BioContrasts: extracting and exploiting protein–protein contrastive relations from biomedical literature. Bioinformatics 2005; 22:597-605. [PMID: 16368768 DOI: 10.1093/bioinformatics/btk016] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Contrasts are useful conceptual vehicles for learning processes and for exploratory research into the unknown. For example, contrastive information between two proteins can reveal their similarities, divergences and relations, leading to invaluable insights for a better understanding of the proteins. Such contrastive information is reported in the biomedical literature. However, no current biomedical text-mining work has been reported that systematically extracts such contrastive information from the literature and presents it for exploitation. RESULTS Our BioContrasts system extracts protein-protein contrastive information from MEDLINE abstracts and presents it to biologists in a web application. Contrastive information is identified in the abstracts using contrastive negation patterns such as 'A but not B'. A total of 799 169 pairs of contrastive expressions were successfully extracted from 2.5 million MEDLINE abstracts. By grounding the contrasted protein names to Swiss-Prot entries, we produced 41 471 contrasts between Swiss-Prot protein entries. These contrasts are presented via a user-friendly, interactive web portal that can be exploited for applications such as the refinement of biological pathways. AVAILABILITY BioContrasts can be accessed at http://biocontrasts.i2r.a-star.edu.sg. It is also mirrored at http://biocontrasts.biopathway.org. SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
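The contrastive negation pattern mentioned above ('A but not B') can be illustrated with a short regular-expression sketch in Python. This is not the BioContrasts implementation: the `NAME` pattern is a hypothetical, deliberately crude stand-in for real protein-name recognition (it misses lowercase names such as p53), the example sentence uses made-up protein names, and grounding to Swiss-Prot is not shown.

```python
# Minimal sketch: extract 'A but not B' contrastive pairs from text.
# Hypothetical: the NAME pattern is a crude stand-in for a real protein-name
# recognizer; grounding names to Swiss-Prot entries is not shown.
import re

NAME = r"[A-Z][A-Za-z0-9-]*"  # token starting with an uppercase letter
BUT_NOT = re.compile(rf"\b({NAME})\s+but\s+not\s+({NAME})\b")


def extract_contrasts(text):
    """Return (A, B) tuples for every 'A but not B' match in the text."""
    return BUT_NOT.findall(text)


sentence = "KinX phosphorylates ABC1 but not ABC2, and XYZ3 but not XYZ4 is nuclear."
print(extract_contrasts(sentence))  # → [('ABC1', 'ABC2'), ('XYZ3', 'XYZ4')]
```

Because `findall` returns one tuple per match when the pattern has two groups, each result is directly an (A, B) contrast pair.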
Affiliation(s)
- Jung-Jae Kim
  - Computer Science Division & AITrc, Korea Advanced Institute of Science and Technology, Yuseong-gu, Daejeon 305-701, South Korea

374
Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, Theesfeld CL, Dolinski K, Troyanskaya OG. Discovery of biological networks from diverse functional genomic data. Genome Biol 2005; 6:R114. [PMID: 16420673 PMCID: PMC1414113 DOI: 10.1186/gb-2005-6-13-r114] [Citation(s) in RCA: 147] [Impact Index Per Article: 7.4] [Received: 07/01/2005] [Revised: 08/31/2005] [Accepted: 11/21/2005] [Indexed: 01/31/2023]
Abstract
bioPIXIE is a general probabilistic system for query-based discovery of pathway-specific networks through integration of diverse genome-wide data. This framework was validated by accurately recovering known networks for 31 biological processes in Saccharomyces cerevisiae and by experimentally verifying predictions for the process of chromosomal segregation. bioPIXIE, a public, comprehensive system for integration, analysis and visualization of biological network predictions for S. cerevisiae, is freely accessible over the World Wide Web.
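The probabilistic integration step can be illustrated with a naive Bayes sketch in Python that combines several evidence sources into a posterior probability that two genes are functionally linked. Naive Bayes is a simplified stand-in chosen here for clarity, not necessarily the structure of bioPIXIE's Bayesian model; the prior, the source names and all likelihood values are invented.

```python
# Minimal sketch: naive Bayes integration of several evidence sources into a
# posterior probability that two genes are functionally linked.
# Hypothetical: the prior, the data sources and all likelihood values are
# invented; naive Bayes is a simplified stand-in for the paper's model.
from math import prod

PRIOR_LINKED = 0.01

# For each source and observation (True = evidence seen for the gene pair):
# (P(observation | linked), P(observation | not linked)).
LIKELIHOODS = {
    "coexpression": {True: (0.60, 0.10), False: (0.40, 0.90)},
    "physical_binding": {True: (0.30, 0.01), False: (0.70, 0.99)},
    "genetic_interaction": {True: (0.20, 0.02), False: (0.80, 0.98)},
}


def posterior_linked(evidence):
    """Bayes' rule, assuming the sources are conditionally independent."""
    p_given_linked = prod(LIKELIHOODS[src][obs][0] for src, obs in evidence.items())
    p_given_unlinked = prod(LIKELIHOODS[src][obs][1] for src, obs in evidence.items())
    numerator = PRIOR_LINKED * p_given_linked
    return numerator / (numerator + (1 - PRIOR_LINKED) * p_given_unlinked)


print(round(posterior_linked({"coexpression": True, "physical_binding": True,
                              "genetic_interaction": False}), 3))  # → 0.597
```

Sources that are rarely positive by chance (low likelihood under "not linked") move the posterior the most, which is why a low prior can still yield a confident prediction when several such sources agree.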
Affiliation(s)
- Chad L Myers
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
- Drew Robson
  - Department of Mathematics, Princeton University, Washington Road, Princeton, NJ 08540, USA
- Adam Wible
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Matthew A Hibbs
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
- Camelia Chiriac
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
- Chandra L Theesfeld
  - Department of Genetics, School of Medicine, Mailstop-S120, Stanford University, Stanford, CA 94305-5120, USA
- Kara Dolinski
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
- Olga G Troyanskaya
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA

375
Kemmer D, Huang Y, Shah SP, Lim J, Brumm J, Yuen MMS, Ling J, Xu T, Wasserman WW, Ouellette BFF. Ulysses - an application for the projection of molecular interactions across species. Genome Biol 2005; 6:R106. [PMID: 16356269 PMCID: PMC1414088 DOI: 10.1186/gb-2005-6-12-r106] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 02/23/2005] [Revised: 08/03/2005] [Accepted: 11/08/2005] [Indexed: 11/21/2022]
Abstract
We developed Ulysses as a user-oriented system that uses a process called Interolog Analysis for the parallel analysis and display of protein interactions detected in various species. Ulysses was designed to perform such Interolog Analysis by the projection of model organism interaction data onto homologous human proteins, and thus serves as an accelerator for the analysis of uncharacterized human proteins. The relevance of projections was assessed and validated against published reference collections. All source code is freely available, and the Ulysses system can be accessed via a web interface http://www.cisreg.ca/ulysses.
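Interolog projection, as described above, transfers an interaction observed in a model organism to the human homologs of both partners. A minimal Python sketch follows; it is not Ulysses' code, and all protein identifiers and the orthology table are hypothetical placeholders.

```python
# Minimal sketch of interolog projection: interactions seen in a model
# organism are transferred to the human homologs of both partners.
# Hypothetical: all identifiers and the orthology table are placeholders.

MODEL_INTERACTIONS = [("yA", "yB"), ("yB", "yC"), ("yC", "yD")]

# Model-organism protein -> human homologs (one-to-many allowed).
ORTHOLOGS = {"yA": ["hA1", "hA2"], "yB": ["hB"], "yC": ["hC"]}  # yD: no homolog


def project_interologs(interactions, orthologs):
    """Map each model interaction onto every pair of human homologs."""
    predicted = {}
    for a, b in interactions:
        for ha in orthologs.get(a, []):
            for hb in orthologs.get(b, []):
                pair = tuple(sorted((ha, hb)))
                predicted.setdefault(pair, []).append((a, b))  # keep the evidence
    return predicted


for pair, support in sorted(project_interologs(MODEL_INTERACTIONS, ORTHOLOGS).items()):
    print(pair, "<-", support)
```

Keeping the supporting model-organism pair with each prediction makes every projected human interaction traceable to its source, and interactions whose partners lack a homolog (here yC-yD) are simply not projected.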
Affiliation(s)
- Danielle Kemmer
  - Center for Genomics and Bioinformatics, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver V5Z 4H4, BC, Canada
- Yong Huang
  - UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Sohrab P Shah
  - UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
  - Department of Computer Science, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Jonathan Lim
  - Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver V5Z 4H4, BC, Canada
- Jochen Brumm
  - Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver V5Z 4H4, BC, Canada
- Macaire MS Yuen
  - UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- John Ling
  - UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Tao Xu
  - UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver V5Z 4H4, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- BF Francis Ouellette
  - UBC Bioinformatics Centre, University of British Columbia, Vancouver V6T 1Z4, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
  - Michael Smith Laboratories, University of British Columbia, Vancouver V6T 1Z4, BC, Canada

376
Gong Y, Zhang Z. Alternative signaling pathways: when, where and why? FEBS Lett 2005; 579:5265-74. [PMID: 16194539 DOI: 10.1016/j.febslet.2005.08.062] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 08/08/2005] [Revised: 08/29/2005] [Accepted: 08/30/2005] [Indexed: 11/24/2022]
Abstract
Alternative cell signal transduction pathways have been demonstrated in some experimental systems, but the importance of their existence has not been fully appreciated. In this review, we present cases of alternative pathways identified in a survey of the available experimental data. The alternative pathways could show different relationships, i.e., synergistic, redundant, additive, opposite and competitive effects. They could have distinct time courses and cell, organ, sex or species specificity. Further, they could occur in physiological or pathological situations and display differing sensitivities. Together, these case studies imply that alternative signal pathways could be involved in the regulation of cell functions at the pathway level. An in-depth understanding of the importance of alternative pathways will rely on building and exploring mathematical models.
Affiliation(s)
- Yunchen Gong
  - Banting and Best Department of Medical Research, University of Toronto 112 College, Canada.

377
Hwang D, Smith JJ, Leslie DM, Weston AD, Rust AG, Ramsey S, de Atauri P, Siegel AF, Bolouri H, Aitchison JD, Hood L. A data integration methodology for systems biology: experimental verification. Proc Natl Acad Sci U S A 2005; 102:17302-7. [PMID: 16301536 PMCID: PMC1297683 DOI: 10.1073/pnas.0508649102] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.8] [Indexed: 11/18/2022]
Abstract
The integration of data from multiple global assays is essential to understanding dynamic spatiotemporal interactions within cells. In a companion paper, we reported a data integration methodology, designated Pointillist, that can handle multiple data types from technologies with different noise characteristics. Here we demonstrate its application to the integration of 18 data sets relating to galactose utilization in yeast. These data include global changes in mRNA and protein abundance, genome-wide protein-DNA interaction data, database information, and computational predictions of protein-DNA and protein-protein interactions. We divided the integration task into determining three network components: key system elements (genes and proteins), protein-protein interactions, and protein-DNA interactions. The results indicate that the reconstructed network efficiently focuses on and recapitulates the known biology of galactose utilization. It also provided new insights, some of which were verified experimentally. The methodology described here addresses a critical need, across all domains of molecular and cell biology, to integrate large and disparate data sets effectively.
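Combining evidence from several data sets into one score per gene can be illustrated with Fisher's method for merging independent p-values. Fisher's method is a standard, generic technique shown here only as an illustration; it is not claimed to be Pointillist's published procedure, and the example p-values are invented.

```python
# Minimal sketch: Fisher's method for combining independent p-values from
# several data sets into one score per gene. This generic method is a
# stand-in for illustration, not Pointillist's published procedure; the
# example p-values are invented.
import math


def fisher_combined_pvalue(pvalues):
    """Combine k p-values: X = -2 * sum(ln p) follows chi-square with 2k df."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one p-value in (0, 1]")
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Chi-square survival function for even df = 2k has a closed form:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = total = 1.0
    for i in range(1, len(pvalues)):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical evidence for one gene from three data types.
print(fisher_combined_pvalue([0.04, 0.10, 0.02]))
```

The closed-form survival function avoids any dependency beyond the standard library; note that Fisher's method assumes the p-values are independent, an assumption that correlated assays (for example, mRNA and protein abundance) may violate.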
Affiliation(s)
- Daehee Hwang
  - Institute for Systems Biology, Seattle, WA 98103, USA

378
Lubec G, Afjehi-Sadat L, Yang JW, John JPP. Searching for hypothetical proteins: theory and practice based upon original data and literature. Prog Neurobiol 2005; 77:90-127. [PMID: 16271823 DOI: 10.1016/j.pneurobio.2005.10.001] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.0] [Received: 05/23/2005] [Revised: 09/18/2005] [Accepted: 10/02/2005] [Indexed: 12/29/2022]
Abstract
A large part of mammalian proteomes is represented by hypothetical proteins (HP), i.e. proteins predicted from nucleic acid sequences only, and protein sequences with unknown function. Databases are far from complete, and errors are expected. The legion of HP awaits experiments demonstrating their existence at the protein level, and subsequent bioinformatic handling to assign these proteins a tentative function is mandatory. Two-dimensional gel electrophoresis with subsequent mass spectrometric identification of protein spots is an appropriate tool for searching for HP in high-throughput mode. Spots are identified by MS or MS/MS measurements (MALDI-TOF, MALDI-TOF-TOF) followed by analysis with software such as Mascot or ProFound. In many cases, proteins can thus be unambiguously identified and characterised; if not, de novo sequencing or Q-TOF analysis is warranted. If the protein is still not identified, the sequence is submitted to databases for BLAST searches to determine identities/similarities or homologies to known proteins. If no significant identity to known structures is observed, the protein sequence is examined for the presence of functional domains (databases PROSITE, PRINTS, InterPro, ProDom, Pfam and SMART) and searched for motifs (ELM); finally, protein-protein interaction databases (InterWeaver, STRING) are consulted or predictions from conformations are performed. Here we provide information about hypothetical proteins in terms of protein chemical analysis (independent of antibody availability and specificity) and bioinformatic handling, to contribute to the extension and completion of protein databases, and we include original work on HP in the brain to illustrate the processes of HP identification and functional assignment.
Affiliation(s)
- Gert Lubec
- Department of Pediatrics, Division of Basic Sciences, Medical University of Vienna, Waehringer Guertel 18-20, A-1090, Vienna, Austria.
379
Feldman HJ, Dumontier M, Ling S, Haider N, Hogue CWV. CO: A chemical ontology for identification of functional groups and semantic comparison of small molecules. FEBS Lett 2005; 579:4685-91. [PMID: 16098521 DOI: 10.1016/j.febslet.2005.07.039] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Received: 06/13/2005] [Revised: 07/21/2005] [Accepted: 07/21/2005] [Indexed: 11/20/2022]
Abstract
A novel chemical ontology, based on chemical functional groups assigned automatically and objectively by a computer program, was developed to categorize small molecules. It has been applied to PubChem and to the Small Molecule Interaction Database to demonstrate its utility as a basic pharmacophore search system. Molecules can be compared using a semantic similarity score based on functional group assignments rather than 3D shape, and this score succeeds in identifying small molecules known to bind a common binding site. This ontology will serve as a powerful tool for searching chemical databases and identifying key functional groups responsible for biological activities.
Affiliation(s)
- Howard J Feldman
- The Blueprint Initiative of the Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Ave., Toronto, ON, Canada
380
Abstract
Biological networks represent the multiple interactions within a cell, providing a global view intended to help us understand how relationships between molecules dictate cellular behavior. Recent advances in molecular and computational biology have made possible the study of intricate transcriptional regulatory networks that describe gene expression as a function of regulatory inputs specified by interactions between proteins and DNA. Here we review the properties of transcriptional regulatory networks and the rapidly evolving approaches that will enable the elucidation of their structure and dynamic behavior. Several recent studies illustrate how complementary approaches combine chromatin immunoprecipitation (ChIP)-on-chip, gene expression profiling, and computational methods to construct blueprints for the initiation and maintenance of complex cellular processes, including cell cycle progression, growth arrest, and differentiation. These approaches should allow us to elucidate complete transcriptional regulatory codes for yeast as well as mammalian cells.
Affiliation(s)
- Alexandre Blais
- Department of Pathology, New York University Cancer Institute, New York University School of Medicine, New York, New York 10016, USA
381
Abstract
Experiments involving high-throughput methods for measuring transcripts, proteins and metabolites constitute the area of functional genomics. These experiments are highly context dependent and require much more detail about the experimental design, samples and protocols used than genomics experiments do. Functional genomics databases that follow established and emerging standards are needed. Such databases are not yet very common; however, a few focus on microbial genomes, and a couple of integrative systems are available for setting up functional genomics databases.
Affiliation(s)
- Christian J Stoeckert
- Center for Bioinformatics and Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
382
Uetz P, Finley RL. From protein networks to biological systems. FEBS Lett 2005; 579:1821-7. [PMID: 15763558 DOI: 10.1016/j.febslet.2005.02.001] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.3] [Received: 01/15/2005] [Accepted: 01/31/2005] [Indexed: 11/21/2022]
Abstract
A system-level understanding of any biological process requires a map of the relationships among the various molecules involved. Technologies to detect and predict protein interactions have begun to produce very large maps of protein interactions, some including most of an organism's proteins. These maps can be used to study how proteins work together to form molecular machines and regulatory pathways. They also provide a framework for constructing predictive models of how information and energy flow through biological networks. In many respects, protein interaction maps are an entrée into systems biology.
Affiliation(s)
- Peter Uetz
- Research Center Karlsruhe, Institute of Genetics, P.O. Box 3640, D-76021 Karlsruhe, Germany.
383
Drabkin HJ, Hollenbeck C, Hill DP, Blake JA. Ontological visualization of protein-protein interactions. BMC Bioinformatics 2005; 6:29. [PMID: 15707487 PMCID: PMC550656 DOI: 10.1186/1471-2105-6-29] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 12/09/2004] [Accepted: 02/11/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Cellular processes require the interaction of many proteins across several cellular compartments. Determining the collective network of such interactions is an important aspect of understanding the role and regulation of individual proteins. The Gene Ontology (GO) is used by model organism databases and other bioinformatics resources to provide functional annotation of proteins. The annotation process provides a mechanism to document the binding of one protein with another. We have constructed protein interaction networks for mouse proteins utilizing the information encoded in the GO annotations. The work reported here presents a methodology for integrating and visualizing information on protein-protein interactions. RESULTS GO annotation at Mouse Genome Informatics (MGI) captures 1318 curated, documented interactions. These include 129 binary interactions and 125 interactions involving three or more gene products. Three networks involve over 30 partners, the largest involving 109 proteins. Several tools are available at MGI to visualize and analyze these data. CONCLUSIONS Curators at the MGI database annotate protein-protein interaction data from experimental reports in the literature. Integration of these data with the other types of data curated at MGI places protein binding data into the larger context of mouse biology and facilitates the generation of new biological hypotheses based on physical interactions among gene products.
Affiliation(s)
- Harold J Drabkin
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- David P Hill
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- Judith A Blake
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA