301
|
Gong Y, Zhang Z. CellFrame: A Data Structure for Abstraction of Cell Biology Experiments and Construction of Perturbation Networks. Ann N Y Acad Sci 2007; 1115:249-66. [DOI: 10.1196/annals.1407.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
|
302
|
McIntosh T, Chawla S. High confidence rule mining for microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007; 4:611-623. [PMID: 17975272 DOI: 10.1109/tcbb.2007.1050] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 05/25/2023]
Abstract
We present an association rule mining method for mining high confidence rules, which describe interesting gene relationships from microarray datasets. Microarray datasets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical as they are optimised for sparse datasets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense datasets. These algorithms rely on pruning infrequent relationships to reduce the search space by using the support measure. This major shortcoming results in the pruning of many potentially interesting rules with low support but high confidence. We propose a new row-enumeration rule mining method, MaxConf, to mine high confidence rules from microarray data. MaxConf is a support-free algorithm which directly uses the confidence measure to effectively prune the search space. Experiments on three microarray datasets show that MaxConf outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MaxConf are substantially more interesting and meaningful than those found by support-based methods.
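The distinction between support and confidence that motivates this abstract can be illustrated with a small sketch. It is not the MaxConf algorithm itself; the gene names, the toy dataset, and the function names are hypothetical, and it only shows how a rule can have low support yet high confidence.

```python
from typing import FrozenSet, List

# Toy data (hypothetical): each row is one experiment, listing the genes
# that were called "expressed" in it.
experiments: List[FrozenSet[str]] = [
    frozenset({"geneA", "geneB", "geneC"}),
    frozenset({"geneA", "geneB"}),
    frozenset({"geneC", "geneD"}),
    frozenset({"geneC"}),
    frozenset({"geneC", "geneD"}),
    frozenset({"geneD"}),
    frozenset({"geneC"}),
    frozenset({"geneC", "geneD"}),
    frozenset({"geneD"}),
    frozenset({"geneC"}),
]


def support(itemset: FrozenSet[str], rows: List[FrozenSet[str]]) -> float:
    """Fraction of rows that contain every item in `itemset`."""
    return sum(1 for row in rows if itemset <= row) / len(rows)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               rows: List[FrozenSet[str]]) -> float:
    """Among rows containing the antecedent, fraction also containing the consequent."""
    covered = [row for row in rows if antecedent <= row]
    if not covered:
        return 0.0
    return sum(1 for row in covered if consequent <= row) / len(covered)


a, b = frozenset({"geneA"}), frozenset({"geneB"})
c, d = frozenset({"geneC"}), frozenset({"geneD"})

# geneA -> geneB: rare (support 0.2) but always holds (confidence 1.0).
rare_rule = (support(a | b, experiments), confidence(a, b, experiments))
# geneC -> geneD: more frequent (support 0.3) but much less reliable.
common_rule = (support(c | d, experiments), confidence(c, d, experiments))
```

With a minimum support threshold of 0.3, a support-based miner would discard the `geneA -> geneB` rule in this toy data even though it holds in every experiment where `geneA` is expressed.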
|
303
|
Sanchez-Graillet O, Poesio M. Negation of protein-protein interactions: analysis and extraction. Bioinformatics 2007; 23:i424-32. [PMID: 17646327 DOI: 10.1093/bioinformatics/btm184] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION Negative information about protein-protein interactions--from uncertainty about the occurrence of an interaction to knowledge that it did not occur--is often of great use to biologists and could lead to important discoveries. Yet, to our knowledge, no approaches focusing on extracting such information have been proposed in the text mining literature. RESULTS In this work, we present an analysis of the types of negative information that are reported, and a heuristic-based system using a full dependency parser to extract such information. We performed a preliminary evaluation study that shows encouraging results for our system. Finally, we have obtained an initial corpus of negative protein-protein interactions as a basis for the construction of larger ones. AVAILABILITY The corpus is available by request from the authors.
|
304
|
Abstract
UNLABELLED Many tools exist for visually exploring biological networks including well-known examples such as Cytoscape, VisANT, Pathway Studio and Patika. These systems play a key role in the development of integrative biology, systems biology and integrative bioinformatics. The trend in the development of these tools is to go beyond 'static' representations of cellular state, towards a more dynamic model of cellular processes through the incorporation of gene expression data, subcellular localization information and time-dependent behavior. We provide a comprehensive review of the relative advantages and disadvantages of existing systems with two goals in mind: to aid researchers in efficiently identifying the appropriate existing tools for data visualization; to describe the necessary and realistic goals for the next generation of visualization tools. In view of the first goal, we provide in the Supplementary Material a systematic comparison of more than 35 existing tools in terms of over 25 different features. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew Suderman
- McGill Centre for Bioinformatics, 3775 University Street, Montreal, QC H3A 2B4, Canada.
|
305
|
Abstract
PROCOGNATE is a database of protein cognate ligands for the domains in enzyme structures as described by CATH, SCOP and Pfam, and is available as an interactive website or a flat file. This article gives an overview of the database and its generation and presents a new website front end, as well as recent increased coverage in our dataset via inclusion of Pfam domains. We also describe navigation of the website and its features. The current version (1.3) of PROCOGNATE covers 4123, 4536, 5876 structures and 377, 326, 695 superfamilies/families in CATH, SCOP and Pfam, respectively. PROCOGNATE can be accessed at: http://www.ebi.ac.uk/thornton-srv/databases/procognate/
Affiliation(s)
- Matthew Bashton
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
|
306
|
Ramírez F, Schlicker A, Assenov Y, Lengauer T, Albrecht M. Computational analysis of human protein interaction networks. Proteomics 2007; 7:2541-52. [PMID: 17647236 DOI: 10.1002/pmic.200600924] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Indexed: 11/08/2022]
Abstract
Large amounts of human protein interaction data have been produced by experiments and prediction methods. However, the experimental coverage of the human interactome is still low in contrast to predicted data. To gain insight into the value of publicly available human protein network data, we compared predicted datasets, high-throughput results from yeast two-hybrid screens, and literature-curated protein-protein interactions. This evaluation is important not only for further methodological improvements, but also for increasing the confidence in functional hypotheses derived from predictions. Therefore, we assessed the quality and the potential bias of the different datasets using functional similarity based on the Gene Ontology, structural iPfam domain-domain interactions, likelihood ratios, and topological network parameters. This analysis revealed major differences between predicted datasets, but some of them also scored at least as high as the experimental ones regarding multiple quality measures. Since only a small pairwise overlap is observed between most datasets, they may be combined to enlarge the available human interactome data. For this purpose, we additionally studied the influence of protein length on data quality and the number of disease proteins covered by each dataset. We could further demonstrate that protein interactions predicted by more than one method achieve an elevated reliability.
Affiliation(s)
- Fidel Ramírez
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany
|
307
|
Shtatland T, Guettler D, Kossodo M, Pivovarov M, Weissleder R. PepBank--a database of peptides based on sequence text mining and public peptide data sources. BMC Bioinformatics 2007; 8:280. [PMID: 17678535 PMCID: PMC1976427 DOI: 10.1186/1471-2105-8-280] [Citation(s) in RCA: 136] [Impact Index Per Article: 7.6] [Received: 03/16/2007] [Accepted: 08/01/2007] [Indexed: 12/04/2022] Open
Abstract
Background Peptides are important molecules with diverse biological functions and biomedical uses. To date, there does not exist a single, searchable archive for peptide sequences or associated biological data. Rather, peptide sequences still have to be mined from abstracts and full-length articles, and/or obtained from the fragmented public sources. Description We have constructed a new database (PepBank), which at the time of writing contains a total of 19,792 individual peptide entries. The database has a web-based user interface with a simple, Google-like search function, advanced text search, and BLAST and Smith-Waterman search capabilities. The major source of peptide sequence data comes from text mining of MEDLINE abstracts. Another component of the database is the peptide sequence data from public sources (ASPD and UniProt). An additional, smaller part of the database is manually curated from sets of full text articles and text mining results. We show the utility of the database in different examples of affinity ligand discovery. Conclusion We have created and maintain a database of peptide sequences. The database has biological and medical applications, for example, to predict the binding partners of biologically interesting peptides, to develop peptide based therapeutic or diagnostic agents, or to predict molecular targets or binding specificities of peptides resulting from phage display selection. The database is freely available on , and the text mining source code (Peptide::Pubmed) is freely available above as well as on CPAN ().
Affiliation(s)
- Timur Shtatland
- Center for Molecular Imaging Research, Massachusetts General Hospital, Harvard Medical School, Bldg. 149, 13th Street, Room 5406, Charlestown, MA 02129, USA
- Daniel Guettler
- Center for Molecular Imaging Research, Massachusetts General Hospital, Harvard Medical School, Bldg. 149, 13th Street, Room 5406, Charlestown, MA 02129, USA
- Misha Kossodo
- Center for Molecular Imaging Research, Massachusetts General Hospital, Harvard Medical School, Bldg. 149, 13th Street, Room 5406, Charlestown, MA 02129, USA
- Northern Essex Community College, 100 Elliott Street, Haverhill, MA 01830, USA
- Misha Pivovarov
- Center for Molecular Imaging Research, Massachusetts General Hospital, Harvard Medical School, Bldg. 149, 13th Street, Room 5406, Charlestown, MA 02129, USA
- Ralph Weissleder
- Center for Molecular Imaging Research, Massachusetts General Hospital, Harvard Medical School, Bldg. 149, 13th Street, Room 5406, Charlestown, MA 02129, USA
|
308
|
Aragues R, Sali A, Bonet J, Marti-Renom MA, Oliva B. Characterization of protein hubs by inferring interacting motifs from protein interactions. PLoS Comput Biol 2007; 3:1761-71. [PMID: 17941705 PMCID: PMC1976338 DOI: 10.1371/journal.pcbi.0030178] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 04/19/2007] [Accepted: 07/27/2007] [Indexed: 12/19/2022] Open
Abstract
The characterization of protein interactions is essential for understanding biological systems. While genome-scale methods are available for identifying interacting proteins, they do not pinpoint the interacting motifs (e.g., a domain, sequence segments, a binding site, or a set of residues). Here, we develop and apply a method for delineating the interacting motifs of hub proteins (i.e., highly connected proteins). The method relies on the observation that proteins with common interaction partners tend to interact with these partners through a common interacting motif. The sole input for the method is a set of binary protein interactions; neither sequence nor structure information is needed. The approach is evaluated by comparing the inferred interacting motifs with domain families defined for 368 proteins in the Structural Classification of Proteins (SCOP). The positive predictive value of the method for detecting proteins with common SCOP families is 75% at a sensitivity of 10%. Most of the inferred interacting motifs were significantly associated with sequence patterns, which could be responsible for the common interactions. We find that yeast hubs with multiple interacting motifs are more likely to be essential than hubs with one or two interacting motifs, thus rationalizing the previously observed correlation between essentiality and the number of interacting partners of a protein. We also find that yeast hubs with multiple interacting motifs evolve more slowly than the average protein, contrary to the hubs with one or two interacting motifs. The proposed method will help us discover unknown interacting motifs and provide biological insights about protein hubs and their roles in interaction networks.
Recent advances in experimental methods have produced a deluge of protein–protein interaction data. However, these methods do not supply information on which specific protein regions are physically in contact during the interactions. Identifying these regions (interfaces) is fundamental for scientific disciplines that require detailed characterizations of protein interactions. In this work, we present a computational method that identifies groups of proteins with similar interfaces. This is achieved by relying on the observation that proteins with common interaction partners tend to interact through similar interfaces. The proposed method retrieves protein interactions from public data repositories and groups proteins that share a sensible number of interacting partners. Proteins within the same group are then labeled with the same “interacting motif” identifier (iMotif). The evaluation performed using known protein domains and structural binding sites suggests that the method is better suited for proteins with multiple interacting partners (hubs). Using yeast data, we show that the cellular essentiality of a gene better correlates with the number of interacting motifs than with the absolute number of interactions.
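The grouping idea described above (proteins that share interaction partners receive the same label) can be sketched as follows. This is only an illustration under stated assumptions: the threshold `min_shared`, the use of connected components to form groups, and all protein names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def shared_partner_groups(interactions: Iterable[Tuple[str, str]],
                          min_shared: int = 2) -> List[List[str]]:
    """Group proteins that share at least `min_shared` interaction partners.

    Assumption: two proteins are linked when they share enough partners,
    and groups are the connected components of those links.
    """
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)

    # Union-find over proteins to merge linked pairs into groups.
    parent = {p: p for p in partners}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in combinations(sorted(partners), 2):
        if len(partners[p] & partners[q]) >= min_shared:
            parent[find(p)] = find(q)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for p in partners:
        groups[find(p)].add(p)
    # Report only groups with more than one member, in a stable order.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical binary interactions: H1 and H2 both bind X and Y.
toy_interactions = [("H1", "X"), ("H1", "Y"), ("H2", "X"), ("H2", "Y"), ("H3", "Z")]
toy_groups = shared_partner_groups(toy_interactions, min_shared=2)
```

In this toy input, `H1` and `H2` end up in one group because they share the partners `X` and `Y`; by symmetry `X` and `Y` also form a group, while `H3` and `Z` remain ungrouped.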
Affiliation(s)
- Ramon Aragues
- Structural Bioinformatics Lab (GRIB), Universitat Pompeu Fabra-IMIM, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
- Andrej Sali
- Department of Biopharmaceutical Sciences, University of California San Francisco, San Francisco, California, United States of America
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biomedical Research, University of California San Francisco, San Francisco, California, United States of America
- Jaume Bonet
- Structural Bioinformatics Lab (GRIB), Universitat Pompeu Fabra-IMIM, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
- Marc A Marti-Renom
- Structural Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe, Valencia, Spain
- * To whom correspondence should be addressed. E-mail: (MAMR); (BO)
- Baldo Oliva
- Structural Bioinformatics Lab (GRIB), Universitat Pompeu Fabra-IMIM, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
- * To whom correspondence should be addressed. E-mail: (MAMR); (BO)
|
309
|
False positive reduction in protein-protein interaction predictions using gene ontology annotations. BMC Bioinformatics 2007; 8:262. [PMID: 17645798 PMCID: PMC1941744 DOI: 10.1186/1471-2105-8-262] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Received: 03/21/2007] [Accepted: 07/23/2007] [Indexed: 11/27/2022] Open
Abstract
Background Many crucial cellular operations such as metabolism, signalling, and regulation are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for the lack of solid protein-protein interaction information is poor agreement between experimental findings and computational sets that, in turn, stems from the large number of false positive predictions in computational approaches. Reducing false positive predictions and enhancing the true positive fraction of computationally predicted protein-protein interaction datasets on the basis of highly confident experimental results has not been adequately investigated. Results Gene Ontology (GO) annotations were used to reduce false positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organism are 48.32% and 46.49% (averaged over the four datasets) in yeast and worm, respectively. Based on the eight top-ranking keywords and the co-localization of interacting proteins, a set of two knowledge rules was deduced and applied to remove false positive protein pairs. The 'strength', a measure of the improvement provided by the rules, was defined based on the signal-to-noise ratio and implemented to measure the applicability of the knowledge rules to the predicted PPI datasets. Depending on the PPI-predicting method employed, the strength is two to ten times that obtained by randomly removing protein pairs from the datasets.
Conclusion Gene Ontology annotations along with the deduced knowledge rules could be implemented to partially remove false predicted PPI pairs. Removal of false positives from predicted datasets increases the true positive fractions of the datasets and improves the robustness of predicted pairs as compared to random protein pairing, and eventually results in better overlap with experimental results.
|
310
|
Kiemer L, Costa S, Ueffing M, Cesareni G. WI-PHI: a weighted yeast interactome enriched for direct physical interactions. Proteomics 2007; 7:932-43. [PMID: 17285561 DOI: 10.1002/pmic.200600448] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Indexed: 11/09/2022]
Abstract
How is the yeast proteome wired? This important question, central in yeast systems biology, remains unanswered in spite of the abundance of protein interaction data from high-throughput experiments. Unfortunately, these large-scale studies show striking discrepancies in their results and coverage such that biologists scrutinizing the "interactome" are often confounded by a mix of established physical interactions, functional associations, and experimental artifacts. This stimulated early attempts to integrate the available information and produce a list of protein interactions ranked according to an estimated functional reliability. The recent publication of the results of two large protein interaction experiments and the completion of a comprehensive literature curation effort has more than doubled the available information on the wiring of the yeast proteome. This motivates a fresh approach to the compilation of a yeast interactome based purely on evidence of physical interaction. We present a procedure exploiting both heuristic and probabilistic strategies to draft the yeast interactome taking advantage of various heterogeneous data sources: application of tandem affinity purification coupled to MS (TAP-MS), large-scale yeast two-hybrid studies, and results of small-scale experiments stored in dedicated databases. The end result is WI-PHI, a weighted network encompassing a large majority of yeast proteins.
Affiliation(s)
- Lars Kiemer
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, Rome, Italy
|
311
|
Probabilistic prediction and ranking of human protein-protein interactions. BMC Bioinformatics 2007; 8:239. [PMID: 17615067 PMCID: PMC1939716 DOI: 10.1186/1471-2105-8-239] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.2] [Received: 05/10/2007] [Accepted: 07/05/2007] [Indexed: 11/24/2022] Open
Abstract
Background Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger proteome in human. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%. Results The prediction of human protein-protein interactions was investigated by combining orthogonal protein features within a probabilistic framework. The features include co-expression, orthology to known interacting proteins and the full-Bayesian combination of subcellular localization, co-occurrence of domains and post-translational modifications. A novel scoring function for local network topology was also investigated. This topology feature greatly enhanced the predictions and together with the full-Bayes combined features, made the largest contribution to the predictions. Using a conservative threshold, our most accurate predictor identifies 37606 human interactions, 32892 (80%) of which are not present in other publicly available large human interaction datasets, thus substantially increasing the coverage of the human interaction map. A subset of the 32892 novel predicted interactions have been independently validated. Comparison of the prediction dataset to other available human interaction datasets estimates the false positive rate of the new method to be below 80% which is competitive with other methods. Since the new method scores and ranks all human protein pairs, smaller subsets of higher quality can be generated thus leading to even lower false positive prediction rates. Conclusion The set of interactions predicted in this work increases the coverage of the human interaction map and will help determine the highest confidence human interactions.
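As a rough illustration of what combining features within a probabilistic framework can look like, the sketch below multiplies per-feature likelihood ratios into posterior odds. It assumes the features are conditionally independent (a naive-Bayes simplification, whereas the abstract describes a full-Bayes combination for some features), and the prior odds and likelihood ratios are invented numbers, not values from the paper.

```python
import math
from typing import Sequence


def posterior_probability(prior_odds: float,
                          likelihood_ratios: Sequence[float]) -> float:
    """Combine independent evidence into a posterior interaction probability.

    posterior odds = prior odds * product of per-feature likelihood ratios,
    computed in log space to avoid underflow when many features are combined.
    """
    if prior_odds <= 0 or any(lr <= 0 for lr in likelihood_ratios):
        raise ValueError("odds and likelihood ratios must be positive")
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical protein pair: prior odds of 1:1000 that a random pair interacts,
# with three features (e.g. co-expression, orthology, topology) supporting it.
example_probability = posterior_probability(
    prior_odds=1 / 1000,
    likelihood_ratios=[20.0, 10.0, 5.0],
)
```

Scoring every protein pair this way yields a ranking, so a stricter cutoff on the posterior probability selects a smaller, higher-confidence subset, which matches the ranking idea mentioned in the abstract.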
|
312
|
Myers CL, Troyanskaya OG. Context-sensitive data integration and prediction of biological networks. Bioinformatics 2007; 23:2322-30. [PMID: 17599939 DOI: 10.1093/bioinformatics/btm332] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.2] [Indexed: 11/12/2022]
Abstract
MOTIVATION Several recent methods have addressed the problem of heterogeneous data integration and network prediction by modeling the noise inherent in high-throughput genomic datasets, which can dramatically improve specificity and sensitivity and allow the robust integration of datasets with heterogeneous properties. However, experimental technologies capture different biological processes with varying degrees of success, and thus, each source of genomic data can vary in relevance depending on the biological process one is interested in predicting. Accounting for this variation can significantly improve network prediction, but to our knowledge, no previous approaches have explicitly leveraged this critical information about biological context. RESULTS We confirm the presence of context-dependent variation in functional genomic data and propose a Bayesian approach for context-sensitive integration and query-based recovery of biological process-specific networks. By applying this method to Saccharomyces cerevisiae, we demonstrate that leveraging contextual information can significantly improve the precision of network predictions, including assignment for uncharacterized genes. We expect that this general context-sensitive approach can be applied to other organisms and prediction scenarios. AVAILABILITY A software implementation of our approach is available on request from the authors. SUPPLEMENTARY INFORMATION Supplementary data are available at http://avis.princeton.edu/contextPIXIE/
Affiliation(s)
- Chad L Myers
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, USA
|
313
|
Abstract
Network analysis of living systems is an essential component of contemporary systems biology. It aims to assemble the mutual dependences between interacting system elements into an integrated view of whole-system functioning. In the following chapter we describe the existing classification of what is referred to as biological networks and show how complex interdependencies in biological systems can be represented in the simpler form of network graphs. Further structural analysis of the assembled biological network yields knowledge about the functioning of the entire biological system. Aspects of network structure such as the connectivity of network elements and the connectivity degree distribution, node centralities, the clustering coefficient, the network diameter and the average path length are touched upon. Networks are analyzed as static entities, or the dynamical behavior of the underlying biological systems may be considered. A description of mathematical and computational approaches for determining the dynamics of regulatory networks is provided. Causality, another characteristic feature of a dynamically functioning biosystem, can also be accessed in the reconstruction of biological networks; we give examples of how this integration is accomplished. Further questions about network dynamics and evolution can be approached by means of network comparison. Network analysis gives rise to new global hypotheses on systems functionality and to reductionist findings of novel molecular interactions, based on the reliability of network reconstructions, which have to be tested in subsequent experiments. We provide a collection of useful links to be used for the analysis of biological networks.
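The structural measures listed in this abstract (degree, clustering coefficient, average path length, diameter) can be computed on a small example as a sketch. The four-node graph and the function names are hypothetical, and the path statistics assume an undirected, connected graph.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical undirected network: a triangle A-B-C with D attached to C.
graph: Dict[str, Set[str]] = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def degree(g: Dict[str, Set[str]], node: str) -> int:
    """Number of neighbours of `node`."""
    return len(g[node])


def clustering_coefficient(g: Dict[str, Set[str]], node: str) -> float:
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    neighbours = g[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in g[u])
    return 2.0 * links / (k * (k - 1))


def shortest_path_lengths(g: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Breadth-first search distances from `source` (unweighted edges)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(g: Dict[str, Set[str]]) -> Tuple[float, int]:
    """Average shortest path length and diameter; assumes a connected graph."""
    lengths = [d for s in g
               for t, d in shortest_path_lengths(g, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)


average_path_length, diameter = path_statistics(graph)
```

In this toy graph, node `C` is the best-connected element (degree 3), yet its clustering coefficient is lower than that of `A` because its neighbour `D` is not linked to the others.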
Affiliation(s)
- Victoria J Nikiforova
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
|
314
|
Shoemaker BA, Panchenko AR. Deciphering protein-protein interactions. Part I. Experimental techniques and databases. PLoS Comput Biol 2007; 3:e42. [PMID: 17397251 PMCID: PMC1847991 DOI: 10.1371/journal.pcbi.0030042] [Citation(s) in RCA: 245] [Impact Index Per Article: 13.6] [Indexed: 12/20/2022] Open
|
315
|
Kann MG, Jothi R, Cherukuri PF, Przytycka TM. Predicting protein domain interactions from coevolution of conserved regions. Proteins 2007; 67:811-20. [PMID: 17357158 DOI: 10.1002/prot.21347] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]
Abstract
Knowledge of protein and domain interactions provides crucial insights into their function within a cell. Several computational methods have been proposed to detect interactions between proteins and their constitutive domains. In this work, we focus on approaches based on correlated evolution (coevolution) of sequences of interacting proteins. In this type of approach, often referred to as the mirrortree method, a high correlation of evolutionary histories of two proteins is used as an indicator to predict protein interactions. Recently, it has been observed that subtracting the underlying speciation process by separating coevolution due to common speciation divergence from that due to common function of interacting pairs greatly improves the predictive power of the mirrortree approach. In this article, we investigate possible improvements and limitations of this method. In particular, we demonstrate that the performance of the mirrortree method can be further improved by restricting the coevolution analysis to the relatively conserved regions in the protein domain sequences (disregarding highly divergent regions). We provide a theoretical validation of our results leading to new insights into the interplay between coevolution and speciation of interacting proteins.
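The core of the mirrortree idea, a correlation between the evolutionary histories of two proteins, is commonly computed as the Pearson correlation between the pairwise distance matrices of their orthologs across the same species. The sketch below shows only that basic step; it does not include the speciation correction or the restriction to conserved regions discussed in the abstract, and the distance matrices are invented.

```python
import math
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mirrortree_score(dist_a: Matrix, dist_b: Matrix) -> float:
    """Correlate two distance matrices built over the same ordered species set."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical ortholog distances for two protein families over four species.
family_a = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 2, 0, 1], [3, 3, 1, 0]]
# Family B evolves twice as fast but with the same tree shape.
family_b = [[2 * d for d in row] for row in family_a]
# Family C has an unrelated distance pattern.
family_c = [[0, 3, 1, 2], [3, 0, 3, 1], [1, 3, 0, 2], [2, 1, 2, 0]]

score_ab = mirrortree_score(family_a, family_b)
score_ac = mirrortree_score(family_a, family_c)
```

A score near 1 (as for families A and B here) would be read as evidence of coevolution, although, as the abstract notes, part of any such correlation reflects shared speciation history rather than interaction.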
Affiliation(s)
- Maricel G Kann
- Department of Health and Human Services, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
316
|
Gardiner J, Barton D, Marc J, Overall R. Potential Role of Tubulin Acetylation and Microtubule-Based Protein Trafficking in Familial Dysautonomia. Traffic 2007; 8:1145-9. [PMID: 17605759 DOI: 10.1111/j.1600-0854.2007.00605.x] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 11/26/2022]
Abstract
Familial dysautonomia (FD), a disease of the autonomic and sensory nervous systems, involves mutations in the protein IkappaB kinase complex-associated protein, which is a component of the human Elongator acetylase complex. We suggest a hypothesis in which defects in tubulin acetylation and impairment of microtubule-based protein trafficking may be an underlying cause of FD. In addition, an Arabidopsis homolog of the Elongator subunit ELP3 has been found to bind to the alphabeta-tubulin heterodimer, suggesting that alpha-tubulin may be a cytoplasmic target of Elongator acetylase activity. Studies of synergistic double mutants in yeast indicate a novel role for Elongator in cytoskeletal dynamics, although this is probably because of an effect on actin rather than microtubules. Finally, we suggest that tubulin deacetylase inhibitors may prove useful in the treatment of FD.
Affiliation(s)
- John Gardiner
- School of Biological Sciences, Macleay Building (A12), Science Road, The University of Sydney, Camperdown 2006, Australia.
|
317
|
Relating destabilizing regions to known functional sites in proteins. BMC Bioinformatics 2007; 8:141. [PMID: 17470296 PMCID: PMC1890302 DOI: 10.1186/1471-2105-8-141] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 11/02/2006] [Accepted: 04/30/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Most methods for predicting functional sites in protein 3D structures rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well annotated set of functional sites to use as a benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, a novel benchmark consisting of a diverse set of hand-curated protein functional sites is derived. RESULTS A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force-field successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high quality apo-structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated dataset of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for most nucleic acid binding sites. These differences are rationalised in terms of the geometry and energetics of the binding site. CONCLUSION We find that although destabilizing regions as detected here cannot in general be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.
|
318
|
Hollunder J, Beyer A, Wilhelm T. Protein subcomplexes--molecular machines with highly specialized functions. IEEE Trans Nanobioscience 2007; 6:86-93. [PMID: 17393854 DOI: 10.1109/tnb.2007.891884] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Complex cellular processes are accomplished by the concerted action of hierarchically organized functional modules. Protein complexes are major components which act as highly specialized molecular machines. Here we present a statistical procedure to find insightful substructures in protein complexes based on large-scale protein complex data: we identify statistically significant common protein subcomplexes (SCs) contained in different protein complexes. We analyze recently published data of the two model organisms Saccharomyces cerevisiae (four different data sets) and Escherichia coli, as well as human protein complex data. Our method identifies well-characterized protein assemblies with known functions which act as functional entities in their own right in the cell. In addition, we also identified hitherto unknown functional entities that should be studied experimentally in the future. We discuss two typical properties of protein subcomplexes: 1) subcomplexes are enriched with essential proteins (which implies that the whole SCs may be strongly conserved) and 2) SCs are functionally and spatially more homogeneous than the experimentally found protein assemblies. The latter property is exploited to propose functions for previously uncharacterized proteins of S. cerevisiae.
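The idea of a protein set shared by several complexes can be illustrated with a minimal sketch. It only enumerates pairwise intersections of complexes as candidate subcomplexes and counts how often each occurs; the statistical significance test from the paper is not reproduced here, and the function name, the `min_size` threshold, and the toy protein identifiers are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_subcomplexes(complexes, min_size=2):
    """Count protein sets shared by pairs of complexes.

    Each pairwise intersection with at least `min_size` proteins is treated
    as a candidate subcomplex. No significance testing is done here.
    """
    counts = Counter()
    for a, b in combinations(complexes, 2):
        shared = frozenset(a) & frozenset(b)
        if len(shared) >= min_size:
            counts[shared] += 1
    return counts


# Hypothetical toy data: four complexes over made-up protein identifiers.
complexes = [
    {"P1", "P2", "P3", "P4"},
    {"P1", "P2", "P3", "P5"},
    {"P1", "P2", "P6"},
    {"P7", "P8"},
]
candidates = candidate_subcomplexes(complexes)
for proteins, n_pairs in candidates.most_common():
    print(sorted(proteins), n_pairs)
```

A real analysis would compare these counts against a null model of randomized complexes before calling any candidate significant.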
Affiliation(s)
- Jens Hollunder
- Leibniz Institute for Age Research-Fritz Lipmann Institute, Theoretical Systems Biology, D-07745 Jena, Germany.
|
319
|
Herbert A, Lenburg ME, Ulrich D, Gerry NP, Schlauch K, Christman MF. Open-access database of candidate associations from a genome-wide SNP scan of the Framingham Heart Study. Nat Genet 2007; 39:135-6. [PMID: 17262019 DOI: 10.1038/ng0207-135] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
320
|
Abstract
How can protein-interaction networks be made more complete? We estimate the full yeast protein-protein interaction network to contain 37,800-75,500 interactions and the human network 154,000-369,000, but owing to a high false-positive rate, current maps are roughly only 50% and 10% complete, respectively. Paradoxically, releasing raw, unfiltered assay data might help separate true from false interactions.
Affiliation(s)
- G Traver Hart
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712, USA
- Arun K Ramani
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712, USA
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712, USA
|
321
|
Kihara D, Yang YD, Hawkins T. Bioinformatics resources for cancer research with an emphasis on gene function and structure prediction tools. Cancer Inform 2007; 2:25-35. [PMID: 19458756 PMCID: PMC2675499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open
Abstract
The immensely popular fields of cancer research and bioinformatics overlap in many different areas, e.g. large data repositories that allow users to analyze data from many experiments (data handling, databases), pattern mining, microarray data analysis, and interpretation of proteomics data. There are many newly available resources in these areas that may be unfamiliar to most cancer researchers wanting to incorporate bioinformatics tools and analyses into their work, and also to bioinformaticians looking for real data to develop and test algorithms. This review reveals the interdependence of cancer research and bioinformatics, and highlights the most appropriate and useful resources available to cancer researchers. These include not only public databases, but also general and specific bioinformatics tools which can be useful to the cancer researcher. The primary foci are function and structure prediction tools of protein genes. The result is a useful reference for cancer researchers and for bioinformaticians studying cancer alike.
Affiliation(s)
- Daisuke Kihara
- Department of Biological Sciences; Department of Computer Science; Markey Center for Structural Biology; The Bindley Bioscience Center, College of Science, Purdue University, West Lafayette, IN 47907, USA. Correspondence: Daisuke Kihara
|
322
|
Lee BTK, Song CM, Yeo BH, Chung CW, Chan YL, Lim TT, Chua YB, Loh MCS, Ang BK, Vijayakumar P, Liew L, Lim J, Lim YP, Wong CH, Chuon D, Rajagopal G, Hill J. Gastric Cancer (Biomarkers) Knowledgebase (GCBKB): A Curated and Fully Integrated Knowledgebase of Putative Biomarkers Related to Gastric Cancer. Biomark Insights 2007; 1:135-41. [PMID: 19690644 PMCID: PMC2716787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
The Gastric Cancer (Biomarkers) Knowledgebase (GCBKB) (http://biomarkers.bii.a-star.edu.sg/background/gastricCancerBiomarkersKb.php) is a curated and fully integrated knowledgebase that provides data relating to putative biomarkers that may be used in the diagnosis and prognosis of gastric cancer. It is freely available to all users. The data contained in the knowledgebase was derived from a large literature source and the putative biomarkers therein have been annotated with data from the public domain. The knowledgebase is maintained by a curation team who update the data from a defined source. As well as mining data from the literature, the knowledgebase will also be populated with unpublished experimental data from investigators working in the gastric cancer biomarker discovery field. Users can perform searches to identify potential markers defined by experiment type, tissue type and disease state. Search results may be saved, manipulated and retrieved at a later date. As far as the authors are aware this is the first open access database dedicated to the discovery and investigation of gastric cancer biomarkers.
|
323
|
Gaulton KJ, Mohlke KL, Vision TJ. A computational system to select candidate genes for complex human traits. Bioinformatics 2007; 23:1132-40. [PMID: 17237041 DOI: 10.1093/bioinformatics/btm001] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION Identification of the genetic variation underlying complex traits is challenging. The wealth of information publicly available about the biology of complex traits and the function of individual genes permits the development of informatics-assisted methods for the selection of candidate genes for these traits. RESULTS We have developed a computational system named CAESAR that ranks all annotated human genes as candidates for a complex trait by using ontologies to semantically map natural language descriptions of the trait with a variety of gene-centric information sources. In a test of its effectiveness, CAESAR successfully selected 7 out of 18 (39%) complex human trait susceptibility genes within the top 2% of ranked candidates genome-wide, a subset that represents roughly 1% of genes in the human genome and provides sufficient enrichment for an association study of several hundred human genes. This approach can be applied to any well-documented mono- or multi-factorial trait in any organism for which an annotated gene set exists. AVAILABILITY CAESAR scripts and test data can be downloaded from http://visionlab.bio.unc.edu/caesar/
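The reported figures (7 of 18 genes recovered in the top 2% of the ranking) can be turned into a rough enrichment factor. The sketch below assumes that a random ranking would place about 2% of the true genes in the top 2%; the function name is hypothetical and nothing here comes from the CAESAR scripts themselves.

```python
def fold_enrichment(hits, total_targets, top_fraction):
    """Return (recall, fold enrichment over a random ranking).

    Under a random ranking the expected recall in the top `top_fraction`
    of the list is `top_fraction` itself (an assumption of this sketch).
    """
    recall = hits / total_targets
    return recall, recall / top_fraction


# Figures taken from the abstract: 7 of 18 genes in the top 2%.
recall, fold = fold_enrichment(hits=7, total_targets=18, top_fraction=0.02)
print(f"recall = {recall:.1%}, enrichment over random = {fold:.1f}x")
```

Under that assumption the ranking concentrates known susceptibility genes a little under twentyfold relative to chance.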
Affiliation(s)
- Kyle J Gaulton
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA.
|
324
|
Gerke M, Bornberg-Bauer E, Jiang X, Fuellen G. Finding common protein interaction patterns across organisms. Evol Bioinform Online 2007; 2:45-52. [PMID: 19455201 PMCID: PMC2674656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Open
Abstract
Protein interactions are an important resource to obtain an understanding of cell function. Recently, researchers have compared networks of interactions in order to understand network evolution. While current methods first infer homologs and then compare topologies, we here present a method which first searches for interesting topologies and then looks for homologs. PINA (protein interaction network analysis) takes the protein interaction networks of two organisms, scans both networks for subnetworks deemed interesting, and then tries to find orthologs among the interesting subnetworks. The application is very fast because orthology investigations are restricted to subnetworks like hubs and clusters that fulfill certain criteria regarding neighborhood and connectivity. Finally, the hubs or clusters found to be related can be visualized and analyzed according to protein annotation.
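The "topology first, homology second" order described above can be sketched as follows. This is not the PINA implementation: the hub criterion (a simple degree threshold), the ortholog lookup table, and all protein names are assumptions made for illustration.

```python
from collections import defaultdict


def degrees(edges):
    """Map each protein to its number of distinct interaction partners."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {protein: len(partners) for protein, partners in neighbours.items()}


def hubs(edges, min_degree):
    """Proteins with at least `min_degree` partners (assumed hub criterion)."""
    return {p for p, d in degrees(edges).items() if d >= min_degree}


def related_hubs(edges_a, edges_b, orthologs, min_degree=3):
    """Step 1: find hubs in each network. Step 2: test orthology among hubs only."""
    hubs_a = hubs(edges_a, min_degree)
    hubs_b = hubs(edges_b, min_degree)
    return sorted(
        (a, b) for a in hubs_a for b in orthologs.get(a, ()) if b in hubs_b
    )


# Hypothetical toy networks and ortholog table.
net_a = [("A1", "A2"), ("A1", "A3"), ("A1", "A4")]
net_b = [("B1", "B2"), ("B1", "B3"), ("B1", "B4"), ("B5", "B6")]
orthologs = {"A1": {"B1"}, "A2": {"B5"}}
pairs = related_hubs(net_a, net_b, orthologs)
print(pairs)
```

Because orthology is only looked up for proteins that already pass the topology filter, the number of comparisons scales with the number of hubs rather than with the size of both proteomes, which is the speed argument the abstract makes.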
Affiliation(s)
- Mirco Gerke
- Division of Bioinformatics, Biology Department, Schlossplatz 4, D-48149 Münster, Germany; Institut für Informatik, Fachbereich Mathematik und Informatik, Einsteinstr. 62, D-48149 Münster, Germany
- Erich Bornberg-Bauer
- Division of Bioinformatics, Biology Department, Schlossplatz 4, D-48149 Münster, Germany
- Xiaoyi Jiang
- Institut für Informatik, Fachbereich Mathematik und Informatik, Einsteinstr. 62, D-48149 Münster, Germany
- Georg Fuellen
- Division of Bioinformatics, Biology Department, Schlossplatz 4, D-48149 Münster, Germany; Department of Medicine, AG Bioinformatics, Domagkstr. 3, D-48149 Münster, Germany. Correspondence: Georg Fuellen, Tel: +49 251 83 21637, Fax: +49 251 83 21631
|
325
|
Martin S, Brown WM, Faulon JL. Using product kernels to predict protein interactions. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2007; 110:215-45. [PMID: 17922100 DOI: 10.1007/10_2007_084] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
There is a wide variety of experimental methods for the identification of protein interactions. This variety has in turn spurred the development of numerous different computational approaches for modeling and predicting protein interactions. These methods range from detailed structure-based methods capable of operating on only a single pair of proteins at a time to approximate statistical methods capable of making predictions on multiple proteomes simultaneously. In this chapter, we provide a brief discussion of the relative merits of different experimental and computational methods available for identifying protein interactions. Then we focus on the application of our particular (computational) method using Support Vector Machine product kernels. We describe our method in detail and discuss the application of the method for predicting protein-protein interactions, beta-strand interactions, and protein-chemical interactions.
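The abstract does not spell out the kernel, so the following is only a sketch of one common way to build a kernel on protein pairs from a kernel on single proteins: a symmetrized product, K((a,b),(c,d)) = k(a,c)k(b,d) + k(a,d)k(b,c). The RBF base kernel, the `gamma` value, and the toy feature vectors are assumptions, and the chapter's own product kernel may differ.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel on two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def pair_product_kernel(pair1, pair2, base=rbf):
    """Symmetrized product kernel on unordered pairs of feature vectors.

    Both matchings of the two pairs are summed so that the value does not
    depend on the order of the proteins within a pair.
    """
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Hypothetical two-dimensional descriptors for four proteins.
p, q = (0.0, 1.0), (1.0, 0.0)
r, s = (0.1, 0.9), (0.8, 0.2)
value = pair_product_kernel((p, q), (r, s))
swapped = pair_product_kernel((q, p), (r, s))
print(value, swapped)
```

A Gram matrix built from such a function could be passed to any SVM implementation that accepts precomputed kernels, with interacting pairs as positive examples and non-interacting pairs as negatives.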
Affiliation(s)
- Shawn Martin
- Computational Biology, Sandia National Laboratories, PO Box 5800, Albuquerque, NM 87185-1316, USA.
|
326
|
Beltrao P, Serrano L. Specificity and evolvability in eukaryotic protein interaction networks. PLoS Comput Biol 2006; 3:e25. [PMID: 17305419 PMCID: PMC1797819 DOI: 10.1371/journal.pcbi.0030025] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2006] [Accepted: 12/27/2006] [Indexed: 12/31/2022] Open
Abstract
Progress in uncovering the protein interaction networks of several species has led to questions of what underlying principles might govern their organization. Few studies have tried to determine the impact of protein interaction network evolution on the observed physiological differences between species. Using comparative genomics and structural information, we show here that eukaryotic species have rewired their interactomes at a fast rate of approximately 10^-5 interactions changed per protein pair, per million years of divergence. For Homo sapiens this corresponds to 10^3 interactions changed per million years. Additionally we find that the specificity of binding strongly determines the interaction turnover and that different biological processes show significantly different link dynamics. In particular, human proteins involved in immune response, transport, and establishment of localization show signs of positive selection for change of interactions. Our analysis suggests that a small degree of molecular divergence can give rise to important changes at the network level. We propose that the power law distribution observed in protein interaction networks could be partly explained by the cell's requirement for different degrees of protein binding specificity.
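The two rates quoted above (about 1e-5 changes per protein pair per million years, and about 1e3 changes per million years for human) can be related by a back-of-envelope calculation. The sketch assumes the human figure is simply the per-pair rate multiplied by the number of unordered protein pairs; the implied pair and protein counts are inferences from that assumption, not values stated in the abstract.

```python
import math

rate_per_pair = 1e-5   # interactions changed per protein pair per million years
human_changes = 1e3    # interactions changed per million years in human

# Number of unordered protein pairs implied by the two figures.
implied_pairs = human_changes / rate_per_pair

# Solve N * (N - 1) / 2 = implied_pairs for the protein count N.
implied_proteins = (1 + math.sqrt(1 + 8 * implied_pairs)) / 2

print(f"implied pairs    ~ {implied_pairs:.2e}")
print(f"implied proteins ~ {implied_proteins:.0f}")
```

The result is on the order of ten thousand proteins, which is at least the right order of magnitude for a human proteome.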
Affiliation(s)
- Pedro Beltrao
- European Molecular Biology Laboratory, Structures and Computational Biology Program, Heidelberg, Germany.
|
327
|
Wu X, Zhu L, Guo J, Fu C, Zhou H, Dong D, Li Z, Zhang DY, Lin K. SPIDer: Saccharomyces protein-protein interaction database. BMC Bioinformatics 2006; 7 Suppl 5:S16. [PMID: 17254300 PMCID: PMC1764472 DOI: 10.1186/1471-2105-7-s5-s16] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Background Since proteins perform their functions by interacting with one another and with other biomolecules, reconstructing a map of the protein-protein interactions of a cell, experimentally or computationally, is an important first step toward understanding cellular function and machinery of a proteome. Solely derived from the Gene Ontology (GO), we have defined an effective method of reconstructing a yeast protein interaction network by measuring relative specificity similarity (RSS) between two GO terms. Description Based on the RSS method, here, we introduce a predicted Saccharomyces protein-protein interaction database called SPIDer. It houses a gold standard positive dataset (GSP) with high confidence level that covered 79.2% of the high-quality interaction dataset. Our predicted protein-protein interaction network reconstructed from the GSPs consists of 92 257 interactions among 3600 proteins, and forms 23 connected components. It also provides general links to connect predicted protein-protein interactions with three other databases, DIP, BIND and MIPS. An Internet-based interface provides users with fast and convenient access to protein-protein interactions based on various search features (searching by protein information, GO term information or sequence similarity). In addition, the RSS value of two GO terms in the same ontology, and the inter-member interactions in a list of proteins of interest or in a protein complex could be retrieved. Furthermore, the database presents a user-friendly graphical interface which is created dynamically for visualizing an interaction sub-network. The database is accessible at . Conclusion SPIDer is a public database server for protein-protein interactions based on the yeast genome. It provides a variety of search options and graphical visualization of an interaction network. In particular, it will be very useful for the study of inter-member interactions among a list of proteins, especially the protein complex. 
In addition, based on the predicted interaction dataset, researchers could analyze the whole interaction network and associate the network topology with gene/protein properties based on a global or local topology view.
Affiliation(s)
- Xiaomei Wu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Lei Zhu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Jie Guo
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Cong Fu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Hongjun Zhou
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Dong Dong
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Zhenbo Li
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Da-Yong Zhang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China
|
328
|
Mathivanan S, Periaswamy B, Gandhi TKB, Kandasamy K, Suresh S, Mohmood R, Ramachandra YL, Pandey A. An evaluation of human protein-protein interaction data in the public domain. BMC Bioinformatics 2006; 7 Suppl 5:S19. [PMID: 17254303 PMCID: PMC1764475 DOI: 10.1186/1471-2105-7-s5-s19] [Citation(s) in RCA: 155] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
Background Protein-protein interaction (PPI) databases have become a major resource for investigating biological networks and pathways in cells. A number of public repositories for human PPIs are currently available. Each of these databases has its own unique features, with a large variation in the type and depth of annotations. Results We analyzed the major publicly available primary databases that contain literature curated PPI information for human proteins. These included the BIND, DIP, HPRD, IntAct, MINT, MIPS, PDZBase and Reactome databases. The number of binary non-redundant human PPIs ranged from 101 in PDZBase and 346 in MIPS to 11,367 in MINT and 36,617 in HPRD. The number of genes annotated with at least one interactor was 9,427 in HPRD, 4,975 in MINT, 4,614 in IntAct, 3,887 in BIND and <1,000 in the remaining databases. The number of literature citations for the PPIs included in the databases was 43,634 in HPRD, 11,480 in MINT, 10,331 in IntAct, 8,020 in BIND and <2,100 in the remaining databases. Conclusion Given the importance of PPIs, we suggest that submission of PPIs to repositories be made mandatory by scientific journals at the time of manuscript submission, as this will minimize annotation errors, promote standardization and help keep the information up to date. We hope that our analysis will help guide biomedical scientists in selecting the most appropriate database for their needs, especially in light of the dramatic differences in their content.
Affiliation(s)
- Suresh Mathivanan
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- McKusick-Nathans Institute of Genetic Medicine and the Departments of Biological Chemistry, Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Biotechnology, Kuvempu University, Shankaraghatta, Karnataka, India
- Balamurugan Periaswamy
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- McKusick-Nathans Institute of Genetic Medicine and the Departments of Biological Chemistry, Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Biotechnology, Kuvempu University, Shankaraghatta, Karnataka, India
- TKB Gandhi
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- McKusick-Nathans Institute of Genetic Medicine and the Departments of Biological Chemistry, Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21205, USA
- Kumaran Kandasamy
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Department of Biotechnology, Kuvempu University, Shankaraghatta, Karnataka, India
- Shubha Suresh
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- McKusick-Nathans Institute of Genetic Medicine and the Departments of Biological Chemistry, Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21205, USA
- Riaz Mohmood
- Department of Biotechnology, Kuvempu University, Shankaraghatta, Karnataka, India
- YL Ramachandra
- Department of Biotechnology, Kuvempu University, Shankaraghatta, Karnataka, India
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine and the Departments of Biological Chemistry, Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21205, USA
|
329
|
Hulbert EM, Smink LJ, Adlem EC, Allen JE, Burdick DB, Burren OS, Cassen VM, Cavnor CC, Dolman GE, Flamez D, Friery KF, Healy BC, Killcoyne SA, Kutlu B, Schuilenburg H, Walker NM, Mychaleckyj J, Eizirik DL, Wicker LS, Todd JA, Goodman N. T1DBase: integration and presentation of complex data for type 1 diabetes research. Nucleic Acids Res 2006; 35:D742-6. [PMID: 17169983 PMCID: PMC1781218 DOI: 10.1093/nar/gkl933] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
T1DBase [Smink et al. (2005) Nucleic Acids Res., 33, D544–D549; Burren et al. (2004) Hum. Genomics, 1, 98–109] is a public website and database that supports the type 1 diabetes (T1D) research community. T1DBase provides a consolidated T1D-oriented view of the complex data world that now confronts medical researchers and enables scientists to navigate from information they know to information that is new to them. Overview pages for genes and markers summarize information for these elements. The Gene Dossier summarizes information for a list of genes. GBrowse [Stein et al. (2002) Genome Res., 10, 1599–1610] displays genes and other features in their genomic context, and Cytoscape [Shannon et al. (2003) Genome Res., 13, 2498–2504] shows genes in the context of interacting proteins and genes. The Beta Cell Gene Atlas shows gene expression in β cells, islets, and related cell types and lines, and the Tissue Expression Viewer shows expression across other tissues. The Microarray Viewer shows expression from more than 20 array experiments. The Beta Cell Gene Expression Bank contains manually curated gene and pathway annotations for genes expressed in β cells. T1DMart is a query tool for markers and genotypes. PosterPages are ‘home pages’ about specific topics or datasets. The key challenge, now and in the future, is to provide powerful informatics capabilities to T1D scientists in a form they can use to enhance their research.
|
330
|
Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C, Montecchi-Palazzi L, Orchard S, Risse J, Robbe K, Roechert B, Thorneycroft D, Zhang Y, Apweiler R, Hermjakob H. IntAct--open source resource for molecular interaction data. Nucleic Acids Res 2006; 35:D561-5. [PMID: 17145710 PMCID: PMC1751531 DOI: 10.1093/nar/gkl958] [Citation(s) in RCA: 557] [Impact Index Per Article: 29.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
IntAct is an open source database and software suite for modeling, storing and analyzing molecular interaction data. The data available in the database originates entirely from published literature and is manually annotated by expert biologists to a high level of detail, including experimental methods, conditions and interacting domains. The database features over 126 000 binary interactions extracted from over 2100 scientific publications and makes extensive use of controlled vocabularies. The web site provides tools allowing users to search, visualize and download data from the repository. IntAct supports and encourages local installations as well as direct data submission and curation collaborations. IntAct source code and data are freely available from .
Affiliation(s)
- S Kerrien
- EMBL Outstation-European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
331
|
Abstract
Snap (Single Nucleotide Polymorphism Annotation Platform) is a server designed to comprehensively analyze single genes and relationships between genes based on SNPs in the human genome. The aim of the platform is to facilitate the study of SNP finding and analysis within the framework of medical research. Using a user-friendly web interface, genes can be searched by name, description, position, SNP ID or clone name. Several public databases are integrated, including gene information from Ensembl, protein features from Uniprot/SWISS-PROT, Pfam and DAS-CBS. Gene relationships are fetched from BIND, MINT, KEGG and are integrated with ortholog data from TreeFam to extend the current interaction networks. Integrated tools for primer-design and mis-splicing analysis have been developed to facilitate experimental analysis of individual genes with focus on their variation. Snap is available at and at .
Affiliation(s)
- Shengting Li
- The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Beijing Genomics Institute (BGI), Chinese Academy of Sciences (CAS), Beijing Airport Industrial Zone B-6, Beijing 101300, China
- Lijia Ma
- Beijing Genomics Institute (BGI), Chinese Academy of Sciences (CAS), Beijing Airport Industrial Zone B-6, Beijing 101300, China
- Graduate University of the Chinese Academy of Sciences, Yuquan Road 19A, Beijing 100049, China
- Heng Li
- The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Beijing Genomics Institute (BGI), Chinese Academy of Sciences (CAS), Beijing Airport Industrial Zone B-6, Beijing 101300, China
- Søren Vang
- Research Unit for Molecular Medicine, Aarhus University Hospital and Faculty of Health Sciences, DK-8200 Aarhus N, Denmark
- Yafeng Hu
- Beijing Genomics Institute (BGI), Chinese Academy of Sciences (CAS), Beijing Airport Industrial Zone B-6, Beijing 101300, China
- Lars Bolund
- The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Beijing Genomics Institute (BGI), Chinese Academy of Sciences (CAS), Beijing Airport Industrial Zone B-6, Beijing 101300, China
- Jun Wang
- The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark
- Beijing Genomics Institute (BGI), Chinese Academy of Sciences (CAS), Beijing Airport Industrial Zone B-6, Beijing 101300, China
- College of Life Sciences, Peking University, Beijing 100871, China
- To whom correspondence should be addressed. Tel: +86 10 80481552; Fax: +86 10 80498676
|
332
|
Ceol A, Chatr-aryamontri A, Santonico E, Sacco R, Castagnoli L, Cesareni G. DOMINO: a database of domain-peptide interactions. Nucleic Acids Res 2006; 35:D557-60. [PMID: 17135199 PMCID: PMC1751533 DOI: 10.1093/nar/gkl961] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Many protein interactions are mediated by small protein modules binding to short linear peptides. DOMINO is an open-access database comprising more than 3900 annotated experiments describing interactions mediated by protein-interaction domains. DOMINO can be searched with a versatile search tool and the interaction networks can be visualized with a convenient graphic display applet that explicitly identifies the domains/sites involved in the interactions.
Affiliation(s)
- Gianni Cesareni
- To whom correspondence should be addressed. Tel: +39 0672594315; Fax: +39 062023500
|
333
|
Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G. MINT: the Molecular INTeraction database. Nucleic Acids Res 2006; 35:D572-4. [PMID: 17135203 PMCID: PMC1751541 DOI: 10.1093/nar/gkl950] [Citation(s) in RCA: 612] [Impact Index Per Article: 32.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open
Abstract
The Molecular INTeraction database (MINT, http://mint.bio.uniroma2.it/mint/) aims at storing, in a structured format, information about molecular interactions (MIs) by extracting experimental details from work published in peer-reviewed journals. At present the MINT team focuses the curation work on physical interactions between proteins. Genetic or computationally inferred interactions are not included in the database. Over the past four years MINT has undergone extensive revision. The new version of MINT is based on a completely remodeled database structure, which offers more efficient data exploration and analysis, and is characterized by entries with a richer annotation. Over the past few years the number of curated physical interactions has soared to over 95 000. The whole dataset can be freely accessed online in both interactive and batch modes through web-based interfaces and an FTP server. MINT now includes, as an integrated addition, HomoMINT, a database of interactions between human proteins inferred from experiments with ortholog proteins in model organisms (http://mint.bio.uniroma2.it/mint/).
Affiliation(s)
- Gianni Cesareni
- To whom correspondence should be addressed. Tel: +39 067 2594315; Fax: +39 062 023500
|
334
|
Beassoni PR, Otero LH, Massimelli MJ, Lisa AT, Domenech CE. Critical active-site residues identified by site-directed mutagenesis in Pseudomonas aeruginosa phosphorylcholine phosphatase, a new member of the haloacid dehalogenases hydrolase superfamily. Curr Microbiol 2006; 53:534-9. [PMID: 17106798 DOI: 10.1007/s00284-006-0365-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 07/14/2006] [Accepted: 08/17/2006] [Indexed: 11/27/2022]
Abstract
Pseudomonas aeruginosa phosphorylcholine phosphatase (PChP), the product of the PA5292 gene, is synthesized when the bacteria are grown with choline, betaine, dimethylglycine, or carnitine. In the presence of Mg(2+), PChP catalyzes the hydrolysis of both phosphorylcholine (PCh) and p-nitrophenylphosphate (p-NPP). PCh saturation curve analysis of the enzyme with or without the signal peptide indicated that the peptide was the fundamental factor responsible for decreasing the affinity of the second site of PChP for PCh, either at pH 5.0 or pH 7.4. PChP contained three conserved motifs characteristic of the haloacid dehalogenases superfamily. In the PChP without the signal peptide, motifs I, II, and III correspond to the residues (31)DMDNT(35), (166)SAA(168), and K(242)/(261)GDTPDSD(267), respectively. To determine the importance of residues D31, D33, T35, S166, K242, D262, D265, and D267 for enzyme activity, site-directed mutagenesis was performed. D31, D33, D262, and D267 were identified as the more important residues for catalysis. D265 and D267 may be involved in the stabilization of motif III, or might contribute to substrate specificity. The substitution of T35 by S35 resulted in an enzyme with low PChP activity that nevertheless conserves the catalytic sites involved in the hydrolysis of PCh (K(m1) 0.03 mM, K(m2) 0.5 mM) or p-NPP (K(m) 2.1 mM). Mutating either S166 or K242 revealed that these residues are also important for catalyzing the hydrolysis of both substrates. The substitution of lysine by arginine or by glutamine revealed the importance of the positively charged group, whether from the amino or the guanidinium group, because K242Q was inactive, whereas K242R was a functional enzyme.
Affiliation(s)
- Paola R Beassoni
- Biologia Molecular, Universidad Nacional de Rio Cuarto, km 601, Ruta 36, Rio Cuarto, 5800, Argentina
|
335
|
von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Krüger B, Snel B, Bork P. STRING 7--recent developments in the integration and prediction of protein interactions. Nucleic Acids Res 2006; 35:D358-62. [PMID: 17098935 PMCID: PMC1669762 DOI: 10.1093/nar/gkl825] [Citation(s) in RCA: 483] [Impact Index Per Article: 25.4] [Indexed: 12/01/2022] Open
Abstract
Information on protein–protein interactions is still mostly limited to a small number of model organisms, and originates from a wide variety of experimental and computational techniques. The database and online resource STRING generalizes access to protein interaction data, by integrating known and predicted interactions from a variety of sources. The underlying infrastructure includes a consistent body of completely sequenced genomes and exhaustive orthology classifications, based on which interaction evidence is transferred between organisms. Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database backend and the availability of compact download files. As of release 7, STRING has almost doubled to 373 distinct organisms, and contains more than 1.5 million proteins for which associations have been pre-computed. Novel features include AJAX-based web-navigation, inclusion of additional resources such as BioGRID, and detailed protein domain annotation. STRING is available at
Affiliation(s)
- Christian von Mering
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
|
336
|
Ng A, Bursteinas B, Gao Q, Mollison E, Zvelebil M. Resources for integrative systems biology: from data through databases to networks and dynamic system models. Brief Bioinform 2006; 7:318-30. [PMID: 17040977 DOI: 10.1093/bib/bbl036] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022] Open
Abstract
In systems biology, biologically relevant quantitative modelling of physiological processes requires the integration of experimental data from diverse sources. Recent developments in high-throughput methodologies enable the analysis of the transcriptome, proteome, interactome, metabolome and phenome on an unprecedented scale, thus contributing to the deluge of experimental data held in numerous public databases. In this review, we describe some of the databases and simulation tools that are relevant to systems biology and discuss a number of key issues affecting data integration and the challenges these pose to systems-level research.
Affiliation(s)
- Aylwin Ng
- Bioinformatics and Systems Biology Group, Ludwig Institute for Cancer Research, University College London Branch, 91 Riding House Street, London W1W 7BS, UK
|
337
|
Yip KY, Yu H, Kim PM, Schultz M, Gerstein M. The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks. Bioinformatics 2006; 22:2968-70. [PMID: 17021160 DOI: 10.1093/bioinformatics/btl488] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] Open
Abstract
Biological processes involve complex networks of interactions between molecules. Various large-scale experiments and curation efforts have led to preliminary versions of complete cellular networks for a number of organisms. To grapple with these networks, we developed TopNet-like Yale Network Analyzer (tYNA), a Web system for managing, comparing and mining multiple networks, both directed and undirected. tYNA efficiently implements methods that have proven useful in network analysis, including identifying defective cliques, finding small network motifs (such as feed-forward loops), calculating global statistics (such as the clustering coefficient and eccentricity), and identifying hubs and bottlenecks. It also allows one to manage a large number of private and public networks using a flexible tagging system, to filter them based on a variety of criteria, and to visualize them through an interactive graphical interface. A number of commonly used biological datasets have been pre-loaded into tYNA, standardized and grouped into different categories. AVAILABILITY The tYNA system can be accessed at http://networks.gersteinlab.org/tyna. The source code, JavaDoc API and WSDL can also be downloaded from the website. tYNA can also be accessed from the Cytoscape software using a plugin.
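Two of the statistics named in this abstract, the clustering coefficient and hub identification, can be illustrated with a minimal Python sketch. This is not tYNA code (tYNA is a Java web system); the function names, the toy network, and the "top n by degree" definition of a hub are assumptions made purely for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs, skipping self-loops."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def hubs(adj, top_n=1):
    """Return the top_n highest-degree nodes (ties broken alphabetically).

    Taking a fixed number of top-degree nodes is an illustrative choice,
    not the hub definition used by any particular tool.
    """
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    return ranked[:top_n]


# Toy network with hypothetical node names.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")]
adj = build_adjacency(edges)

print(local_clustering(adj, "B"))  # 1.0: B's two neighbours, A and C, are linked
print(hubs(adj, top_n=1))          # ['A']: A has the highest degree (3)
```

In the toy network, node A has three neighbours of which only one pair (B, C) is connected, so its local coefficient is 1/3.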
Affiliation(s)
- Kevin Y Yip
- Department of Computer Science, Yale University, New Haven, CT 06511, USA
|
338
|
Biron DG, Brun C, Lefevre T, Lebarbenchon C, Loxdale HD, Chevenet F, Brizard JP, Thomas F. The pitfalls of proteomics experiments without the correct use of bioinformatics tools. Proteomics 2006; 6:5577-96. [PMID: 16991202 DOI: 10.1002/pmic.200600223] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.9] [Indexed: 02/04/2023]
Abstract
The elucidation of the entire genomic sequence of various organisms, from viruses to complex metazoans, most recently man, is undoubtedly the greatest triumph of molecular biology since the discovery of the DNA double helix. Over the past two decades, the focus of molecular biology has gradually moved from genomes to proteomes, the intention being to discover the functions of the genes themselves. The postgenomic era stimulated the development of new techniques (e.g. 2-DE and MS) and bioinformatics tools to identify the functions, reactions, interactions and location of the gene products in tissues and/or cells of living organisms. Both 2-DE and MS have been very successfully employed to identify proteins involved in biological phenomena (e.g. immunity, cancer, host-parasite interactions, etc.), although recently, several papers have emphasised the pitfalls of 2-DE experiments, especially in relation to experimental design, poor statistical treatment and the high rate of 'false positive' results with regard to protein identification. In the light of these perceived problems, we review the advantages and misuses of bioinformatics tools - from realisation of 2-DE gels to the identification of candidate protein spots - and suggest some useful avenues to improve the quality of 2-DE experiments. In addition, we present key steps which, in our view, need to be taken into consideration during such analyses. Lastly, we present novel biological entities named 'interactomes', and the bioinformatics tools developed to analyse the large protein-protein interaction networks they form, along with several new perspectives of the field.
Affiliation(s)
- David G Biron
- GEMI, UMR CNRS/IRD 2724, Centre IRD, Montpellier, France.
|
339
|
Bashton M, Nobeli I, Thornton JM. Cognate ligand domain mapping for enzymes. J Mol Biol 2006; 364:836-52. [PMID: 17034815 DOI: 10.1016/j.jmb.2006.09.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 06/12/2006] [Revised: 09/12/2006] [Accepted: 09/15/2006] [Indexed: 11/21/2022]
Abstract
Here, we present an automatic assignment of potential cognate ligands to domains of enzymes in the CATH and SCOP protein domain classifications on the basis of structural data available in the wwPDB. This procedure involves two steps; firstly, we assign the binding of particular ligands to particular domains; secondly, we compare the chemical similarity of the PDB ligands to ligands in KEGG in order to assign cognate ligands. We find that use of the Enzyme Commission (EC) numbers is necessary to enable efficient and accurate cognate ligand assignment. The PROCOGNATE database currently has cognate ligand mapping for 3277 (4118) protein structures and 351 (302) superfamilies, as described by the CATH and (SCOP) databases, respectively. We find that just under half of all ligands are only and always bound by a single domain, with 16% bound by more than one domain and the remainder of the ligands showing a variety of binding modes. This finding has implications for domain recombination and the evolution of new protein functions. Domain architecture or context is also found to affect substrate specificity of particular domains, and we discuss example cases. The most popular PDB ligands are all found to be generic components of crystallisation buffers, highlighting the non-cognate ligand problem inherent in the PDB. In contrast, the most popular cognate ligands are all found to be universal cellular currencies of reducing power and energy such as NADH, FADH2 and ATP, respectively, reflecting the fact that the vast majority of enzymatic reactions utilise one of these popular co-factors. These ligands all share a common adenine ribonucleotide moiety, suggesting that many different domain superfamilies have converged to bind this chemical framework.
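The second step described in this abstract, comparing a bound PDB ligand with candidate ligands by chemical similarity, can be sketched roughly as follows. The abstract does not state which similarity measure was used, so the Tanimoto coefficient over sets of structural features is an assumption here, as are the feature labels, the 0.5 threshold, and the function names. The candidate set stands in for, say, the ligands of reactions annotated under the enzyme's EC number.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two feature collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_cognate(bound_features, candidates, threshold=0.5):
    """Pick the candidate most similar to the bound ligand.

    candidates maps a ligand name to its feature set. Returns (name, score),
    or (None, best_score) when no candidate reaches the threshold.
    The threshold value is an illustrative assumption.
    """
    best_name, best_score = None, 0.0
    for name, features in sorted(candidates.items()):
        score = tanimoto(bound_features, features)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical feature labels; real fingerprints would come from a cheminformatics toolkit.
bound = {"adenine", "ribose", "phosphate"}
candidates = {
    "ATP": {"adenine", "ribose", "phosphate", "triphosphate"},
    "glycerol": {"hydroxyl"},
}

print(assign_cognate(bound, candidates))  # ('ATP', 0.75)
```

A buffer component such as glycerol shares no features with any candidate in this toy set and would be left unassigned, which mirrors the non-cognate ligand problem the abstract mentions.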
Affiliation(s)
- Matthew Bashton
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
340
|
Abstract
Agile Protein Interaction DataAnalyzer (APID) is an interactive bioinformatics web tool developed to integrate and analyze, in a unified and comparative platform, the main currently known information about protein–protein interactions demonstrated by specific small-scale or large-scale experimental methods. At present, the application includes information from five main source databases, brought together in a unified server to explore >35 000 different proteins and 111 000 different proven interactions. The web application includes search tools to query and browse the data, allowing selection of interaction pairs based on calculated parameters that weight and qualify the reliability of each given protein interaction. Such parameters are, for the ‘proteins’: connectivity, cluster coefficient, Gene Ontology (GO) functional environment, GO environment enrichment; and for the ‘interactions’: number of methods, GO overlapping, iPfam domain–domain interaction. APID also includes a graphic interactive tool to visualize selected sub-networks and to navigate on them or along the whole interaction network. The application is available open access at .
Affiliation(s)
- Javier De Las Rivas
- To whom correspondence should be addressed. Tel.: +34 923 294819; Fax: +34 923 294743
|
341
|
Myers CL, Barrett DR, Hibbs MA, Huttenhower C, Troyanskaya OG. Finding function: evaluation methods for functional genomic data. BMC Genomics 2006; 7:187. [PMID: 16869964 PMCID: PMC1560386 DOI: 10.1186/1471-2164-7-187] [Citation(s) in RCA: 147] [Impact Index Per Article: 7.7] [Received: 05/10/2006] [Accepted: 07/25/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate evaluation of the quality of genomic or proteomic data and computational methods is vital to our ability to use them for formulating novel biological hypotheses and directing further experiments. There is currently no standard approach to evaluation in functional genomics. Our analysis of existing approaches shows that they are inconsistent and contain substantial functional biases that render the resulting evaluations misleading both quantitatively and qualitatively. These problems make it essentially impossible to compare computational methods or large-scale experimental datasets and also result in conclusions that generalize poorly in most biological applications. RESULTS We reveal issues with current evaluation methods here and suggest new approaches to evaluation that facilitate accurate and representative characterization of genomic methods and data. Specifically, we describe a functional genomics gold standard based on curation by expert biologists and demonstrate its use as an effective means of evaluation of genomic approaches. Our evaluation framework and gold standard are freely available to the community through our website. CONCLUSION Proper methods for evaluating genomic data and computational approaches will determine how much we, as a community, are able to learn from the wealth of available data. We propose one possible solution to this problem here but emphasize that this topic warrants broader community discussion.
Affiliation(s)
- Chad L Myers
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Daniel R Barrett
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Matthew A Hibbs
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Curtis Huttenhower
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Olga G Troyanskaya
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
|
342
|
Vita R, Vaughan K, Zarebski L, Salimi N, Fleri W, Grey H, Sathiamurthy M, Mokili J, Bui HH, Bourne PE, Ponomarenko J, de Castro R, Chan RK, Sidney J, Wilson SS, Stewart S, Way S, Peters B, Sette A. Curation of complex, context-dependent immunological data. BMC Bioinformatics 2006; 7:341. [PMID: 16836764 PMCID: PMC1534061 DOI: 10.1186/1471-2105-7-341] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Received: 03/16/2006] [Accepted: 07/12/2006] [Indexed: 11/29/2022] Open
Abstract
Background The Immune Epitope Database and Analysis Resource (IEDB) is dedicated to capturing, housing and analyzing complex immune epitope related data. Description To identify and extract relevant data from the scientific literature in an efficient and accurate manner, novel processes were developed for manual and semi-automated annotation. Conclusion Formalized curation strategies enable the processing of a large volume of context-dependent data, which are now available to the scientific community in an accessible and transparent format. The experiences described herein are applicable to other databases housing complex biological data and requiring a high level of curation expertise.
Affiliation(s)
- Randi Vita
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Kerrie Vaughan
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Laura Zarebski
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Nima Salimi
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Ward Fleri
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Howard Grey
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Muthu Sathiamurthy
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- John Mokili
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Huynh-Hoa Bui
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Philip E Bourne
- San Diego Supercomputer Center, P.O. Box 85608, San Diego, California, USA
- Department of Pharmacology, University of California, San Diego, 9500 Gilman Drive, La Jolla, California, USA
- Julia Ponomarenko
- San Diego Supercomputer Center, P.O. Box 85608, San Diego, California, USA
- Romulo de Castro
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Russell K Chan
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- John Sidney
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Stephen S Wilson
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Scott Stewart
- Science Applications International Corporation, 10260 Campus Point Drive, MS-A2F, San Diego, California, USA
- Scott Way
- Science Applications International Corporation, 10260 Campus Point Drive, MS-A2F, San Diego, California, USA
- Bjoern Peters
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
- Alessandro Sette
- La Jolla Institute for Allergy and Immunology, 3030 Bunker Hill Street, Suite 326, San Diego, California, USA
|
343
|
Hasan S, Daugelat S, Rao PSS, Schreiber M. Prioritizing genomic drug targets in pathogens: application to Mycobacterium tuberculosis. PLoS Comput Biol 2006; 2:e61. [PMID: 16789813 PMCID: PMC1475714 DOI: 10.1371/journal.pcbi.0020061] [Citation(s) in RCA: 109] [Impact Index Per Article: 5.7] [Received: 12/29/2005] [Accepted: 04/21/2006] [Indexed: 11/18/2022] Open
Abstract
We have developed a software program that weights and integrates specific properties on the genes in a pathogen so that they may be ranked as drug targets. We applied this software to produce three prioritized drug target lists for Mycobacterium tuberculosis, the causative agent of tuberculosis, a disease for which a new drug is desperately needed. Each list is based on an individual criterion. The first list prioritizes metabolic drug targets by the uniqueness of their roles in the M. tuberculosis metabolome (“metabolic chokepoints”) and their similarity to known “druggable” protein classes (i.e., classes whose activity has previously been shown to be modulated by binding a small molecule). The second list prioritizes targets that would specifically impair M. tuberculosis, by weighting heavily those that are closely conserved within the Actinobacteria class but lack close homology to the host and gut flora. M. tuberculosis can survive asymptomatically in its host for many years by adapting to a dormant state referred to as “persistence.” The final list aims to prioritize potential targets involved in maintaining persistence in M. tuberculosis. The rankings of current, candidate, and proposed drug targets are highlighted with respect to these lists. Some features were found to be more accurate than others in prioritizing studied targets. It can also be shown that targets can be prioritized by using evolutionary programming to optimize the weights of each desired property. We demonstrate this approach in prioritizing persistence targets. The search for drugs to prevent or treat infections remains an urgent focus in infectious disease research. A new software program has been developed by the authors of this article that can be used to rank genes as potential drug targets in pathogens. 
Traditional prioritization approaches to drug target identification, such as searching the literature and trying to mentally integrate varied criteria, can quickly become overwhelming for the drug discovery researcher. Alternatively, one can computationally integrate different criteria to create a ranking function that can help to identify targets. The authors demonstrate the applicability of this approach on the genome of Mycobacterium tuberculosis, the organism that causes tuberculosis (TB), a disease for which new drug treatments are especially needed because of emerging drug-resistant strains. The experiences gained from this work will be useful for both wet-lab and informatics scientists working in infectious disease research; first, it demonstrates that ample public data already exist on the M. tuberculosis genome that can be tuned effectively for prioritizing drug targets. Second, the output from numerous freely available bioinformatics tools can be pushed to achieve these goals. Third, the methodology can easily be extended to other pathogens of interest. Currently studied TB targets are also highlighted in terms of the authors' ranking system, which should be useful for researchers focusing on TB drug discovery.
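The idea of computationally integrating weighted criteria into a ranking function can be sketched as a simple weighted sum. This is an illustrative assumption, not the authors' software: the gene names, property names, scores and weights below are all hypothetical, and the evolutionary optimization of weights mentioned in the abstract is not shown.

```python
def rank_targets(genes, weights):
    """Rank genes by a weighted sum of their property scores, best first.

    genes maps a gene name to a dict of property scores; a missing property
    counts as 0. Negative weights penalize undesirable properties.
    Ties are broken alphabetically by gene name.
    """
    def score(props):
        return sum(w * props.get(name, 0.0) for name, w in weights.items())

    return sorted(genes, key=lambda g: (-score(genes[g]), g))


# Hypothetical genes and property scores in the range 0..1.
genes = {
    "geneA": {"chokepoint": 1.0, "druggable": 0.2, "host_homology": 0.9},
    "geneB": {"chokepoint": 0.5, "druggable": 0.9, "host_homology": 0.1},
    "geneC": {"druggable": 0.4},
}

# Hypothetical weights: reward chokepoints and druggability, penalize
# close homology to host proteins.
weights = {"chokepoint": 2.0, "druggable": 1.0, "host_homology": -1.5}

print(rank_targets(genes, weights))  # ['geneB', 'geneA', 'geneC']
```

With these numbers geneB scores 1.75, geneA 0.85 and geneC 0.4. Changing the weight vector yields a different prioritized list, which is how separate lists for separate criteria could be produced from the same property table.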
Affiliation(s)
- Samiul Hasan
- Novartis Institute for Tropical Diseases (NITD), Chromos, Singapore
- Sabine Daugelat
- Novartis Institute for Tropical Diseases (NITD), Chromos, Singapore
- Mark Schreiber
- Novartis Institute for Tropical Diseases (NITD), Chromos, Singapore
- * To whom correspondence should be addressed. E-mail:
|
344
|
Franke L, Bakel HV, Fokkens L, de Jong ED, Egmont-Petersen M, Wijmenga C. Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet 2006; 78:1011-25. [PMID: 16685651 PMCID: PMC1474084 DOI: 10.1086/504300] [Citation(s) in RCA: 356] [Impact Index Per Article: 18.7] [Received: 12/09/2005] [Accepted: 03/14/2006] [Indexed: 02/02/2023] Open
Abstract
Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray co-expressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. 
This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown.
Affiliation(s)
- Lude Franke
- Complex Genetics Section, Department of Biomedical Genetics–Department of Medical Genetics, University Medical Centre Utrecht, and Large Distributed Databases Group, Institute of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands; and Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Harm van Bakel
- Complex Genetics Section, Department of Biomedical Genetics–Department of Medical Genetics, University Medical Centre Utrecht, and Large Distributed Databases Group, Institute of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands; and Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Like Fokkens
- Complex Genetics Section, Department of Biomedical Genetics–Department of Medical Genetics, University Medical Centre Utrecht, and Large Distributed Databases Group, Institute of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands; and Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Edwin D. de Jong
- Complex Genetics Section, Department of Biomedical Genetics–Department of Medical Genetics, University Medical Centre Utrecht, and Large Distributed Databases Group, Institute of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands; and Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Michael Egmont-Petersen
- Complex Genetics Section, Department of Biomedical Genetics–Department of Medical Genetics, University Medical Centre Utrecht, and Large Distributed Databases Group, Institute of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands; and Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Cisca Wijmenga
- Complex Genetics Section, Department of Biomedical Genetics–Department of Medical Genetics, University Medical Centre Utrecht, and Large Distributed Databases Group, Institute of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands; and Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
|
345
|
Abstract
Interaction networks, cartography and mapping, wiring and circuitry, whichever metaphor is invoked, the cardinal questions regarding cellular proteins are the same: What are they? How much is there? What do they do, and to whom do they do it? One of the more recent proteomics tools to pursue these lines of inquiry is the protein microarray, and a current report has unleashed this formidable technique upon a target of considerable biological interest, the PDZ domain family of signaling molecules.
Affiliation(s)
- Mark R Spaller
- Department of Chemistry, Wayne State University, Detroit, Michigan 48202, USA.
|
346
|
Mika S, Rost B. Protein-protein interactions more conserved within species than across species. PLoS Comput Biol 2006; 2:e79. [PMID: 16854211 PMCID: PMC1513270 DOI: 10.1371/journal.pcbi.0020079] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.4] [Received: 11/18/2005] [Indexed: 11/21/2022] Open
Abstract
Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. About another 55,000 pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for the two surprising results of our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions. 
In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein–protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions. http://www.rostlab.org/results/2006/ppi_homology/ The IntAct database contains about ten large-scale data sets of protein–protein interactions. Each set contains thousands of experimentally observed pair interactions. Most pairs were observed in yeast (Saccharomyces cerevisiae), fly (Drosophila melanogaster), and worm (Caenorhabditis elegans). These interactions are often perceived as model organisms in the sense that one can infer that two mouse proteins interact if one experimentally observes the two corresponding proteins in worm to interact. Here, the authors analyzed in detail how the sequence signals of physical protein–protein interactions are conserved. It is a common assumption that protein–protein interactions can easily be inferred through homology transfer from one model organism to another organism of interest. Here, the authors demonstrated that such homology transfers are only accurate at unexpectedly high levels of sequence identity. Even more surprisingly, homology transfers of protein–protein interactions are significantly more reliable for protein pairs from the same species than for two protein pairs from different organisms. The observation that interactions were much more conserved within than across species was valid for all levels of sequence similarity, i.e. for very similar as well as for more diverged interologs.
Affiliation(s)
- Sven Mika
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
|
347
|
Marsden RL, Ranea JAG, Sillero A, Redfern O, Yeats C, Maibaum M, Lee D, Addou S, Reeves GA, Dallman TJ, Orengo CA. Exploiting protein structure data to explore the evolution of protein function and biological complexity. Philos Trans R Soc Lond B Biol Sci 2006; 361:425-40. [PMID: 16524831 PMCID: PMC1609337 DOI: 10.1098/rstb.2005.1801] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry, University College London, Gower Street, London WC1E 6BT, UK.
|
348
|
Uddin RK, Singh SM. cis-Regulatory sequences of the genes involved in apoptosis, cell growth, and proliferation may provide a target for some of the effects of acute ethanol exposure. Brain Res 2006; 1088:31-44. [PMID: 16631145 DOI: 10.1016/j.brainres.2006.02.125] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 04/07/2005] [Revised: 01/31/2006] [Accepted: 02/26/2006] [Indexed: 01/22/2023]
Abstract
The physiological effects of alcohol are known to include drunkenness, toxicity, and addiction leading to alcohol-related health and societal problems. Some of these effects are mediated by regulation of expression of many genes involved in alcohol response pathways. Analysis of the regulatory elements and biological interaction of the genes that show coexpression in response to alcohol may give an insight into how they are regulated. Fifty-two ethanol-responsive (ER) genes displaying differential expression in mouse brain in response to acute ethanol exposure were subjected to bioinformatics analysis to identify known or putative transcription factor binding sites and cis-regulatory modules in the promoter regions that may be involved in their responsiveness to alcohol. Functional interactions of these genes were also examined to assess their cumulative contribution to metabolomic pathways. Clustering and promoter sequence analysis of the ER genes revealed the DNA binding site for nuclear transcription factor Y (NFY) as the most significant. NFY also takes part in the proposed biological association network of a number of ER genes, where these genes interact with themselves and other cellular components, and may generate a major cumulative effect on apoptosis, cell survival, and proliferation in response to alcohol. NFY has the potential to play a critical role in mediating the expression of a set of ER genes whose interactions contribute to apoptosis, cell survival, and proliferation, which in turn may affect alcohol-related behaviors.
Affiliation(s)
- Raihan K Uddin
- Department of Biology and Division of Medical Genetics, The University of Western Ontario, London, Ontario, Canada N6A 5B7.
|
349
|
Affiliation(s)
- Lawrence Hunter
- Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, 80045, USA.
|
350
|
Mougeot JLC, Bahrani-Mostafavi Z, Vachris JC, McKinney KQ, Gurlov S, Zhang J, Naumann RW, Higgins RV, Hall JB. Gene Expression Profiling of Ovarian Tissues for Determination of Molecular Pathways Reflective of Tumorigenesis. J Mol Biol 2006; 358:310-29. [PMID: 16503337 DOI: 10.1016/j.jmb.2006.01.092] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 10/27/2005] [Revised: 01/25/2006] [Accepted: 01/27/2006] [Indexed: 02/01/2023]
Abstract
Ovarian cancer is the fourth leading cause of gynecological cancer death among women in the United States. Early detection is a critical prerequisite to initiating effective cancer therapy. Gene microarray technology and proteomics have provided many of the biomarkers with potential use for diagnosis. However, more research is needed to fully understand disease onset and progression. To this end, we have performed microarray analysis with the goal of identifying molecular interaction networks defining tumor growth. Microarray analysis was performed on a limited set of ovarian tissues with various pathological diagnoses using Human Genome Focus Array (HGFA) for the detection of approximately 8500 human transcripts. Hierarchical clustering identified groups of ovarian tissues reflective of low malignant potential/early cancer onset and possible pre-cancerous stages involving small molecule, cytokine and/or hormone-dependent feedback responses specific to the pelvic reproductive system and a priori initiated tumor suppression mechanisms. ANOVA followed by a post hoc Scheffé test confirmed our hypotheses. Moreover, we established a protein/protein interaction database associated with HGFA probe sets. This database was used to build and visualize molecular networks integrating small but significant changes in gene expression. In conclusion, we were able for the first time to delineate an intersecting genetic pattern linking ovarian tissues reflective of low malignant potential/early cancer onset stages via long distance signaling between tissues of gynecological origin.
Affiliation(s)
- Jean-Luc C Mougeot
- Cannon Research Center, Department of Research Services, Carolinas Medical Center, P.O. Box 32861, Charlotte, NC 28232-2861, USA.
|