201
|
Pattin KA, Moore JH. Role for protein-protein interaction databases in human genetics. Expert Rev Proteomics 2010; 6:647-59. [PMID: 19929610 DOI: 10.1586/epr.09.86] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 12/30/2022]
Abstract
Proteomics and the study of protein-protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein-protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein-protein interactions in human genetics and genetic epidemiology. Since protein-protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies.
Affiliation(s)
- Kristine A Pattin
- Computational Genetics Laboratory and Department of Genetics, Dartmouth Medical School, Lebanon, NH, USA.
|
202
|
Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana. Nat Biotechnol 2010; 28:149-56. [PMID: 20118918 DOI: 10.1038/nbt.1603] [Citation(s) in RCA: 252] [Impact Index Per Article: 16.8] [Received: 06/10/2009] [Accepted: 12/23/2009] [Indexed: 11/08/2022]
Abstract
We introduce a rational approach for associating genes with plant traits by combined use of a genome-scale functional network and targeted reverse genetic screening. We present a probabilistic network (AraNet) of functional associations among 19,647 (73%) genes of the reference flowering plant Arabidopsis thaliana. AraNet associations are predictive for diverse biological pathways, and outperform predictions derived only from literature-based protein interactions, achieving 21% precision for 55% of genes. AraNet prioritizes genes for limited-scale functional screening, resulting in a hit-rate tenfold greater than screens of random insertional mutants, when applied to early seedling development as a test case. By interrogating network neighborhoods, we identify AT1G80710 (now DROUGHT SENSITIVE 1; DRS1) and AT3G05090 (now LATERAL ROOT STIMULATOR 1; LRS1) as regulators of drought sensitivity and lateral root development, respectively. AraNet (http://www.functionalnet.org/aranet/) provides a resource for plant gene function identification and genetic dissection of plant traits.
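As a rough illustration of what "interrogating network neighborhoods" can mean in practice, the sketch below ranks candidate genes by the summed weight of their links to genes already associated with a trait. This is a generic guilt-by-association heuristic and not AraNet's actual scoring scheme; the function name `rank_candidates`, the gene names, and the edge weights are all hypothetical.

```python
from collections import defaultdict


def rank_candidates(edges, seed_genes):
    """Rank non-seed genes by the summed weight of their edges to seed genes.

    edges: iterable of (gene_a, gene_b, weight) tuples for an undirected network.
    seed_genes: genes already associated with the trait of interest.
    """
    seeds = set(seed_genes)
    scores = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        # Only edges that connect a seed gene to a non-seed gene contribute.
        if gene_a in seeds and gene_b not in seeds:
            scores[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            scores[gene_a] += weight
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy network: gene names and weights are made up for illustration.
edges = [
    ("geneA", "geneB", 2.0),
    ("geneA", "geneC", 0.5),
    ("geneB", "geneC", 1.5),
    ("geneB", "geneD", 0.4),
    ("geneC", "geneD", 3.0),
]
ranking = rank_candidates(edges, seed_genes=["geneA", "geneB"])
print(ranking)
```

Genes at the top of such a ranking would be the first candidates for a limited-scale screen; a real application would use the published network's link scores rather than toy weights.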
|
203
|
Khalyfa A, Gharib SA, Kim J, Dayyat E, Snow AB, Bhattacharjee R, Kheirandish-Gozal L, Goldman JL, Gozal D. Transcriptomic analysis identifies phosphatases as novel targets for adenotonsillar hypertrophy of pediatric obstructive sleep apnea. Am J Respir Crit Care Med 2010; 181:1114-20. [PMID: 20093640 DOI: 10.1164/rccm.200909-1398oc] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 01/11/2023] Open
Abstract
RATIONALE Obstructive sleep apnea (OSA) is a highly prevalent disorder in children, in which enlarged adenotonsillar tissues (AT) play a major pathophysiologic role. Mechanisms leading to the proliferation and hypertrophy of AT in children who subsequently develop OSA remain unknown, and surgical extirpation of AT is associated with potential morbidity and mortality. OBJECTIVES We hypothesized that a computationally based analysis of gene expression in tonsils from children with OSA and children with recurrent tonsillitis without OSA can identify putative mechanistic pathways associated with tonsillar proliferation and hypertrophy in OSA. METHODS Palatine tonsils from children with either polysomnographically documented OSA or recurrent infectious tonsillitis were subjected to whole-genome microarray and functional enrichment analyses followed by significance score ranking based on gene interaction networks. The latter enabled identification and confirmation of a candidate list of tonsil-proliferative genes in OSA. MEASUREMENTS AND MAIN RESULTS In vitro studies using a mixed tonsil cell culture system targeting one of these candidates, phosphoserine phosphatase, revealed that it was more abundantly expressed in tonsils of children with OSA, and that pharmacological inhibition of phosphoserine phosphatase led to marked reductions in T- and B-lymphocyte cell proliferation and increased apoptosis. CONCLUSIONS A systems biology approach revealed a restricted set of candidate genes potentially underlying the heightened proliferative properties of AT in children with OSA. Furthermore, functional studies confirm a novel role for protein phosphatases in AT hypertrophy, and may provide a promising strategy for discovery of novel, nonsurgical therapeutic targets in pediatric OSA.
Affiliation(s)
- Abdelnaby Khalyfa
- Department of Pediatrics, University of Chicago, 5721 S. Maryland Avenue, Chicago, IL 60637, USA
|
204
|
Pearson BJ, Sánchez Alvarado A. A planarian p53 homolog regulates proliferation and self-renewal in adult stem cell lineages. Development 2010; 137:213-21. [PMID: 20040488 DOI: 10.1242/dev.044297] [Citation(s) in RCA: 142] [Impact Index Per Article: 9.5] [Indexed: 12/23/2022]
Abstract
The functions of adult stem cells and tumor suppressor genes are known to intersect. However, when and how tumor suppressors function in the lineages produced by adult stem cells is unknown. With a large population of stem cells that can be manipulated and studied in vivo, the freshwater planarian is an ideal system with which to investigate these questions. Here, we focus on the tumor suppressor p53, homologs of which have no known role in stem cell biology in any invertebrate examined thus far. Planaria have a single p53 family member, Smed-p53, which is predominantly expressed in newly made stem cell progeny. When Smed-p53 is targeted by RNAi, the stem cell population increases at the expense of progeny, resulting in hyper-proliferation. However, ultimately the stem cell population fails to self-renew. Our results suggest that prior to the vertebrates, an ancestral p53-like molecule already had functions in stem cell proliferation control and self-renewal.
Affiliation(s)
- Bret J Pearson
- Department of Neurobiology and Anatomy, Howard Hughes Medical Institute, University of Utah, Salt Lake City, UT 84132, USA
|
205
|
RamachandraRao SP, Talwar P, Ravasi T, Sharma K. Novel systems biology insights using antifibrotic approaches for diabetic kidney disease. Expert Rev Endocrinol Metab 2010; 5:127-135. [PMID: 30934387 DOI: 10.1586/eem.09.72] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/04/2023]
Abstract
Although several interventions slow the progression of diabetic nephropathy, current therapies do not halt progression completely. Recent preclinical studies suggested that pirfenidone (PFD) prevents fibrosis in various diseases, but the mechanisms underlying its antifibrotic action are incompletely understood. To explore the therapeutic potential of PFD, we studied the PFD-treated db/db diabetic mouse kidney by liquid chromatography-tandem mass spectrometry proteomics. A total of 21 proteins unique to PFD-treated diabetic kidneys were identified. Analysis of gene ontology and protein-protein interactions of these proteins suggested that PFD may regulate RNA translation. Two key proteins involved in mRNA translation initiation and elongation were further evaluated and found to be regulated by PFD at the level of phosphorylation. In conclusion, insights from combining proteomics and bioinformatics improve the likelihood of rapid advancement of novel clinical therapies focused on reducing inflammation and fibrosis for diabetic complications.
Affiliation(s)
- Satish P RamachandraRao
- Veterans Administration San Diego Healthcare System, La Jolla, CA, USA and Center for Renal Translational Medicine, Division of Nephrology and Hypertension, Department of Medicine, 407 Stein Clinical Research Building, Mail Box #0711, University of California, San Diego, La Jolla, CA 92093, USA.
- Priti Talwar
- Center for Renal Translational Medicine, Division of Nephrology and Hypertension, Department of Medicine, 407 Stein Clinical Research Building, Mail Box #0711, University of California, San Diego, La Jolla, CA 92093, USA and Department of Bioengineering, Jacobs School of Engineering, University of California, San Diego, CA, USA.
- Timothy Ravasi
- Division of Life Sciences and Engineering, Computational Bioscience Research Center (CBRC), King Abdullah University for Science and Technology (KAUST), Jeddah, Saudi Arabia and Department of Bioengineering, Jacobs School of Engineering, University of California, San Diego, CA, USA and The Scripps NeuroAIDS Preclinical Studies Centre, 9500 Gilman Drive, La Jolla, CA 92093, USA.
- Kumar Sharma
- Director, Center for Renal Translational Medicine, UCSD/VA San Diego Health System, La Jolla, CA 92093-0711, USA.
|
206
|
Zhao J, Jiang P, Zhang W. Molecular networks for the study of TCM pharmacology. Brief Bioinform 2009; 11:417-30. [PMID: 20038567 DOI: 10.1093/bib/bbp063] [Citation(s) in RCA: 163] [Impact Index Per Article: 10.2] [Indexed: 12/12/2022] Open
Abstract
To target complex, multi-factorial diseases more effectively, there has been an emerging trend of multi-target drug development based on network biology, as well as an increasing interest in traditional Chinese medicine (TCM) that applies a more holistic treatment to diseases. Thousands of years' clinic practices in TCM have accumulated a considerable number of formulae that exhibit reliable in vivo efficacy and safety. However, the molecular mechanisms responsible for their therapeutic effectiveness are still unclear. The development of network-based systems biology has provided considerable support for the understanding of the holistic, complementary and synergic essence of TCM in the context of molecular networks. This review introduces available sources and methods that could be utilized for the network-based study of TCM pharmacology, proposes a workflow for network-based TCM pharmacology study, and presents two case studies on applying these sources and methods to understand the mode of action of TCM recipes.
Affiliation(s)
- Jing Zhao
- Department of Natural Medicinal Chemistry, Second Military Medical University, PR China
|
207
|
Gharib SA, Nguyen E, Altemeier WA, Shaffer SA, Doneanu CE, Goodlett DR, Schnapp LM. Of mice and men: comparative proteomics of bronchoalveolar fluid. Eur Respir J 2009; 35:1388-95. [PMID: 20032019 DOI: 10.1183/09031936.00089409] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 01/25/2023]
Abstract
We hypothesised that comparing the protein mixture in bronchoalveolar lavage fluid (BALF) between humans and mice may lead to mechanistic insights into common and divergent pathways that evolved in each species. BALF from four humans and six mice was pooled separately and underwent identical shotgun proteomic analysis. Functional and network analysis was applied to identify overlapping and distinct pathways enriched in the BALF. Follow-up experiments using Western analysis in unpooled BALF samples were performed. We identified 91 unique proteins in human and 117 unique proteins in mouse BALF samples. Functional analysis of the proteins revealed conservation of several key processes between the species, including defence response. Oxidative stress response, however, was selectively enriched only in mouse BALF. Differences in the expression of peroxiredoxin-1, a key member of the defence pathway against oxidative injury, were confirmed between normal human and mouse BALF and in models of lung injury. A computational proteomics approach of mouse and human BALF confirms the conservation of immune and defence-mediated pathways while highlighting differences in response to oxidative stress. These observations suggest that the use of mice models to study human lung disorders should be undertaken with an appreciation of interspecies variability.
Affiliation(s)
- S A Gharib
- Center for Lung Biology, Division of Pulmonary and Critical Care Medicine, Seattle, WA, USA.
|
208
|
Abstract
Structural information on interacting proteins is important for understanding life processes at the molecular level. Genome-wide docking database is an integrated resource for structural studies of protein-protein interactions on the genome scale, which combines the available experimental data with models obtained by docking techniques. Current database version (August 2009) contains 25 559 experimental and modeled 3D structures for 771 organisms spanned over the entire universe of life from viruses to humans. Data are organized in a relational database with user-friendly search interface allowing exploration of the database content by a number of parameters. Search results can be interactively previewed and downloaded as PDB-formatted files, along with the information relevant to the specified interactions. The resource is freely available at http://gwidd.bioinformatics.ku.edu.
Affiliation(s)
- Petras J Kundrotas
- Center for Bioinformatics and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS 66047, USA
|
209
|
Iossifov I, Rodriguez-Esteban R, Mayzus I, Millen KJ, Rzhetsky A. Looking at cerebellar malformations through text-mined interactomes of mice and humans. PLoS Comput Biol 2009; 5:e1000559. [PMID: 19893633 PMCID: PMC2767227 DOI: 10.1371/journal.pcbi.1000559] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 04/16/2009] [Accepted: 10/07/2009] [Indexed: 12/11/2022] Open
Abstract
We have generated and made publicly available two very large networks of molecular interactions: 49,493 mouse-specific and 52,518 human-specific interactions. These networks were generated through automated analysis of 368,331 full-text research articles and 8,039,972 article abstracts from the PubMed database, using the GeneWays system. Our networks cover a wide spectrum of molecular interactions, such as bind, phosphorylate, glycosylate, and activate; 207 of these interaction types occur more than 1,000 times in our unfiltered, multi-species data set. Because mouse and human genes are linked through an orthological relationship, human and mouse networks are amenable to straightforward, joint computational analysis. Using our newly generated networks and known associations between mouse genes and cerebellar malformation phenotypes, we predicted a number of new associations between genes and five cerebellar phenotypes (small cerebellum, absent cerebellum, cerebellar degeneration, abnormal foliation, and abnormal vermis). Using a battery of statistical tests, we showed that genes that are associated with cerebellar phenotypes tend to form compact network clusters. Further, we observed that cerebellar malformation phenotypes tend to be associated with highly connected genes. This tendency was stronger for developmental phenotypes and weaker for cerebellar degeneration. We described and made publicly available the largest existing set of text-mined statements; we also presented its application to an important biological problem. We have extracted and purified two large molecular networks, one for humans and one for mouse. We characterized the data sets, described the methods we used to generate them, and presented a novel biological application of the networks to study the etiology of five cerebellum phenotypes. We demonstrated quantitatively that the development-related malformations differ in their system-level properties from degeneration-related genes. 
We showed that there is a high degree of overlap among the genes implicated in the developmental malformations, that these genes have a strong tendency to be highly connected within the molecular network, and that they also tend to be clustered together, forming a compact molecular network neighborhood. In contrast, the genes involved in malformations due to degeneration do not have a high degree of connectivity, are not strongly clustered in the network, and do not overlap significantly with the development related genes. In addition, taking into account the above-mentioned system-level properties and the gene-specific network interactions, we made highly confident predictions about novel genes that are likely also involved in the etiology of the analyzed phenotypes.
Affiliation(s)
- Ivan Iossifov
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Raul Rodriguez-Esteban
- Biotherapeutics and Integrative Biology, Boehringer Ingelheim, Ridgefield, Connecticut, United States of America
- Ilya Mayzus
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Kathleen J. Millen
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Andrey Rzhetsky
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Medicine, Institute for Genomics and Systems Biology, Computation Institute, University of Chicago, Chicago, Illinois, United States of America
|
210
|
Alves R, Rodriguez-Baena DS, Aguilar-Ruiz JS. Gene association analysis: a survey of frequent pattern mining from gene expression data. Brief Bioinform 2009; 11:210-24. [PMID: 19815645 DOI: 10.1093/bib/bbp042] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022] Open
Abstract
Establishing an association between variables is always of interest in genomic studies. Generation of DNA microarray gene expression data introduces a variety of data analysis issues not encountered in traditional molecular biology or medicine. Frequent pattern mining (FPM) has been applied successfully in business and scientific data for discovering interesting association patterns, and is becoming a promising strategy in microarray gene expression analysis. We review the most relevant FPM strategies, as well as surrounding main issues when devising efficient and practical methods for gene association analysis (GAA). We observed that, so far, scalability achieved by efficient methods does not imply biological soundness of the discovered association patterns, and vice versa. Ideally, GAA should employ a balanced mining model taking into account best practices employed by methods reviewed in this survey. Integrative approaches, in which biological knowledge plays an important role within the mining process, are becoming more reliable.
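To make the idea of frequent pattern mining on expression data concrete, here is a minimal sketch that treats each sample as a "transaction" of discretized gene states and reports the gene sets whose support (fraction of samples containing them) meets a threshold. It uses brute-force support counting rather than an optimized algorithm such as Apriori or FP-growth, and it is not taken from the survey; the function name `frequent_itemsets`, the gene labels, and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items.

    transactions: list of item lists, one per sample (e.g. discretized gene states).
    min_support: minimum fraction of samples that must contain the itemset.
    """
    n_samples = len(transactions)
    counts = Counter()
    for items in transactions:
        unique_items = sorted(set(items))  # ignore duplicates within a sample
        for size in range(1, max_size + 1):
            for combo in combinations(unique_items, size):
                counts[combo] += 1
    return {
        combo: count / n_samples
        for combo, count in counts.items()
        if count / n_samples >= min_support
    }


# Toy data: each sample lists genes flagged as up- or down-regulated (made up).
samples = [
    ["g1_up", "g2_up", "g3_down"],
    ["g1_up", "g2_up"],
    ["g1_up", "g3_down"],
    ["g2_up", "g3_down"],
]
patterns = frequent_itemsets(samples, min_support=0.5)
for itemset, support in sorted(patterns.items()):
    print(itemset, support)
```

Enumerating every combination grows exponentially with `max_size`, which is exactly the scalability problem that dedicated FPM algorithms address by pruning infrequent candidates early.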
Affiliation(s)
- Ronnie Alves
- Institute of Developmental Biology and Cancer, Centre de Biochimie, Faculte des Sciences, the University of Nice, 06108 Nice cedex 2.
|
211
|
Ng KL, Liu HC, Lee SC. ncRNAppi--a tool for identifying disease-related miRNA and siRNA targeting pathways. Bioinformatics 2009; 25:3199-201. [DOI: 10.1093/bioinformatics/btp574] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/07/2023] Open
|
212
|
Li L, Zhang K, Lee J, Cordes S, Davis DP, Tang Z. Discovering cancer genes by integrating network and functional properties. BMC Med Genomics 2009; 2:61. [PMID: 19765316 PMCID: PMC2758898 DOI: 10.1186/1755-8794-2-61] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Received: 10/14/2008] [Accepted: 09/19/2009] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Identification of novel cancer-causing genes is one of the main goals in cancer research. The rapid accumulation of genome-wide protein-protein interaction (PPI) data in humans has provided a new basis for studying the topological features of cancer genes in cellular networks. It is important to integrate multiple genomic data sources, including PPI networks, protein domains and Gene Ontology (GO) annotations, to facilitate the identification of cancer genes. METHODS Topological features of the PPI network, as well as protein domain compositions, enrichment of gene ontology categories, sequence and evolutionary conservation features were extracted and compared between cancer genes and other genes. The predictive power of various classifiers for identification of cancer genes was evaluated by cross validation. Experimental validation of a subset of the prediction results was conducted using siRNA knockdown and viability assays in human colon cancer cell line DLD-1. RESULTS Cross validation demonstrated advantageous performance of classifiers based on support vector machines (SVMs) with the inclusion of the topological features from the PPI network, protein domain compositions and GO annotations. We then applied the trained SVM classifier to human genes to prioritize putative cancer genes. siRNA knock-down of several SVM predicted cancer genes displayed greatly reduced cell viability in human colon cancer cell line DLD-1. CONCLUSION Topological features of PPI networks, protein domain compositions and GO annotations are good predictors of cancer genes. The SVM classifier integrates multiple features and as such is useful for prioritizing candidate cancer genes for experimental validations.
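The abstract does not list which topological features were extracted, so the sketch below simply assumes two common ones, node degree and local clustering coefficient, computed from an undirected PPI edge list. The function name `topological_features` and the protein identifiers are hypothetical; features like these could then be combined with domain and GO annotations as inputs to a classifier.

```python
from collections import defaultdict
from itertools import combinations


def topological_features(edges):
    """Compute degree and local clustering coefficient for each protein.

    edges: iterable of (protein_a, protein_b) pairs from an undirected PPI network.
    """
    neighbors = defaultdict(set)
    for protein_a, protein_b in edges:
        if protein_a != protein_b:  # ignore self-interactions
            neighbors[protein_a].add(protein_b)
            neighbors[protein_b].add(protein_a)

    features = {}
    for protein, nbrs in neighbors.items():
        degree = len(nbrs)
        possible_links = degree * (degree - 1) / 2
        # Count pairs of neighbors that also interact with each other.
        actual_links = sum(
            1 for u, v in combinations(sorted(nbrs), 2) if v in neighbors[u]
        )
        clustering = actual_links / possible_links if possible_links else 0.0
        features[protein] = {"degree": degree, "clustering": clustering}
    return features


# Toy PPI network with made-up protein identifiers.
ppi_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
features = topological_features(ppi_edges)
for protein in sorted(features):
    print(protein, features[protein])
```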
Affiliation(s)
- Li Li
- Department of Bioinformatics, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA.
|
213
|
Lin M, Hu B, Chen L, Sun P, Fan Y, Wu P, Chen X. Computational identification of potential molecular interactions in Arabidopsis. Plant Physiol 2009; 151:34-46. [PMID: 19592425 PMCID: PMC2735983 DOI: 10.1104/pp.109.141317] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 05/11/2009] [Accepted: 07/06/2009] [Indexed: 05/21/2023]
Abstract
Knowledge of the protein interaction network is useful to assist molecular mechanism studies. Several major repositories have been established to collect and organize reported protein interactions. Many interactions have been reported in several model organisms, yet a very limited number of plant interactions can thus far be found in these major databases. Computational identification of potential plant interactions, therefore, is desired to facilitate relevant research. In this work, we constructed a support vector machine model to predict potential Arabidopsis (Arabidopsis thaliana) protein interactions based on a variety of indirect evidence. In a 100-iteration bootstrap evaluation, the confidence of our predicted interactions was estimated to be 48.67%, and these interactions were expected to cover 29.02% of the entire interactome. The sensitivity of our model was validated with an independent evaluation data set consisting of newly reported interactions that did not overlap with the examples used in model training and testing. Results showed that our model successfully recognized 28.91% of the new interactions, similar to its expected sensitivity (29.02%). Applying this model to all possible Arabidopsis protein pairs resulted in 224,206 potential interactions, which is the largest and most accurate set of predicted Arabidopsis interactions at present. In order to facilitate the use of our results, we present the Predicted Arabidopsis Interactome Resource, with detailed annotations and more specific per interaction confidence measurements. This database and related documents are freely accessible at http://www.cls.zju.edu.cn/pair/.
Affiliation(s)
- Mingzhi Lin
- Department of Bioinformatics, Zhejiang University, Hangzhou, People's Republic of China, 310058
|
214
|
Li X, Cai H, Xu J, Ying S, Zhang Y. A mouse protein interactome through combined literature mining with multiple sources of interaction evidence. Amino Acids 2009; 38:1237-52. [PMID: 19669079 DOI: 10.1007/s00726-009-0335-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 01/27/2009] [Accepted: 07/24/2009] [Indexed: 11/25/2022]
Abstract
Protein-protein interactions (PPIs) play crucial roles in a number of biological processes. Recently, protein interaction networks (PINs) for several model organisms and humans have been generated, but few large-scale studies in mice have been carried out, either experimentally or computationally. In this work, we undertook an effort to map a mouse PIN from protein interactions hidden in the enormous biomedical literature. Following a co-occurrence-based text-mining approach, a probabilistic model (naïve Bayes) was used to filter false-positive interactions by integrating heterogeneous kinds of evidence from genomic and proteomic datasets. A support vector machine algorithm was further used to choose protein pairs with physical interactions. Comparison with the currently available PPI datasets from several model organisms and humans showed that the derived mouse PINs have similar topological properties at the global level but high local divergence. The mouse protein interaction dataset is stored in the Mouse protein-protein interaction DataBase (MppDB), a useful source of information for system-level understanding of gene function and biological processes in mammals. The MppDB database is publicly available at http://bio.scu.edu.cn/mppi.
Affiliation(s)
- Xiao Li
- Sichuan Key Laboratory of Molecular Biology and Biotechnology, Ministry of Education Key Laboratory for Bio-resource and Eco-environment, College of Life Sciences, Sichuan University, 610065, Chengdu, People's Republic of China.
|
215
|
Saha S, Harrison SH, Chen JY. Dissecting the human plasma proteome and inflammatory response biomarkers. Proteomics 2009; 9:470-84. [PMID: 19105179 DOI: 10.1002/pmic.200800507] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
A central focus of clinical proteomics is to search for biomarkers in plasma for diagnostic and therapeutic use. We studied a set of plasma proteins accessed from the Healthy Human Individual's Integrated Plasma Proteome (HIP(2)) database, a larger set of curated human proteins, and a subset of inflammatory proteins, for overlap with sets of known protein biomarkers, drug targets, and secreted proteins. Most inflammatory proteins were found to occur in plasma, and over three times the level of biomarkers were found in inflammatory plasma proteins and their interacting protein neighbors compared to the sets of plasma and curated human proteins. Percentage overlaps with Gene Ontology terms were similar between the curated human set and plasma protein set, yet the set of inflammatory plasma proteins had a distinct ontology-based profile. Most of the major hub proteins within protein-protein interaction networks of tissue-specific sets of inflammatory proteins were found to occur in disease pathways. The present study presents a systematic approach for profiling a plasma subproteome's relationship to both its potential range of clinical application and its overlap with complex disease.
Affiliation(s)
- Sudipto Saha
- School of Informatics, Indiana University, Indianapolis, IN 46202-3103, USA
|
216
|
Chen JY, Mamidipalli S, Huan T. HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics 2009; 10 Suppl 1:S16. [PMID: 19594875 PMCID: PMC2709259 DOI: 10.1186/1471-2164-10-s1-s16] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.4] [Indexed: 01/01/2023] Open
Abstract
Background Human protein-protein interaction (PPIs) data are the foundation for understanding molecular signalling networks and the functional roles of biomolecules. Several human PPI databases have become available; however, comparisons of these datasets have suggested limited data coverage and poor data quality. Ongoing collection and integration of human PPIs from different sources, both experimentally and computationally, can enable disease-specific network biology modelling in translational bioinformatics studies. Results We developed a new web-based resource, the Human Annotated and Predicted Protein Interaction (HAPPI) database, located at . The HAPPI database was created by extracting and integrating publicly available protein interaction databases, including HPRD, BIND, MINT, STRING, and OPHID, using database integration techniques. We designed a unified entity-relationship data model to resolve semantic level differences of diverse concepts involved in PPI data integration. We applied a unified scoring model to give each PPI a measure of its reliability that can place each PPI at one of the five star rank levels from 1 to 5. We assessed the quality of PPIs contained in the new HAPPI database, using evolutionary conserved co-expression pairs called "MetaGene" pairs to measure the extent of MetaGene pair and PPI pair overlaps. While the overall quality of the HAPPI database across all star ranks is comparable to the overall qualities of HPRD or IntNetDB, the subset of the HAPPI database with star ranks between 3 and 5 has a much higher average quality than all other human PPI databases. As of summer 2008, the database contains 142,956 non-redundant, medium to high-confidence level human protein interaction pairs among 10,592 human proteins. 
The HAPPI database web application also provides …” should be “The HAPPI database web application also provides hyperlinked information of genes, pathways, protein domains, protein structure displays, and sequence feature maps for interactive exploration of PPI data in the database. Conclusion HAPPI is by far the most comprehensive public compilation of human protein interaction information. It enables its users to fully explore PPI data with quality measures and annotated information necessary for emerging network biology studies.
Affiliation(s)
- Jake Yue Chen
- School of Informatics, Indiana University - Purdue University, Indianapolis, IN, USA.
217
RamachandraRao SP, Zhu Y, Ravasi T, McGowan TA, Toh I, Dunn SR, Okada S, Shaw MA, Sharma K. Pirfenidone is renoprotective in diabetic kidney disease. J Am Soc Nephrol 2009; 20:1765-75. [PMID: 19578007 DOI: 10.1681/asn.2008090931] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.3] [Indexed: 12/22/2022] Open
Abstract
Although several interventions slow the progression of diabetic nephropathy, current therapies do not halt progression completely. Recent preclinical studies suggested that pirfenidone (PFD) prevents fibrosis in various diseases, but the mechanisms underlying its antifibrotic action are incompletely understood. Here, we evaluated the role of PFD in regulation of the extracellular matrix. In mouse mesangial cells, PFD decreased TGF-beta promoter activity, reduced TGF-beta protein secretion, and inhibited TGF-beta-induced Smad2-phosphorylation, 3TP-lux promoter activity, and generation of reactive oxygen species. To explore the therapeutic potential of PFD, we administered PFD to 17-wk-old db/db mice for 4 wk. PFD treatment significantly reduced mesangial matrix expansion and expression of renal matrix genes but did not affect albuminuria. Using liquid chromatography with subsequent electrospray ionization tandem mass spectrometry, we identified 21 proteins unique to PFD-treated diabetic kidneys. Analysis of gene ontology and protein-protein interactions of these proteins suggested that PFD may regulate RNA processing. Immunoblotting demonstrated that PFD promotes dosage-dependent dephosphorylation of eukaryotic initiation factor, potentially inhibiting translation of mRNA. In conclusion, PFD is renoprotective in diabetic kidney disease and may exert its antifibrotic effects, in part, via inhibiting RNA processing.
Affiliation(s)
- Satish P RamachandraRao
- Center for Renal Translational Medicine, Division of Nephrology and Hypertension, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
218
Li GG, Wang ZZ. Evaluation of similarity measures for gene expression data and their correspondent combined measures. Interdiscip Sci 2009; 1:72-80. [DOI: 10.1007/s12539-008-0005-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2008] [Revised: 08/10/2008] [Accepted: 08/10/2008] [Indexed: 11/30/2022]
219
Bacha J, Brodie JS, Loose MW. myGRN: a database and visualisation system for the storage and analysis of developmental genetic regulatory networks. BMC Dev Biol 2009; 9:33. [PMID: 19500400 PMCID: PMC2702357 DOI: 10.1186/1471-213x-9-33] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/29/2009] [Accepted: 06/06/2009] [Indexed: 11/23/2022]
Abstract
BACKGROUND Biological processes are regulated by complex interactions between transcription factors and signalling molecules, collectively described as Genetic Regulatory Networks (GRNs). The characterisation of these networks to reveal regulatory mechanisms is a long-term goal of many laboratories. However compiling, visualising and interacting with such networks is non-trivial. Current tools and databases typically focus on GRNs within simple, single celled organisms. However, data is available within the literature describing regulatory interactions in multi-cellular organisms, although not in any systematic form. This is particularly true within the field of developmental biology, where regulatory interactions should also be tagged with information about the time and anatomical location of development in which they occur. DESCRIPTION We have developed myGRN (http://www.myGRN.org), a web application for storing and interrogating interaction data, with an emphasis on developmental processes. Users can submit interaction and gene expression data, either curated from published sources or derived from their own unpublished data. All interactions associated with publications are publicly visible, and unpublished interactions can only be shared between collaborating labs prior to publication. Users can group interactions into discrete networks based on specific biological processes. Various filters allow dynamic production of network diagrams based on a range of information including tissue location, developmental stage or basic topology. Individual networks can be viewed using myGRV, a tool focused on displaying developmental networks, or exported in a range of formats compatible with third party tools. Networks can also be analysed for the presence of common network motifs. We demonstrate the capabilities of myGRN using a network of zebrafish interactions integrated with expression data from the zebrafish database, ZFIN. 
CONCLUSION Here we are launching myGRN as a community-based repository for interaction networks, with a specific focus on developmental networks. We plan to extend its functionality, as well as use it to study networks involved in embryonic development in the future.
Affiliation(s)
- Jamil Bacha
- Institute of Genetics, University of Nottingham, Nottingham, UK
- James S Brodie
- Institute of Genetics, University of Nottingham, Nottingham, UK
- Matthew W Loose
- Institute of Genetics, University of Nottingham, Nottingham, UK
220
Chakicherla A, Ecale Zhou CL, Dang ML, Rodriguez V, Hansen JN, Zemla A. SpaK/SpaR two-component system characterized by a structure-driven domain-fusion method and in vitro phosphorylation studies. PLoS Comput Biol 2009; 5:e1000401. [PMID: 19503843 PMCID: PMC2686270 DOI: 10.1371/journal.pcbi.1000401] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/11/2008] [Accepted: 05/04/2009] [Indexed: 12/23/2022] Open
Abstract
Here we introduce a quantitative structure-driven computational domain-fusion method, which we used to predict the structures of proteins believed to be involved in regulation of the subtilin pathway in Bacillus subtilis, and used to predict a protein-protein complex formed by interaction between the proteins. Homology modeling of SpaK and SpaR yielded preliminary structural models based on a best template for SpaK comprising a dimer of a histidine kinase, and for SpaR a response regulator protein. Our LGA code was used to identify multi-domain proteins with structure homology to both modeled structures, yielding a set of domain-fusion templates then used to model a hypothetical SpaK/SpaR complex. The models were used to identify putative functional residues and residues at the protein-protein interface, and bioinformatics was used to compare functionally and structurally relevant residues in corresponding positions among proteins with structural homology to the templates. Models of the complex were evaluated in light of known properties of the functional residues within two-component systems involving His-Asp phosphorelays. Based on this analysis, a phosphotransferase complexed with a beryllofluoride was selected as the optimal template for modeling a SpaK/SpaR complex conformation. In vitro phosphorylation studies performed using wild type and site-directed SpaK mutant proteins validated the predictions derived from application of the structure-driven domain-fusion method: SpaK was phosphorylated in the presence of 32P-ATP and the phosphate moiety was subsequently transferred to SpaR, supporting the hypothesis that SpaK and SpaR function as sensor and response regulator, respectively, in a two-component signal transduction system, and furthermore suggesting that the structure-driven domain-fusion approach correctly predicted a physical interaction between SpaK and SpaR. Our domain-fusion algorithm leverages quantitative structure information and provides a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods.
Because proteins so frequently function in coordination with other proteins, identification and characterization of the interactions among proteins are essential for understanding how proteins work. Computational methods for identification of protein-protein interactions have been limited by the degree to which proteins are similar in sequence. However, methods that leverage structure information can overcome this limitation of sequence-based methods; the three-dimensional information provided by structure enables identification of related proteins even when their sequences are dissimilar. In this work we present a quantitative method for identification of protein interacting partners, and we demonstrate its use in modeling the structure of a hypothetical complex between two proteins that function in a bacterial signaling system. This quantitative approach comprises a tool for generation of hypotheses regarding protein function, which can then be tested using empirical methods, and provides a basis for high-throughput prediction of protein-protein interactions, which could be applied on a whole-genome scale.
Affiliation(s)
- Anu Chakicherla
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Carol L. Ecale Zhou
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- * E-mail:
- Virginia Rodriguez
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- J. Norman Hansen
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, United States of America
- Adam Zemla
- Computing Applications and Research Department, Lawrence Livermore National Laboratory, Livermore, California, United States of America
221
Ravi D, Wiles AM, Bhavani S, Ruan J, Leder P, Bishop AJR. A network of conserved damage survival pathways revealed by a genomic RNAi screen. PLoS Genet 2009; 5:e1000527. [PMID: 19543366 PMCID: PMC2688755 DOI: 10.1371/journal.pgen.1000527] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 01/05/2009] [Accepted: 05/19/2009] [Indexed: 11/18/2022] Open
Abstract
Damage initiates a pleiotropic cellular response aimed at cellular survival when appropriate. To identify genes required for damage survival, we used a cell-based RNAi screen against the Drosophila genome and the alkylating agent methyl methanesulphonate (MMS). Similar studies performed in other model organisms report that damage response may involve pleiotropic cellular processes other than the central DNA repair components, yet an intuitive systems level view of the cellular components required for damage survival, their interrelationship, and contextual importance has been lacking. Further, by comparing data from different model organisms, identification of conserved and presumably core survival components should be forthcoming. We identified 307 genes, representing 13 signaling, metabolic, or enzymatic pathways, affecting cellular survival of MMS-induced damage. As expected, the majority of these pathways are involved in DNA repair; however, several pathways with more diverse biological functions were also identified, including the TOR pathway, transcription, translation, proteasome, glutathione synthesis, ATP synthesis, and Notch signaling, and these were equally important in damage survival. Comparison with genomic screen data from Saccharomyces cerevisiae revealed no overlap enrichment of individual genes between the species, but a conservation of the pathways. To demonstrate the functional conservation of pathways, five were tested in Drosophila and mouse cells, with each pathway responding to alkylation damage in both species. Using the protein interactome, a significant level of connectivity was observed between Drosophila MMS survival proteins, suggesting a higher order relationship. This connectivity was dramatically improved by incorporating the components of the 13 identified pathways within the network. Grouping proteins into "pathway nodes" qualitatively improved the interactome organization, revealing a highly organized "MMS survival network." 
We conclude that identification of pathways can facilitate comparative biology analysis when direct gene/orthologue comparisons fail. A biologically intuitive, highly interconnected MMS survival network was revealed after we incorporated pathway data in our interactome analysis.
Affiliation(s)
- Dashnamoorthy Ravi
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- Amy M. Wiles
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- Selvaraj Bhavani
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- Jianhua Ruan
- Department of Computer Science, University of Texas at San Antonio, San Antonio, Texas, United States of America
- Philip Leder
- Harvard Medical School, Department of Genetics, Harvard University, Boston, Massachusetts, United States of America
- Alexander J. R. Bishop
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- Harvard Medical School, Department of Genetics, Harvard University, Boston, Massachusetts, United States of America
222
Krüger B, Dandekar T. Bioinformatical Approaches to Detect and Analyze Protein Interactions. Proteomics 2009; 564:401-31. [DOI: 10.1007/978-1-60761-157-8_23] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 03/29/2023]
223
Gautier VW, Gu L, O'Donoghue N, Pennington S, Sheehy N, Hall WW. In vitro nuclear interactome of the HIV-1 Tat protein. Retrovirology 2009; 6:47. [PMID: 19454010 PMCID: PMC2702331 DOI: 10.1186/1742-4690-6-47] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.1] [Received: 12/19/2008] [Accepted: 05/19/2009] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND One facet of the complexity underlying the biology of HIV-1 resides not only in its limited number of viral proteins, but in the extensive repertoire of cellular proteins they interact with and their higher-order assembly. HIV-1 encodes the regulatory protein Tat (86-101aa), which is essential for HIV-1 replication and primarily orchestrates HIV-1 provirus transcriptional regulation. Previous studies have demonstrated that Tat function is highly dependent on specific interactions with a range of cellular proteins. However they can only partially account for the intricate molecular mechanisms underlying the dynamics of proviral gene expression. To obtain a comprehensive nuclear interaction map of Tat in T-cells, we have designed a proteomic strategy based on affinity chromatography coupled with mass spectrometry. RESULTS Our approach resulted in the identification of a total of 183 candidates as Tat nuclear partners, 90% of which have not been previously characterised. Subsequently we applied in silico analysis, to validate and characterise our dataset which revealed that the Tat nuclear interactome exhibits unique signature(s). First, motif composition analysis highlighted that our dataset is enriched for domains mediating protein, RNA and DNA interactions, and helicase and ATPase activities. Secondly, functional classification and network reconstruction clearly depicted Tat as a polyvalent protein adaptor and positioned Tat at the nexus of a densely interconnected interaction network involved in a range of biological processes which included gene expression regulation, RNA biogenesis, chromatin structure, chromosome organisation, DNA replication and nuclear architecture. CONCLUSION We have completed the in vitro Tat nuclear interactome and have highlighted its modular network properties and particularly those involved in the coordination of gene expression by Tat. 
Ultimately, the highly specialised set of molecular interactions identified will provide a framework to further advance our understanding of the mechanisms of HIV-1 proviral gene silencing and activation.
Affiliation(s)
- Virginie W Gautier
- UCD-Centre for Research in Infectious Diseases, School of Medicine and Medical Science, University College Dublin (UCD), Belfield, Dublin 4, Ireland.
224
Song CM, Lim SJ, Tong JC. Recent advances in computer-aided drug design. Brief Bioinform 2009; 10:579-91. [PMID: 19433475 DOI: 10.1093/bib/bbp023] [Citation(s) in RCA: 175] [Impact Index Per Article: 10.9] [Indexed: 01/28/2023] Open
Abstract
Modern drug discovery is characterized by the production of vast quantities of compounds and the need to examine these huge libraries in short periods of time. The need to store, manage and analyze these rapidly increasing resources has given rise to the field known as computer-aided drug design (CADD). CADD represents computational methods and resources that are used to facilitate the design and discovery of new therapeutic solutions. Digital repositories, containing detailed information on drugs and other useful compounds, are goldmines for the study of chemical reaction capabilities. Design libraries, with the potential to generate molecular variants in their entirety, allow the selection and sampling of chemical compounds with diverse characteristics. Fold recognition, for studying sequence-structure homology between protein sequences and structures, is helpful for inferring binding sites and molecular functions. Virtual screening, the in silico analog of high-throughput screening, offers great promise for systematic evaluation of huge chemical libraries to identify potential lead candidates that can be synthesized and tested. In this article, we present an overview of the most important data sources and computational methods for the discovery of new molecular entities. The workflow of the entire virtual screening campaign is discussed, from data collection through to post-screening analysis.
Affiliation(s)
- Chun Meng Song
- Institute for Infocomm Research, Connexis South Tower, Singapore 138632
225
PCOPGene-Net: holistic characterisation of cellular states from microarray data based on continuous and non-continuous analysis of gene-expression relationships. BMC Bioinformatics 2009; 10:138. [PMID: 19426548 PMCID: PMC2688515 DOI: 10.1186/1471-2105-10-138] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/27/2009] [Accepted: 05/09/2009] [Indexed: 11/18/2022] Open
Abstract
Background Microarray technology is so expensive and powerful that it is essential to extract maximum value from microarray data, especially from large-sample-series microarrays. Our web tools attempt to respond to these researchers' needs by facilitating the possibility to test and formulate from a hypothesis to entire models under a holistic point of view. Results PCOPGene-Net is a web application for facilitating the study of the relationships among gene expressions under microarray conditions, to classify these conditions and to study their effect on expression relationships. Furthermore, the system guides the researcher in the navigation through the microarray data by providing the most suitable genes and information for the researcher's interests at each moment. We achieve all of these by means of the zoom-out operation, the zoom-in operation, the non-continuous analysis and crossing the PCOPGene results with external data-servers. Conclusion PCOPGene-Net helps to identify cellular states and the genes involved in these. All of that is accomplished in a flexible way, guided by the researcher's interests and taking advantage of the ability of our system to relate gene expressions, even when these relationships are non-continuous and cannot be found using linear or non-linear analytical methods. Currently, our tools are used for tumour-progression study from a holistic point of view.
226
Barton D, Braet F, Marc J, Overall R, Gardiner J. ELP3 localises to mitochondria and actin-rich domains at edges of HeLa cells. Neurosci Lett 2009; 455:60-4. [DOI: 10.1016/j.neulet.2009.03.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 01/08/2009] [Revised: 02/20/2009] [Accepted: 03/03/2009] [Indexed: 11/29/2022]
227
Gharib SA, Vaisar T, Aitken ML, Park DR, Heinecke JW, Fu X. Mapping the Lung Proteome in Cystic Fibrosis. J Proteome Res 2009; 8:3020-8. [DOI: 10.1021/pr900093j] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 11/29/2022]
Affiliation(s)
- Sina A. Gharib
- Center for Lung Biology and Department of Medicine, University of Washington, Seattle, Washington 98195, and Puget Sound Blood Center, Seattle, Washington 98104
- Tomas Vaisar
- Center for Lung Biology and Department of Medicine, University of Washington, Seattle, Washington 98195, and Puget Sound Blood Center, Seattle, Washington 98104
- Moira L. Aitken
- Center for Lung Biology and Department of Medicine, University of Washington, Seattle, Washington 98195, and Puget Sound Blood Center, Seattle, Washington 98104
- David R. Park
- Center for Lung Biology and Department of Medicine, University of Washington, Seattle, Washington 98195, and Puget Sound Blood Center, Seattle, Washington 98104
- Jay W. Heinecke
- Center for Lung Biology and Department of Medicine, University of Washington, Seattle, Washington 98195, and Puget Sound Blood Center, Seattle, Washington 98104
- Xiaoyun Fu
- Center for Lung Biology and Department of Medicine, University of Washington, Seattle, Washington 98195, and Puget Sound Blood Center, Seattle, Washington 98104
228
Moreland RT, Ryan JF, Pan C, Baxevanis AD. The Homeodomain Resource: a comprehensive collection of sequence, structure, interaction, genomic and functional information on the homeodomain protein family. Database (Oxford) 2009; 2009:bap004. [PMID: 20157477 PMCID: PMC2790301 DOI: 10.1093/database/bap004] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 11/07/2008] [Accepted: 03/14/2009] [Indexed: 01/15/2023]
Abstract
The Homeodomain Resource is a curated collection of sequence, structure, interaction, genomic and functional information on the homeodomain family. The current version builds upon previous versions by the addition of new, complete sets of homeodomain sequences from fully sequenced genomes, the expansion of existing curated homeodomain information and the improvement of data accessibility through better search tools and more complete data integration. This release contains 1534 full-length homeodomain-containing sequences, 93 experimentally derived homeodomain structures, 101 homeodomain protein-protein interactions, 107 homeodomain DNA-binding sites and 206 homeodomain proteins implicated in human genetic disorders. Database URL: The Homeodomain Resource is freely available and can be accessed at http://research.nhgri.nih.gov/homeodomain/
Affiliation(s)
- Andreas D. Baxevanis
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
229
Chowdhary R, Zhang J, Liu JS. Bayesian inference of protein-protein interactions from biological literature. Bioinformatics 2009; 25:1536-42. [PMID: 19369495 DOI: 10.1093/bioinformatics/btp245] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 01/10/2023]
Abstract
MOTIVATION Protein-protein interaction (PPI) extraction from published biological articles has attracted much attention because of the importance of protein interactions in biological processes. Despite significant progress, mining PPIs from the literature still relies heavily on time- and resource-consuming manual annotations. RESULTS In this study, we developed a novel methodology based on Bayesian networks (BNs) for extracting PPI triplets (a PPI triplet consists of two protein names and the corresponding interaction word) from unstructured text. The method achieved an overall accuracy of 87% on a cross-validation test using a manually annotated dataset. We also showed, through extracting PPI triplets from a large number of PubMed abstracts, that our method was able to complement human annotations to extract a large number of new PPIs from the literature. AVAILABILITY Programs/scripts we developed/used in the study are available at http://stat.fsu.edu/~jinfeng/datasets/Bio-SI-programs-Bayesian-chowdhary-zhang-liu.zip. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rajesh Chowdhary
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
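The abstract above defines a PPI triplet as two protein names plus the corresponding interaction word. As a minimal sketch of that data structure only, the Python below pairs known protein names with interaction words found in one sentence. It is a naive co-occurrence baseline, not the authors' Bayesian network method; the names `PPITriplet` and `candidate_triplets`, and the example vocabulary, are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PPITriplet:
    """Two protein names and the interaction word that links them."""
    protein_a: str
    protein_b: str
    interaction_word: str


def candidate_triplets(sentence, proteins, interaction_words):
    """Enumerate candidate triplets by simple co-occurrence in one sentence.

    Assumption: `proteins` and `interaction_words` are small, user-supplied
    vocabularies; a real system would score each candidate rather than
    accept all of them.
    """
    tokens = sentence.replace(".", " ").replace(",", " ").split()
    found_proteins = sorted({t for t in tokens if t in proteins})
    found_words = [t.lower() for t in tokens if t.lower() in interaction_words]
    return [
        PPITriplet(a, b, word)
        for a, b in combinations(found_proteins, 2)
        for word in found_words
    ]


# Hypothetical example sentence and vocabularies.
triplets = candidate_triplets(
    "ProtA binds ProtB.",
    proteins={"ProtA", "ProtB"},
    interaction_words={"binds"},
)
```

Each candidate produced this way would still need a classifier, such as the Bayesian network described in the abstract, to decide whether it reflects a genuine interaction.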
230
Fornes O, Aragues R, Espadaler J, Marti-Renom MA, Sali A, Oliva B. ModLink+: improving fold recognition by using protein-protein interactions. Bioinformatics 2009; 25:1506-12. [PMID: 19357100 DOI: 10.1093/bioinformatics/btp238] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 01/16/2023]
Abstract
MOTIVATION Several strategies have been developed to predict the fold of a target protein sequence, most of which are based on aligning the target sequence to other sequences of known structure. Previously, we demonstrated that the consideration of protein-protein interactions significantly increases the accuracy of fold assignment compared with PSI-BLAST sequence comparisons. A drawback of our method was the low number of proteins to which a fold could be assigned. Here, we present an improved version of the method that addresses this limitation. We also compare our method to other state-of-the-art fold assignment methodologies. RESULTS Our approach (ModLink+) has been tested on 3716 proteins with domain folds classified in the Structural Classification Of Proteins (SCOP) as well as known interacting partners in the Database of Interacting Proteins (DIP). For this test set, the ratio of success [positive predictive value (PPV)] on fold assignment increases from 75% for PSI-BLAST, 83% for HHSearch and 81% for PRC to >90% for ModLink+ at the e-value cutoff of 10⁻³. Under this e-value, ModLink+ can assign a fold to 30-45% of the proteins in the test set, while our previous method could cover <25%. When applied to 6384 proteins with unknown fold in the yeast proteome, ModLink+ combined with PSI-BLAST assigns a fold for domains in 3738 proteins, while PSI-BLAST alone covers only 2122 proteins, HHSearch 2969 and PRC 2826 proteins, using a threshold e-value that would represent a PPV >82% for each method in the test set. AVAILABILITY The ModLink+ server is freely accessible on the World Wide Web at http://sbi.imim.es/modlink/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Oriol Fornes
- Structural Bioinformatics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona (PRBB), Barcelona, Catalonia, Spain.
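The ModLink+ abstract above reports two kinds of figures: positive predictive value (PPV) and the number of proteins to which a fold could be assigned. A short sketch, using the standard definition PPV = TP / (TP + FP) and the yeast counts quoted in the abstract, shows how the coverage fractions follow from those counts. The function and variable names are illustrative and do not come from the ModLink+ software.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the share of assignments that are correct."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("PPV is undefined when no assignments were made")
    return true_positives / total


def coverage(assigned, total):
    """Fraction of proteins that received a fold assignment."""
    return assigned / total


# Counts quoted in the abstract for the yeast proteins with unknown fold.
YEAST_UNKNOWN_FOLD = 6384
assigned_counts = {
    "ModLink+ with PSI-BLAST": 3738,
    "PSI-BLAST": 2122,
    "HHSearch": 2969,
    "PRC": 2826,
}

yeast_coverage = {
    method: coverage(count, YEAST_UNKNOWN_FOLD)
    for method, count in assigned_counts.items()
}

for method, fraction in yeast_coverage.items():
    print(f"{method}: {fraction:.1%} of {YEAST_UNKNOWN_FOLD} proteins")
```

On these counts, ModLink+ combined with PSI-BLAST covers a little under 59% of the yeast set, against roughly a third for PSI-BLAST alone.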
231
Linking high-resolution metabolic flux phenotypes and transcriptional regulation in yeast modulated by the global regulator Gcn4p. Proc Natl Acad Sci U S A 2009; 106:6477-82. [PMID: 19346491 DOI: 10.1073/pnas.0811091106] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.3] [Indexed: 01/16/2023] Open
Abstract
Genome sequencing dramatically increased our ability to understand cellular response to perturbation. Integrating system-wide measurements such as gene expression with networks of protein-protein interactions and transcription factor binding revealed critical insights into cellular behavior. However, the potential of systems biology approaches is limited by difficulties in integrating metabolic measurements across the functional levels of the cell despite their being most closely linked to cellular phenotype. To address this limitation, we developed a model-based approach to correlate mRNA and metabolic flux data that combines information from both interaction network models and flux determination models. We started by quantifying 5,764 mRNAs, 54 metabolites, and 83 experimental 13C-based reaction fluxes in continuous cultures of yeast under stress in the absence or presence of global regulator Gcn4p. Although mRNA expression alone did not directly predict metabolic response, this correlation improved through incorporating a network-based model of amino acid biosynthesis (from r = 0.07 to 0.80 for mRNA-flux agreement). The model provides evidence of general biological principles: rewiring of metabolic flux (i.e., use of different reaction pathways) by transcriptional regulation and metabolite interaction density (i.e., level of pairwise metabolite-protein interactions) as a key biosynthetic control determinant. Furthermore, this model predicted flux rewiring in studies of follow-on transcriptional regulators that were experimentally validated with additional 13C-based flux measurements. As a first step in linking metabolic control and genetic regulatory networks, this model underscores the importance of integrating diverse data types in large-scale cellular models.
We anticipate that an integrated approach focusing on metabolic measurements will facilitate construction of more realistic models of cellular regulation for understanding diseases and constructing strains for industrial applications.
232
Gharib SA, Liles WC, Klaff LS, Altemeier WA. Noninjurious mechanical ventilation activates a proinflammatory transcriptional program in the lung. Physiol Genomics 2009; 37:239-48. [PMID: 19276240 DOI: 10.1152/physiolgenomics.00027.2009] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/22/2022] Open
Abstract
Mechanical ventilation is a life-saving intervention in patients with respiratory failure. However, human and animal studies have demonstrated that mechanical ventilation using large tidal volumes (≥12 ml/kg) induces a potent inflammatory response and can cause acute lung injury. We hypothesized that mechanical ventilation with a "noninjurious" tidal volume of 10 ml/kg would still activate a transcriptional program that places the lung at risk for severe injury. To identify key regulators of this transcriptional response, we integrated gene expression data obtained from whole lungs of spontaneously breathing mice and mechanically ventilated mice with computational network analysis. Topological analysis of the gene product interaction network identified Jun and Fos families of proteins as potential regulatory hubs. Electrophoretic mobility gel shift assay confirmed protein binding to activator protein-1 (AP-1) consensus sequences, and supershift experiments identified JunD and FosB as components of ventilation-induced AP-1 binding. Specific recruitment of JunD to the regulatory region of the F3 gene by mechanical ventilation was confirmed by chromatin immunoprecipitation assay. In conclusion, we demonstrate a novel computational framework to systematically dissect transcriptional programs activated by mechanical ventilation in the lung, and show that noninjurious mechanical ventilation initiates a response that can prime the lung for injury from a subsequent insult.
Affiliation(s)
- Sina A Gharib
- Center for Lung Biology, Department of Medicine, University of Washington, Seattle, Washington, USA.
|
233
|
Törmä A, Pitkänen JP, Huopaniemi L, Mattila P, Renkonen R. Concordant gene regulation related to perturbations of three GDP-mannose-related genes. FEMS Yeast Res 2009; 9:63-72. [PMID: 19133071 DOI: 10.1111/j.1567-1364.2008.00461.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022] Open
Abstract
Glycosylation of proteins is one of the most crucial post-translational modifications. In order to access system-level and state-dependent data related to the regulation of glycosylation events, we cultivated yeast cell strains each harboring a selected conditional knockdown construct for a gene (either SEC53, VRG4 or DPM1) related to GDP-mannose synthesis or its utilization in glycan biosynthesis. In order to carry this out efficiently, we developed automated sampling from bioreactor cultivations, a collection of in silico workflows for data analysis as well as their integration into a large data warehouse. Using the above-mentioned approaches, we could show that conditional knocking down of transcripts related to GDP-mannose synthesis or transportation led to altered levels of over 300 transcripts. These transcripts and their corresponding proteins were characterized by their gene ontology (GO) annotations, and their putative transcriptional regulation was analyzed. Furthermore, novel pathways were generated indicating interactions between GO categories with common proteins, putative transcriptional regulators of such induced GO categories, and the large protein-protein interaction network among the proteins whose transcripts indicated altered expression levels. When these results are always added to an ever-expanding data warehouse as annotations, they will incrementally increase the knowledge of biological systems.
|
234
|
Care MA, Bradford JR, Needham CJ, Bulpitt AJ, Westhead DR. Combining the interactome and deleterious SNP predictions to improve disease gene identification. Hum Mutat 2009; 30:485-92. [PMID: 19156842 DOI: 10.1002/humu.20917] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
A method has been developed for the prediction of proteins involved in genetic disorders. This involved combining deleterious SNP prediction with a system based on protein interactions and phenotype distances; this is the first time that deleterious SNP prediction has been used to make predictions across linkage-intervals. At each step we tested and selected the best procedure, revealing that the computationally expensive method of assigning medical meta-terms to create a phenotype distance matrix was outperformed by a simple word counting technique. We carried out in-depth benchmarking with increasingly stringent data sets, reaching precision values of up to 75% (19% recall) for 10-Mb linkage-intervals (averaging 100 genes). For the most stringent (worst-case) data we attained an overall recall of 6%, yet still achieved precision values of up to 90% (4% recall). At all levels of stringency and precision the addition of predicted deleterious SNPs was shown to increase recall.
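The precision and recall figures quoted in this abstract can be read with the following minimal sketch. It is not the authors' benchmarking code: the function name `precision_recall` and the gene labels are hypothetical, and the toy numbers are chosen only to illustrate how a high-precision, low-recall result arises.

```python
def precision_recall(predicted, true_genes):
    """Return (precision, recall) for a set of predicted disease genes.

    precision = correct predictions / all predictions
    recall    = correct predictions / all true disease genes
    """
    predicted = set(predicted)
    true_genes = set(true_genes)
    true_positives = len(predicted & true_genes)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_genes) if true_genes else 0.0
    return precision, recall


# Hypothetical toy data: 4 genes predicted, 3 of them correct,
# out of 16 true disease genes in total.
predicted = ["G1", "G2", "G3", "G4"]
true_genes = ["G1", "G2", "G3"] + [f"T{i}" for i in range(13)]

precision, recall = precision_recall(predicted, true_genes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A predictor that returns few candidates per interval can be mostly right (high precision) while still missing most true disease genes (low recall), which is the trade-off the abstract reports.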
Affiliation(s)
- M A Care
- Institute of Molecular and Cellular Biology, University of Leeds, Leeds, West Yorkshire, United Kingdom
|
235
|
Teber ET, Liu JY, Ballouz S, Fatkin D, Wouters MA. Comparison of automated candidate gene prediction systems using genes implicated in type 2 diabetes by genome-wide association studies. BMC Bioinformatics 2009; 10 Suppl 1:S69. [PMID: 19208173 PMCID: PMC2648789 DOI: 10.1186/1471-2105-10-s1-s69] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 12/21/2022] Open
Abstract
Background Automated candidate gene prediction systems allow geneticists to hone in on disease genes more rapidly by identifying the most probable candidate genes linked to the disease phenotypes under investigation. Here we assessed the ability of eight different candidate gene prediction systems to predict disease genes in intervals previously associated with type 2 diabetes by benchmarking their performance against genes implicated by recent genome-wide association studies. Results Using a search space of 9556 genes, all but one of the systems pruned the genome in favour of genes associated with moderate to highly significant SNPs. Of the 11 genes associated with highly significant SNPs identified by the genome-wide association studies, eight were flagged as likely candidates by at least one of the prediction systems. A list of candidates produced by a previous consensus approach did not match any of the genes implicated by 706 moderate to highly significant SNPs flagged by the genome-wide association studies. We prioritized genes associated with medium significance SNPs. Conclusion The study appraises the relative success of several candidate gene prediction systems against independent genetic data. Even when confronted with challengingly large intervals, the candidate gene prediction systems can successfully select likely disease genes. Furthermore, they can be used to filter statistically less-well-supported genetic data to select more likely candidates. We suggest consensus approaches fail because they penalize novel predictions made from independent underlying databases. To realize their full potential further work needs to be done on prioritization and annotation of genes.
Affiliation(s)
- Erdahl T Teber
- Victor Chang Cardiac Research Institute, 384 Victoria St, Darlinghurst, 2010, NSW, Australia.
|
236
|
Kim J, Huang DS, Han K. Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting. BMC Bioinformatics 2009; 10 Suppl 1:S57. [PMID: 19208160 PMCID: PMC2648735 DOI: 10.1186/1471-2105-10-s1-s57] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Supervised learning and many stochastic methods for predicting protein-protein interactions require both negative and positive interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so these must be generated. In protein-protein interactions and other molecular interactions as well, taking all non-positive interactions as negative interactions produces too many negative interactions for the positive interactions. Random selection from non-positive interactions is unsuitable, since the selected data may not reflect the original distribution of data. RESULTS We developed a bootstrapping algorithm for generating a negative data set of arbitrary size from protein-protein interaction data. We also developed an efficient boosting algorithm for finding interacting motif pairs in human and virus proteins. The boosting algorithm showed the best performance (84.4% sensitivity and 75.9% specificity) with balanced positive and negative data sets. The boosting algorithm was also used to find potential motif pairs in complexes of human and virus proteins, for which structural data was not used to train the algorithm. Interacting motif pairs common to multiple folds of structural data for the complexes were proven to be statistically significant. The data set for interactions between human and virus proteins was extracted from BOND and is available at http://virus.hpid.org/interactions.aspx. The complexes of human and virus proteins were extracted from PDB and their identifiers are available at http://virus.hpid.org/PDB_IDs.html. CONCLUSION When the positive and negative training data sets are unbalanced, the result via the prediction model tends to be biased. Bootstrapping is effective for generating a negative data set, for which the size and distribution are easily controlled. 
Our boosting algorithm could efficiently predict interacting motif pairs from protein interaction and sequence data, which was trained with the balanced data sets generated via the bootstrapping method.
Affiliation(s)
- Jisu Kim
- School of Computer Science and Engineering, Inha University, Incheon, South Korea.
|
237
|
Heap GA, Trynka G, Jansen RC, Bruinenberg M, Swertz MA, Dinesen LC, Hunt KA, Wijmenga C, van Heel DA, Franke L. Complex nature of SNP genotype effects on gene expression in primary human leucocytes. BMC Med Genomics 2009; 2:1. [PMID: 19128478 PMCID: PMC2628677 DOI: 10.1186/1755-8794-2-1] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Received: 06/03/2008] [Accepted: 01/07/2009] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Genome wide association studies have been hugely successful in identifying disease risk variants, yet most variants do not lead to coding changes and how variants influence biological function is usually unknown. METHODS We correlated gene expression and genetic variation in untouched primary leucocytes (n = 110) from individuals with celiac disease - a common condition with multiple risk variants identified. We compared our observations with an EBV-transformed HapMap B cell line dataset (n = 90), and performed a meta-analysis to increase power to detect non-tissue specific effects. RESULTS In celiac peripheral blood, 2,315 SNP variants influenced gene expression at 765 different transcripts (< 250 kb from SNP, at FDR = 0.05, cis expression quantitative trait loci, eQTLs). 135 of the detected SNP-probe effects (reflecting 51 unique probes) were also detected in a HapMap B cell line published dataset, all with effects in the same allelic direction. Overall gene expression differences within the two datasets predominantly explain the limited overlap in observed cis-eQTLs. Celiac associated risk variants from two regions, containing genes IL18RAP and CCR3, showed significant cis genotype-expression correlations in the peripheral blood but not in the B cell line datasets. We identified 14 genes where a SNP affected the expression of different probes within the same gene, but in opposite allelic directions. By incorporating genetic variation in co-expression analyses, functional relationships between genes can be more significantly detected. CONCLUSION In conclusion, the complex nature of genotypic effects in human populations makes the use of a relevant tissue, large datasets, and analysis of different exons essential to enable the identification of the function for many genetic risk variants in common diseases.
Affiliation(s)
- Graham A Heap
- Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, London, E1 2AT, UK
- Gosia Trynka
- Genetics Department, University Medical Centre Groningen, University of Groningen, 9700 RB Groningen, the Netherlands
- Complex Genetics Section, DBG-Department of Medical Genetics, University Medical Centre Utrecht, 3584 CG Utrecht, the Netherlands
- Ritsert C Jansen
- Genetics Department, University Medical Centre Groningen, University of Groningen, 9700 RB Groningen, the Netherlands
- Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Kerklaan 30, NL-9751 NN Haren, the Netherlands
- Marcel Bruinenberg
- Genetics Department, University Medical Centre Groningen, University of Groningen, 9700 RB Groningen, the Netherlands
- Morris A Swertz
- Genetics Department, University Medical Centre Groningen, University of Groningen, 9700 RB Groningen, the Netherlands
- Lotte C Dinesen
- Gastroenterology Unit, University of Oxford, Oxford OX3 7BN, UK
- Karen A Hunt
- Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, London, E1 2AT, UK
- Cisca Wijmenga
- Genetics Department, University Medical Centre Groningen, University of Groningen, 9700 RB Groningen, the Netherlands
- Complex Genetics Section, DBG-Department of Medical Genetics, University Medical Centre Utrecht, 3584 CG Utrecht, the Netherlands
- David A van Heel
- Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, London, E1 2AT, UK
- Lude Franke
- Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, London, E1 2AT, UK
- Genetics Department, University Medical Centre Groningen, University of Groningen, 9700 RB Groningen, the Netherlands
- Complex Genetics Section, DBG-Department of Medical Genetics, University Medical Centre Utrecht, 3584 CG Utrecht, the Netherlands
|
238
|
Cusick ME, Yu H, Smolyar A, Venkatesan K, Carvunis AR, Simonis N, Rual JF, Borick H, Braun P, Dreze M, Vandenhaute J, Galli M, Yazaki J, Hill DE, Ecker JR, Roth FP, Vidal M. Literature-curated protein interaction datasets. Nat Methods 2009; 6:39-46. [PMID: 19116613 PMCID: PMC2683745 DOI: 10.1038/nmeth.1284] [Citation(s) in RCA: 213] [Impact Index Per Article: 13.3] [Indexed: 12/14/2022]
Abstract
High-quality datasets are needed to understand how global and local properties of protein-protein interaction, or 'interactome', networks relate to biological mechanisms, and to guide research on individual proteins. In an evaluation of existing curation of protein interaction experiments reported in the literature, we found that curation can be error-prone and possibly of lower quality than commonly assumed.
Affiliation(s)
- Michael E Cusick
- Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, Massachusetts 02115, USA.
|
239
|
Han Y, Sun CH, Kim MS, Yi GS. Combined Database System for Binary Protein Interaction and Co-complex Association. 2009 INTERNATIONAL ASSOCIATION OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY - SPRING CONFERENCE 2009:538-542. [DOI: 10.1109/iacsit-sc.2009.42] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/03/2025]
|
240
|
Kumar P, Han BC, Shi Z, Jia J, Wang YP, Zhang YT, Liang L, Liu QF, Ji ZL, Chen YZ. Update of KDBI: Kinetic Data of Bio-molecular Interaction database. Nucleic Acids Res 2009; 37:D636-41. [PMID: 18971255 PMCID: PMC2686478 DOI: 10.1093/nar/gkn839] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/15/2022] Open
Abstract
Knowledge of the kinetics of biomolecular interactions is important for facilitating the study of cellular processes and underlying molecular events, and is essential for quantitative study and simulation of biological systems. Kinetic Data of Bio-molecular Interaction database (KDBI) has been developed to provide information about experimentally determined kinetic data of protein-protein, protein-nucleic acid, protein-ligand, nucleic acid-ligand binding or reaction events described in the literature. To accommodate increasing demand for studying and simulating biological systems, numerous improvements and updates have been made to KDBI, including new ways to access data by pathway and molecule names, data file in Systems Biology Markup Language format, more efficient search engine, access to published parameter sets of simulation models of 63 pathways, and 2.3-fold increase of data (19,263 entries of 10,532 distinctive biomolecular binding and 11,954 interaction events, involving 2635 proteins/protein complexes, 847 nucleic acids, 1603 small molecules and 45 multi-step processes). KDBI is publicly available at http://bidd.nus.edu.sg/group/kdbi/kdbi.asp.
Affiliation(s)
- Pankaj Kumar
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- B. C. Han
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- Z. Shi
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- J. Jia
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- Y. P. Wang
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- Y. T. Zhang
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- L. Liang
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- Q. F. Liu
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- Z. L. Ji
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- Y. Z. Chen
- Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543 and Bioinformatics Research Group, School of Life Sciences, Xiamen University, Xiamen 361005, FuJian Province, P. R. China
- *To whom correspondence should be addressed. Tel: +65 6516 6877; Fax: +65 6774 6756;
|
241
|
Kremer H, Cremers FPM. Positional cloning of deafness genes. Methods Mol Biol 2009; 493:215-238. [PMID: 18839350 DOI: 10.1007/978-1-59745-523-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The identification of the majority of the known causative genes involved in nonsyndromic sensorineural hearing loss (NSHL) started with linkage analysis as part of a positional cloning procedure. The human and mouse genome projects in combination with technical developments on genotyping, transcriptomics, proteomics, and the creation of animal models have greatly enhanced the speed of disease gene identification. In the present chapter, we first discuss the possibilities for exclusion of known NSHL loci and genes. Subsequently, methods are described to determine the genomic regions that contain the genetic defects. These include linkage analysis with genotyping and statistical evaluation and the determination of copy number variations. In the case of a large genomic region, candidate genes are selected and prioritized using gene expression analysis, protein network data, and phenotypes of animal models. A number of algorithms are described to automate the process of candidate gene selection. The novel high-throughput sequencing techniques might make gene selection and prioritization unnecessary in the near future. Once genetic variants are identified, questions on pathogenicity need to be addressed, which is the topic of the last section of this chapter.
Affiliation(s)
- Hannie Kremer
- Department of Otorhinolaryngology, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
|
242
|
Fu W, Sanders-Beer BE, Katz KS, Maglott DR, Pruitt KD, Ptak RG. Human immunodeficiency virus type 1, human protein interaction database at NCBI. Nucleic Acids Res 2009; 37:D417-22. [PMID: 18927109 PMCID: PMC2686594 DOI: 10.1093/nar/gkn708] [Citation(s) in RCA: 183] [Impact Index Per Article: 11.4] [Received: 08/29/2008] [Revised: 09/26/2008] [Accepted: 09/29/2008] [Indexed: 11/15/2022] Open
Abstract
The 'Human Immunodeficiency Virus Type 1 (HIV-1), Human Protein Interaction Database', available through the National Library of Medicine at www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions, was created to catalog all interactions between HIV-1 and human proteins published in the peer-reviewed literature. The database serves the scientific community exploring the discovery of novel HIV vaccine candidates and therapeutic targets. To facilitate this discovery approach, the following information for each HIV-1 human protein interaction is provided and can be retrieved without restriction by web-based downloads and ftp protocols: Reference Sequence (RefSeq) protein accession numbers, Entrez Gene identification numbers, brief descriptions of the interactions, searchable keywords for interactions and PubMed identification numbers (PMIDs) of journal articles describing the interactions. Currently, 2589 unique HIV-1 to human protein interactions and 5135 brief descriptions of the interactions, with a total of 14,312 PMID references to the original articles reporting the interactions, are stored in this growing database. In addition, all protein-protein interactions documented in the database are integrated into Entrez Gene records and listed in the 'HIV-1 protein interactions' section of Entrez Gene reports. The database is also tightly linked to other databases through Entrez Gene, enabling users to search for an abundance of information related to HIV pathogenesis and replication.
Affiliation(s)
- William Fu
- Southern Research Institute, Frederick, MD 21701, USA.
|
243
|
McDowall MD, Scott MS, Barton GJ. PIPs: human protein-protein interaction prediction database. Nucleic Acids Res 2009; 37:D651-6. [PMID: 18988626 PMCID: PMC2686497 DOI: 10.1093/nar/gkn870] [Citation(s) in RCA: 186] [Impact Index Per Article: 11.6] [Received: 08/13/2008] [Revised: 09/25/2008] [Accepted: 10/18/2008] [Indexed: 12/14/2022] Open
Abstract
The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein-protein interactions in human. It contains predictions of >37,000 high probability interactions of which >34,000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction broken down by the contribution from each information source and with easy access to the evidence that supports each prediction. Where data exists in OPHID, HPRD, DIP or BIND for a protein pair this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein-protein interactions in human that is straightforward to browse, or can be exploited completely, for interaction network modelling.
Affiliation(s)
- Geoffrey J. Barton
- School of Life Sciences Research, College of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
|
244
|
Zhu Y, Pan W, Shen X. Support Vector Machines with Disease-gene-centric Network Penalty for High Dimensional Microarray Data. STATISTICS AND ITS INTERFACE 2009; 2:257-269. [PMID: 20401316 PMCID: PMC2854644 DOI: 10.4310/sii.2009.v2.n3.a1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/29/2023]
Abstract
With the availability of genetic pathways or networks and accumulating knowledge on genes with variants predisposing to diseases (disease genes), we propose a disease-gene-centric support vector machine (DGC-SVM) that directly incorporates these two sources of prior information into building microarray-based classifiers for binary classification problems. DGC-SVM aims to detect the genes clustering together and around some key disease genes in a gene network. To achieve this goal, we propose a penalty over suitably defined groups of genes. A hierarchy is imposed on an undirected gene network to facilitate the definition of such gene groups. Our proposed DGC-SVM utilizes the hinge loss penalized by a sum of the L∞-norm being applied to each group. The simulation studies show that DGC-SVM not only detects more disease genes along pathways than the existing standard SVM and SVM with an L1-penalty (L1-SVM), but also captures disease genes that potentially affect the outcome only weakly. Two real data applications demonstrate that DGC-SVM improves gene selection with predictive performance comparable to the standard-SVM and L1-SVM. The proposed method has the potential to be an effective classification tool that encourages gene selection along paths to or clustering around known disease genes for microarray data.
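Read literally, the objective described in this abstract (hinge loss plus a sum of group-wise L-infinity norms) might be written as below. This is only a sketch based on the abstract's wording: the symbols (samples x_i with labels y_i in {-1, +1}, intercept β_0, coefficient vector β, tuning parameter λ, and a collection of gene groups 𝒢 derived from the network hierarchy) are assumed notation, and any group-specific weights used in the paper are not stated in the abstract and are omitted here.

```latex
\min_{\beta_0,\,\beta}\;
\sum_{i=1}^{n} \bigl[\,1 - y_i\,(\beta_0 + x_i^{\top}\beta)\,\bigr]_{+}
\;+\; \lambda \sum_{g \in \mathcal{G}} \lVert \beta_{g} \rVert_{\infty}
```

Here $[u]_{+} = \max(u, 0)$ is the hinge loss and $\beta_{g}$ is the sub-vector of coefficients for the genes in group $g$. Because the L-infinity norm of a group is zero only when every coefficient in that group is zero, a penalty of this form tends to select or drop genes group by group.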
Affiliation(s)
- Yanni Zhu
- Division of Biostatistics, School of Public Health, University of Minnesota
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota
|
245
|
Prasad TSK, Kandasamy K, Pandey A. Human Protein Reference Database and Human Proteinpedia as discovery tools for systems biology. Methods Mol Biol 2009; 577:67-79. [PMID: 19718509 DOI: 10.1007/978-1-60761-232-2_6] [Citation(s) in RCA: 211] [Impact Index Per Article: 13.2] [Indexed: 12/03/2022]
Abstract
Although high-throughput technologies used in biology have resulted in the accumulation of vast amounts of data in the literature, it is becoming difficult for individual investigators to directly benefit from this data because it is not easily accessible. Databases have assumed a crucial role in assimilating and storing information that could enable future discoveries. To this end, our group has developed two resources - Human Protein Reference Database (HPRD) and Human Proteinpedia - that provide integrated information pertaining to human proteins. These databases contain information on a number of features of proteins that have been discovered using various experimental methods. Human Proteinpedia was developed as a portal for community participation to annotate and share proteomic data using HPRD as the scaffold. It allows proteomic investigators to even share unpublished data and provides an effective medium for data sharing. As proteomic information reflects a direct view of cellular systems, proteomics is expected to complement other areas of biology such as genomics, transcriptomics, classical genetics, and chemical genetics in understanding the relationships among genome, gene functions, and living systems.
|
246
|
Wong P, Althammer S, Hildebrand A, Kirschner A, Pagel P, Geissler B, Smialowski P, Blöchl F, Oesterheld M, Schmidt T, Strack N, Theis FJ, Ruepp A, Frishman D. An evolutionary and structural characterization of mammalian protein complex organization. BMC Genomics 2008; 9:629. [PMID: 19108706 PMCID: PMC2645396 DOI: 10.1186/1471-2164-9-629] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 07/17/2008] [Accepted: 12/23/2008] [Indexed: 12/25/2022] Open
Abstract
Background We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of a rich information resource allows us to search for organizational properties concerning these complexes. Results As the complexity of a protein complex in terms of the number of unique subunits increases, we observed that the number of such complexes and the mean non-synonymous to synonymous substitution ratio of associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tends to decrease. These observations provide evidence relating natural selection and the organization of mammalian complexes. We also observed greater homogeneity in terms of predicted protein isoelectric points, secondary structure and substitution ratio in annotated versus randomly generated complexes. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. In particular, we found that large proteins interact preferentially with much smaller proteins. Conclusion We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with the mammalian protein complexes.
Affiliation(s)
- Philip Wong
- Helmholtz Center Munich-German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Ingolstädter Landstrasse 1, Neuherberg, Germany.
|
247
|
Vizcaíno JA, Mueller M, Hermjakob H, Martens L. Charting online OMICS resources: A navigational chart for clinical researchers. Proteomics Clin Appl 2008; 3:18-29. [PMID: 21136933 DOI: 10.1002/prca.200800082] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 03/30/2008] [Indexed: 12/22/2022]
Abstract
The life sciences have sprouted several popular and successful OMICS technologies that span all levels of biological information transfer. Ever since the start of the Human Genome Project, the then revolutionary idea to make all resulting data publicly available has been central to all of the efforts across OMICS technologies. As a result, a great variety of public data repositories and resources is currently available to the research community. This widespread availability of data does come at the price of increased confusion on the part of the users, especially for those who see the OMICS technologies as tools to help unravel a larger biological or clinical question. We therefore provide a comprehensive overview of the available resources across OMICS fields, with a special emphasis on those databases that are relevant to the study of proteins. We also describe various integrative systems that have been established, and highlight new developments in the field that can revolutionize the way in which live data integration is achieved over the internet.
Affiliation(s)
- Juan Antonio Vizcaíno
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
248
|
Richardson CJ, Gao Q, Mitsopoulous C, Zvelebil M, Pearl LH, Pearl FMG. MoKCa database--mutations of kinases in cancer. Nucleic Acids Res 2008; 37:D824-31. [PMID: 18986996 PMCID: PMC2686448 DOI: 10.1093/nar/gkn832] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 02/07/2023]
Abstract
Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The MoKCa database (Mutations of Kinases in Cancer, http://strubiol.icr.ac.uk/extra/mokca) has been developed to structurally and functionally annotate, and where possible predict, the phenotypic consequences of mutations in protein kinases implicated in cancer. Somatic mutation data from tumours and tumour cell lines have been mapped onto the crystal structures of the affected protein domains. Positions of the mutated amino acids are highlighted on a sequence-based domain pictogram, as well as on a 3D image of the protein structure, and in a molecular graphics package integrated for interactive viewing. The data associated with each mutation are presented in the Web interface, along with expert annotation of the detailed molecular functional implications of the mutation. Proteins are linked to functional annotation resources and are annotated with structural and functional features such as domains and phosphorylation sites. MoKCa aims to provide assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and present these together in a consistent and coherent fashion to facilitate authoritative annotation by cancer biologists and structural biologists directly involved in the generation and analysis of new mutational data.
Affiliation(s)
- Christopher J Richardson
- Section of Structural Biology, Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Road, London SW3 6JB, UK
|
249
|
Navratil V, de Chassey B, Meyniel L, Delmotte S, Gautier C, André P, Lotteau V, Rabourdin-Combe C. VirHostNet: a knowledge base for the management and the analysis of proteome-wide virus-host interaction networks. Nucleic Acids Res 2008; 37:D661-8. [PMID: 18984613 PMCID: PMC2686459 DOI: 10.1093/nar/gkn794] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.1] [Indexed: 12/31/2022]
Abstract
Infectious diseases caused by viral agents kill millions of people every year. The improvement of prevention and treatment of viral infections and their associated diseases remains one of the main public health challenges. Towards this goal, deciphering virus-host molecular interactions opens new perspectives for understanding the biology of infection and for designing new antiviral strategies. Indeed, modelling of an infection network between viral and cellular proteins will provide a conceptual and analytic framework to efficiently formulate new biological hypotheses at the proteome scale and to rationalize drug discovery. Therefore, we present the first release of VirHostNet (Virus-Host Network), a public knowledge base specialized in the management and analysis of integrated virus-virus, virus-host and host-host interaction networks coupled to their functional annotations. VirHostNet integrates an extensive and original literature-curated dataset of virus-virus and virus-host interactions (2671 non-redundant interactions) representing more than 180 distinct viral species and one of the largest human interactomes (10,672 proteins and 68,252 non-redundant interactions) reconstructed from publicly available data. The VirHostNet Web interface provides appropriate tools that allow efficient query and visualization of this infected cellular network. Public access to the VirHostNet knowledge-based system is available at http://pbildb1.univ-lyon1.fr/virhostnet.
Affiliation(s)
- Vincent Navratil
- Université de Lyon, INRA, UMR754, Ecole Nationale Vétérinaire de Lyon, INSERM, U851, 21 avenue Tony Garnier, Lyon, F-69007, France.
|
250
|
Abstract
Gene Ontology (GO) provides a controlled vocabulary to describe the attributes of genes and gene products in any organism. Although one might initially wonder what relevance a ‘controlled vocabulary’ might have for cardiovascular science, such a resource is proving highly useful for researchers investigating complex cardiovascular disease phenotypes as well as those interpreting results from high-throughput methodologies. GO enables the current functional knowledge of individual genes to be used to annotate genomic or proteomic datasets. In this way, the GO data provide a very effective way of linking biological knowledge with the analysis of the large datasets of post-genomics research. Consequently, users of high-throughput methodologies such as expression arrays or proteomics will be the main beneficiaries of such annotation sets. However, as GO annotations increase in quality and quantity, groups using small-scale approaches will gradually begin to benefit too. For example, genome-wide association scans for coronary heart disease are identifying novel genes with previously unknown connections to cardiovascular processes, and the comprehensive annotation of these novel genes might provide clues to their cardiovascular link. At least 4000 genes, to date, have been implicated in cardiovascular processes and an initiative is underway to focus on annotating these genes for the benefit of the cardiovascular community. In this article we review the current uses of Gene Ontology annotation to highlight why Gene Ontology should be of interest to all those involved in cardiovascular research.
|