2451
|
Abstract
A comprehensive analysis of enriched functional categories in differentially expressed genes is important to extract the underlying biological processes of genome-wide expression profiles. Moreover, identification of the network of significant functional modules in these dynamic processes is an interesting challenge. This study introduces DynaMod, a web-based application that identifies significant functional modules reflecting the change of modularity and differential expressions that are correlated with gene expression profiles under different conditions. DynaMod allows the inspection of a wide variety of functional modules such as the biological pathways, transcriptional factor–target gene groups, microRNA–target gene groups, protein complexes and hub networks involved in protein interactome. The statistical significance of dynamic functional modularity is scored based on Z-statistics from the average of mutual information (MI) changes of involved gene pairs under different conditions. Significantly correlated gene pairs among the functional modules are used to generate a correlated network of functional categories. In addition to these main goals, this scoring strategy supports better performance to detect significant genes in microarray analyses, as the scores of correlated genes show the superior characteristics of the significance analysis compared with those of individual genes. DynaMod also offers cross-comparison between different analysis outputs. DynaMod is freely accessible at http://piech.kaist.ac.kr/dynamod.
Affiliation(s)
- Choong-Hyun Sun
- Department of Computer Science, KAIST, Daejeon 305-701, South Korea
|
2452
|
Imamura H, Yachie N, Saito R, Ishihama Y, Tomita M. Towards the systematic discovery of signal transduction networks using phosphorylation dynamics data. BMC Bioinformatics 2010; 11:232. [PMID: 20459641 PMCID: PMC2875242 DOI: 10.1186/1471-2105-11-232] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/20/2009] [Accepted: 05/07/2010] [Indexed: 01/23/2023]
Abstract
Background: Phosphorylation is a ubiquitous and fundamental regulatory mechanism that controls signal transduction in living cells. The number of identified phosphoproteins and their phosphosites is rapidly increasing as a result of recent mass spectrometry-based approaches. Results: We analyzed time-course phosphoproteome data obtained previously by liquid chromatography mass spectrometry with the stable isotope labeling using amino acids in cell culture (SILAC) method. This provides the relative phosphorylation activities of digested peptides at each of five time points after stimulating HeLa cells with epidermal growth factor (EGF). We initially calculated the correlations between the phosphorylation dynamics patterns of every pair of peptides and connected the strongly correlated pairs to construct a network. We found that peptides extracted from the same intracellular fraction (nucleus vs. cytoplasm) tended to be close together within this phosphorylation dynamics-based network. The network was then analyzed using graph theory and compared with five known signal-transduction pathways. The dynamics-based network was correlated with known signaling pathways in the NetPath and Phospho.ELM databases, and especially with the EGF receptor (EGFR) signaling pathway. Although the phosphorylation patterns of many proteins were drastically changed by the EGF stimulation, our results suggest that only EGFR signaling transduction was both strongly activated and precisely controlled. Conclusions: The construction of a phosphorylation dynamics-based network provides a useful overview of condition-specific intracellular signal transduction using quantitative time-course phosphoproteome data under specific experimental conditions. Detailed prediction of signal transduction based on phosphoproteome dynamics remains challenging. However, since the phosphorylation profiles of kinase-substrate pairs on the specific pathway were localized in the dynamics-based network, our method will be a complementary strategy to explore new components of protein signaling pathways in combination with previous methods (including software) of predicting direct kinase-substrate relationships.
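Illustrative sketch (not part of the cited work): the network-construction step described in this abstract, correlating every pair of time-course profiles and linking the strongly correlated pairs, might look roughly like the following. The use of Pearson correlation, the 0.9 cutoff, the function name `correlation_network`, and the toy profiles are all assumptions made for illustration; the abstract does not state which correlation measure or threshold the authors used.

```python
import numpy as np

def correlation_network(profiles, threshold=0.9):
    """Return edges (i, j, r) for row pairs whose Pearson correlation is >= threshold.

    profiles: 2-D array-like, one row per peptide, one column per time point.
    threshold: hypothetical cutoff for "strongly correlated" (an assumption).
    """
    profiles = np.asarray(profiles, dtype=float)
    corr = np.corrcoef(profiles)  # pairwise Pearson correlation between rows
    n = corr.shape[0]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair once
            if corr[i, j] >= threshold:
                edges.append((i, j, float(corr[i, j])))
    return edges

# Toy data: 4 hypothetical peptides x 5 time points (not taken from the study).
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 6.0, 8.2, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
]
edges = correlation_network(toy_profiles, threshold=0.9)
```

With these toy values only peptides 0 and 1 rise together closely enough to be linked; the anti-correlated profile (row 2) is left unconnected because the sketch keeps positive correlations only, which is itself a design assumption.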
Affiliation(s)
- Haruna Imamura
- Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
|
2453
|
Cunningham DL, Sweet SMM, Cooper HJ, Heath JK. Differential phosphoproteomics of fibroblast growth factor signaling: identification of Src family kinase-mediated phosphorylation events. J Proteome Res 2010; 9:2317-28. [PMID: 20225815 PMCID: PMC2950672 DOI: 10.1021/pr9010475] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 11/16/2009] [Indexed: 01/12/2023]
Abstract
Activation of signal transduction by the receptor tyrosine kinase, fibroblast growth factor receptor (FGFR), results in a cascade of protein-protein interactions that rely on the occurrence of specific tyrosine phosphorylation events. One such protein recruited to the activated receptor complex is the nonreceptor tyrosine kinase, Src, which is involved in both initiation and termination of further signaling events. To gain a further understanding of the tyrosine phosphorylation events that occur during FGF signaling, with a specific focus on those that are dependent on Src family kinase (SFK) activity, we have applied SILAC combined with chemical inhibition of SFK activity to search for phosphorylation events that are dependent on SFK activity in FGF stimulated cells. In addition, we used a more targeted approach to carry out high coverage phosphopeptide mapping of one Src substrate protein, the multifunctional adaptor Dok1, and to identify SFK-dependent Dok1 binding partners. From these analyses we identify 80 SFK-dependent phosphorylation events on 40 proteins. We further identify 18 SFK-dependent Dok1 interactions and 9 SFK-dependent Dok1 phosphorylation sites, 6 of which had not previously been known to be SFK-dependent.
Affiliation(s)
- Helen J. Cooper
- School of Biosciences, College of Life and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
- John K. Heath
- To whom correspondence should be addressed. Prof. John K. Heath, School of Biosciences, College of Life and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K. Telephone: +44 (0)121 414 7533. Fax: +44 (0)121 414 5925.
|
2454
|
Gu J, Chen Y, Li S, Li Y. Identification of responsive gene modules by network-based gene clustering and extending: application to inflammation and angiogenesis. BMC Syst Biol 2010; 4:47. [PMID: 20406493 PMCID: PMC2873318 DOI: 10.1186/1752-0509-4-47] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 12/03/2009] [Accepted: 04/21/2010] [Indexed: 12/20/2022]
Abstract
Background: Cell responses to environmental stimuli are usually organized as relatively separate responsive gene modules at the molecular level. Identification of responsive gene modules rather than individual differentially expressed (DE) genes will provide important information about the underlying molecular mechanisms. Most current methods formulate module identification as an optimization problem: find the active sub-networks in the genome-wide gene network by maximizing the objective function considering the gene differential expression and/or the gene-gene co-expression information. Here we present a new formulation of this task: a group of closely-connected and co-expressed DE genes in the gene network is regarded as the signature of the underlying responsive gene modules; the modules can be identified by finding the signatures and then recovering the "missing parts" by adding the intermediate genes that connect the DE genes in the gene network. Results: ClustEx, a two-step method based on the new formulation, was developed and applied to identify the responsive gene modules of human umbilical vein endothelial cells (HUVECs) in inflammation and angiogenesis models by integrating the time-course microarray data and genome-wide PPI data. It shows better performance than several available module identification tools when tested on the reference responsive gene sets. Gene set analysis of KEGG pathways, GO terms and microRNA (miRNA) target gene sets further supports the ClustEx predictions. Conclusion: Taking the closely-connected and co-expressed DE genes in the condition-specific gene network as the signatures of the underlying responsive gene modules provides a new strategy to solve the module identification problem. The identified responsive gene modules of HUVECs and the corresponding enriched pathways/miRNAs provide useful resources for understanding the inflammatory and angiogenic responses of vascular systems.
Affiliation(s)
- Jin Gu
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, Tsinghua National Laboratory for Information Science and Technology (TNLIST) and Department of Automation, Tsinghua University, Beijing 100084, China
|
2455
|
Huan T, Wu X, Chen JY. Systems biology visualization tools for drug target discovery. Expert Opin Drug Discov 2010; 5:425-39. [DOI: 10.1517/17460441003725102] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
|
2456
|
Daemen A, Signoretto M, Gevaert O, Suykens JAK, De Moor B. Improved microarray-based decision support with graph encoded interactome data. PLoS One 2010; 5:e10225. [PMID: 20419106 PMCID: PMC2856685 DOI: 10.1371/journal.pone.0010225] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/11/2010] [Accepted: 03/28/2010] [Indexed: 12/31/2022]
Abstract
In the past, microarray studies have been criticized due to noise and the limited overlap between gene signatures. Prior biological knowledge should therefore be incorporated as side information in models based on gene expression data to improve the accuracy of diagnosis and prognosis in cancer. As prior knowledge, we investigated interaction and pathway information from the human interactome on different aspects of biological systems. By exploiting the properties of kernel methods, relations between genes with similar functions but active in alternative pathways could be incorporated in a support vector machine classifier based on spectral graph theory. Using 10 microarray data sets, we first reduced the number of data sources relevant for multiple cancer types and outcomes. Three sources on metabolic pathway information (KEGG), protein-protein interactions (OPHID) and miRNA-gene targeting (microRNA.org) outperformed the other sources with regard to the considered class of models. Both fixed and adaptive approaches were subsequently considered to combine the three corresponding classifiers. Averaging the predictions of these classifiers performed best and was significantly better than the model based on microarray data only. These results were confirmed on 6 validation microarray sets, with a significantly improved performance in 4 of them. Integrating interactome data thus improves classification of cancer outcome for the investigated microarray technologies and cancer types. Moreover, this strategy can be incorporated in any kernel method or non-linear version of a non-kernel method.
Affiliation(s)
- Anneleen Daemen
- Department of Electrical Engineering ESAT/SCD, Katholieke Universiteit Leuven, Leuven, Belgium.
|
2457
|
Barton ER. Restoration of gamma-sarcoglycan localization and mechanical signal transduction are independent in murine skeletal muscle. J Biol Chem 2010; 285:17263-70. [PMID: 20371873 DOI: 10.1074/jbc.m109.063990] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/31/2022]
Abstract
Limb girdle muscular dystrophy 2C is caused by mutations in the gamma-sarcoglycan gene (gsg) that results in loss of this protein, and disruption of the sarcoglycan (SG) complex. Signal transduction after mechanical perturbation is mediated, in part, through the SG complex and leads to phosphorylation of tyrosines on the intracellular portions of the sarcoglycans. This study tested if the Tyr(6) in the intracellular region of gamma-sarcoglycan protein (gamma-SG) was necessary for proper localization of the protein in skeletal muscle membranes or for the normal pattern of ERK1/2 phosphorylation after eccentric contractions. Viral mediated gene transfer of wild type gsg (WTgsg) and mutant gsg lacking Tyr(6) (Y6Agsg) was performed into the muscles of gsg(-/-) mice. Muscles were examined for production and stability of the gamma-SG, as well as the level of ERK1/2 phosphorylation before and after eccentric contraction. Sarcolemmal localization of gamma-SG was achieved regardless of which construct was expressed. However, only expression of WTgsg corrected the aberrant ERK1/2 phosphorylation associated with the absence of gamma-SG, whereas Y6Agsg failed to have any effect. This study shows that localization of gamma-SG does not require Tyr(6), but localization alone is insufficient for restoration of normal signal transduction patterns after mechanical perturbation.
Affiliation(s)
- Elisabeth R Barton
- Department of Anatomy and Cell Biology, School of Dental Medicine, and Pennsylvania Muscle Institute, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
|
2458
|
Hijikata A, Raju R, Keerthikumar S, Ramabadran S, Balakrishnan L, Ramadoss SK, Pandey A, Mohan S, Ohara O. Mutation@A Glance: an integrative web application for analysing mutations from human genetic diseases. DNA Res 2010; 17:197-208. [PMID: 20360267 PMCID: PMC2885273 DOI: 10.1093/dnares/dsq010] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023]
Abstract
Although mutation analysis serves as a key part in making a definitive diagnosis about a genetic disease, it still remains a time-consuming step to interpret their biological implications through integration of various lines of archived information about genes in question. To expedite this evaluation step of disease-causing genetic variations, here we developed Mutation@A Glance (http://rapid.rcai.riken.jp/mutation/), a highly integrated web-based analysis tool for analysing human disease mutations; it implements a user-friendly graphical interface to visualize about 40 000 known disease-associated mutations and genetic polymorphisms from more than 2600 protein-coding human disease-causing genes. Mutation@A Glance locates already known genetic variation data individually on the nucleotide and the amino acid sequences and makes it possible to cross-reference them with tertiary and/or quaternary protein structures and various functional features associated with specific amino acid residues in the proteins. We showed that the disease-associated missense mutations had a stronger tendency to reside in positions relevant to the structure/function of proteins than neutral genetic variations. From a practical viewpoint, Mutation@A Glance could certainly function as a ‘one-stop’ analysis platform for newly determined DNA sequences, which enables us to readily identify and evaluate new genetic variations by integrating multiple lines of information about the disease-causing candidate genes.
Affiliation(s)
- Atsushi Hijikata
- Laboratory for Immunogenomics, RIKEN Research Center for Allergy and Immunology, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
|
2459
|
Analysis of diverse regulatory networks in a hierarchical context shows consistent tendencies for collaboration in the middle levels. Proc Natl Acad Sci U S A 2010; 107:6841-6. [PMID: 20351254 DOI: 10.1073/pnas.0910867107] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 12/29/2022]
Abstract
Gene regulatory networks have been shown to share some common aspects with commonplace social governance structures. Thus, we can get some intuition into their organization by arranging them into well-known hierarchical layouts. These hierarchies, in turn, can be placed between the extremes of autocracies, with well-defined levels and clear chains of command, and democracies, without such defined levels and with more co-regulatory partnerships between regulators. In general, the presence of partnerships decreases the variation in information flow amongst nodes within a level, more evenly distributing stress. Here we study various regulatory networks (transcriptional, modification, and phosphorylation) for five diverse species, Escherichia coli to human. We specify three levels of regulators--top, middle, and bottom--which collectively govern the non-regulator targets lying in the lowest fourth level. We define quantities for nodes, levels, and entire networks that measure their degree of collaboration and autocratic vs. democratic character. We show individual regulators have a range of partnership tendencies: Some regulate their targets in combination with other regulators in local instantiations of democratic structure, whereas others regulate mostly in isolation, in more autocratic fashion. Overall, we show that in all networks studied the middle level has the highest collaborative propensity and coregulatory partnerships occur most frequently amongst midlevel regulators, an observation that has parallels in corporate settings where middle managers must interact most to ensure organizational effectiveness. There is, however, one notable difference between networks in different species: The amount of collaborative regulation and democratic character increases markedly with overall genomic complexity.
|
2460
|
Tranchevent LC, Capdevila FB, Nitsch D, De Moor B, De Causmaecker P, Moreau Y. A guide to web tools to prioritize candidate genes. Brief Bioinform 2010; 12:22-32. [PMID: 21278374 DOI: 10.1093/bib/bbq007] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Indexed: 01/13/2023]
|
2461
|
Interaction networks as a tool to investigate the mechanisms of aging. Biogerontology 2010; 11:463-73. [PMID: 20213321 DOI: 10.1007/s10522-010-9268-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 10/07/2009] [Accepted: 11/23/2009] [Indexed: 01/15/2023]
|
2462
|
Newman AM, Cooper JB. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number. BMC Bioinformatics 2010; 11:117. [PMID: 20202218 PMCID: PMC2846907 DOI: 10.1186/1471-2105-11-117] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Received: 11/27/2009] [Accepted: 03/04/2010] [Indexed: 12/25/2022]
Abstract
Background: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results: We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions: By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.
Affiliation(s)
- Aaron M Newman
- Biomolecular Science and Engineering Program, University of California, Santa Barbara, CA 93106, USA
|
2463
|
Liu S. Increasing alternative promoter repertories is positively associated with differential expression and disease susceptibility. PLoS One 2010; 5:e9482. [PMID: 20208995 PMCID: PMC2830428 DOI: 10.1371/journal.pone.0009482] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 10/17/2009] [Accepted: 01/07/2010] [Indexed: 12/03/2022]
Abstract
Background: Alternative Promoter (AP) usages have been shown to enable diversified transcriptional regulation of individual genes in a context-specific way (e.g., by pathway, cell lineage, tissue type, and development stage). Aberrant uses of APs have been directly linked to the mechanisms of certain human diseases. However, whether or not there exists a general link between a gene's AP repertoire and its expression diversity is currently unknown. The general relation between a gene's AP repertoire and its disease susceptibility also remains largely unexplored. Methodology/Principal Findings: Based on the differential expression ratio inferred from all human microarray data in NCBI GEO and the list of disease genes curated in public repositories, we systemically analyzed the general relation of AP repertoire with expression diversity and disease susceptibility. We found that genes with APs are more likely to be differentially expressed and/or disease associated than those with a Single Promoter (SP), and genes with more APs are more likely to be differentially expressed and disease susceptible than those with fewer APs. Further analysis showed that genes with an increased number of APs tend to have increased length in all aspects of gene structure including the 3′ UTR, be associated with increased duplicability, and have increased connectivity in the protein-protein interaction network. Conclusions: Our genome-wide analysis provided evidence that increasing alternative promoter repertories is positively associated with differential expression and disease susceptibility.
Affiliation(s)
- Song Liu
- Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, New York, United States of America.
|
2464
|
Malik R, Dulla K, Nigg EA, Körner R. From proteome lists to biological impact--tools and strategies for the analysis of large MS data sets. Proteomics 2010; 10:1270-1283. [PMID: 20077408 DOI: 10.1002/pmic.200900365] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 05/29/2009] [Accepted: 11/16/2009] [Indexed: 01/03/2025]
Abstract
MS has become a method-of-choice for proteome analysis, generating large data sets, which reflect proteome-scale protein-protein interaction and PTM networks. However, while a rapid growth in large-scale proteomics data can be observed, the sound biological interpretation of these results clearly lags behind. Therefore, combined efforts of bioinformaticians and biologists have been made to develop strategies and applications to help experimentalists perform this crucial task. This review presents an overview of currently available analytical strategies and tools to extract biologically relevant information from large protein lists. Moreover, we also present current research publications making use of these tools as examples of how the presented strategies may be incorporated into proteomic workflows. Emphasis is placed on the analysis of Gene Ontology terms, interaction networks, biological pathways and PTMs. In addition, topics including domain analysis and text mining are reviewed in the context of computational analysis of proteomic results. We expect that these types of analyses will significantly contribute to a deeper understanding of the role of individual proteins, protein networks and pathways in complex systems.
Affiliation(s)
- Rainer Malik
- Max Planck Institute of Biochemistry, Department of Cell Biology, Martinsried, Germany
|
2465
|
The NetAge database: a compendium of networks for longevity, age-related diseases and associated processes. Biogerontology 2010; 11:513-22. [PMID: 20186480 DOI: 10.1007/s10522-010-9265-8] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 01/18/2010] [Accepted: 02/09/2010] [Indexed: 12/11/2022]
|
2466
|
Beisser D, Klau GW, Dandekar T, Müller T, Dittrich MT. BioNet: an R-Package for the functional analysis of biological networks. Bioinformatics 2010; 26:1129-30. [PMID: 20189939 DOI: 10.1093/bioinformatics/btq089] [Citation(s) in RCA: 160] [Impact Index Per Article: 10.7] [Indexed: 11/13/2022]
Abstract
Motivation: Increasing quantity and quality of data in transcriptomics and interactomics create the need for integrative approaches to network analysis. Here, we present a comprehensive R-package for the analysis of biological networks including an exact and a heuristic approach to identify functional modules. Results: The BioNet package provides an extensive framework for integrated network analysis in R. This includes the statistics for the integration of transcriptomic and functional data with biological networks, the scoring of nodes as well as methods for network search and visualization. Availability: The BioNet package and a tutorial are available from http://bionet.bioapps.biozentrum.uni-wuerzburg.de.
Affiliation(s)
- Daniela Beisser
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, 97074 Würzburg, Germany
|
2467
|
Navlakha S, Kingsford C. The power of protein interaction networks for associating genes with diseases. Bioinformatics 2010; 26:1057-63. [PMID: 20185403 PMCID: PMC2853684 DOI: 10.1093/bioinformatics/btq076] [Citation(s) in RCA: 233] [Impact Index Per Article: 15.5] [Indexed: 02/04/2023]
Abstract
Motivation: Understanding the association between genetic diseases and their causal genes is an important problem concerning human health. With the recent influx of high-throughput data describing interactions between gene products, scientists have been provided a new avenue through which these associations can be inferred. Despite the recent interest in this problem, however, there is little understanding of the relative benefits and drawbacks underlying the proposed techniques. Results: We assessed the utility of physical protein interactions for determining gene–disease associations by examining the performance of seven recently developed computational methods (plus several of their variants). We found that random-walk approaches individually outperform clustering and neighborhood approaches, although most methods make predictions not made by any other method. We show how combining these methods into a consensus method yields Pareto optimal performance. We also quantified how a diffuse topological distribution of disease-related proteins negatively affects prediction quality and are thus able to identify diseases especially amenable to network-based predictions and others for which additional information sources are absolutely required. Availability: The predictions made by each algorithm considered are available online at http://www.cbcb.umd.edu/DiseaseNet Contact: carlk@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
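Illustrative sketch (not part of the cited work): the abstract refers to random-walk approaches without describing them. One widely used form is a random walk with restart, in which a walker starting from known disease genes repeatedly steps to a random neighbour or jumps back to the seeds, and the steady-state visiting probabilities rank candidate genes. Whether this matches the specific variants evaluated in the paper is an assumption; the function name, the restart probability of 0.3, and the four-node toy graph are hypothetical.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities for a walk that restarts at `seeds`.

    adjacency: symmetric 0/1 matrix of an undirected interaction graph.
    seeds: indices of known disease genes (hypothetical in this sketch).
    restart: probability of jumping back to a seed at each step (an assumption).
    """
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # avoid division by zero for isolated nodes
    W = A / col_sums                   # column-normalised transition matrix
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds) # start uniformly on the seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change small enough: converged
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed (made-up example).
toy_adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_adjacency, seeds=[0])
ranking = [int(i) for i in np.argsort(-scores)]  # nodes ordered by decreasing score
```

In this toy graph the scores fall off with distance from the seed, so candidates closer to known disease genes rank higher, which is the intuition behind using network proximity for gene–disease association.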
Affiliation(s)
- Saket Navlakha
- Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies and Department of Computer Science, University of Maryland College Park, College Park, MD 20742, USA
|
2468
|
Raman K. Construction and analysis of protein-protein interaction networks. Autom Exp 2010; 2:2. [PMID: 20334628 PMCID: PMC2834675 DOI: 10.1186/1759-4499-2-2] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Received: 11/25/2009] [Accepted: 02/15/2010] [Indexed: 12/28/2022]
Abstract
Protein–protein interactions form the basis for a vast majority of cellular events, including signal transduction and transcriptional regulation. It is now understood that the study of interactions between cellular macromolecules is fundamental to the understanding of biological systems. Interactions between proteins have been studied through a number of high-throughput experiments and have also been predicted through an array of computational methods that leverage the vast amount of sequence data generated in the last decade. In this review, I discuss some of the important computational methods for the prediction of functional linkages between proteins. I then give a brief overview of some of the databases and tools that are useful for a study of protein–protein interactions. I also present an introduction to network theory, followed by a discussion of the parameters commonly used in analysing networks, important network topologies, as well as methods to identify important network components, based on perturbations.
Affiliation(s)
- Karthik Raman
- Department of Biochemistry, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland.
|
2469
|
Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ. Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Comput Biol 2010; 6:e1000662. [PMID: 20140234 PMCID: PMC2816673 DOI: 10.1371/journal.pcbi.1000662] [Citation(s) in RCA: 229] [Impact Index Per Article: 15.3] [Received: 08/11/2009] [Accepted: 12/30/2009] [Indexed: 11/18/2022]
Abstract
Current work in elucidating relationships between diseases has largely been based on pre-existing knowledge of disease genes. Consequently, these studies are limited in their discovery of new and unknown disease relationships. We present the first quantitative framework to compare and contrast diseases by an integrated analysis of disease-related mRNA expression data and the human protein interaction network. We identified 4,620 functional modules in the human protein network and provided a quantitative metric to record their responses in 54 diseases leading to 138 significant similarities between diseases. Fourteen of the significant disease correlations also shared common drugs, supporting the hypothesis that similar diseases can be treated by the same drugs, allowing us to make predictions for new uses of existing drugs. Finally, we also identified 59 modules that were dysregulated in at least half of the diseases, representing a common disease-state "signature". These modules were significantly enriched for genes that are known to be drug targets. Interestingly, drugs known to target these genes/proteins are already known to treat significantly more diseases than drugs targeting other genes/proteins, highlighting the importance of these core modules as prime therapeutic opportunities.
Affiliation(s)
- Silpa Suthram
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Joel T. Dudley
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Annie P. Chiang
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Rong Chen
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
- Trevor J. Hastie
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Atul J. Butte
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, United States of America
- Department of Pediatrics, Stanford University, Stanford, California, United States of America
- Lucile Packard Children's Hospital, Palo Alto, California, United States of America
|
2470
|
Gong X, Wu R, Zhang Y, Zhao W, Cheng L, Gu Y, Zhang L, Wang J, Zhu J, Guo Z. Extracting consistent knowledge from highly inconsistent cancer gene data sources. BMC Bioinformatics 2010; 11:76. [PMID: 20137077 PMCID: PMC2832783 DOI: 10.1186/1471-2105-11-76] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 09/15/2009] [Accepted: 02/05/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Hundreds of genes that are causally implicated in oncogenesis have been found and collected in various databases. For efficient application of these abundant but diverse data sources, it is of fundamental importance to evaluate their consistency. RESULTS First, we showed that the lists of cancer genes from some major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly functionally consistent. Finally, we extracted functionally consistent cancer genes from various data sources and collected them in our database F-Census. CONCLUSIONS Although their gene overlap is very low, most cancer gene data sources are highly consistent at the functional level, which indicates that each of them captures a subset of the genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used for cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a number of cancer genes that functionally represent most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
Affiliation(s)
- Xue Gong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China
|
2471
|
Wu CC, Asgharzadeh S, Triche TJ, D'Argenio DZ. Prediction of human functional genetic networks from heterogeneous data using RVM-based ensemble learning. Bioinformatics 2010; 26:807-13. [PMID: 20134029 DOI: 10.1093/bioinformatics/btq044] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION Three major problems confront the construction of a human genetic network from heterogeneous genomics data using kernel-based approaches: definition of a robust gold-standard negative (GSN) set, large-scale learning and massive missing data values. RESULTS The proposed graph-based approach generates a robust GSN for the training process of genetic network construction. The RVM-based ensemble model, which combines AdaBoost with a reduced-feature approach, yields improved performance on large-scale learning problems with massive missing values in comparison to Naïve Bayes. CONTACT dargenio@bmsr.usc.edu SUPPLEMENTARY INFORMATION Supplementary material is available at Bioinformatics online.
Affiliation(s)
- Chia-Chin Wu
- Department of Biomedical Engineering, University of Southern California, Los Angeles, 90089, USA
|
2472
|
Pattin KA, Moore JH. Role for protein-protein interaction databases in human genetics. Expert Rev Proteomics 2010; 6:647-59. [PMID: 19929610 DOI: 10.1586/epr.09.86] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 12/30/2022]
Abstract
Proteomics and the study of protein-protein interactions are becoming increasingly important in our effort to understand human diseases on a system-wide level. Thanks to the development and curation of protein-interaction databases, up-to-date information on these interaction networks is accessible and publicly available to the scientific community. As our knowledge of protein-protein interactions increases, it is important to give thought to the different ways that these resources can impact biomedical research. In this article, we highlight the importance of protein-protein interactions in human genetics and genetic epidemiology. Since protein-protein interactions demonstrate one of the strongest functional relationships between genes, combining genomic data with available proteomic data may provide us with a more in-depth understanding of common human diseases. In this review, we will discuss some of the fundamentals of protein interactions, the databases that are publicly available and how information from these databases can be used to facilitate genome-wide genetic studies.
Affiliation(s)
- Kristine A Pattin
- Computational Genetics Laboratory and Department of Genetics, Dartmouth Medical School, Lebanon, NH, USA.
|
2473
|
Guarracino MR, Nebbia A, Manna V, Chinchuluun A, Pardalos PM. Efficient Prediction of Protein-Protein Interactions Using Sequence Information. 2010 INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS 2010:677-682. [DOI: 10.1109/cisis.2010.161] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
|
2474
|
Garcia-Garcia J, Guney E, Aragues R, Planas-Iglesias J, Oliva B. Biana: a software framework for compiling biological interactions and analyzing networks. BMC Bioinformatics 2010; 11:56. [PMID: 20105306 PMCID: PMC3098100 DOI: 10.1186/1471-2105-11-56] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 08/17/2009] [Accepted: 01/27/2010] [Indexed: 12/13/2022] Open
Abstract
Background The analysis and usage of biological data is hindered by the spread of information across multiple repositories and the difficulties posed by different nomenclature systems and storage formats. In particular, there is an important need for data unification in the study and use of protein-protein interactions. Without good integration strategies, it is difficult to analyze the whole set of available data and its properties. Results We introduce BIANA (Biologic Interactions and Network Analysis), a tool for biological information integration and network management. BIANA is a Python framework designed to achieve two major goals: i) the integration of multiple sources of biological information, including biological entities and their relationships, and ii) the management of biological information as a network where entities are nodes and relationships are edges. Moreover, BIANA uses properties of proteins and genes to infer latent biomolecular relationships by transferring edges to entities sharing similar properties. BIANA is also provided as a plugin for Cytoscape, which allows users to visualize and interactively manage the data. A web interface to BIANA providing basic functionalities is also available. The software can be downloaded under GNU GPL license from http://sbi.imim.es/web/BIANA.php. Conclusions BIANA's approach to data unification solves many of the nomenclature issues common to systems dealing with biological data. BIANA can easily be extended to handle new specific data repositories and new specific data types. The unification protocol allows BIANA to be a flexible tool suitable for different user requirements: non-expert users can use a suggested unification protocol while expert users can define their own specific unification rules.
Affiliation(s)
- Javier Garcia-Garcia
- Structural Bioinformatics Lab, Universitat Pompeu Fabra-IMIM, Barcelona Research Park of Biomedicine, Barcelona, Catalonia, Spain
|
2475
|
Huang Y, Li S. Detection of characteristic sub pathway network for angiogenesis based on the comprehensive pathway network. BMC Bioinformatics 2010; 11 Suppl 1:S32. [PMID: 20122205 PMCID: PMC3009504 DOI: 10.1186/1471-2105-11-s1-s32] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 12/12/2022] Open
Abstract
Background Pathways in biological systems often cooperate with each other to function. Changes in the interactions among pathways are tightly associated with alterations in the properties and functions of the cell and hence with alterations in the phenotype. The pathway interactions, and especially their changes over time corresponding to a specific phenotype, are therefore critical to understanding cell functions and phenotypic plasticity. Methods With prior-defined pathways and incorporated protein-protein interaction (PPI) data, we counted PPIs between the corresponding gene sets of each pair of distinct pathways to construct a comprehensive pathway network. We then proposed a novel concept, the characteristic sub pathway network (CSPN), to capture the phenotype-specific pathway interactions. By adding gene expression data for a given phenotype, angiogenesis, we derived the active PPIs corresponding to stimulation of human umbilical vein endothelial cells (HUVECs) with interleukin-1 (IL-1) and with tumor necrosis factor α (TNF-α), respectively. Two kinds of CSPN, namely the static and the dynamic CSPN, were detected by counting active PPIs. Results A comprehensive pathway network containing 37 signalling pathways as nodes and 263 pathway interactions was obtained. Two phenotype-specific CSPNs for angiogenesis, corresponding to stimulation of HUVECs with IL-1 and with TNF-α respectively, were addressed. From the phenotype-specific CSPNs, a static CSPN involving interactions among the B cell receptor, T cell receptor, Toll-like receptor, MAPK, VEGF, and ErbB signalling pathways, and a dynamic CSPN involving interactions among the TGF-β, Wnt and p53 signalling pathways and the cell cycle pathway, were detected for angiogenesis in HUVECs after stimulation with IL-1 and TNF-α respectively. We inferred that, in certain cases, the static CSPN maintains related basic functions of the cells, whereas the dynamic CSPN manifests the cells' plastic responses to stimulus and therefore reflects the cells' phenotypic plasticity.
Conclusion The comprehensive pathway network helps us understand the cooperative behaviours among pathways. Moreover, the two kinds of potential CSPNs found in this work, the static CSPN and the dynamic CSPN, help to clarify the specific function of HUVECs and their phenotypic plasticity with regard to angiogenesis.
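The counting step described in the Methods above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_pathway_network`, the `min_count` threshold, the toy gene and pathway names, and the rule that a PPI counts for a pathway pair when one partner lies in each gene set are all assumptions made for illustration.

```python
from itertools import combinations

def build_pathway_network(pathways, ppis, min_count=1):
    """Count PPIs linking the gene sets of each pair of distinct pathways.

    pathways: dict mapping pathway name -> set of gene identifiers
    ppis: iterable of (gene_a, gene_b) interaction pairs
    Returns a dict mapping (pathway_a, pathway_b) -> number of linking PPIs.
    """
    # Treat interactions as unordered pairs; drop duplicates and self-interactions.
    edges = {frozenset(pair) for pair in ppis if pair[0] != pair[1]}
    network = {}
    for a, b in combinations(sorted(pathways), 2):
        count = 0
        for edge in edges:
            u, v = tuple(edge)
            # Assumed rule: one partner in pathway a, the other in pathway b.
            if (u in pathways[a] and v in pathways[b]) or \
               (v in pathways[a] and u in pathways[b]):
                count += 1
        if count >= min_count:
            network[(a, b)] = count
    return network

# Toy data (hypothetical gene names).
toy_pathways = {"MAPK": {"A", "B"}, "VEGF": {"C", "D"}, "Wnt": {"E"}}
toy_ppis = [("A", "C"), ("C", "A"), ("B", "D"), ("A", "B"), ("D", "E")]
toy_network = build_pathway_network(toy_pathways, toy_ppis)
print(toy_network)
```

Restricting `ppis` to interactions whose partners are both expressed under a given stimulation would give an "active PPI" count in the same way; how activity is defined is left open here because the abstract does not specify it.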
Affiliation(s)
- Yezhou Huang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Div, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, PR China.
|
2476
|
Abstract
BACKGROUND Keyword matching or ID matching is the most common searching method in a large database of protein-protein interactions. They are purely syntactic methods, and retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few search results or no results despite many potential matches present in the database. RESULTS We have developed a new method for representing protein-protein interactions and the Gene Ontology (GO) using modified Gödel numbers. This representation is hidden from users but enables a search engine using the representation to efficiently search protein-protein interactions in a biologically meaningful way. Given a query protein with optional search conditions expressed in one or more GO terms, the search engine finds all the interaction partners of the query protein by unique prime factorization of the modified Gödel numbers representing the query protein and the search conditions. CONCLUSION Representing the biological relations of proteins and their GO annotations by modified Gödel numbers makes a search engine efficiently find all protein-protein interactions by prime factorization of the numbers. Keyword matching or ID matching search methods often miss the interactions involving a protein that has no explicit annotations matching the search condition, but our search engine retrieves such interactions as well if they satisfy the search condition with a more specific term in the ontology.
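The idea of answering ontology-aware queries by divisibility can be sketched as follows. This is a deliberately simplified stand-in, not the modified Gödel numbering of the paper, whose details the abstract does not give. It assumes that each GO term receives its own prime, that a protein's code is the product of the primes of its annotated terms and all their ancestors, and that every parent term appears as a key of `parents`. All function names, term names and protein names are hypothetical.

```python
from math import prod

def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a toy ontology)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def assign_primes(parents):
    """Give every term in the ontology a distinct prime."""
    gen = primes()
    return {term: next(gen) for term in sorted(parents)}

def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen

def annotation_code(terms, parents, prime_of):
    """Product of the primes of the annotated terms and their ancestors."""
    closure = set()
    for t in terms:
        closure |= ancestors(t, parents)
    return prod(prime_of[t] for t in closure)

def find_partners(query, go_term, interactions, codes, prime_of):
    """Partners of `query` whose code is divisible by the prime of `go_term`."""
    p = prime_of[go_term]
    partners = {b if a == query else a for a, b in interactions if query in (a, b)}
    return sorted(x for x in partners if codes[x] % p == 0)

# Toy ontology: term -> list of parent terms.
parents = {
    "binding": [],
    "protein binding": ["binding"],
    "kinase binding": ["protein binding"],
    "catalytic activity": [],
}
prime_of = assign_primes(parents)
annotations = {"P1": ["binding"], "P2": ["kinase binding"], "P3": ["catalytic activity"]}
codes = {name: annotation_code(t, parents, prime_of) for name, t in annotations.items()}
interactions = [("P1", "P2"), ("P3", "P1")]

# P2 is annotated only with the more specific "kinase binding",
# yet it is found by a query for "protein binding".
print(find_partners("P1", "protein binding", interactions, codes, prime_of))
```

A keyword match on "protein binding" would miss P2 in this toy data, which is the situation the abstract describes; the divisibility test finds it because the ancestor's prime is a factor of P2's code.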
Affiliation(s)
- Byungkyu Park
- School of Computer Science and Engineering, Inha University, Incheon 402-751, South Korea
- Kyungsook Han
- School of Computer Science and Engineering, Inha University, Incheon 402-751, South Korea
|
2477
|
Kandasamy K, Mohan SS, Raju R, Keerthikumar S, Kumar GSS, Venugopal AK, Telikicherla D, Navarro JD, Mathivanan S, Pecquet C, Gollapudi SK, Tattikota SG, Mohan S, Padhukasahasram H, Subbannayya Y, Goel R, Jacob HKC, Zhong J, Sekhar R, Nanjappa V, Balakrishnan L, Subbaiah R, Ramachandra YL, Rahiman BA, Prasad TSK, Lin JX, Houtman JCD, Desiderio S, Renauld JC, Constantinescu SN, Ohara O, Hirano T, Kubo M, Singh S, Khatri P, Draghici S, Bader GD, Sander C, Leonard WJ, Pandey A. NetPath: a public resource of curated signal transduction pathways. Genome Biol 2010; 11:R3. [PMID: 20067622 PMCID: PMC2847715 DOI: 10.1186/gb-2010-11-1-r3] [Citation(s) in RCA: 351] [Impact Index Per Article: 23.4] [Received: 04/21/2009] [Revised: 11/02/2009] [Accepted: 01/12/2010] [Indexed: 12/18/2022] Open
Abstract
NetPath, a novel community resource of curated human signaling pathways, is presented and its utility demonstrated using immune signaling data. We have developed NetPath as a resource of curated human signaling pathways. As an initial step, NetPath provides detailed maps of a number of immune signaling pathways, which include approximately 1,600 reactions annotated from the literature and more than 2,800 instances of transcriptionally regulated genes, all linked to over 5,500 published articles. We anticipate that NetPath will become a consolidated resource for human signaling pathways that should enable systems biology approaches.
Affiliation(s)
- Kumaran Kandasamy
- Institute of Bioinformatics, International Tech Park, Bangalore 560066, India.
|
2478
|
Lin J, Xie Z, Zhu H, Qian J. Understanding protein phosphorylation on a systems level. Brief Funct Genomics 2010; 9:32-42. [PMID: 20056723 DOI: 10.1093/bfgp/elp045] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 01/08/2023] Open
Abstract
Protein kinase phosphorylation is central to the regulation and control of protein and cellular function. Over the past decade, the development of many high-throughput approaches has revolutionized the understanding of protein phosphorylation and allowed rapid and unbiased surveys of phosphoproteins and phosphorylation events. In addition to this technological advancement, there have also been computational improvements; recent studies on network models of protein phosphorylation have provided many insights into the cellular processes and pathways regulated by phosphorylation. This article gives an overview of experimental and computational techniques for identifying and analyzing protein phosphorylation on a systems level.
Affiliation(s)
- Jimmy Lin
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
|
2479
|
Dong H, Hong S, Xu X, Xiao Y, Jin L, Xiong M. Meta-analysis and Network Analysis of Five Ovarian Cancer Gene Expression Dataset. 2010 THIRD INTERNATIONAL JOINT CONFERENCE ON COMPUTATIONAL SCIENCE AND OPTIMIZATION 2010:242-246. [DOI: 10.1109/cso.2010.245] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
|
2480
|
Vandin F, Upfal E, Raphael BJ. Algorithms for Detecting Significantly Mutated Pathways in Cancer. LECTURE NOTES IN COMPUTER SCIENCE 2010:506-521. [DOI: 10.1007/978-3-642-12683-3_33] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/03/2025]
|
2481
|
Hernandez M, Lachmann A, Zhao S, Xiao K, Ma'ayan A. Inferring the Sign of Kinase-Substrate Interactions by Combining Quantitative Phosphoproteomics with a Literature-Based Mammalian Kinome Network. PROCEEDINGS. IEEE INTERNATIONAL SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING 2010; 2010:180-184. [PMID: 21552464 PMCID: PMC3087296 DOI: 10.1109/bibe.2010.75] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
Protein phosphorylation is a reversible post-translational modification commonly used by cell signaling networks to transmit information about the extracellular environment into intracellular organelles for the regulation of the activity and sorting of proteins within the cell. For this study we reconstructed a literature-based mammalian kinase-substrate network from several online resources. The interactions within this directed graph network connect kinases to their substrates through specific phosphosites, including kinase-kinase regulatory interactions. However, the "signs" of links, that is, activation or inhibition of the substrate upon phosphorylation, are mostly unknown within this network. Here we show how we can infer the "signs" indirectly using data from quantitative phosphoproteomics experiments applied to mammalian cells combined with the literature-based kinase-substrate network. Our inference method was able to predict the sign for 321 links and 153 phosphosites on 120 kinases, resulting in a signed and directed subnetwork of mammalian kinase-kinase interactions. Such an approach can rapidly advance the reconstruction of cell signaling pathways and networks regulating mammalian cells.
Affiliation(s)
- Marylens Hernandez
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, NY 10029, USA
- Alexander Lachmann
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, NY 10029, USA
- Shan Zhao
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, NY 10029, USA
- Kunhong Xiao
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA
- Avi Ma'ayan
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, NY 10029, USA
|
2482
|
Iacucci E, Moreau Y. Towards Better Receptor-Ligand Prioritization: How Machine Learning on Protein-Protein Interaction Data Can Provide Insight Into Receptor-Ligand Pairs. LECTURE NOTES IN COMPUTER SCIENCE 2010:267-271. [DOI: 10.1007/978-3-642-15819-3_35] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
2483
|
Ochs MF. Knowledge-based data analysis comes of age. Brief Bioinform 2010; 11:30-9. [PMID: 19854753 PMCID: PMC3700349 DOI: 10.1093/bib/bbp044] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 07/08/2009] [Revised: 09/03/2009] [Indexed: 12/16/2022] Open
Abstract
The emergence of high-throughput technologies for measuring biological systems has introduced problems for data interpretation that must be addressed for proper inference. First, analysis techniques need to be matched to the biological system, reflecting in their mathematical structure the underlying behavior being studied. When this is not done, mathematical techniques will generate answers, but the values and reliability estimates may not accurately reflect the biology. Second, analysis approaches must address the vast excess in variables measured (e.g. transcript levels of genes) over the number of samples (e.g. tumors, time points), known as the 'large-p, small-n' problem. In large-p, small-n paradigms, standard statistical techniques generally fail, and computational learning algorithms are prone to overfit the data. Here we review the emergence of techniques that match mathematical structure to the biology, the use of integrated data and prior knowledge to guide statistical analysis, and the recent emergence of analysis approaches utilizing simple biological models. We show that novel biological insights have been gained using these techniques.
Affiliation(s)
- Michael F Ochs
- Division of Oncology Biostatistics and Bioinformatics, 550 North Broadway, Suite 1103, Johns Hopkins University, Baltimore, MD 21205, USA.
|
2484
|
Joughin BA, Cheung E, Karuturi RKM, Saez-Rodriguez J, Lauffenburger DA, Liu ET. Cellular Regulatory Networks. SYSTEMS BIOMEDICINE 2010:57-108. [DOI: 10.1016/b978-0-12-372550-9.00004-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
|
2485
|
Liu GG, Fong E, Zeng X. GNCPro: navigate human genes and relationships through net-walking. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2010; 680:253-9. [PMID: 20865508 DOI: 10.1007/978-1-4419-5913-3_29] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
The use of computational applications in biological research is significantly lagging behind other scientific research areas such as physics, mathematics, and geology; more in silico tools are needed. The increasing complexity of biological data makes it more and more difficult for scientists to verify their hypotheses and results against existing discoveries. GNCPro is a free data integration and visualization tool for gaining comprehensive overviews of such complicated biological knowledge. In particular, GNCPro warehouses and encodes biological information as binary relationships. When represented graphically, these binary relationships take on the form of edges that connect the genes and proteins, which are represented by nodes. By using distinguishing features such as colors, shape, and opacity, GNCPro provides a stimulating visual experience in which the user can quickly identify groups of genes by annotations and the types of relationships involved. GNCPro integrates human gene expressions, regulations, gene product modifications, and interactions into one platform while delivering a simple and powerful user interface for systems biology study. AVAILABILITY http://GNCPro.sabiosciences.com.
Affiliation(s)
- Guozhen Gordon Liu
- SABiosciences Corporation, 6951 Executive Way, Frederick, MD 21703, USA.
|
2486
|
Syed AS, D’Antonio M, Ciccarelli FD. Network of Cancer Genes: a web resource to analyze duplicability, orthology and network properties of cancer genes. Nucleic Acids Res 2010; 38:D670-5. [PMID: 19906700 PMCID: PMC2808873 DOI: 10.1093/nar/gkp957] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 08/12/2009] [Revised: 10/02/2009] [Accepted: 10/13/2009] [Indexed: 01/19/2023] Open
Abstract
The Network of Cancer Genes (NCG) collects and integrates data on 736 human genes that are mutated in various types of cancer. For each gene, NCG provides information on duplicability, orthology, evolutionary appearance and topological properties of the encoded protein in a comprehensive version of the human protein-protein interaction network. NCG also stores information on all primary interactors of cancer proteins, thus providing a complete overview of 5357 proteins that constitute direct and indirect determinants of human cancer. With the constant delivery of results from the mutational screenings of cancer genomes, NCG represents a versatile resource for retrieving detailed information on particular cancer genes, as well as for identifying common properties of precompiled lists of cancer genes. NCG is freely available at: http://bio.ifom-ieo-campus.it/ncg.
Affiliation(s)
- Francesca D. Ciccarelli
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy
|
2487
|
Zhao J, Jiang P, Zhang W. Molecular networks for the study of TCM pharmacology. Brief Bioinform 2009; 11:417-30. [PMID: 20038567 DOI: 10.1093/bib/bbp063] [Citation(s) in RCA: 163] [Impact Index Per Article: 10.2] [Indexed: 12/12/2022] Open
Abstract
To target complex, multi-factorial diseases more effectively, there has been an emerging trend of multi-target drug development based on network biology, as well as an increasing interest in traditional Chinese medicine (TCM), which applies a more holistic treatment to diseases. Thousands of years of clinical practice in TCM have accumulated a considerable number of formulae that exhibit reliable in vivo efficacy and safety. However, the molecular mechanisms responsible for their therapeutic effectiveness are still unclear. The development of network-based systems biology has provided considerable support for the understanding of the holistic, complementary and synergic essence of TCM in the context of molecular networks. This review introduces available sources and methods that could be utilized for the network-based study of TCM pharmacology, proposes a workflow for network-based TCM pharmacology study, and presents two case studies on applying these sources and methods to understand the mode of action of TCM recipes.
Affiliation(s)
- Jing Zhao
- Department of Natural Medicinal Chemistry, Second Military Medical University, PR China
|
2488
|
Yang G, Li Q, Ren S, Lu X, Fang L, Zhou W, Zhang F, Xu F, Zhang Z, Zeng R, Lottspeich F, Chen Z. Proteomic, functional and motif-based analysis of C-terminal Src kinase-interacting proteins. Proteomics 2009; 9:4944-61. [PMID: 19743411 DOI: 10.1002/pmic.200800762] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 01/01/2023]
Abstract
C-terminal Src kinase (Csk), which functions as an essential negative regulator of Src family tyrosine kinases (SFKs), interacts with tyrosine-phosphorylated molecules through its Src homology 2 (SH2) domain, allowing it to be targeted to the sites of SFKs and concomitantly enhancing its kinase activity. Identification of additional Csk-interacting proteins is expected to reveal potential signaling targets and previously undescribed functions of Csk. In this study, using a direct proteomic approach, we identified 151 novel potential Csk-binding partners, which are associated with a wide range of biological functions. Bioinformatics analysis showed that the majority of identified proteins contain one or several Csk-SH2 domain-binding motifs, indicating a potentially direct interaction with Csk. The interactions of Csk with four proteins (partitioning defective 3 (Par3), DDR1, SYK and protein kinase C iota) were confirmed using biochemical approaches, and phosphotyrosine 1127 of the Par3 C-terminus was shown to bind directly to the Csk-SH2 domain, consistent with predictions from in silico analysis. Finally, immunofluorescence experiments revealed co-localization of Csk with Par3 in tight junctions (TJs) in a tyrosine phosphorylation-dependent manner, and overexpression of Csk, but not of its SH2-domain mutant lacking binding to phosphotyrosine, promoted TJ assembly in Madin-Darby canine kidney cells, implying the involvement of the Csk-SH2 domain in regulating cellular TJs. In conclusion, the newly identified potential interacting partners of Csk provide new insights into its functional diversity in the regulation of numerous cellular events, in addition to controlling SFK activity.
Affiliation(s)
- Guang Yang
- State Key Laboratory of Molecular Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, PR China
|
2489
|
Ramírez F, Albrecht M. Finding scaffold proteins in interactomes. Trends Cell Biol 2009; 20:2-4. [PMID: 20005715 DOI: 10.1016/j.tcb.2009.11.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 08/07/2009] [Revised: 11/10/2009] [Accepted: 11/16/2009] [Indexed: 11/29/2022]
|
2490
|
Annibale A, Coolen A, Fernandes L, Fraternali F, Kleinjung J. Tailored graph ensembles as proxies or null models for real networks I: tools for quantifying structure. Journal of Physics A: Mathematical and General 2009; 42:485001. [PMID: 20844594 PMCID: PMC2938474 DOI: 10.1088/1751-8113/42/48/485001] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 05/07/2023]
Abstract
We study the tailoring of structured random graph ensembles to real networks, with the objective of generating precise and practical mathematical tools for quantifying and comparing network topologies macroscopically, beyond the level of degree statistics. Our family of ensembles can produce graphs with any prescribed degree distribution and any degree-degree correlation function, its control parameters can be calculated fully analytically, and as a result we can calculate (asymptotically) formulae for entropies and complexities, and for information-theoretic distances between networks, expressed directly and explicitly in terms of their measured degree distribution and degree correlations.
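The two measured quantities the abstract refers to, the degree distribution and the degree-degree correlations, can be read off an edge list directly. The following is a minimal sketch and not the authors' code: the function name `degree_statistics` and the toy edge list are hypothetical, the graph is assumed to be undirected and simple, and isolated nodes (which never appear in an edge list) are ignored.

```python
from collections import Counter


def degree_statistics(edges):
    """Return (p, W) for an undirected edge list.

    p[k]       : fraction of nodes with degree k
    W[(k, k')] : fraction of edge ends joining a degree-k node to a degree-k' node
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    n = len(degree)
    p = {k: count / n for k, count in Counter(degree.values()).items()}

    joint = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions so that W is symmetric.
        joint[(degree[u], degree[v])] += 1
        joint[(degree[v], degree[u])] += 1
    total = sum(joint.values())
    W = {pair: count / total for pair, count in joint.items()}
    return p, W


# Hypothetical toy graph: a path a - b - c.
p, W = degree_statistics([("a", "b"), ("b", "c")])
```

Both dictionaries sum to one, so they can be compared between networks of different sizes.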
Affiliation(s)
- A Annibale
- Department of Mathematics, King's College London, The Strand, London WC2R 2LS, United Kingdom
|
2491
|
Yang JO, Kim WY, Jeong SY, Oh JH, Jho S, Bhak J, Kim NS. PDbase: a database of Parkinson's disease-related genes and genetic variation using substantia nigra ESTs. BMC Genomics 2009; 10 Suppl 3:S32. [PMID: 19958497 PMCID: PMC2788386 DOI: 10.1186/1471-2164-10-s3-s32] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 02/03/2023] Open
Abstract
Background Parkinson's disease (PD) is one of the most common neurodegenerative disorders, clinically characterized by impaired motor function. Since the etiology of PD is diverse and complex, many researchers have created PD-related research resources. However, resources for brain and PD studies are still lacking. Therefore, we have constructed a database of PD-related genes and genetic variations using the substantia nigra (SN) in PD and normal tissues. In addition, we integrated PD-related information from several resources. Results We collected 6,130 SN expressed sequence tags (ESTs) from normal brain SN tissues and from PD patients' SN tissues using the full-cDNA library and normalized cDNA library construction methods from our previous study. The SN ESTs were clustered into 2,951 unigene clusters and assigned to 2,678 genes. We then found 57 up-regulated and 48 down-regulated genes by comparing normal and PD SN EST frequencies with a cut-off probability of differential expression over 0.9, based on the Audic and Claverie method. In addition, we integrated disease-related information from public resources. To examine the characteristics of these PD-related genes, we analyzed alternative splicing events, single nucleotide polymorphism (SNP) markers located in the gene regions, repeat elements, gene regulation elements, and pathways and protein-protein interaction networks. Conclusion We constructed the PDbase database to capture PD-related genes, genetic variation, and functional elements. This database contains 2,698 PD-related genes identified through ESTs from human normal and PD patients' SN tissues and through the integration of several public resources. PDbase provides mitochondrial proteins, microRNA gene regulation elements, SNP markers within PD-related gene structures, repeat elements, and pathways and networks with protein-protein interaction information. The PDbase information can aid in understanding the causation of PD. It is available at http://bioportal.kobic.re.kr/PDbase/. Supplementary data are available at http://bioportal.kobic.re.kr/PDbase/suppl.jsp
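For readers unfamiliar with the Audic and Claverie method named in the abstract, the sketch below evaluates its published conditional probability p(y | x) = (N2/N1)^y (x + y)! / (x! y! (1 + N2/N1)^(x + y + 1)), where x and y are the tag counts of one gene in two libraries of total size N1 and N2. It is an illustration only: the function name is hypothetical, and the abstract does not say how the authors turned this probability into their 0.9 cut-off, so no thresholding is shown.

```python
from math import exp, lgamma, log


def audic_claverie_p(x, y, n1, n2):
    """Probability of observing y tags in library 2 (size n2)
    given x tags for the same gene in library 1 (size n1)."""
    ratio = n2 / n1
    # Work in log space; lgamma(k + 1) equals log(k!).
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1 + ratio)
    )
    return exp(log_p)


# Hypothetical counts: 12 ESTs for a gene in one library, 3 in the other,
# with both libraries assumed to contain 3,000 ESTs.
p_value_like = audic_claverie_p(12, 3, 3000, 3000)
```

Summing this probability over a tail of y values gives a significance level for the observed difference in counts.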
Affiliation(s)
- Jin Ok Yang
- Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), 111 Gwahangno, Yuseong-gu, Daejeon 305-806, Korea.
|
2492
|
Zhao M, Qu H. Human liver rate-limiting enzymes influence metabolic flux via branch points and inhibitors. BMC Genomics 2009; 10 Suppl 3:S31. [PMID: 19958496 PMCID: PMC2788385 DOI: 10.1186/1471-2164-10-s3-s31] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022] Open
Abstract
Background Rate-limiting enzymes, because of their relatively low velocity, are believed to influence metabolic flux in pathways. To investigate their regulatory role in metabolic networks, we look at the global organization and interactions between rate-limiting enzymes and compounds such as branch point metabolites and enzyme inhibitors in human liver. Results Based on 96 rate-limiting enzymes and 132 branch point compounds from human liver, we found that rate-limiting enzymes surrounded 76.5% of branch points. In a compound conversion network from human liver, the 128 branch points involved showed a dramatically higher average degree, betweenness centrality and closeness centrality as a whole. Nearly half of the in vivo inhibitors were products of rate-limiting enzymes, and covered 75.34% of the inhibited targets in metabolic inhibitory networks. Conclusion From global topological organization, rate-limiting enzymes as a whole surround most of the branch points; so they can influence the flux through branch points. Since nearly half of the in vivo enzyme inhibitors are produced by rate-limiting enzymes in human liver, these enzymes can initiate inhibitory regulation and then influence metabolic flux through their natural products.
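Two of the centrality measures the abstract compares, degree and closeness, can be computed with a breadth-first search. This is a minimal sketch on a hypothetical toy network, not the authors' pipeline: the adjacency dictionary, the node names and the function names are all assumptions, and closeness is taken here as the number of reachable nodes divided by the sum of their distances.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def degree_and_closeness(adj):
    """Map each node to (degree, closeness)."""
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        reachable = len(dist) - 1  # exclude the node itself
        closeness = reachable / total if total else 0.0
        result[node] = (len(adj[node]), closeness)
    return result


# Hypothetical compound network: B plays the role of a branch point.
toy = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
scores = degree_and_closeness(toy)
```

In this toy network the branch-point node B has both the highest degree and the highest closeness, which is the kind of contrast the abstract reports for branch points as a group.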
Affiliation(s)
- Min Zhao
- Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, 100871, PR China.
|
2493
|
MicroRNAs: potential regulators involved in human anencephaly. Int J Biochem Cell Biol 2009; 42:367-74. [PMID: 19962448 DOI: 10.1016/j.biocel.2009.11.023] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 09/16/2009] [Revised: 11/05/2009] [Accepted: 11/11/2009] [Indexed: 11/21/2022]
Abstract
MicroRNAs (miRNAs) are posttranscriptional regulators of messenger RNA activity. Neural tube defects (NTDs) are severe congenital anomalies that substantially impact an infant's morbidity and mortality. The miRNAs are known to be dynamically regulated during neurodevelopment; their role in human NTDs, however, is still unknown. In this study, we show the presence of a specific miRNA expression profile from tissues of fetuses with anencephaly, one of the most severe forms of NTDs. Furthermore, we map the target genes of these miRNAs in the human genome. In comparison to healthy human fetal brain tissues, tissues from fetuses with anencephaly exhibited 97 down-regulated and 116 up-regulated miRNAs. The microarray findings were extended using real-time qRT-PCR for nine miRNAs. Specifically, of these validated miRNAs, miR-126, miR-198, and miR-451 were up-regulated, while miR-9, miR-212, miR-124, miR-138, and miR-103/107 were down-regulated in the tissues of fetuses with anencephaly. A bioinformatic analysis showed 881 potential target genes that are regulated by the validated miRNAs. Seventy-nine of these potential genes are involved in a protein interaction network. There were 6 co-occurrence annotations within the GOSlim process and 7 co-occurrence annotations within the GOSlim function found by GeneCodis 2.0. Our results suggest that miRNA dysregulation is possibly involved in the pathogenesis of anencephaly.
|
2494
|
Ali W, Deane CM. Functionally guided alignment of protein interaction networks for module detection. Bioinformatics 2009; 25:3166-73. [PMID: 19797409 PMCID: PMC2778333 DOI: 10.1093/bioinformatics/btp569] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 06/17/2009] [Revised: 09/25/2009] [Accepted: 09/29/2009] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Functional module detection within protein interaction networks is a challenging problem due to the sparsity of data and presence of errors. Computational techniques for this task range from purely graph theoretical approaches involving single networks to alignment of multiple networks from several species. Current network alignment methods all rely on protein sequence similarity to map proteins across species. RESULTS Here we carry out network alignment using a protein functional similarity measure. We show that using functional similarity to map proteins across species improves network alignment in terms of functional coherence and overlap with experimentally verified protein complexes. Moreover, the results from functional similarity-based network alignment display little overlap (<15%) with sequence similarity-based alignment. Our combined approach integrating sequence and function-based network alignment alongside graph clustering properties offers a 200% increase in coverage of experimental datasets and comparable accuracy to current network alignment methods. AVAILABILITY Program binaries and source code are freely available at http://www.stats.ox.ac.uk/research/bioinfo/resources. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Waqar Ali
- Department of Statistics, University of Oxford, OX1 3TG, UK.
|
2495
|
Amanchy R, Zhong J, Hong R, Kim JH, Gucek M, Cole RN, Molina H, Pandey A. Identification of c-Src tyrosine kinase substrates in platelet-derived growth factor receptor signaling. Mol Oncol 2009; 3:439-50. [PMID: 19632164 PMCID: PMC2783305 DOI: 10.1016/j.molonc.2009.07.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Received: 05/02/2009] [Revised: 06/17/2009] [Accepted: 07/04/2009] [Indexed: 11/20/2022] Open
Abstract
c-Src non-receptor tyrosine kinase is an important component of the platelet-derived growth factor (PDGF) receptor signaling pathway. c-Src has been shown to mediate the mitogenic response to PDGF in fibroblasts. However, the exact components of PDGF receptor signaling pathway mediated by c-Src remain unclear. Here, we used stable isotope labeling with amino acids in cell culture (SILAC) coupled with mass spectrometry to identify Src-family kinase substrates involved in PDGF signaling. Using SILAC, we were able to detect changes in tyrosine phosphorylation patterns of 43 potential c-Src kinase substrates in PDGF receptor signaling. This included 23 known c-Src kinase substrates, of which 16 proteins have known roles in PDGF signaling while the remaining 7 proteins have not previously been implicated in PDGF receptor signaling. Importantly, our analysis also led to identification of 20 novel Src-family kinase substrates, of which 5 proteins were previously reported as PDGF receptor signaling pathway intermediates while the remaining 15 proteins represent novel signaling intermediates in PDGF receptor signaling. In validation experiments, we demonstrated that PDGF indeed induced the phosphorylation of a subset of candidate Src-family kinase substrates - Calpain 2, Eps15 and Trim28 - in a c-Src-dependent fashion.
Affiliation(s)
- Ramars Amanchy
- McKusick-Nathans Institute of Genetic Medicine, Departments of Biological Chemistry, Oncology and Pathology, Johns Hopkins University, Baltimore, MD 21205, USA
- Jun Zhong
- McKusick-Nathans Institute of Genetic Medicine, Departments of Biological Chemistry, Oncology and Pathology, Johns Hopkins University, Baltimore, MD 21205, USA
- Rosa Hong
- McKusick-Nathans Institute of Genetic Medicine, Departments of Biological Chemistry, Oncology and Pathology, Johns Hopkins University, Baltimore, MD 21205, USA
- James H. Kim
- McKusick-Nathans Institute of Genetic Medicine, Departments of Biological Chemistry, Oncology and Pathology, Johns Hopkins University, Baltimore, MD 21205, USA
- Marjan Gucek
- Institute of Basic Biomedical Sciences, Mass Spectrometry/Proteomics Facility, Johns Hopkins University, Baltimore, MD 21205, USA
- Robert N. Cole
- Institute of Basic Biomedical Sciences, Mass Spectrometry/Proteomics Facility, Johns Hopkins University, Baltimore, MD 21205, USA
- Henrik Molina
- McKusick-Nathans Institute of Genetic Medicine, Departments of Biological Chemistry, Oncology and Pathology, Johns Hopkins University, Baltimore, MD 21205, USA
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine, Departments of Biological Chemistry, Oncology and Pathology, Johns Hopkins University, Baltimore, MD 21205, USA
|
2496
|
Barrenas F, Chavali S, Holme P, Mobini R, Benson M. Network properties of complex human disease genes identified through genome-wide association studies. PLoS One 2009; 4:e8090. [PMID: 19956617 PMCID: PMC2779513 DOI: 10.1371/journal.pone.0008090] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.1] [Received: 09/15/2009] [Accepted: 11/03/2009] [Indexed: 11/21/2022] Open
Abstract
Background Previous studies of network properties of human disease genes have mainly focused on monogenic diseases or cancers and have suffered from discovery bias. Here we investigated the network properties of complex disease genes identified by genome-wide association studies (GWAs), thereby eliminating discovery bias. Principal findings We derived a network of complex diseases (n = 54) and complex disease genes (n = 349) to explore the shared genetic architecture of complex diseases. We evaluated the centrality measures of complex disease genes in comparison with essential and monogenic disease genes in the human interactome. The complex disease network showed that diseases belonging to the same disease class do not always share common disease genes. A possible explanation could be that the variants with higher minor allele frequency and larger effect size identified using GWAs constitute disjoint parts of the allelic spectra of similar complex diseases. The complex disease gene network showed high modularity with the size of the largest component being smaller than expected from a randomized null-model. This is consistent with limited sharing of genes between diseases. Complex disease genes are less central than the essential and monogenic disease genes in the human interactome. Genes associated with the same disease, compared to genes associated with different diseases, more often tend to share a protein-protein interaction and a Gene Ontology Biological Process. Conclusions This indicates that network neighbors of known disease genes form an important class of candidates for identifying novel genes for the same disease.
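The "size of the largest component" statistic mentioned in the abstract can be illustrated with a small union-find pass over disease-gene associations, linking genes that share a disease. This is a sketch under stated assumptions, not the authors' analysis: the function name and the toy associations are hypothetical, and the comparison against a randomized null model is not reproduced here.

```python
from collections import Counter, defaultdict


def largest_gene_component(associations):
    """Size of the largest connected component of the gene network
    in which two genes are linked when they share a disease."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    genes_by_disease = defaultdict(list)
    for disease, gene in associations:
        genes_by_disease[disease].append(gene)
        find(gene)  # register the gene even if it is alone

    for genes in genes_by_disease.values():
        for gene in genes[1:]:
            parent[find(gene)] = find(genes[0])

    sizes = Counter(find(gene) for gene in parent)
    return max(sizes.values()) if sizes else 0


# Hypothetical associations: d1 and d2 share gene g2, d3 stands alone.
toy = [("d1", "g1"), ("d1", "g2"), ("d2", "g2"), ("d2", "g3"), ("d3", "g4")]
size = largest_gene_component(toy)
```

Running the same function on shuffled associations would give the null distribution against which an observed component size could be judged.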
Affiliation(s)
- Fredrik Barrenas
- The Unit for Clinical Systems Biology, University of Gothenburg, Gothenburg, Sweden.
|
2497
|
Gould CM, Diella F, Via A, Puntervoll P, Gemünd C, Chabanis-Davidson S, Michael S, Sayadi A, Bryne JC, Chica C, Seiler M, Davey NE, Haslam N, Weatheritt RJ, Budd A, Hughes T, Pas J, Rychlewski L, Travé G, Aasland R, Helmer-Citterich M, Linding R, Gibson TJ. ELM: the status of the 2010 eukaryotic linear motif resource. Nucleic Acids Res 2009; 38:D167-80. [PMID: 19920119 PMCID: PMC2808914 DOI: 10.1093/nar/gkp1016] [Citation(s) in RCA: 196] [Impact Index Per Article: 12.3] [Indexed: 01/06/2023] Open
Abstract
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a 'Bar Code' format, which also displays known instances from homologous proteins through a novel 'Instance Mapper' protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.
Affiliation(s)
- Cathryn M Gould
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
|
2498
|
Dogrusoz U, Cetintas A, Demir E, Babur O. Algorithms for effective querying of compound graph-based pathway databases. BMC Bioinformatics 2009; 10:376. [PMID: 19917102 PMCID: PMC2784781 DOI: 10.1186/1471-2105-10-376] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 11/02/2008] [Accepted: 11/16/2009] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to programmatically integrate cellular networks and to investigate them using the well-understood concepts of graph theory in order to predict their structural and dynamic properties. An extension of this graph representation, namely hierarchically structured or compound graphs, in which a member of a biological network may recursively contain a sub-network of a somehow logically similar group of biological objects, provides many additional benefits for analysis of biological pathways, including reduction of complexity by decomposition into distinct components or modules. In this regard, it is essential to effectively query such integrated large compound networks to extract the sub-networks of interest with the help of efficient algorithms and software tools. RESULTS Towards this goal, we developed a querying framework, along with a number of graph-theoretic algorithms from simple neighborhood queries to shortest paths to feedback loops, that is applicable to all sorts of graph-based pathway databases, from PPIs (protein-protein interactions) to metabolic and signaling pathways. The framework is unique in that it can account for compound or nested structures and ubiquitous entities present in the pathway data. In addition, the queries may be related to each other through "AND" and "OR" operators, and can be recursively organized into a tree, in which the result of one query might be a source and/or target for another, to form more complex queries. The algorithms were implemented within the querying component of a new version of the software tool PATIKAweb (Pathway Analysis Tool for Integration and Knowledge Acquisition) and have proven useful for answering a number of biologically significant questions for large graph-based pathway databases. CONCLUSION The PATIKA Project Web site is http://www.patika.org. 
PATIKAweb version 2.1 is available at http://web.patika.org.
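The simplest queries the abstract lists, neighborhood and shortest path, and their combination through "AND" and "OR", can be sketched on a flat graph as follows. This is an illustration under stated assumptions and not the PATIKAweb implementation: the function names and the toy graph are hypothetical, and the compound (nested) structures and ubiquitous entities that the framework handles are deliberately left out.

```python
from collections import deque


def neighborhood(adj, sources, k):
    """All nodes within k hops of any source node."""
    seen = set(sources)
    frontier = set(sources)
    for _ in range(k):
        frontier = {nb for node in frontier for nb in adj.get(node, ())} - seen
        seen |= frontier
    return seen


def shortest_path(adj, start, goal):
    """One shortest path from start to goal as a list of nodes, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in adj.get(node, ()):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None


# Hypothetical pathway graph: a chain a - b - c - d - e.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

near_a = neighborhood(graph, {"a"}, 2)
near_e = neighborhood(graph, {"e"}, 2)
both = near_a & near_e    # "AND": nodes returned by both queries
either = near_a | near_e  # "OR": nodes returned by at least one query
path = shortest_path(graph, "a", "e")
```

Because every query returns a plain set or list of nodes, the result of one query can be passed as the source set of the next, which mirrors the tree-structured composition the abstract describes.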
Affiliation(s)
- Ugur Dogrusoz
- Center for Bioinformatics and Computer Engineering Dept., Bilkent University, Ankara, Turkey
- Ahmet Cetintas
- Center for Bioinformatics and Computer Engineering Dept., Bilkent University, Ankara, Turkey
- Emek Demir
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Ozgun Babur
- Center for Bioinformatics and Computer Engineering Dept., Bilkent University, Ankara, Turkey
|
2499
|
Wang L, Xiong Y, Sun Y, Fang Z, Li L, Ji H, Shi T. HLungDB: an integrated database of human lung cancer research. Nucleic Acids Res 2009; 38:D665-9. [PMID: 19900972 PMCID: PMC2808962 DOI: 10.1093/nar/gkp945] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Indexed: 01/27/2023] Open
Abstract
The human lung cancer database (HLungDB) integrates lung cancer-related genes, proteins and miRNAs together with the corresponding clinical information. The main purpose of this platform is to establish a network of lung cancer-related molecules and to facilitate the mechanistic study of lung carcinogenesis. The entries describing the relationships between molecules and human lung cancer in the current release were extracted manually from the literature. Currently, we have collected 2585 genes and 212 miRNAs with experimental evidence of involvement in the different stages of lung carcinogenesis through text mining. Furthermore, we have incorporated the results from analysis of transcription factor-binding motifs, the promoters and the SNP sites for each gene. Since epigenetic alterations also play an important role in lung carcinogenesis, genes with epigenetic regulation were also included. We hope HLungDB will enrich our knowledge about lung cancer biology and eventually lead to the development of novel therapeutic strategies. HLungDB can be freely accessed at http://www.megabionet.org/bio/hlung.
Affiliation(s)
- Lishan Wang
- Center for Bioinformatics and Computational Biology, and The Institute of Biomedical Sciences, College of Life Science, East China Normal University, Shanghai 200241, China
|
2500
|
Kandasamy K, Keerthikumar S, Raju R, Keshava Prasad TS, Ramachandra YL, Mohan S, Pandey A. PathBuilder--open source software for annotating and developing pathway resources. Bioinformatics 2009; 25:2860-2. [PMID: 19628504 PMCID: PMC2781757 DOI: 10.1093/bioinformatics/btp453] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 05/11/2009] [Revised: 07/16/2009] [Accepted: 07/17/2009] [Indexed: 11/13/2022] Open
Abstract
SUMMARY We have developed PathBuilder, an open-source web application to annotate biological information pertaining to signaling pathways and to create web-based pathway resources. PathBuilder enables annotation of molecular events including protein-protein interactions, enzyme-substrate relationships and protein translocation events either manually or through automated importing of data from other databases. Salient features of PathBuilder include automatic validation of data formats, built-in modules for visualization of pathways, automated import of data from other pathway resources, export of data in several standard data exchange formats and an application programming interface for retrieving existing pathway datasets. AVAILABILITY PathBuilder is freely available for download at http://pathbuilder.sourceforge.net/ under the terms of GNU lesser general public license (LGPL: http://www.gnu.org/copyleft/lesser.html). The software is platform independent and has been tested on Windows and Linux platforms. CONTACT pandey@jhmi.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kumaran Kandasamy
- Institute of Bioinformatics, International Tech Park, Bangalore 560066, India
|