151. Guney E, Sanz-Pamplona R, Sierra A, Oliva B. Understanding Cancer Progression Using Protein Interaction Networks. Systems Biology in Cancer Research and Drug Discovery 2012:167-195. [DOI: 10.1007/978-94-007-4819-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
152. He D, Liu ZP, Chen L. Identification of dysfunctional modules and disease genes in congenital heart disease by a network-based approach. BMC Genomics 2011; 12:592. [PMID: 22136190 PMCID: PMC3256240 DOI: 10.1186/1471-2164-12-592] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 09/09/2011] [Accepted: 12/02/2011] [Indexed: 12/16/2022]
Abstract
Background The incidence of congenital heart disease (CHD) among live-born infants is continuously increasing, making it one of the leading causes of infant morbidity worldwide. Various studies suggest that both genetic and environmental factors lead to CHD; therefore, identifying its candidate genes and disease markers has been one of the central topics in CHD research. With high-throughput genomic data on CHD now becoming available, network-based methods provide powerful alternatives for the systematic analysis of complex diseases and the identification of dysfunctional modules and candidate disease genes. Results In this paper, by modeling the information flow from source disease genes to targets of differentially expressed genes via a context-specific protein-protein interaction network, we extracted dysfunctional modules, which were then validated by various types of measurements and independent datasets. Network topology analysis of these modules revealed major and auxiliary pathways and cellular processes in CHD, demonstrating the biological usefulness of the identified modules. We also used a guilt-by-association approach to prioritize a list of candidate CHD genes from these modules; these candidates are well supported by various kinds of literature and experimental evidence. Conclusions We provided a network-based analysis to detect dysfunctional modules and disease genes of CHD by modeling the information transmission from source disease genes to targets of differentially expressed genes. Our method yielded 12 modules from the constructed CHD subnetwork. We further identified and prioritized candidate disease genes of CHD from these dysfunctional modules. In conclusion, module analysis not only revealed several important findings regarding the underlying molecular mechanisms of CHD but also suggested distinct network properties of causal disease genes, which led to the identification of candidate CHD genes.
Affiliation(s)
- Danning He
- Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
153. Kwofie SK, Schaefer U, Sundararajan VS, Bajic VB, Christoffels A. HCVpro: Hepatitis C virus protein interaction database. Infect Genet Evol 2011; 11:1971-7. [PMID: 21930248 DOI: 10.1016/j.meegid.2011.09.001] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 05/13/2011] [Revised: 08/24/2011] [Accepted: 09/02/2011] [Indexed: 02/07/2023]
154. Hsu CL, Huang YH, Hsu CT, Yang UC. Prioritizing disease candidate genes by a gene interconnectedness-based approach. BMC Genomics 2011; 12 Suppl 3:S25. [PMID: 22369140 PMCID: PMC3333184 DOI: 10.1186/1471-2164-12-s3-s25] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
Abstract
Background Genome-wide disease-gene finding approaches may sometimes provide us with a long list of candidate genes. Since verifying all candidates with purely experimental approaches could be expensive, a number of network-based methods have been developed to prioritize candidates. Such tools usually have a set of parameters pre-trained using available network data. This means that network-based tools may need to be re-trained when existing biological networks are updated or when networks from different sources are to be tried. Results We developed a parameter-free method, interconnectedness (ICN), to rank candidate genes by assessing their closeness to known disease genes in a network. ICN was tested using 1,993 known disease-gene associations and achieved a success rate of ~44% using a protein-protein interaction network under a test scenario of simulated linkage analysis. This performance is comparable with those of other well-known methods, and ICN outperforms other methods when a candidate disease gene is not directly linked to known disease genes in a network. Interestingly, we show that a combined scoring strategy could enable ICN to achieve an even better performance (~50%) than other methods used alone. Conclusions ICN, a user-friendly method, can complement other network-based methods well in prioritizing candidate disease genes.
Affiliation(s)
- Chia-Lang Hsu
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei City, Taiwan 11221, Republic of China
155. Mora A, Donaldson IM. iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database. BMC Bioinformatics 2011; 12:455. [PMID: 22115179 PMCID: PMC3282787 DOI: 10.1186/1471-2105-12-455] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 07/22/2011] [Accepted: 11/24/2011] [Indexed: 11/19/2022]
Abstract
Background The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment. Results The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL. The user may choose between different methods for resolving redundancies in interaction data and for representing n-ary data. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user's choices of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis. Conclusions The package allows the user to control how these issues are dealt with and to communicate them via an R script written using the iRefR package; this will facilitate communication of methods, reproducibility of network analyses, and further modification and comparison of methods by researchers.
Affiliation(s)
- Antonio Mora
- Department for Molecular Biosciences, University of Oslo, P.O. Box 1041 Blindern, 0316 Oslo, Norway
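As a rough illustration of what redundancy resolution means for binary interaction data, the sketch below collapses records that describe the same unordered protein pair and keeps the set of supporting source databases. It is written in Python rather than R and does not use the iRefR API; the record layout (protein_a, protein_b, source_db), the accessions and the function name collapse_redundant are assumptions for illustration only.

```python
from collections import defaultdict


def collapse_redundant(records):
    """Merge binary interaction records that share an unordered protein pair.

    Each record is a (protein_a, protein_b, source_db) tuple (hypothetical
    layout). Returns a dict mapping the canonical pair (a sorted tuple) to the
    set of source databases supporting it.
    """
    merged = defaultdict(set)
    for a, b, source in records:
        pair = tuple(sorted((a, b)))  # A-B and B-A are the same undirected edge
        merged[pair].add(source)
    return dict(merged)


records = [
    ("P1", "P2", "BioGRID"),
    ("P2", "P1", "IntAct"),   # same pair, reversed order
    ("P1", "P2", "BioGRID"),  # exact duplicate record
    ("P2", "P3", "MINT"),
]
edges = collapse_redundant(records)
for pair, sources in sorted(edges.items()):
    print(pair, sorted(sources))
```

Keeping the set of sources is only one possible policy; counting every record, or keeping only the first, would give different edge weights, which is the kind of choice the abstract says affects downstream graph analysis.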
156. Li CY, Zhou WZ, Zhang PW, Johnson C, Wei L, Uhl GR. Meta-analysis and genome-wide interpretation of genetic susceptibility to drug addiction. BMC Genomics 2011; 12:508. [PMID: 21999673 PMCID: PMC3215751 DOI: 10.1186/1471-2164-12-508] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 03/31/2011] [Accepted: 10/15/2011] [Indexed: 12/21/2022]
Abstract
Background Classical genetic studies provide strong evidence for heritable contributions to susceptibility to developing dependence on addictive substances. Candidate gene and genome-wide association studies (GWAS) have sought genes, chromosomal regions and allelic variants likely to contribute to susceptibility to drug addiction. Results Here, we performed a meta-analysis of addiction candidate gene association studies and GWAS to investigate possible functional mechanisms associated with addiction susceptibility. From meta-data retrieved from 212 publications on candidate gene association studies and 5 GWAS reports, we linked a total of 843 haplotypes to addiction susceptibility. We mapped the SNPs in these haplotypes to functional and regulatory elements in the genome and estimated the magnitude of the contributions of different molecular mechanisms to their effects on addiction susceptibility. In addition to SNPs in coding regions, these data suggest that haplotypes in gene regulatory regions may also contribute to addiction susceptibility. When we compared the lists of genes identified by association studies and those identified by molecular biological studies of drug-regulated genes, we observed significantly higher participation in the same gene interaction networks than expected by chance, despite little overlap between the two gene lists. Conclusions These results appear to offer new insights into the genetic factors underlying drug addiction.
Affiliation(s)
- Chuan-Yun Li
- Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, Peking University, Beijing, China.
157. Liu H, Su J, Li J, Liu H, Lv J, Li B, Qiao H, Zhang Y. Prioritizing cancer-related genes with aberrant methylation based on a weighted protein-protein interaction network. BMC Syst Biol 2011; 5:158. [PMID: 21985575 PMCID: PMC3224234 DOI: 10.1186/1752-0509-5-158] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/25/2011] [Accepted: 10/11/2011] [Indexed: 02/07/2023]
Abstract
Background As an important epigenetic modification, DNA methylation plays a crucial role in mammalian development and in the occurrence of complex diseases. Genes that interact directly or indirectly may have the same or similar functions in the biological processes in which they are involved and together contribute to related disease phenotypes. Network theory can clearly represent the complicated relations between genes, and a protein-protein interaction (PPI) network offers a platform from which to systematically identify disease-related genes from the relations between genes with similar functions. Results We constructed a weighted human PPI network (WHPN) using DNA methylation correlations based on human protein-protein interactions. WHPN represents the relationships between the DNA methylation levels of gene pairs for four cancer types. A cancer-associated subnetwork (CASN) was obtained from WHPN by selecting genes associated with seed genes known to be methylated in the four cancers. We found that CASN formed a more densely connected network community than WHPN, indicating that the genes in CASN were much closer to the seed genes. We prioritized 154 potential cancer-related genes with aberrant methylation in CASN using a neighborhood-weighting decision rule. A functional enrichment analysis for GO and KEGG indicated that the optimized genes were mainly involved in the regulation of apoptosis and programmed cell death. An analysis of expression profiling data revealed that many of the optimized genes were differentially expressed in the four cancers. By examining PubMed co-citations, we found that 43 optimized genes were related to cancers and aberrant methylation, and 10 genes had been validated as aberrantly methylated in cancers. By manually searching PubMed, we found that, of the 154 optimized genes, 27 had previously been identified in the literature as diagnostic markers and 20 as prognostic markers for cancers and other complex diseases. We also found that 31 of the optimized genes were targeted as drug response markers in DrugBank. Conclusions Here we have shown that network theory combined with epigenetic characteristics provides a favorable platform from which to identify cancer-related genes. We prioritized 154 potential cancer-related genes with aberrant methylation that might contribute to the further understanding of cancers.
Affiliation(s)
- Hui Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
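The abstract names a neighborhood-weighting decision rule without spelling it out. The sketch below is one plausible, simplified reading, assuming that each PPI edge is weighted by the absolute Pearson correlation of the two genes' methylation profiles and that a candidate is scored by the share of its total edge weight that connects it to seed genes. The scoring rule, the function names and the methylation values are hypothetical, not the authors' definitions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def weight_edges(ppi_edges, methylation):
    """Weight each PPI edge by |r| between the two genes' methylation profiles."""
    return {(a, b): abs(pearson(methylation[a], methylation[b])) for a, b in ppi_edges}


def neighborhood_score(gene, weights, seeds):
    """Share of a gene's total edge weight that links it to seed genes."""
    total = seed_total = 0.0
    for (a, b), w in weights.items():
        if gene in (a, b):
            other = b if a == gene else a
            total += w
            if other in seeds:
                seed_total += w
    return seed_total / total if total else 0.0


# Hypothetical methylation levels across four samples.
methylation = {
    "S1": [0.1, 0.4, 0.6, 0.9],
    "C1": [0.2, 0.5, 0.7, 0.8],
    "C2": [0.9, 0.1, 0.5, 0.3],
    "X": [0.3, 0.3, 0.9, 0.1],
}
ppi = [("S1", "C1"), ("S1", "C2"), ("C1", "X"), ("C2", "X")]
weights = weight_edges(ppi, methylation)
seeds = {"S1"}
ranking = sorted(["C1", "C2", "X"],
                 key=lambda g: neighborhood_score(g, weights, seeds), reverse=True)
print(ranking)
```

Here X has no seed neighbors, so it scores zero and ranks last; the two genes adjacent to the seed are ordered by how strongly their methylation tracks the seed's relative to their other partners.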
158. Razick S, Mora A, Michalickova K, Boddie P, Donaldson IM. iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex. BMC Bioinformatics 2011; 12:388. [PMID: 21975162 PMCID: PMC3228863 DOI: 10.1186/1471-2105-12-388] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 07/22/2011] [Accepted: 10/05/2011] [Indexed: 11/10/2022]
Abstract
Background The iRefIndex consolidates protein interaction data from ten databases in a rigorous manner using sequence-based hash keys. Working with consolidated interaction data comes with distinct challenges: data are redundant, overlapping and highly interconnected, and may be collected and represented using different curation practices. These phenomena were quantified in our previous studies. Results The iRefScape plug-in for the Cytoscape graphical viewer addresses these challenges. We show how these factors affect data-mining tasks and how our solutions resolve them simply and efficiently. A uniform accession space is used to limit redundancy and to support search expansion and searching on multiple accession types. Multiple node and edge features support data filtering and mining. Node colours and features supply information about search result provenance. Overlapping evidence is presented using a multi-graph, and a bipartite representation is used to distinguish binary and n-ary source data. Searching for interactions between sets of proteins is supported and specifically includes searches on disease-related genes found in OMIM. Finally, a synchronized adjacency-matrix view facilitates visualization of relationships between sets of user-defined groups. Conclusions The iRefScape plug-in will be of interest to advanced users of interaction data. The plug-in provides access to a consolidated data set in a uniform accession space while remaining faithful to the underlying source data. Tools are provided to facilitate a range of tasks, from a simple search to knowledge discovery. The plug-in uses a number of strategies that will be of interest to other plug-in developers.
Affiliation(s)
- Sabry Razick
- The Biotechnology Centre of Oslo, University of Oslo, P.O. Box 1125 Blindern, 0317 Oslo, Norway
159. Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data. BMC Bioinformatics 2011; 12:359. [PMID: 21884587 PMCID: PMC3203352 DOI: 10.1186/1471-2105-12-359] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 01/01/2011] [Accepted: 08/31/2011] [Indexed: 01/22/2023]
Abstract
Background Bayesian networks (BNs) are a powerful approach to reconstructing genetic regulatory networks from gene expression data. However, expression data by themselves suffer from high noise and lack of power. Incorporating prior biological knowledge can improve performance. Because each type of prior knowledge may on its own be incomplete or limited by quality issues, it is desirable to integrate multiple sources of prior knowledge and use their consensus. Results We introduce a new method to incorporate quantitative information from multiple sources of prior knowledge. It first uses a naïve Bayes classifier to assess the likelihood of functional linkage between gene pairs based on prior knowledge. In this study we included co-citation in PubMed and semantic similarity in Gene Ontology annotation. A candidate network edge reservoir is then created, in which the copy number of each edge is proportional to the estimated likelihood of linkage between the two corresponding genes. In network simulation, the Markov chain Monte Carlo (MCMC) sampling algorithm samples from this reservoir at each iteration to generate new candidate networks. We evaluated the new algorithm using both simulated and real gene expression data, including data from a yeast cell cycle study and a mouse pancreas development/growth study. Incorporating prior knowledge led to a ~2-fold increase in the number of known transcription regulations recovered, without a significant change in the false positive rate. In contrast, without prior knowledge, BN modeling is not always better than random selection, demonstrating that network modeling needs to supplement gene expression data with additional information. Conclusion Our new development provides a statistical means to use the quantitative information in prior biological knowledge in BN modeling of gene expression data, which significantly improves performance.
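The edge-reservoir idea can be sketched in a few lines, assuming the linkage likelihoods are already available (the naïve Bayes step is not shown) and that copy numbers are obtained by scaling each likelihood by a hypothetical constant copies_per_unit and rounding. Drawing uniformly from the reservoir then proposes each edge with probability proportional to its copy number. Only this proposal step is illustrated; the BN scoring and the MCMC acceptance rule are omitted, and the gene names and likelihoods are made up.

```python
import random
from collections import Counter


def build_reservoir(edge_likelihood, copies_per_unit=100):
    """Return a list in which each candidate edge appears a number of times
    proportional to its estimated linkage likelihood (rounded to an integer)."""
    reservoir = []
    for edge, p in edge_likelihood.items():
        reservoir.extend([edge] * round(p * copies_per_unit))
    return reservoir


def propose_edge(reservoir, rng):
    """Draw one candidate edge uniformly from the reservoir, i.e. with
    probability proportional to its copy number."""
    return rng.choice(reservoir)


# Hypothetical prior likelihoods of functional linkage between gene pairs.
likelihood = {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): 0.5}
reservoir = build_reservoir(likelihood)
rng = random.Random(0)  # fixed seed for reproducibility
draws = Counter(propose_edge(reservoir, rng) for _ in range(10_000))
print(Counter(reservoir))
print(draws.most_common())
```

A reservoir is a simple way to get likelihood-proportional proposals with a uniform draw; for many edges, weighted sampling (for example `random.choices` with weights) avoids materializing the copies.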
160. Stojmirović A, Yu YK. ppiTrim: constructing non-redundant and up-to-date interactomes. Database (Oxford) 2011; 2011:bar036. [PMID: 21873645 PMCID: PMC3162744 DOI: 10.1093/database/bar036] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022]
Abstract
Robust advances in interactome analysis demand comprehensive, non-redundant and consistently annotated data sets. By non-redundant, we mean that the accounting of evidence for every interaction should be faithful: each independent experimental support is counted exactly once, no more, no less. While many interactions are shared among public repositories, none of them contains the complete known interactome for any model organism. In addition, the annotations of the same experimental result by different repositories often disagree. This raises the issue of which annotation to keep when consolidating identical evidence. The iRefIndex database, which includes interactions from the most popular repositories with a standardized protein nomenclature, represents a significant advance in all aspects, especially in comprehensiveness. However, iRefIndex aims to maintain all information and annotation from the original sources and requires users to perform additional processing to fully achieve the aforementioned goals. Another issue has to do with protein complexes. Some databases represent experimentally observed complexes as interactions with more than two participants, while others expand them into binary interactions using the spoke or matrix model. To avoid a buildup of untested interaction information, it is preferable to replace the expanded protein complexes, whether from spoke or matrix models, with a flat list of complex members. To address these issues and achieve our goals, we have developed ppiTrim, a script that processes iRefIndex to produce non-redundant, consistently annotated data sets of physical interactions. Our script proceeds in three stages: mapping all interactants to gene identifiers and removing all undesired raw interactions, deflating potentially expanded complexes, and reconciling, for each interaction, the annotation labels among different source databases. As an illustration, we have processed the three largest organismal data sets: yeast, human and fruit fly. While ppiTrim can resolve most apparent conflicts between different labelings, we also discovered some unresolvable disagreements, mostly resulting from different annotation policies among repositories. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/ppiTrim.html
Affiliation(s)
- Aleksandar Stojmirović
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
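The difference between the spoke and matrix expansions, and what deflating them back to a member list means, can be shown with a small sketch. It assumes the binary records that came from one co-purification are already grouped under one hypothetical experiment key; ppiTrim's actual detection of potentially expanded complexes is not reproduced, and all function names and identifiers here are illustrative.

```python
from itertools import combinations


def spoke_expand(bait, preys):
    """Spoke model: link the bait to each prey (n - 1 edges for n members)."""
    return {frozenset((bait, p)) for p in preys if p != bait}


def matrix_expand(members):
    """Matrix model: link every pair of members (n(n - 1)/2 edges)."""
    return {frozenset(pair) for pair in combinations(sorted(set(members)), 2)}


def deflate(edges_by_experiment):
    """Replace each experiment's binary edges with a flat, sorted member list.

    Assumes the records belonging to one expanded complex have already been
    grouped under a single experiment key.
    """
    return {exp: sorted(set().union(*edges))
            for exp, edges in edges_by_experiment.items()}


spoke = spoke_expand("B", ["P1", "P2", "P3"])
matrix = matrix_expand(["B", "P1", "P2", "P3"])
print(len(spoke), len(matrix))
print(deflate({"pulldown_1": spoke}))
```

For a four-member complex, the spoke model yields three bait-prey edges, while the matrix model adds three prey-prey pairs that were never tested directly; both deflate to the same member list.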
161. Fahey ME, Bennett MJ, Mahon C, Jäger S, Pache L, Kumar D, Shapiro A, Rao K, Chanda SK, Craik CS, Frankel AD, Krogan NJ. GPS-Prot: a web-based visualization platform for integrating host-pathogen interaction data. BMC Bioinformatics 2011; 12:298. [PMID: 21777475 PMCID: PMC3213248 DOI: 10.1186/1471-2105-12-298] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 02/28/2011] [Accepted: 07/22/2011] [Indexed: 01/07/2023]
Abstract
Background The increasing availability of HIV-host interaction datasets, including both physical and genetic interactions, has created a need for software tools to integrate and visualize the data. Because these host-pathogen interactions are extensive and interactions between human proteins are found in many different databases, it is difficult to generate integrated HIV-human interaction networks. Results We have developed a web-based platform, termed GPS-Prot (http://www.gpsprot.org), that allows for facile integration of different HIV interaction data types as well as inclusion of interactions between human proteins derived from publicly available databases, including MINT, BioGRID and HPRD. The software can group proteins into functional modules or protein complexes, generating more intuitive network representations, and also allows users to upload their own data. Conclusions GPS-Prot is a software tool that allows users to easily create comprehensive and integrated HIV-host networks. A major advantage of this platform compared with other visualization tools is its web-based format, which requires no software installation or data downloads. GPS-Prot allows novice users to quickly generate networks that combine both genetic and protein-protein interactions between HIV and its human host into a single representation. Ultimately, the platform is extendable to other host-pathogen systems.
Affiliation(s)
- Marie E Fahey
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, 1700 4th Street, San Francisco, CA 94158, USA
162. Goel R, Muthusamy B, Pandey A, Prasad TSK. Human protein reference database and human proteinpedia as discovery resources for molecular biotechnology. Mol Biotechnol 2011; 48:87-95. [PMID: 20927658 DOI: 10.1007/s12033-010-9336-8] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Indexed: 01/22/2023]
Abstract
In recent years, research in molecular biotechnology has shifted from small-scale studies targeting a single molecule or a small set of molecules to a combination of high-throughput discovery platforms and extensive validation. Such discovery platforms provided an unbiased approach that resulted in the identification of several novel genetic and protein biomarkers. The high-throughput nature of these investigations, coupled with the higher sensitivity and specificity of next-generation technologies, provided qualitatively and quantitatively richer biological data. These developments have also revolutionized biological research and the speed of data generation. However, it is becoming difficult for individual investigators to benefit directly from these data because they are not easily accessible. Data resources became necessary to assimilate, store and disseminate information that could enable future discoveries. We have developed two resources, the Human Protein Reference Database (HPRD) and Human Proteinpedia, which integrate knowledge relevant to human proteins. A number of protein features, including protein-protein interactions, post-translational modifications, subcellular localization and tissue expression, which have been studied using different strategies, were incorporated into these databases. Human Proteinpedia also provides a portal for community participation to annotate and share proteomic data and uses HPRD as the scaffold for data processing. Proteomic investigators can even share unpublished data in Human Proteinpedia, which provides a meaningful platform for data sharing. Because proteomic information reflects a direct view of cellular systems, proteomics is expected to complement other areas of biology, such as genomics, transcriptomics, molecular biology, cloning and classical genetics, in understanding the relationships among multiple facets of biological systems.
Affiliation(s)
- Renu Goel
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India
163. Bell L, Chowdhary R, Liu JS, Niu X, Zhang J. Integrated bio-entity network: a system for biological knowledge discovery. PLoS One 2011; 6:e21474. [PMID: 21738677 PMCID: PMC3124513 DOI: 10.1371/journal.pone.0021474] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 02/28/2011] [Accepted: 06/01/2011] [Indexed: 01/26/2023]
Abstract
A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses—the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.
Affiliation(s)
- Lindsey Bell
- Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
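The abstract does not give the details of the MPP algorithm. A standard way to find a most probable path, assuming edge probabilities are independent so that a path's probability is the product of its edge probabilities, is to run Dijkstra's algorithm on the costs -log(p). The sketch below does that on a directed adjacency dict; it is not the authors' implementation, the entity names and probabilities are made up, and an undirected relationship would need to be listed in both directions.

```python
import heapq
import math


def most_probable_path(graph, source, target):
    """Return (path, probability) maximizing the product of edge probabilities.

    graph maps node -> {neighbor: probability in (0, 1]}. Maximizing a product
    of probabilities equals minimizing the sum of -log(p); all such costs are
    non-negative, so Dijkstra's algorithm applies.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr, p in graph.get(node, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-best[target])


# Hypothetical bio-entity relationships with confidence probabilities.
graph = {
    "geneA": {"proteinX": 0.9, "disease": 0.5, "pathwayY": 0.6},
    "proteinX": {"disease": 0.8},
    "pathwayY": {"disease": 0.95},
}
path, prob = most_probable_path(graph, "geneA", "disease")
print(path, round(prob, 2))
```

In this toy graph the indirect route through proteinX (0.9 × 0.8) beats both the direct edge (0.5) and the route through pathwayY (0.6 × 0.95), which is the kind of indirect, high-probability relationship the abstract describes as a hypothesis.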
164. Chen Y, Wang W, Zhou Y, Shields R, Chanda SK, Elston RC, Li J. In silico gene prioritization by integrating multiple data sources. PLoS One 2011; 6:e21137. [PMID: 21731658 PMCID: PMC3123338 DOI: 10.1371/journal.pone.0021137] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 02/16/2011] [Accepted: 05/20/2011] [Indexed: 11/19/2022]
Abstract
Identifying disease genes is crucial to understanding disease pathogenesis and to improving disease diagnosis and treatment. In recent years, many researchers have proposed approaches that prioritize candidate genes by considering the relationships between candidate genes and known disease genes as reflected in other data sources. In this paper, we propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graph representation. Gene-gene relationships and gene-disease relationships are then defined based on the overall topology of each network using a diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is used to rank all candidate genes. Based on the informativeness of available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed a large-scale cross-validation analysis on 110 disease families using three data sources. The results show that our approach consistently outperforms two other state-of-the-art programs. A case study of Parkinson disease (PD) identified four candidate genes (UBB, SEPT5, GPR37 and TH) that ranked above our adaptive threshold, all of which are involved in the PD pathway. In particular, a very recent study observed a deletion of TH in a patient with PD, which supports the importance of the TH gene in PD pathogenesis. A web tool has been implemented to assist scientists in their genetic studies.
Affiliation(s)
- Yixuan Chen
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, United States of America
- Wenhui Wang
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, United States of America
- Yingyao Zhou
- Genomics Institute of the Novartis Research Foundation, San Diego, California, United States of America
- Robert Shields
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, United States of America
- Sumit K. Chanda
- Infectious and Inflammatory Disease Center, Burnham Institute for Medical Research, La Jolla, California, United States of America
- Robert C. Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
- Jing Li
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, United States of America
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
- Joint Institute of Systems Biology, College of Computer Science and Technology, Jilin University, Changchun, China
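The diffusion kernel mentioned in this abstract is commonly defined as K = exp(-βL), where L is the graph Laplacian and β controls how far similarity spreads. The sketch below computes it for one small unweighted network with a truncated Taylor series (adequate only for small graphs and moderate β) and scores each candidate by its summed kernel values to the known disease genes. The value β = 0.5, the toy network and the summed-score rule are assumptions; the paper's cross-network normalization and adaptive threshold are not reproduced.

```python
def laplacian(nodes, edges):
    """Graph Laplacian L = D - A for an unweighted, undirected graph."""
    idx = {n: i for i, n in enumerate(nodes)}
    L = [[0.0] * len(nodes) for _ in nodes]
    for a, b in edges:
        i, j = idx[a], idx[b]
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diffusion_kernel(L, beta=0.5, terms=30):
    """K = exp(-beta * L), approximated by a truncated Taylor series."""
    n = len(L)
    M = [[-beta * x for x in row] for row in L]
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term
    term = [row[:] for row in K]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]  # M^k / k!
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


# Hypothetical path-shaped network with one known disease gene, g1.
nodes = ["g1", "g2", "g3", "g4"]
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
K = diffusion_kernel(laplacian(nodes, edges))
known = {"g1"}
scores = {g: sum(K[nodes.index(g)][nodes.index(d)] for d in known)
          for g in nodes if g not in known}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because every row of L sums to zero, each row of exp(-βL) sums to one, so the kernel can be read as how much "heat" placed on one gene diffuses to another; genes closer to g1 in the network receive more.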
165. Kritikos GD, Moschopoulos C, Vazirgiannis M, Kossida S. Noise reduction in protein-protein interaction graphs by the implementation of a novel weighting scheme. BMC Bioinformatics 2011; 12:239. [PMID: 21679454 PMCID: PMC3230908 DOI: 10.1186/1471-2105-12-239] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/22/2010] [Accepted: 06/16/2011] [Indexed: 11/10/2022]
Abstract
Background Recent technological advances applied to biology, such as yeast two-hybrid, phage display and mass spectrometry, have enabled us to create a detailed map of protein interaction networks. These interaction networks represent a rich, yet noisy, source of data that could be used to extract meaningful information, such as protein complexes. Several interaction network weighting schemes have been proposed in the literature to eliminate the noise inherent in interactome data. In this paper, we propose a novel weighting scheme and apply it to the S. cerevisiae interactome. Complex prediction rates are improved by up to 39%, depending on the clustering algorithm applied. Results We adopt a two-step procedure. In the first step, by applying both novel and well-established protein-protein interaction (PPI) weighting methods, weights are introduced to the original interactome graph based on the confidence level that a given interaction is a true positive. The second step applies clustering using established graph-theoretic algorithms, as well as two variations of spectral clustering. The clustered interactome networks are also cross-validated against the confirmed protein complexes present in the MIPS database. Conclusions The results of our experimental work demonstrate that interactome graph weighting methods clearly improve the results of several clustering algorithms. Moreover, our proposed weighting scheme outperforms other approaches to PPI graph weighting.
Affiliation(s)
- George D Kritikos
- Bioinformatics & Medical Informatics Team, Biomedical Research Foundation of the Academy of Athens, Athens, Soranou Efesiou 4, GR-11527, Greece
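To make the two-step procedure above concrete (weight the edges, then cluster), here is a minimal Python sketch on a toy graph. The abstract does not spell out the authors' novel weighting scheme, so the sketch swaps in a generic topology-based score, the Jaccard overlap of the two proteins' closed neighborhoods, and replaces the published clustering algorithms with simple edge thresholding plus connected components. The function names, the 0.5 threshold and the toy edges are all hypothetical.

```python
from collections import defaultdict


def closed_neighborhoods(edges):
    """Map each protein to itself plus its direct interaction partners."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].update((u, v))
        nbrs[v].update((u, v))
    return nbrs


def weight_edges(edges):
    """Step 1: weight each interaction by the Jaccard overlap of the closed
    neighborhoods of its two partners (a stand-in score, not the paper's)."""
    nbrs = closed_neighborhoods(edges)
    return {
        (u, v): len(nbrs[u] & nbrs[v]) / len(nbrs[u] | nbrs[v])
        for u, v in edges
    }


def cluster(edges, threshold):
    """Step 2: drop edges below the threshold, then return the connected
    components of what remains (a deliberately simple clustering stand-in)."""
    adj = defaultdict(set)
    for (u, v), w in weight_edges(edges).items():
        if w >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    nodes = sorted({n for edge in edges for n in edge})
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first traversal over kept edges
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy graph: two triangles joined by one (presumably spurious) edge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
print(weight_edges(edges)[("C", "D")])  # 2 shared / 6 total -> 0.333...
print(cluster(edges, threshold=0.5))    # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Thresholding on low neighborhood overlap removes the single edge bridging the two triangles, which is the kind of likely false-positive interaction a confidence weighting is meant to down-weight before clustering.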
166
Acuner Ozbabacan SE, Engin HB, Gursoy A, Keskin O. Transient protein-protein interactions. Protein Eng Des Sel 2011; 24:635-48. [DOI: 10.1093/protein/gzr025] [Citation(s) in RCA: 170] [Impact Index Per Article: 12.1] [Indexed: 01/05/2023]
167
Zhang M, Zhu C, Jacomy A, Lu L, Jegga A. The orphan disease networks. Am J Hum Genet 2011; 88:755-66. [PMID: 21664998 DOI: 10.1016/j.ajhg.2011.05.006] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 11/15/2010] [Revised: 04/29/2011] [Accepted: 05/06/2011] [Indexed: 01/29/2023]
Abstract
The low prevalence rate of orphan diseases (OD) requires special combined efforts to improve diagnosis, prevention, and discovery of novel therapeutic strategies. To identify and investigate relationships based on shared genes or shared functional features, we have conducted a bioinformatic-based global analysis of all orphan diseases with known disease-causing mutant genes. Starting with a bipartite network of known OD and OD-causing mutant genes and using the human protein interactome, we first construct and topologically analyze three networks: the orphan disease network, the orphan disease-causing mutant gene network, and the orphan disease-causing mutant gene interactome. Our results demonstrate that, in contrast to common disease-causing mutant genes, which are predominantly nonessential, a majority of orphan disease-causing mutant genes are essential. Consistent with this finding, we found that OD-causing mutant genes are topologically important in the protein interactome and are ubiquitously expressed. Additionally, functional enrichment analysis of the genes in which mutations cause ODs shows that a majority result in premature death or are lethal in orthologous mouse gene knockout models. To address the limitations of traditional gene-based disease networks, we also construct and analyze OD networks on the basis of shared enriched features (biological processes, cellular components, pathways, phenotypes, and literature citations). Analyzing these functionally linked OD networks, we identified several additional OD-OD relations that are both phenotypically similar and phenotypically diverse. Surprisingly, we observed that the wiring of the gene-based and other feature-based OD networks is largely different; this suggests that the relationship between ODs cannot be fully captured by the gene-based network alone.
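The first two networks described above are one-mode projections of the bipartite disease-gene graph: two diseases are linked when they share a causal gene, and two genes are linked when they cause a common disease. A minimal Python sketch of that projection follows, assuming the input is a plain mapping from each disease to its set of mutant genes. The labels are hypothetical placeholders, and the third network (built with the human protein interactome) is not modeled.

```python
from collections import defaultdict
from itertools import combinations


def project(disease_genes):
    """Project a bipartite disease -> genes map onto its two one-mode networks.

    Returns (disease_net, gene_net); each maps a sorted node pair to the
    number of partners the pair shares on the other side of the graph.
    """
    disease_net = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = disease_genes[d1] & disease_genes[d2]
        if shared:  # link diseases only if they share at least one gene
            disease_net[(d1, d2)] = len(shared)

    # Invert the map so each gene points to the diseases it causes.
    gene_diseases = defaultdict(set)
    for disease, genes in disease_genes.items():
        for gene in genes:
            gene_diseases[gene].add(disease)

    gene_net = {}
    for g1, g2 in combinations(sorted(gene_diseases), 2):
        shared = gene_diseases[g1] & gene_diseases[g2]
        if shared:  # link genes only if they cause at least one common disease
            gene_net[(g1, g2)] = len(shared)
    return disease_net, gene_net


# Hypothetical toy input; OD1..OD3 and G1..G4 are placeholder labels.
toy = {"OD1": {"G1", "G2"}, "OD2": {"G2", "G3"}, "OD3": {"G4"}}
diseases, genes = project(toy)
print(diseases)  # {('OD1', 'OD2'): 1}
print(genes)     # {('G1', 'G2'): 1, ('G2', 'G3'): 1}
```

Edge weights count shared partners, so the same structure can be thresholded later if only strongly linked disease pairs are of interest.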
168
Drug discovery and the use of computational approaches for infectious diseases. Future Med Chem 2011; 3:1011-25. [DOI: 10.4155/fmc.11.60] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/26/2022]
Abstract
For centuries, infectious diseases were the scourge of humanity, overcome only by the discovery of vaccination and penicillin. With an armamentarium of effective antibiotics, vaccines and drugs at hand, infectious diseases were for many years considered negligible. With the onset of the AIDS pandemic and the return of tuberculosis and influenza (e.g., swine influenza), this notion has changed in recent years. Drug discovery for infectious diseases is therefore again gaining interest. This article discusses the drug-discovery process in this area and introduces the major computational approaches used to identify suitable drug targets and to discover and optimize chemical lead compounds into drug candidates, using examples from antiparasitic drug discovery.
169
A systemic network triggered by human cytomegalovirus entry. Adv Virol 2011; 2011:262080. [PMID: 22312338 PMCID: PMC3263853 DOI: 10.1155/2011/262080] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/14/2010] [Revised: 01/25/2011] [Accepted: 03/14/2011] [Indexed: 01/09/2023]
Abstract
Virus entry is a multistep process that triggers various cellular pathways that interconnect into a complex network, yet the molecular complexity of this network remains largely elusive. Here, using systems biology approaches, we reveal a systemic virus-entry network initiated by human cytomegalovirus (HCMV), a widespread opportunistic pathogen. This network contains ten functional modules (i.e., groups of proteins) that respond coordinately to HCMV entry. Activation (up- and downregulation) of the functional modules in this network declines dramatically within 25 minutes post infection. While modules annotated as receptor system, ion transport, and immune response are continuously activated during the entire process of HCMV entry, those annotated for cell adhesion and skeletal movement are specifically activated during early viral attachment. The up-regulated network contains various functional modules, such as cell surface receptors, skeletal development, endocytosis, ion transport, and chromatin remodeling. Interestingly, the macromolecule metabolism and chromatin remodeling modules predominate in this over-expressed system, suggesting that modulation of fundamental nuclear processes is one of the most important events in HCMV entry. The entire up-regulated network is primarily controlled by multiple elements such as SLC10A1. Thus, virus entry triggers multiple cellular processes, especially nuclear processes, to facilitate its entry.
170
Polajnar T, Damoulas T, Girolami M. Protein interaction sentence detection using multiple semantic kernels. J Biomed Semantics 2011; 2:1. [PMID: 21569604 PMCID: PMC3116455 DOI: 10.1186/2041-1480-2-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/26/2010] [Accepted: 05/14/2011] [Indexed: 11/24/2022]
Abstract
Background Detection of sentences that describe protein-protein interactions (PPIs) in biomedical publications is a challenging and unresolved pattern recognition problem. Many state-of-the-art approaches for this task employ kernel classification methods, in particular support vector machines (SVMs). In this work we propose a novel data integration approach that utilises semantic kernels and a kernel classification method that is a probabilistic analogue to SVMs. Semantic kernels are created from statistical information gathered from large amounts of unlabelled text using lexical semantic models. Several semantic kernels are then fused into an overall composite classification space. In this initial study, we use simple features in order to examine whether the use of combinations of kernels constructed using word-based semantic models can improve PPI sentence detection. Results We show that combinations of semantic kernels lead to statistically significant improvements in recognition rates and receiver operating characteristic (ROC) scores over the plain Gaussian kernel, when applied to a well-known labelled collection of abstracts. The proposed kernel composition method also allows us to automatically infer the most discriminative kernels. Conclusions The results from this paper indicate that using semantic information from unlabelled text, and combinations of such information, can be valuable for classification of short texts such as PPI sentences. This study, however, is only a first step in evaluation of semantic kernels and probabilistic multiple kernel learning in the context of PPI detection. The method described herein is modular, and can be applied with a variety of feature types, kernels, and semantic models, in order to facilitate full extraction of interacting proteins.
Affiliation(s)
- Tamara Polajnar
- School of Computing Science, University of Glasgow, Glasgow, UK.
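The kernel fusion described above can be pictured as a convex combination of Gram matrices, K = Σ_m β_m K_m with β_m ≥ 0 and Σ_m β_m = 1. The Python sketch below assumes fixed, hand-chosen weights and toy Gaussian kernels over two hypothetical feature views of the same sentences; in the paper the weights are inferred by a probabilistic multiple kernel learning method, which is not reproduced here.

```python
import math


def gaussian_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(vectors, gamma):
    """Pairwise Gaussian kernel values for one feature view."""
    return [[gaussian_kernel(a, b, gamma) for b in vectors] for a in vectors]


def combine_kernels(grams, betas):
    """Convex combination sum_m beta_m * K_m of precomputed Gram matrices.

    The weights are fixed inputs here (an assumption for illustration),
    not learned as in probabilistic multiple kernel learning.
    """
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("betas must be non-negative and sum to 1")
    n = len(grams[0])
    return [[sum(b * g[i][j] for b, g in zip(betas, grams)) for j in range(n)]
            for i in range(n)]


# Two hypothetical feature views of the same three sentences, e.g. raw word
# counts and a lower-dimensional semantic projection (values are made up).
view_words = [[1, 0, 2], [0, 1, 2], [3, 0, 0]]
view_semantic = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]]
K = combine_kernels(
    [gram_matrix(view_words, 0.1), gram_matrix(view_semantic, 1.0)],
    betas=[0.5, 0.5],
)
```

Because each base kernel is positive semidefinite and the weights are non-negative, the combined matrix is also a valid kernel and can be passed to any kernel classifier; the relative size of each β then indicates how much that view contributes.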
171
Wei XL. Notice of Retraction: Visualization and Analysis of Integrin Signaling Network. 2011 5th International Conference on Bioinformatics and Biomedical Engineering 2011:1-4. [DOI: 10.1109/icbbe.2011.5780095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
172
Lin M, Zhou X, Shen X, Mao C, Chen X. The predicted Arabidopsis interactome resource and network topology-based systems biology analyses. Plant Cell 2011; 23:911-22. [PMID: 21441435 PMCID: PMC3082272 DOI: 10.1105/tpc.110.082529] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 12/30/2010] [Revised: 12/30/2010] [Accepted: 03/10/2011] [Indexed: 05/17/2023]
Abstract
Predicted interactions are a valuable complement to experimentally reported interactions in molecular mechanism studies, particularly for higher organisms, for which reported experimental interactions represent only a small fraction of the total interactome. Building on careful engineering consideration of the lessons from previous efforts, the predicted Arabidopsis interactome resource (PAIR) presents 149,900 potential molecular interactions, which are expected to cover approximately 24% of the entire interactome with approximately 40% precision. This study demonstrates that, although PAIR still has limited coverage, it is rich enough to capture many significant functional linkages within and between higher-order biological systems, such as pathways and biological processes. These inferred interactions can effectively support several network topology-based systems biology analyses, such as gene set linkage analysis, protein function prediction, and identification of regulatory genes that show insignificant expression changes. The drastically expanded molecular network in PAIR has considerably improved the capability of these analyses to integrate existing knowledge and suggest novel insights into the function and coordination of genes and gene networks.
Affiliation(s)
- Mingzhi Lin
- State Key Laboratory of Plant Physiology and Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Department of Bioinformatics, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Xi Zhou
- Department of Bioinformatics, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Xueling Shen
- Institute of Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Chuanzao Mao
- State Key Laboratory of Plant Physiology and Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Xin Chen
- State Key Laboratory of Plant Physiology and Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Department of Bioinformatics, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Institute of Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
173
Isserlin R, El-Badrawi RA, Bader GD. The Biomolecular Interaction Network Database in PSI-MI 2.5. Database (Oxford) 2011; 2011:baq037. [PMID: 21233089 PMCID: PMC3021793 DOI: 10.1093/database/baq037] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.0] [Indexed: 01/06/2023]
Abstract
The Biomolecular Interaction Network Database (BIND) is a major source of curated biomolecular interactions, but it has been unmaintained for the last few years, a trend that will eventually result in the loss of a significant amount of unique biomolecular interaction information, mostly as database identifiers become out of date. To help reverse this trend, we converted BIND to a standard format, Proteomics Standards Initiative-Molecular Interaction 2.5 (PSI-MI 2.5), starting from the last curated data release (from 2005), which was available in a custom XML format, and made the core components (interactions and complexes), plus additional valuable curated information, available for download (http://download.baderlab.org/BINDTranslation/). Substantial work was required during the conversion to update out-of-date molecule identifiers, resulting in a more comprehensive conversion of BIND than what is currently accessible elsewhere, by measures including the number of species and interactor types covered. This work also highlights issues of data modeling, controlled vocabulary adoption and data cleaning that can serve as a general case study on the future compatibility of interaction databases. Database URL: http://download.baderlab.org/BINDTranslation/
Affiliation(s)
- Ruth Isserlin
- The Donnelly Centre, University of Toronto, ON, Canada
174
Lasher CD, Rajagopalan P, Murali TM. Discovering networks of perturbed biological processes in hepatocyte cultures. PLoS One 2011; 6:e15247. [PMID: 21245926 PMCID: PMC3016309 DOI: 10.1371/journal.pone.0015247] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 08/12/2010] [Accepted: 11/02/2010] [Indexed: 12/20/2022]
Abstract
The liver plays a vital role in glucose homeostasis, the synthesis of bile acids and the detoxification of foreign substances. Liver culture systems are widely used to test adverse effects of drugs and environmental toxicants. The two most prevalent liver culture systems are hepatocyte monolayers (HMs) and collagen sandwiches (CS). Despite their wide use, comprehensive transcriptional programs and interaction networks in these culture systems have not been systematically investigated. We integrated an existing temporal transcriptional dataset for HM and CS cultures of rat hepatocytes with a functional interaction network of rat genes. We aimed to exploit the functional interactions to identify statistically significant linkages between perturbed biological processes. To this end, we developed a novel approach to compute Contextual Biological Process Linkage Networks (CBPLNs). CBPLNs revealed numerous meaningful connections between different biological processes and gene sets, which we were able to interpret within the context of liver metabolism. Multiple phenomena captured by CBPLNs at the process level, such as regulation, downstream effects, and feedback loops, have well-described counterparts at the gene and protein level. CBPLNs reveal high-level linkages between pathways and processes, making the identification of important biological trends more tractable than through interactions between individual genes and molecules alone. Our approach may provide a new route to explore, analyze, and understand cellular responses to internal and external cues within the context of the intricate networks of molecular interactions that control cellular behavior.
Affiliation(s)
- Christopher D. Lasher
- Genetics, Bioinformatics, and Computational Biology PhD Program, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Padmavathy Rajagopalan
- Department of Chemical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- T. M. Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
175
Paliouras M, Zaman N, Lumbroso R, Kapogeorgakis L, Beitel LK, Wang E, Trifiro M. Dynamic rewiring of the androgen receptor protein interaction network correlates with prostate cancer clinical outcomes. Integr Biol (Camb) 2011; 3:1020-32. [DOI: 10.1039/c1ib00038a] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
176
Bhattacharyya R. Cohesion: A concept and framework for confident association discovery with potential application in microarray mining. Appl Soft Comput 2011. [DOI: 10.1016/j.asoc.2009.12.018] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
177
Song MO, Freedman JH. Role of hepatocyte nuclear factor 4α in controlling copper-responsive transcription. Biochim Biophys Acta 2011; 1813:102-8. [PMID: 20875833 PMCID: PMC3014409 DOI: 10.1016/j.bbamcr.2010.09.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/05/2010] [Revised: 09/07/2010] [Accepted: 09/16/2010] [Indexed: 01/04/2023]
Abstract
Previous global transcriptome and interactome analyses of copper-treated HepG2 cells identified hepatocyte nuclear factor 4α (HNF4α) as a potential master regulator of copper-responsive transcription. Copper exposure caused a decrease in the expression of HNF4α at both the mRNA and protein levels, which was accompanied by a decrease in the level of HNF4α binding to its consensus DNA binding sequence. qRT-PCR and RNAi studies demonstrated that changes in HNF4α expression ultimately affected the expression of its downstream target genes. Analysis of upstream regulators of HNF4α expression, including p53 and ATF3, showed that copper caused an increase in the steady-state levels of these proteins. These results support a model for copper-responsive transcription in which the metal affects ATF3 expression and stabilizes p53, resulting in the down-regulation of HNF4α expression. In addition, copper may directly affect p53 protein levels. The suppression of HNF4α activity may contribute to the molecular mechanisms underlying the physiological and toxicological consequences of copper toxicity in hepatic-derived cells.
Affiliation(s)
- Min Ok Song
- National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
178
Doderer MS, Yoon K, Robbins KA. SIDEKICK: Genomic data driven analysis and decision-making framework. BMC Bioinformatics 2010; 11:611. [PMID: 21192813 PMCID: PMC3022632 DOI: 10.1186/1471-2105-11-611] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/16/2010] [Accepted: 12/30/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. RESULTS Sidekick, a genomic data driven analysis and decision-making framework, is a web-based tool that provides a user-friendly, intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X?" or "Does the set of genes associated with disease X also influence other diseases?" Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. CONCLUSIONS Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to approach genomic analysis that traditional single gene lists do not, particularly in areas such as interaction discovery.
Affiliation(s)
- Mark S Doderer
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA.
179
Cohen O, Gophna U, Pupko T. The Complexity Hypothesis Revisited: Connectivity Rather Than Function Constitutes a Barrier to Horizontal Gene Transfer. Mol Biol Evol 2010; 28:1481-9. [DOI: 10.1093/molbev/msq333] [Citation(s) in RCA: 146] [Impact Index Per Article: 9.7] [Indexed: 11/13/2022]
180
Functional genomics complements quantitative genetics in identifying disease-gene associations. PLoS Comput Biol 2010; 6:e1000991. [PMID: 21085640 PMCID: PMC2978695 DOI: 10.1371/journal.pcbi.1000991] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 04/23/2010] [Accepted: 10/07/2010] [Indexed: 11/25/2022]
Abstract
An ultimate goal of genetic research is to understand the connection between genotype and phenotype in order to improve the diagnosis and treatment of diseases. The quantitative genetics field has developed a suite of statistical methods to associate genetic loci with diseases and phenotypes, including quantitative trait loci (QTL) linkage mapping and genome-wide association studies (GWAS). However, each of these approaches has technical and biological shortcomings. For example, the amount of heritable variation explained by GWAS is often surprisingly small, and the resolution of many QTL linkage mapping studies is poor. The predictive power and interpretation of QTL and GWAS results are consequently limited. In this study, we propose a complementary approach to quantitative genetics that interrogates the vast amount of high-throughput genomic data in model organisms to functionally associate genes with phenotypes and diseases. Our algorithm combines the genome-wide functional relationship network for the laboratory mouse with a state-of-the-art machine learning method. We demonstrate the superior accuracy of this algorithm by predicting genes associated with each of 1157 diverse phenotype ontology terms. Comparison between our prediction results and a meta-analysis of quantitative genetic studies reveals both overlapping candidates and distinct, accurate predictions uniquely identified by our approach. Focusing on bone mineral density (BMD), a phenotype related to osteoporotic fracture, we experimentally validated two of our novel predictions (not observed in any previous GWAS/QTL studies) and found significant bone density defects in both Timp2- and Abcg8-deficient mice. Our results suggest that the integration of functional genomics data into networks, which itself is informative of protein function and interactions, can successfully be used as a complementary approach to quantitative genetics to predict disease risks. All supplementary material is available at http://cbfg.jax.org/phenotype.
Many recent efforts to understand the genetic origins of complex diseases use statistical approaches to analyze phenotypic traits measured in genetically well-characterized populations. While these quantitative genetics methods are powerful, their success is limited by sampling biases and other confounding factors, and the biological interpretation of results can be challenging, since these methods are not based on any functional information for candidate loci. On the other hand, the functional genomics field has greatly expanded in past years, both in terms of experimental approaches and analytical algorithms. However, functional approaches have been applied to understanding phenotypes in only the most basic ways. In this study, we demonstrate that functional genomics can complement traditional quantitative genetics by analytically extracting protein function information from large collections of high-throughput data, which can then be used to predict genotype-phenotype associations. We applied our prediction methodology to the laboratory mouse, and we experimentally confirmed a role in osteoporosis for two of our predictions that were not candidates from any previous quantitative genetics study. The ability of our approach to produce accurate and unique predictions implies that functional genomics can complement quantitative genetics and can help address previous limitations in identifying disease genes.
181
Pible O, Vidaud C, Plantevin S, Pellequer JL, Quéméneur E. Predicting the disruption by UO2(2+) of a protein-ligand interaction. Protein Sci 2010; 19:2219-30. [PMID: 20842713 PMCID: PMC3005792 DOI: 10.1002/pro.501] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 05/26/2010] [Revised: 08/30/2010] [Accepted: 09/04/2010] [Indexed: 01/27/2023]
Abstract
The uranyl cation (UO2(2+)) is suspected of interfering with the binding of essential metal cations to proteins, which may underlie some mechanisms of its toxicity. A dedicated computational screen was used to identify UO2(2+) binding sites within a set of nonredundant protein structures. The list of potential targets was compared with data from a small-molecule interaction database to pinpoint specific examples where UO2(2+) should be able to bind in the vicinity of an essential cation and would be likely to affect the function of the corresponding protein. C-reactive protein (CRP) appeared as an interesting hit, since its structure involves critical calcium ions in the binding of phosphorylcholine. Biochemical experiments confirmed the predicted binding site for UO2(2+), and surface plasmon resonance assays demonstrated that UO2(2+) binding to CRP prevents the calcium-mediated binding of phosphorylcholine. Strikingly, the apparent affinity of UO2(2+) for native CRP was almost 100-fold higher than that of Ca(2+). This result exemplifies, in the case of CRP, the capability of our computational tool to predict effective binding sites for UO2(2+) in proteins and provides the first evidence of calcium substitution by the uranyl cation in a native protein.
Affiliation(s)
- Olivier Pible
- CEA Life Sciences Division, DSV, IBEB, SBTN, Bagnols-sur-Cèze, F-30207, France.
182
Abstract
The predicted Arabidopsis interactome resource (PAIR, http://www.cls.zju.edu.cn/pair/), comprising 5990 experimentally reported molecular interactions in Arabidopsis thaliana together with 145,494 predicted interactions, is currently the most comprehensive data set of the Arabidopsis interactome with high reliability. PAIR predicts interactions with a fine-tuned support vector machine model that integrates indirect evidence for interaction, such as gene co-expression, domain interactions, shared GO annotations, co-localization, phylogenetic profile similarity and homologous interactions in other organisms (interologs). These predictions were expected to cover 24% of the entire Arabidopsis interactome, and their reliability was estimated to be 44%. Two independent example data sets were used to rigorously validate the prediction accuracy. PAIR features a user-friendly query interface that provides rich annotation on the relationships between two proteins. A graphical interaction network browser has also been integrated into the PAIR web interface to facilitate mining of specific pathways.
Affiliation(s)
- Mingzhi Lin
- Department of Bioinformatics and Institute of Biochemistry, Zhejiang University, Hangzhou, PR China
183
Jaeger S, Ertaylan G, van Dijk D, Leser U, Sloot P. Inference of surface membrane factors of HIV-1 infection through functional interaction networks. PLoS One 2010; 5:e13139. [PMID: 20967291 PMCID: PMC2953485 DOI: 10.1371/journal.pone.0013139] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/11/2010] [Accepted: 09/08/2010] [Indexed: 01/26/2023]
Abstract
BACKGROUND HIV infection affects the populations of T helper cells, dendritic cells and macrophages. Moreover, it has a serious impact on the central nervous system. It is not yet clear whether this list is complete and why these cell types specifically are affected. To address this question, we have developed a method to identify cellular surface proteins that permit, mediate or enhance HIV infection in different cell/tissue types in HIV-infected individuals. Receptors associated with HIV infection share common functions and domains and are involved in similar cellular processes. These properties are exploited by bioinformatics techniques to predict novel cell surface proteins that potentially interact with HIV. METHODOLOGY/PRINCIPAL FINDINGS We compiled a set of surface membrane proteins (SMP) that are known to interact with HIV. This set is extended by proteins that interact directly with its members and share functional similarity with them. This resulted in a comprehensive network around the initial SMP set. Using network centrality analysis, we predict novel surface membrane factors from the annotated network. We identify 21 surface membrane factors, among which three have confirmed functions in HIV infection, seven have been identified by at least two other studies, and eleven are novel predictions and thus excellent targets for experimental investigation. CONCLUSIONS Determining to what extent HIV can interact with human SMPs is an important step towards understanding patient-specific disease progression. Using various bioinformatics techniques, we generate a set of surface membrane factors that constitutes a well-founded starting point for experimental testing of cell/tissue susceptibility of different HIV strains, as well as for cohort studies evaluating patient-specific disease progression.
Affiliation(s)
- Samira Jaeger
- Knowledge Management in Bioinformatics, Humboldt-Universität Berlin, Berlin, Germany
- Algorithmic Computational Biology, Centrum Wiskunde and Informatica, Amsterdam, The Netherlands
- Gokhan Ertaylan
- Computational Science, University of Amsterdam, Amsterdam, The Netherlands
- David van Dijk
- Computational Science, University of Amsterdam, Amsterdam, The Netherlands
- Ulf Leser
- Knowledge Management in Bioinformatics, Humboldt-Universität Berlin, Berlin, Germany
- Peter Sloot
- Computational Science, University of Amsterdam, Amsterdam, The Netherlands
| |
Collapse
|
184
Zhang M, Lu LJ. Investigating the validity of current network analysis on static conglomerate networks by protein network stratification. BMC Bioinformatics 2010; 11:466. [PMID: 20846443 PMCID: PMC2949894 DOI: 10.1186/1471-2105-11-466] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/02/2010] [Accepted: 09/16/2010] [Indexed: 01/25/2023]
Abstract
Background A molecular network perspective forms the foundation of systems biology. A common practice in analyzing protein-protein interaction (PPI) networks is to perform network analysis on a conglomerate network that is an assembly of all available binary interactions in a given organism from diverse data sources. Recent studies on network dynamics suggested that this approach might have ignored the dynamic nature of context-dependent molecular systems. Results In this study, we employed a network stratification strategy to investigate the validity of current network analysis on conglomerate PPI networks. Using genome-scale tissue- and condition-specific proteomics data in Arabidopsis thaliana, we present here the first systematic investigation into this question. We stratified a conglomerate A. thaliana PPI network into three levels of context-dependent subnetworks. We then focused on the three most commonly conducted types of network analysis, i.e., topological, functional and modular analyses, and compared their results on the conglomerate network with those on five stratified context-dependent subnetworks corresponding to specific tissues. Conclusions We found that the results based on the conglomerate PPI network often differ significantly from those of context-dependent subnetworks corresponding to specific tissues or conditions. This conclusion depends neither on relatively arbitrary cutoffs (such as those defining network hubs or bottlenecks), nor on specific network clustering algorithms for module extraction, nor on the possibly high false-positive rates of binary interactions in PPI networks. We also found that our conclusions are likely to be valid in human PPI networks. Furthermore, network stratification may help resolve many controversies in current systems biology research.
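The stratification idea can be illustrated by inducing a tissue-specific subnetwork from the conglomerate network (keeping only interactions whose two partners are both detected in the tissue) and checking whether the hub set changes. This is a minimal sketch under assumptions: hubs are taken as the top-k proteins by degree, the overlap is scored with the Jaccard index, and all protein names and the tissue protein list are hypothetical.

```python
from collections import defaultdict


def degrees(edges, keep=None):
    """Degree per protein; with `keep`, count only edges whose two ends are both in it."""
    deg = defaultdict(int)
    for a, b in edges:
        if keep is None or (a in keep and b in keep):
            deg[a] += 1
            deg[b] += 1
    return dict(deg)


def top_hubs(deg, k):
    """The k highest-degree proteins (ties broken alphabetically)."""
    return set(sorted(deg, key=lambda p: (-deg[p], p))[:k])


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


# Hypothetical conglomerate network and tissue-detected proteome.
conglomerate = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"),
                ("D", "E"), ("E", "F"), ("F", "G"), ("F", "H"), ("G", "H")]
tissue = {"D", "E", "F", "G", "H"}  # protein A is not detected in this tissue

global_hubs = top_hubs(degrees(conglomerate), 2)          # {'A', 'E'}
tissue_hubs = top_hubs(degrees(conglomerate, tissue), 2)  # {'E', 'F'}
print(jaccard(global_hubs, tissue_hubs))                  # 0.333...
```

Even in this toy case the most connected protein of the conglomerate network is absent from the tissue subnetwork, so the two hub sets share only one member.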
Affiliation(s)
- Minlu Zhang
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH 45229, USA
185
Liu ZP, Wang Y, Zhang XS, Chen L. Identifying dysfunctional crosstalk of pathways in various regions of Alzheimer's disease brains. BMC SYSTEMS BIOLOGY 2010; 4 Suppl 2:S11. [PMID: 20840725 PMCID: PMC2982685 DOI: 10.1186/1752-0509-4-s2-s11] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Indexed: 01/23/2023]
Abstract
Background Alzheimer's disease (AD) is a major neurodegenerative disorder leading to amnesia, cognitive impairment and dementia in the elderly. Such lesions usually result from dysfunctional protein cooperation in biological pathways. In addition, AD progression is known to occur in different brain regions with particular features. Thus, identifying and analyzing crosstalk among dysregulated pathways, and identifying their clusters in various diseased brain regions, are expected to provide deep insights into the pathogenetic mechanism. Results Here we propose a network-based systems biology approach to detect crosstalk among AD-related pathways, as well as their dysfunctions in the six brain regions of AD patients. By constructing a network of pathways, we systematically investigate the relationships between the AD pathway and its neighbor pathways and present them visually through their intersections. We found that the significance degree of pathways related to the fatal disorders and the pathway overlapping strength can indicate the impact of these neighboring pathways on AD development. Furthermore, the crosstalk among pathways provides evidence that the neighbor pathways of the AD pathway closely cooperate and play important roles in AD progression. Conclusions Our study identifies the common and distinct features of the dysfunctional crosstalk of pathways in various AD brain regions. The global pathway crosstalk network and the clusters of relevant pathways of AD provide evidence of cooperativity among pathways in the potential pathogenesis of this complex neurodegenerative disease.
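The abstract does not define how "pathway overlapping strength" is computed. One common way to quantify crosstalk between two gene-set pathways, assumed here for illustration only, is to test the size of their intersection against a hypergeometric null and weight the significant links by the Jaccard index. The pathway names, integer gene identifiers and universe size below are hypothetical.

```python
from math import comb


def overlap_pvalue(a, b, universe_size):
    """P(X >= |a & b|) when |b| genes are drawn at random from a universe
    containing the |a| genes of pathway a (hypergeometric upper tail)."""
    k = len(a & b)
    n_a, n_b = len(a), len(b)
    tail = sum(comb(n_a, i) * comb(universe_size - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    return tail / comb(universe_size, n_b)


def crosstalk_edges(pathways, universe_size, alpha=0.05):
    """(p, q, jaccard) for every pathway pair whose gene overlap is significant."""
    names = sorted(pathways)
    edges = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            a, b = pathways[p], pathways[q]
            shared = a & b
            if shared and overlap_pvalue(a, b, universe_size) < alpha:
                edges.append((p, q, len(shared) / len(a | b)))
    return edges


# Hypothetical gene sets, with genes encoded as integer IDs in a 1,000-gene universe.
pathways = {
    "AD": set(range(0, 10)),
    "P1": set(range(0, 8)) | {100, 101},
    "P2": set(range(500, 510)),
}
print(crosstalk_edges(pathways, universe_size=1000))  # [('AD', 'P1', 0.666...)]
```

The resulting weighted edge list is a pathway-level network of the kind the abstract describes; clustering it would group pathways that cooperate.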
Affiliation(s)
- Zhi-Ping Liu
- Key Laboratory of Systems Biology and SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
186
Lynn DJ, Chan C, Naseer M, Yau M, Lo R, Sribnaia A, Ring G, Que J, Wee K, Winsor GL, Laird MR, Breuer K, Foroushani AK, Brinkman FSL, Hancock REW. Curating the innate immunity interactome. BMC SYSTEMS BIOLOGY 2010; 4:117. [PMID: 20727158 PMCID: PMC2936296 DOI: 10.1186/1752-0509-4-117] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Received: 04/09/2010] [Accepted: 08/20/2010] [Indexed: 12/29/2022]
Abstract
BACKGROUND The innate immune response is the first line of defence against invading pathogens and is regulated by complex signalling and transcriptional networks. Systems biology approaches promise to shed new light on the regulation of innate immunity through the analysis and modelling of these networks. A key initial step in this process is the contextual cataloguing of the components of this system and the molecular interactions that comprise these networks. InnateDB (http://www.innatedb.com) is a molecular interaction and pathway database developed to facilitate systems-level analyses of innate immunity. RESULTS Here, we describe the InnateDB curation project, which is manually annotating the human and mouse innate immunity interactome in rich contextual detail, and present our novel curation software system, which has been developed to ensure interactions are curated in a highly accurate and data-standards-compliant manner. To date, over 13,000 interactions (protein, DNA and RNA) have been curated from the biomedical literature. Here, we present data illustrating how InnateDB curation of the innate immunity interactome has greatly enhanced network and pathway annotation available for systems-level analysis and discuss the challenges that face such curation efforts. Significantly, we provide several lines of evidence that analysis of the innate immunity interactome has the potential to identify novel signalling, transcriptional and post-transcriptional regulators of innate immunity. These analyses also provide insight into the cross-talk between innate immunity pathways and other biological processes, such as adaptive immunity, cancer and diabetes, and, intriguingly, suggest links to other pathways that have not yet been implicated in the innate immune response. CONCLUSIONS In summary, curation of the InnateDB interactome provides a wealth of information to enable systems-level analysis of innate immunity.
Affiliation(s)
- David J Lynn
- Animal & Bioscience Research Department, AGRIC, Teagasc, Grange, Dunsany, Co. Meath, Ireland.
187
Termanini A, Tieri P, Franceschi C. Encoding the states of interacting proteins to facilitate biological pathways reconstruction. Biol Direct 2010; 5:52; discussion 52. [PMID: 20707925 PMCID: PMC2930634 DOI: 10.1186/1745-6150-5-52] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/31/2010] [Accepted: 08/13/2010] [Indexed: 12/04/2022]
Abstract
Background In a systems biology perspective, protein-protein interactions (PPI) are encoded in machine-readable formats to avoid issues encountered in their retrieval for the reconstruction of comprehensive interaction maps and biological pathways. However, the information stored in the electronic formats currently used does not allow valid automatic reconstruction of biological pathways. Results We propose a logical model of PPI that takes into account the "state" of proteins before and after the interaction. This information is necessary for proper reconstruction of the pathway. Conclusions The adoption of the proposed model, which can be easily integrated into existing machine-readable formats used to store PPI data, would facilitate the automated or semi-automated reconstruction of biological pathways. Reviewers This article was reviewed by Dr. Wen-Yu Chung (nominated by Kateryna Makova), Dr. Carl Herrmann (nominated by Dr. Purificación López-García) and Dr. Arcady Mushegian.
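Why protein states matter for pathway reconstruction can be shown with a small data structure. In this sketch, which is an assumption and not the authors' proposed format, each interaction stores the (protein, state) pairs it requires and the pairs present afterwards; one interaction is linked to the next only when a state it creates is exactly a state the next one requires. Linking by protein name alone connects every interaction that mentions the same protein. All protein names, states and interaction labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    name: str
    before: frozenset  # (protein, state) pairs required before the interaction
    after: frozenset   # (protein, state) pairs present after the interaction

    def produced(self):
        """States created by this interaction (not merely carried over)."""
        return self.after - self.before

    def proteins(self):
        return {p for p, _ in self.before | self.after}


def state_links(interactions):
    """A -> B when A produces a (protein, state) pair that B requires."""
    return [(a.name, b.name) for a in interactions for b in interactions
            if a is not b and a.produced() & b.before]


def name_links(interactions):
    """Naive alternative: link any two interactions that share a protein name."""
    return [(a.name, b.name) for i, a in enumerate(interactions)
            for b in interactions[i + 1:] if a.proteins() & b.proteins()]


i1 = Interaction("i1", frozenset({("X", "active"), ("Y", "unphosphorylated")}),
                 frozenset({("X", "active"), ("Y", "phosphorylated")}))
i2 = Interaction("i2", frozenset({("Y", "phosphorylated"), ("Z", "free")}),
                 frozenset({("Y", "phosphorylated"), ("Z", "bound")}))
i3 = Interaction("i3", frozenset({("Y", "unphosphorylated"), ("W", "free")}),
                 frozenset({("Y", "unphosphorylated"), ("W", "bound")}))

print(state_links([i1, i2, i3]))  # [('i1', 'i2')]
print(name_links([i1, i2, i3]))   # [('i1', 'i2'), ('i1', 'i3'), ('i2', 'i3')]
```

With states recorded, only the phosphorylation of Y by X leads to the binding of Z; the name-only view cannot tell the three interactions apart.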
Affiliation(s)
- Alberto Termanini
- L. Galvani Interdepartmental Center, University of Bologna, Bologna, Italy.
188
Proteome analysis of microtubule-associated proteins and their interacting partners from mammalian brain. Amino Acids 2010; 41:363-85. [PMID: 20567863 DOI: 10.1007/s00726-010-0649-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 02/26/2010] [Accepted: 06/01/2010] [Indexed: 10/19/2022]
Abstract
The microtubule (MT) cytoskeleton is essential for a variety of cellular processes. MTs are finely regulated by distinct classes of MT-associated proteins (MAPs), which themselves bind to and are regulated by a large number of additional proteins. We have carried out proteome analyses of tubulin-rich and tubulin-depleted MAPs and their interacting partners isolated from bovine brain. In total, 573 proteins were identified, giving us unprecedented access to brain-specific MT-associated proteins from mammalian brain. Most of the standard MAPs were identified, and at least 500 proteins have been reported as being associated with MTs. We identified protein complexes with a large number of subunits, such as brain-specific motor/adaptor/cargo complexes for kinesins, dynein and dynactin, and proteins of an RNA-transporting granule. About 25% of the identified proteins were also found in the synaptic vesicle proteome. Analysis of the MS/MS data revealed many posttranslational modifications, amino acid changes and alternative splice variants, particularly in tau, a key protein implicated in Alzheimer's disease. Bioinformatic analysis of known protein-protein interactions of the identified proteins indicated that the number of MAPs and their associated proteins is larger than previously anticipated and that our database will be a useful resource to identify novel binding partners.
189
ROCK: a breast cancer functional genomics resource. Breast Cancer Res Treat 2010; 124:567-72. [PMID: 20563840 DOI: 10.1007/s10549-010-0945-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 04/30/2010] [Accepted: 05/08/2010] [Indexed: 12/20/2022]
Abstract
The clinical and pathological heterogeneity of breast cancer has instigated efforts to stratify breast cancer sub-types according to molecular profiles. These profiling efforts are now being augmented by large-scale functional screening of breast tumour cell lines, using approaches such as RNA interference. We have developed ROCK (rock.icr.ac.uk) to provide a unique, publicly accessible resource for the integration of breast cancer functional and molecular profiling datasets. ROCK provides a simple online interface for the navigation and cross-correlation of gene expression, aCGH and RNAi screen data. It enables the interrogation of gene lists in the context of statistically analysed functional genomic datasets, interaction networks, pathways, GO terms, mutations and drug targets. The interface also provides interactive visualisations of datasets and interaction networks. ROCK collates data from a wealth of breast cancer molecular profiling and functional screening studies into a single portal, where analysed and annotated results can be accessed at the level of a gene, sample or study. We believe that portals such as ROCK will not only afford researchers rapid access to profiling data, but also aid the integration of different data types, thus enhancing the discovery of novel targets and biomarkers for breast cancer.
190
Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM. Predicting genetic modifier loci using functional gene networks. Genome Res 2010; 20:1143-53. [PMID: 20538624 DOI: 10.1101/gr.102749.109] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 12/23/2022]
Abstract
Most phenotypes are genetically complex, with contributions from mutations in many different genes. Mutations in more than one gene can combine synergistically to cause phenotypic change, and systematic studies in model organisms show that these genetic interactions are pervasive. However, in human association studies such nonadditive genetic interactions are very difficult to identify because of a lack of statistical power--simply put, the number of potential interactions is too vast. One approach to resolve this is to predict candidate modifier interactions between loci, and then to specifically test these for associations with the phenotype. Here, we describe a general method for predicting genetic interactions based on the use of integrated functional gene networks. We show that in both Saccharomyces cerevisiae and Caenorhabditis elegans a single high-coverage, high-quality functional network can successfully predict genetic modifiers for the majority of genes. For C. elegans we also describe the construction of a new, improved, and expanded functional network, WormNet 2. Using this network we demonstrate how it is possible to rapidly expand the number of modifier loci known for a gene, predicting and validating new genetic interactions for each of three signal transduction genes. We propose that this approach, termed network-guided modifier screening, provides a general strategy for predicting genetic interactions. This work thus suggests that a high-quality integrated human gene network will provide a powerful resource for modifier locus discovery in many different diseases.
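The core of network-guided modifier screening is a guilt-by-association ranking: genes strongly linked in the functional network to a query gene (and to any modifiers already known for it) are prioritized for association testing. The sketch below assumes a simple scoring rule, the summed linkage weight to those seed genes, which may differ from the scoring actually used with WormNet 2; `rank_candidate_modifiers`, the gene names and the weights are hypothetical.

```python
from collections import defaultdict


def rank_candidate_modifiers(network, query, known_modifiers=()):
    """Rank genes by summed linkage weight to the query gene and its known modifiers.

    network: {(gene_a, gene_b): weight}, treated as undirected.
    Seed genes themselves are never returned as candidates.
    """
    seeds = {query, *known_modifiers}
    score = defaultdict(float)
    for (a, b), w in network.items():
        if a in seeds and b not in seeds:
            score[b] += w
        elif b in seeds and a not in seeds:
            score[a] += w
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical weighted functional network.
network = {("Q", "A"): 2.0, ("Q", "B"): 1.0, ("M1", "B"): 1.5,
           ("M1", "C"): 0.5, ("C", "D"): 3.0}
print(rank_candidate_modifiers(network, "Q", known_modifiers=["M1"]))
# [('B', 2.5), ('A', 2.0), ('C', 0.5)]; D has no link to a seed gene
```

Only the top of such a list would then be tested for association with the phenotype, which is what keeps the number of tests, and hence the multiple-testing burden, small.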
Affiliation(s)
- Insuk Lee
- Department of Biotechnology, College of Life science and Biotechnology, Yonsei University, Seodaemun-ku, Seoul 120-749, South Korea.
191
Freeman TC, Raza S, Theocharidis A, Ghazal P. The mEPN scheme: an intuitive and flexible graphical system for rendering biological pathways. BMC SYSTEMS BIOLOGY 2010; 4:65. [PMID: 20478018 PMCID: PMC2878301 DOI: 10.1186/1752-0509-4-65] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 11/26/2009] [Accepted: 05/17/2010] [Indexed: 01/15/2023]
Abstract
BACKGROUND There is general agreement amongst biologists about the need for good pathway diagrams and a need to formalize the way biological pathways are depicted. However, how best to implement and agree on this is currently the subject of some debate. RESULTS The modified Edinburgh Pathway Notation (mEPN) scheme is founded on a notation system originally devised a number of years ago, which through use has now been refined extensively. This process has been primarily driven by the author's attempts to produce process diagrams for a diverse range of biological pathways, particularly with respect to immune signaling in mammals. Here we provide a specification of the mEPN notation, its symbols, rules for its use and a comparison to the proposed Systems Biology Graphical Notation (SBGN) scheme. CONCLUSIONS We hope this work will contribute to the ongoing community effort to develop a standard for depicting pathways and will provide a coherent guide to those planning to construct pathway diagrams of their biological systems of interest.
Affiliation(s)
- Tom C Freeman
- Division of Pathway Medicine, University of Edinburgh Medical School, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh, EH16 4SB, UK
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, EH25 9PS, UK
- Sobia Raza
- Division of Pathway Medicine, University of Edinburgh Medical School, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh, EH16 4SB, UK
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, EH25 9PS, UK
- Athanasios Theocharidis
- Division of Pathway Medicine, University of Edinburgh Medical School, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh, EH16 4SB, UK
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian, EH25 9PS, UK
- Peter Ghazal
- Division of Pathway Medicine, University of Edinburgh Medical School, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh, EH16 4SB, UK
- Centre for Systems Biology at Edinburgh, C H Waddington Building, King's Buildings, Mayfield Road, Edinburgh, EH9 3JU, UK
192
Raza S, McDerment N, Lacaze PA, Robertson K, Watterson S, Chen Y, Chisholm M, Eleftheriadis G, Monk S, O'Sullivan M, Turnbull A, Roy D, Theocharidis A, Ghazal P, Freeman TC. Construction of a large scale integrated map of macrophage pathogen recognition and effector systems. BMC SYSTEMS BIOLOGY 2010; 4:63. [PMID: 20470404 PMCID: PMC2892459 DOI: 10.1186/1752-0509-4-63] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 01/15/2010] [Accepted: 05/14/2010] [Indexed: 11/24/2022]
Abstract
BACKGROUND In an effort to better understand the molecular networks that underpin macrophage activation we have been assembling a map of relevant pathways. Manual curation of the published literature was carried out in order to define the components of these pathways and the interactions between them. This information has been assembled into a large integrated directional network and represented graphically using the modified Edinburgh Pathway Notation (mEPN) scheme. RESULTS The diagram includes detailed views of the toll-like receptor (TLR) pathways, other pathogen recognition systems, NF-kappa-B, apoptosis, interferon signalling, MAP-kinase cascades, MHC antigen presentation and proteasome assembly, as well as selected views of the transcriptional networks they regulate. The integrated pathway includes a total of 496 unique proteins, the complexes formed between them and the processes in which they are involved. This produces a network of 2,170 nodes connected by 2,553 edges. CONCLUSIONS The pathway diagram is a navigable visual aid for displaying a consensus view of the pathway information available for these systems. It is also a valuable resource for computational modelling and an aid in the interpretation of functional genomics data. We envisage that this work will be of value to those interested in macrophage biology and also contribute to the ongoing Systems Biology community effort to develop a standard notation scheme for the graphical representation of biological pathways.
Affiliation(s)
- Sobia Raza
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian EH25 9PS, UK
- Neil McDerment
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian EH25 9PS, UK
- Paul A Lacaze
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Kevin Robertson
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Centre for Systems Biology, University of Edinburgh, Darwin Building, King's Building Campus, Mayfield Road, Edinburgh EH9 3JU, UK
- Steven Watterson
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Centre for Systems Biology, University of Edinburgh, Darwin Building, King's Building Campus, Mayfield Road, Edinburgh EH9 3JU, UK
- Ying Chen
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Michael Chisholm
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- George Eleftheriadis
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Stephanie Monk
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Maire O'Sullivan
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Arran Turnbull
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Douglas Roy
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Athanasios Theocharidis
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian EH25 9PS, UK
- Peter Ghazal
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- Centre for Systems Biology, University of Edinburgh, Darwin Building, King's Building Campus, Mayfield Road, Edinburgh EH9 3JU, UK
- Tom C Freeman
- Division of Pathway Medicine, University of Edinburgh, The Chancellor's Building, College of Medicine, 49 Little France Crescent, Edinburgh EH16 4SB, UK
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Roslin, Midlothian EH25 9PS, UK
193
Abstract
A comprehensive analysis of enriched functional categories in differentially expressed genes is important to extract the underlying biological processes of genome-wide expression profiles. Moreover, identification of the network of significant functional modules in these dynamic processes is an interesting challenge. This study introduces DynaMod, a web-based application that identifies significant functional modules reflecting the change of modularity and differential expression that are correlated with gene expression profiles under different conditions. DynaMod allows the inspection of a wide variety of functional modules such as biological pathways, transcription factor–target gene groups, microRNA–target gene groups, protein complexes and hub networks in the protein interactome. The statistical significance of dynamic functional modularity is scored based on Z-statistics from the average of mutual information (MI) changes of the involved gene pairs under different conditions. Significantly correlated gene pairs among the functional modules are used to generate a correlated network of functional categories. In addition, this scoring strategy improves the detection of significant genes in microarray analyses, as scores based on correlated genes perform better in significance analysis than those of individual genes. DynaMod also offers cross-comparison between different analysis outputs. DynaMod is freely accessible at http://piech.kaist.ac.kr/dynamod.
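A minimal sketch of the scoring idea follows, under several assumptions not stated in the abstract: expression is discretized into equal-frequency bins, mutual information is the plug-in estimate in nats, the per-pair change is the absolute difference between the two conditions, and the module Z-score compares the module's mean change with the mean and standard deviation over all gene pairs. DynaMod's actual estimator and null model may differ, and the expression values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def discretize(values, bins=3):
    """Equal-frequency binning: map each value's rank to a bin index 0..bins-1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


def mutual_information(x, y):
    """Plug-in mutual information (nats) between two discrete label sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mi_change(expr1, expr2, g1, g2, bins=3):
    """Absolute change in MI of a gene pair between condition 1 and condition 2."""
    mi = [mutual_information(discretize(e[g1], bins), discretize(e[g2], bins))
          for e in (expr1, expr2)]
    return abs(mi[1] - mi[0])


def module_z(expr1, expr2, module, bins=3):
    """Z-score of a module's mean MI change against the changes of all gene pairs."""
    module_d = [mi_change(expr1, expr2, a, b, bins) for a, b in combinations(sorted(module), 2)]
    if not module_d:
        raise ValueError("a module needs at least two genes")
    all_d = [mi_change(expr1, expr2, a, b, bins) for a, b in combinations(sorted(expr1), 2)]
    mu, sigma = mean(all_d), pstdev(all_d)
    if sigma == 0:
        return 0.0
    return (mean(module_d) - mu) / (sigma / math.sqrt(len(module_d)))


# Hypothetical expression profiles (6 samples per gene); only gene B changes.
expr1 = {"A": [1, 2, 3, 4, 5, 6], "B": [1, 2, 3, 4, 5, 6],
         "C": [1, 3, 5, 2, 4, 6], "D": [1, 4, 2, 5, 3, 6]}
expr2 = dict(expr1, B=[1, 3, 5, 2, 4, 6])
print(module_z(expr1, expr2, {"A", "B"}) > 0)  # True: the A-B coupling is lost
```

A module such as {A, C}, whose pair does not change between conditions, receives a negative score under the same background.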
Affiliation(s)
- Choong-Hyun Sun
- Department of Computer Science, KAIST, Daejeon 305-701, South Korea
194
Kaake RM, Wang X, Huang L. Profiling of protein interaction networks of protein complexes using affinity purification and quantitative mass spectrometry. Mol Cell Proteomics 2010; 9:1650-65. [PMID: 20445003 DOI: 10.1074/mcp.r110.000265] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Indexed: 11/06/2022]
Abstract
Protein-protein interactions are important for nearly all biological processes, and it is known that aberrant protein-protein interactions can lead to human disease and cancer. Recent evidence has suggested that protein interaction interfaces describe a new class of attractive targets for drug development. Full characterization of protein interaction networks of protein complexes and their dynamics in response to various cellular cues will provide essential information for us to understand how protein complexes work together in cells to maintain cell viability and normal homeostasis. Affinity purification coupled with quantitative mass spectrometry has become the primary method for studying in vivo protein interactions of protein complexes and whole organism proteomes. Recent developments in sample preparation and affinity purification strategies allow the capture, identification, and quantification of protein interactions of protein complexes that are stable, dynamic, transient, and/or weak. Current efforts have mainly focused on generating reliable, reproducible, and high confidence protein interaction data sets for functional characterization. The availability of increasing amounts of information on protein interactions in eukaryotic systems and new bioinformatics tools allow functional analysis of quantitative protein interaction data to unravel the biological significance of the identified protein interactions. Existing studies in this area have laid a solid foundation toward generating a complete map of in vivo protein interaction networks of protein complexes in cells or tissues.
Affiliation(s)
- Robyn M Kaake
- Department of Physiology and Biophysics, University of California, Irvine, California 92697-4560, USA
195
Inference of functional relations in predicted protein networks with a machine learning approach. PLoS One 2010; 5:e9969. [PMID: 20376314 PMCID: PMC2848617 DOI: 10.1371/journal.pone.0009969] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/15/2009] [Accepted: 03/08/2010] [Indexed: 11/19/2022]
Abstract
Background Molecular biology is currently facing the challenging task of functionally characterizing the proteome. The large number of possible protein-protein interactions and complexes, the variety of environmental conditions and cellular states in which these interactions can be reorganized, and the multiple ways in which a protein can influence the function of others require the development of experimental and computational approaches to analyze and predict functional associations between proteins as part of their activity in the interactome. Methodology/Principal Findings We have studied the possibility of constructing a classifier to combine the output of several protein interaction prediction methods. The AODE (Averaged One-Dependence Estimators) machine learning algorithm is a suitable choice in this case: it provides better results than the individual prediction methods and performs better than the other alternative methods tested in this experimental setup. To illustrate the potential use of this new AODE-based Predictor of Protein InterActions (APPIA) when analyzing high-throughput experimental data, we show how it helps to filter the results of published high-throughput proteomic studies, ranking functionally related pairs significantly higher. Availability: All the predictions of the individual methods and of the combined APPIA predictor, together with the datasets of functional associations used, are available at http://ecid.bioinfo.cnio.es/. Conclusions We propose a strategy that integrates the main current computational techniques used to predict functional associations into a unified classifier system, specifically focusing on the evaluation of poorly characterized protein pairs. We selected the AODE classifier as the appropriate tool to perform this task. AODE is particularly useful to extract valuable information from large, unbalanced and heterogeneous data sets. The combination of the information provided by five interaction prediction methods with some simple sequence features in APPIA is useful in establishing reliability values and helps to prioritize functional interactions that can be further characterized experimentally.
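AODE is a standard classifier, so its combination step can be sketched directly. In the usual formulation for categorical attributes, each attribute value in turn acts as a "super-parent", the one-dependence estimates P(y, x_i) × Π_j P(x_j | y, x_i) are averaged over super-parents, and Laplace smoothing avoids zero counts. This sketch assumes each training example is a tuple of discretized outputs from individual interaction predictors with a binary label (functionally associated or not); the toy data are hypothetical and unrelated to the APPIA datasets, and the fallback used when no attribute value is frequent enough is a simplification.

```python
from collections import Counter


class AODE:
    """Averaged One-Dependence Estimators for categorical attributes."""

    def __init__(self, min_count=1):
        self.min_count = min_count  # minimum frequency for a value to act as super-parent

    def fit(self, X, y):
        self.n = len(X)
        self.classes = sorted(set(y))
        self.n_attr = len(X[0])
        self.n_values = [len({row[i] for row in X}) for i in range(self.n_attr)]
        self.attr = Counter()    # (i, x_i) -> count
        self.pair = Counter()    # (y, i, x_i) -> count
        self.triple = Counter()  # (y, i, x_i, j, x_j) -> count
        for row, c in zip(X, y):
            for i, xi in enumerate(row):
                self.attr[i, xi] += 1
                self.pair[c, i, xi] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.triple[c, i, xi, j, xj] += 1
        return self

    def _joint(self, row):
        """Unnormalized P(y, x) for every class, averaged over super-parents."""
        parents = [i for i, xi in enumerate(row) if self.attr[i, xi] >= self.min_count]
        parents = parents or list(range(self.n_attr))  # simplified fallback
        scores = {}
        for c in self.classes:
            total = 0.0
            for i in parents:
                n_cxi = self.pair[c, i, row[i]]
                p = (n_cxi + 1) / (self.n + len(self.classes) * self.n_values[i])
                for j, xj in enumerate(row):
                    if j != i:
                        p *= (self.triple[c, i, row[i], j, xj] + 1) / (n_cxi + self.n_values[j])
                total += p
            scores[c] = total / len(parents)
        return scores

    def predict_proba(self, row):
        scores = self._joint(row)
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Hypothetical training data: hits (1) / misses (0) from three predictors per protein pair.
X = [(1, 1, 0), (1, 1, 1), (1, 0, 1), (0, 1, 1), (0, 0, 0), (0, 0, 1), (0, 0, 0), (1, 0, 0)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = AODE().fit(X, y)
print(model.predict((1, 1, 1)), model.predict((0, 0, 0)))  # 1 0
```

Averaging over super-parents relaxes the naive Bayes independence assumption while still needing only counts, which is one reason AODE copes well with heterogeneous, unbalanced evidence.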
196
Hinz U. From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase. Cell Mol Life Sci 2010; 67:1049-64. [PMID: 20043185 PMCID: PMC2835715 DOI: 10.1007/s00018-009-0229-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 08/14/2009] [Revised: 12/01/2009] [Accepted: 12/07/2009] [Indexed: 11/12/2022]
Abstract
With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website (http://www.uniprot.org/). It also discusses precautions that are necessary for successful predictions and extrapolations.
Affiliation(s)
- Ursula Hinz
- Swiss-Prot Group, Swiss Institute of Bioinformatics, 1 rue Michel Servet, 1211, Geneva, Switzerland.
197
Wiles AM, Doderer M, Ruan J, Gu TT, Ravi D, Blackman B, Bishop AJR. Building and analyzing protein interactome networks by cross-species comparisons. BMC Syst Biol 2010; 4:36. [PMID: 20353594 PMCID: PMC2859380 DOI: 10.1186/1752-0509-4-36] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 09/25/2009] [Accepted: 03/30/2010] [Indexed: 11/10/2022]
Abstract
Background A genomic catalogue of protein-protein interactions is a rich source of information, particularly for exploring the relationships between proteins. Numerous systems-wide and small-scale experiments have been conducted to identify interactions; however, our knowledge of all interactions for any one species is incomplete, and alternative means of expanding these network maps are needed. We therefore took a comparative biology approach to predict protein-protein interactions across five species (human, mouse, fly, worm, and yeast) and developed InterologFinder so that research biologists can easily navigate these data. We also developed a confidence score for interactions based on available experimental evidence and conservation across species. Results The connectivity of the resulting networks was found to have a scale-free distribution, small-world properties, and increased local modularity, indicating that the added interactions do not disrupt our current understanding of protein network structure. We show examples of how these improved interactomes can be used to analyze a genome-scale dataset (an RNAi screen) and to assign new functions to proteins. Predicted interactions within this dataset were tested by co-immunoprecipitation, resulting in a high validation rate, which suggests that the networks produced are of high quality. Conclusions Protein-protein interactions were predicted in five species based on orthology. Each known and predicted interaction is given an InteroScore, a score that accounts for homology, the number of orthologues with evidence of interaction, and the number of unique observations of the interaction. Our website http://www.interologfinder.org gives research biologists intuitive access to these data.
Affiliation(s)
- Amy M Wiles
- Greehey Children's Cancer Research Institute, The University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
198
Malik R, Dulla K, Nigg EA, Körner R. From proteome lists to biological impact--tools and strategies for the analysis of large MS data sets. Proteomics 2010; 10:1270-83. [PMID: 20077408 DOI: 10.1002/pmic.200900365] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 05/29/2009] [Accepted: 11/16/2009] [Indexed: 01/03/2025]
Abstract
MS has become a method of choice for proteome analysis, generating large data sets that reflect proteome-scale protein-protein interaction and PTM networks. However, while large-scale proteomics data are growing rapidly, the sound biological interpretation of these results clearly lags behind. Bioinformaticians and biologists have therefore combined efforts to develop strategies and applications that help experimentalists perform this crucial task. This review presents an overview of currently available analytical strategies and tools for extracting biologically relevant information from large protein lists. We also present current research publications that use these tools, as examples of how the presented strategies may be incorporated into proteomic workflows. Emphasis is placed on the analysis of Gene Ontology terms, interaction networks, biological pathways and PTMs. In addition, topics including domain analysis and text mining are reviewed in the context of computational analysis of proteomic results. We expect that these types of analyses will contribute significantly to a deeper understanding of the role of individual proteins, protein networks and pathways in complex systems.
Affiliation(s)
- Rainer Malik
- Max Planck Institute of Biochemistry, Department of Cell Biology, Martinsried, Germany
199
Martin A, Ochagavia ME, Rabasa LC, Miranda J, Fernandez-de-Cossio J, Bringas R. BisoGenet: a new tool for gene network building, visualization and analysis. BMC Bioinformatics 2010; 11:91. [PMID: 20163717 PMCID: PMC3098113 DOI: 10.1186/1471-2105-11-91] [Citation(s) in RCA: 267] [Impact Index Per Article: 17.8] [Received: 07/25/2009] [Accepted: 02/17/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The increasing availability and diversity of omics data in the post-genomic era offer new perspectives in most areas of biomedical research. Graph-based biological network models capture the topology of the functional relationships between molecular entities such as genes, proteins and small compounds, and provide a suitable framework for integrating and analyzing omics data. The development of software tools capable of integrating data from different sources and providing flexible methods to reconstruct, represent and analyze topological networks is an active field of research in bioinformatics. RESULTS BisoGenet is a multi-tier application for visualization and analysis of biomolecular relationships. The system consists of three tiers. In the data tier, an in-house database stores genomic information, protein-protein interactions, protein-DNA interactions, gene ontology and metabolic pathways. In the middle tier, a global network is created at server startup, representing all the data on bioentities and their relationships retrieved from the database. The client tier is a Cytoscape plugin, which manages user input, communication with the Web Service, and visualization and analysis of the resulting network. CONCLUSION BisoGenet is able to build and visualize biological networks in a fast and user-friendly manner. One feature of BisoGenet is the possibility of including coding relations to distinguish between genes and their products. This feature could be instrumental in achieving a finer-grained representation of the bioentities and their relationships. The client application includes network analysis tools and interactive network expansion capabilities. In addition, an option is provided that allows other networks to be converted to BisoGenet. This feature facilitates the integration of our software with other tools available in the Cytoscape platform. BisoGenet is available at http://bio.cigb.edu.cu/bisogenet-cytoscape/.
200
Laurila K, Yli-Harja O, Lähdesmäki H. A protein-protein interaction guided method for competitive transcription factor binding improves target predictions. Nucleic Acids Res 2009; 37:e146. [PMID: 19786498 PMCID: PMC2794167 DOI: 10.1093/nar/gkp789] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Abstract
An important milestone in revealing cells' functions is to build a comprehensive understanding of transcriptional regulation processes. These processes are largely regulated by transcription factors (TFs) binding to DNA sites. Several TF binding site (TFBS) prediction methods have been developed, but they usually model binding of a single TF at a time, although a few methods for predicting binding of multiple TFs also exist. In this article, we propose a probabilistic model that predicts binding of several TFs simultaneously. Our method explicitly models the competitive binding between TFs and uses prior knowledge of existing protein-protein interactions (PPIs), which mimics the situation in the nucleus. Modeling DNA binding for multiple TFs improves the accuracy of binding site prediction remarkably compared with other programs and with cases where individual binding predictions for separate TFs have been combined. Traditional TFBS prediction methods usually predict an overwhelming number of false positives. Our competitive binding prediction method overcomes this lack of specificity remarkably well. In addition, previously unpredictable binding sites can be detected with the help of PPIs. Source code is available at http://www.cs.tut.fi/~harrila/.
Affiliation(s)
- Kirsti Laurila
- Department of Signal Processing, Tampere University of Technology, P.O. Box 527, FI-33101 Tampere, Finland