1
Dilucca M, Cimini G, Giansanti A. Bacterial Protein Interaction Networks: Connectivity is Ruled by Gene Conservation, Essentiality and Function. Curr Genomics 2021; 22:111-121. [PMID: 34220298 PMCID: PMC8188579 DOI: 10.2174/1389202922666210219110831] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/14/2020] [Revised: 08/13/2020] [Accepted: 08/27/2020] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Protein-protein interaction (PPI) networks are the backbone of all processes in living cells. In this work, we relate conservation, essentiality and functional repertoire of a gene to the connectivity k (i.e. the number of interactions, links) of the corresponding protein in the PPI network. METHODS On a set of 42 bacterial genomes of different sizes, and with reasonably separated evolutionary trajectories, we investigate three issues: i) whether the distribution of connectivities changes between PPI subnetworks of essential and nonessential genes; ii) how gene conservation, measured both by the evolutionary retention index (ERI) and by evolutionary pressures, is related to the connectivity of the corresponding protein; iii) how PPI connectivities are modulated by evolutionary and functional relationships, as represented by the Clusters of Orthologous Genes (COGs). RESULTS We show that conservation, essentiality and functional specialisation of genes constrain the connectivity of the corresponding proteins in bacterial PPI networks. In particular, we isolated a core of highly connected proteins (connectivities k≥40), which is ubiquitous among the species considered here, though mostly visible in the degree distributions of bacteria with small genomes (less than 1000 genes). CONCLUSION The genes that support this highly connected core are conserved, essential and, in most cases, belong to the COG cluster J, related to ribosomal functions and the processing of genetic information.
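The connectivity k used in this abstract is the degree of a protein in the PPI network, and the highly connected core is the set of proteins with k≥40. A minimal sketch of both quantities follows, assuming the network is available as a list of undirected protein pairs; the function names and the toy protein identifiers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def connectivity(edges):
    """Return {protein: k}, where k is the number of distinct interaction partners.

    Assumes `edges` is an iterable of (protein_a, protein_b) pairs describing
    undirected interactions. Duplicate pairs and self-loops are ignored.
    """
    unique_pairs = {frozenset(e) for e in edges if e[0] != e[1]}
    k = Counter()
    for pair in unique_pairs:
        for protein in pair:
            k[protein] += 1
    return dict(k)


def high_connectivity_core(k, threshold=40):
    """Return the proteins whose connectivity is at least `threshold`."""
    return {protein for protein, degree in k.items() if degree >= threshold}


# Toy edge list (illustrative identifiers only, not data from the study).
edges = [("rpsA", "rplB"), ("rplB", "rpsA"), ("rpsA", "infB"), ("rpsA", "rpsA")]
k = connectivity(edges)
core = high_connectivity_core(k, threshold=2)  # low threshold for the toy network
```

Proteins without any listed interaction do not appear in the returned dictionary, so a degree distribution built from it starts at k=1.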
Affiliation(s)
- Maddalena Dilucca
  - Dipartimento di Fisica, Sapienza University of Rome, 00185, Rome, Italy
- Giulio Cimini
  - Dipartimento di Fisica, Tor Vergata University of Rome, 00133, Rome, Italy
  - Istituto dei Sistemi Complessi CNR UoS, Rome, Italy
- Andrea Giansanti
  - Dipartimento di Fisica, Sapienza University of Rome, 00185, Rome, Italy
  - INFN Roma1 Unit, Rome, Italy
2
Chen W, Li W, Huang G, Flavel M. The Applications of Clustering Methods in Predicting Protein Functions. Curr Proteomics 2019. [DOI: 10.2174/1570164616666181212114612] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Background:
The understanding of protein function is essential to the study of biological processes. However, the prediction of protein function has been a difficult task for bioinformatics to overcome. This has resulted in many scholars focusing on the development of computational methods to address this problem.
Objective:
In this review, we introduce the recently developed computational methods of protein function prediction and assess the validity of these methods. We then introduce the applications of clustering methods in predicting protein functions.
Affiliation(s)
- Weiyang Chen
  - College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Weiwei Li
  - College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Guohua Huang
  - College of Information Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
- Matthew Flavel
  - School of Life Sciences, La Trobe University, Bundoora, Vic 3083, Australia
3
Kasavi C, Eraslan S, Arga KY, Oner ET, Kirdar B. A system based network approach to ethanol tolerance in Saccharomyces cerevisiae. BMC Systems Biology 2014; 8:90. [PMID: 25103914 PMCID: PMC4236716 DOI: 10.1186/s12918-014-0090-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/10/2014] [Accepted: 07/15/2014] [Indexed: 01/23/2023]
Abstract
Background Saccharomyces cerevisiae has been widely used for bio-ethanol production and development of rational genetic engineering strategies leading both to the improvement of productivity and ethanol tolerance is very important for cost-effective bio-ethanol production. Studies on the identification of the genes that are up- or down-regulated in the presence of ethanol indicated that the genes may be involved to protect the cells against ethanol stress, but not necessarily required for ethanol tolerance. Results In the present study, a novel network based approach was developed to identify candidate genes involved in ethanol tolerance. Protein-protein interaction (PPI) network associated with ethanol tolerance (tETN) was reconstructed by integrating PPI data with Gene Ontology (GO) terms. Modular analysis of the constructed networks revealed genes with no previously reported experimental evidence related to ethanol tolerance and resulted in the identification of 17 genes with previously unknown biological functions. We have randomly selected four of these genes and deletion strains of two genes (YDR307W and YHL042W) were found to exhibit improved tolerance to ethanol when compared to wild type strain. The genome-wide transcriptomic response of yeast cells to the deletions of YDR307W and YHL042W in the absence of ethanol revealed that the deletion of YDR307W and YHL042W genes resulted in the transcriptional re-programming of the metabolism resulting from a mis-perception of the nutritional environment. Yeast cells perceived an excess amount of glucose and a deficiency of methionine or sulfur in the absence of YDR307W and YHL042W, respectively, possibly resulting from a defect in the nutritional sensing and signaling or transport mechanisms. Mutations leading to an increase in ribosome biogenesis were found to be important for the improvement of ethanol tolerance. 
Modulations of chronological life span were also identified to contribute to ethanol tolerance in yeast. Conclusions The system based network approach developed allows the identification of novel gene targets for improved ethanol tolerance and supports the highly complex nature of ethanol tolerance in yeast.
Affiliation(s)
- Betul Kirdar
  - Department of Chemical Engineering, Boğaziçi University, Istanbul, Turkey.
4
Fadhal E, Mwambene EC, Gamieldien J. Modelling human protein interaction networks as metric spaces has potential in disease research and drug target discovery. BMC Systems Biology 2014; 8:68. [PMID: 24929653 PMCID: PMC4088370 DOI: 10.1186/1752-0509-8-68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/07/2014] [Accepted: 06/04/2014] [Indexed: 01/06/2023]
Abstract
Background We have recently shown by formally modelling human protein interaction networks (PINs) as metric spaces and classified proteins into zones based on their distance from the topological centre that hub proteins are primarily centrally located. We also showed that zones closest to the network centre are enriched for critically important proteins and are also functionally very specialised for specific ‘house keeping’ functions. We proposed that proteins closest to the network centre may present good therapeutic targets. Here, we present multiple pieces of novel functional evidence that provides strong support for this hypothesis. Results We found that the human PINs has a highly connected signalling core, with the majority of proteins involved in signalling located in the two zones closest to the topological centre. The majority of essential, disease related, tumour suppressor, oncogenic and approved drug target proteins were found to be centrally located. Similarly, the majority of proteins consistently expressed in 13 types of cancer are also predominantly located in zones closest to the centre. Proteins from zones 1 and 2 were also found to comprise the majority of proteins in key KEGG pathways such as MAPK-signalling, the cell cycle, apoptosis and also pathways in cancer, with very similar patterns seen in pathways that lead to cancers such as melanoma and glioma, and non-neoplastic diseases such as measles, inflammatory bowel disease and Alzheimer’s disease. Conclusions Based on the diversity of evidence uncovered, we propose that when considered holistically, proteins located centrally in the human PINs that also have similar functions to existing drug targets are good candidate targets for novel therapeutics. Similarly, since disease pathways are dominated by centrally located proteins, candidates shortlisted in genome scale disease studies can be further prioritized and contextualised based on whether they occupy central positions in the human PINs.
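Classifying proteins by their distance from the topological centre can be illustrated with plain breadth-first search. The sketch below assumes a connected, unweighted network stored as an adjacency dictionary, takes the centre to be the set of nodes with minimum eccentricity, and reports each protein's shortest-path distance to the nearest centre node. These definitions are assumptions made for illustration; the abstract does not state how the authors define the centre or number their zones.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (in links) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_from_centre(adj):
    """Return (centre_nodes, {node: distance to the nearest centre node}).

    Assumes `adj` maps every node to an iterable of its neighbours and that
    the graph is connected; eccentricity is undefined otherwise.
    """
    eccentricity = {u: max(bfs_distances(adj, u).values()) for u in adj}
    radius = min(eccentricity.values())
    centre = [u for u in adj if eccentricity[u] == radius]
    distance = {}
    for c in centre:
        for node, d in bfs_distances(adj, c).items():
            distance[node] = min(distance.get(node, d), d)
    return centre, distance


# Toy path network a-b-c-d-e (hypothetical node names).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
centre, distance = distance_from_centre(adj)
```

Grouping proteins by the returned distance gives concentric layers around the centre, which is the general idea behind the zones described in the abstract. Running one search per node is adequate for a sketch but slow for networks with many thousands of proteins.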
Affiliation(s)
- Junaid Gamieldien
  - South African National Bioinformatics Institute/MRC Unit for Bioinformatics Capacity Development, University of the Western Cape, Bellville 7530, South Africa.
5
Rende D, Baysal N, Kirdar B. Complex disease interventions from a network model for type 2 diabetes. PLoS One 2013; 8:e65854. [PMID: 23776558 PMCID: PMC3679160 DOI: 10.1371/journal.pone.0065854] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/12/2012] [Accepted: 05/02/2013] [Indexed: 12/20/2022] Open
Abstract
There is accumulating evidence that the proteins encoded by the genes associated with a common disorder interact with each other, participate in similar pathways and share GO terms. It has been anticipated that the functional modules in a disease related functional linkage network are informative to reveal significant metabolic processes and disease's associations with other complex disorders. In the current study, Type 2 diabetes associated functional linkage network (T2DFN) containing 2770 proteins and 15041 linkages was constructed. The functional modules in this network were scored and evaluated in terms of shared pathways, co-localization, co-expression and associations with similar diseases. The assembly of top scoring overlapping members in the functional modules revealed that, along with the well known biological pathways, circadian rhythm, diverse actions of nuclear receptors in steroid and retinoic acid metabolisms have significant occurrence in the pathophysiology of the disease. The disease's association with other metabolic and neuromuscular disorders was established through shared proteins. Nuclear receptor NRIP1 has a pivotal role in lipid and carbohydrate metabolism, indicating the need to investigate subsequent effects of NRIP1 on Type 2 diabetes. Our study also revealed that CREB binding protein (CREBBP) and cardiotrophin-1 (CTF1) have suggestive roles in linking Type 2 diabetes and neuromuscular diseases.
Affiliation(s)
- Deniz Rende
  - Department of Materials Science and Engineering, Rensselaer Polytechnic Institute, Troy, New York, United States of America.
6
Rende D, Baysal N, Kirdar B. A novel integrative network approach to understand the interplay between cardiovascular disease and other complex disorders. Molecular BioSystems 2011; 7:2205-19. [PMID: 21559538 DOI: 10.1039/c1mb05064h] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]
Abstract
There is accumulating evidence that the proteins encoded by the genes associated with a common disorder interact with each other, participate in similar pathways and share GO terms. It has been anticipated that the functional modules in a disease related functional linkage network can be integrated with bibliomics to reveal association with other complex disorders. In this study, the cardiovascular disease functional linkage network (CFN) containing 1536 nodes and 3345 interactions was constructed using proteins encoded by 234 genes associated with the disease. Integration of CFN with bibliomics showed that 227 out of 566 functional modules are significantly associated with one or more diseases. Analysis of functional modules revealed the possible regulatory roles of SP1 and CXCL12 in the pathogenesis of cardiovascular disease (CVD) and modulation of their activities may be considered as potential therapeutic tools. The integration of CFN with bibliomics also indicated significant relations of CVD with other complex disorders. In a stratified map the members of 227 functional modules and 58 diseases in 15 disease classes were combined. In this map, leprosy, listeria monocytogenes, myasthenia, hemorrhagic diathesis and Protein S deficiency, which were not previously reported to be associated with CVD, showed significant associations. Several cancers arising from epithelial cells were also found to be linked to other diseases through hub proteins, VEGFA and PTGS2.
Affiliation(s)
- Deniz Rende
  - Rensselaer Nanotechnology Center, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
7
Ranea JAG, Morilla I, Lees JG, Reid AJ, Yeats C, Clegg AB, Sanchez-Jimenez F, Orengo C. Finding the "dark matter" in human and yeast protein network prediction and modelling. PLoS Comput Biol 2010; 6:e1000945. [PMID: 20885791 PMCID: PMC2944794 DOI: 10.1371/journal.pcbi.1000945] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 12/14/2009] [Accepted: 08/30/2010] [Indexed: 11/17/2022] Open
Abstract
Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different level of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or "dark matter" of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. 
In any case, these predictions provide a valuable guide to these experimentally elusive regions.
Affiliation(s)
- Juan A. G. Ranea
  - Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
  - Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, Spain
- Ian Morilla
  - Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, Spain
- Jon G. Lees
  - Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Adam J. Reid
  - Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Corin Yeats
  - Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Andrew B. Clegg
  - Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Francisca Sanchez-Jimenez
  - Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, Spain
- Christine Orengo
  - Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
8
Leach SM, Tipney H, Feng W, Baumgartner WA, Kasliwal P, Schuyler RP, Williams T, Spritz RA, Hunter L. Biomedical discovery acceleration, with applications to craniofacial development. PLoS Comput Biol 2009; 5:e1000215. [PMID: 19325874 PMCID: PMC2653649 DOI: 10.1371/journal.pcbi.1000215] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Received: 09/25/2008] [Accepted: 02/12/2009] [Indexed: 01/17/2023] Open
Abstract
The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
Affiliation(s)
- Sonia M. Leach
  - Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Hannah Tipney
  - Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Weiguo Feng
  - Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- William A. Baumgartner
  - Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Priyanka Kasliwal
  - Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Ronald P. Schuyler
  - Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Trevor Williams
  - Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- Richard A. Spritz
  - Human Medical Genetics Program, University of Colorado at Denver, Denver, Colorado, United States of America
- Lawrence Hunter
  - Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
9
Wang Z, Chen Q, Liu L. Relationship between topology and functions in metabolic network evolution. Sci Bull (Beijing) 2009. [DOI: 10.1007/s11434-009-0072-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/23/2023]
10
Protein evolution on a human signaling network. BMC Systems Biology 2009; 3:21. [PMID: 19226461 PMCID: PMC2649034 DOI: 10.1186/1752-0509-3-21] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Received: 07/18/2008] [Accepted: 02/18/2009] [Indexed: 11/30/2022]
Abstract
Background The architectural structure of cellular networks provides a framework for innovations as well as constraints for protein evolution. This issue has previously been studied extensively by analyzing protein interaction networks. However, it is unclear how signaling networks influence and constrain protein evolution and conversely, how protein evolution modifies and shapes the functional consequences of signaling networks. In this study, we constructed a human signaling network containing more than 1,600 nodes and 5,000 links through manual curation of signaling pathways, and analyzed the dN/dS values of human-mouse orthologues on the network. Results We revealed that the protein dN/dS value decreases along the signal information flow from the extracellular space to nucleus. In the network, neighbor proteins tend to have similar dN/dS ratios, indicating neighbor proteins have similar evolutionary rates: co-fast or co-slow. However, different types of relationships (activating, inhibitory and neutral) between proteins have different effects on protein evolutionary rates, i.e., physically interacting protein pairs have the closest evolutionary rates. Furthermore, for directed shortest paths, the more distant two proteins are, the less chance they share similar evolutionary rates. However, such behavior was not observed for neutral shortest paths. Fast evolving signaling proteins have two modes of evolution: immunological proteins evolve more independently, while apoptotic proteins tend to form network components with other signaling proteins and share more similar evolutionary rates, possibly enhancing rapid information exchange between apoptotic and other signaling pathways. Conclusion Major network constraints on protein evolution in protein interaction networks previously described have been found for signaling networks. 
We further uncovered how network characteristics affect the evolutionary and co-evolutionary behavior of proteins and how protein evolution can modify the existing functionalities of signaling networks. These new insights provide some general principles for understanding protein evolution in the context of signaling networks.
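The observation that neighbouring proteins have similar dN/dS ratios can be checked by comparing the mean absolute dN/dS difference over linked pairs with the same quantity over all protein pairs. The sketch below shows one such comparison; the statistic, the function name, the protein labels and every numeric value are invented for illustration and are not taken from the study.

```python
from itertools import combinations
from statistics import mean


def mean_abs_difference(pairs, dnds):
    """Mean |dN/dS(a) - dN/dS(b)| over the given protein pairs."""
    return mean(abs(dnds[a] - dnds[b]) for a, b in pairs)


# Hypothetical toy network and dN/dS values.
links = [("R1", "K1"), ("K1", "K2"), ("T1", "T2")]
dnds = {"R1": 0.30, "K1": 0.28, "K2": 0.25, "T1": 0.05, "T2": 0.06}

linked = mean_abs_difference(links, dnds)
background = mean_abs_difference(combinations(sorted(dnds), 2), dnds)

# A linked value well below the background value suggests that network
# neighbours evolve at similar rates ("co-fast" or "co-slow").
print(f"linked pairs: {linked:.3f}, all pairs: {background:.3f}")
```

A real analysis would also need a significance test, for example by shuffling dN/dS values across nodes many times and recomputing the linked-pair statistic.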
11
Karimpour-Fard A, Leach SM, Gill RT, Hunter LE. Predicting protein linkages in bacteria: which method is best depends on task. BMC Bioinformatics 2008; 9:397. [PMID: 18816389 PMCID: PMC2570368 DOI: 10.1186/1471-2105-9-397] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 02/14/2008] [Accepted: 09/24/2008] [Indexed: 01/06/2023] Open
Abstract
Background Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations. Results Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. 
Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster predicted completely 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene Cluster and Gene Neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction. Conclusion A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated the dramatic differences on reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, as well as provided guidelines as to which method is most appropriate for a given prediction task.
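The abstract contrasts three ways of combining linkage sets: an unweighted union, a reliability-weighted combination, and the intersection of two methods. A small sketch of these follows. The noisy-OR weighting rule, the reliability values and the gene labels are assumptions chosen for illustration, since the abstract does not specify the weighting scheme the authors used.

```python
def pair(a, b):
    """Order-independent key for a predicted linkage between two genes."""
    return frozenset((a, b))


def noisy_or_scores(predictions, reliability):
    """Combine per-method predictions into one score per gene pair.

    Assumed rule: score = 1 - product over supporting methods of
    (1 - reliability), so pairs found by several methods score higher.
    """
    scores = {}
    for method, pairs in predictions.items():
        r = reliability[method]
        for p in pairs:
            scores[p] = 1 - (1 - scores.get(p, 0.0)) * (1 - r)
    return scores


def precision(predicted, known):
    """Fraction of predicted linkages that appear in the benchmark set."""
    return len(predicted & known) / len(predicted) if predicted else 0.0


# Hypothetical predictions from two genomic-context methods.
predictions = {
    "gene_cluster": {pair("a", "b"), pair("b", "c")},
    "gene_neighbor": {pair("a", "b"), pair("c", "d")},
}
reliability = {"gene_cluster": 0.5, "gene_neighbor": 0.5}
known = {pair("a", "b"), pair("b", "c")}  # toy benchmark

union = predictions["gene_cluster"] | predictions["gene_neighbor"]
both = predictions["gene_cluster"] & predictions["gene_neighbor"]
scores = noisy_or_scores(predictions, reliability)
```

In this toy case the intersection has higher precision than the union but contains fewer linkages, which is the usual trade-off between precision and coverage.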
Affiliation(s)
- Anis Karimpour-Fard
  - Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado 80045, USA.