1
|
Avecilla G, Spealman P, Matthews J, Caudal E, Schacherer J, Gresham D. Copy number variation alters local and global mutational tolerance. Genome Res 2023; 33:1340-1353. [PMID: 37652668 PMCID: PMC10547251 DOI: 10.1101/gr.277625.122] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/22/2022] [Accepted: 07/07/2023] [Indexed: 09/02/2023]
Abstract
Copy number variants (CNVs), duplications and deletions of genomic sequences, contribute to evolutionary adaptation but can also confer deleterious effects and cause disease. Whereas the effects of amplifying individual genes or whole chromosomes (i.e., aneuploidy) have been studied extensively, much less is known about the genetic and functional effects of CNVs of differing sizes and structures. Here, we investigated Saccharomyces cerevisiae (yeast) strains that acquired adaptive CNVs of variable structures and copy numbers following experimental evolution in glutamine-limited chemostats. Although beneficial in the selective environment, CNVs result in decreased fitness compared with the euploid ancestor in rich media. We used transposon mutagenesis to investigate mutational tolerance and genome-wide genetic interactions in CNV strains. We find that CNVs increase mutational target size, confer increased mutational tolerance in amplified essential genes, and result in novel genetic interactions with unlinked genes. We validated a novel genetic interaction between different CNVs and BMH1 that was common to multiple strains. We also analyzed global gene expression and found that transcriptional dosage compensation does not affect most genes amplified by CNVs, although gene-specific transcriptional dosage compensation does occur for ∼12% of amplified genes. Furthermore, we find that CNV strains do not show previously described transcriptional signatures of aneuploidy. Our study reveals the extent to which local and global mutational tolerance is modified by CNVs with implications for genome evolution and CNV-associated diseases, such as cancer.
Affiliation(s)
- Grace Avecilla
  - Department of Biology, New York University, New York, New York 10003, USA
  - Center for Genomics and Systems Biology, New York University, New York, New York 10003, USA
- Pieter Spealman
  - Department of Biology, New York University, New York, New York 10003, USA
  - Center for Genomics and Systems Biology, New York University, New York, New York 10003, USA
- Julia Matthews
  - Department of Biology, New York University, New York, New York 10003, USA
  - Center for Genomics and Systems Biology, New York University, New York, New York 10003, USA
- Elodie Caudal
  - Université de Strasbourg, CNRS, GMGM UMR, 7156 Strasbourg, France
- Joseph Schacherer
  - Université de Strasbourg, CNRS, GMGM UMR, 7156 Strasbourg, France
  - Institut Universitaire de France (IUF), 75231 Paris Cedex 05, France
- David Gresham
  - Department of Biology, New York University, New York, New York 10003, USA
  - Center for Genomics and Systems Biology, New York University, New York, New York 10003, USA
|
2
|
|
3
|
Predicting Essential Proteins Based on Integration of Local Fuzzy Fractal Dimension and Subcellular Location Information. Genes (Basel) 2022; 13:genes13020173. [PMID: 35205217 PMCID: PMC8872415 DOI: 10.3390/genes13020173] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/18/2021] [Revised: 01/08/2022] [Accepted: 01/12/2022] [Indexed: 11/17/2022] Open
Abstract
Essential proteins are indispensable to cells’ survival and development. Prediction and analysis of essential proteins are crucial for uncovering the mechanisms of cells. With the help of computer science and high-throughput technologies, forecasting essential proteins by protein–protein interaction (PPI) networks has become more efficient than traditional approaches (expensive experimental methods are generally used). Many computational algorithms were employed to predict the essential proteins; however, they have various restrictions. To improve the prediction accuracy, by introducing the Local Fuzzy Fractal Dimension (LFFD) of complex networks into the analysis of the PPI network, we propose a novel algorithm named LDS, which combines the LFFD of the PPI network with the protein subcellular location information. By testing the proposed LDS algorithm on three different yeast PPI networks, the experimental results show that LDS outperforms some state-of-the-art essential protein-prediction techniques.
|
4
|
Protein Integrated Network Analysis to Reveal Potential Drug Targets Against Extended Drug-Resistant Mycobacterium tuberculosis XDR1219. Mol Biotechnol 2021; 63:1252-1267. [PMID: 34382159 DOI: 10.1007/s12033-021-00377-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]
Abstract
The reconstruction and analysis of the protein-protein interaction (PPI) network is a powerful approach to understand the complex biological and molecular functions in normal and disease states of the cell. The interactome of most organisms is largely unidentified except some model organisms. The current study focused on the construction of PPI network for the human pathogen Mycobacterium tuberculosis (MTB)-resistant strain XDR1219 using computational methods. In this work, a bioinformatics approach was employed to reveal potential drug targets. The pipeline adopted the combination of an extensive integrated network analysis that led to identify 22 key proteins involved in drug resistance, resistant metabolic pathways, virulence, pathogenesis and persistency of the infection. The MTB XDR1219 interactome consists of 11,383 non-redundant PPIs among 1499 proteins covering 38% of the entire MTB XDR1219 proteome. The overall quality of the network was assessed and topological parameters of the PPI were calculated. The predicted interactions were functionally annotated and their relevance was assessed with the functional similarity. The study attempts to present the interactome of previously unidentified MTB XDR1219 and revealed potential drug targets that can be further explored by scientific community.
|
5
|
Wang Y, Li Z, Zhang Y, Ma Y, Huang Q, Chen X, Dai Z, Zou X. Performance improvement for a 2D convolutional neural network by using SSC encoding on protein-protein interaction tasks. BMC Bioinformatics 2021; 22:184. [PMID: 33845759 PMCID: PMC8042949 DOI: 10.1186/s12859-021-04111-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/22/2020] [Accepted: 03/30/2021] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND The interactions of proteins are determined by their sequences and affect the regulation of the cell cycle, signal transduction and metabolism, which is of extraordinary significance to modern proteomics research. Despite advances in experimental technology, it is still expensive, laborious, and time-consuming to determine protein-protein interactions (PPIs), and there is a strong demand for effective bioinformatics approaches to identify potential PPIs. Considering the large amount of PPI data, a high-performance processor can be utilized to enhance the capability of the deep learning method and directly predict protein sequences. RESULTS We propose the Sequence-Statistics-Content protein sequence encoding format (SSC) based on information extraction from the original sequence for further performance improvement of the convolutional neural network. The original protein sequences are encoded in the three-channel format by introducing statistical information (the second channel) and bigram encoding information (the third channel), which can increase the unique sequence features to enhance the performance of the deep learning model. On predicting protein-protein interaction tasks, the results using the 2D convolutional neural network (2D CNN) with the SSC encoding method are better than those of the 1D CNN with one hot encoding. The independent validation of new interactions from the HIPPIE database (version 2.1 published on July 18, 2017) and the validation of directly predicted results by applying a molecular docking tool indicate the effectiveness of the proposed protein encoding improvement in the CNN model. CONCLUSION The proposed protein sequence encoding method is efficient at improving the capability of the CNN model on protein sequence-related tasks and may also be effective at enhancing the capability of other machine learning or deep learning methods. 
Prediction accuracy and molecular docking validation showed considerable improvement compared to the existing one-hot encoding method, indicating that the SSC encoding method may be useful for analyzing protein sequence-related tasks. The source code of the proposed methods is freely available for academic research at https://github.com/wangy496/SSC-format/.
Affiliation(s)
- Yang Wang
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Zhanchao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Yanfei Zhang
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Yingjun Ma
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
- Qixing Huang
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Xingyu Chen
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou, 510006, People's Republic of China
- Zong Dai
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
  - Research Institute of Sun Yat-Sen University in Shenzhen, Shenzhen, 518000, People's Republic of China
- Xiaoyong Zou
  - School of Chemistry, Sun Yat-Sen University, Guangzhou, 510275, People's Republic of China
  - Research Institute of Sun Yat-Sen University in Shenzhen, Shenzhen, 518000, People's Republic of China
|
6
|
Li Z, Jiang H, Kong L, Chen Y, Lang K, Fan X, Zhang L, Pian C. Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species. PLoS Comput Biol 2021; 17:e1008767. [PMID: 33600435 PMCID: PMC7924747 DOI: 10.1371/journal.pcbi.1008767] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/11/2020] [Revised: 03/02/2021] [Accepted: 02/03/2021] [Indexed: 12/25/2022] Open
Abstract
N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, the existing experimental techniques for detecting 6mA sites are cost-ineffective, which implies a great need for new computational methods for this problem. In this paper, we developed, without requiring any prior knowledge of 6mA or manually crafted sequence features, a deep learning framework named Deep6mA to identify DNA 6mA sites, and its performance is superior to other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a benchmark dataset of rice gives the sensitivity and specificity of Deep6mA as 92.96% and 95.06%, respectively, and the overall prediction accuracy is 94%. Importantly, we find that the sequences with 6mA sites share similar patterns across different species. The model trained with rice data predicts the 6mA sites of three other species well: Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which means the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main source of its regulating downstream gene expression. DNA N6 methyladenine (6mA) is a newly recognized methylation modification in eukaryotes. It exists widely and conservatively in organisms, and its modification level changes dynamically in the whole life cycle. This study proposes an algorithm based on a deep learning framework including LSTM and CNN to predict 6mA sites. The results showed that our method could accurately predict the 6mA sites in different species, which means DNA sub-sequences containing 6mA sites among species have certain conservation. 
Importantly, we found that 6mA methylation in most different species is more likely to occur on the GAGG motif. In addition, we also found that 6mA is rich in the promoter’s TATA box, which may be a mechanism of regulating downstream gene expression.
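The sensitivity, specificity and accuracy figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below assumes, hypothetically, that the benchmark contains equal numbers of positive and negative samples; the abstract does not state the class balance. Under that assumption, overall accuracy is the class-weighted mean of sensitivity and specificity. The function name and the `positive_fraction` parameter are illustrative and do not come from the Deep6mA code.

```python
def overall_accuracy(sensitivity: float, specificity: float,
                     positive_fraction: float) -> float:
    """Accuracy as the class-weighted mean of sensitivity and specificity."""
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must lie in [0, 1]")
    return (sensitivity * positive_fraction
            + specificity * (1.0 - positive_fraction))


# Sensitivity and specificity are taken from the abstract;
# the 0.5 class balance is an assumption, not a reported value.
accuracy = overall_accuracy(0.9296, 0.9506, positive_fraction=0.5)
print(f"{accuracy:.4f}")
```

With a balanced benchmark the result is about 0.940, which is consistent with the reported overall accuracy of 94%.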
Affiliation(s)
- Zutan Li
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
- Hangjin Jiang
  - Center for Data Science, Zhejiang University, Hangzhou, China
- Lingpeng Kong
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
- Yuanyuan Chen
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
- Kun Lang
  - College of Information Science & Technology, Nanjing Agricultural University, Nanjing, China
- Xiaodan Fan
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
- Liangyun Zhang
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
  - * E-mail: (LYZ); (CP)
- Cong Pian
  - Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, China
  - * E-mail: (LYZ); (CP)
|
7
|
Identifying patient-specific flow of signal transduction perturbed by multiple single-nucleotide alterations. QUANTITATIVE BIOLOGY 2020. [DOI: 10.1007/s40484-020-0227-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
8
|
Lu WC, Xie H, Yuan C, Li JJ, Li ZY, Wu AH. Genomic landscape of the immune microenvironments of brain metastases in breast cancer. J Transl Med 2020; 18:327. [PMID: 32867782 PMCID: PMC7461335 DOI: 10.1186/s12967-020-02503-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/26/2020] [Accepted: 08/26/2020] [Indexed: 01/19/2023] Open
Abstract
Background: This study was intended to investigate the genomic landscape of the immune microenvironments of brain metastases in breast cancer. Methods: Three gene expression profile datasets (GSE76714, GSE125989 and GSE43837) of breast cancer with brain metastases were downloaded from the Gene Expression Omnibus (GEO) database. After differential expression analysis, the tumor immune microenvironment and immune cell infiltration were analyzed. Then immune-related genes were identified, followed by function analysis, transcription factor (TF)-miRNA–mRNA co-regulatory network analysis, and survival analysis of metastatic recurrence. Results: The present results showed that the tumor immune microenvironment in brain metastases was immunosuppressed compared with primary cancer. Compared with primary cancer samples, the infiltration ratio of plasma cells in brain metastases samples was significantly higher, while the infiltration ratio of M2 macrophages in brain metastases samples was significantly lower. A total of 42 immune-related genes were identified, such as THY1 and NEU2. CD1B, THY1 and DOCK2 were found to be implicated in the metastatic recurrence of breast cancer. Conclusions: Targeting macrophages or plasma cells may be new strategies for immunotherapy of breast cancer with brain metastases. THY1 and NEU2 may be potential therapeutic targets for breast cancer with brain metastases, and THY1, CD1B and DOCK2 may serve as potential prognostic markers for improvement of brain metastases survival.
Affiliation(s)
- Wei-Cheng Lu
  - Department of Neurosurgery, First Affiliated Hospital of China Medical University, Shenyang, Liaoning, China
- Hui Xie
  - Department of Histology and Embryology, College of Basic Medicine, Shenyang Medical College, Shenyang, Liaoning, China
- Ce Yuan
  - Graduate Program in Bioinformatics and Computational Biology, University of Minnesota, Minneapolis, USA
- Jin-Jiang Li
  - Department of Neurosurgery, General Hospital of Northern Theater Command, Shenyang, Liaoning, China
- Zhao-Yang Li
  - Department of Laboratory Animal Center, China Medical University, Shenyang, Liaoning, China
- An-Hua Wu
  - Department of Neurosurgery, First Affiliated Hospital of China Medical University, Shenyang, Liaoning, China
|
9
|
Li M, Ni P, Chen X, Wang J, Wu FX, Pan Y. Construction of Refined Protein Interaction Network for Predicting Essential Proteins. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1386-1397. [PMID: 28186903 DOI: 10.1109/tcbb.2017.2665482] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 06/06/2023]
Abstract
Identification of essential proteins based on the protein interaction network (PIN) is an important and active topic in the post-genome era. To date, a number of network-based essential protein discovery methods have been proposed. Generally, a static protein interaction network is constructed from protein-protein interactions obtained from different experiments or databases. Unfortunately, most network-based essential protein discovery methods are sensitive to the reliability of the constructed PIN. In this paper, we propose a new method for constructing a refined PIN by using gene expression profiles and subcellular location information. The basic idea behind refining the PIN is that two proteins are more likely to physically interact with each other if they appear together at the same subcellular location and are active together during at least one time point in the cell cycle. The original static PIN is denoted by S-PIN, while the final PIN refined by our method is denoted by TS-PIN. To evaluate whether the constructed TS-PIN is more suitable for the identification of essential proteins, 10 network-based essential protein discovery methods (DC, EC, SC, BC, CC, IC, LAC, NC, BN, and DMNC) are applied to it. TS-PIN is compared with two other networks, S-PIN and NF-APIN (a noise-filtered active PIN constructed by using gene expression data and S-PIN), on the prediction of essential proteins by using these ten network-based methods. The comparison results show that all 10 network-based methods achieve better results when applied to TS-PIN than when applied to S-PIN or NF-APIN.
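The refinement idea in this abstract (keep an interaction only when both proteins share a subcellular location and are active at a common time point) can be illustrated with a short filter. This is a minimal sketch, not the authors' TS-PIN procedure: the function names, the dictionary-based inputs and the fixed expression `threshold` used to decide when a protein is "active" are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def active_time_points(expression: List[float], threshold: float) -> Set[int]:
    """Indices of time points where expression exceeds the threshold."""
    return {t for t, level in enumerate(expression) if level > threshold}


def refine_pin(edges: Iterable[Edge],
               locations: Dict[str, Set[str]],
               expression: Dict[str, List[float]],
               threshold: float) -> List[Edge]:
    """Keep edges whose endpoints are co-localized and co-active."""
    refined = []
    for a, b in edges:
        # Shared subcellular location (hypothetical location labels).
        share_location = bool(locations.get(a, set()) & locations.get(b, set()))
        # Time points at which both proteins are considered active.
        co_active = (active_time_points(expression.get(a, []), threshold)
                     & active_time_points(expression.get(b, []), threshold))
        if share_location and co_active:
            refined.append((a, b))
    return refined


# Toy static network with made-up proteins, locations and expression values.
static_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
toy_locations = {"P1": {"nucleus"},
                 "P2": {"nucleus", "cytoplasm"},
                 "P3": {"cytoplasm"}}
toy_expression = {"P1": [0.1, 0.9, 0.2],
                  "P2": [0.8, 0.7, 0.1],
                  "P3": [0.9, 0.1, 0.1]}
print(refine_pin(static_edges, toy_locations, toy_expression, threshold=0.5))
```

In the toy data, the P1-P3 edge is dropped because the two proteins share no location, while the other two edges survive.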
|
10
|
Sadhukhan P, Palit S. Reverse-nearest neighborhood based oversampling for imbalanced, multi-label datasets. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.08.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 10/26/2022]
|
11
|
Zhao B, Zhao Y, Zhang X, Zhang Z, Zhang F, Wang L. An iteration method for identifying yeast essential proteins from heterogeneous network. BMC Bioinformatics 2019; 20:355. [PMID: 31234779 PMCID: PMC6591974 DOI: 10.1186/s12859-019-2930-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/19/2019] [Accepted: 06/04/2019] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Essential proteins are distinctly important for an organism's survival and development and are crucial to disease analysis and drug design as well. Large-scale protein-protein interaction (PPI) data sets exist for Saccharomyces cerevisiae, which provides a valuable opportunity to identify essential proteins from PPI networks. Many network topology-based computational methods have been designed to detect essential proteins. However, these methods are limited by the completeness of available PPI data. To overcome these limitations, some computational methods have been proposed that integrate PPI networks and multi-source biological data. Despite the progress in research on multiple data fusion, it is still challenging to improve the prediction accuracy of the computational methods. RESULTS In this paper, we design a novel iterative model for essential protein prediction, named Randomly Walking in the Heterogeneous Network (RWHN). In RWHN, a weighted protein-protein interaction network and a domain-domain association network are first constructed according to the original PPI network and the known protein-domain association network. We then establish a new heterogeneous matrix by combining the two constructed networks with the protein-domain association network. Based on the heterogeneous matrix, a transition probability matrix is established by normalization. Finally, an improved PageRank algorithm is applied to the heterogeneous network for essential protein prediction. To reduce the influence of false negatives, information on orthologous proteins and the subcellular localization information of proteins are integrated to initialize the score vector of proteins. In RWHN, the topological, conservation and functional features of essential proteins are all taken into account in the prediction process.
The experimental results show that RWHN clearly outperforms ten other competing methods in predicting essential proteins. CONCLUSIONS We demonstrated that integrating multi-source data into a heterogeneous network can preserve the complex relationship among multiple biological data and improve the prediction accuracy of essential proteins. RWHN, our proposed method, is effective for the prediction of essential proteins.
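The abstract's pipeline (normalize a matrix into transition probabilities, initialize a score vector from prior knowledge, then iterate a PageRank-style update) follows the general pattern of a random walk with restart. The sketch below shows that generic pattern only. It is not the RWHN algorithm: the heterogeneous matrix construction and the "improved" PageRank variant are not reproduced, and the function names, the `damping` value and the toy graph are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def column_normalize(matrix: Matrix) -> Matrix:
    """Scale each column to sum to 1 (all-zero columns stay zero)."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]


def random_walk_scores(adjacency: Matrix, initial: List[float],
                       damping: float = 0.85, tol: float = 1e-10,
                       max_iter: int = 1000) -> List[float]:
    """Personalized PageRank by power iteration.

    `initial` plays the role of a prior score vector; it is normalized
    and reused as the restart distribution. Nodes with no outgoing
    weight (zero columns) leak probability mass in this simple version.
    """
    transition = column_normalize(adjacency)
    total = sum(initial)
    restart = [value / total for value in initial]
    scores = restart[:]
    n = len(scores)
    for _ in range(max_iter):
        updated = [damping * sum(transition[i][j] * scores[j] for j in range(n))
                   + (1.0 - damping) * restart[i] for i in range(n)]
        if sum(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Toy star graph: node 0 is connected to nodes 1, 2 and 3.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
ranking = random_walk_scores(star, initial=[1.0, 1.0, 1.0, 1.0])
print(ranking)
```

In this toy graph the hub node receives the highest score. A non-uniform `initial` vector would bias the walk toward nodes with stronger prior evidence, which is the role the abstract assigns to orthology and localization information.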
Affiliation(s)
- Bihai Zhao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
  - Hunan Provincial Key Laboratory of Nutrition and Quality Control of Aquatic Animals, Department of Biological and Environmental Engineering, Changsha University, Changsha, Hunan 410022 China
- Yulin Zhao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Xiaoxia Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Zhihong Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Fan Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, Hunan 410022 People’s Republic of China
  - College of Information Engineering, Xiangtan University, Xiangtan, 411105 Hunan China
|
12
|
Alshabi AM, Vastrad B, Shaikh IA, Vastrad C. Identification of Crucial Candidate Genes and Pathways in Glioblastoma Multiform by Bioinformatics Analysis. Biomolecules 2019; 9:biom9050201. [PMID: 31137733 PMCID: PMC6571969 DOI: 10.3390/biom9050201] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/09/2019] [Revised: 05/17/2019] [Accepted: 05/23/2019] [Indexed: 02/07/2023] Open
Abstract
The present study aimed to investigate the molecular mechanisms underlying glioblastoma multiform (GBM) and its biomarkers. The differentially expressed genes (DEGs) were diagnosed using the limma software package. The ToppGene (ToppFun) was used to perform pathway and Gene Ontology (GO) enrichment analysis of the DEGs. Protein-protein interaction (PPI) networks, extracted modules, miRNA-target genes regulatory network and TF-target genes regulatory network were used to obtain insight into the actions of DEGs. Survival analysis for DEGs was carried out. A total of 590 DEGs, including 243 up regulated and 347 down regulated genes, were diagnosed between scrambled shRNA expression and Lin7A knock down. The up-regulated genes were enriched in ribosome, mitochondrial translation termination, translation, and peptide biosynthetic process. The down-regulated genes were enriched in focal adhesion, VEGFR3 signaling in lymphatic endothelium, extracellular matrix organization, and extracellular matrix. The current study screened the genes in the PPI network, extracted modules, miRNA-target genes regulatory network, and TF-target genes regulatory network with higher degrees as hub genes, which included NPM1, CUL4A, YIPF1, SHC1, AKT1, VLDLR, RPL14, P3H2, DTNA, FAM126B, RPL34, and MYL5. Survival analysis indicated that the high expression of RPL36A and MRPL35 were predicting longer survival of GBM, while high expression of AP1S1 and AKAP12 were predicting shorter survival of GBM. High expression of RPL36A and AP1S1 were associated with pathogenesis of GBM, while low expression of ALPL was associated with pathogenesis of GBM. In conclusion, the current study diagnosed DEGs between scrambled shRNA expression and Lin7A knock down samples, which could improve our understanding of the molecular mechanisms in the progression of GBM, and these crucial as well as new diagnostic markers might be used as therapeutic targets for GBM.
Affiliation(s)
- Ali Mohamed Alshabi
  - Department of Clinical Pharmacy, College of Pharmacy, Najran University, Najran 61441, Saudi Arabia
- Basavaraj Vastrad
  - Department of Pharmaceutics, SET'S College of Pharmacy, Dharwad, Karnataka 580002, India
- Ibrahim Ahmed Shaikh
  - Department of Pharmacology, College of Pharmacy, Najran University, Najran 61441, Saudi Arabia
- Chanabasayya Vastrad
  - Biostatistics and Bioinformatics, Chanabasava Nilaya, Bharthinagar, Dharwad 580001, Karnataka, India
|
13
|
Joshi H, Vastrad B, Vastrad C. Identification of Important Invasion-Related Genes in Non-functional Pituitary Adenomas. J Mol Neurosci 2019; 68:565-589. [PMID: 30982163 DOI: 10.1007/s12031-019-01318-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2018] [Accepted: 03/29/2019] [Indexed: 12/18/2022]
Abstract
Non-functioning pituitary adenomas (NFPAs) are locally invasive with high morbidity. The objective of this study was to diagnose important genes and pathways related to the invasiveness of NFPAs and gain more insights into the underlying molecular mechanisms of NFPAs. The gene expression profiles of GSE51618 were downloaded from the Gene Expression Omnibus database with 4 non-invasive NFPA samples, 3 invasive NFPA samples, and 3 normal pituitary gland samples. Differentially expressed genes (DEGs) are screened between invasive NFPA samples and normal pituitary gland samples, followed by pathway and ontology (GO) enrichment analyses. Subsequently, a protein-protein interaction (PPI) network was constructed and analyzed for these DEGs, and module analysis was performed. In addition, a target gene-miRNA network and target gene-TF (transcription factor) network were analyzed for these DEGs. A total of 879 DEGs were obtained. Among them, 439 genes were upregulated and 440 genes were downregulated. Pathway enrichment analysis indicated that the upregulated genes were significantly enriched in cysteine biosynthesis/homocysteine degradation (trans-sulfuration) and PI3K-Akt signaling pathway, while the downregulated genes were mainly associated with docosahexaenoate biosynthesis III (mammals) and chemokine signaling pathway. GO enrichment analysis indicated that the upregulated genes were significantly enriched in animal organ morphogenesis, extracellular matrix, and hormone activity, while the downregulated genes were mainly associated with leukocyte chemotaxis, dendrites, and RAGE receptor binding. Subsequently, ESR1, SOX2, TTN, GFAP, WIF1, TTR, XIST, SPAG5, PPBP, AR, IL1R2, and HIST1H1C were diagnosed as the top hub genes in the upregulated and downregulated PPI networks and modules. 
In addition, HS3ST1, GPC4, CCND2, and SCD were diagnosed as the top hub genes in the upregulated and downregulated target gene-miRNA networks, while CISH, ISLR, UBE2E3, and CCNG2 were diagnosed as the top hub genes in the upregulated and downregulated target gene-TF networks. The new important DEGs and pathways diagnosed in this study may serve key roles in the invasiveness of NFPAs and indicate more molecular targets for the treatment of NFPAs.
Affiliation(s)
- Harish Joshi
  - Endocrine and Diabetes Care Center, Hubli, Karnataka, 5800029, India
- Basavaraj Vastrad
  - Department of Pharmaceutics, SET'S College of Pharmacy, Dharwad, Karnataka, 580002, India
- Chanabasayya Vastrad
  - Biostatistics and Bioinformatics, Chanabasava Nilaya, Bharthinagar, Dharwad, Karnataka, 580001, India
|
14
|
Abstract
Background: Essential proteins play important roles in the survival or reproduction of an organism and support the stability of the system. Essential proteins are the minimum set of proteins absolutely required to maintain a living cell. The identification of essential proteins is a very important topic not only for a better comprehension of the minimal requirements for cellular life, but also for a more efficient discovery of human disease genes and drug targets. Because the experimental identification of essential proteins is complex, it has traditionally required considerable time and expense. With the accumulation of high-throughput experimental data, many computational methods that usefully complement experimental methods have been proposed to identify essential proteins. In addition, the ability to rapidly and precisely identify essential proteins is of great significance for discovering disease genes and for drug design, and has great potential for applications in basic and synthetic biology research.
Objective: The aim of this paper is to review the identification of essential proteins and genes, focusing on current developments in different types of computational methods, to point out the progress and limitations of existing methods, and to discuss the challenges and directions for further research.
Affiliation(s)
- Ming Fang
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Ling Guo
  - College of Life Sciences, Shaanxi Normal University, Xi'an 710119, China
|
15
|
Azhagesan K, Ravindran B, Raman K. Network-based features enable prediction of essential genes across diverse organisms. PLoS One 2018; 13:e0208722. [PMID: 30543651 PMCID: PMC6292609 DOI: 10.1371/journal.pone.0208722] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 04/18/2018] [Accepted: 11/21/2018] [Indexed: 12/19/2022] Open
Abstract
Machine learning approaches to predict essential genes have gained a lot of traction in recent years. These approaches predominantly make use of sequence and network-based features to predict essential genes. However, the scope of network-based features used by the existing approaches is very narrow. Further, many of these studies focus on predicting essential genes within the same organism, which cannot be readily used to predict essential genes across organisms. Therefore, there is clearly a need for a method that is able to predict essential genes across organisms, by leveraging network-based features. In this study, we extract several sets of network-based features from protein-protein association networks available from the STRING database. Our network features include some common measures of centrality, and also some novel recursive measures recently proposed in social network literature. We extract hundreds of network-based features from networks of 27 diverse organisms to predict the essentiality of 87000+ genes. Our results show that network-based features are statistically significantly better at classifying essential genes across diverse bacterial species, compared to the current state-of-the-art methods, which use mostly sequence and a few 'conventional' network-based features. Our diverse set of network properties gave an AUROC of 0.847 and a precision of 0.320 across 27 organisms. When we augmented the complete set of network features with sequence-derived features, we achieved an improved AUROC of 0.857 and a precision of 0.335. We also constructed a reduced set of 100 sequence and network features, which gave a comparable performance. Further, we show that our features are useful for predicting essential genes in new organisms by using leave-one-species-out validation. 
Our network features capture the local, global and neighbourhood properties of the network and are hence effective for prediction of essential genes across diverse organisms, even in the absence of other complex biological knowledge. Our approach can be readily exploited to predict essentiality for organisms in interactome databases such as STRING, where both network and sequence are readily available. All code is available at https://github.com/RamanLab/nbfpeg.
Affiliation(s)
- Karthik Azhagesan
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai – 600 036, India
- Initiative for Biological Systems Engineering (IBSE), IIT Madras, Chennai – 600 036, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai – 600 036, India
| | - Balaraman Ravindran
- Department of Computer Science and Engineering, IIT Madras, Chennai – 600 036, India
- Initiative for Biological Systems Engineering (IBSE), IIT Madras, Chennai – 600 036, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai – 600 036, India
- * E-mail: (BR); (KR)
| | - Karthik Raman
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology (IIT) Madras, Chennai – 600 036, India
- Initiative for Biological Systems Engineering (IBSE), IIT Madras, Chennai – 600 036, India
- Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai – 600 036, India
- * E-mail: (BR); (KR)
| |
|
16
|
Shokri-Gharelo R, Noparvar PM. Molecular response of canola to salt stress: insights on tolerance mechanisms. PeerJ 2018; 6:e4822. [PMID: 29844974 PMCID: PMC5969047 DOI: 10.7717/peerj.4822] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 03/09/2018] [Accepted: 05/02/2018] [Indexed: 01/16/2023] Open
Abstract
Canola (Brassica napus L.) is widely cultivated around the world for the production of edible oils and biodiesel fuel. Despite many canola varieties being described as ‘salt-tolerant’, plant yield and growth decline drastically with increasing salinity. Although many studies have resulted in better understanding of the many important salt-response mechanisms that control salt signaling in plants, detoxification of ions, and synthesis of protective metabolites, the engineering of salt-tolerant crops has only progressed slowly. Genetic engineering has been considered as an efficient method for improving the salt tolerance of canola but there are many unknown or little-known aspects regarding canola response to salinity stress at the cellular and molecular level. In order to develop highly salt-tolerant canola, it is essential to improve knowledge of the salt-tolerance mechanisms, especially the key components of the plant salt-response network. In this review, we focus on studies of the molecular response of canola to salinity to unravel the different pieces of the salt response puzzle. The paper includes a comprehensive review of the latest studies, particularly of proteomic and transcriptomic analysis, including the most recently identified canola tolerance components under salt stress, and suggests what researchers should focus on in future studies.
Affiliation(s)
- Reza Shokri-Gharelo
- Department of Plant Breeding and Biotechnology, University of Tabriz, Tabriz, Iran
| | - Pouya Motie Noparvar
- Department of Plant Breeding and Biotechnology, University of Tabriz, Tabriz, Iran.,Young Researchers and Elite Club, Islamic Azad University, Tabriz, Iran
| |
|
17
|
Chen L, Zhang YH, Wang S, Zhang Y, Huang T, Cai YD. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways. PLoS One 2017; 12:e0184129. [PMID: 28873455 PMCID: PMC5584762 DOI: 10.1371/journal.pone.0184129] [Citation(s) in RCA: 191] [Impact Index Per Article: 27.3] [Received: 06/16/2017] [Accepted: 08/18/2017] [Indexed: 12/20/2022] Open
Abstract
Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, if possible, uncovering the links between core functions or pathways and these essential genes will further help us obtain deep insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes that our prediction model predicted to be essential are provided in this study. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, People’s Republic of China
- College of Information Engineering, Shanghai Maritime University, Shanghai, People’s Republic of China
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - ShaoPeng Wang
- School of Life Sciences, Shanghai University, Shanghai, People’s Republic of China
| | - YunHua Zhang
- Anhui province key lab of farmland ecological conservation and pollution prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, People’s Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, People’s Republic of China
| |
|
18
|
Suratanee A, Plaimas K. Reverse Nearest Neighbor Search on a Protein-Protein Interaction Network to Infer Protein-Disease Associations. Bioinform Biol Insights 2017; 11:1177932217720405. [PMID: 28757797 PMCID: PMC5513527 DOI: 10.1177/1177932217720405] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/30/2017] [Accepted: 06/18/2017] [Indexed: 12/17/2022] Open
Abstract
The associations between proteins and diseases are crucial information for investigating pathological mechanisms. However, the number of known and reliable protein-disease associations is quite small. In this study, an analysis framework to infer associations between proteins and diseases was developed based on a large data set of a human protein-protein interaction network integrating an effective network search, namely, the reverse k-nearest neighbor (RkNN) search. The RkNN search was used to identify an impact of a protein on other proteins. Then, associations between proteins and diseases were inferred statistically. The method using the RkNN search yielded a much higher precision than a random selection, standard nearest neighbor search, or when applying the method to a random protein-protein interaction network. All protein-disease pair candidates were verified by a literature search. Supporting evidence for 596 pairs was identified. In addition, cluster analysis of these candidates revealed 10 promising groups of diseases to be further investigated experimentally. This method can be used to identify novel associations to better understand complex relationships between proteins and diseases.
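The reverse k-nearest neighbor idea in this abstract can be illustrated with a minimal sketch. It assumes an unweighted network, shortest-path (hop) distance, and alphabetical tie-breaking; none of these choices is stated in the abstract, and the toy network `toy_ppi` and the function names are hypothetical. A protein's RkNN set is taken here to be the set of proteins that count it among their own k nearest neighbors.

```python
from collections import deque

# Hypothetical toy network (adjacency sets); not data from the study.
toy_ppi = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def knn(adj, source, k):
    """The k nearest nodes to source; ties broken by node name (an assumption)."""
    dist = bfs_distances(adj, source)
    ranked = sorted((d, n) for n, d in dist.items() if n != source)
    return [n for _, n in ranked[:k]]

def rknn(adj, target, k):
    """Nodes that list target among their own k nearest neighbors."""
    return {q for q in adj if q != target and target in knn(adj, q, k)}

print(rknn(toy_ppi, "B", 1))
```

In this toy network, the hub B is the nearest neighbor of A, C and D, so its RkNN set is large, while the peripheral node E influences no other node. The statistical step that links RkNN sets to diseases is not sketched here.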
Affiliation(s)
- Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
| | - Kitiporn Plaimas
- Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
| |
|
19
|
|
20
|
Zhang X, Xiao W, Acencio ML, Lemke N, Wang X. An ensemble framework for identifying essential proteins. BMC Bioinformatics 2016; 17:322. [PMID: 27557880 PMCID: PMC4997703 DOI: 10.1186/s12859-016-1166-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/03/2016] [Accepted: 08/09/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Many centrality measures have been proposed to mine and characterize the correlations between network topological properties and protein essentiality. However, most of them show limited prediction accuracy, and the number of common predicted essential proteins by different methods is very small. Results: In this paper, an ensemble framework is proposed which integrates gene expression data and protein-protein interaction networks (PINs). It aims to improve the prediction accuracy of basic centrality measures. The idea behind this ensemble framework is that different protein-protein interactions (PPIs) may show different contributions to protein essentiality. Five standard centrality measures (degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and subgraph centrality) are integrated into the ensemble framework respectively. We evaluated the performance of the proposed ensemble framework using yeast PINs and gene expression data. The results show that it can considerably improve the prediction accuracy of the five centrality measures individually. It can also remarkably increase the number of common predicted essential proteins among those predicted by each centrality measure individually and enable each centrality measure to find more low-degree essential proteins. Conclusions: This paper demonstrates that it is valuable to differentiate the contributions of different PPIs for identifying essential proteins based on network topological characteristics. The proposed ensemble framework is a successful paradigm to this end. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1166-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xue Zhang
- Systems Biology Core, NHLBI, NIH, 9000 Rockville Pike, Bethesda, MD, 20892, USA
| | - Wangxin Xiao
- Department of Computer Science, XiangNan University, Eastern Wangxian Park, Chenzhou, Hunan, 423000, China.
| | - Marcio Luis Acencio
- Department of Physics and Biophysics, Institute of Biosciences of Botucatu, UNESP-São Paulo State University, CEP 18618-970, Botucatu, São Paulo, 510, Brazil.,Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), P.B. 8905, N-7491, Trondheim, Norway
| | - Ney Lemke
- Department of Physics and Biophysics, Institute of Biosciences of Botucatu, UNESP-São Paulo State University, CEP 18618-970, Botucatu, São Paulo, 510, Brazil
| | - Xujing Wang
- Systems Biology Core, NHLBI, NIH, 9000 Rockville Pike, Bethesda, MD, 20892, USA.
| |
|
21
|
Grazziotin AL, Vidal NM, Venancio TM. Uncovering major genomic features of essential genes in Bacteria and a methanogenic Archaea. FEBS J 2015; 282:3395-3411. [PMID: 26084810 DOI: 10.1111/febs.13350] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/02/2015] [Revised: 06/02/2015] [Accepted: 06/15/2015] [Indexed: 12/19/2022]
Abstract
Identification of essential genes is critical to understanding the physiology of a species, proposing novel drug targets and uncovering minimal gene sets required for life. Although essential gene sets of several organisms have been determined using large-scale mutagenesis techniques, systematic studies addressing their conservation, genomic context and functions remain scant. Here we integrate 17 essential gene sets from genome-wide in vitro screenings and three gene collections required for growth in vivo, encompassing 15 Bacteria and one Archaea. We refine and generalize important theories proposed using Escherichia coli. Essential genes are typically monogenic and more conserved than nonessential genes. Genes required in vivo are less conserved than those essential in vitro, suggesting that more divergent strategies are deployed when the organism is stressed by the host immune system and unstable nutrient availability. We identified essential analogous pathways that would probably be missed by orthology-based essentiality prediction strategies. For example, Streptococcus sanguinis carries horizontally transferred isoprenoid biosynthesis genes that are widespread in Archaea. Genes specifically essential in Mycobacterium tuberculosis and Burkholderia pseudomallei are reported as potential drug targets. Moreover, essential genes are not only preferentially located in operons, but also occupy the first position therein, supporting the influence of their regulatory regions in driving transcription of whole operons. Finally, these important genomic features are shared between Bacteria and at least one Archaea, suggesting that high order properties of gene essentiality and genome architecture were probably present in the last universal common ancestor or evolved independently in the prokaryotic domains.
Affiliation(s)
- Ana Laura Grazziotin
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, RJ, Brazil.,National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Newton Medeiros Vidal
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, RJ, Brazil.,National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Thiago Motta Venancio
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, RJ, Brazil
| |
|
22
|
Musungu B, Bhatnagar D, Brown RL, Fakhoury AM, Geisler M. A predicted protein interactome identifies conserved global networks and disease resistance subnetworks in maize. Front Genet 2015; 6:201. [PMID: 26089837 PMCID: PMC4454876 DOI: 10.3389/fgene.2015.00201] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 02/23/2015] [Accepted: 05/21/2015] [Indexed: 12/30/2022] Open
Abstract
Interactomes are genome-wide roadmaps of protein-protein interactions. They have been produced for humans, yeast, the fruit fly, and Arabidopsis thaliana and have become invaluable tools for generating and testing hypotheses. A predicted interactome for Zea mays (PiZeaM) is presented here as an aid to the research community for this valuable crop species. PiZeaM was built using a proven method of interologs (interacting orthologs) that were identified using both one-to-one and many-to-many orthology between genomes of maize and reference species. Where both maize orthologs occurred for an experimentally determined interaction in the reference species, we predicted a likely interaction in maize. A total of 49,026 unique interactions for 6004 maize proteins were predicted. These interactions are enriched for processes that are evolutionarily conserved, but include many otherwise poorly annotated proteins in maize. The predicted maize interactions were further analyzed by comparing annotation of interacting proteins, including different layers of ontology. A map of pairwise gene co-expression was also generated and compared to predicted interactions. Two global subnetworks were constructed for highly conserved interactions. These subnetworks showed clear clustering of proteins by function. Another subnetwork was created for disease response using a bait and prey strategy to capture interacting partners for proteins that respond to other organisms. Closer examination of this subnetwork revealed the connectivity between biotic and abiotic hormone stress pathways. We believe PiZeaM will provide a useful tool for the prediction of protein function and analysis of pathways for Z. mays researchers and is presented in this paper as a reference tool for the exploration of protein interactions in maize.
Affiliation(s)
- Bryan Musungu
- Department of Plant Biology, Southern Illinois University Carbondale, IL, USA
| | - Deepak Bhatnagar
- Food and Feed Safety Research, Southern Regional Research Center, United States Department of Agriculture, Agricultural Research Service New Orleans, LA, USA
| | - Robert L Brown
- Food and Feed Safety Research, Southern Regional Research Center, United States Department of Agriculture, Agricultural Research Service New Orleans, LA, USA
| | - Ahmad M Fakhoury
- Department of Plant Soil and Agriculture Systems, Southern Illinois University Carbondale, IL, USA
| | - Matt Geisler
- Department of Plant Biology, Southern Illinois University Carbondale, IL, USA
| |
|
23
|
Srihari S, Yong CH, Patil A, Wong L. Methods for protein complex prediction and their contributions towards understanding the organisation, function and dynamics of complexes. FEBS Lett 2015; 589:2590-602. [PMID: 25913176 DOI: 10.1016/j.febslet.2015.04.026] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 03/19/2015] [Revised: 04/14/2015] [Accepted: 04/14/2015] [Indexed: 12/30/2022]
Abstract
Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organisation of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight their limitations and challenges, in particular at detecting sparse and small or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information including expression profiles and 3D structures of proteins with PPI networks to understand the dynamics of complex formation, for instance, of time-based assembly of complex subunits and formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable to understand disease mechanisms and to discover novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area.
Affiliation(s)
- Sriganesh Srihari
- Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Queensland 4067, Australia.
| | - Chern Han Yong
- Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
| | - Ashwini Patil
- Human Genome Centre, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
| | - Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
| |
|
24
|
Srihari S, Madhamshettiwar PB, Song S, Liu C, Simpson PT, Khanna KK, Ragan MA. Complex-based analysis of dysregulated cellular processes in cancer. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 4:S1. [PMID: 25521701 PMCID: PMC4290683 DOI: 10.1186/1752-0509-8-s4-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
Abstract
Background: Differential expression analysis of (individual) genes is often used to study their roles in diseases. However, diseases such as cancer are a result of the combined effect of multiple genes. Gene products such as proteins seldom act in isolation, but instead constitute stable multi-protein complexes performing dedicated functions. Therefore, complexes aggregate the effect of individual genes (proteins) and can be used to gain a better understanding of cancer mechanisms. Here, we observe that complexes show considerable changes in their expression, in turn directed by the concerted action of transcription factors (TFs), across cancer conditions. We seek to gain novel insights into cancer mechanisms through a systematic analysis of complexes and their transcriptional regulation. Results: We integrated large-scale protein-interaction (PPI) and gene-expression datasets to identify complexes that exhibit significant changes in their expression across different conditions in cancer. We devised a log-linear model to relate these changes to the differential regulation of complexes by TFs. The application of our model on two case studies involving pancreatic and familial breast tumour conditions revealed: (i) complexes in core cellular processes, especially those responsible for maintaining genome stability and cell proliferation (e.g. DNA damage repair and cell cycle) show considerable changes in expression; (ii) these changes include decrease and countering increase for different sets of complexes indicative of compensatory mechanisms coming into play in tumours; and (iii) TFs work in cooperative and counteractive ways to regulate these mechanisms. Such aberrant complexes and their regulating TFs play vital roles in the initiation and progression of cancer. Conclusions: Complexes in core cellular processes display considerable decreases and countering increases in expression, strongly reflective of compensatory mechanisms in cancer. 
These changes are directed by the concerted action of cooperative and counteractive TFs. Our study highlights the roles of these complexes and TFs and presents several case studies of compensatory processes, thus providing novel insights into cancer mechanisms.
|
25
|
Suratanee A, Plaimas K. Identification of inflammatory bowel disease-related proteins using a reverse k-nearest neighbor search. J Bioinform Comput Biol 2014; 12:1450017. [DOI: 10.1142/s0219720014500176] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Inflammatory bowel disease (IBD) is a chronic disease whose incidence and prevalence increase every year; however, the pathogenesis of IBD is still unclear. Thus, identifying IBD-related proteins is important for understanding its complex disease mechanism. Here, we propose a new and simple network-based approach using a reverse k-nearest neighbor (RkNN) search to identify novel IBD-related proteins. Protein–protein interactions (PPI) and Genome-Wide Association Studies (GWAS) were used in this study. After constructing the PPI network, the RkNN search was applied to all of the proteins to identify sets of influenced proteins among their k-nearest neighbors (RkNNs). An observed protein whose influenced proteins were mostly known IBD-related proteins was statistically identified as a novel IBD-related protein. Our method outperformed a random aspect, kNN search, and centrality measures based on the network topology. A total of 39 proteins were identified as IBD-related proteins. Of these proteins, 71% were reported at least once in the literature as related to IBD. Additionally, these proteins were found over-represented in the IBD pathway and enriched in functionally important pathways in IBD. In conclusion, the RkNN search with the statistical enrichment test is a great tool to identify IBD-related proteins to better understand its complex disease mechanism.
Affiliation(s)
- Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok, 1518 Pracharat 1 Road, Wongsawang, Bangsue, Bangkok 10800, Thailand
| | - Kitiporn Plaimas
- Integrative Bioinformatics and Systems Biology Group, Advanced Virtual and Intelligent Computing Research Center (AVIC), Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Phyathai Road, Patumwan, Bangkok 10330, Thailand
| |
|
26
|
Effective identification of essential proteins based on priori knowledge, network topology and gene expressions. Methods 2014; 67:325-33. [DOI: 10.1016/j.ymeth.2014.02.016] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 10/31/2013] [Revised: 01/16/2014] [Accepted: 02/11/2014] [Indexed: 11/23/2022] Open
|
27
|
Rhee SY, Mutwil M. Towards revealing the functions of all genes in plants. TRENDS IN PLANT SCIENCE 2014; 19:212-21. [PMID: 24231067 DOI: 10.1016/j.tplants.2013.10.006] [Citation(s) in RCA: 146] [Impact Index Per Article: 14.6] [Received: 08/06/2013] [Revised: 10/10/2013] [Accepted: 10/16/2013] [Indexed: 05/19/2023]
Abstract
The great recent progress made in identifying the molecular parts lists of organisms revealed the paucity of our understanding of what most of the parts do. In this review, we introduce computational and statistical approaches and omics data used for inferring gene function in plants, with an emphasis on network-based inference. We also discuss caveats associated with network-based function predictions such as performance assessment, annotation propagation, the guilt-by-association concept, and the meaning of hubs. Finally, we note the current limitations and possible future directions such as the need for gold standard data from several species, unified access to data and tools, quantitative comparison of data and tool quality, and high-throughput experimental validation platforms for systematic gene function elucidation in plants.
Affiliation(s)
- Seung Yon Rhee
- Carnegie Institution for Science, Department of Plant Biology, 260 Panama St, Stanford, CA 94305, USA.
| | - Marek Mutwil
- Max Planck Institute for Molecular Plant Physiology, 14476 Potsdam, Germany.
| |
|
28
|
Tang X, Wang J, Zhong J, Pan Y. Predicting Essential Proteins Based on Weighted Degree Centrality. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:407-18. [PMID: 26355787 DOI: 10.1109/tcbb.2013.2295318] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Indexed: 05/06/2023]
Abstract
Essential proteins are vital for an organism's viability under a variety of conditions. Many experimental and computational methods have been developed to identify essential proteins. Computational prediction of essential proteins based on the global protein-protein interaction (PPI) network is severely restricted because of the insufficiency of the PPI data, but fortunately gene expression profiles help to make up the deficiency. In this work, the Pearson correlation coefficient (PCC) is used to bridge the gap between PPI and gene expression data. Based on PCC and the edge clustering coefficient (ECC), a new centrality measure, i.e., the weighted degree centrality (WDC), is developed to achieve reliable prediction of essential proteins. WDC is employed to identify essential proteins in the yeast and E. coli PPI networks in order to estimate its performance. For comparison, other prediction methods are also applied to identify essential proteins. Several evaluation methods are used to analyze the results from the various prediction approaches. The prediction results and comparative analyses are shown in the paper. Furthermore, the parameter λ in the WDC method is analyzed in detail and an optimal λ value is found. Based on the optimal λ value, the difference between WDC and another prediction method, PeC, is discussed. The analyses show that WDC outperforms other methods including DC, BC, CC, SC, EC, IC, NC, and PeC. At the same time, the analyses also indicate that integrating different data sources is an effective way to predict essential proteins.
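As a rough illustration of how PCC and ECC might be combined into a weighted degree, the sketch below scores each protein by summing, over its interaction partners, a λ-weighted mix of the two edge scores. The exact definitions are given in the paper, not in this abstract, so the combination λ·ECC + (1 − λ)·PCC, the ECC denominator, and all names and toy data here are assumptions.

```python
from math import sqrt

# Hypothetical toy data: an interaction network and expression profiles.
toy_ppi = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
}
toy_expr = {
    "P1": [1.0, 2.0, 3.0],
    "P2": [2.0, 4.0, 6.0],
    "P3": [1.0, 2.0, 3.0],
    "P4": [3.0, 2.0, 1.0],
}

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def ecc(adj, u, v):
    """Edge clustering coefficient: triangles on edge (u, v) over the maximum possible."""
    triangles = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / denom if denom > 0 else 0.0

def wdc(adj, expr, lam):
    """Assumed form: sum over neighbors of lam * ECC + (1 - lam) * PCC."""
    return {
        u: sum(lam * ecc(adj, u, v) + (1 - lam) * pearson(expr[u], expr[v])
               for v in adj[u])
        for u in adj
    }

for protein, score in sorted(wdc(toy_ppi, toy_expr, 0.5).items()):
    print(protein, round(score, 3))
```

Setting `lam` to 1 reduces the score to a purely topological measure and setting it to 0 to a purely expression-based one, which is one way to read the role of the λ parameter discussed in the abstract.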
|
29
|
Srihari S, Raman V, Leong HW, Ragan MA. Evolution and Controllability of Cancer Networks: A Boolean Perspective. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:83-94. [PMID: 26355510 DOI: 10.1109/tcbb.2013.128] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
Cancer forms a robust system capable of maintaining stable functioning (cell sustenance and proliferation) despite perturbations. Cancer progresses in stages over time, typically with increasing aggressiveness and worsening prognosis. Characterizing these stages and identifying the genes driving transitions between them is critical to understanding cancer progression and to developing effective anti-cancer therapies. In this work, we propose a novel model for the 'cancer system' as a Boolean state space in which a Boolean network, built from protein-interaction and gene-expression data from different stages of cancer, transits between Boolean satisfiability states by "editing" interactions and "flipping" genes. Edits reflect rewiring of the PPI network, while flipping of genes reflects activation or silencing of genes between stages. We formulate a minimization problem, min flip, to identify the genes driving the transitions. The application of our model (called BoolSpace) to three case studies, namely pancreatic and breast tumours in human and post spinal-cord injury (SCI) in rats, reveals valuable insights into the phenomenon of cancer progression: (i) interactions involved in core cell-cycle and DNA-damage repair pathways are significantly rewired in tumours, indicating significant impact on key genome-stabilizing mechanisms; (ii) several of the genes flipped are serine/threonine kinases, which act as biological switches, reflecting cellular switching mechanisms between stages; and (iii) different sets of genes are flipped during the initial and final stages, indicating a pattern to tumour progression. Based on these results, we hypothesize that the robustness of cancer partly stems from "passing of the baton" between genes at different stages: genes from different biological processes and/or cellular components are involved in different stages of tumour progression, thereby allowing tumour cells to evade targeted therapy, and therefore an effective therapy should target a "cover set" of these genes. A C/C++ implementation of BoolSpace is freely available at: http://www.bioinformatics.org.au/tools-data.
|
30
|
Raman K, Damaraju N, Joshi GK. The organisational structure of protein networks: revisiting the centrality-lethality hypothesis. Systems and Synthetic Biology 2013; 8:73-81. [PMID: 24592293 DOI: 10.1007/s11693-013-9123-5] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Received: 06/03/2013] [Revised: 08/05/2013] [Accepted: 08/12/2013] [Indexed: 01/09/2023]
Abstract
Protein networks, describing physical interactions as well as functional associations between proteins, have been unravelled for many organisms in the recent past. Databases such as STRING provide excellent resources for the analysis of such networks. In this contribution, we revisit the organisation of protein networks, particularly the centrality-lethality hypothesis, which states that nodes with higher centrality in a network are more likely to produce lethal phenotypes on removal than nodes with lower centrality. We consider the protein networks of a diverse set of 20 organisms with essentiality information available in the Database of Essential Genes, and assess the relationship between centrality measures and lethality. For each of these organisms, we obtained networks of high-confidence interactions from the STRING database and computed network parameters such as degree, betweenness centrality, closeness centrality and pairwise disconnectivity indices. We observe that the networks considered here are predominantly disassortative. Further, we observe that essential nodes in a network have a significantly higher average degree and betweenness centrality compared to the network average. Most previous studies have evaluated the centrality-lethality hypothesis for Saccharomyces cerevisiae and Escherichia coli; we here observe that the centrality-lethality hypothesis holds good for a large number of organisms, with certain limitations. Betweenness centrality may also be a useful measure to identify essential nodes, but measures like closeness centrality and pairwise disconnectivity are not significantly higher for essential nodes.
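The comparison described in this abstract, average degree of essential nodes against the network-wide average, can be illustrated with a minimal Python sketch. The function names, the toy edge list, and the set of "essential" nodes below are all hypothetical and are not taken from the paper; the sketch only computes the two means and does not perform the significance testing the authors report.

```python
from collections import defaultdict
from statistics import mean


def degree_by_node(edges):
    """Build an undirected adjacency map and return each node's degree."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return {node: len(neighbours) for node, neighbours in adj.items()}


def essential_vs_network_degree(edges, essential):
    """Return (mean degree of essential nodes, mean degree of all nodes).

    Raises statistics.StatisticsError if no essential node is in the network.
    """
    degree = degree_by_node(edges)
    essential_degrees = [d for node, d in degree.items() if node in essential]
    return mean(essential_degrees), mean(degree.values())


# Hypothetical toy network: a hub H plus one extra edge between A and B.
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]
toy_essential = {"H", "A"}  # illustrative labels only
result = essential_vs_network_degree(toy_edges, toy_essential)
```

On real data the edge list would come from a high-confidence interaction set and the essential labels from an essentiality database, and a statistical test would be needed before drawing any conclusion from the difference between the two means.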
Affiliation(s)
- Karthik Raman
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600 036 India
- Nandita Damaraju
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600 036 India
- Govind Krishna Joshi
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600 036 India
|
31
|
Zhang X, Xu J, Xiao WX. A new method for the discovery of essential proteins. PLoS One 2013; 8:e58763. [PMID: 23555595 PMCID: PMC3605424 DOI: 10.1371/journal.pone.0058763] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 12/25/2012] [Accepted: 02/06/2013] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Experimental methods for the identification of essential proteins are always costly, time-consuming, and laborious. It is a challenging task to determine protein essentiality through experiments alone. With the development of high-throughput technologies, a vast number of protein-protein interactions are available, which enable the identification of essential proteins at the network level. Many computational methods for this task have been proposed based on the topological properties of protein-protein interaction (PPI) networks. However, the currently available PPI networks for each species are incomplete (false negatives) and very noisy (many false positives), and network topology-based centrality measures are often very sensitive to such noise. Therefore, exploring robust methods for identifying essential proteins would be of great value. METHOD In this paper, a new essential protein discovery method, named CoEWC (Co-Expression Weighted by Clustering coefficient), is proposed. CoEWC is based on the integration of the topological properties of the PPI network and the co-expression of interacting proteins. The aim of CoEWC is to capture the common features of essential proteins in both date hubs and party hubs. The performance of CoEWC is validated on the PPI network of Saccharomyces cerevisiae. Experimental results show that CoEWC significantly outperforms the classical centrality measures, and that it also outperforms PeC, a recently proposed essential protein discovery method that outperforms 15 other centrality measures on the PPI network of Saccharomyces cerevisiae. In particular, when predicting no more than 500 proteins, improvements of more than 50% are obtained by CoEWC over degree centrality (DC), one of the better centrality measures for identifying protein essentiality.
CONCLUSIONS We demonstrate that a more robust essential protein discovery method can be developed by integrating the topological properties of the PPI network and the co-expression of interacting proteins. The proposed centrality measure, CoEWC, is effective for the discovery of essential proteins.
Affiliation(s)
- Xue Zhang
  - Key Laboratory of High Confidence Software Technologies, Ministry of Education, Peking University, Beijing, China.
|
32
|
Wang J, Peng W, Wu FX. Computational approaches to predicting essential proteins: A survey. Proteomics Clin Appl 2013; 7:181-92. [DOI: 10.1002/prca.201200068] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 07/27/2012] [Revised: 09/12/2012] [Accepted: 11/06/2012] [Indexed: 12/13/2022]
Affiliation(s)
- Jianxin Wang
  - School of Information Science and Engineering, Central South University, Changsha, China
- Wei Peng
  - School of Information Science and Engineering, Central South University, Changsha, China
- Fang-Xiang Wu
  - Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
|
33
|
Schrum AG, Gil D. Robustness and Specificity in Signal Transduction via Physiologic Protein Interaction Networks. Clinical & Experimental Pharmacology 2012; 2:S3.001. [PMID: 24535485 PMCID: PMC3923534 DOI: 10.4172/2161-1459.s3-001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/05/2022]
Abstract
The collective Protein:Protein Interactions (PPI) of a cell are thought to represent a system with emergent network properties that integrate signals from a multiplicity of inputs into coordinated responses. It is hypothesized that the PPI network supplies both specificity for many distinct signals that utilize common intermediate pathways, and also robustness by allowing specific signals to be communicated by alternate routes. Progress with genetic networks points to these concepts, but the extent to which PPI networks possess these properties has not been empirically tested, due to lack of quantitative data needed for such assessments. Here, a hypothetical physiologic PPI network is used to illustrate how signaling robustness and specificity could be manifest under conditions of (i) deletion mutation, or (ii) changes in signaling due to variation in environmental conditions or stimuli. It is proposed that advances in technology enabling empirical analysis of PPI network principles will have the potential to significantly impact basic understanding of signaling mechanisms, and contribute to the generation of novel applications in drug screening and pharmacology.
Affiliation(s)
- Adam G. Schrum
  - Department of Immunology, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
- Diana Gil
  - Department of Immunology, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
|
34
|
Wang J, Li M, Wang H, Pan Y. Identification of essential proteins based on edge clustering coefficient. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1070-1080. [PMID: 22084147 DOI: 10.1109/tcbb.2011.147] [Citation(s) in RCA: 149] [Impact Index Per Article: 12.4] [Indexed: 05/31/2023]
Abstract
Identification of essential proteins is key to understanding the minimal requirements for cellular life and important for drug design. The rapid increase of available protein-protein interaction (PPI) data has made it possible to detect protein essentiality at the network level. A series of centrality measures have been proposed to discover essential proteins based on network topology. However, most of them focus only on the location of a single protein and ignore the relevance between interactions and protein essentiality. In this paper, a new centrality measure for identifying essential proteins based on the edge clustering coefficient, named NC, is proposed. Unlike previous centrality measures, NC considers both the centrality of a node and the relationship between it and its neighbors. For each interaction in the network, we calculate its edge clustering coefficient. A node’s essentiality is determined by the sum of the edge clustering coefficients of the interactions connecting it and its neighbors. The new centrality measure NC takes into account the modular nature of protein essentiality. NC is applied to three different yeast protein-protein interaction networks, obtained from the DIP database, the MIPS database and the BioGRID database, respectively. The experimental results on the three networks show that the number of essential proteins discovered by NC consistently exceeds that discovered by the six other centrality measures: DC, BC, CC, SC, EC, and IC. Moreover, the essential proteins discovered by NC show a significant clustering effect.
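The scoring rule stated in this abstract, a node's score as the sum of the edge clustering coefficients of its incident interactions, can be sketched in a few lines of Python. The abstract does not give the formula for the edge clustering coefficient, so the sketch assumes a common definition: the number of triangles that contain the edge (shared neighbours of its endpoints) divided by the maximum possible number, min(deg(u) - 1, deg(v) - 1). The function names and the toy network are hypothetical.

```python
from collections import defaultdict


def edge_clustering_coefficient(adj, u, v):
    """Assumed ECC definition: shared neighbours of u and v (triangles through
    the edge) divided by min(deg(u) - 1, deg(v) - 1); 0 if no triangle is possible."""
    max_triangles = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if max_triangles <= 0:
        return 0.0
    return len(adj[u] & adj[v]) / max_triangles


def nc_scores(edges):
    """Score each node by summing the ECC of every edge incident to it."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return {
        u: sum(edge_clustering_coefficient(adj, u, v) for v in adj[u])
        for u in adj
    }


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
scores = nc_scores(toy_edges)
```

In this toy network the pendant node D scores 0 because its only edge lies in no triangle, while C gains nothing from that edge despite having the highest degree; this is the sense in which such a score weights a node's interactions by how embedded they are in a local module rather than simply counting them.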
Affiliation(s)
- Jianxin Wang
  - School of Information Science and Engineering, Central South University, Computer Building, Changsha 410083, China.
|
35
|
Li M, Zhang H, Wang JX, Pan Y. A new essential protein discovery method based on the integration of protein-protein interaction and gene expression data. BMC Systems Biology 2012; 6:15. [PMID: 22405054 PMCID: PMC3325894 DOI: 10.1186/1752-0509-6-15] [Citation(s) in RCA: 133] [Impact Index Per Article: 11.1] [Received: 10/24/2011] [Accepted: 03/10/2012] [Indexed: 01/09/2023]
Abstract
BACKGROUND Identification of essential proteins is always a challenging task since it requires experimental approaches that are time-consuming and laborious. With the advances in high-throughput technologies, a large number of protein-protein interactions are available, which have produced unprecedented opportunities for detecting protein essentiality at the network level. A series of computational approaches have been proposed for predicting essential proteins based on network topologies. However, network topology-based centrality measures are very sensitive to the robustness of the network. Therefore, a new robust essential protein discovery method would be of great value. RESULTS In this paper, we propose a new centrality measure, named PeC, based on the integration of protein-protein interaction and gene expression data. The performance of PeC is validated on the protein-protein interaction network of Saccharomyces cerevisiae. The experimental results show that the prediction precision of PeC clearly exceeds that of fifteen previously proposed centrality measures: Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC), Bottle Neck (BN), Density of Maximum Neighborhood Component (DMNC), Local Average Connectivity-based method (LAC), Sum of ECC (SoECC), Range-Limited Centrality (RL), L-index (LI), Leader Rank (LR), Normalized α-Centrality (NC), and Moduland-Centrality (MC). In particular, the improvement of PeC over the classic centrality measures (BC, CC, SC, EC, and BN) is more than 50% when predicting no more than 500 proteins. CONCLUSIONS We demonstrate that the integration of the protein-protein interaction network and gene expression data can help improve the precision of predicting essential proteins. The new centrality measure, PeC, is an effective essential protein discovery method.
Affiliation(s)
- Min Li
  - School of Information Science and Engineering, Central South University, Changsha, Hunan, P R China.
|
36
|
Srihari S, Ning K, Leong HW. MCL-CAw: a refinement of MCL for detecting yeast complexes from weighted PPI networks by incorporating core-attachment structure. BMC Bioinformatics 2010; 11:504. [PMID: 20939868 PMCID: PMC2965181 DOI: 10.1186/1471-2105-11-504] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Received: 05/03/2010] [Accepted: 10/12/2010] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND The reconstruction of protein complexes from the physical interactome of organisms serves as a building block towards understanding the higher-level organization of the cell. Over the past few years, several independent high-throughput experiments have helped to catalogue an enormous amount of physical protein interaction data from organisms such as yeast. However, these individual datasets show a lack of correlation with each other and also contain a substantial number of false positives (noise). Over these years, several affinity scoring schemes have also been devised to improve the quality of these datasets. Therefore, the challenge now is to detect meaningful as well as novel complexes from protein interaction (PPI) networks derived by combining datasets from multiple sources and by making use of these affinity scoring schemes. In tackling this challenge, the Markov Clustering algorithm (MCL) has proved to be a popular and reasonably successful method, mainly due to its scalability, robustness, and ability to work on scored (weighted) networks. However, MCL produces many noisy clusters, which either do not match known complexes or have additional proteins that reduce the accuracy of correctly predicted complexes. RESULTS Inspired by recent experimental observations by Gavin and colleagues on the modularity structure in yeast complexes and the distinctive properties of "core" and "attachment" proteins, we develop a core-attachment based refinement method coupled to MCL for reconstruction of yeast complexes from scored (weighted) PPI networks. We combine physical interactions from two recent "pull-down" experiments to generate an unscored PPI network. We then score this network using available affinity scoring schemes to generate multiple scored PPI networks. The evaluation of our method (called MCL-CAw) on these networks shows that: (i) MCL-CAw derives a larger number of yeast complexes, and with better accuracy, than MCL, particularly in the presence of natural noise; (ii) affinity scoring can effectively reduce the impact of noise on MCL-CAw and thereby improve the quality (precision and recall) of its predicted complexes; (iii) MCL-CAw responds well to most available scoring schemes. We discuss several instances where MCL-CAw was successful in deriving meaningful complexes, and where it missed a few proteins or whole complexes due to affinity scoring of the networks. We compare MCL-CAw with several recent complex detection algorithms on unscored and scored networks, and assess the relative performance of the algorithms on these networks. Further, we study the impact of augmenting physical datasets with computationally inferred interactions for complex detection. Finally, we analyse the essentiality of proteins within predicted complexes to understand a possible correlation between protein essentiality and the ability to form complexes. CONCLUSIONS We demonstrate that core-attachment based refinement in MCL-CAw improves the predictions of MCL on yeast PPI networks. We show that affinity scoring improves the performance of MCL-CAw.
Affiliation(s)
- Sriganesh Srihari
  - Department of Computer Science, National University of Singapore, 117590, Singapore
- Kang Ning
  - Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
  - Qingdao Institute of Bioenergy and Bioprocess Technology, Qingdao 266101, China
- Hon Wai Leong
  - Department of Computer Science, National University of Singapore, 117590, Singapore
|