1
|
Zhu X, He X, Kuang L, Chen Z, Lancine C. A Novel Collaborative Filtering Model-Based Method for Identifying Essential Proteins. Front Genet 2021; 12:763153. [PMID: 34745230 PMCID: PMC8566338 DOI: 10.3389/fgene.2021.763153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 09/13/2021] [Indexed: 11/19/2022] Open
Abstract
Because traditional biological experiments are expensive and time-consuming, it is important to develop effective computational models to infer potential essential proteins. This manuscript proposes a novel collaborative filtering model-based method called CFMM. First, an updated protein–domain interaction (PDI) network is constructed by applying a collaborative filtering algorithm to the original PDI network. Then, by integrating topological features of PDI networks with biological features of proteins, a computational method based on an improved PageRank algorithm is designed to infer potential essential proteins. The novelties of CFMM lie in the construction of an updated PDI network, the application of a commodity-customer-based collaborative filtering algorithm, and the introduction of a calculation method based on an improved PageRank algorithm, which together allow CFMM to predict essential proteins without relying entirely on known protein–domain associations. Simulation results showed that CFMM achieves prediction accuracies of 92.16, 83.14, 71.37, 63.87, 55.84, and 52.43% among the top 1, 5, 10, 15, 20, and 25% of predicted candidate key proteins on the DIP database, which are, taken as a whole, higher than those of 14 competitive state-of-the-art predictive models. CFMM also achieves satisfactory predictive performance on different databases under various evaluation measures, indicating that it may be a useful tool for the identification of essential proteins.
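The "top p%" accuracies quoted in this abstract can be read as the fraction of known essential proteins among the highest-ranked candidates. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name, the toy scores, and the choice to round the cutoff up with `ceil` are all assumptions made for illustration.

```python
from math import ceil


def top_percent_accuracy(scores, essential, percent):
    """Fraction of the top `percent`% ranked proteins that are known essential.

    scores    -- dict mapping protein id -> predicted essentiality score
    essential -- set of protein ids known to be essential
    percent   -- cutoff in percent, e.g. 1, 5, 10
    """
    # Rank proteins from highest to lowest score (ties keep insertion order).
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: the cutoff is rounded up and is never smaller than one protein.
    k = max(1, ceil(len(ranked) * percent / 100))
    hits = sum(1 for protein in ranked[:k] if protein in essential)
    return hits / k


# Hypothetical toy data, not taken from any database.
scores = {"P1": 0.9, "P2": 0.8, "P3": 0.4, "P4": 0.3, "P5": 0.1}
essential = {"P1", "P3"}

for pct in (20, 40, 100):
    print(pct, top_percent_accuracy(scores, essential, pct))
```

A different rounding rule for the cutoff would shift the reported values slightly on small datasets, so the rule should be checked against the original paper before comparing numbers.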
Affiliation(s)
- Xianyou Zhu
- College of Computer Science and Technology, Hengyang Normal University, Hengyang, China; Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang, China
| | - Xin He
- College of Computer, Xiangtan University, Xiangtan, China
| | - Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
| | - Zhiping Chen
- College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
| | - Camara Lancine
- The Social Sciences and Management University of Bamako, Bamako, Mali
| |
|
2
|
He X, Kuang L, Chen Z, Tan Y, Wang L. Method for Identifying Essential Proteins by Key Features of Proteins in a Novel Protein-Domain Network. Front Genet 2021; 12:708162. [PMID: 34267785 PMCID: PMC8276041 DOI: 10.3389/fgene.2021.708162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 05/31/2021] [Indexed: 11/21/2022] Open
Abstract
In recent years, owing to the low accuracy and high cost of traditional biological experiments, a growing number of computational models have been proposed to infer potential essential proteins. In this paper, a novel prediction method called KFPM is proposed. First, a novel protein-domain heterogeneous network is established by combining known protein-protein interactions with known associations between proteins and domains. Next, based on key topological characteristics extracted from the newly constructed protein-domain network and functional characteristics extracted from multiple sources of biological information on proteins, a new computational method based on an improved PageRank algorithm is designed to integrate these features and infer potential essential proteins. Finally, to evaluate the performance of KFPM, we compared it with 13 state-of-the-art prediction methods. Experimental results show that, among the top 1, 5, and 10% of candidate proteins predicted by KFPM, the prediction accuracy reaches 96.08, 83.14, and 70.59%, respectively, significantly outperforming all 13 competing methods. This suggests that KFPM may be a meaningful tool for predicting potential essential proteins.
Affiliation(s)
- Xin He
- College of Computer, Xiangtan University, Xiangtan, China
| | - Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
| | - Zhiping Chen
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| | - Yihong Tan
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| | - Lei Wang
- College of Computer, Xiangtan University, Xiangtan, China
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| |
|
3
|
Wang F, Han S, Yang J, Yan W, Hu G. Knowledge-Guided "Community Network" Analysis Reveals the Functional Modules and Candidate Targets in Non-Small-Cell Lung Cancer. Cells 2021; 10:cells10020402. [PMID: 33669233 PMCID: PMC7919838 DOI: 10.3390/cells10020402] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/13/2021] [Revised: 02/06/2021] [Accepted: 02/15/2021] [Indexed: 12/24/2022] Open
Abstract
Non-small-cell lung cancer (NSCLC) represents a heterogeneous group of malignancies that are the leading cause of cancer-related death worldwide. Although many NSCLC-related genes and pathways have been identified, there remains an urgent need to mechanistically understand how these genes and pathways drive NSCLC. Here, we propose a knowledge-guided and network-based integration method, called the node and edge Prioritization-based Community Analysis, to identify functional modules and their candidate targets in NSCLC. The protein–protein interaction network was prioritized by performing a random walk with restart algorithm based on NSCLC seed genes and the integrating edge weights, and then a “community network” was constructed by combining Girvan–Newman and Label Propagation algorithms. This systems biology analysis revealed that the CCNB1-mediated network in the largest community provides a modular biomarker, the second community serves as a drug regulatory module, and the two are connected by some contextual signaling motifs. Moreover, integrating structural information into the signaling network suggested novel protein–protein interactions with therapeutic significance, such as interactions between GNG11 and CXCR2, CXCL3, and PPBP. This study provides new mechanistic insights into the landscape of cellular functions in the context of modular networks and will help in developing therapeutic targets for NSCLC.
Affiliation(s)
- Fan Wang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; (F.W.); (S.H.); (J.Y.)
| | - Shuqing Han
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; (F.W.); (S.H.); (J.Y.)
| | - Ji Yang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; (F.W.); (S.H.); (J.Y.)
| | - Wenying Yan
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; (F.W.); (S.H.); (J.Y.)
- Correspondence: (W.Y.); (G.H.)
| | - Guang Hu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; (F.W.); (S.H.); (J.Y.)
- State Key Laboratory of Radiation Medicine and Protection, Soochow University, Suzhou 215123, China
- Correspondence: (W.Y.); (G.H.)
| |
|
4
|
Samarina LS, Bobrovskikh AV, Doroshkov AV, Malyukova LS, Matskiv AO, Rakhmangulov RS, Koninskaya NG, Malyarovskaya VI, Tong W, Xia E, Manakhova KA, Ryndin AV, Orlov YL. Comparative Expression Analysis of Stress-Inducible Candidate Genes in Response to Cold and Drought in Tea Plant [Camellia sinensis (L.) Kuntze]. Front Genet 2020; 11:611283. [PMID: 33424935 PMCID: PMC7786056 DOI: 10.3389/fgene.2020.611283] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/28/2020] [Accepted: 11/23/2020] [Indexed: 12/15/2022] Open
Abstract
Cold and drought are two of the most severe threats affecting the growth and productivity of the tea plant, limiting its global spread. Both stresses cause osmotic changes in the cells of the tea plant by decreasing their water potential. To develop cultivars that are tolerant to both stresses, it is essential to understand the genetic responses of tea plant to these two stresses, particularly in terms of the genes involved. In this study, we combined literature data with interspecific transcriptomic analyses (using Arabidopsis thaliana and Solanum lycopersicum) to choose genes related to cold tolerance. We identified 45 stress-inducible candidate genes associated with cold and drought responses in tea plants based on a comprehensive homologous detection method. Of these, nine were newly characterized by us, and 36 had previously been reported. The gene network analysis revealed upregulated expression in ICE1-related cluster of bHLH factors, HSP70/BAM5 connected genes (hexokinases, galactinol synthases, SnRK complex, etc.) indicating their possible co-expression. Using qRT-PCR we revealed that 10 genes were significantly upregulated in response to both cold and drought in tea plant: HSP70, GST, SUS1, DHN1, BMY5, bHLH102, GR-RBP3, ICE1, GOLS1, and GOLS3. SnRK1.2, HXK1/2, bHLH7/43/79/93 were specifically upregulated in cold, while RHL41, CAU1, Hydrolase22 were specifically upregulated in drought. Interestingly, the expression of CIP was higher in the recovery stage of both stresses, indicating its potentially important role in plant recovery after stress. In addition, some genes, such as DHN3, bHLH79, PEI54, SnRK1.2, SnRK1.3, and Hydrolase22, were significantly positively correlated between the cold and drought responses. CBF1, GOLS1, HXK2, and HXK3, by contrast, showed significantly negative correlations between the cold and drought responses. 
Our results provide valuable information and robust candidate genes for future functional analyses intended to improve the stress tolerance of the tea plant and other species.
Affiliation(s)
- Lidiia S Samarina
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia
| | - Alexandr V Bobrovskikh
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia; Institute Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Alexey V Doroshkov
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia; Institute Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Lyudmila S Malyukova
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia
| | - Alexandra O Matskiv
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia
| | - Ruslan S Rakhmangulov
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia
| | - Natalia G Koninskaya
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia
| | - Valentina I Malyarovskaya
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia
| | - Wei Tong
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Enhua Xia
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, China
| | - Karina A Manakhova
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia
| | - Alexey V Ryndin
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia
| | - Yuriy L Orlov
- Biotechnology Department, Federal Research Centre the Subtropical Scientific Centre of the Russian Academy of Sciences, Sochi, Russia; Agrarian and Technological Institute, Peoples' Friendship University of Russia (RUDN University), Moscow, Russia
| |
|
5
|
A novel scheme for essential protein discovery based on multi-source biological information. J Theor Biol 2020; 504:110414. [PMID: 32712150 DOI: 10.1016/j.jtbi.2020.110414] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2019] [Revised: 02/14/2020] [Accepted: 07/15/2020] [Indexed: 02/06/2023]
Abstract
Mining essential proteins is crucial for understanding the process of cellular organization and viability. At present, there are many computational methods for detecting essential proteins. However, these existing methods focus only on the topological information of the networks and ignore the biological information of proteins, which leads to low accuracy in essential protein identification. Therefore, this paper presents a new essential protein prediction strategy, called DEP-MSB, which integrates a variety of biological information including gene expression profiles, GO annotations, and domain interaction strength. To evaluate the performance of DEP-MSB, we conduct a series of experiments on the yeast PPI network. The experimental results show that DEP-MSB outperforms the other existing traditional methods and clearly improves prediction accuracy.
|
6
|
Roy U. Insight into the structures of Interleukin-18 systems. Comput Biol Chem 2020; 88:107353. [PMID: 32769049 PMCID: PMC7392904 DOI: 10.1016/j.compbiolchem.2020.107353] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/12/2020] [Revised: 07/01/2020] [Accepted: 07/28/2020] [Indexed: 02/08/2023]
Abstract
Structure-based molecular designs play a critical role in the context of next generation drug development. Besides their fundamental scientific aspects, the findings established in this approach have significant implications in the expansions of target-based therapies and vaccines. Interleukin-18 (IL-18), also known as interferon gamma (IFN-γ) inducing factor, is a pro-inflammatory cytokine. The IL-18 binds first to the IL-18α receptor and forms a lower affinity complex. Upon binding with IL-18β a hetero-trimeric complex with higher affinity is formed that initiates the signal transduction process. The present study, including structural and molecular dynamics simulations, takes a close look at the structural stabilities of IL-18 and IL-18 receptor-bound ligand structures as functions of time. The results help to identify the conformational changes of the ligand due to receptor binding, as well as the structural orders of the apo and holo IL-18 protein complexes.
Affiliation(s)
- Urmi Roy
- Department of Chemistry & Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5820, United States.
| |
|
7
|
Jiang Y, Liang Y, Wang D, Xu D, Joshi T. A dynamic programing approach to integrate gene expression data and network information for pathway model generation. Bioinformatics 2020; 36:169-176. [PMID: 31168616 DOI: 10.1093/bioinformatics/btz467] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/07/2018] [Revised: 05/15/2019] [Accepted: 05/31/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION As large amounts of biological data continue to be rapidly generated, a major focus of bioinformatics research has been aimed toward integrating these data to identify active pathways or modules under certain experimental conditions or phenotypes. Although biologically significant modules can often be detected globally by many existing methods, it is often hard to interpret or make use of the results toward pathway model generation and testing. RESULTS To address this gap, we have developed the IMPRes algorithm, a new step-wise active pathway detection method using a dynamic programing approach. IMPRes takes advantage of the existing pathway interaction knowledge in Kyoto Encyclopedia of Genes and Genomes. Omics data are then used to assign penalties to genes, interactions and pathways. Finally, starting from one or multiple seed genes, a shortest path algorithm is applied to detect downstream pathways that best explain the gene expression data. Since dynamic programing enables the detection one step at a time, it is easy for researchers to trace the pathways, which may lead to more accurate drug design and more effective treatment strategies. The evaluation experiments conducted on three yeast datasets have shown that IMPRes can achieve competitive or better performance than other state-of-the-art methods. Furthermore, a case study on human lung cancer dataset was performed and we provided several insights on genes and mechanisms involved in lung cancer, which had not been discovered before. AVAILABILITY AND IMPLEMENTATION IMPRes visualization tool is available via web server at http://digbio.missouri.edu/impres. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
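The penalty-based shortest path step described in this abstract can be illustrated with a generic Dijkstra search in which every step pays an edge penalty plus the penalty of the gene it enters. This is only a sketch under stated assumptions, not the IMPRes implementation: the function names, the additive cost model, and the toy network are hypothetical, and all penalties are assumed to be non-negative.

```python
import heapq


def cheapest_paths(graph, node_penalty, seeds):
    """Dijkstra from one or more seed genes over a directed network.

    graph        -- dict: gene -> {downstream gene: edge penalty}
    node_penalty -- dict: gene -> penalty (missing genes count as 0.0)
    seeds        -- iterable of seed genes
    Returns (dist, prev): accumulated penalty per gene and predecessor links.
    """
    dist = {s: node_penalty.get(s, 0.0) for s in seeds}
    prev = {}
    heap = [(d, s) for s, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, edge_pen in graph.get(u, {}).items():
            # Assumed cost model: edge penalty plus penalty of the entered gene.
            nd = d + edge_pen + node_penalty.get(v, 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def trace(prev, target):
    """Follow predecessor links back to a seed, one step at a time."""
    path = [target]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical toy network and penalties.
graph = {"A": {"B": 1.0, "C": 0.5}, "B": {"D": 0.5}, "C": {"D": 2.0}}
node_penalty = {"A": 0.0, "B": 0.5, "C": 0.5, "D": 0.25}

dist, prev = cheapest_paths(graph, node_penalty, ["A"])
print(trace(prev, "D"), dist["D"])
```

Keeping the predecessor map makes each detected pathway traceable step by step, which mirrors the interpretability argument made in the abstract.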
Affiliation(s)
- Yuexu Jiang
- Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA
| | - Yanchun Liang
- Department of Computer Science and Technology, Jilin University, Changchun 130012, China
| | - Duolin Wang
- Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA
| | - Dong Xu
- Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA; Informatics Institute and Christopher S. Bond Life Sciences Center, Columbia, MO 65211, USA
| | - Trupti Joshi
- Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA; Informatics Institute and Christopher S. Bond Life Sciences Center, Columbia, MO 65211, USA; Department of Health Management and Informatics, University of Missouri, Columbia, MO 65211, USA
| |
|
8
|
Abstract
Background:
Essential proteins play important roles in the survival or reproduction of an organism and support the stability of the system. They are the minimum set of proteins absolutely required to maintain a living cell. The identification of essential proteins is an important topic not only for a better comprehension of the minimal requirements for cellular life, but also for more efficient discovery of human disease genes and drug targets. Experimental identification of essential proteins is complex and usually requires considerable time and expense. With the accumulation of high-throughput experimental data, many computational methods that usefully complement experimental methods have been proposed to identify essential proteins. In addition, the ability to rapidly and precisely identify essential proteins is of great significance for discovering disease genes and for drug design, and has great potential for applications in basic and synthetic biology research.
Objective:
The aim of this paper is to review the identification of essential proteins and genes, focusing on current developments in different types of computational methods, to point out the progress and limitations of existing methods, and to discuss the challenges and directions for further research.
Affiliation(s)
- Ming Fang
- School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
| | - Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
| | - Ling Guo
- College of Life Sciences, Shaanxi Normal University, Xi'an 710119, China
| |
|
9
|
Xu B, Guan J, Wang Y, Wang Z. Essential Protein Detection by Random Walk on Weighted Protein-Protein Interaction Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:377-387. [PMID: 28504946 DOI: 10.1109/tcbb.2017.2701824] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/07/2023]
Abstract
Essential proteins are critical to the development and survival of cells. Identification of essential proteins is helpful for understanding the minimal set of required genes in a living cell and for designing new drugs. To detect essential proteins, various computational methods have been proposed based on protein-protein interaction (PPI) networks. However, protein interaction data obtained by high-throughput experiments usually contain high false positives, which negatively impacts the accuracy of essential protein detection. Moreover, most existing studies focused on the local information of proteins in PPI networks, while ignoring the influence of indirect protein interactions on essentiality. In this paper, we propose a novel method, called Essentiality Ranking (EssRank in short), to boost the accuracy of essential protein detection. To deal with the inaccuracy of PPI data, confidence scores of interactions are evaluated by integrating various biological information. Weighted edge clustering coefficient (WECC), considering both interaction confidence scores and network topology, is proposed to calculate edge weights in PPI networks. The weight of each node is evaluated by the sum of WECC values of its linking edges. A random walk method, making use of both direct and indirect protein interactions, is then employed to calculate protein essentiality iteratively. Experimental results on the yeast PPI network show that EssRank outperforms most existing methods, including the most commonly-used centrality measures (SC, DC, BC, CC, IC, and EC), topology based methods (DMNC and NC) and the data integrating method IEW.
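The pipeline outlined in this abstract (edge weights from a clustering coefficient, node weights as sums over incident edges, then an iterative random walk) can be sketched roughly as follows. This is not the EssRank algorithm itself: the abstract does not give the WECC formula or the walk update, so the sketch assumes the weight is the plain edge clustering coefficient multiplied by a confidence score, and uses a generic PageRank-style walk with a hypothetical damping factor `alpha`.

```python
def edge_clustering(adj, u, v):
    """Edge clustering coefficient: shared neighbours over the smaller of deg - 1."""
    common = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / denom if denom > 0 else 0.0


def rank_proteins(edges, alpha=0.85, iters=100):
    """edges -- dict {(u, v): confidence} for an undirected PPI network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Assumed weighting: confidence score times edge clustering coefficient.
    w = {}
    for (u, v), conf in edges.items():
        w[(u, v)] = w[(v, u)] = conf * edge_clustering(adj, u, v)

    # Node weight = sum of the weights of its incident edges (as in the abstract).
    node_w = {n: sum(w[(n, m)] for m in adj[n]) for n in adj}
    total = sum(node_w.values()) or 1.0
    init = {n: node_w[n] / total for n in adj}

    # Generic random walk with restart to the initial node weights.
    score = dict(init)
    for _ in range(iters):
        new = {}
        for n in adj:
            flow = sum(score[m] * w[(m, n)] / node_w[m]
                       for m in adj[n] if node_w[m] > 0)
            new[n] = alpha * flow + (1 - alpha) * init[n]
        score = new
    return score


# Hypothetical toy network: a triangle A-B-C with a pendant protein D on C.
edges = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0}
score = rank_proteins(edges)
print(sorted(score.items(), key=lambda kv: -kv[1]))
```

In this toy network the pendant protein D shares no neighbours with C, so its only edge receives zero weight and D ends up ranked last, which illustrates how clustering-based weights discount weakly supported interactions.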
|
10
|
Liu X, Hong Z, Liu J, Lin Y, Rodríguez-Patón A, Zou Q, Zeng X. Computational methods for identifying the critical nodes in biological networks. Brief Bioinform 2019; 21:486-497. [DOI: 10.1093/bib/bbz011] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 10/25/2018] [Revised: 12/03/2018] [Accepted: 01/11/2019] [Indexed: 12/28/2022] Open
Abstract
A biological network is complex. A group of critical nodes determines the quality and state of such a network. Increasing studies have shown that diseases and biological networks are closely and mutually related and that certain diseases are often caused by errors occurring in certain nodes in biological networks. Thus, studying biological networks and identifying critical nodes can help determine the key targets in treating diseases. The problem is how to find the critical nodes in a network efficiently and with low cost. Existing experimental methods in identifying critical nodes generally require much time, manpower and money. Accordingly, many scientists are attempting to solve this problem by researching efficient and low-cost computing methods. To facilitate calculations, biological networks are often modeled as several common networks. In this review, we classify biological networks according to the network types used by several kinds of common computational methods and introduce the computational methods used by each type of network.
Affiliation(s)
- Xiangrong Liu
- Department of Computer Science, Xiamen University, China
| | - Zengyan Hong
- Department of Computer Science, Xiamen University, China
| | - Juan Liu
- Department of Computer Science, Xiamen University, China
| | - Yuan Lin
- ITOP Section, DNB Bank ASA, Solheimsgaten, Bergen, Norway
| | - Alfonso Rodríguez-Patón
- Universidad Politécnica de Madrid (UPM) Campus Montegancedo s/n, Boadilla del Monte, Madrid, Spain
| | - Quan Zou
- Department of Computer Science, Xiamen University, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| | | |
|
11
|
|
12
|
Zhang W, Xu J, Li X, Zou X. A New Method for Identifying Essential Proteins by Measuring Co-Expression and Functional Similarity. IEEE Trans Nanobioscience 2016; 15:939-945. [DOI: 10.1109/tnb.2016.2625460] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
|
13
|
Affiliation(s)
- Sun Kim
- Seoul National University, Seoul, South Korea
| |
|