1
Pan L, Wang H, Yang B, Li W. A protein network refinement method based on module discovery and biological information. BMC Bioinformatics 2024; 25:157. [PMID: 38643108 PMCID: PMC11031909 DOI: 10.1186/s12859-024-05772-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 04/10/2024] [Indexed: 04/22/2024]
Abstract
BACKGROUND The identification of essential proteins can help in understanding the minimum requirements for cell survival and development, in discovering drug targets, and in preventing disease. Node ranking methods are currently a common way to identify essential proteins, but the poor data quality of the underlying protein interaction network (PIN) limits the accuracy with which these methods identify essential proteins. Researchers have therefore constructed refined networks by considering certain biological properties of interacting protein pairs, improving the performance of node ranking methods on the PIN. Studies show that proteins in a complex are more likely to be essential than proteins outside any complex; however, modularity is usually ignored by PIN refinement methods. METHODS Based on this observation, we propose a network refinement method based on module discovery and biological information. First, the maximal connected subgraph of the PIN is extracted and divided into modules using the Fast-unfolding algorithm; then, critical modules are detected according to the orthologous information, subcellular localization information and topology information within each module; finally, a more refined network (CM-PIN) is constructed from the identified critical modules. RESULTS To evaluate the effectiveness of the proposed method, we used 12 typical node ranking methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR, PeC, WDC) to compare the overall performance of the CM-PIN with that of the S-PIN, D-PIN and RD-PIN. The experimental results showed that the CM-PIN performed best in terms of the number of essential proteins identified, the precision-recall curve, the jackknife method and other criteria, and can help identify essential proteins more accurately.
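For illustration only, the first refinement step above (keeping the maximal connected subgraph) and one of the simplest node ranking baselines listed, degree centrality (DC), can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the protein names are hypothetical, and the Fast-unfolding module detection and the orthology, subcellular-localization and topology scoring of modules are omitted.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (protein, protein) pairs; self-loops are dropped."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def largest_connected_component(adj):
    """Node set of the largest connected component, found by breadth-first search."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best


def degree_ranking(adj, nodes):
    """Rank nodes by degree centrality (DC) inside the induced subgraph, highest first."""
    degree = {n: len(adj[n] & nodes) for n in nodes}
    return sorted(nodes, key=lambda n: (-degree[n], n))


# Hypothetical toy PIN: a 4-protein component and a separate 2-protein component.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F")]
adj = build_adjacency(edges)
lcc = largest_connected_component(adj)
print(sorted(lcc))               # ['A', 'B', 'C', 'D']
print(degree_ranking(adj, lcc))  # ['A', 'B', 'C', 'D']
```

Ties in degree are broken alphabetically here only to make the output deterministic; a refined network such as CM-PIN would be ranked the same way after its edges are rebuilt.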
Affiliation(s)
- Li Pan
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Haoyue Wang
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Bo Yang
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Wenbin Li
- Hunan Institute of Science and Technology, Yueyang, 414006, China
2
Li G, Luo X, Hu Z, Wu J, Peng W, Liu J, Zhu X. Essential proteins discovery based on dominance relationship and neighborhood similarity centrality. Health Inf Sci Syst 2023; 11:55. [PMID: 37981988 PMCID: PMC10654316 DOI: 10.1007/s13755-023-00252-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 10/13/2023] [Indexed: 11/21/2023]
Abstract
Essential proteins play a vital role in the development and reproduction of cells, and identifying them helps to explain the basic requirements for cell survival. Because biological experimental methods for discovering essential proteins are time-consuming, costly and inefficient, computational methods have gained increasing attention. Early computational methods identified essential proteins mainly through centralities computed on protein-protein interaction (PPI) networks, and the many false positives in PPI networks limit their identification rate. In this study, a purified PPI network is first introduced to reduce the impact of these false positives. Second, by analyzing the similarity between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Third, a protein subcellular localization score and an ortholog score are calculated from subcellular localization and orthologous data, respectively. Fourth, an analysis of a large number of multi-feature fusion methods reveals a special relationship among features, called the dominance relationship, and a novel model based on this relationship is proposed. Finally, NSC, the subcellular localization score and the ortholog score are fused by the dominance relationship model, yielding a new method called NSO. To verify the performance of NSO, it is compared with seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, WDC) on yeast datasets. The experimental results show that NSO achieves a higher identification rate than the other methods.
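The abstract does not define the dominance relationship formally. A common notion it resembles is Pareto dominance: protein a dominates protein b if a scores at least as high on every feature and strictly higher on at least one. The sketch below ranks proteins by how many others they dominate; the ranking rule and the feature values are assumptions for illustration and may differ from the NSO model.

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b: >= on every feature, > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_rank(scores):
    """Rank proteins by how many other proteins they dominate (ties broken by name)."""
    counts = {
        p: sum(dominates(s, t) for q, t in scores.items() if q != p)
        for p, s in scores.items()
    }
    return sorted(scores, key=lambda p: (-counts[p], p)), counts


# Hypothetical (NSC, subcellular localization score, ortholog score) per protein.
scores = {
    "P1": (0.9, 0.8, 0.7),
    "P2": (0.5, 0.9, 0.2),
    "P3": (0.4, 0.3, 0.1),
    "P4": (0.9, 0.8, 0.6),
}
ranking, counts = dominance_rank(scores)
print(counts)   # {'P1': 2, 'P2': 1, 'P3': 0, 'P4': 1}
print(ranking)  # ['P1', 'P2', 'P4', 'P3']
```

Unlike a weighted sum, this rule never trades one feature off against another: P2 is not dominated by P1 because it has a higher subcellular localization score, even though its other scores are lower.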
Affiliation(s)
- Gaoshi Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xinlong Luo
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Zhipeng Hu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
- Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xiaoshu Zhu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China
3
Hosseinzadeh MM, Cannataro M, Guzzi PH, Dondi R. Temporal networks in biology and medicine: a survey on models, algorithms, and tools. Network Modeling and Analysis in Health Informatics and Bioinformatics 2022; 12:10. [PMID: 36618274 PMCID: PMC9803903 DOI: 10.1007/s13721-022-00406-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 12/16/2022] [Accepted: 12/17/2022] [Indexed: 01/01/2023]
Abstract
The use of static graphs to model and analyze biological and biomedical data plays a key role in biomedical research. However, many real-world scenarios exhibit dynamic behaviors that involve the modification of both nodes and edges as well as the evolution of features. Consequently, ad hoc models that capture this evolution over time have been introduced, referred to as dynamic, temporal or time-varying graphs. Here, we focus on temporal graphs, i.e., graphs whose evolution is represented by a sequence of time-ordered snapshots, each of which is the graph active at a particular timestamp. We survey temporal graph models and related algorithms, presenting fundamental aspects and recent advances. We formally define temporal graphs, focusing on the problem setting, and present their main applications in biology and medicine. We also present temporal graph embedding and its application to recent problems such as epidemic modelling. Finally, we outline some promising research directions in the area. The main results of this study include a systematic review of fundamental temporal network problems and their algorithmic solutions considered in the literature, in particular those with applications in computational biology and medicine. We also include the main software developed in this context.
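A temporal graph in the snapshot sense used here can be represented as a list of edge lists, one per timestamp. The sketch below computes earliest arrival times along strict temporal paths, one of the basic temporal-graph problems, assuming undirected edges and at most one hop per snapshot; the node names and snapshots are hypothetical.

```python
def earliest_arrival(snapshots, source):
    """Earliest snapshot index at which each node can be reached from `source`
    along a strict temporal path (at most one edge traversed per snapshot).
    The source gets -1, meaning it is reached before the first snapshot."""
    arrival = {source: -1}
    for t, edges in enumerate(snapshots):
        reached_now = {}
        for u, v in edges:
            for a, b in ((u, v), (v, u)):  # treat each edge as undirected
                if a in arrival and b not in arrival:
                    reached_now[b] = t
        arrival.update(reached_now)  # commit only after the whole snapshot
    return arrival


# Hypothetical temporal graph: one edge list per timestamp.
snapshots = [
    [("A", "B")],
    [("C", "D")],
    [("B", "C")],
    [("C", "D")],
]
print(earliest_arrival(snapshots, "A"))  # {'A': -1, 'B': 0, 'C': 2, 'D': 3}
```

Order matters: with the same snapshots reversed, A reaches only B (at snapshot 3), a distinction that a single static graph aggregating all edges cannot express.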
Affiliation(s)
- Mario Cannataro
- Department of Surgical and Medical Sciences and Data Analytics Research Center, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences and Data Analytics Research Center, University Magna Graecia of Catanzaro, Catanzaro, Italy
- Riccardo Dondi
- Department of Literature, Philosophy, Communication Studies, University of Bergamo, Bergamo, Italy
4
Mansoor M, Nauman M, Rehman HU, Omar M. Gene Ontology Capsule GAN: an improved architecture for protein function prediction. PeerJ Comput Sci 2022; 8:e1014. [PMID: 36092003 PMCID: PMC9454774 DOI: 10.7717/peerj-cs.1014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/11/2022] [Accepted: 05/31/2022] [Indexed: 06/15/2023]
Abstract
Proteins are central to all functions of living things. They consist of a long amino acid chain that folds into a three-dimensional shape dictating their behavior. Convolutional neural networks (CNNs) have been pivotal in predicting protein functions from protein sequences. Although the technology is crucial to this niche, the computational cost and translational invariance associated with CNNs make it impossible for them to detect spatial hierarchies between complex and simpler objects. This research therefore uses capsule networks instead of CNNs to capture spatial information. Because capsule networks focus on hierarchical links, they have considerable potential for solving structural biology challenges. Compared with standard CNNs, our results show an improvement in accuracy: Gene Ontology Capsule GAN (GOCAPGAN) achieved an F1 score of 82.6%, a precision of 90.4% and a recall of 76.1%.
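The three reported scores are mutually consistent, since the F1 score is the harmonic mean of precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.904 \times 0.761}{0.904 + 0.761}
    = \frac{1.375888}{1.665}
    \approx 0.826
```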
Affiliation(s)
- Musadaq Mansoor
- National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Mohammad Nauman
- National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Hafeez Ur Rehman
- National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Maryam Omar
- National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
5
Sheng J, Xue J, Li P, Yi N. [A protein complex recognition method based on spatial-temporal graph convolution neural network]. Nan Fang Yi Ke Da Xue Xue Bao = Journal of Southern Medical University 2022; 42:1075-1081. [PMID: 35869773 DOI: 10.12122/j.issn.1673-4254.2022.07.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
OBJECTIVE To propose a new method for mining complexes in dynamic protein networks using a spatiotemporal convolutional neural network. METHODS Edge strength, node strength and edge existence probability were defined to model the dynamic protein network. Based on the time-series information and the structural information of the graph, two convolution operators were designed using the Hilbert-Huang transform, an attention mechanism and residual connections to represent and learn the features of the proteins in the network, and a feature map of the dynamic protein network was constructed. Finally, spectral clustering was used to identify protein complexes. RESULTS Simulation results on several public biological datasets showed that the F value of the proposed algorithm exceeded 90% on the DIP and MIPS datasets. Compared with four other recognition algorithms (DPCMNE, GE-CFI, VGAE and NOCD), the proposed algorithm improved recognition efficiency by 34.5%, 28.7%, 25.4% and 17.6%, respectively. CONCLUSION Deep learning technology can improve the efficiency of dynamic protein network analysis.
Affiliation(s)
- J Sheng
- Clinical Nursing Teaching and Research Office, The Second Xiangya Hospital of Central South University, Changsha 410011, China
- Department of Ultrasound Diagnosis, The Second Xiangya Hospital of Central South University, Changsha 410011, China
- J Xue
- Operation Center, The Third Xiangya Hospital of Central South University, Changsha 410013, China
- P Li
- School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
- N Yi
- School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
6
Li X, Xiang J, Wu FX, Li M. A Dual Ranking Algorithm Based on the Multiplex Network for Heterogeneous Complex Disease Analysis. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1993-2002. [PMID: 33577455 DOI: 10.1109/tcbb.2021.3059046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Identifying biomarkers of heterogeneous complex diseases has long been a focus of medical research. Previous studies have applied powerful network propagation methods to find marker genes related to specific diseases, but most existing methods are based on a single network, so they can be strongly affected by that network's incompleteness and ignore a large amount of information about physical and functional interactions between biological components. Other methods that directly integrate multiple types of interactions into an aggregate network risk conflicts between different data types and lose the characteristics and topology of each individual network. Meanwhile, biomarkers used in clinical trials should be few in number and strongly discriminative. In this study, we developed a multiplex network-based dual ranking framework (DualRank) for heterogeneous complex disease analysis and applied it to the diagnosis, prognosis and classification of heterogeneous complex diseases. The results showed that DualRank outperformed competing methods and identified biomarkers that are few in number, have strong predictive performance (average AUC = 0.818) and are biologically interpretable.
7
Meng X, Xiang J, Zheng R, Wu FX, Li M. DPCMNE: Detecting Protein Complexes From Protein-Protein Interaction Networks Via Multi-Level Network Embedding. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1592-1602. [PMID: 33417563 DOI: 10.1109/tcbb.2021.3050102] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/12/2023]
Abstract
Biological functions of a cell are typically carried out by protein complexes. The detection of protein complexes is therefore of great significance for understanding cellular organization and protein functions. In the past decades, many computational methods have been proposed to detect protein complexes. However, most existing methods search only local topological information to mine dense subgraphs as protein complexes, ignoring global topological information. To tackle this issue, we propose DPCMNE, a method that detects protein complexes via multi-level network embedding and preserves both the local and global topological information of biological networks. First, DPCMNE employs a hierarchical compression strategy to recursively compress the input protein-protein interaction (PPI) network into a series of smaller PPI networks. Then, a network embedding method is applied to these smaller PPI networks to learn protein embeddings at different levels of granularity. The embeddings learned from all the compressed PPI networks are concatenated to form the final protein embeddings of the original input PPI network. Finally, a core-attachment strategy is adopted to detect protein complexes in a weighted PPI network constructed from the pairwise similarity of protein embeddings. To assess the effectiveness of the proposed method, DPCMNE is compared with eight other clustering algorithms on two yeast datasets. The experimental results show that DPCMNE outperforms these state-of-the-art complex detection methods in terms of F1 and F1+Acc. Furthermore, functional enrichment analysis indicates that the protein complexes detected by DPCMNE are more biologically significant in terms of P-score.
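The concatenation and edge-weighting steps described above can be sketched as follows. Assumptions: cosine similarity stands in for the pairwise similarity (the abstract does not name the measure), the per-level embeddings are hypothetical toy vectors rather than learned ones, and the core-attachment complex detection step is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weight_ppi_edges(edges, embedding):
    """Attach a similarity weight to each existing PPI edge, rounded to 4 decimals."""
    return {(u, v): round(cosine(embedding[u], embedding[v]), 4) for u, v in edges}


# Hypothetical embeddings from two compression levels.
level1 = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
level2 = {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [1.0, 0.0]}
# Final embedding of each protein = concatenation of its per-level embeddings.
embedding = {p: level1[p] + level2[p] for p in level1}

edges = [("A", "B"), ("A", "C")]
weights = weight_ppi_edges(edges, embedding)
print(weights)  # {('A', 'B'): 1.0, ('A', 'C'): 0.5}
```

A and C disagree at the first level but agree at the coarser second level, so their edge receives an intermediate weight; this is the kind of mixed local and global signal that concatenating levels is meant to keep.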
8
Newaz K, Milenkovic T. Inference of a Dynamic Aging-related Biological Subnetwork via Network Propagation. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:974-988. [PMID: 32897864 DOI: 10.1109/tcbb.2020.3022767] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Gene expression (GE) data capture valuable condition-specific information ("condition" can mean a biological process, disease stage, age, patient, etc.). However, GE analyses ignore physical interactions between gene products, i.e., proteins. Because proteins function by interacting with each other, and because biological networks (BNs) capture these interactions, BN analyses are promising. However, current BN data fail to capture condition-specific information. Recently, GE and BN data have been integrated using network propagation (NP) to infer condition-specific BNs. However, existing NP-based studies produce a static condition-specific subnetwork, even though cellular processes are dynamic. A dynamic process of interest to us is human aging. We use prominent existing NP methods in a new task: inferring a dynamic rather than static condition-specific (aging-related) subnetwork. Then, we study the evolution of network structure with age: we identify proteins whose network positions change significantly with age and predict them as new aging-related candidates. We validate the predictions via, e.g., functional enrichment analyses and literature search. Dynamic network inference via NP yields higher prediction quality than the only existing method for inferring a dynamic aging-related BN, which does not use NP. Our data and code are available at https://nd.edu/~cone/dynetinf.
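Network propagation is often implemented as a random walk with restart (RWR): seed scores diffuse over the network while a fraction r is reset to the seeds at every step, p <- (1 - r) W p + r p0, with W the column-normalized adjacency matrix. The sketch below is a generic RWR under that assumption, not one of the specific NP methods evaluated in the paper; the restart probability and the toy path graph are arbitrary.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol.
    W spreads each node's score evenly over its neighbors (column-normalized)."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            # Score flowing into n: each neighbor m passes on p[m] / degree(m).
            inflow = sum(p[m] / len(adj[m]) for m in adj[n])
            new[n] = (1 - restart) * inflow + restart * p0[n]
        converged = sum(abs(new[n] - p[n]) for n in nodes) < tol
        p = new
        if converged:
            break
    return p


# Hypothetical path network A - B - C - D with A as the single seed gene.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(adj, seeds={"A"})
print({n: round(s, 4) for n, s in scores.items()})
# {'A': 0.5778, 'B': 0.3111, 'C': 0.0889, 'D': 0.0222}
```

Scores decay with network distance from the seed and still sum to 1. Running the same propagation separately on networks built from different age groups would give one score vector per time point, which is the general shape of a dynamic rather than static inference.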
9
Feng H, Zheng R, Wang J, Wu FX, Li M. NIMCE: A Gene Regulatory Network Inference Approach Based on Multi Time Delays Causal Entropy. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1042-1049. [PMID: 33035155 DOI: 10.1109/tcbb.2020.3029846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Gene regulatory networks (GRNs) are involved in various biological processes, such as the cell cycle, differentiation and apoptosis. The large amount of existing expression data, especially time-series expression data, provides an opportunity to infer GRNs computationally. These data can reveal the dynamics of gene expression and imply regulatory relationships among genes. However, identifying indirect regulatory links remains a major challenge, because most studies treat time points as independent observations and ignore the influence of time delays. In this study, we propose a GRN inference method based on an information-theoretic measure, called NIMCE. NIMCE uses transfer entropy to measure the regulatory link between each pair of genes, then applies causation entropy to filter out indirect relationships. In addition, NIMCE applies multiple time delays to identify indirect regulatory relationships from candidate genes. Experiments on simulated and colorectal cancer data show that NIMCE outperforms other competing methods. All data and code used in this study are publicly available at https://github.com/CSUBioGroup/NIMCE.
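As a rough illustration of the time-delay idea, transfer entropy from a discrete series X to Y can be estimated by counting, taking the source value d steps before the target value being predicted: TE = sum of p(y[t+1], y[t], x[t+1-d]) * log2[p(y[t+1] | y[t], x[t+1-d]) / p(y[t+1] | y[t])]. This plug-in estimator on binary toy data is an assumption for illustration; NIMCE's actual estimator and its causation-entropy filtering step are not reproduced here.

```python
import math
from collections import Counter


def transfer_entropy(x, y, delay=1):
    """Plug-in estimate (in bits) of transfer entropy from discrete series x to y.
    Compares predicting y[t + 1] from y[t] alone versus from y[t] and x[t + 1 - delay]."""
    triples = [(y[t + 1], y[t], x[t + 1 - delay]) for t in range(delay - 1, len(y) - 1)]
    n = len(triples)
    c_full = Counter(triples)                                    # (y_next, y_now, x_src)
    c_past = Counter((y_now, src) for _, y_now, src in triples)  # (y_now, x_src)
    c_self = Counter((y_next, y_now) for y_next, y_now, _ in triples)
    c_now = Counter(y_now for _, y_now, _ in triples)
    te = 0.0
    for (y_next, y_now, src), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_past[(y_now, src)]            # p(y_next | y_now, x_src)
        p_given_self = c_self[(y_next, y_now)] / c_now[y_now]  # p(y_next | y_now)
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


# Hypothetical binary expression states: gene Y copies gene X two steps later.
x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [0, 0] + x[:-2]
print(round(transfer_entropy(x, y, delay=1), 4))  # 0.1078
print(round(transfer_entropy(x, y, delay=2), 4))  # 0.9183
```

Because Y copies X with a two-step lag, the delay-2 estimate is far larger than the delay-1 estimate; scanning several delays can therefore reveal a link that a single fixed delay underrates. Plug-in estimates on series this short are heavily biased, so real data needs many more time points.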
10
Zhang F, Song H, Zeng M, Wu FX, Li Y, Pan Y, Li M. A Deep Learning Framework for Gene Ontology Annotations With Sequence- and Network-Based Information. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2208-2217. [PMID: 31985440 DOI: 10.1109/tcbb.2020.2968882] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/10/2023]
Abstract
Knowledge of protein functions plays an important role in biology and medicine. With the rapid development of high-throughput technologies, a huge number of proteins have been discovered, yet a great number of them lack functional annotations. A protein usually has multiple functions, and some functions or biological processes require interactions among multiple proteins. Gene Ontology provides a useful classification of protein functions and contains more than 40,000 terms. We propose a deep learning framework called DeepGOA to predict protein functions from protein sequences and protein-protein interaction (PPI) networks. From protein sequences, we extract two types of information: sequence semantic information and subsequence-based features. We use the word2vec technique to represent protein sequences numerically, and use a bidirectional long short-term memory network (Bi-LSTM) and a multi-scale convolutional neural network (multi-scale CNN) to obtain the global and local semantic features of protein sequences, respectively. In addition, we use the InterPro tool to scan protein sequences for subsequence-based information, such as domains and motifs, which is then fed into a neural network to generate high-quality features. For the PPI network, the DeepWalk algorithm is applied to generate embeddings of the proteins in the network. The two types of features are then concatenated to predict protein functions. Several evaluation methods and metrics are used to assess the performance of DeepGOA. The experimental results show that DeepGOA outperforms DeepGO and BLAST.
11
Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021; 22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022]
Abstract
Predicting disease-related genes is important to the study of diseases because biological experiments are costly and time-consuming. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of the dynamics while ignoring useful information hidden in the dynamical process, and it remains a challenge to use multiple types of physical and functional relationships between proteins/genes to predict disease-related genes effectively. We therefore propose a framework of network impulsive dynamics on multiplex biological networks (NIDM) to predict disease-related genes, along with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM identifies disease-related genes by mining the dynamical responses of nodes to impulsive signals exerted at specific nodes. Through a series of experimental evaluations on various types of biological networks, we confirmed the advantage of multiplex networks and the important role of functional associations in disease-gene prediction, demonstrated the superior performance of NIDM compared with four types of network-based algorithms, and gave recommendations for effective NIDM models and IDS signatures. To facilitate the prioritization and analysis of (candidate) genes associated with specific diseases, we developed a user-friendly web server that provides three kinds of filtering patterns for genes, network visualization, enrichment analysis and a wealth of external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction that integrates different types of biological networks and may become a very useful computational tool for the study of disease-related genes.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, Hunan, China
- Jiashuai Zhang
- School of Computer Science and Engineering, Central South University, Hunan, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, China
- Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
12
Zheng R, Li M, Chen X, Zhao S, Wu FX, Pan Y, Wang J. An Ensemble Method to Reconstruct Gene Regulatory Networks Based on Multivariate Adaptive Regression Splines. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:347-354. [PMID: 30794516 DOI: 10.1109/tcbb.2019.2900614] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/09/2023]
Abstract
Gene regulatory networks (GRNs) play a key role in biological processes. However, GRNs differ across biological conditions. Reconstructing GRNs from gene expression data has become an important opportunity and challenge over the past decades. Although many methods exist for inferring the topology of GRNs, such as mutual information, random forest and partial least squares, their accuracy is still low because of noise and the high dimensionality of expression data. In this paper, we introduce an ensemble method based on Multivariate Adaptive Regression Splines (MARS), called PBMarsNet, to reconstruct directed GRNs from multifactorial gene expression data. PBMarsNet incorporates part mutual information (PMI) to pre-weight candidate regulatory genes and then uses MARS to detect nonlinear regulatory links. Moreover, we apply bootstrapping to run MARS multiple times and average the outputs of the runs as the final score of each regulatory link. Results on the DREAM4 and DREAM5 challenge datasets show that PBMarsNet achieves superior performance and generalization compared with other state-of-the-art methods.
13
Zeng M, Li M, Fei Z, Wu FX, Li Y, Pan Y, Wang J. A Deep Learning Framework for Identifying Essential Proteins by Integrating Multiple Types of Biological Information. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:296-305. [PMID: 30736002 DOI: 10.1109/tcbb.2019.2897679] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 06/09/2023]
Abstract
Computational methods, including centrality and machine learning-based methods, have been proposed to identify essential proteins for understanding the minimum requirements for the survival and evolution of a cell. Centrality methods require researchers to design a score function based on prior knowledge, which is usually insufficient to capture the complexity of biological information. In machine learning-based methods, the selected biological features cannot represent the complete properties of biological information, because these methods lack a computational framework for selecting features automatically. To tackle these problems, we propose a deep learning framework that automatically learns biological features without prior knowledge. We use the node2vec technique to automatically learn a richer representation of protein-protein interaction (PPI) network topology than a score function provides. Bidirectional long short-term memory cells are applied to capture non-local relationships in gene expression data. For subcellular localization information, we use a high-dimensional indicator vector to characterize each protein. To evaluate the performance of our method, we tested it on the PPI network of S. cerevisiae. Our experimental results demonstrate that our method performs better than traditional centrality methods and is superior to existing machine learning-based methods. To explore which of the three types of biological information is most vital, we conduct an ablation study by removing each component in turn. The results show that the PPI network embedding contributes most to the improvement, and that gene expression profiles and subcellular localization information also help to improve the identification of essential proteins.
14
Li G, Li M, Wang J, Li Y, Pan Y. United Neighborhood Closeness Centrality and Orthology for Predicting Essential Proteins. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1451-1458. [PMID: 30596582 DOI: 10.1109/tcbb.2018.2889978] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 05/06/2023]
Abstract
Identifying essential proteins plays an important role in disease study, drug design and understanding the minimal requirements for cellular life. Computational methods for discovering essential proteins overcome the disadvantages of biological experimental methods, which are often time-consuming, expensive and inefficient. The topological features of protein-protein interaction (PPI) networks are often used to design computational prediction methods, such as Degree Centrality (DC), Betweenness Centrality (BC), Closeness Centrality (CC), Subgraph Centrality (SC), Eigenvector Centrality (EC), Information Centrality (IC) and Neighborhood Centrality (NC). However, the prediction accuracy of these individual methods still leaves room for improvement. Studies show that additional information, such as orthologous relations, helps to discover essential proteins, and many researchers have proposed methods that combine multiple information sources to improve prediction accuracy. In this study, we find that essential proteins appear in triangular structures in PPI networks significantly more often than nonessential ones. Based on this phenomenon, we propose a novel pure centrality measure, called Neighborhood Closeness Centrality (NCC). Accordingly, we develop a new combination model, the Extended Pareto Optimality Consensus model (EPOC), to fuse NCC and orthology information, yielding a novel essential protein identification method, NCCO. Compared with seven existing classic centrality methods (DC, BC, IC, CC, SC, EC and NC) and three consensus methods (PeC, ION and CSC), NCCO shows clear advantages on S. cerevisiae and E. coli datasets. As a consensus method, EPOC also performs better than the random walk model.
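The triangle observation above can be checked on any PPI network by counting, for each protein, the triangles it belongs to. This sketch counts triangles only; it does not implement NCC itself, whose formula the abstract does not give, and the toy network is hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_counts(adj):
    """Number of triangles each node belongs to: pairs of its neighbors that are linked."""
    return {
        n: sum(1 for a, b in combinations(sorted(adj[n]), 2) if b in adj[a])
        for n in adj
    }


# Hypothetical PPI network with triangles A-B-C and A-C-D plus a pendant edge D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "D"), ("D", "E")]
adj = build_adjacency(edges)
print({n: len(adj[n]) for n in adj})  # {'A': 3, 'B': 2, 'C': 3, 'D': 3, 'E': 1}
print(triangle_counts(adj))           # {'A': 2, 'B': 1, 'C': 2, 'D': 1, 'E': 0}
```

Here D has the same degree as A and C but lies in only one triangle, so a triangle-aware measure separates proteins that degree centrality ties.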
|
15
|
Lei X, Fang M, Guo L, Wu FX. Protein complex detection based on flower pollination mechanism in multi-relation reconstructed dynamic protein networks. BMC Bioinformatics 2019; 20:131. [PMID: 30925866 PMCID: PMC6440282 DOI: 10.1186/s12859-019-2649-0] [Indexed: 11/10/2022]
Abstract
Background Detecting protein complexes in protein-protein interaction (PPI) networks plays a significant part in bioinformatics. It enables us to obtain a better understanding of the structures and characteristics of biological systems. Methods In this study, we present a novel algorithm, named Improved Flower Pollination Algorithm (IFPA), to identify protein complexes in multi-relation reconstructed dynamic PPI networks. Specifically, we first introduce a concept called co-essentiality, which uses protein essentiality to search for essential interactions. Then, we devise the multi-relation reconstructed dynamic PPI networks (MRDPNs) and discover the potential cores of protein complexes in MRDPNs. Finally, the IFPA algorithm is put forward, based on the flower pollination mechanism, to generate protein complexes by simulating the process of pollen finding the optimal plants to pollinate, namely, attaching the peripheries to the corresponding cores. Results The experimental results on three different datasets (DIP, MIPS and Krogan) show that our IFPA algorithm is superior to some representative methods in the prediction of protein complexes. Conclusions Our proposed IFPA algorithm is powerful in protein complex detection by building multi-relation reconstructed dynamic protein networks and using an improved flower pollination algorithm. The experimental results indicate that IFPA obtains better performance than other methods.
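The abstract describes IFPA only at the level of its metaphor. As background for readers unfamiliar with that metaheuristic, here is a minimal Python sketch of the basic continuous flower pollination algorithm: with switch probability `p_switch` a flower takes a global, Lévy-flight step relative to the best flower found so far (step lengths drawn with Mantegna's algorithm); otherwise it takes a local step along the difference of two random flowers, and a move is kept only if it improves the objective. This is not IFPA: it minimizes a real-valued test function rather than attaching peripheral proteins to complex cores, and the parameter defaults (`p_switch=0.8`, `beta=1.5`) are common choices assumed here, not values from the paper.

```python
import math
import random

# Generic flower pollination sketch with assumed parameters; NOT the IFPA of Lei et al.


def levy_step(beta, rng):
    """Draw one symmetric Levy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def flower_pollination(f, dim, bounds, n_flowers=20, p_switch=0.8,
                       beta=1.5, iters=500, seed=0):
    """Minimise f over the box [lo, hi]^dim; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        return [min(max(xi, lo), hi) for xi in x]

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_flowers)]
    fit = [f(x) for x in pop]
    best_i = min(range(n_flowers), key=lambda i: fit[i])
    best, best_fit = pop[best_i][:], fit[best_i]

    for _ in range(iters):
        for i in range(n_flowers):
            x = pop[i]
            if rng.random() < p_switch:
                # Global (biotic) pollination: Levy flight relative to the best flower.
                cand = [xi + levy_step(beta, rng) * (bi - xi) for xi, bi in zip(x, best)]
            else:
                # Local (abiotic) pollination: move along the difference of two flowers.
                j, k = rng.sample(range(n_flowers), 2)
                eps = rng.random()
                cand = [xi + eps * (xj - xk) for xi, xj, xk in zip(x, pop[j], pop[k])]
            cand = clip(cand)
            cand_fit = f(cand)
            if cand_fit < fit[i]:
                # Greedy replacement: keep the move only if it improves this flower.
                pop[i], fit[i] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = cand[:], cand_fit
    return best, best_fit


def sphere(x):
    """Simple test objective with its minimum 0 at the origin."""
    return sum(xi * xi for xi in x)


best, best_fit = flower_pollination(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_fit)
```

Because of the greedy replacement, each flower's objective value never gets worse, so the returned value is at least as good as the best flower in the initial random population.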
Affiliation(s)
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, 710119, Xi'an, China.
- Ming Fang
- School of Computer Science, Shaanxi Normal University, 710119, Xi'an, China.
- Ling Guo
- College of Life Sciences, Shaanxi Normal University, 710119, Xi'an, China.
- Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada.
|
16
|
Zhang F, Peng W, Yang Y, Dai W, Song J. A Novel Method for Identifying Essential Genes by Fusing Dynamic Protein-Protein Interactive Networks. Genes (Basel) 2019; 10:genes10010031. [PMID: 30626157 PMCID: PMC6356314 DOI: 10.3390/genes10010031] [Received: 11/13/2018] [Revised: 12/24/2018] [Accepted: 01/02/2019] [Indexed: 11/16/2022]
Abstract
Essential genes play an indispensable role in supporting the life of an organism. Identification of essential genes helps us to understand the underlying mechanisms of cell life. The essential genes of bacteria are potential drug targets for some diseases. Recently, several computational methods have been proposed to detect essential genes based on static protein-protein interaction (PPI) networks. However, these methods ignore the fact that essential genes play essential roles under certain conditions. In this work, a novel method was proposed for the identification of essential proteins by fusing the dynamic PPI networks of different time points (called FDP). First, the active PPI networks of each time point were constructed, and then they were fused into a final network according to the networks' similarities. Finally, a novel centrality method was designed to assign each gene in the final network a ranking score, considering its orthologous property and its global and local topological properties in the network. This model was applied to two different yeast data sets. The results showed that FDP achieved better performance in essential gene prediction than other existing methods based on either the static PPI network or dynamic networks.
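The abstract does not state how a protein is judged active at a time point, how network similarity is measured, or how the fused network is weighted. The following minimal Python sketch fills those gaps with hypothetical stand-ins, not FDP's own rules: a protein is active when its expression exceeds `mean + k * std` of its own profile, an interaction is active when both endpoints are active, and each time-point network contributes to the fused edge weights in proportion to its mean Jaccard similarity to the other time points. The final centrality scoring step is omitted.

```python
def active_networks(edges, expression, k=0.0):
    """Build one active PPI network (a set of frozenset edges) per time point.

    Hypothetical rule: a protein is active at time t if its expression exceeds
    mean + k * std of its own profile; an edge is active if both ends are.
    Assumes every expression profile has the same length.
    """
    n_times = len(next(iter(expression.values())))
    active = {}
    for p, profile in expression.items():
        mu = sum(profile) / n_times
        sd = (sum((x - mu) ** 2 for x in profile) / n_times) ** 0.5
        active[p] = [x > mu + k * sd for x in profile]
    return [{frozenset(e) for e in edges
             if all(p in active and active[p][t] for p in e)}
            for t in range(n_times)]


def jaccard(a, b):
    """Jaccard similarity of two edge sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse(nets):
    """Fuse time-point networks into one weighted edge map.

    Hypothetical weighting: each network's weight is its mean Jaccard
    similarity to the other networks; an edge's fused weight is the sum of
    the weights of the networks that contain it.
    """
    m = len(nets)
    weights = []
    for i in range(m):
        sims = [jaccard(nets[i], nets[j]) for j in range(m) if j != i]
        weights.append(sum(sims) / len(sims) if sims else 1.0)
    fused = {}
    for w, net in zip(weights, nets):
        for e in net:
            fused[e] = fused.get(e, 0.0) + w
    return fused


# Toy data: four proteins, three interactions, three time points.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
expression = {"A": [4, 4, 1], "B": [4, 4, 3], "C": [1, 4, 4], "D": [1, 1, 4]}
nets = active_networks(edges, expression)
fused = fuse(nets)
for e, w in sorted(fused.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(e), w)
# ['A', 'B'] 0.5
# ['B', 'C'] 0.25
# ['C', 'D'] 0.0
```

Under this weighting an interaction that is active at several mutually similar time points accumulates the most weight, while one seen only at an atypical time point contributes little.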
Affiliation(s)
- Fengyu Zhang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
- Computer Center of Kunming University of Science and Technology, Kunming 650093, China.
- Yunfei Yang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
- Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
- Junrong Song
- Faculty of Management and Economics, Kunming University of Science and Technology, Kunming 650093, China.
|
18
|
Zou Q, He W. Special Protein Molecules Computational Identification. Int J Mol Sci 2018; 19:ijms19020536. [PMID: 29439426 PMCID: PMC5855758 DOI: 10.3390/ijms19020536] [Received: 01/16/2018] [Revised: 02/02/2018] [Accepted: 02/10/2018] [Indexed: 01/29/2023]
Abstract
Computational identification of special protein molecules is a key issue in understanding protein function. It can guide molecular experiments and help to save costs. I assessed 18 papers published in the special issue of Int. J. Mol. Sci. and also discussed related work. The computational methods employed in this special issue focused on machine learning, network analysis, and molecular docking. New methods and new topics were also proposed. In addition, there were several wet-lab experiments, with proven results showing promise. I hope our special issue will help research on protein molecule identification.
Affiliation(s)
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin 300354, China.
- Wenying He
- School of Computer Science and Technology, Tianjin University, Tianjin 300354, China.
|