1
Zuo Y, Wu X, Ge F, Yan H, Fei S, Liang J, Deng Z. Research progress on Drug-Target Interactions in the last five years. Anal Biochem 2025; 697:115691. [PMID: 39455038 DOI: 10.1016/j.ab.2024.115691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 10/06/2024] [Accepted: 10/16/2024] [Indexed: 10/28/2024]
Abstract
The identification of drug-target interactions (DTIs) is an important step in drug discovery and drug repositioning. However, the high cost of experimental validation limits how many interactions can be identified. In contrast, computation-based approaches are both economical and efficient. This review first synthesizes existing chemical genomic approaches, provides a comprehensive summary of prevalent databases for predicting DTIs, and categorizes the feature encodings from recent years. This is followed by an overview and brief description of the methods currently in use for predicting DTIs. The strengths and weaknesses of prediction methods proposed in the last five years (2020-2024), including those based on network representation learning and graph neural networks, are then discussed in detail, evaluating the performance of the different methods on a wide range of datasets. Finally, this review explores potential directions for future DTI research, emphasizing how to improve prediction accuracy and efficiency by combining big data and emerging computing technologies.
Affiliation(s)
- Yun Zuo
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China.
- Xubin Wu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Fei Ge
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Hongjin Yan
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Sirui Fei
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Jingwen Liang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China.
2
Guo Y, Yi M. THGNCDA: circRNA-disease association prediction based on triple heterogeneous graph network. Brief Funct Genomics 2024; 23:384-394. [PMID: 37738503 DOI: 10.1093/bfgp/elad042] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/24/2023] [Revised: 08/21/2023] [Accepted: 09/04/2023] [Indexed: 09/24/2023] Open
Abstract
Circular RNAs (circRNAs) are a class of noncoding RNA molecules featuring a closed circular structure. They have been shown to play a significant role in the reduction of many diseases. In addition, many studies on the clinical diagnosis and treatment of disease have revealed that circRNAs can be considered potential biomarkers. Understanding the association between circRNAs and diseases can therefore help to forecast some disorders of life activities. However, traditional biological experimental methods are time-consuming. Machine learning, the most common approach to circRNA-disease association prediction, avoids this cost but relies on diverse data. Nevertheless, topological information about circRNAs and diseases is usually not used in these methods. Moreover, circRNAs can be associated with diseases through miRNAs. With these considerations, we propose a novel method, named THGNCDA, to predict associations between circRNAs and diseases. Specifically, for a given circRNA-disease pair, we employ a graph neural network with attention to learn the importance of each of its neighbors. In addition, we use a multilayer convolutional neural network to explore the relationship of a circRNA-disease pair based on their attributes. When calculating embeddings, we introduce miRNA information. Experimental results show that THGNCDA outperformed state-of-the-art (SOTA) methods and gives a better recall rate. To confirm the significance of attention, we conducted extensive ablation studies. Case studies on Urinary Bladder and Prostatic Neoplasms further show THGNCDA's ability to discover known relationships between circRNA candidates and diseases.
Affiliation(s)
- Yuwei Guo
  - School of Mathematics and Physics, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China
- Ming Yi
  - School of Mathematics and Physics, China University of Geosciences, 388 Lumo Road, Hongshan District, 430074, Wuhan, Hubei, China
3
Zhu Y, Ning C, Zhang N, Wang M, Zhang Y. GSRF-DTI: a framework for drug-target interaction prediction based on a drug-target pair network and representation learning on a large graph. BMC Biol 2024; 22:156. [PMID: 39020316 PMCID: PMC11256582 DOI: 10.1186/s12915-024-01949-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 07/01/2024] [Indexed: 07/19/2024] Open
Abstract
BACKGROUND Identification of potential drug-target interactions (DTIs) with high accuracy is a key step in drug discovery and repositioning, especially concerning specific drug targets. Traditional experimental methods for identifying the DTIs are arduous, time-intensive, and financially burdensome. In addition, robust computational methods have been developed for predicting the DTIs and are widely applied in drug discovery research. However, advancing more precise algorithms for predicting DTIs is essential to meet the stringent standards demanded by drug discovery. RESULTS We proposed a novel method called GSRF-DTI, which integrates networks with a deep learning algorithm to identify DTIs. Firstly, GSRF-DTI learned the embedding representation of drugs and targets by integrating multiple drug association information and target association information, respectively. Then, GSRF-DTI considered the influence of drug-target pair (DTP) association on DTI prediction to construct a drug-target pair network (DTP-NET). Next, we utilized GraphSAGE on DTP-NET to learn the potential features of the network and applied random forest (RF) to predict the DTIs. Furthermore, we conducted ablation experiments to validate the necessity of integrating different types of network features for identifying DTIs. It is worth noting that GSRF-DTI proposed three novel DTIs. CONCLUSIONS GSRF-DTI not only considered the influence of the interaction relationship between drug and target but also considered the impact of DTP association relationship on DTI prediction. We initially use GraphSAGE to aggregate the neighbor information of nodes for better identification. Experimental analysis on Luo's dataset and the newly constructed dataset revealed that the GSRF-DTI framework outperformed several state-of-the-art methods significantly.
Affiliation(s)
- Yongdi Zhu
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, China
- Chunhui Ning
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, China
- Naiqian Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, China
- Mingyi Wang
  - Department of Central Lab, Weihai Municipal Hospital, Weihai, Shandong, China.
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, Shandong, China.
4
Zhang Q, Zuo L, Ren Y, Wang S, Wang W, Ma L, Zhang J, Xia B. FMCA-DTI: a fragment-oriented method based on a multihead cross attention mechanism to improve drug-target interaction prediction. Bioinformatics 2024; 40:btae347. [PMID: 38810106 PMCID: PMC11256963 DOI: 10.1093/bioinformatics/btae347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/23/2024] [Accepted: 05/28/2024] [Indexed: 05/31/2024] Open
Abstract
MOTIVATION Identifying drug-target interactions (DTI) is crucial in drug discovery. Fragments are less complex and can accurately characterize local features, which is important in DTI prediction. Recently, deep learning (DL)-based methods predict DTI more efficiently. However, two challenges remain in existing DL-based methods: (i) some methods directly encode drugs and proteins into integers, ignoring the substructure representation; (ii) some methods learn the features of the drugs and proteins separately instead of considering their interactions. RESULTS In this article, we propose a fragment-oriented method based on a multihead cross attention mechanism for predicting DTI, named FMCA-DTI. FMCA-DTI obtains multiple types of fragments of drugs and proteins by branch chain mining and category fragment mining. Importantly, FMCA-DTI utilizes the shared-weight-based multihead cross attention mechanism to learn the complex interaction features between different fragments. Experiments on three benchmark datasets show that FMCA-DTI achieves significantly improved performance by comparing it with four state-of-the-art baselines. AVAILABILITY AND IMPLEMENTATION The code for this workflow is available at: https://github.com/jacky102022/FMCA-DTI.
Affiliation(s)
- Qi Zhang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Le Zuo
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Ying Ren
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Siyuan Wang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Wenfa Wang
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Lerong Ma
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
- Jing Zhang
  - Medical College of Yan'an University, Yan'an University, Yan'an 716000, China
  - Medical Research and Experimental Center, The Second Affiliated Hospital of Xi'an Medical University, Xi'an 710021, China
- Bisheng Xia
  - College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China
5
Sulaimany S, Farahmandi K, Mafakheri A. Computational prediction of new therapeutic effects of probiotics. Sci Rep 2024; 14:11932. [PMID: 38789535 PMCID: PMC11126595 DOI: 10.1038/s41598-024-62796-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 05/21/2024] [Indexed: 05/26/2024] Open
Abstract
Probiotics are living microorganisms that provide health benefits to their hosts, potentially aiding in the treatment or prevention of various diseases, including diarrhea, irritable bowel syndrome, ulcerative colitis, and Crohn's disease. Motivated by successful applications of link prediction in medical and biological networks, we applied link prediction to the probiotic-disease network to identify unreported relations. Using data from the Probio database and International Classification of Diseases-10th Revision (ICD-10) resources, we constructed a bipartite graph focused on the relationship between probiotics and diseases. We applied customized link prediction algorithms for this bipartite network, including common neighbors, Jaccard coefficient, and Adamic/Adar ranking formulas. We evaluated the results using Area under the Curve (AUC) and precision metrics. Our analysis revealed that common neighbors outperformed the other methods, with an AUC of 0.96 and precision of 0.6, indicating that basic formulas can predict at least six out of ten probable relations correctly. To support our findings, we conducted an exact search of the top 20 predictions and found six confirming papers on Google Scholar and Science Direct. Evidence suggests that Lactobacillus jensenii may provide prophylactic and therapeutic benefits for gastrointestinal diseases and that Lactobacillus acidophilus may have potential activity against urologic and female genital illnesses. Further investigation of other predictions through additional preclinical and clinical studies is recommended. Future research may focus on deploying more powerful link prediction algorithms to achieve better and more accurate results.
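The three ranking formulas named in this abstract can be sketched on a toy bipartite graph. This is only an illustration under stated assumptions: the edge list is made up (it is not Probio or ICD-10 data), and the bipartite adaptation used here, comparing the diseases linked to a probiotic with the diseases that share a probiotic with the candidate disease, is one common way to customize these scores and may differ from the authors' formulas. The function names `build_adjacency`, `second_hop`, and `bipartite_scores` are hypothetical.

```python
from math import log

# Toy probiotic-disease edges (hypothetical, for illustration only).
edges = [
    ("P1", "D1"), ("P1", "D2"),
    ("P2", "D1"), ("P2", "D2"), ("P2", "D3"),
    ("P3", "D3"),
]

def build_adjacency(edge_list):
    """Undirected adjacency sets for a bipartite edge list."""
    adj = {}
    for p, d in edge_list:
        adj.setdefault(p, set()).add(d)
        adj.setdefault(d, set()).add(p)
    return adj

def second_hop(adj, node):
    """Nodes exactly two steps away, i.e. on the same side as `node`."""
    out = set()
    for mid in adj[node]:
        out |= adj[mid]
    out.discard(node)
    return out

def bipartite_scores(adj, probiotic, disease):
    """Score an unlinked (probiotic, disease) pair with three local indices."""
    a = adj[probiotic]            # diseases already linked to the probiotic
    b = second_hop(adj, disease)  # diseases sharing a probiotic with `disease`
    common, union = a & b, a | b
    jaccard = len(common) / len(union) if union else 0.0
    # Adamic/Adar down-weights shared nodes that have many links.
    adamic_adar = sum(1.0 / log(len(adj[z])) for z in common if len(adj[z]) > 1)
    return {"common_neighbors": len(common),
            "jaccard": jaccard,
            "adamic_adar": adamic_adar}

adj = build_adjacency(edges)
score_p1_d3 = bipartite_scores(adj, "P1", "D3")
score_p3_d1 = bipartite_scores(adj, "P3", "D1")
```

In this toy graph the pair (P1, D3) scores higher than (P3, D1) on all three indices, so it would be ranked first as a candidate link.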
Affiliation(s)
- Sadegh Sulaimany
  - Social and Biological Network Analysis Laboratory (SBNA), Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran.
- Kajal Farahmandi
  - Department of Industrial and Environmental Biotechnology, National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran, Iran
- Aso Mafakheri
  - Social and Biological Network Analysis Laboratory (SBNA), Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran
6
Luke SS, Raj MN, Ramesh S, Bhatt NP. Network pharmacology prediction and molecular docking-based strategy to explore the potential mechanism of squalene against inflammation. In Silico Pharmacol 2024; 12:44. [PMID: 38756678 PMCID: PMC11093945 DOI: 10.1007/s40203-024-00217-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Accepted: 04/26/2024] [Indexed: 05/18/2024] Open
Abstract
Squalene (SQ) has been documented for its ability to reduce inflammation, but its mechanism is not well understood. In this study, we investigated squalene as an anti-inflammatory drug candidate and the framework involved in treating inflammation (INF) using the network pharmacology concept. The molecular targets of SQ and INF available in databases were collected, and the overlaps between them were shown using InteractiVenn. Protein-protein networks were generated, which in turn revealed several key targets, and were further processed with Cytoscape. Gene ontology enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway studies were performed. We also performed molecular docking tests that validated the binding affinity of molecular targets and drugs. A total of 100 SQ targets and 11,417 INF-related targets yielded 93 overlapping targets. Seven core targets, CRHR1, EGFR, ERBB2, HIF1A, SLC6A3, MAP2K1, and F2R, were found to be relevant to SQ's anti-inflammatory activity. The underlying mechanism of SQ with regard to INF was interpreted through enrichment analyses along with the KEGG pathway. In conclusion, SQ played a vital role in the management of INF by regulating CRHR1, EGFR, ERBB2, HIF1A, SLC6A3, MAP2K1, and F2R. The research outcomes offer significant insights into the use of SQ for combating inflammation.
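The target-overlap step described in this abstract amounts to a set intersection, followed by some ranking of the shared targets inside the protein-protein network. A minimal sketch, assuming toy inputs: the gene sets and edges below are invented for illustration (a few gene symbols are borrowed from the abstract, the `TOY_*` entries are placeholders), and ranking by degree is an assumption about how core targets might be picked, not the authors' documented procedure.

```python
# Toy target sets (hypothetical; the real study reports 100 and 11,417 targets).
sq_targets = {"EGFR", "ERBB2", "HIF1A", "TOY_SQ_ONLY"}
inf_targets = {"EGFR", "ERBB2", "HIF1A", "MAP2K1", "TOY_INF_ONLY"}

# Targets shared by the compound and the disease (the Venn-diagram overlap).
overlap = sq_targets & inf_targets

# Toy protein-protein interaction edges (hypothetical).
ppi_edges = [
    ("EGFR", "ERBB2"),
    ("EGFR", "HIF1A"),
    ("EGFR", "MAP2K1"),
    ("ERBB2", "MAP2K1"),
]

def degree_in_subnetwork(edges, nodes):
    """Count, for each node, the edges whose two endpoints both lie in `nodes`."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            degree[u] += 1
            degree[v] += 1
    return degree

degree = degree_in_subnetwork(ppi_edges, overlap)
# Highest degree first; ties broken alphabetically for a stable order.
ranked = sorted(degree, key=lambda g: (-degree[g], g))
```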
Affiliation(s)
- Shana Sara Luke
  - Department of Biotechnology, Faculty of Science and Humanities, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nādu 603203 India
- M. Naveen Raj
  - Department of Biotechnology, Faculty of Science and Humanities, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nādu 603203 India
- Suraj Ramesh
  - Department of Biotechnology, Faculty of Science and Humanities, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nādu 603203 India
- N. Prasanth Bhatt
  - Department of Biotechnology, Faculty of Science and Humanities, SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nādu 603203 India
7
Liu Z, Chen Q, Lan W, Lu H, Zhang S. SSLDTI: A novel method for drug-target interaction prediction based on self-supervised learning. Artif Intell Med 2024; 149:102778. [PMID: 38462280 DOI: 10.1016/j.artmed.2024.102778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 12/01/2023] [Accepted: 01/14/2024] [Indexed: 03/12/2024]
Abstract
Many computational methods have been proposed to identify potential drug-target interactions (DTIs) to expedite drug development. Graph neural network (GNN) methods are considered to be one of the most effective approaches. However, shallow GNN methods can only aggregate local information from nodes. Also, deep GNN methods may result in over-smoothing while obtaining long-distance neighbourhood information. As a result, existing GNN methods struggle to extract the complete features of the graph. Additionally, the number of known DTIs is insufficient, and there are far more unknown drug-target pairs than known DTIs, leading to class imbalance. This article proposes a model that combines graph autoencoder and self-supervised learning to accurately encode multilevel features of graphs using only a small number of labelled samples. We introduce a positive sample compensation coefficient to the objective function to mitigate the impact of class imbalance. Experiments on two datasets demonstrated that our model outperforms the four baseline methods, and the new DTIs predicted by the SSLDTI model were verified by the DrugBank database.
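The idea of a positive sample compensation coefficient can be illustrated with a weighted binary cross-entropy, where the loss term for known interactions is multiplied by a factor greater than one. This is a sketch under assumptions: the abstract does not give the exact objective, so the weighted form, the function name `weighted_bce`, and the choice of the negative-to-positive ratio as the coefficient are all illustrative rather than the SSLDTI definition.

```python
from math import log

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with the positive term scaled by `pos_weight`."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * log(p) + (1 - y) * log(1.0 - p))
    return total / len(y_true)

# One known interaction among four drug-target pairs (toy labels).
labels = [1, 0, 0, 0]
probs = [0.5, 0.5, 0.5, 0.5]

# Assumed coefficient: ratio of negatives to positives.
pos_weight = labels.count(0) / labels.count(1)

loss_plain = weighted_bce(labels, probs, 1.0)
loss_weighted = weighted_bce(labels, probs, pos_weight)
```

With the coefficient set to 3, the single positive pair contributes as much to the loss as the three negatives combined, which is the balancing effect the abstract describes.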
Affiliation(s)
- Zhixian Liu
  - School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, Guangxi, China
- Qingfeng Chen
  - School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi, China.
- Wei Lan
  - School of Computer, Electronic and Information, Guangxi University, Nanning, Guangxi, China
- Huihui Lu
  - School of Electronics and Information Engineering, Beibu Gulf University, Qinzhou, Guangxi, China
- Shichao Zhang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, China.
8
Qu X, Du G, Hu J, Cai Y. Graph-DTI: A New Model for Drug-target Interaction Prediction Based on Heterogenous Network Graph Embedding. Curr Comput Aided Drug Des 2024; 20:1013-1024. [PMID: 37448360 DOI: 10.2174/1573409919666230713142255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 05/04/2023] [Accepted: 05/26/2023] [Indexed: 07/15/2023]
Abstract
BACKGROUND In this study, we aimed to develop a new end-to-end learning model called Graph-Drug-Target Interaction (DTI), which integrates various types of information in the heterogeneous network data, and to explore automatic learning of the topology-maintaining representations of drugs and targets, thereby effectively contributing to the prediction of DTI. Precise predictions of DTI can guide drug discovery and development. Most machine learning algorithms integrate multiple data sources and combine them with common embedding methods. However, the relationship between the drugs and target proteins is not well reported. Although some existing studies have used heterogeneous network graphs for DTI prediction, there are many limitations in the neighborhood information between the nodes in the heterogeneous network graphs. We studied the drug-drug interaction (DDI) and DTI from DrugBank Version 3.0, protein-protein interaction (PPI) from the human protein reference database Release 9, drug structure similarity from Morgan fingerprints of radius 2 and calculated by RDKit, and protein sequence similarity from Smith-Waterman score. METHODS Our study consists of three major components. First, various drugs and target proteins were integrated, and a heterogeneous network was established based on a series of data sets. Second, the graph neural networks-inspired graph auto-encoding method was used to extract high-order structural information from the heterogeneous networks, thereby revealing the description of nodes (drugs and proteins) and their topological neighbors. Finally, potential DTI prediction was made, and the obtained samples were sent to the classifier for secondary classification. RESULTS The performance of Graph-DTI and all baseline methods was evaluated using the sums of the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC). 
The results indicated that Graph-DTI outperformed the baseline methods on both metrics. CONCLUSION Compared with other baseline DTI prediction methods, the results showed that Graph-DTI had better prediction performance. Additionally, in this study, we effectively classified drugs corresponding to different targets and vice versa. The above findings showed that Graph-DTI provided a powerful tool for drug research, development, and repositioning. Graph-DTI can serve as a drug development and repositioning tool more effectively than previous studies that did not use heterogeneous network graph embedding.
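The drug structure similarity mentioned in the background of this abstract is derived from Morgan fingerprints. A common way to compare two such fingerprints is the Tanimoto coefficient over their "on" bits; the abstract does not name the measure, so Tanimoto is an assumption here, and the bit sets below are made-up stand-ins rather than fingerprints computed by RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

# Hypothetical 'on' bit indices for two drugs (not real fingerprints).
fp_x = {1, 5, 9, 12}
fp_y = {1, 5, 7}

sim_xy = tanimoto(fp_x, fp_y)  # 2 shared bits out of 5 distinct bits
```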
Affiliation(s)
- Xiaohan Qu
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Guoxia Du
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Jing Hu
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
- Yongming Cai
  - School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
  - Guangdong Provincial Traditional Chinese Medicine Precision Medicine Big Data Engineering Technology Research Center, Guangzhou, China
9
Jin S, Hong Y, Zeng L, Jiang Y, Lin Y, Wei L, Yu Z, Zeng X, Liu X. A general hypergraph learning algorithm for drug multi-task predictions in micro-to-macro biomedical networks. PLoS Comput Biol 2023; 19:e1011597. [PMID: 37956212 PMCID: PMC10681315 DOI: 10.1371/journal.pcbi.1011597] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/06/2023] [Revised: 11/27/2023] [Accepted: 10/13/2023] [Indexed: 11/15/2023] Open
Abstract
The powerful combination of large-scale drug-related interaction networks and deep learning provides new opportunities for accelerating the process of drug discovery. However, chemical structures that play an important role in drug properties and high-order relations that involve a greater number of nodes are not tackled in current biomedical networks. In this study, we present a general hypergraph learning framework, which introduces Drug-Substructures relationship into Molecular interaction Networks to construct the micro-to-macro drug centric heterogeneous network (DSMN), and develop a multi-branches HyperGraph learning model, called HGDrug, for Drug multi-task predictions. HGDrug achieves highly accurate and robust predictions on 4 benchmark tasks (drug-drug, drug-target, drug-disease, and drug-side-effect interactions), outperforming 8 state-of-the-art task specific models and 6 general-purpose conventional models. Experiments analysis verifies the effectiveness and rationality of the HGDrug model architecture as well as the multi-branches setup, and demonstrates that HGDrug is able to capture the relations between drugs associated with the same functional groups. In addition, our proposed drug-substructure interaction networks can help improve the performance of existing network models for drug-related prediction tasks.
Affiliation(s)
- Shuting Jin
  - School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
  - School of Informatics, Xiamen University, Xiamen, China
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, China
- Yue Hong
  - School of Informatics, Xiamen University, Xiamen, China
- Li Zeng
  - Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, China
- Yinghui Jiang
  - School of Informatics, Xiamen University, Xiamen, China
- Yuan Lin
  - School of Economics, Innovation, and Technology, Kristiania University College, Bergen, Norway
- Leyi Wei
  - School of Software, Shandong University, Shandong, China
- Zhuohang Yu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Xiangxiang Zeng
  - School of Information Science and Engineering, Hunan University, Hunan, China
- Xiangrong Liu
  - School of Informatics, Xiamen University, Xiamen, China
  - Zhejiang Lab, Hangzhou, China
10
Milano M, Agapito G, Cannataro M. An Exploratory Application of Multilayer Networks and Pathway Analysis in Pharmacogenomics. Genes (Basel) 2023; 14:1915. [PMID: 37895264 PMCID: PMC10606656 DOI: 10.3390/genes14101915] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/04/2023] [Revised: 09/26/2023] [Accepted: 10/05/2023] [Indexed: 10/29/2023] Open
Abstract
Over the years, network analysis has become a promising strategy for analysing complex systems, i.e., systems composed of a large number of interacting elements. In particular, multilayer networks have emerged as a powerful framework for modelling and analysing complex systems with multiple types of interactions. Network analysis can be applied to pharmacogenomics to gain insights into the interactions between genes, drugs, and diseases. By integrating network analysis techniques with pharmacogenomic data, the goal is to uncover complex relationships and identify key genes to use in pathway enrichment analysis, in order to determine the biological pathways involved in drug response and adverse reactions. In this study, we modelled omics, disease, and drug data together through a multilayer network representation. Then, we mined the multilayer network with a community detection algorithm to obtain the top communities. After that, we used the list of genes identified from the communities to perform pathway enrichment analysis (PEA) to determine the biological functions affected by the selected genes. The results show that the genes forming the top community have multiple roles through different pathways.
Affiliation(s)
- Marianna Milano
  - Department of Experimental and Clinical Medicine, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
  - Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
- Giuseppe Agapito
  - Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
  - Department of Law, Economics and Social Sciences, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
- Mario Cannataro
  - Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
  - Department of Medical and Surgical Sciences, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
11
Li M, Cai X, Xu S, Ji H. Metapath-aggregated heterogeneous graph neural network for drug-target interaction prediction. Brief Bioinform 2023; 24:6966534. [PMID: 36592060 DOI: 10.1093/bib/bbac578] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/01/2022] [Revised: 11/03/2022] [Accepted: 11/26/2022] [Indexed: 01/03/2023] Open
Abstract
Drug-target interaction (DTI) prediction is an essential step in drug repositioning. A few graph neural network (GNN)-based methods have been proposed for DTI prediction using heterogeneous biological data. However, existing GNN-based methods only aggregate information from directly connected nodes restricted in a drug-related or a target-related network and are incapable of capturing high-order dependencies in the biological heterogeneous graph. In this paper, we propose a metapath-aggregated heterogeneous graph neural network (MHGNN) to capture complex structures and rich semantics in the biological heterogeneous graph for DTI prediction. Specifically, MHGNN enhances heterogeneous graph structure learning and high-order semantics learning by modeling high-order relations via metapaths. Additionally, MHGNN enriches high-order correlations between drug-target pairs (DTPs) by constructing a DTP correlation graph with DTPs as nodes. We conduct extensive experiments on three biological heterogeneous datasets. MHGNN favorably surpasses 17 state-of-the-art methods over 6 evaluation metrics, which verifies its efficacy for DTI prediction. The code is available at https://github.com/Zora-LM/MHGNN-DTI.
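A metapath, as used in this abstract, is a typed path through the heterogeneous graph, for example Drug-Disease-Target. The sketch below only counts path instances along one such metapath on toy data; it is not the MHGNN aggregation itself, and the node names, the two relation dictionaries, and the helper `metapath_counts` are hypothetical.

```python
# Toy typed relations (hypothetical): drug -> diseases, disease -> targets.
drug_disease = {"d1": {"dis1"}, "d2": {"dis1", "dis2"}}
disease_target = {"dis1": {"t1"}, "dis2": {"t1", "t2"}}

def metapath_counts(first_hop, second_hop):
    """Count path instances src -> mid -> dst along a two-step metapath."""
    counts = {}
    for src, mids in first_hop.items():
        for mid in mids:
            for dst in second_hop.get(mid, ()):
                counts[(src, dst)] = counts.get((src, dst), 0) + 1
    return counts

# Drug-Disease-Target path counts; a larger count means the drug and target
# are connected through more shared diseases.
ddt_counts = metapath_counts(drug_disease, disease_target)
```

Pairs with many path instances, such as (d2, t1) here, are high-order neighbors even though no direct drug-target edge links them, which is the kind of dependency the abstract says direct-neighbor aggregation misses.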
Affiliation(s)
- Mei Li
  - Tianjin Key Laboratory of Network and Data Security Technology, China
  - College of Computer Science, Nankai University, 300350, Tianjin, China
- Xiangrui Cai
  - Tianjin Key Laboratory of Network and Data Security Technology, China
  - College of Computer Science, Nankai University, 300350, Tianjin, China
- Sihan Xu
  - Tianjin Key Laboratory of Network and Data Security Technology, China
  - College of Cyber Science, Nankai University, 300350, Tianjin, China
- Hua Ji
  - Tianjin Key Laboratory of Network and Data Security Technology, China
  - College of Computer Science, Nankai University, 300350, Tianjin, China
12
Li M, Cai X, Li L, Xu S, Ji H. Heterogeneous Graph Attention Network for Drug-Target Interaction Prediction. Proceedings of the 31st ACM International Conference on Information & Knowledge Management 2022:1166-1176. [DOI: 10.1145/3511808.3557346] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/03/2025]
Affiliation(s)
- Mei Li
  - Nankai University, Tianjin, China
- Linyu Li
  - Nankai University, Tianjin, China
- Sihan Xu
  - Nankai University, Tianjin, China
- Hua Ji
  - Nankai University, Tianjin, China
13
A Statistical Analysis of the Sequence and Structure of Thermophilic and Non-Thermophilic Proteins. Int J Mol Sci 2022; 23:ijms231710116. [PMID: 36077513 PMCID: PMC9456548 DOI: 10.3390/ijms231710116] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/29/2022] [Revised: 08/29/2022] [Accepted: 08/31/2022] [Indexed: 11/17/2022] Open
Abstract
Thermophilic proteins have various practical applications in theoretical research and in industry. In recent years, the demand for thermophilic proteins on an industrial scale has been increasing; therefore, the engineering of thermophilic proteins has become a hot direction in the field of protein engineering. However, the exact mechanism of protein thermostability is not yet known, and engineering thermophilic proteins requires knowing the basis of thermostability. In order to understand this basis, we have made a statistical analysis of the sequences, secondary structures, hydrogen bonds, salt bridges, DHA (Donor-Hydrogen-Acceptor) angles, and bond lengths of ten pairs of thermophilic proteins and their non-thermophilic orthologs. Our findings suggest that polar amino acids contribute to thermostability in proteins by forming hydrogen bonds and salt bridges, which provide resistance against protein denaturation. Shorter bond lengths and wider DHA angles provide greater bond stability in thermophilic proteins. Moreover, the increased frequency of aromatic amino acids in thermophilic proteins contributes to thermal stability by forming more aromatic interactions. Additionally, the coil, helix, and loop in the secondary structure also contribute to thermostability.
|
14
|
Yuan L, Li H, Fu S, Zhang Z. Learning Behavior Evaluation Model and Teaching Strategy Innovation by Social Media Network Following Learning Psychology. Front Psychol 2022; 13:843428. [PMID: 35936300 PMCID: PMC9355304 DOI: 10.3389/fpsyg.2022.843428] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2021] [Accepted: 06/13/2022] [Indexed: 12/02/2022] Open
Abstract
With the development of various network technologies and the spread of coronavirus disease 2019, many online learning platforms have been built. However, some of them may negatively impact student learning outcomes. Therefore, this study aims to improve the online learning effect of students by comprehensively evaluating their learning behavior by using deep learning algorithms. On this basis, new teaching strategies are proposed. According to the structured deep network embedding model, a network representation learning algorithm is proposed with the help of auto-encoders under deep learning. This study elaborates the concept and structure of the encoder model and tests its performance. After the node labels and dataset are trained, the applicable parameter λ2 of the model is 0.3. During the teaching process, the model's reliability in distinguishing users is examined. Therefore, this model can be applied to network teaching, is an innovative teaching strategy, and provides a theoretical basis for improving teaching methods.
Affiliation(s)
- Lijuan Yuan
- College of Journalism and Communications, Zhoukou Normal University, Zhoukou, China
| | - Hongming Li
- Southampton Education School, University of Southampton, Southampton, United Kingdom
| | - Shiman Fu
- Southampton Education School, University of Southampton, Southampton, United Kingdom
| | - Zizai Zhang
- Hangzhou Preschool Teachers College, Zhejiang Normal University, Hangzhou, China
| |
|
15
|
An integrated pan-cancer analysis of identifying biomarkers about the EGR family genes in human carcinomas. Comput Biol Med 2022; 148:105889. [DOI: 10.1016/j.compbiomed.2022.105889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Revised: 06/25/2022] [Accepted: 07/16/2022] [Indexed: 12/24/2022]
|
16
|
Liu P, Ding Y, Rong Y, Chen D. Prediction of cell penetrating peptides and their uptake efficiency using random forest‐based feature selections. AIChE J 2022. [DOI: 10.1002/aic.17781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Affiliation(s)
- Peng Liu
- Institute of Fundamental and Frontier Sciences University of Electronic Science and Technology of China Chengdu China
- Institute of Yangtze Delta Region (Quzhou) University of Electronic Science and Technology of China Quzhou China
| | - Yijie Ding
- Institute of Yangtze Delta Region (Quzhou) University of Electronic Science and Technology of China Quzhou China
| | - Ying Rong
- Beidahuang Industry Group General Hospital Harbin China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University Quzhou China
| |
|
17
|
DTIP-TC2A: An analytical framework for drug-target interactions prediction methods. Comput Biol Chem 2022; 99:107707. [DOI: 10.1016/j.compbiolchem.2022.107707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Revised: 05/01/2022] [Accepted: 05/26/2022] [Indexed: 11/18/2022]
|
18
|
Zhao S, Pan Q, Zou Q, Ju Y, Shi L, Su X. Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:7518779. [PMID: 35422876 PMCID: PMC9005296 DOI: 10.1155/2022/7518779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Accepted: 03/12/2022] [Indexed: 11/17/2022]
Abstract
Enhancers are a class of noncoding DNA elements located near structural genes. In recent years, their identification and classification have been the focus of research in the field of bioinformatics. However, because enhancers are highly scattered and variable in position, there is still considerable room for progress even though the performance of prediction models has continuously improved. In this paper, density-based spatial clustering of applications with noise (DBSCAN) was used to screen the physicochemical properties of dinucleotides and extract dinucleotide-based auto-cross covariance (DACC) features; the features were then reduced with the Python feature selection toolkit MRMD 2.0. The reduced features were input into a random forest to identify enhancers. The enhancer classification model was built with word2vec and attention-based Bi-LSTM. Finally, the accuracies of our enhancer identification and classification models were 77.25% and 73.50%, respectively, and the Matthews correlation coefficients (MCCs) were 0.5470 and 0.4881, respectively, which were better than the performance of most predictors.
Affiliation(s)
- Shulin Zhao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Qingfeng Pan
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
| | - Xi Su
- Foshan Maternal and Child Health Hospital, Foshan, Guangdong, China
| |
|
19
|
Zhang H, Zou Q, Ju Y, Song C, Chen D. Distance-based support vector machine to predict DNA N6-methyladenine modification. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220404145517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:
DNA N6-methyladenine plays an important role in the restriction-modification system, which defends against invading foreign DNA. Experimental methods are time-consuming and costly, so some computational methods have emerged. The support vector machine theory has received extensive attention in the bioinformatics field due to its solid theoretical foundation and many good characteristics.
Objective:
General machine learning methods include an important feature extraction step. This research omits that step and replaces it with an easy-to-obtain sequence distance matrix to obtain better results.
Method:
First, sequence alignment was used to obtain the similarity matrix. Then, a novel transformation turned the similarity matrix into a distance matrix. Next, the similarity-distance matrix was made positive semi-definite so that it could be used as the kernel matrix. Finally, the LIBSVM software was applied to solve the support vector machine.
Results:
In five-fold cross-validation on rice and mouse data, this model achieved accuracy rates of 92.04% and 96.51%, respectively. This shows that the DB-SVM method has clear advantages over traditional machine learning methods. Meanwhile, this model achieved accuracies of 0.943, 0.982, and 0.818, Matthews correlation coefficients of 0.944, 0.982, and 0.838, and F1 scores of 0.942, 0.982, and 0.840 for the rice, M. musculus, and cross-species genome datasets, respectively.
Conclusion:
These outcomes show that this model outperforms iIM-CNN and csDMA, the latest methods for predicting DNA 6mA modification.
Affiliation(s)
- Haoyu Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Chenggang Song
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou 324000, China
| |
|
20
|
Wang Z, Zhang Y, Li Q, Zou Q, Liu Q. A road map for happiness: The psychological factors related cell types in various parts of human body from single cell RNA-seq data analysis. Comput Biol Med 2022; 143:105286. [PMID: 35183972 DOI: 10.1016/j.compbiomed.2022.105286] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2022] [Revised: 01/16/2022] [Accepted: 01/24/2022] [Indexed: 12/13/2022]
Abstract
Extensive evidence from many fields, including zoology, neurobiology, and immunology, has confirmed that psychological factors can produce remarkable physiological effects. Researchers have long been aware of the potential value of these effects and have wanted to harness them in the development of new drugs and therapies, for which the study of mechanisms is a necessary prerequisite. However, most of these studies are restricted to neuroscience, or start with blood samples and fall into the area of immunity. In this study, we focus on the psychological factor of happiness, mining publicly available single cell RNA sequencing (scRNA-seq) data for the expression of happiness-related genes, collected from various sources of literature, in all types of cells in the samples. We find that the expression of these genes is not restricted to neuro-regulated cells or tissue-resident immune cells; on the contrary, cell types that are unique to tissues and organs and lack direct regulation from the nervous system account for the majority of cells expressing the happiness-related genes. Our research is a preliminary exploration of where our body responds to our mind at the cell level, and lays the foundation for more detailed mechanism research.
Affiliation(s)
- Ziwei Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology, China
| | - Ying Zhang
- Department of Anesthesiology, Hospital T.C.M Affiliated to Southwest Medical University, Luzhou, China
| | - Qun Li
- Department of Pain, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology, China; Yangtze Delta Region Institute Quzhou, University of Electronic Science and Technology of China, Quzhou, Zhejiang, China.
| | - Qing Liu
- Department of Algology, Hospital T.C.M Affiliated to Southwest Medical University, Luzhou, China.
| |
|
21
|
Chen Y, Wang Y, Ding Y, Su X, Wang C. RGCNCDA: Relational graph convolutional network improves circRNA-disease association prediction by incorporating microRNAs. Comput Biol Med 2022; 143:105322. [PMID: 35217342 DOI: 10.1016/j.compbiomed.2022.105322] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2022] [Revised: 02/11/2022] [Accepted: 02/13/2022] [Indexed: 12/21/2022]
Abstract
Recently, a large number of studies have indicated that circRNAs with covalently closed loops play important roles in biological processes and have potential as diagnostic biomarkers. Therefore, research on the circRNA-disease relationship is helpful in disease diagnosis and treatment. However, traditional biological verification methods require considerable labor and time costs. In this paper, we propose a new computational method (RGCNCDA) to predict circRNA-disease associations based on relational graph convolutional networks (R-GCNs). The method first integrates the circRNA similarity network, miRNA similarity network, disease similarity network and association networks among them to construct a global heterogeneous network. Then, it employs the random walk with restart (RWR) and principal component analysis (PCA) models to learn low-dimensional and high-order information from the global heterogeneous network as the topological features. Finally, a prediction model based on an R-GCN encoder and a DistMult decoder is built to predict the potential disease-associated circRNA. The predicted results demonstrate that RGCNCDA performs significantly better than the other six state-of-the-art methods in a 5-fold cross validation. Furthermore, the case study illustrates that RGCNCDA can effectively discover potential circRNA-disease associations.
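The random walk with restart (RWR) step named in this abstract can be sketched as follows. This is a minimal illustration on a toy graph, not the RGCNCDA implementation: the function name, the restart probability of 0.5, and the four-node path graph are all assumptions, and the PCA and R-GCN stages are omitted.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that
    jumps back to node `seed` with probability `restart` at each step."""
    n = len(adj)
    # Column-normalise the adjacency matrix so each column is a
    # transition distribution over neighbouring nodes.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
             for i in range(n)]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
               + restart * start[i]
               for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network: a path graph 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seed=0)
```

Running the walk once per node and stacking the resulting vectors gives one topological profile per node, which is the kind of input a later dimension-reduction step would take.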
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yanpeng Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Xi Su
- Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Foshan, China.
| | - Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China.
| |
|
22
|
Kang Q, Meng J, Luan Y. RNAI-FRID: novel feature representation method with information enhancement and dimension reduction for RNA-RNA interaction. Brief Bioinform 2022; 23:6555402. [PMID: 35352114 DOI: 10.1093/bib/bbac107] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2022] [Revised: 02/22/2022] [Accepted: 03/02/2022] [Indexed: 11/12/2022] Open
Abstract
Different ribonucleic acids (RNAs) can interact to form regulatory networks that play important roles in many life activities. Molecular biology experiments can confirm RNA-RNA interactions to facilitate the exploration of their biological functions, but they are expensive and time-consuming. Machine learning models can predict potential RNA-RNA interactions, which provides candidates for molecular biology experiments and saves considerable time and cost. Using a set of suitable features to represent the sample is crucial for training powerful models, but there is a lack of effective feature representation for RNA-RNA interaction. This study proposes a novel feature representation method with information enhancement and dimension reduction for RNA-RNA interaction (named RNAI-FRID). Diverse base features are first extracted from RNA data to contain more sample information. Then, the extracted base features are used to construct the complex features through an arithmetic-level method, which greatly reduces the feature dimension while keeping the relationship between molecule features. Since the dimension reduction may cause information loss, the arithmetic mean strategy is adopted during complex feature construction to further enhance the sample information. Finally, three feature ranking methods are integrated for feature selection on the constructed complex features, which adaptively retains important features and removes redundant ones. Extensive experimental results show that RNAI-FRID can provide reliable feature representation for RNA-RNA interaction with higher efficiency, and that the model trained with the generated features obtains better performance than other deep neural network predictors.
Affiliation(s)
- Qiang Kang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116024, China
| | - Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, 116024, China
| | - Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, Liaoning, 116024, China
| |
|
23
|
HKAM-MKM: A hybrid kernel alignment maximization-based multiple kernel model for identifying DNA-binding proteins. Comput Biol Med 2022; 145:105395. [PMID: 35334314 DOI: 10.1016/j.compbiomed.2022.105395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Revised: 03/08/2022] [Accepted: 03/08/2022] [Indexed: 12/24/2022]
Abstract
The identification of DNA-binding proteins (DBPs) has always been a hot issue in the field of sequence classification. However, because experimental identification is very resource-intensive, the construction of a computational prediction model is worthwhile. This study developed and evaluated a hybrid kernel alignment maximization-based multiple kernel model (HKAM-MKM) for predicting DBPs. First, we collected two datasets and performed feature extraction on the sequences to obtain six feature groups, and then constructed the corresponding kernels. To ensure the effective utilisation of the base kernels and avoid ignoring the difference between a sample and its neighbours, we proposed local kernel alignment to calculate the kernel between the sample and its neighbours, with each sample as the centre. We combined the global and local kernel alignments to develop a hybrid kernel alignment model, and balanced the relationship between the two through parameters. By maximising the hybrid kernel alignment value, we obtained the weight of each kernel and then linearly combined the kernels using these weights. The fused kernel was then input into a support vector machine for training and prediction. Finally, on the independent test sets PDB186 and PDB2272, we obtained the highest Matthews correlation coefficient (MCC) (0.768 and 0.5962, respectively) and the highest accuracy (87.1% and 78.43%, respectively), which were superior to the other predictors. Therefore, HKAM-MKM is an efficient prediction tool for DBPs.
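As a rough illustration of kernel alignment, the sketch below scores each base kernel against the ideal kernel built from the labels and weights the kernels in proportion to that score. This is a simplified global-alignment heuristic chosen for brevity, not the hybrid global-local maximisation that the abstract describes; all function names and the toy kernels are hypothetical.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized square matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(k1, k2):
    """Kernel alignment: cosine similarity between two kernel matrices."""
    return frobenius_inner(k1, k2) / math.sqrt(
        frobenius_inner(k1, k1) * frobenius_inner(k2, k2)
    )


def combine_kernels(kernels, labels):
    """Weight each kernel by its alignment with the ideal kernel y y^T.

    Assumes at least one kernel has positive alignment with the labels.
    """
    ideal = [[yi * yj for yj in labels] for yi in labels]
    scores = [max(alignment(k, ideal), 0.0) for k in kernels]
    total = sum(scores)
    weights = [s / total for s in scores]
    n = len(labels)
    fused = [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]
    return weights, fused


# Hypothetical toy data: four samples, labels in {+1, -1}.
labels = [1, 1, -1, -1]
k_block = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]  # class-aware
k_ident = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # uninformative
weights, fused = combine_kernels([k_block, k_ident], labels)
```

The fused matrix could then be passed to any SVM implementation that accepts a precomputed kernel.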
|
24
|
Maximal Information Coefficient-Based Testing to Identify Epistasis in Case-Control Association Studies. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:7843990. [PMID: 35211187 PMCID: PMC8863443 DOI: 10.1155/2022/7843990] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/05/2021] [Revised: 01/12/2022] [Accepted: 01/27/2022] [Indexed: 12/18/2022]
Abstract
Interactions between genetic variants (epistasis) are ubiquitous in model systems and can significantly affect evolutionary adaptation, genetic mapping, and precision medicine efforts. In this paper, we propose a method for epistasis detection, called EpiMIC (epistasis detection through a maximal information coefficient (MIC)). MIC is a promising bivariate dependence measure explicitly designed for rapidly exploring various function types equally and for interpreting and comparing them on the same scale. Most epistasis detection approaches make assumptions about the form of the association between genetic variants, resulting in limited statistical performance. Based on the notion that if two SNPs do not interact, their joint distribution in all samples and in cases only should not be substantially different, we developed a statistic that uses the difference in MIC as a signal of epistasis and combined it with a permutation resampling strategy to estimate the empirical distribution of the statistic. Results on simulated and real-world data sets showed that EpiMIC outperformed previous approaches for identifying epistasis at varying degrees of heredity.
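The permutation idea in this abstract can be sketched in a few lines. To stay within the standard library, the example below swaps MIC for plain discrete mutual information, so it is a stand-in for the EpiMIC statistic rather than a reproduction of it; the function names, the synthetic genotypes, and the choice of 199 permutations are assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items()
    )


def dependence_gap(snp_a, snp_b, is_case):
    """Absolute difference in dependence between cases only and all samples."""
    cases_a = [a for a, c in zip(snp_a, is_case) if c]
    cases_b = [b for b, c in zip(snp_b, is_case) if c]
    return abs(
        mutual_information(cases_a, cases_b) - mutual_information(snp_a, snp_b)
    )


def permutation_p_value(snp_a, snp_b, is_case, n_perm=199, seed=0):
    """Empirical p-value from shuffling the case/control labels."""
    rng = random.Random(seed)
    observed = dependence_gap(snp_a, snp_b, is_case)
    shuffled = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if dependence_gap(snp_a, snp_b, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic genotypes (0/1/2): the two SNPs are identical in the 40 cases
# and close to independent in the 40 controls.
snp_a = [i % 3 for i in range(40)] + [i % 3 for i in range(40)]
snp_b = [i % 3 for i in range(40)] + [(i // 3) % 3 for i in range(40)]
is_case = [True] * 40 + [False] * 40
observed, p_value = permutation_p_value(snp_a, snp_b, is_case)
```

Because the null distribution is built by resampling, no assumption about the functional form of the interaction is needed, which is the property the abstract emphasises.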
|
25
|
Zhao Z, Yang W, Zhai Y, Liang Y, Zhao Y. Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm. Front Genet 2022; 12:821996. [PMID: 35154264 PMCID: PMC8837382 DOI: 10.3389/fgene.2021.821996] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2021] [Accepted: 12/07/2021] [Indexed: 12/13/2022] Open
Abstract
The exploration of DNA-binding proteins (DBPs) is an important aspect of studying biological life activities. Research on life activities requires the support of scientific research results on DBPs. The decline in many life activities is closely related to DBPs. Generally, the detection method for identifying DBPs is achieved through biochemical experiments. This method is inefficient and requires considerable manpower, material resources and time. At present, several computational approaches have been developed to detect DBPs, among which machine learning (ML) algorithm-based computational techniques have shown excellent performance. In our experiments, our method uses fewer features and simpler recognition methods than other methods and simultaneously obtains satisfactory results. First, we use six feature extraction methods to extract sequence features from the same group of DBPs. Then, this feature information is spliced together, and the data are standardized. Finally, the extreme gradient boosting (XGBoost) model is used to construct an effective predictive model. Compared with other excellent methods, our proposed method has achieved better results. The accuracy achieved by our method is 78.26% for PDB2272 and 85.48% for PDB186. The accuracy of the experimental results achieved by our strategy is similar to that of previous detection methods.
Affiliation(s)
- Ziye Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Wen Yang
- International Medical Center, Shenzhen University General Hospital, Shenzhen, China
| | - Yixiao Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yingjian Liang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- *Correspondence: Yingjian Liang; Yuming Zhao
| | - Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- *Correspondence: Yingjian Liang; Yuming Zhao
| |
|
26
|
Ma D, Chen Z, He Z, Huang X. A SNARE Protein Identification Method Based on iLearnPlus to Efficiently Solve the Data Imbalance Problem. Front Genet 2022; 12:818841. [PMID: 35154261 PMCID: PMC8832978 DOI: 10.3389/fgene.2021.818841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2021] [Accepted: 12/14/2021] [Indexed: 11/13/2022] Open
Abstract
Machine learning has been widely used to solve complex problems in engineering applications and scientific fields, and many machine learning-based methods have achieved good results in different fields. SNAREs are key elements of membrane fusion and required for the fusion process of stable intermediates. They are also associated with the formation of some psychiatric disorders. This study processes the original sequence data with the synthetic minority oversampling technique (SMOTE) to solve the problem of data imbalance and produces the most suitable machine learning model with the iLearnPlus platform for the identification of SNARE proteins. Ultimately, a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 were obtained in the cross-validation dataset, and a sensitivity of 66.67%, specificity of 93.63%, accuracy of 91.33%, and MCC of 0.528 were obtained in the independent dataset (the adaptive skip dipeptide composition descriptor was used for feature extraction, and LightGBM with proper parameters was used as the classifier). These results demonstrate that this combination can perform well in the classification of SNARE proteins and is superior to other methods.
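The oversampling idea behind SMOTE can be sketched as follows: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. This is a minimal standard-library sketch, not the implementation used in the study; the function name, parameters, and toy points are hypothetical.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic points from a list of minority-class
    feature vectors by interpolating towards a random near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical minority-class feature vectors (corners of the unit square).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
```

Oversampling is usually applied to the training split only, so that synthetic points do not leak into the data used for evaluation.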
|
27
|
Wang S, Song T, Zhang S, Jiang M, Wei Z, Li Z. Molecular substructure tree generative model for de novo drug design. Brief Bioinform 2022; 23:6510156. [PMID: 35039853 DOI: 10.1093/bib/bbab592] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2021] [Revised: 12/19/2021] [Accepted: 12/19/2021] [Indexed: 01/19/2023] Open
Abstract
Deep learning shortens the drug discovery cycle through its success in extracting features of molecules and proteins. Generating new molecules with deep learning methods could enlarge the molecule space and yield molecules with specific properties. However, this is a challenging task because the connections between atoms are constrained by chemical rules. Aiming at generating and optimizing new valid molecules, this article proposes the Molecular Substructure Tree Generative Model, in which a molecule is generated by gradually adding substructures. The proposed model is based on the Variational Auto-Encoder architecture, which uses the encoder to map molecules to the latent vector space and then builds an autoregressive generative model as a decoder to generate new molecules from a Gaussian distribution. In addition, for the molecular optimization task, a molecular optimization model based on CycleGAN was constructed. Experiments showed that the model could generate valid and novel molecules, and that the optimization model effectively improves molecular properties.
Affiliation(s)
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
| | - Tao Song
- College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
| | - Shugang Zhang
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
| | - Mingjian Jiang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266033, China
| | - Zhiqiang Wei
- College of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
| | - Zhen Li
- College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
| |
|
28
|
Yan XY, Yin PW, Wu XM, Han JX. Prediction of the Drug-Drug Interaction Types with the Unified Embedding Features from Drug Similarity Networks. Front Pharmacol 2022; 12:794205. [PMID: 34987405 PMCID: PMC8721167 DOI: 10.3389/fphar.2021.794205] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2021] [Accepted: 11/04/2021] [Indexed: 12/12/2022] Open
Abstract
Drug combination therapies are a promising strategy to overcome drug resistance and improve the efficacy of monotherapy in cancer, and they have been shown to lead to a decrease in dose-related toxicities. Besides synergistic reactions between drugs, some antagonistic drug-drug interactions (DDIs) exist, which are the main cause of adverse drug events. Precisely predicting the type of DDI is important for both drug development and more effective drug combination therapy applications. Recently, numerous text mining- and machine learning-based methods have been developed for predicting DDIs. All these methods implicitly utilize the features of drugs from diverse drug-related properties. However, how to integrate these features more efficiently and improve the accuracy of classification is still a challenge. In this paper, we propose a novel method (called NMDADNN) to predict DDI types by integrating five drug-related heterogeneous information sources to extract unified drug mapping features. NMDADNN first constructs the similarity networks by using the Jaccard coefficient and then applies the random walk with restart algorithm and positive pointwise mutual information to extract the topological similarities. After that, the five network-based similarities are unified by using a multimodel deep autoencoder. Finally, NMDADNN applies a deep neural network (DNN) to the unified drug features to infer the types of DDIs. In comparison with other recent state-of-the-art DNN-based methods, NMDADNN achieves the best results in terms of accuracy, area under the precision-recall curve, area under the ROC curve, F1 score, precision and recall. In addition, many of the promising types of drug-drug pairs predicted by NMDADNN are also confirmed by using the interactions checker tool. These results demonstrate the effectiveness of our NMDADNN method, indicating that NMDADNN has great potential for predicting DDI types.
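Two of the building blocks named in this abstract, the Jaccard coefficient and positive pointwise mutual information (PPMI), can be sketched briefly. The drug names and feature sets below are invented for illustration, and PPMI is applied directly to the similarity matrix for brevity, whereas the abstract applies it after a random walk with restart step.

```python
import math


def jaccard(a, b):
    """Jaccard coefficient between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ppmi(matrix):
    """Positive pointwise mutual information of a non-negative matrix."""
    total = sum(map(sum, matrix))
    row = [sum(r) for r in matrix]
    col = [sum(c) for c in zip(*matrix)]
    return [
        [
            max(0.0, math.log(v * total / (row[i] * col[j]))) if v > 0 else 0.0
            for j, v in enumerate(r)
        ]
        for i, r in enumerate(matrix)
    ]


# Hypothetical drug feature profiles (for example, sets of target identifiers).
profiles = {
    "drugA": {"t1", "t2", "t3"},
    "drugB": {"t2", "t3", "t4"},
    "drugC": {"t5"},
}
names = sorted(profiles)
similarity = [[jaccard(profiles[u], profiles[v]) for v in names] for u in names]
topology = ppmi(similarity)
```

PPMI keeps only associations that are stronger than expected from the row and column totals, which tends to suppress uninformative background similarity.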
Affiliation(s)
- Xiao-Ying Yan
- College of Computer Science, Xi'an Shiyou University, Xi'an, China
| | - Peng-Wei Yin
- College of Computer Science, Xi'an Shiyou University, Xi'an, China
| | - Xiao-Meng Wu
- School of Electronic Engineering, Xi'an Shiyou University, Xi'an, China
| | - Jia-Xin Han
- College of Computer Science, Xi'an Shiyou University, Xi'an, China
| |
|
29
|
Zhang Z, Gong Y, Gao B, Li H, Gao W, Zhao Y, Dong B. SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles. Front Genet 2022; 12:809001. [PMID: 34987554 PMCID: PMC8721734 DOI: 10.3389/fgene.2021.809001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Accepted: 11/15/2021] [Indexed: 12/20/2022] Open
Abstract
Soluble N-ethylmaleimide sensitive factor activating protein receptor (SNARE) proteins are a large family of transmembrane proteins located in organelles and vesicles. The important roles of SNARE proteins include initiating the vesicle fusion process and activating and fusing proteins as they undergo exocytosis activity, and SNARE proteins are also vital for the transport regulation of membrane proteins and non-regulatory vesicles. Therefore, establishing a method to efficiently identify SNARE proteins is of great significance. However, the identification accuracy of existing methods such as SNARE CNN is not satisfactory. In our study, we developed a method based on a support vector machine (SVM) that can effectively recognize SNARE proteins. We used the position-specific scoring matrix (PSSM) method to extract features of SNARE protein sequences, used the support vector machine recursive feature elimination with correlation bias reduction (SVM-RFE-CBR) algorithm to rank the importance of features, and then selected the optimal feature subset based on the ranking. We input the feature data into the model, used 10-fold cross-validation for training, and tested model performance on an independent dataset. In independent tests, our method identified SNARE proteins with a sensitivity of 68%, specificity of 94%, accuracy of 92%, area under the curve (AUC) of 84%, and Matthews correlation coefficient (MCC) of 0.48. The experimental results show that our method scores well on the common evaluation indicators, indicating that it performs better than other existing classification methods in identifying SNARE proteins.
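The evaluation indicators quoted in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts are made up for illustration and are not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for an imbalanced test set (40 positives, 160 negatives).
metrics = classification_metrics(tp=30, fp=10, tn=150, fn=10)
```

On imbalanced data such as this, accuracy can stay high while sensitivity is modest, which is why MCC is often reported alongside it.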
Affiliation(s)
- Zixiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yue Gong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Bo Gao
- Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
| | - Hongfei Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Wentao Gao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Benzhi Dong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| |
|
30
|
Chen Y, Juan L, Lv X, Shi L. Bioinformatics Research on Drug Sensitivity Prediction. Front Pharmacol 2021; 12:799712. [PMID: 34955863 PMCID: PMC8696280 DOI: 10.3389/fphar.2021.799712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Accepted: 11/18/2021] [Indexed: 11/28/2022]
Abstract
Modeling-based anti-cancer drug sensitivity prediction has been extensively studied in recent years. Because most drug sensitivity prediction models use only gene expression data, the remarkable impacts of gene mutation, methylation, and copy number variation on drug sensitivity are neglected. Drug sensitivity prediction can both help protect patients from some adverse drug reactions and improve the efficacy of treatment. Genomics data are extremely useful for the drug sensitivity prediction task. This article reviews the role of drug sensitivity prediction and describes a variety of methods for predicting drug sensitivity. Moreover, the research significance of drug sensitivity prediction and the existing problems are discussed.
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Xiao Lv
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
| |
|
31
|
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022]
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic units controlling biological traits. They underpin the basic structures and functions of organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. During the evolution of organisms, a class of genes has persisted in a conserved form across multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, those that are irreplaceable for the organism's life activities are essential genes. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful to assess the potential risks of disease, and to help determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve the quality of life. Therefore, it is of great theoretical and practical significance to identify important and essential genes.
In this paper, the research status of essential genes and the essential genome database of bacteria are reviewed, the computational prediction method of essential genes based on communication coding theory is expounded, and the significance and practical application value of essential genes are discussed.
Affiliation(s)
- Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| | - Lihong Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| |
|
32
|
Zhao D, Teng Z, Li Y, Chen D. iAIPs: Identifying Anti-Inflammatory Peptides Using Random Forest. Front Genet 2021; 12:773202. [PMID: 34917130 PMCID: PMC8669811 DOI: 10.3389/fgene.2021.773202] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/09/2021] [Accepted: 10/08/2021] [Indexed: 12/25/2022]
Abstract
Recently, several anti-inflammatory peptides (AIPs) have been found in the process of the inflammatory response, and these peptides have been used to treat some inflammatory and autoimmune diseases. Therefore, identifying AIPs accurately from given amino acid sequences is critical for the discovery of novel and efficient anti-inflammatory peptide-based therapeutics and the acceleration of their application in therapy. In this paper, a random forest-based model called iAIPs is proposed for identifying AIPs. First, the original samples were encoded with three feature extraction methods: g-gap dipeptide composition (GDC), dipeptide deviation from the expected mean (DDE), and amino acid composition (AAC). Second, the optimal feature subset was generated by a two-step feature selection method, in which the features were ranked by analysis of variance (ANOVA) and the optimal subset was then chosen by an incremental feature selection strategy. Finally, the optimal feature subset was input into the random forest classifier to construct the identification model. Experimental results showed that iAIPs achieved an AUC value of 0.822 on an independent test dataset, which indicated that the proposed model performs better than the existing methods. Furthermore, the extraction of features for peptide sequences provides the basis for evolutionary analysis. The study of peptide identification helps in understanding the diversity of species and analyzing their evolutionary history.
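Of the three encodings named in this abstract, amino acid composition (AAC) is the simplest: the relative frequency of each of the 20 standard residues. The following is a minimal sketch of that descriptor only, not the authors' implementation; the function name and the example peptide are hypothetical.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(sequence: str) -> List[float]:
    """Return a 20-dimensional vector of residue frequencies."""
    counts = Counter(sequence.upper())
    # Only standard residues contribute to the denominator
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical peptide, for illustration only
features = amino_acid_composition("GLFDIVKKVV")
print(dict(zip(AMINO_ACIDS, features)))
```

A vector like this would typically be concatenated with the other encodings before feature ranking and selection.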
Affiliation(s)
- Dongxu Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Zhixia Teng
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yanjuan Li
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| |
|
33
|
Gong Y, Liao B, Wang P, Zou Q. DrugHybrid_BS: Using Hybrid Feature Combined With Bagging-SVM to Predict Potentially Druggable Proteins. Front Pharmacol 2021; 12:771808. [PMID: 34916947 PMCID: PMC8669608 DOI: 10.3389/fphar.2021.771808] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/07/2021] [Accepted: 11/15/2021] [Indexed: 01/09/2023]
Abstract
Drug targets are biological macromolecules or biomolecular structures that can bind specifically to a particular drug to produce a therapeutic effect or regulate physiological functions. Because of the importance of drug targets, the prediction of potential drug targets has become a research hotspot in recent years. The key to the research and development of modern new drugs is first to identify potential drug targets. In this paper, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins. This method combines three features: monoDiKGap (k = 2), cross-covariance, and grouped amino acid composition. It removes redundant features and analyzes key features through MRMD and MRMD2.0. The cross-validation results show that 96.9944% of the potentially druggable proteins can be accurately identified, and the accuracy on the independent test set reached 96.5665%. These results suggest that DrugHybrid_BS has the potential to become a useful predictive tool for druggable proteins. In addition, the hybrid key features alone, combined with Bagging-SVM, can identify 80.0343% of the potentially druggable proteins, which indicates the significance of this subset of features for research.
Affiliation(s)
- Yuxin Gong
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Peng Wang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.,Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| |
|
34
|
Ao C, Zou Q, Yu L. NmRF: identification of multispecies RNA 2'-O-methylation modification sites from RNA sequences. Brief Bioinform 2021; 23:6446272. [PMID: 34850821 DOI: 10.1093/bib/bbab480] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 08/09/2021] [Revised: 10/05/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022]
Abstract
2'-O-methylation (Nm) is a post-transcriptional modification of RNA that is catalyzed by 2'-O-methyltransferase and involves replacing the H on the 2'-hydroxyl group with a methyl group. The 2'-O-methylation modification site is detected in a variety of RNA types (miRNA, tRNA, mRNA, etc.), plays an important role in biological processes and is associated with different diseases. Few functional mechanisms have been elucidated at present, and traditional high-throughput experiments for exploring them are time-consuming and expensive. For a deeper understanding of relevant biological mechanisms, it is necessary to develop efficient and accurate recognition tools based on machine learning. We therefore constructed a predictor called NmRF, based on optimal mixed features and a random forest classifier, to identify 2'-O-methylation modification sites. The predictor can identify modification sites of multiple species at the same time. To obtain a better prediction model, a two-step strategy is adopted; that is, the optimal hybrid feature set is obtained by combining the light gradient boosting algorithm and an incremental feature selection strategy. In 10-fold cross-validation, the accuracies for Homo sapiens and Saccharomyces cerevisiae were 89.069% and 93.885%, and the AUCs were 0.9498 and 0.9832, respectively. The rigorous 10-fold cross-validation and independent tests confirm that the proposed method is significantly better than existing tools. A user-friendly web server is accessible at http://lab.malab.cn/~acy/NmRF.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.,Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
|
35
|
Chen J, Zhang Q, Liu T, Tang H. Roles of M6A Regulators in Hepatocellular Carcinoma: Promotion or Suppression. Curr Gene Ther 2021; 22:40-50. [PMID: 34825870 DOI: 10.2174/1566523221666211126105940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2021] [Revised: 06/15/2021] [Accepted: 10/14/2021] [Indexed: 11/22/2022]
Abstract
Hepatocellular carcinoma (HCC) is the sixth most commonly diagnosed cancer globally and has a poor prognosis. Although the pathological factors of hepatocellular carcinoma are well elucidated, the underlying molecular mechanisms remain unclear. N6-methyladenosine (m6A) is an adenosine methylation occurring at the N6 site, which is the most prevalent modification of eukaryotic mRNA. Recent studies have shown that m6A can regulate gene expression, thus modulating the processes of cell self-renewal, differentiation, and apoptosis. The methyl groups in m6A are installed by methyltransferases ("writers"), removed by demethylases ("erasers") and recognized by m6A-binding proteins ("readers"). In this review, we discuss the roles of the above regulators in the progression and prognosis of HCC, and summarize the clinical association between m6A modification and hepatocellular carcinoma, so as to provide more valuable information for clinical treatment.
Affiliation(s)
- Jiamao Chen
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
| | - Qian Zhang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
| | - Ting Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
| | - Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
| |
|
36
|
Jiao S, Zou Q, Guo H, Shi L. iTTCA-RF: a random forest predictor for tumor T cell antigens. J Transl Med 2021; 19:449. [PMID: 34706730 PMCID: PMC8554859 DOI: 10.1186/s12967-021-03084-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/23/2021] [Accepted: 09/16/2021] [Indexed: 12/21/2022]
Abstract
BACKGROUND Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methodology is still challenging. METHODS In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm. RESULTS Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server is freely accessible at http://lab.malab.cn/~acy/iTTCA. CONCLUSIONS We have shown that the proposed predictor iTTCA-RF is superior to other recent models, and it will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
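Balanced accuracy is the arithmetic mean of sensitivity and specificity, so the figures quoted in this abstract can be cross-checked against one another. The short sketch below does that arithmetic with the reported percentages; the function name is hypothetical.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy = (sensitivity + specificity) / 2."""
    return (sensitivity + specificity) / 2

# Percentages as reported in the abstract
cross_validation = balanced_accuracy(sensitivity=88.69, specificity=78.73)
independent_test = balanced_accuracy(sensitivity=83.61, specificity=62.67)

print(f"tenfold cross-validation: {cross_validation:.2f}%")  # 83.71%
print(f"independent test:         {independent_test:.2f}%")  # 73.14%
```

Both results agree with the balanced accuracies stated in the abstract, which confirms the three reported values in each setting are mutually consistent.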
Affiliation(s)
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Huannan Guo
- Department of Oncology, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China.
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China.
| |
|
37
|
Yang YH, Wang JS, Yuan SS, Liu ML, Su W, Lin H, Zhang ZY. A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods. Curr Med Chem 2021; 29:789-806. [PMID: 34514982 DOI: 10.2174/0929867328666210910125802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]
Abstract
Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5'-triphosphate (ATP) is one such ligand; it plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP binding residues of proteins is helpful for the annotation of protein function and for drug design. However, due to the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP binding residues is cost-ineffective and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods in detecting ATP binding residues of proteins. We expect this review will be helpful for further research.
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Jia-Shu Wang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Shi Yuan
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Meng-Lu Liu
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Su
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhao-Yue Zhang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
38
|
Zulfiqar H, Sun ZJ, Huang QL, Yuan SS, Lv H, Dao FY, Lin H, Li YW. Deep-4mCW2V: A sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli. Methods 2021; 203:558-563. [PMID: 34352373 DOI: 10.1016/j.ymeth.2021.07.011] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/01/2021] [Revised: 07/22/2021] [Accepted: 07/29/2021] [Indexed: 10/20/2022]
Abstract
N4-methylcytosine (4mC) is a type of DNA modification that can regulate several biological processes such as transcription regulation, replication and gene expression. Precisely recognizing 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded by the word embedding technique 'word2vec'. The obtained features were input into a 1-D convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. Evaluation on an independent dataset showed that our model yielded an overall accuracy of 0.861, which was about 4.3% higher than the existing model. For the convenience of other researchers, we provide the data and source code of the model, which can be freely downloaded from https://github.com/linDing-groups/Deep-4mCW2V.
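Word-embedding models such as word2vec operate on sequences of tokens, so a DNA sequence is usually first split into k-mer "words". The abstract does not state how the tokens were formed; the sketch below assumes overlapping k-mers with a stride of 1 and k = 3, and the function name and example sequence are hypothetical.

```python
from typing import List

def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    if k <= 0 or len(sequence) < k:
        return []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# Hypothetical fragment, for illustration only
tokens = kmer_tokens("ACGTACGT", k=3)
print(tokens)  # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
```

Each token list then plays the role of a sentence when training the embedding, and the resulting per-token vectors form the input matrix for a 1-D CNN.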
Affiliation(s)
- Hasan Zulfiqar
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Jie Sun
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Qin-Lai Huang
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Shi Yuan
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lv
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Yan-Wen Li
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; Key Laboratory of Intelligent Information Processing of Jilin Province, Northeast Normal University, Changchun 130117, China; Institute of Computational Biology, Northeast Normal University, Changchun 130117, China.
| |
|
39
|
Zulfiqar H, Yuan SS, Huang QL, Sun ZJ, Dao FY, Yu XL, Lin H. Identification of cyclin protein using gradient boost decision tree algorithm. Comput Struct Biotechnol J 2021; 19:4123-4131. [PMID: 34527186 PMCID: PMC8346528 DOI: 10.1016/j.csbj.2021.07.013] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 03/31/2021] [Revised: 07/15/2021] [Accepted: 07/15/2021] [Indexed: 12/12/2022]
Abstract
Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases, which activates cell cycle progression. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor prediction by sequence similarity-based methods. Thus, a machine learning model for identifying cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation and composition/transition/distribution. Afterward, these features were optimized by using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validated results showed that our model could identify cyclins with an accuracy of 93.06% and an AUC value of 0.971, which are higher than those of two recent studies on the same data.
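One of the seven encodings listed in this abstract, the composition of k-spaced amino acid pairs, counts residue pairs separated by exactly k intervening positions and normalizes by the number of such windows. The sketch below illustrates that descriptor for a single gap value; it is not the authors' code, and the function name and example sequence are hypothetical.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def k_spaced_pair_composition(sequence: str, k: int) -> Dict[str, float]:
    """Frequency of each residue pair separated by k intervening residues."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    windows = len(sequence) - k - 1  # number of (i, i + k + 1) position pairs
    if windows <= 0:
        return {pair: 0.0 for pair in counts}
    for i in range(windows):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    return {pair: c / windows for pair, c in counts.items()}

# Hypothetical sequence, for illustration only
features = k_spaced_pair_composition("ACDA", k=1)
print({pair: f for pair, f in features.items() if f > 0})
```

Each gap value yields a 400-dimensional vector; vectors for several gap values are commonly concatenated before feature selection.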
Affiliation(s)
- Hasan Zulfiqar
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Shi Yuan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Qin-Lai Huang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Jie Sun
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Xiao-Long Yu
- School of Materials Science and Engineering, Hainan University, Haikou 570228, China
| | - Hao Lin
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
40
|
Zhu W, Guo Y, Zou Q. Prediction of presynaptic and postsynaptic neurotoxins based on feature extraction. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:5943-5958. [PMID: 34517517 DOI: 10.3934/mbe.2021297] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
A neurotoxin is essentially a protein that mainly acts on the nervous system; it has a selective toxic effect on the central nervous system and neuromuscular nodes, can cause muscle paralysis and respiratory paralysis, and has strong lethality. According to their principle of action, neurotoxins are divided into presynaptic neurotoxins and postsynaptic neurotoxins. Correctly identifying presynaptic and postsynaptic neurotoxins provides important clues for future drug development and the discovery of drug targets. Therefore, a predictive model, Neu_LR, was constructed in this paper. The monoMonokGap method was used to extract the frequency characteristics of presynaptic and postsynaptic neurotoxin sequences, and feature selection was then carried out. Based on the important features obtained after dimensionality reduction, the prediction model Neu_LR was constructed using a logistic regression algorithm and evaluated with ten-fold cross-validation and an independent test set. The final accuracy rates were 99.6078% and 94.1176%, respectively, which showed that the Neu_LR model had good predictive performance and robustness and could meet the prediction requirements for presynaptic and postsynaptic neurotoxins. The data and source code of the model can be freely downloaded from https://github.com/gyx123681/.
Affiliation(s)
- Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Yuxin Guo
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| |
|
41
|
CWLy-RF: A novel approach for identifying cell wall lyases based on random forest classifier. Genomics 2021; 113:2919-2924. [PMID: 34186189 DOI: 10.1016/j.ygeno.2021.06.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2021] [Revised: 06/20/2021] [Accepted: 06/25/2021] [Indexed: 02/05/2023]
Abstract
Drug resistance of pathogenic bacteria has become increasingly serious due to the abuse of antibiotics in recent years. Researchers have found that cell wall lyases are effective antibacterial agents that can specifically recognize target bacteria and degrade bacterial peptidoglycan. Traditional wet-lab experiments for identifying lyases are usually expensive, time-consuming and laborious. Therefore, there is an urgent need to develop computational prediction tools to identify lyases quickly and accurately. In this paper, a new predictor, CWLy-RF, is proposed based on the random forest (RF) algorithm to identify cell wall lyases. In this method, we combined three features, namely, 400D, 188D and the composition of k-spaced amino acid group pairs, using mixed-feature representation methods. Afterward, we improved the feature representation ability by selecting the top 100 features with the information gain method and trained a predictive model using RF. The constructed prediction model was evaluated by using 10-fold cross-validation. The accuracy obtained was 96.09%, the AUC was 0.993, the MCC was 0.922, the sensitivity was 94.92%, and the specificity was 97.32%. We have shown that the proposed predictor CWLy-RF is superior to other recent models, and it will hopefully become an effective and useful tool for identifying lyases.
|