1
He SH, Yun L, Yi HC. Accurate prediction of drug combination risk levels based on relational graph convolutional network and multi-head attention. J Transl Med 2024; 22:572. [PMID: 38880914 PMCID: PMC11180398 DOI: 10.1186/s12967-024-05372-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 06/02/2024] [Indexed: 06/18/2024]
Abstract
BACKGROUND: Accurately identifying the risk level of drug combinations is of great significance in investigating the mechanisms of combination medication and adverse reactions. Most existing methods can only predict whether there is an interaction between two drugs, but cannot directly determine their accurate risk level. METHODS: In this study, we propose a multi-class drug combination risk prediction model named AERGCN-DDI, utilizing a relational graph convolutional network with a multi-head attention mechanism. Drug-drug interaction events with varying risk levels are modeled as a heterogeneous information graph. Attribute features of drug nodes and links are learned based on compound chemical structure information. Finally, the AERGCN-DDI model is proposed to predict drug combination risk level based on a heterogeneous graph neural network and multi-head attention modules. RESULTS: To evaluate the effectiveness of the proposed method, five-fold cross-validation and an ablation study were conducted. Furthermore, we compared its predictive performance with baseline models and other state-of-the-art methods on two benchmark datasets. Empirical studies demonstrated the superior performance of AERGCN-DDI. CONCLUSIONS: AERGCN-DDI emerges as a valuable tool for predicting the risk levels of drug combinations, thereby aiding in clinical medication decision-making, mitigating severe drug side effects, and enhancing patient clinical prognosis.
Affiliation(s)
- Shi-Hui He
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
  - Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education, Kunming, 650500, China
- Lijun Yun
  - School of Information Science and Technology, Yunnan Normal University, Kunming, 650500, China
  - Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education, Kunming, 650500, China
- Hai-Cheng Yi
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, China
2
Wang S, Liu T, Ren C, Zhao Y, Qiao S, Zhang Y, Pang S. Heterogeneous graph inference with range constrainted L 2,1-collaborative matrix factorization for small molecule-miRNA association prediction. Comput Biol Chem 2024; 110:108078. [PMID: 38677013 DOI: 10.1016/j.compbiolchem.2024.108078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/03/2024] [Accepted: 04/16/2024] [Indexed: 04/29/2024]
Abstract
MicroRNAs (miRNAs) play a vital role in regulating gene expression and various biological processes. As a result, they have been identified as effective targets for small molecule (SM) drugs in disease treatment. Heterogeneous graph inference stands as a classical approach for predicting SM-miRNA associations, showcasing commendable convergence accuracy and speed. However, most existing methods do not adequately address the inherent sparsity in SM-miRNA association networks, and imprecise SM/miRNA similarity metrics reduce the accuracy of predicting SM-miRNA associations. In this research, we proposed a heterogeneous graph inference with range constrained L2,1-collaborative matrix factorization (HGIRCLMF) method to predict potential SM-miRNA associations. First, we computed the multi-source similarities of SM/miRNA and integrated this similarity information into a comprehensive SM/miRNA similarity. This step improved the accuracy of SM and miRNA similarity, ensuring reliability for the subsequent heterogeneous graph inference. Second, we used a range constrained L2,1-collaborative matrix factorization (RCLMF) model to pre-populate the SM-miRNA association matrix with missing values. In this step, we developed a novel matrix decomposition method that enhances the robustness and formative nature of SM-miRNA edges between SM networks and miRNA networks. Next, we built a well-established SM-miRNA heterogeneous network utilizing the processed biological information. Finally, HGIRCLMF used this network data to infer unknown association pair scores. We implemented four cross-validation experiments on two distinct datasets, and HGIRCLMF acquired the highest areas under the curve, surpassing six state-of-the-art computational approaches. Furthermore, we performed three case studies to validate the predictive power of our method in practical application.
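The abstract names the L2,1 norm without defining it. As a rough illustration only, the sketch below computes the norm under the common row-wise convention (the sum of the Euclidean norms of a matrix's rows); some papers sum over columns instead, so the convention is an assumption here. The function name `l21_norm` and the sample matrix are hypothetical, and this is not the authors' RCLMF model.

```python
import math


def l21_norm(matrix):
    """Return the L2,1 norm: the sum of the Euclidean norms of the rows.

    Assumption: row-wise convention. Some papers sum over columns instead.
    """
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


# Hypothetical residual matrix, not data from the paper.
residual = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0 (an all-zero row adds nothing)
    [0.0, 1.0],  # row norm 1
]
print(l21_norm(residual))
```

Because each row contributes its unsquared norm, a few rows with large errors are penalized less heavily than under the squared Frobenius norm, which is the usual motivation for using this norm when the data contain outliers.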
Affiliation(s)
- Shudong Wang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Tiyao Liu
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Chuanru Ren
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Yawu Zhao
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Sibo Qiao
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266525, China
- Shanchen Pang
  - College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Qingdao 266580, China
3
Liu W, Zhang J, Qiao G, Bian J, Dong B, Li Y. HMMF: a hybrid multi-modal fusion framework for predicting drug side effect frequencies. BMC Bioinformatics 2024; 25:196. [PMID: 38769492 DOI: 10.1186/s12859-024-05806-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 05/08/2024] [Indexed: 05/22/2024]
Abstract
BACKGROUND The identification of drug side effects plays a critical role in drug repositioning and drug screening. While clinical experiments yield accurate and reliable information about drug-related side effects, they are costly and time-consuming. Computational models have emerged as a promising alternative to predict the frequency of drug-side effects. However, earlier research has primarily centered on extracting and utilizing representations of drugs, like molecular structure or interaction graphs, often neglecting the inherent biomedical semantics of drugs and side effects. RESULTS To address the previously mentioned issue, we introduce a hybrid multi-modal fusion framework (HMMF) for predicting drug side effect frequencies. Considering the wealth of biological and chemical semantic information related to drugs and side effects, incorporating multi-modal information offers additional, complementary semantics. HMMF utilizes various encoders to understand molecular structures, biomedical textual representations, and attribute similarities of both drugs and side effects. It then models drug-side effect interactions using both coarse and fine-grained fusion strategies, effectively integrating these multi-modal features. CONCLUSIONS HMMF exhibits the ability to successfully detect previously unrecognized potential side effects, demonstrating superior performance over existing state-of-the-art methods across various evaluation metrics, including root mean squared error and area under receiver operating characteristic curve, and shows remarkable performance in cold-start scenarios.
Affiliation(s)
- Wuyong Liu
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150006, China
- Jingyu Zhang
  - Department of Neurology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, China
- Guanyu Qiao
  - Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Jilong Bian
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150006, China
- Benzhi Dong
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150006, China
- Yang Li
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150006, China
4
Zhang Y, Deng Z, Xu X, Feng Y, Junliang S. Application of Artificial Intelligence in Drug-Drug Interactions Prediction: A Review. J Chem Inf Model 2024; 64:2158-2173. [PMID: 37458400 DOI: 10.1021/acs.jcim.3c00582] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 04/09/2024]
Abstract
Drug-drug interactions (DDIs) are a critical aspect of drug research that can have adverse effects on patients and can lead to serious consequences. Predicting these events accurately can significantly improve clinicians' ability to make better decisions and establish optimal treatment regimens. However, manually detecting these interactions is time-consuming and labor-intensive. Utilizing the advancements in Artificial Intelligence (AI) is essential for achieving accurate forecasts of DDIs. In this review, DDI prediction tasks are classified into three types according to the type of DDI prediction: undirected DDI prediction, DDI events prediction, and asymmetric DDI prediction. The paper then reviews the progress of AI for each of these three prediction tasks in DDI and provides a summary of the data sets used as well as the representative methods used in these three prediction directions. In this review, we aim to provide a comprehensive overview of drug interaction prediction. The first section introduces commonly used databases and presents an overview of current research advancements and techniques across three domains of DDI. Additionally, we introduce classical machine learning techniques for predicting undirected drug interactions and provide a timeline for the progression of the predicted drug interaction events. Finally, we discuss the difficulties and prospects of AI approaches in predicting DDIs, emphasizing their potential for improving clinical decision-making and patient outcomes.
Affiliation(s)
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Zengqian Deng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Xiaoyu Xu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Yinfei Feng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266000, China
- Shang Junliang
  - School of Information Science and Engineering, Qufu Normal University, Rizhao, 276800, China
5
Luo H, Yin W, Wang J, Zhang G, Liang W, Luo J, Yan C. Drug-drug interactions prediction based on deep learning and knowledge graph: A review. iScience 2024; 27:109148. [PMID: 38405609 PMCID: PMC10884936 DOI: 10.1016/j.isci.2024.109148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Drug-drug interactions (DDIs) can produce unpredictable pharmacological effects and lead to adverse events that have the potential to cause irreversible damage to the organism. Traditional methods to detect DDIs through biological or pharmacological analysis are time-consuming and expensive; therefore, there is an urgent need to develop computational methods to effectively predict drug-drug interactions. Currently, deep learning and knowledge graph techniques, which can effectively extract features of entities, have been widely utilized to develop DDI prediction methods. In this research, we aim to systematically review DDI prediction studies applying deep learning and knowledge graphs. The available biomedical data and public databases related to drugs are first summarized in this review. Then, we discuss the existing drug-drug interaction prediction methods which have utilized deep learning and knowledge graph techniques and group them into three main classes: deep learning-based methods, knowledge graph-based methods, and methods that combine deep learning with knowledge graphs. We comprehensively analyze the commonly used drug-related data and various DDI prediction methods, and compare these prediction methods on benchmark datasets. Finally, we briefly discuss the challenges related to drug-drug interaction prediction, including asymmetric DDI prediction and high-order DDI prediction.
Affiliation(s)
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Weijie Yin
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Jianlin Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Academy for Advanced Interdisciplinary Studies, Zhengzhou, China
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, China
- Wenjuan Liang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
  - Academy for Advanced Interdisciplinary Studies, Zhengzhou, China
6
Sharma R, Saghapour E, Chen JY. An NLP-based technique to extract meaningful features from drug SMILES. iScience 2024; 27:109127. [PMID: 38455979 PMCID: PMC10918220 DOI: 10.1016/j.isci.2024.109127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 09/30/2023] [Accepted: 02/01/2024] [Indexed: 03/09/2024]
Abstract
NLP is a well-established field in ML for developing language models that capture the sequence of words in a sentence. Similarly, drug molecule structures can also be represented as sequences using the SMILES notation. However, unlike natural language texts, special characters in drug SMILES have specific meanings and cannot be ignored. We introduce a novel NLP-based method that extracts interpretable sequences and essential features from drug SMILES notation using N-grams. Our method compares these features to Morgan fingerprint bit-vectors using UMAP-based embedding, and we validate its effectiveness through two personalized drug screening (PSD) case studies. Our NLP-based features are sparse and, when combined with gene expressions and disease phenotype features, produce better ML models for PSD. This approach provides a new way to analyze drug molecule structures represented as SMILES notation, which can help accelerate drug discovery efforts. We have also made our method accessible through a Python library.
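To make the N-gram idea concrete, here is a minimal sketch that counts character-level N-grams in a SMILES string. It is not the authors' library or their tokenization: the function name `smiles_char_ngrams` is hypothetical, and splitting on single characters is a simplifying assumption (two-letter atoms such as `Cl` or `Br` would be split apart).

```python
from collections import Counter


def smiles_char_ngrams(smiles, n):
    """Count overlapping character n-grams in a SMILES string.

    Assumption: one character per token. Special characters such as
    '(', '=', and ring-closure digits are kept, since they carry meaning.
    """
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


# Aspirin, written as a SMILES string.
aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
bigrams = smiles_char_ngrams(aspirin, 2)

# The three most frequent bigrams and their counts.
print(bigrams.most_common(3))
```

Each distinct N-gram can then serve as one column of a sparse feature vector, which matches the abstract's remark that the resulting features are sparse.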
Affiliation(s)
- Rahul Sharma
  - Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Ehsan Saghapour
  - Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Jake Y. Chen
  - Informatics Institute, School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
7
Yan X, Gu C, Feng Y, Han J. Predicting Drug-drug Interaction with Graph Mutual Interaction Attention Mechanism. Methods 2024; 223:16-25. [PMID: 38262485 DOI: 10.1016/j.ymeth.2024.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 01/04/2024] [Accepted: 01/19/2024] [Indexed: 01/25/2024]
Abstract
Effective representation of molecules is a crucial step in AI-driven drug design and drug discovery, especially for drug-drug interaction (DDI) prediction. Previous work usually models the drug information from the drug-related knowledge graph or from single drug molecules, but the interaction information between molecular substructures of a drug pair is seldom considered, and the influence of bond information on atom node representation is often ignored, leading to insufficient drug representation. Moreover, key molecular substructures contribute significantly to DDI prediction results. Therefore, in this work, we propose a novel Graph learning framework of Mutual Interaction Attention mechanism (called GMIA) to predict DDIs by effectively representing the drug molecules. Specifically, we build the node-edge message communication encoder to aggregate atom node and incoming edge information for atom node representation, and design the mutual interaction attention decoder to capture the mutual interaction context between molecular graphs of drug pairs. GMIA can bridge the gap between two encoders for the single drug molecules by an attention mechanism. We also design a co-attention matrix to analyze the significance of different-size substructures obtained from the encoder-decoder layer and provide interpretability. In comparison with other recent state-of-the-art methods, GMIA achieves the best results in terms of area under the precision-recall curve (AUPR), area under the ROC curve (AUC), and F1 score on two datasets of different scales. The case study indicates that GMIA can detect the key substructure for potential DDIs, demonstrating the enhanced performance and interpretation ability of GMIA.
Affiliation(s)
- Xiaoying Yan
  - College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
- Chi Gu
  - College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
- Yuehua Feng
  - College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
- Jiaxin Han
  - College of Computer Science, Xi'an Shiyou University, Xi'an 710065, China
8
Hao Y, Chen X, Fei A, Jia Q, Chen Y, Shao J, Pandiyan S, Wang L. SG-ATT: A Sequence Graph Cross-Attention Representation Architecture for Molecular Property Prediction. Molecules 2024; 29:492. [PMID: 38276570 PMCID: PMC10819071 DOI: 10.3390/molecules29020492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/06/2024] [Accepted: 01/14/2024] [Indexed: 01/27/2024]
Abstract
Existing formats based on the simplified molecular input line entry system (SMILES) encoding and molecular graph structure are designed to encode the complete semantic and structural information of molecules. However, the physicochemical properties of molecules are complex, and a single encoding of molecular features from SMILES sequences or molecular graph structures cannot adequately represent molecular information. Aiming to address this problem, this study proposes a sequence graph cross-attention (SG-ATT) representation architecture for a molecular property prediction model to efficiently use domain knowledge to enhance molecular graph feature encoding and combine the features of molecular SMILES sequences. The SG-ATT fuses the two-dimensional molecular features so that the current model input molecular information contains molecular structure information and semantic information. The SG-ATT was tested on nine molecular property prediction tasks. Among them, the biggest SG-ATT model performance improvement was 4.5% on the BACE dataset, and the average model performance improvement was 1.83% on the full dataset. Additionally, specific model interpretability studies were conducted to showcase the performance of the SG-ATT model on different datasets. In-depth analysis was provided through case studies of in vitro validation. Finally, network tools for molecular property prediction were developed for the use of researchers.
Affiliation(s)
- Yajie Hao
  - School of Information Science and Technology, Nantong University, Nantong 226001, China
- Xing Chen
  - School of Information Science and Technology, Nantong University, Nantong 226001, China
- Ailu Fei
  - School of Information Science and Technology, Nantong University, Nantong 226001, China
- Qifeng Jia
  - School of Information Science and Technology, Nantong University, Nantong 226001, China
- Yu Chen
  - School of Information Science and Technology, Nantong University, Nantong 226001, China
- Jinsong Shao
  - School of Information Science and Technology, Nantong University, Nantong 226001, China
- Sanjeevi Pandiyan
  - School of Information Science and Technology, Nantong University, Nantong 226001, China
- Li Wang
  - School of Information Science and Technology, Nantong University, Nantong 226001, China
  - Research Center for Intelligent Information Technology, Nantong University, Nantong 226001, China
9
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024]
Abstract
Recently, attention mechanism and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
Affiliation(s)
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
  - Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
  - Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
  - Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
  - Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
10
Li Z, Tu X, Chen Y, Lin W. HetDDI: a pre-trained heterogeneous graph neural network model for drug-drug interaction prediction. Brief Bioinform 2023; 24:bbad385. [PMID: 37903412 DOI: 10.1093/bib/bbad385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 08/12/2023] [Accepted: 09/13/2023] [Indexed: 11/01/2023]
Abstract
The simultaneous use of two or more drugs due to multi-disease comorbidity continues to increase, which may cause adverse reactions between drugs that seriously threaten public health. Therefore, the prediction of drug-drug interaction (DDI) has become a hot topic not only in clinics but also in bioinformatics. In this study, we propose a novel pre-trained heterogeneous graph neural network (HGNN) model named HetDDI, which aggregates the structural information in drug molecule graphs and rich semantic information in biomedical knowledge graph to predict DDIs. In HetDDI, we first initialize the parameters of the model with different pre-training methods. Then we apply the pre-trained HGNN to learn the feature representation of drugs from multi-source heterogeneous information, which can more effectively utilize drugs' internal structure and abundant external biomedical knowledge, thus leading to better DDI prediction. We evaluate our model on three DDI prediction tasks (binary-class, multi-class and multi-label) with three datasets and further assess its performance on three scenarios (S1, S2 and S3). The results show that the accuracy of HetDDI can achieve 98.82% in the binary-class task, 98.13% in the multi-class task and 96.66% in the multi-label one on S1, which outperforms the state-of-the-art methods by at least 2%. On S2 and S3, our method also achieves exciting performance. Furthermore, the case studies confirm that our model performs well in predicting unknown DDIs. Source codes are available at https://github.com/LinsLab/HetDDI.
Affiliation(s)
- Zhe Li
  - School of Computer Science, University of South China, Hengyang, 421001 Hunan, China
- Xinyi Tu
  - School of Computer Science, University of South China, Hengyang, 421001 Hunan, China
- Yuping Chen
  - School of Pharmacy, University of South China, Hengyang 421001, China
- Wenbin Lin
  - School of Mathematics and Physics, University of South China, Hengyang 421001, China
11
Pan L, Xiao X, Liu S, Peng S. An Integration Framework of Secure Multiparty Computation and Deep Neural Network for Improving Drug-Drug Interaction Predictions. J Comput Biol 2023; 30:1034-1045. [PMID: 37707993 DOI: 10.1089/cmb.2023.0076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2023]
Abstract
Drug-drug interaction (DDI) is a key concern in drug development and pharmacovigilance. It is important to improve DDI predictions by integrating multisource data from various pharmaceutical companies. Unfortunately, data privacy and financial interest issues seriously hinder interinstitutional collaborations for DDI predictions. We propose multiparty computation DDI (MPCDDI), a secure MPC-based deep learning framework for DDI predictions. MPCDDI leverages secret sharing techniques to incorporate the drug-related feature data from multiple institutions and develops a deep learning model for DDI predictions. In MPCDDI, all data transmission and deep learning operations are integrated into secure MPC frameworks to enable high-quality collaboration among pharmaceutical institutions without divulging private drug-related information. The results suggest that MPCDDI is superior to eight other baselines and achieves performance similar to that of the corresponding plaintext collaborations. More interestingly, MPCDDI significantly outperforms methods that use private data from a single institution. In summary, MPCDDI is an effective framework for promoting collaborative and privacy-preserving drug discovery.
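The abstract does not say which secret sharing scheme MPCDDI uses. Purely as an illustration of the general idea, the sketch below shows additive secret sharing, one common scheme: a value is split into random shares that sum to it modulo a fixed number, and parties can add shared values without seeing them. The names `MODULUS`, `share`, and `reconstruct` and the sample values are all hypothetical, not taken from the paper.

```python
import secrets

# Assumption: any sufficiently large modulus works for additive sharing.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n_parties additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recover the secret by summing all shares modulo MODULUS."""
    return sum(shares) % MODULUS


# Two institutions each hold a private feature value (hypothetical numbers).
a_shares = share(12, 3)
b_shares = share(30, 3)

# Each party adds the two shares it holds; no party sees 12 or 30.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))
```

Addition works share by share with no communication; multiplication, which neural network layers also need, requires extra protocol steps that this sketch leaves out.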
Affiliation(s)
- Liang Pan
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xia Xiao
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
  - The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China
12
Lin X, Dai L, Zhou Y, Yu ZG, Zhang W, Shi JY, Cao DS, Zeng L, Chen H, Song B, Yu PS, Zeng X. Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction. Brief Bioinform 2023:bbad235. [PMID: 37401373 DOI: 10.1093/bib/bbad235] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/27/2023] [Revised: 05/30/2023] [Accepted: 06/05/2023] [Indexed: 07/05/2023]
Abstract
Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug due to the presence of another drug in the human body, and they play an essential role in drug discovery and clinical research. DDI prediction through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply advanced AI and deep learning, developers and users face various challenges, such as the availability and encoding of data resources and the design of computational methods. This review summarizes chemical structure based, network based, natural language processing based and hybrid methods, providing an updated and accessible guide to the broad research and development community with different domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDI prediction.
Affiliation(s)
- Xuan Lin
- College of Computer Science, Xiangtan University, Xiangtan, China
- Lichang Dai
- College of Computer Science, Xiangtan University, Xiangtan, China
- Yafang Zhou
- College of Computer Science, Xiangtan University, Xiangtan, China
- Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, China
- Jian-Yu Shi
- Northwestern Polytechnical University, Xian, China
- Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, China
- Li Zeng
- AIDD department of Yuyao Biotech, Shanghai, China
- Haowen Chen
- College of Computer Science and Electronic Engineering, Hunan University, 410013 Changsha, P. R. China
- Bosheng Song
- College of Information Science and Engineering, Hunan University, Changsha, China
- Philip S Yu
- University of Illinois at Chicago and also holds the Wexler Chair in Information Technology
- Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha, China
13
Kang H, Hou L, Gu Y, Lu X, Li J, Li Q. Drug-disease association prediction with literature based multi-feature fusion. Front Pharmacol 2023; 14:1205144. [PMID: 37284317 PMCID: PMC10239876 DOI: 10.3389/fphar.2023.1205144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/09/2023] [Indexed: 06/08/2023] Open
Abstract
Introduction: Exploring the potential efficacy of a drug is a valid approach to drug development, with shorter development times and lower costs. Recently, several computational drug repositioning methods have been introduced to learn multiple features for potential association prediction. However, fully leveraging the vast amount of information in the scientific literature to enhance drug-disease association prediction remains a great challenge. Methods: We constructed a drug-disease association prediction method called Literature Based Multi-Feature Fusion (LBMFF), which integrates known drug, disease, side effect and target associations from public databases with literature semantic features. Specifically, a pre-trained and fine-tuned BERT model was introduced to extract literature semantic information for similarity assessment. Then, we obtained drug and disease embeddings from the constructed fusion similarity matrix using a graph convolutional network with an attention mechanism. Results: LBMFF achieved superior performance in drug-disease association prediction with an AUC value of 0.8818 and an AUPR value of 0.5916. Discussion: LBMFF achieved relative improvements of 31.67% and 16.09%, respectively, over the second-best results, compared to single feature methods and seven existing state-of-the-art prediction methods on the same test datasets. Meanwhile, case studies have verified that LBMFF can discover new associations to accelerate drug development. The proposed benchmark dataset and source code are available at: https://github.com/kang-hongyu/LBMFF.
Affiliation(s)
- Hongyu Kang
- Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Li Hou
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yaowen Gu
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiao Lu
- Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
- Jiao Li
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Qin Li
- Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
14
Zhao W, Yuan X, Shen X, Jiang X, Shi C, He T, Hu X. Improving drug-drug interactions prediction with interpretability via meta-path-based information fusion. Brief Bioinform 2023; 24:7030845. [PMID: 36750041 DOI: 10.1093/bib/bbad041] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/29/2022] [Revised: 01/01/2023] [Accepted: 01/18/2023] [Indexed: 02/09/2023] Open
Abstract
Drug-drug interactions (DDIs) are compound effects that arise when patients take two or more drugs at the same time, which may weaken the efficacy of drugs or cause unexpected side effects. Thus, accurately predicting DDIs is of great significance for drug development and drug safety surveillance. Although many methods have been proposed for the task, existing methods do not fully utilize the biological knowledge related to DDIs or effectively capture the complex semantics among drug-related biological entities, leading to suboptimal performance. Moreover, the lack of interpretability of the predicted results also limits the wide application of existing methods for DDI prediction. In this study, we propose a novel framework for predicting DDIs with interpretability. Specifically, we construct a heterogeneous information network (HIN) by explicitly utilizing the biological knowledge related to the procedure of inducing DDIs. To capture the complex semantics in the HIN, a meta-path-based information fusion mechanism is proposed to learn high-quality representations of drugs. In addition, an attention mechanism is designed to combine semantic information obtained from meta-paths of different lengths to obtain the final drug representations for DDI prediction. Comprehensive experiments are conducted on 2410 approved drugs, and the predictive performance comparison shows that our proposed framework outperforms selected representative baselines on the task of DDI prediction. The results of the ablation study and the cold-start scenario indicate that the meta-path-based information fusion mechanism is beneficial for capturing the complex semantics among drug-related biological entities. Moreover, the results of the case study demonstrate that the designed attention mechanism is able to provide partial interpretability for the predicted DDIs. Therefore, the proposed method will be a feasible solution to the task of predicting DDIs.
Affiliation(s)
- Weizhong Zhao
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
- School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876, PR China
- National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan, Hubei 430079, PR China
- Xueling Yuan
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
- Xianjun Shen
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
- Xingpeng Jiang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
- Chuan Shi
- School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, PR China
- Tingting He
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei 430079, PR China
- Xiaohua Hu
- College of Computing & Informatics, Drexel University, Philadelphia, PA 19104, USA
15
GCNSA: DNA storage encoding with a graph convolutional network and self-attention. iScience 2023; 26:106231. [PMID: 36876131 PMCID: PMC9982308 DOI: 10.1016/j.isci.2023.106231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Revised: 01/31/2023] [Accepted: 02/14/2023] [Indexed: 02/22/2023] Open
Abstract
DNA encoding, a key step in DNA storage, plays an important role in reading and writing accuracy and the storage error rate. However, current encoding efficiency and encoding speed are not high enough, which limits the performance of DNA storage systems. In this work, a DNA storage encoding system with a graph convolutional network and self-attention (GCNSA) is proposed. The experimental results show that the number of DNA storage codes constructed by GCNSA increases by 14.4% on average under the basic constraints, and by 5%-40% under other constraints. The increase in DNA storage codes improves the storage density of the DNA storage system by 0.7-2.2%. GCNSA predicted more DNA storage codes in less time while ensuring the quality of the codes, which lays a foundation for higher read and write efficiency in DNA storage.
16
Song T, Ren Y, Wang S, Han P, Wang L, Li X, Rodriguez-Patón A. DNMG: Deep molecular generative model by fusion of 3D information for de novo drug design. Methods 2023; 211:10-22. [PMID: 36764588 DOI: 10.1016/j.ymeth.2023.02.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/28/2022] [Revised: 01/18/2023] [Accepted: 02/01/2023] [Indexed: 02/11/2023] Open
Abstract
Deep learning is improving and changing the process of de novo molecular design at a rapid pace. In recent years, great progress has been made in drug discovery and development by using deep generative models for de novo molecular design. However, most of the existing methods are string-based or graph-based and are limited by the lack of some very important properties, such as the three-dimensional information of molecules. We propose DNMG, a deep generative adversarial network (GAN) combined with transfer learning. Specifically, we use a Wasserstein-variant GAN-based network architecture that considers the 3D grid spatial information of the ligand with atomic physicochemical properties to generate a representation of the molecule, which is then parsed into SMILES strings using an improved captioning network. Comprehensive experiments demonstrate the ability of DNMG to generate valid and novel drug-like ligands. The DNMG model is used to design inhibitors for three targets, MK14, FNTA, and CDK2. The computational results show that the molecules generated by DNMG have better binding ability to the target proteins and better physicochemical properties. Overall, our deep generative model has excellent potential to generate molecules with high binding affinity for targets and explore the space of drug-like chemistry.
Affiliation(s)
- Tao Song
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China; Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Campus de Montegancedo, Boadilla del Monte 28660, Madrid, Spain
- Yongqi Ren
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Shuang Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Peifu Han
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Lulu Wang
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Xue Li
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Alfonso Rodriguez-Patón
- Department of Artificial Intelligence, Faculty of Computer Science, Polytechnical University of Madrid, Campus de Montegancedo, Boadilla del Monte 28660, Madrid, Spain
17
Liu L, Wang X, Guan M, Fan Y, Yang Z, Li D, Bai Y, Li H. A mixed reality-based navigation method for dental implant navigation method: A pilot study. Comput Biol Med 2023; 154:106568. [PMID: 36739818 DOI: 10.1016/j.compbiomed.2023.106568] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/24/2022] [Revised: 12/28/2022] [Accepted: 01/22/2023] [Indexed: 01/25/2023]
Abstract
This in vitro study aimed to develop and investigate a novel Mixed Reality (MR)-based dental implant navigation method and to evaluate implant accuracy. Data were collected using 3D cone beam computed tomography. The MR-based navigation system included a Hololens headset, an NDI (Northern Digital Inc.) Polaris optical tracking system, and a computer. A software system was developed. Resin models of dentition defects were created for a randomized comparison study with the MR-based navigation implantation system (MR group, n = 25) and the conventional free-hand approach (FH group, n = 25). Implant surgery on the models was completed by an oral surgeon. The precision and feasibility of the MR-based navigation method in dental implant surgery were assessed by calculating the entry deviation, middle deviation, apex deviation, and angular deviation values of the implant. The system for the MR-based dental implant navigation method, including both the hardware and software, was successfully developed and a workflow of the method was established. Three-dimensional (3D) reconstruction and visualization of the surgical instruments, dentition, and jawbone were achieved. Real-time tracking of implant tools and jaw model, holographic display via the MR headset, surgical guidance, and visualization of the intraoperative implant trajectory deviation from the planned trajectory were captured by our system. The MR-based navigation system was more precise than the free-hand approach for entry deviation (MR: 0.6914 ± 0.2507 mm, FH: 1.571 ± 0.5004 mm, P = 0.000), middle deviation (MR: 0.7156 ± 0.2127 mm, FH: 1.170 ± 0.3448 mm, P = 0.000), apex deviation (MR: 0.7869 ± 0.2298 mm, FH: 0.9190 ± 0.3319 mm, P = 0.1082), and angular deviation (MR: 1.849 ± 0.6120°, FH: 4.933 ± 1.650°, P = 0.000).
Affiliation(s)
- Lin Liu
- Department of Stomatology, The First Medical Center of PLA General Hospital, Beijing, 100853, China
- Xiaoyu Wang
- Department of Stomatology, The First Medical Center of PLA General Hospital, Beijing, 100853, China; Department of Stomatology, PLA Strategic Support Force Special Medical Center, Beijing, 100101, China
- Miaosheng Guan
- Department of Stomatology, The First Medical Center of PLA General Hospital, Beijing, 100853, China; PLA Rocket Force Characteristic Medical Center, Beijing, 100088, China
- Yiping Fan
- Department of Stomatology, The First Medical Center of PLA General Hospital, Beijing, 100853, China
- Zhongliang Yang
- Department of Stomatology, The First Medical Center of PLA General Hospital, Beijing, 100853, China
- Deyu Li
- Beijing Visual 3D Medical Science and Technology Development Co., LTD., Beijing, 100000, China
- Yuming Bai
- Beijing Visual 3D Medical Science and Technology Development Co., LTD., Beijing, 100000, China
- Hongbo Li
- Department of Stomatology, The First Medical Center of PLA General Hospital, Beijing, 100853, China
18
Alrowais F, Alotaibi SS, Hilal AM, Marzouk R, Mohsen H, Osman AE, Alneil AA, Eldesouki MI. Clinical Decision Support Systems to Predict Drug-Drug Interaction Using Multilabel Long Short-Term Memory with an Autoencoder. Int J Environ Res Public Health 2023; 20:2696. [PMID: 36768060 PMCID: PMC9916256 DOI: 10.3390/ijerph20032696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
Big Data analytics is a technique for researching huge and varied datasets; it is designed to uncover hidden patterns, trends, and correlations, and can therefore be applied to make better decisions in healthcare. Drug-drug interactions (DDIs) are a main concern in drug discovery. The main role of precise forecasting of DDIs is to increase safety, particularly in drug research when multiple drugs are co-prescribed. Prevailing conventional machine learning (ML) approaches mainly depend on handcrafted features and lack generalization. Today, deep learning (DL) techniques that automatically learn drug features from drug-related networks or molecular graphs have enhanced the capability of computational approaches for forecasting unknown DDIs. Therefore, in this study, we develop a sparrow search optimization with deep learning-based DDI prediction (SSODL-DDIP) technique for healthcare decision making in big data environments. The presented SSODL-DDIP technique identifies the relationships and properties of the drugs from various sources to make predictions. In addition, a multilabel long short-term memory with an autoencoder (MLSTM-AE) model is employed for the DDI prediction process. Moreover, a lexicon-based approach is used to determine the severity of interactions among the DDIs. To improve the prediction outcomes of the MLSTM-AE model, the SSO algorithm is adopted in this work. To assess the performance of the SSODL-DDIP technique, a wide range of simulations are performed. The experimental results show the promising performance of the SSODL-DDIP technique over recent state-of-the-art algorithms.
Affiliation(s)
- Fadwa Alrowais
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Saud S. Alotaibi
- Department of Information Systems, College of Computing and Information System, Umm Al-Qura University, Makkah 24211, Saudi Arabia
- Anwer Mustafa Hilal
- Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, Al Kharj 16436, Saudi Arabia
- Radwa Marzouk
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Heba Mohsen
- Department of Computer Science, Faculty of Computers and Information Technology, Future University in Egypt, New Cairo 11835, Egypt
- Azza Elneil Osman
- Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, Al Kharj 16436, Saudi Arabia
- Amani A. Alneil
- Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, Al Kharj 16436, Saudi Arabia
- Mohamed I. Eldesouki
- Department of Information System, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al Kharj 16436, Saudi Arabia
19
Lin J, Wu L, Zhu J, Liang X, Xia Y, Xie S, Qin T, Liu TY. R2-DDI: relation-aware feature refinement for drug-drug interaction prediction. Brief Bioinform 2023; 24:6961471. [PMID: 36573491 DOI: 10.1093/bib/bbac576] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Revised: 11/14/2022] [Accepted: 11/25/2022] [Indexed: 12/28/2022] Open
Abstract
Precisely predicting drug-drug interactions (DDIs) is an important application and a hot research topic in drug discovery, especially for avoiding adverse effects when using drug combination treatment for patients. Machine learning and deep learning methods have achieved great success in DDI prediction. However, we notice that most existing works ignore the importance of the relation type when building DDI prediction models. In this work, we propose a novel R$^2$-DDI framework, which introduces a relation-aware feature refinement module for drug representation learning. The relation feature is integrated into the drug representation and refined in the framework. With the refined features, we also incorporate a consistency training method to regularize the multi-branch predictions for better generalization. Through extensive experiments and studies, we demonstrate that our R$^2$-DDI approach can significantly improve DDI prediction performance over multiple real-world datasets and settings, and that our method shows better generalization ability with the help of the feature refinement design.
Affiliation(s)
- Jiacheng Lin
- Department of Automation, Tsinghua University, 30 Shuangqing Rd, Haidian District, 100084 Beijing, China
- Lijun Wu
- Microsoft Research AI4Science, No. 5 Dan Ling Street, Haidian District, 100080 Beijing, China
- Jinhua Zhu
- CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China, No. 96, JinZhai Road Baohe District, 230026 Hefei, Anhui Province, China
- Xiaobo Liang
- Institute of Artificial Intelligence, Soochow University, No. 178, Yucai Rd, Gusu District, 215006 Soochow, Jiangsu Province, China
- Yingce Xia
- Microsoft Research AI4Science, No. 5 Dan Ling Street, Haidian District, 100080 Beijing, China
- Shufang Xie
- Microsoft Research AI4Science, No. 5 Dan Ling Street, Haidian District, 100080 Beijing, China
- Tao Qin
- Microsoft Research AI4Science, No. 5 Dan Ling Street, Haidian District, 100080 Beijing, China
- Tie-Yan Liu
- Microsoft Research AI4Science, No. 5 Dan Ling Street, Haidian District, 100080 Beijing, China
20
Li X, Han P, Chen W, Gao C, Wang S, Song T, Niu M, Rodriguez-Patón A. MARPPI: boosting prediction of protein-protein interactions with multi-scale architecture residual network. Brief Bioinform 2023; 24:6887309. [PMID: 36502435 DOI: 10.1093/bib/bbac524] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 07/04/2022] [Revised: 09/29/2022] [Accepted: 11/04/2022] [Indexed: 12/14/2022] Open
Abstract
Protein-protein interactions (PPIs) are a major component of the cellular biochemical reaction network. Rich sequence information and machine learning techniques reduce the dependence of PPI exploration on wet experiments, which are costly and time-consuming. This paper proposes a PPI prediction model, multi-scale architecture residual network for PPIs (MARPPI), based on dual-channel and multi-feature. Multi-feature leverages Res2vec to obtain the association information between residues, and utilizes pseudo amino acid composition, autocorrelation descriptors and multivariate mutual information to capture the amino acid composition and order information, physicochemical properties and information entropy, respectively. Dual channel uses a ResNet improved with a multi-scale architecture, which extracts protein sequence features to reduce protein feature loss. Compared with other advanced methods, MARPPI achieves 96.03%, 99.01% and 91.80% accuracy in the intraspecific datasets of Saccharomyces cerevisiae, Human and Helicobacter pylori, respectively. The accuracy on the two interspecific datasets of Human-Bacillus anthracis and Human-Yersinia pestis is 97.29% and 95.30%, respectively. In addition, results on specific datasets of disease (neurodegenerative and metabolic disorders) demonstrate the ability to detect hidden interactions. To better illustrate the performance of MARPPI, evaluations on independent datasets and PPI networks suggest that MARPPI can be used to predict cross-species interactions. The above shows that MARPPI can be regarded as a concise, efficient and accurate tool for PPI datasets.
Affiliation(s)
- Xue Li
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Peifu Han
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Wenqi Chen
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Changnan Gao
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Shuang Wang
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Tao Song
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Muyuan Niu
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Alfonso Rodriguez-Patón
- School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
21
PETrans: De Novo Drug Design with Protein-Specific Encoding Based on Transfer Learning. Int J Mol Sci 2023; 24:ijms24021146. [PMID: 36674658 PMCID: PMC9865828 DOI: 10.3390/ijms24021146] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/02/2022] [Revised: 12/29/2022] [Accepted: 01/04/2023] [Indexed: 01/11/2023] Open
Abstract
Recent years have seen tremendous success in the design of novel drug molecules through deep generative models. Nevertheless, existing methods only generate drug-like molecules, which require additional structural optimization to be developed into actual drugs. In this study, a deep learning method for generating target-specific ligands was proposed. This method is useful when the dataset for target-specific ligands is limited. Deep learning methods can extract and learn features (representations) in a data-driven way with little or no human participation. Generative pretraining (GPT) was used to extract the contextual features of the molecule. Three different protein-encoding methods were used to extract the physicochemical properties and amino acid information of the target protein. Protein-encoding and molecular sequence information are combined to guide molecule generation. Transfer learning was used to fine-tune the pretrained model to generate molecules with better binding ability to the target protein. The model was validated using three different targets. The docking results show that our model is capable of generating new molecules with higher docking scores for the target proteins.
22
Hong E, Jeon J, Kim HU. Recent development of machine learning models for the prediction of drug-drug interactions. Korean J Chem Eng 2023; 40:276-285. [PMID: 36748027 PMCID: PMC9894510 DOI: 10.1007/s11814-023-1377-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/11/2022] [Revised: 12/09/2022] [Accepted: 12/16/2022] [Indexed: 02/05/2023]
Abstract
Polypharmacy, the co-administration of multiple drugs, has become an area of concern as the elderly population grows and unexpected infections, such as the COVID-19 pandemic, keep emerging. However, it is very costly and time-consuming to experimentally examine the pharmacological effects of polypharmacy. To address this challenge, machine learning models that predict drug-drug interactions (DDIs) have been actively developed in recent years. In particular, the growing volume of drug datasets and advances in machine learning have facilitated model development. In this regard, this review discusses the DDI-predicting machine learning models that have been developed since 2018. Our discussion focuses on the dataset sources used to develop the models, featurization approaches for molecular structures and biological information, and the types of DDI prediction outcomes from the models. Finally, we make suggestions for research opportunities in this field.
Affiliation(s)
- Eujin Hong
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- Junhyeok Jeon
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea; BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141 Korea
23
A Complex Heterogeneous Network Model of Disease Regulated by Noncoding RNAs: A Case Study of Unstable Angina Pectoris. Comput Intell Neurosci 2022; 2022:5852089. [PMID: 36590836 PMCID: PMC9803582 DOI: 10.1155/2022/5852089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 11/27/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022]
Abstract
MicroRNAs (miRNAs) are an important type of noncoding RNA, and there is a lack of holistic and systematic understanding of the functions they play in disease. We proposed a research strategy comprising two parts, network analysis and network modelling, to analyze, model, and predict the regulatory network of miRNAs from a network perspective, using unstable angina pectoris as an example. In the network analysis section, we proposed the WGCNA & SimCluster method, which uses both correlation and similarity to find hub miRNAs; validation on two datasets showed better results than methods using correlation or similarity alone. In the network modelling section, we used six knowledge graph or graph neural network models for link prediction of three types of edges and multilabel classification of two types of nodes. Comparative experiments showed that the RotatE model was a good model for link prediction, while the RGCN model was the best model for multilabel classification. Potential target genes were predicted for hub miRNAs, and hub miRNA-target gene interactions, target genes as biomarkers and target gene functions were validated using a three-step validation approach. In conclusion, our study provides a new strategy to analyze and model miRNA regulatory networks.
24
Pan D, Quan L, Jin Z, Chen T, Wang X, Xie J, Wu T, Lyu Q. Multisource Attention-Mechanism-Based Encoder-Decoder Model for Predicting Drug-Drug Interaction Events. J Chem Inf Model 2022; 62:6258-6270. [PMID: 36449561 DOI: 10.1021/acs.jcim.2c01112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
Many computational methods have been proposed to predict drug-drug interactions (DDIs), which can occur when combining drugs to treat various diseases, but most mainly utilize single-source features of drugs, which is inadequate for drug representation. To fill this gap, we propose two attention-mechanism-based encoder-decoder models that incorporate multisource information: one is MAEDDI, which can predict DDIs, and the other is MAEDDIE, which can make further DDI-associated event predictions for drug pairs with DDIs. To better express the drug features, we used three encoding methods to encode the drugs, integrating the self-attention mechanism, cross-attention mechanism, and graph attention network to construct a multisource feature fusion network. Experiments showed that both MAEDDI and MAEDDIE performed better than some state-of-the-art methods in various validation attempts at different experimental tasks. The visualization analysis showed that the semantic features of drug pairs learned by our models gave a good drug representation. In practice, MAEDDIE successfully screened 43 DDI events on favipiravir, an influenza antiviral drug, with a success rate of nearly 50%. Our model achieved competitive results, mainly owing to the design of sequence-based, structural, biochemical, and statistical multisource features. Moreover, different encoders constructed based on different features learn the interrelationship information between drug pairs, and the different representations of these drug pairs are incorporated to predict the target problem. All of these encoders were designed to better characterize the complex DDI relationships, allowing us to achieve high generalization in DDI and DDI-associated event predictions.
Affiliation(s)
- Deng Pan
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Lijun Quan
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Zhi Jin
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Taoning Chen
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Xuejiao Wang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Jingxin Xie
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Tingfang Wu
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Qiang Lyu
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Province Key Lab for Information Processing Technologies, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
|
25
|
FMG: An observable DNA storage coding method based on frequency matrix game graphs. Comput Biol Med 2022; 151:106269. [PMID: 36356390 DOI: 10.1016/j.compbiomed.2022.106269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 10/20/2022] [Accepted: 10/30/2022] [Indexed: 11/06/2022]
Abstract
Using complex biomolecules for storage is a new carbon-based storage method. For example, DNA has the potential to be a good medium for long-term archival data storage. Reasonable and efficient coding is the first and most important step in DNA storage. However, current coding methods, such as the altruism algorithm, suffer from low coding efficiency and high complexity, and the coding constraints and sets make it difficult to inspect the coding results visually. In this study, a new DNA storage coding method based on the frequency matrix game graph (FMG) is proposed to generate DNA storage codes that satisfy combinatorial constraints. In contrast to the randomness of heuristic algorithms that satisfy the constraints, the coding method based on the FMG is deterministic and can clearly explain the coding process. In addition, the constraints and coding results are observable, and the size of the coding set is better than previously published results. For example, for code length n = 10 and Hamming distance d = 4, the results obtained by the proposed approach, which combines the chaos game and graphs, are 24% better than the previous results. The proposed coding scheme successfully constructs high-quality coding sets with less complexity, which effectively promotes the development of carbon-based storage coding.
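The Hamming-distance constraint mentioned in this abstract (code length n = 10, distance d = 4) can be checked with a short sketch. The code below is not the FMG construction itself; it only verifies that a candidate set of DNA codewords meets a minimum pairwise Hamming distance. The function names and the three example codewords are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))

def satisfies_min_distance(codewords, d):
    """True if every pair of codewords is at Hamming distance >= d."""
    return all(hamming(a, b) >= d for a, b in combinations(codewords, 2))

# Hypothetical codewords of length n = 10 over the alphabet {A, C, G, T}.
codes = ["ACGTACGTAC", "TGCAACGTAC", "ACGTTGCATG"]

ok_d4 = satisfies_min_distance(codes, 4)  # smallest pairwise distance here is 4
ok_d5 = satisfies_min_distance(codes, 5)  # fails: the first two codewords differ in only 4 positions
```

A larger set of codewords that still passes this check for fixed n and d corresponds to a larger coding set, which is the quantity the abstract reports as improved.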
|
26
|
Song T, Dai H, Wang S, Wang G, Zhang X, Zhang Y, Jiao L. TransCluster: A Cell-Type Identification Method for single-cell RNA-Seq data using deep learning based on transformer. Front Genet 2022; 13:1038919. [PMID: 36303549 PMCID: PMC9592860 DOI: 10.3389/fgene.2022.1038919] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/07/2022] [Accepted: 09/23/2022] [Indexed: 11/25/2022] Open
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) have accelerated the development of techniques to classify thousands of cells through transcriptome profiling. As more and more scRNA-seq data become available, supervised cell type classification methods using externally well-annotated source data have become more popular than unsupervised clustering algorithms. However, accurate cellular annotation of single-cell transcription data remains a significant challenge. Here, we propose a hybrid network structure called TransCluster, which uses linear discriminant analysis and a modified Transformer to enhance feature learning. It is a cell-type identification tool for single-cell transcriptomic maps. It shows high accuracy and robustness on many cell data sets from different human tissues, and it outperforms other known methods on an external test data set. To our knowledge, TransCluster is the first attempt to use a Transformer for annotating cell types in scRNA-seq data, and it greatly improves the accuracy of cell-type identification.
Affiliation(s)
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
  - Department of Artificial Intelligence, Faculty of Computer Science, Campus de Montegancedo, Polytechnical University of Madrid, Boadilla Del Monte, Madrid, Spain
  - *Correspondence: Tao Song; Shuang Wang
- Huanhuan Dai
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Shuang Wang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
  - *Correspondence: Tao Song; Shuang Wang
- Gan Wang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Xudong Zhang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Ying Zhang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Linfang Jiao
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
|
27
|
Qian Y, Wu J, Zhang Q. CAT-CPI: Combining CNN and transformer to learn compound image features for predicting compound-protein interactions. Front Mol Biosci 2022; 9:963912. [PMID: 36188230 PMCID: PMC9520300 DOI: 10.3389/fmolb.2022.963912] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/08/2022] [Accepted: 08/30/2022] [Indexed: 11/13/2022] Open
Abstract
Compound-protein interaction (CPI) prediction is a foundational task in drug discovery, a process that is time-consuming and costly. Deep learning methods can greatly improve the effectiveness of CPI prediction and thereby accelerate drug development. A large number of recent research results in computer vision, especially in deep learning, have shown that the position, geometry, spatial structure, and other features of objects in an image can be well characterized. We propose a novel molecular image-based model named CAT-CPI (combining CNN and transformer to predict CPI) for the CPI task. We use a Convolutional Neural Network (CNN) to learn local features of molecular images and then use a transformer encoder to capture the semantic relationships among these features. To extract protein sequence features, we propose a k-gram based method and obtain the semantic relationships of sub-sequences with a transformer encoder. In addition, we build a Feature Relearning (FR) module to learn interaction features of compounds and proteins. We evaluated CAT-CPI on three benchmark datasets (Human, Celegans, and Davis), and the experimental results demonstrate that CAT-CPI is competitive with state-of-the-art predictors. In addition, we carry out Drug-Drug Interaction (DDI) experiments to verify the strong potential of methods based on molecular images and the FR module.
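The k-gram based treatment of protein sequences described in this abstract can be sketched as follows. This is a minimal illustration, assuming overlapping k-grams with a stride of one and a vocabulary built on the fly; the abstract does not state the value of k or the tokenization details, so `k=3`, the function names, and the example sequence are hypothetical.

```python
def kgrams(sequence, k=3):
    """Split a sequence into overlapping k-grams (stride 1, an assumption)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_vocab(grams):
    """Assign each distinct k-gram an integer id in order of first appearance."""
    vocab = {}
    for g in grams:
        vocab.setdefault(g, len(vocab))
    return vocab

# Hypothetical short protein fragment in one-letter amino acid codes.
seq = "MKTAYIAK"
grams = kgrams(seq, k=3)
vocab = build_vocab(grams)
token_ids = [vocab[g] for g in grams]
```

The resulting integer ids are the kind of token sequence that could be embedded and passed to a transformer encoder to model relationships among sub-sequences.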
|
28
|
Chen W, Wang S, Song T, Li X, Han P, Gao C. DCSE: Double-Channel-Siamese-Ensemble model for protein protein interaction prediction. BMC Genomics 2022; 23:555. [PMID: 35922751 PMCID: PMC9351149 DOI: 10.1186/s12864-022-08772-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 07/15/2022] [Indexed: 11/15/2022] Open
Abstract
Background Protein-protein interaction (PPI) is very important for many biochemical processes. Therefore, accurate prediction of PPI can help us better understand the role of proteins in biochemical processes. Although there are many methods to predict PPI in biology, they are time-consuming and lack accuracy, so it is necessary to build an efficient and accurate computational model for PPI prediction. Results We present a novel sequence-based computational approach called DCSE (Double-Channel-Siamese-Ensemble) to predict potential PPI. In the encoding layer, we treat each amino acid as a word and map it into an N-dimensional vector. In the feature extraction layer, we extract features from local and global perspectives by a Multilayer Convolutional Neural Network (MCN) and a Multilayer Bidirectional Gated Recurrent Unit with Convolutional Neural Networks (MBC). Finally, the output of the feature extraction layer is fed into the prediction layer to output whether the input protein pair will interact with each other. The MCN and MBC are siamese- and ensemble-based networks, which can effectively improve the performance of the model. To demonstrate our model's performance, we compare it with four machine-learning-based and three deep-learning-based models. The results show that our method outperforms other models in all evaluation criteria. The Accuracy, Precision, F1, Recall and MCC of our model are 0.9303, 0.9091, 0.9268, 0.9452, and 0.8609, respectively. For the other seven models, the highest Accuracy, Precision, F1, Recall and MCC are 0.9288, 0.9243, 0.9246, 0.9250, and 0.8572, respectively. We also test our model on an imbalanced dataset and transfer our model to another species. The results show our model is excellent. Conclusion Our model achieves the best performance in comparison with seven other models. The NLP-based coding method has a good effect on the PPI prediction task. MCN and MBC extract protein sequence features from local and global perspectives, and these two feature extraction layers are based on siamese and ensemble network structures. The siamese-based network structure can keep the features consistent, and the ensemble-based network structure can effectively improve the accuracy of the model. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08772-6.
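As a quick consistency check on the metrics reported for DCSE, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below uses only the precision (0.9091), recall (0.9452), and F1 (0.9268) stated in the abstract; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract for the DCSE model.
reported = {"precision": 0.9091, "recall": 0.9452, "f1": 0.9268}

computed = f1_score(reported["precision"], reported["recall"])
matches_reported = round(computed, 4) == reported["f1"]
```

The same check is not meaningful for the baseline figures, because the abstract lists the highest value of each metric across seven different models rather than the metrics of a single model.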
Affiliation(s)
- Wenqi Chen
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Shuang Wang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
  - Department of Artificial Intelligence, Polytechnical University of Madrid, Madrid, Spain
- Xue Li
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Peifu Han
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Changnan Gao
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
|
29
|
Analysis of the Basic Characteristics and Teaching Environment and Mode of Music Appreciation Course Based on Core Literacy. Journal of Environmental and Public Health 2022; 2022:7709053. [PMID: 35958382 PMCID: PMC9357672 DOI: 10.1155/2022/7709053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 06/25/2022] [Accepted: 06/27/2022] [Indexed: 11/29/2022]
Abstract
The main goals of a music appreciation course are to develop college students' emotions, improve their musical abilities, and strengthen their conceptual framework. This essay primarily introduces the fundamental elements and instructional approach of a music appreciation course built on core literacy. In order to clarify the need for and orientation of basic literacy in college music appreciation teaching, as well as to understand the special value of the subject, we conducted research on the relationship between basic literacy and college music appreciation teaching. Based on this, a new detection algorithm that fuses feature fusion and AM (attention mechanism) is proposed, and visual AM is added to the algorithm based on AM features to help the agent learn the best course of action quickly. The findings indicate that this method's cumulative normalized discount gain increases by about 0.07 and that the recommended list's overall quality has significantly improved. The hit ranking is decreased by about 0.05 when compared to the collaborative neural network filtering method without AM.
|
30
|
Li X, Han P, Wang G, Chen W, Wang S, Song T. SDNN-PPI: self-attention with deep neural network effect on protein-protein interaction prediction. BMC Genomics 2022; 23:474. [PMID: 35761175 PMCID: PMC9235110 DOI: 10.1186/s12864-022-08687-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 05/07/2022] [Accepted: 06/10/2022] [Indexed: 12/20/2022] Open
Abstract
Background Protein-protein interactions (PPIs) dominate intracellular molecules to perform a series of tasks such as transcriptional regulation, information transduction, and drug signalling. The traditional wet-lab experimental method of obtaining PPI information is costly and time-consuming. Result In this paper, SDNN-PPI, a PPI prediction method based on self-attention and deep learning, is proposed. The method adopts amino acid composition (AAC), conjoint triad (CT), and auto covariance (AC) to extract global and local features of protein sequences, and leverages self-attention to enhance DNN feature extraction to predict PPIs more effectively. To verify the generalization ability of SDNN-PPI, 5-fold cross-validation on the intraspecific interaction datasets of Saccharomyces cerevisiae (core subset) and human is used to evaluate our model, with accuracy reaching 95.48% and 98.94%, respectively. Accuracies of 93.15% and 88.33% are obtained on the interspecific interaction datasets of human-Bacillus anthracis and human-Yersinia pestis, respectively. On the independent data sets of Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, all prediction accuracies are 100%, which is higher than previous PPI prediction methods. To further evaluate the advantages and disadvantages of the model, the one-core and crossover networks are used to predict PPIs, and the data show that the model correctly predicts the interaction pairs in the networks. Conclusion In this paper, the AAC, CT and AC methods are used to encode the sequence, and the SDNN-PPI method is proposed to predict PPIs based on a self-attention deep learning neural network. Satisfactory results are obtained on interspecific and intraspecific data sets, and good performance is also achieved in cross-species prediction. It can also correctly predict the protein interactions of cell and tumor information contained in the one-core network and the crossover network. The SDNN-PPI proposed in this paper not only explores the mechanism of protein-protein interaction, but also provides new ideas for drug design and disease prevention.
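Of the three sequence encodings named in this abstract, amino acid composition (AAC) is the simplest: it is the relative frequency of each of the 20 standard amino acids in a protein sequence. The sketch below illustrates that definition only; it is not the SDNN-PPI code, and the function name, the handling of non-standard residues (skipped here), and the example sequence are assumptions.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code, in a fixed feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of amino acid frequencies.

    Characters outside the 20 standard residues are ignored (an assumption).
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Hypothetical short peptide: M x1, K x2, L x1, A x2.
features = amino_acid_composition("MKKLAA")
```

Because AAC discards residue order, it captures only global composition, which is consistent with the abstract combining it with conjoint triad and auto covariance features to add local and positional information.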
|
31
|
Multi-TransDTI: Transformer for Drug–Target Interaction Prediction Based on Simple Universal Dictionaries with Multi-View Strategy. Biomolecules 2022; 12:biom12050644. [PMID: 35625572 PMCID: PMC9138327 DOI: 10.3390/biom12050644] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/31/2022] [Revised: 04/19/2022] [Accepted: 04/25/2022] [Indexed: 01/03/2023] Open
Abstract
Prediction of drug–target interactions has always been a crucial link in drug discovery and repositioning, which have witnessed tremendous progress in recent years. Despite many efforts, the existing representation learning and feature generation approaches for both drugs and proteins remain complicated and high-dimensional. In addition, it is difficult for current methods to extract locally important residues from sequence information while remaining focused on global structure. At the same time, massive data is not always easily accessible, which makes model learning from small datasets imminent. As a result, we propose an end-to-end learning model with SUPD and SUDD methods to encode drugs and proteins, which not only leaves out the complicated feature extraction process but also greatly reduces the dimension of the embedding matrix. Meanwhile, we use a multi-view strategy with a transformer to extract locally important residues of proteins for better representation learning. Finally, we evaluate our model on the BindingDB dataset against different state-of-the-art models using comprehensive indicators. On 100% of BindingDB, our AUC, AUPR, ACC, and F1-score reached 90.9%, 89.8%, 84.2%, and 84.3%, respectively, exceeding the average values of the other models by 2.2%, 2.3%, 2.6%, and 2.6%. Moreover, our model also generally surpasses their performance on the 30% and 50% BindingDB datasets.
|
32
|
Wang X, Zhang Z, Zhang C, Meng X, Shi X, Qu P. TransPhos: A Deep-Learning Model for General Phosphorylation Site Prediction Based on Transformer-Encoder Architecture. Int J Mol Sci 2022; 23:ijms23084263. [PMID: 35457080 PMCID: PMC9029334 DOI: 10.3390/ijms23084263] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/10/2022] [Revised: 04/04/2022] [Accepted: 04/09/2022] [Indexed: 02/06/2023] Open
Abstract
Protein phosphorylation is one of the most critical post-translational modifications of proteins in eukaryotes and is essential for a variety of biological processes. Many attempts have been made to improve the performance of computational predictors for phosphorylation site prediction. However, most of them are based on extra domain knowledge or feature selection. In this article, we present a novel deep learning-based predictor, named TransPhos, which is constructed using a transformer encoder and densely connected convolutional neural network blocks, for predicting phosphorylation sites. Experiments are conducted on the PPA (version 3.0) and Phospho.ELM datasets. The experimental results show that TransPhos performs better than several deep learning models, including convolutional neural networks (CNN), long short-term memory networks (LSTM), recurrent neural networks (RNN), and fully connected neural networks (FCNN), and some state-of-the-art deep learning-based prediction tools, including GPS2.1, NetPhos, PPRED, Musite, PhosphoSVM, SKIPHOS, and DeepPhos. Our model achieves good performance on the training datasets of Serine (S), Threonine (T), and Tyrosine (Y), with AUC values of 0.8579, 0.8335, and 0.6953, respectively, using 10-fold cross-validation tests, and demonstrates that the presented TransPhos tool considerably outperforms competing predictors in general protein phosphorylation site prediction.
Affiliation(s)
- Xun Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
  - State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
  - Correspondence:
- Zhiyuan Zhang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- Chaogang Zhang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- Xiangyu Meng
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- Xin Shi
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- Peng Qu
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
|