1
Xie W, Yu J, Huang L, For LS, Zheng Z, Chen X, Wang Y, Liu Z, Peng C, Wong KC. DeepSeq2Drug: An expandable ensemble end-to-end anti-viral drug repurposing benchmark framework by multi-modal embeddings and transfer learning. Comput Biol Med 2024; 175:108487. [PMID: 38653064 DOI: 10.1016/j.compbiomed.2024.108487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 03/26/2024] [Accepted: 04/15/2024] [Indexed: 04/25/2024]
Abstract
Drug repurposing is promising in multiple scenarios, such as controlling emerging viral outbreaks and reducing the cost of drug discovery. Traditional graph-based drug repurposing methods are poorly suited to fast, large-scale virtual screens, as they constrain the numbers of drugs and targets and cannot make predictions for novel viruses or drugs. Moreover, although deep learning has been proposed for drug repurposing, only a few methods have been used, including a group of pre-trained deep learning models for embedding generation and transfer learning. Hence, we propose DeepSeq2Drug to tackle the shortcomings of previous methods. We leverage multi-modal embeddings and an ensemble strategy to expand the numbers of drugs and viruses covered and to support predictions for novel ones. This framework (including the expanded version) involves four modal types: six NLP models, four CV models, four graph models, and two sequence models. In detail, we first build a pipeline and calculate the predictive performance of each pair of viral and drug embeddings. Then, we select the best embedding pairs and apply an ensemble strategy to conduct anti-viral drug repurposing. To validate the proposed ensemble model, a monkeypox virus (MPV) case study is conducted to illustrate its potential predictive capability. This framework could serve as a benchmark method for further pre-trained deep learning optimization and anti-viral drug repurposing tasks. We also provide software to make the proposed model easier to reuse. The code and software are freely available at http://deepseq2drug.cs.cityu.edu.hk.
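The select-then-ensemble step described in the abstract can be sketched as follows. This is a minimal illustration, not the DeepSeq2Drug implementation: the embedding names, validation scores, per-drug predictions, the top-k selection rule, and the use of a plain average as the ensemble are all assumptions made for the example.

```python
from statistics import mean


def select_top_pairs(pair_scores, k):
    """Return the k (viral embedding, drug embedding) pairs with the highest score."""
    ranked = sorted(pair_scores.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:k]]


def ensemble_predict(pair_predictions, selected_pairs):
    """Average each drug's predicted score over the selected embedding pairs."""
    drugs = pair_predictions[selected_pairs[0]].keys()
    return {d: mean(pair_predictions[p][d] for p in selected_pairs) for d in drugs}


# Hypothetical validation scores for three embedding pairs (illustrative values only).
pair_scores = {
    ("nlp_a", "graph_x"): 0.81,
    ("cv_b", "graph_x"): 0.74,
    ("seq_c", "nlp_y"): 0.88,
}
# Hypothetical per-drug scores produced by the model trained on each pair.
pair_predictions = {
    ("nlp_a", "graph_x"): {"drug_1": 0.9, "drug_2": 0.2},
    ("cv_b", "graph_x"): {"drug_1": 0.4, "drug_2": 0.6},
    ("seq_c", "nlp_y"): {"drug_1": 0.7, "drug_2": 0.4},
}

best_pairs = select_top_pairs(pair_scores, k=2)
ensemble_scores = ensemble_predict(pair_predictions, best_pairs)
```

Averaging is only one possible combination rule; weighted voting or stacking would fit the same interface.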
Affiliation(s)
- Weidun Xie
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Jixiang Yu
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Lei Huang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Lek Shyuen For
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Zetian Zheng
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Xingjian Chen
  - Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Yuchen Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
- Zhichao Liu
  - Sir William Dunn School of Pathology, University of Oxford, UK
- Chengbin Peng
  - College of Information Science and Engineering, Ningbo University, Ningbo, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China; Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China; Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
2
Kovtun V, Grochla K, Kharchenko V, Haq MA, Semenov A. Stochastic forecasting of variable small data as a basis for analyzing an early stage of a cyber epidemic. Sci Rep 2023; 13:22810. [PMID: 38129492 PMCID: PMC10739954 DOI: 10.1038/s41598-023-49007-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/29/2023] [Accepted: 12/02/2023] [Indexed: 12/23/2023] Open
Abstract
Security Information and Event Management (SIEM) technologies play an important role in the architecture of modern cyber protection tools. One of the main scenarios for the use of SIEM is the detection of attacks on protected information infrastructure. Considering that the ISO 27001, NIST SP 800-61, and NIST SP 800-83 standards objectively do not keep up with the evolution of cyber threats, research aimed at forecasting the development of cyber epidemics is relevant. The article proposes a stochastic concept for describing variable small data on the basis of Shannon entropy. The core of the concept is the description of small data by linear differential equations with stochastic characteristic parameters. The practical value of the proposed concept is embodied in a method for forecasting the development of a cyber epidemic at an early stage (under a lack of empirical information). In the context of the research object, the stochastic characteristic parameters of the model are the generation rate, the death rate, and the independent coefficient of variability of the measurement of the initial parameter of the research object. Analytical expressions for estimating the probability distribution densities of these characteristic parameters are proposed. It is assumed that these stochastic parameters of the model are defined on intervals, which allows for manipulation of the nature and type of the corresponding probability density functions. The task of finding the optimal, maximum-entropy probability density functions of the characteristic parameters of the model is formulated. The proposed method allows for generating sets of trajectories of values of the characteristic parameters with optimal probability density functions. The example demonstrates both the flexibility and reliability of the proposed concept and method in comparison with the numerical series forecasting approaches implemented in Matlab's built-in functions.
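The idea of a linear differential equation whose rate parameters are random variables confined to intervals can be illustrated with a short sketch. Everything concrete here is an assumption, since the abstract does not state the equations: the model is taken to be dx/dt = (g - d)x with generation rate g and death rate d, both rates are drawn from uniform densities (the maximum-entropy density on a bounded interval when no other constraint is imposed), and the measurement-variability coefficient is omitted. The interval bounds are illustrative.

```python
import math
import random


def sample_trajectories(x0, gen_interval, death_interval, times, n, seed=0):
    """Draw n trajectories of x(t) = x0 * exp((g - d) * t) with interval-bounded rates.

    g and d are sampled uniformly from their intervals (assumed densities).
    """
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n):
        g = rng.uniform(*gen_interval)    # generation rate for this trajectory
        d = rng.uniform(*death_interval)  # death rate for this trajectory
        trajectories.append([x0 * math.exp((g - d) * t) for t in times])
    return trajectories


times = [0.0, 1.0, 2.0, 3.0]
paths = sample_trajectories(
    x0=10.0,
    gen_interval=(0.3, 0.5),    # hypothetical bounds
    death_interval=(0.1, 0.2),  # hypothetical bounds
    times=times,
    n=100,
)
```

The spread of the resulting trajectories at each time point gives an empirical forecast band; the article's method additionally optimizes the densities themselves, which this sketch does not attempt.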
Affiliation(s)
- Viacheslav Kovtun
  - Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
- Krzysztof Grochla
  - Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
- Mohd Anul Haq
  - College of Computer and Information Sciences, Majmaah University, Al Majma'ah, Saudi Arabia
- Andriy Semenov
  - Vinnytsia National Technical University, Vinnytsia, Ukraine
3
Qu J, Song Z, Cheng X, Jiang Z, Zhou J. A new integrated framework for the identification of potential virus-drug associations. Front Microbiol 2023; 14:1179414. [PMID: 37675432 PMCID: PMC10478006 DOI: 10.3389/fmicb.2023.1179414] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/04/2023] [Accepted: 07/31/2023] [Indexed: 09/08/2023] Open
Abstract
Introduction: With the increasingly serious problem of antiviral drug resistance, drug repurposing offers a time-efficient and cost-effective way to find potential therapeutic agents for disease. Computational models can quickly predict reusable drug candidates to treat diseases. Methods: In this study, two matrix decomposition-based methods, i.e., Matrix Decomposition with Heterogeneous Graph Inference (MDHGI) and Bounded Nuclear Norm Regularization (BNNR), were integrated to predict anti-viral drugs. Moreover, global leave-one-out cross-validation (LOOCV), local LOOCV, and 5-fold cross-validation were implemented to evaluate the performance of the proposed model on the DrugVirus dataset, which consists of 933 known associations between 175 drugs and 95 viruses. Results: The areas under the receiver operating characteristic curve (AUC) for global LOOCV and local LOOCV are 0.9035 and 0.8786, respectively. The 5-fold cross-validation AUC (mean ± standard deviation) on the DrugVirus dataset is 0.8856 ± 0.0032. We further implemented cross-validation on MDAD and aBiofilm to evaluate the performance of the model. In particular, the MDAD (aBiofilm) dataset contains 2,470 (2,884) known associations between 1,373 (1,470) drugs and 173 (140) microbes. In addition, two types of case studies were carried out to further verify the effectiveness of the model on the DrugVirus and MDAD datasets. The results of the case studies supported the effectiveness of MHBVDA in identifying potential virus-drug associations as well as predicting potential drugs for new microbes.
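The AUC values reported above summarize how often a known association is scored above a non-association. A minimal sketch of that calculation, using the pairwise (Mann-Whitney) definition of AUC, is given below. The function name and the toy scores are illustrative assumptions, and the code does not reproduce the article's LOOCV protocol or its predictor.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Each (positive, negative) pair counts 1 if the positive is scored higher,
    0.5 on a tie, and 0 otherwise; the AUC is the mean over all such pairs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        (1.0 if p > n else 0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: label 1 marks a held-out known association, label 0 a candidate pair.
toy_scores = [0.9, 0.2, 0.6, 0.4]
toy_labels = [1, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

In a leave-one-out setting, each known association would be hidden in turn, scored by the retrained model, and ranked against the unknown pairs before a calculation of this kind is applied.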
Affiliation(s)
- Jia Qu
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Zihao Song
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Xiaolong Cheng
  - School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China
- Zhibin Jiang
  - School of Computer Science and Engineering, Shaoxing University, Shaoxing, Zhejiang, China
- Jie Zhou
  - School of Computer Science and Engineering, Shaoxing University, Shaoxing, Zhejiang, China
4
Choo HY, Wee J, Shen C, Xia K. Fingerprint-Enhanced Graph Attention Network (FinGAT) Model for Antibiotic Discovery. J Chem Inf Model 2023; 63:2928-2935. [PMID: 37167016 DOI: 10.1021/acs.jcim.3c00045] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/13/2023]
Abstract
Artificial Intelligence (AI) techniques have great potential to fundamentally change the antibiotic discovery industry. Efficient and effective molecular featurization is key to all highly accurate learning models for antibiotic discovery. In this paper, we propose a fingerprint-enhanced graph attention network (FinGAT) model that combines sequence-based 2D fingerprints with a structure-based graph representation. In our feature learning process, sequence information is transformed into a fingerprint vector, and structural information is encoded through a GAT module into another vector. These two vectors are concatenated and fed into a multilayer perceptron (MLP) for antibiotic activity classification. Our model is extensively tested and compared with existing models. We find that FinGAT can outperform various state-of-the-art GNN models in antibiotic discovery.
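The data flow described in the abstract (a fingerprint vector and a graph vector, concatenated and passed to an MLP) can be sketched in plain Python. This is a toy under loud assumptions, not the FinGAT model: `toy_fingerprint` hashes character bigrams of a SMILES string rather than computing a chemical fingerprint, `mean_pool` is a placeholder standing in for the GAT module, and the MLP weights are random rather than trained.

```python
import math
import random
import zlib


def toy_fingerprint(smiles, n_bits=16):
    """Hash character bigrams of a SMILES string into a fixed-length bit vector.

    A toy stand-in for a 2D fingerprint; not a real cheminformatics fingerprint.
    """
    bits = [0.0] * n_bits
    for i in range(len(smiles) - 1):
        bits[zlib.crc32(smiles[i:i + 2].encode()) % n_bits] = 1.0
    return bits


def mean_pool(node_features):
    """Placeholder for the GAT encoder: average node feature vectors into one vector."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]


def mlp_forward(x, w_hidden, w_out):
    """One ReLU hidden layer followed by a sigmoid output (bias terms omitted)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


# Ethanol as a toy molecule: SMILES string plus one-hot atom features (C, O).
fingerprint_vec = toy_fingerprint("CCO")
graph_vec = mean_pool([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
features = fingerprint_vec + graph_vec  # concatenation of the two vectors

rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in features] for _ in range(4)]  # untrained weights
w_out = [rng.uniform(-1, 1) for _ in range(4)]
activity_probability = mlp_forward(features, w_hidden, w_out)
```

The point of the concatenation is that the classifier sees both views of the molecule at once; either encoder can be swapped out without changing the downstream MLP interface.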
Affiliation(s)
- Hou Yee Choo
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
- JunJie Wee
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
- Cong Shen
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
5
A computationally efficient method for assessing the impact of an active viral cyber threat on a high-availability cluster. Egyptian Informatics Journal 2022. [DOI: 10.1016/j.eij.2022.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
6
Das B, Kutsal M, Das R. A geometric deep learning model for display and prediction of potential drug-virus interactions against SARS-CoV-2. Chemometrics and Intelligent Laboratory Systems: An International Journal Sponsored by the Chemometrics Society 2022; 229:104640. [PMID: 36042844 PMCID: PMC9400382 DOI: 10.1016/j.chemolab.2022.104640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 08/17/2022] [Accepted: 08/19/2022] [Indexed: 05/04/2023]
Abstract
Although the coronavirus epidemic spread rapidly with the Omicron variant, its lethality declined owing to vaccination and immunity. Hospitalization and intense demand decreased. However, there is no definite information about when this disease will end or how dangerous the different variants could be. In addition, it is not possible to eliminate the risk of variants that will continue to circulate among animals in nature. At this stage, drug-virus interactions should be examined in order to prepare for possible new viruses and variants and to rapidly produce drugs or vaccines against them. Whereas experimental methods are expensive, laborious, and time-consuming, geometric deep learning (GDL) is an alternative that can make this process faster and cheaper. In this study, we propose a new model based on geometric deep learning for the prediction of drug-virus interactions against COVID-19. First, we use the antiviral drug data in the SMILES molecular structure representation to generate a large number of features and better describe the structure of chemical species. Then the data is converted into a molecular representation and then into a graph structure that the GDL model can process. The node feature vectors are mapped to a different space by a Message Passing Neural Network (MPNN) for training. We develop a geometric neural network architecture in which the graph embedding values are passed through a fully connected layer to produce the prediction. The results indicate that the proposed method outperforms existing methods, with 97% accuracy in predicting drug-virus interactions.
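A single round of message passing on a molecular graph, followed by a graph-level readout, can be sketched as below. This is a simplified illustration, not the article's architecture: it uses a fixed sum aggregation in place of the learned message and update functions of an MPNN, the atom features are hypothetical one-hot vectors, and the readout is a plain sum.

```python
def message_passing_step(node_features, edges):
    """One sum-aggregation round over an undirected graph.

    Each node's new vector is its own vector plus the sum of its neighbours'
    vectors (a fixed rule standing in for learned message/update functions).
    """
    updated = [list(vec) for vec in node_features]
    for u, v in edges:  # each edge is an undirected bond between atoms u and v
        for k in range(len(node_features[u])):
            updated[u][k] += node_features[v][k]
            updated[v][k] += node_features[u][k]
    return updated


def readout(node_features):
    """Sum node vectors into a single graph-level embedding."""
    return [sum(column) for column in zip(*node_features)]


# Ethanol (C-C-O) with one-hot atom features [is_carbon, is_oxygen].
atoms = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1), (1, 2)]

hidden = message_passing_step(atoms, bonds)
graph_embedding = readout(hidden)
```

In a trained model, a graph embedding of this kind would be passed through a fully connected layer to yield the interaction prediction.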
Affiliation(s)
- Bihter Das
  - Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey
- Mucahit Kutsal
  - Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey
- Resul Das
  - Department of Software Engineering, Technology Faculty, Firat University, 23119, Elazig, Turkey
7
Cho YR, Hu X. Network-based approaches in bioinformatics and biomedicine. Methods 2021; 198:1-2. [PMID: 34958915 DOI: 10.1016/j.ymeth.2021.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022] Open
Affiliation(s)
- Young-Rae Cho
  - Division of Software, Yonsei University - Mirae Campus, Wonju, Republic of Korea
- Xiaohua Hu
  - College of Computing & Informatics, Drexel University, Philadelphia, PA, USA