1
Zomorodi M, Ghodsollahee I, Martin JH, Talley NJ, Salari V, Pławiak P, Rahimi K, Acharya UR. RECOMED: A comprehensive pharmaceutical recommendation system. Artif Intell Med 2024; 157:102981. [PMID: 39306906 DOI: 10.1016/j.artmed.2024.102981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 09/06/2024] [Accepted: 09/10/2024] [Indexed: 11/14/2024]
Abstract
OBJECTIVES To build datasets containing useful information from drug databases and to recommend, with high accuracy, a list of drugs to physicians and patients by considering a wide range of features of people, diseases, and chemicals. METHODS A comprehensive pharmaceutical recommendation system was designed based on features of people, diseases, and medicines extracted from two major drug databases, together with the created datasets of patient and drug information. Recommendations were then generated by recommender-system algorithms using patient and caregiver ratings and knowledge obtained from drug specifications and interactions. In pre-processing, natural language processing approaches were used for sentiment analysis; neural network-based methods and recommender-system algorithms were used to model the system. Patient conditions and medicine features were used to build two models based on matrix factorization. Drug interaction criteria were then used to filter out drugs with severe or mild interactions with other drugs. We developed a deep learning model for recommending drugs using data from 2304 patients as the training set and 660 patients as the validation set. We then combined the model's output with knowledge from drug information in a knowledge-based system whose rules are derived from constraints on taking medicine. RESULTS Our recommendation system can recommend acceptable combinations of medicines similar to existing real-life prescriptions. Compared with conventional matrix factorization, the proposed model improves accuracy, sensitivity, and hit rate by 26 %, 34 %, and 40 %, respectively. Compared with other machine learning methods, it improves accuracy, sensitivity, and hit rate by an average of 31 %, 29 %, and 28 %, respectively. We have open-sourced our implementation in Python.
CONCLUSION Compared with conventional machine learning approaches, our method improved accuracy, sensitivity, and hit rate by an average of 31 %, 29 %, and 28 %, respectively. Compared with conventional matrix factorization, it improved accuracy, sensitivity, and hit rate by 26 %, 34 %, and 40 %, respectively. However, these figures are not the same as clinical accuracy or sensitivity, and more accurate results can be obtained by gathering larger datasets.
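The abstract pairs a matrix-factorization recommender with a drug-interaction filter applied afterwards. The sketch below illustrates that general pattern under stated assumptions: a toy patient-by-drug rating list, a single plain latent-factor model trained by stochastic gradient descent, and a hypothetical set of interacting drug pairs. It is not the RECOMED implementation (which builds two matrix-factorization models from patient conditions and medicine features); `train_mf`, `recommend`, and all data are hypothetical.

```python
import math
import random

# Hypothetical sketch: plain matrix factorization plus an interaction filter.
# Not the RECOMED code; names and data are illustrative assumptions.


def predict(P, Q, u, i):
    """Predicted rating: dot product of patient u's and drug i's latent factors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit latent factors to (patient, drug, rating) triples with SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(P, Q, ratings):
    """Root-mean-square error of the model on the given ratings."""
    sq = [(r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


def recommend(P, Q, u, current_drugs, interactions, top_n=3):
    """Rank drugs for patient u, excluding drugs already taken and any drug
    that forms a listed interacting pair with one of them."""
    scored = []
    for i in range(len(Q)):
        if i in current_drugs:
            continue
        if any(frozenset((i, c)) in interactions for c in current_drugs):
            continue
        scored.append((predict(P, Q, u, i), i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_n]]


# Toy data (hypothetical): 3 patients, 4 drugs, ratings on a 1-5 scale.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 2), (2, 1, 5), (2, 3, 3)]
P, Q = train_mf(ratings, n_users=3, n_items=4)
# Patient 0 takes drug 0; drugs 0 and 2 are assumed to interact.
print(recommend(P, Q, u=0, current_drugs={0}, interactions={frozenset((0, 2))}))
```

Keeping the interaction rule outside the learned model means the filter can be changed (for example, to block only severe interactions) without retraining the factorization.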
Affiliation(s)
- Mariam Zomorodi
- Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Krakow, Poland.
- Jennifer H Martin
- NHMRC Centre for Research Excellence in Digestive Health, Hunter Medical Research Institute (HMRI), The University of Newcastle, Callaghan, New South Wales, Australia
- Nicholas J Talley
- NHMRC Centre for Research Excellence in Digestive Health, Hunter Medical Research Institute (HMRI), The University of Newcastle, Callaghan, New South Wales, Australia
- Vahid Salari
- Institute for Quantum Science and Technology, Department of Physics and Astronomy, University of Calgary, Alberta, Canada
- Paweł Pławiak
- Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Krakow, Poland; Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
- Kazem Rahimi
- Deep Medicine, Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, United Kingdom
- U R Acharya
- School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia
2
Behnoudfar D, Simon CM, Schrier J. Data-Driven Imputation of Miscibility of Aqueous Solutions via Graph-Regularized Logistic Matrix Factorization. J Phys Chem B 2023; 127:7964-7973. [PMID: 37682958 DOI: 10.1021/acs.jpcb.3c03789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/10/2023]
Abstract
Aqueous two-phase systems (ATPSs) may form upon mixing two solutions of independently water-soluble compounds. Many separation, purification, and extraction processes rely on ATPSs. Predicting the miscibility of solutions can accelerate the discovery of new ATPSs for these applications and reduce its cost. Whereas previous machine learning approaches to ATPS prediction used physicochemical properties of each solute as descriptors, in this work we show how to impute missing miscibility outcomes directly from an incomplete collection of pairwise miscibility experiments. We use graph-regularized logistic matrix factorization (GR-LMF) to learn a latent vector for each solution from (i) the observed entries in the pairwise miscibility matrix and (ii) a graph in which each node is a solution and edges encode relationships based on the general category of the solute (i.e., polymer, surfactant, salt, protein). For an experimental data set of the pairwise miscibility of 68 solutions from Peacock et al. [ACS Appl. Mater. Interfaces 2021, 13, 11449-11460], we find that GR-LMF predicts missing (im)miscibility outcomes of pairs of solutions more accurately than ordinary logistic matrix factorization and random forest classifiers that use physicochemical features of the solutes. GR-LMF obviates the need for features of the solutes and solutions to impute missing miscibility outcomes, but it cannot predict the miscibility of a new solution without some observations of its miscibility with other solutions in the training data set.
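A minimal sketch of graph-regularized logistic matrix factorization follows. It assumes, as a simplification not taken from the paper, one latent vector per solution, a predicted probability σ(u_i · u_j) for each pair, a squared-difference penalty between solutions joined by a graph edge, an L2 penalty on all vectors, and plain full-batch gradient descent. The toy labels, hyperparameters, and the names `fit_gr_lmf` and `predict_proba` are hypothetical.

```python
import math
import random

# Hypothetical GR-LMF sketch; loss form and hyperparameters are assumptions.


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_gr_lmf(n, observed, edges, k=2, lam=0.1, gamma=0.01, lr=0.1, epochs=2000, seed=0):
    """Gradient descent on
    sum of BCE(sigmoid(u_i . u_j), y_ij) over observed pairs
    + lam * sum of ||u_a - u_b||^2 over graph edges
    + gamma * sum of ||u_i||^2 over all solutions."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    for _ in range(epochs):
        grad = [[0.0] * k for _ in range(n)]
        for (i, j), y in observed.items():
            g = sigmoid(dot(U[i], U[j])) - y  # derivative of BCE w.r.t. the logit
            for f in range(k):
                grad[i][f] += g * U[j][f]
                grad[j][f] += g * U[i][f]
        for a, b in edges:
            for f in range(k):
                d = U[a][f] - U[b][f]  # graph term pulls linked solutions together
                grad[a][f] += 2 * lam * d
                grad[b][f] -= 2 * lam * d
        for i in range(n):
            for f in range(k):
                U[i][f] -= lr * (grad[i][f] + 2 * gamma * U[i][f])
    return U


def predict_proba(U, i, j):
    return sigmoid(dot(U[i], U[j]))


# Hypothetical toy: solutions 0 and 1 share one solute category, 2 and 3 another.
edges = [(0, 1), (2, 3)]
observed = {(0, 1): 1, (2, 3): 1, (0, 2): 0, (0, 3): 0, (1, 2): 0}
U = fit_gr_lmf(4, observed, edges)
print(round(predict_proba(U, 1, 3), 3))  # imputed outcome for the unobserved pair (1, 3)
```

The pair (1, 3) never enters the loss, so its imputed probability comes entirely from the learned vectors, which are shaped by both the observed entries and the category graph.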
Affiliation(s)
- Diba Behnoudfar
- School of Chemical, Biological, and Environmental Engineering, Oregon State University, Corvallis, Oregon 97331, United States
- Cory M Simon
- School of Chemical, Biological, and Environmental Engineering, Oregon State University, Corvallis, Oregon 97331, United States
- Joshua Schrier
- Department of Chemistry, Fordham University, The Bronx, New York 10458, United States
3
Li H, Zou L, Kowah JAH, He D, Liu Z, Ding X, Wen H, Wang L, Yuan M, Liu X. A compact review of progress and prospects of deep learning in drug discovery. J Mol Model 2023; 29:117. [PMID: 36976427 DOI: 10.1007/s00894-023-05492-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/18/2022] [Accepted: 02/27/2023] [Indexed: 03/29/2023]
Abstract
BACKGROUND Drug discovery processes, such as new drug development, drug synergy, and drug repurposing, consume significant resources every year. Computer-aided drug discovery can effectively improve the efficiency of drug discovery. Traditional computational methods such as virtual screening and molecular docking have achieved many encouraging results in drug development. However, with the rapid growth of computer science, data structures have changed considerably; as data have become larger and higher-dimensional, traditional computational methods no longer apply well. Deep learning methods are built on deep neural network structures that handle high-dimensional data very well, so they are used in current drug development. RESULTS This review summarizes applications of deep learning methods in drug discovery, such as drug target discovery, de novo drug design, drug recommendation, drug synergy, and drug response prediction. Although applying deep learning to drug discovery suffers from a lack of data, transfer learning is an excellent solution to this problem. Furthermore, deep learning methods can extract deeper features and have higher predictive power than other machine learning methods. Deep learning methods have great potential in drug discovery and are expected to advance its development.
Affiliation(s)
- Huijun Li
- College of Medicine, Guangxi University, Nanning, 530004, China
- Lin Zou
- College of Medicine, Guangxi University, Nanning, 530004, China
- Dongqiong He
- College of Chemistry and Chemical Engineering, Guangxi University, Nanning, 530004, China
- Zifan Liu
- College of Medicine, Guangxi University, Nanning, 530004, China
- Xuejie Ding
- College of Medicine, Guangxi University, Nanning, 530004, China
- Hao Wen
- College of Chemistry and Chemical Engineering, Guangxi University, Nanning, 530004, China
- Lisheng Wang
- College of Medicine, Guangxi University, Nanning, 530004, China
- Mingqing Yuan
- College of Medicine, Guangxi University, Nanning, 530004, China
- Xu Liu
- College of Medicine, Guangxi University, Nanning, 530004, China.
4
Sosnina EA, Sosnin S, Fedorov MV. Improvement of multi-task learning by data enrichment: application for drug discovery. J Comput Aided Mol Des 2023; 37:183-200. [PMID: 36943645 DOI: 10.1007/s10822-023-00500-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 02/21/2023] [Indexed: 03/23/2023]
Abstract
Multi-task learning in deep neural networks has become increasingly important in many research fields, including drug discovery. However, applying multi-task learning poses new challenges for improving prediction performance. This study investigated whether enriching the training data can improve the prediction quality of multi-task models in drug discovery. The study evaluated four scenarios with varying information capacity of the training data and used two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, consisting of binary activities of compounds against viral species, for the classification task; and pQSAR(159) and pQSAR(4267), consisting of bioactivities of compounds in assays from the profile-QSAR research, for the regression tasks. We built multi-task models based on feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment can be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data include, the more new compound-target interactions are required to improve prediction. We also found that, even with multi-task learning, the interactions of compounds that are highly dissimilar from those used for model training could not be predicted. The study provides recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.
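The abstract does not state its loss function. A common way to train a multi-task model on a sparse compound-by-target activity matrix is to compute the loss only over measured entries, so that enriching the training data simply adds entries to the supervised set. The plain-Python sketch below illustrates that assumption (the paper's models are PyTorch feed-forward DNNs); `masked_mse` and the toy values are hypothetical.

```python
# Hypothetical masked multi-task loss; not the authors' implementation.


def masked_mse(predictions, targets):
    """Mean squared error over measured (compound, task) entries only.

    predictions and targets are lists of rows, one row per compound and one
    column per task (target); a target of None marks an unmeasured pair and is
    skipped. Returns (loss, number of measured entries)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for p, t in zip(pred_row, target_row):
            if t is None:
                continue  # no label for this compound-target pair
            total += (p - t) ** 2
            count += 1
    return (total / count if count else 0.0), count


# Hypothetical model outputs for 2 compounds x 3 tasks.
preds = [[0.5, 1.0, 2.0], [1.5, 0.0, 1.0]]
# Sparse labels versus the same labels "enriched" with two extra measurements.
sparse = [[1.0, None, None], [None, 0.0, 2.0]]
enriched = [[1.0, 2.0, None], [1.0, 0.0, 2.0]]
print(masked_mse(preds, sparse))    # 3 supervised entries
print(masked_mse(preds, enriched))  # 5 supervised entries
```

In this toy, enrichment raises the number of supervised compound-target entries from 3 to 5 without changing the model or the loss definition.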
Affiliation(s)
- Ekaterina A Sosnina
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026.
- Sergey Sosnin
- Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1190, Vienna, Austria
- Maxim V Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30/1, Moscow, Russia, 143026
- Sirius University of Science and Technology, Olympiisky Prospect 1, Sochi, Russia, 354340
5
Zhang Y, Zhu G, Li K, Li F, Huang L, Duan M, Zhou F. HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction. Brief Bioinform 2022; 23:6581432. [PMID: 35514183 PMCID: PMC9487590 DOI: 10.1093/bib/bbac173] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/28/2021] [Revised: 03/29/2022] [Accepted: 04/18/2022] [Indexed: 12/11/2022]
Abstract
Human Leukocyte Antigen (HLA) molecules reside on the surfaces of most human cells and play an essential role in the immune response to invading agents. T cell antigen receptors may recognize HLA-peptide complexes on the surfaces of cancer cells, leading to the destruction of these cells by cytotoxic T lymphocytes. Computational identification of HLA-binding peptides will facilitate the rapid development of cancer immunotherapies. This study hypothesized that peptide features encoded by natural language processing models may be further enriched by another deep neural network. The hypothesis was tested using Bi-directional Long Short-Term Memory (BiLSTM) features extracted from the pretrained Protein Bidirectional Encoder Representations from Transformers (ProtBert) encodings of class I HLA (HLA-I)-binding peptides. The experimental data showed that the proposed HLAB feature engineering algorithm outperformed existing ones in detecting HLA-I-binding peptides. In an extensive evaluation, HLAB outperformed all seven existing studies in AUC for predicting peptides that bind the HLA-A*01:01 allele, and it achieved the best average AUC on six of the seven k-mer tasks (k = 8, 9, ..., 14, where each k denotes predicting binding for a peptide of k amino acids), the exception being the 9-mer task. The source code and the fine-tuned feature extraction models are available at http://www.healthinformaticslab.org/supp/resources.php.
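The evaluation above reports AUC separately for each peptide length k. The sketch below shows one way to compute per-k-mer AUC values from a model's scores, using the Mann-Whitney form of ROC AUC (ties count as one half). It evaluates predictions only and does not reproduce the HLAB ProtBert/BiLSTM pipeline; `auc`, `auc_by_length`, and the toy peptides, labels, and scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical evaluation helper; not part of the HLAB code.


def auc(labels, scores):
    """ROC AUC as the probability that a random positive (binder) outscores a
    random negative (non-binder); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_length(peptides, labels, scores):
    """Group predictions by peptide length k and compute one AUC per k-mer task."""
    groups = defaultdict(lambda: ([], []))
    for pep, y, s in zip(peptides, labels, scores):
        groups[len(pep)][0].append(y)
        groups[len(pep)][1].append(s)
    return {k: auc(ys, ss) for k, (ys, ss) in sorted(groups.items())}


# Toy 8-mers and 9-mers with invented binding labels and model scores.
peptides = ["ACDEFGHI", "KLMNPQRS", "TVWYACDE", "ACDEFGHIK", "LMNPQRSTV", "WYACDEFGH"]
labels = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.8, 0.5]
print(auc_by_length(peptides, labels, scores))  # {8: 1.0, 9: 0.5}
```

Averaging or comparing these per-length values across methods is what statements such as "best average AUC on six of the seven k-mer tasks" summarize.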
Affiliation(s)
- Yaqi Zhang
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Gancheng Zhu
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Kewei Li
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Fei Li
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Lan Huang
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Meiyu Duan
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Fengfeng Zhou
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
6
Barros M, Moitinho A, Couto FM. Hybrid semantic recommender system for chemical compounds in large-scale datasets. J Cheminform 2021; 13:15. [PMID: 33622374 PMCID: PMC7903631 DOI: 10.1186/s13321-021-00495-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/03/2020] [Accepted: 02/10/2021] [Indexed: 12/16/2022]
Abstract
The large and increasing number of chemical compounds poses challenges for exploring such datasets. In this work, we propose using recommender systems to identify compounds of interest to scientific researchers. Our approach is a hybrid recommender model suited to implicit-feedback datasets and focused on retrieving a list ranked by item relevance. The model integrates collaborative-filtering algorithms for implicit feedback (Alternating Least Squares and Bayesian Personalized Ranking) with a new content-based algorithm that uses the semantic similarity between chemical compounds in the ChEBI ontology. The algorithms were assessed on CheRM-20, an implicit-feedback dataset of more than 16,000 items (chemical compounds). The hybrid model improved on the collaborative-filtering algorithms by more than ten percentage points in most of the evaluation metrics assessed.
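The abstract describes combining collaborative-filtering scores with a content-based score built from semantic similarity, but not the exact combination rule. The sketch below assumes one simple rule: min-max normalize each score, take the content score as a candidate's mean similarity to the researcher's past compounds, and blend the two with a weight `alpha`. The CF scores are taken as precomputed (for example, by ALS or BPR), and a small lookup table stands in for ChEBI-based similarity; `hybrid_rank`, `SIM`, and all data are hypothetical.

```python
# Hypothetical sketch of a hybrid recommender score; not the authors' code.


def minmax(scores):
    """Rescale an {item: score} dict to [0, 1]; constant scores all map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def content_scores(candidates, history, similarity):
    """Mean semantic similarity of each candidate to the user's past items.
    similarity(a, b) is assumed to return a value in [0, 1]."""
    return {c: sum(similarity(c, h) for h in history) / len(history) for c in candidates}


def hybrid_rank(cf, history, similarity, alpha=0.5):
    """Blend normalized collaborative-filtering (CF) and content-based scores as
    alpha * CF + (1 - alpha) * content and return candidates, best first."""
    candidates = [c for c in cf if c not in history]
    cf_n = minmax({c: cf[c] for c in candidates})
    cb_n = minmax(content_scores(candidates, history, similarity))
    blended = {c: alpha * cf_n[c] + (1 - alpha) * cb_n[c] for c in candidates}
    return sorted(candidates, key=blended.get, reverse=True)


# Toy lookup table standing in for ontology-based semantic similarity.
SIM = {frozenset(("a", "x")): 0.9, frozenset(("a", "y")): 0.1, frozenset(("a", "z")): 0.6}


def similarity(c, h):
    return SIM.get(frozenset((c, h)), 0.0)


cf = {"a": 1.0, "x": 0.2, "y": 0.8, "z": 0.6}  # precomputed CF scores
history = {"a"}  # compounds the researcher has already interacted with
print(hybrid_rank(cf, history, similarity, alpha=1.0))  # ['y', 'z', 'x']
print(hybrid_rank(cf, history, similarity, alpha=0.3))  # ['x', 'z', 'y']
```

With `alpha=1.0` the ranking is pure collaborative filtering; lowering `alpha` lets compounds that are semantically close to the researcher's history (here `x`) rise even when their CF score is low.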
Affiliation(s)
- Marcia Barros
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal; CENTRA, Departamento de Física, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
- Andre Moitinho
- CENTRA, Departamento de Física, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
- Francisco M Couto
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
7
Elbadawi M, Gaisford S, Basit AW. Advanced machine-learning techniques in drug discovery. Drug Discov Today 2020; 26:769-777. [PMID: 33290820 DOI: 10.1016/j.drudis.2020.12.003] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 08/24/2020] [Revised: 11/16/2020] [Accepted: 12/02/2020] [Indexed: 01/20/2023]
Abstract
The popularity of machine learning (ML) across drug discovery continues to grow, yielding impressive results. As the use of ML techniques increases, so too do their limitations become apparent. These limitations include their need for big data, data sparsity, and their lack of interpretability. It has also become apparent that the techniques are not truly autonomous and require retraining even after deployment. In this review, we detail the use of advanced techniques to circumvent these challenges, with examples drawn from drug discovery and allied disciplines. In addition, we present emerging techniques and their potential role in drug discovery. The techniques presented herein are anticipated to expand the applicability of ML in drug discovery.
Affiliation(s)
- Moe Elbadawi
- Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Simon Gaisford
- Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK; FabRx Ltd, 3 Romney Road, Ashford, TN24 0RW, UK
- Abdul W Basit
- Department of Pharmaceutics, UCL School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK; FabRx Ltd, 3 Romney Road, Ashford, TN24 0RW, UK.