1. Genç M. Penalized logistic regression with prior information for microarray gene expression classification. Int J Biostat 2024; 20:107-122. [PMID: 36427223 DOI: 10.1515/ijb-2022-0025] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/15/2022] [Accepted: 11/07/2022] [Indexed: 02/17/2024]
Abstract
Cancer classification and gene selection are important applications in DNA microarray gene expression data analysis. Since DNA microarray data suffers from the high-dimensionality problem, automatic gene selection methods are used to enhance the classification performance of expert classifier systems. In this paper, a new penalized logistic regression method that performs simultaneous gene coefficient estimation and variable selection in DNA microarray data is discussed. The method employs prior information about the gene coefficients to improve the classification accuracy of the underlying model. The coordinate descent algorithm with screening rules is given to obtain the gene coefficient estimates of the proposed method efficiently. The performance of the method is examined on five high-dimensional cancer classification datasets using the area under the curve, the number of selected genes, misclassification rate and F-score measures. The real data analysis results indicate that the proposed method achieves a good cancer classification performance with a small misclassification rate, large area under the curve and F-score by trading off some sparsity level of the underlying model. Hence, the proposed method can be seen as a reliable penalized logistic regression method in the scope of high-dimensional cancer classification.
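As a rough illustration of the kind of objective this abstract describes, a penalized logistic regression that performs selection while pulling coefficients toward prior values could be written as below. The specific penalty (an L1 term plus a quadratic term centered on a prior vector), and the symbols $\lambda_1$, $\lambda_2$ and $\beta^{0}$, are assumptions made for illustration; the abstract does not state the exact form used in the paper.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{
  -\frac{1}{n}\sum_{i=1}^{n}\Bigl[ y_i\, x_i^{\top}\beta - \log\bigl(1 + e^{x_i^{\top}\beta}\bigr) \Bigr]
  + \lambda_1 \lVert \beta \rVert_1
  + \frac{\lambda_2}{2} \lVert \beta - \beta^{0} \rVert_2^2
\right\}
```

Under this assumed form, setting $\beta^{0} = 0$ gives an elastic-net-type penalty, which is one way to see how prior information could trade some sparsity for accuracy.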
Affiliation(s)
- Murat Genç
  - Department of Management Information Systems, Faculty of Economics and Administrative Sciences, Tarsus University Mersin, Mersin 33400, Türkiye
2. Yu R, Cai L, Gong Y, Sun X, Li K, Cao Q, Yang X, Lu Q. MRI-Based Machine Learning Radiomics for Preoperative Assessment of Human Epidermal Growth Factor Receptor 2 Status in Urothelial Bladder Carcinoma. J Magn Reson Imaging 2024. [PMID: 38456745 DOI: 10.1002/jmri.29342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 02/26/2024] [Accepted: 02/26/2024] [Indexed: 03/09/2024] Open
Abstract
BACKGROUND: Human epidermal growth factor receptor 2 (HER2) has recently emerged as a hotspot in targeted therapy for urothelial bladder cancer (UBC). HER2 status is mainly identified by immunohistochemistry (IHC); preoperative, noninvasive methods for determining HER2 status in UBC are still being sought.
PURPOSE: To investigate whether radiomics features extracted from MRI using machine learning algorithms can noninvasively evaluate the HER2 status in UBC.
STUDY TYPE: Retrospective.
POPULATION: One hundred ninety-five patients (age: 68.7 ± 10.5 years) with 14.3% females from January 2019 to May 2023 were divided into training (N = 156) and validation (N = 39) cohorts, and 43 patients (age: 67.1 ± 13.1 years) with 13.9% females from June 2023 to January 2024 constituted the test cohort (N = 43).
FIELD STRENGTH/SEQUENCE: 3 T, T2-weighted imaging (turbo spin-echo), diffusion-weighted imaging (breathing-free spin echo).
ASSESSMENT: HER2 status was assessed by IHC. Radiomics features were extracted from MRI images. The Pearson correlation coefficient and the least absolute shrinkage and selection operator (LASSO) were applied for feature selection, and six machine learning models were established with the optimal features to identify the HER2 status in UBC.
STATISTICAL TESTS: Mann-Whitney U-test, chi-square test, LASSO algorithm, receiver operating characteristic analysis, and DeLong test.
RESULTS: Three thousand forty-five radiomics features were extracted from each lesion, and 22 features were retained for analysis. The Support Vector Machine model demonstrated the best performance, with an AUC of 0.929 (95% CI: 0.888-0.970) and accuracy of 0.859 in the training cohort, AUC of 0.886 (95% CI: 0.780-0.993) and accuracy of 0.846 in the validation cohort, and AUC of 0.712 (95% CI: 0.535-0.889) and accuracy of 0.744 in the test cohort.
DATA CONCLUSION: MRI-based radiomics features combined with machine learning algorithms provide a promising approach to assess HER2 status in UBC noninvasively and preoperatively.
EVIDENCE LEVEL: 2
TECHNICAL EFFICACY: Stage 3.
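The abstract above lists both the Mann-Whitney U-test and the AUC. The two are closely related: the AUC equals the fraction of (positive, negative) pairs in which the positive case receives the higher score, which is the normalized U statistic. The sketch below shows that pairwise definition in plain Python; the function name `auc_from_scores` and the toy inputs are hypothetical and do not come from the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win,
    matching the normalized Mann-Whitney U statistic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and classifier scores, invented for illustration only.
    print(auc_from_scores([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

This quadratic-time form is meant for clarity; a rank-based computation would be preferable for large cohorts.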
Affiliation(s)
- Ruixi Yu
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Lingkai Cai
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
  - Department of Urology, Wuxi Medical Center, Nanjing Medical University, Wuxi, China
- Yuxi Gong
  - Department of Pathology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Xueying Sun
  - Department of Radiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Kai Li
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Qiang Cao
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Xiao Yang
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Qiang Lu
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
3. Zhu W, Wang Y, Niu Y, Zhang L, Liu Z. Current Trends and Challenges in Drug-Likeness Prediction: Are They Generalizable and Interpretable? Health Data Science 2023; 3:0098. [PMID: 38487200 PMCID: PMC10880170 DOI: 10.34133/hds.0098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 10/20/2023] [Indexed: 03/17/2024]
Abstract
Importance: Drug-likeness of a compound is an overall assessment of its potential to succeed in clinical trials, and is essential for economizing research expenditures by filtering out compounds with unfavorable properties and poor development potential. To this end, a robust drug-likeness prediction method is indispensable. Various approaches, including discriminative rules, statistical models, and machine learning models, have been developed to predict drug-likeness based on physicochemical properties and structural features. Notably, recent advancements in novel deep learning techniques have significantly advanced drug-likeness prediction, especially in classification performance.
Highlights: In this review, we addressed the evolving landscape of drug-likeness prediction, with emphasis on methods employing novel deep learning techniques, and highlighted the current challenges in drug-likeness prediction, specifically regarding generalization and interpretability. Moreover, we explored potential remedies and outlined promising avenues for future research.
Conclusion: Despite the hurdles of generalization and interpretability, novel deep learning techniques have great potential in drug-likeness prediction and are worthy of further research efforts.
Affiliation(s)
- Wenyu Zhu
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Yanxing Wang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Yan Niu
  - Department of Medicinal Chemistry, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Liangren Zhang
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Zhenming Liu
  - State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
4. Mishra SS, Kumar N, Karkara BB, Sharma CS, Kalra S. Identification of potential inhibitors of Zika virus targeting NS3 helicase using molecular dynamics simulations and DFT studies. Mol Divers 2023; 27:1689-1701. [PMID: 36063275 DOI: 10.1007/s11030-022-10522-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 08/26/2022] [Indexed: 10/14/2022]
Abstract
Despite various research efforts toward drug discovery for Zika virus (ZIKV) treatment, no antiviral drugs or vaccines have yet been discovered. The spread of the mosquito vector and exposure to ZIKV infection are expected to accelerate globally due to continuing global travel. The NS3 helicase (NS3-Hel) is part of a non-structural protein and is involved in functions such as polyprotein processing and genome replication, which makes it an attractive target for designing novel drugs for ZIKV treatment. This investigation identifies novel, potent ZIKV inhibitors by virtual screening and elucidates their binding patterns using molecular docking and molecular dynamics simulation studies. The molecular dynamics simulation results indicate that the protein-ligand complexes are dynamically stable and that the structures remain largely unchanged at the binding site during the simulation period. All inhibitors were found to be within the acceptable range for drug-likeness properties. The synthetic feasibility scores suggest that all screened inhibitors can be easily synthesized. Therefore, the inhibitors obtained from this study can be considered potential inhibitors of NS3-Hel and could serve as leads for drug development.
Affiliation(s)
- Shashank Shekher Mishra
  - Department of Pharmaceutical Chemistry, School of Pharmaceutical & Populations Health Informatics, DIT University, Dehradun, 248009, India
- Neeraj Kumar
  - Department of Pharmaceutical Chemistry, Bhupal Nobles' College of Pharmacy, Bhupal Nobles' University, Udaipur, 313001, India
- Bidhu Bhusan Karkara
  - Department of Pharmaceutical Sciences, Vignan's Foundation for Science, Technology and Research, Vadlamudi, Guntur, 522213, India
- C S Sharma
  - Department of Pharmaceutical Chemistry, Bhupal Nobles' College of Pharmacy, Bhupal Nobles' University, Udaipur, 313001, India
- Sourav Kalra
  - National Institute of Pharmaceutical Education & Research, Mohali, Punjab, India
5. Mostafaei S, Hoang MT, Jurado PG, Xu H, Zacarias-Pons L, Eriksdotter M, Chatterjee S, Garcia-Ptacek S. Machine learning algorithms for identifying predictive variables of mortality risk following dementia diagnosis: a longitudinal cohort study. Sci Rep 2023; 13:9480. [PMID: 37301891 PMCID: PMC10257644 DOI: 10.1038/s41598-023-36362-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 06/02/2023] [Indexed: 06/12/2023] Open
Abstract
Machine learning (ML) could have advantages over traditional statistical models in identifying risk factors. Using ML algorithms, our objective was to identify the most important variables associated with mortality after dementia diagnosis in the Swedish Registry for Cognitive/Dementia Disorders (SveDem). From SveDem, a longitudinal cohort of 28,023 dementia-diagnosed patients was selected for this study. Sixty variables were considered as potential predictors of mortality risk, such as age at dementia diagnosis, dementia type, sex, body mass index (BMI), mini-mental state examination (MMSE) score, time from referral to initiation of work-up, time from initiation of work-up to diagnosis, dementia medications, comorbidities, and some specific medications for chronic comorbidities (e.g., cardiovascular disease). We applied sparsity-inducing penalties for three ML algorithms and identified twenty important variables for the binary classification task in mortality risk prediction and fifteen variables to predict time to death. The area under the ROC curve (AUC) was used to evaluate the classification algorithms. Then, an unsupervised clustering algorithm was applied to the set of twenty selected variables to find two main clusters, which accurately matched the surviving and dead patient clusters. A support vector machine with an appropriate sparsity penalty provided the classification of mortality risk with accuracy = 0.7077, AUROC = 0.7375, sensitivity = 0.6436, and specificity = 0.740. Across the three ML algorithms, the majority of the twenty identified variables were compatible with the literature and with our previous studies on SveDem. We also found new variables that were not previously reported in the literature as associated with mortality in dementia.
Performance of basic dementia diagnostic work-up, time from referral to initiation of work-up, and time from initiation of work-up to diagnosis were found to be elements of the diagnostic process identified by the ML algorithms. The median follow-up time was 1053 (IQR = 516-1771) days in surviving and 1125 (IQR = 605-1770) days in dead patients. For prediction of time to death, the CoxBoost model identified 15 variables and classified them in order of importance. These highly important variables were age at diagnosis, MMSE score, sex, BMI, and Charlson Comorbidity Index with selection scores of 23%, 15%, 14%, 12% and 10%, respectively. This study demonstrates the potential of sparsity-inducing ML algorithms in improving our understanding of mortality risk factors in dementia patients and their application in clinical settings. Moreover, ML methods can be used as a complement to traditional statistical methods.
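The accuracy, sensitivity, and specificity figures quoted in this abstract are all derived from the four cells of a binary confusion matrix. A minimal sketch of those standard definitions follows; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the SveDem study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn are positives predicted correctly/incorrectly,
    tn/fp are negatives predicted correctly/incorrectly.
    """
    if (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # true positive rate (recall)
        "specificity": tn / (tn + fp),    # true negative rate
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```

Reporting sensitivity and specificity alongside accuracy matters when the outcome classes are imbalanced, since accuracy alone can hide poor performance on the smaller class.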
Affiliation(s)
- Shayan Mostafaei
  - Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Minh Tuan Hoang
  - Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Pol Grau Jurado
  - Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden
- Hong Xu
  - Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden
- Lluis Zacarias-Pons
  - Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden
  - Vascular Health Research Group of Girona (ISV-Girona), Institut Universitari d'Investigació en Atenció Primària Jordi Gol i Gurina (IDIAP Jordi Gol), Girona, Spain
  - Network for Research on Chronicity, Primary Care, and Health Promotion (RICAPPS), Tenerife, Spain
- Maria Eriksdotter
  - Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden
  - Aging and Inflammation Theme, Karolinska University Hospital, Stockholm, Sweden
- Saikat Chatterjee
  - Division of Information Science and Engineering, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Sara Garcia-Ptacek
  - Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institute, Stockholm, Sweden
  - Aging and Inflammation Theme, Karolinska University Hospital, Stockholm, Sweden
6. Win ZM, Cheong AMY, Hopkins WS. Using Machine Learning To Predict Partition Coefficient (Log P) and Distribution Coefficient (Log D) with Molecular Descriptors and Liquid Chromatography Retention Time. J Chem Inf Model 2023; 63:1906-1913. [PMID: 36926888 DOI: 10.1021/acs.jcim.2c01373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
During preclinical evaluations of drug candidates, several physicochemical (p-chem) properties are measured and employed as metrics to estimate drug efficacy in vivo. Two such p-chem properties are the octanol-water partition coefficient, Log P, and distribution coefficient, Log D, which are useful in estimating the distribution of drugs within the body. Log P and Log D are traditionally measured using the shake-flask method and high-performance liquid chromatography. However, it is challenging to measure these properties for species that are very hydrophobic (or hydrophilic) owing to the very low equilibrium concentrations partitioned into octanol (or aqueous) phases. Moreover, the shake-flask method is relatively time-consuming and can require multistep dilutions as the range of analyte concentrations can differ by several orders of magnitude. Here, we circumvent these limitations by using machine learning (ML) to correlate Log P and Log D with liquid chromatography (LC) retention time (RT). Predictive models based on four ML algorithms, which used molecular descriptors and LC RTs as features, were extensively tested and compared. The inclusion of RT as an additional descriptor improves model performance (MAE = 0.366 and R² = 0.89), and Shapley additive explanations analysis indicates that RT has the highest impact on model accuracy.
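The two regression metrics reported in this abstract, mean absolute error (MAE) and the coefficient of determination (R²), can be computed directly from measured and predicted values. The sketch below uses their standard definitions; the function names and the toy Log P values are hypothetical and are not data from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented measured vs. predicted Log P values, for illustration only.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    print(mean_absolute_error(measured, predicted), r_squared(measured, predicted))
```

MAE is expressed in the same units as the target (log units here), while R² is unitless, so the two give complementary views of model fit.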
Affiliation(s)
- Zaw-Myo Win
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong
  - Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
- Allen M Y Cheong
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon 999077, Hong Kong
- W Scott Hopkins
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
  - Department of Chemistry, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
  - Waterloo Institute for Nanotechnology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
  - WaterMine Innovation, Inc., Waterloo, Ontario N0B 2T0, Canada
7. E B, D B, Elumalai VK, K U. Data-driven gait analysis for diagnosis and severity rating of Parkinson's disease. Med Eng Phys 2021; 91:54-64. [PMID: 34074466 DOI: 10.1016/j.medengphy.2021.03.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/20/2020] [Revised: 03/03/2021] [Accepted: 03/19/2021] [Indexed: 10/21/2022]
Abstract
Parkinson's disease (PD) is the second most common neurodegenerative disease and results in gradual loss of movement. To diagnose PD in a clinical setting, clinicians generally use clinical manifestations such as motor and non-motor symptoms and rate the severity based on the Unified Parkinson's Disease Rating Scale (UPDRS). Such clinical assessment largely depends on the expertise and experience of the clinicians and is subjective, leading to variation in assessment between clinicians. As the gait of people with Parkinson's generally differs from the gait of healthy age-matched adults, the assessment of gait abnormalities can support not only the diagnosis of PD but also the rating of severity level based on motor symptoms. Hence, in this paper, a data-driven gait classification framework using supervised machine learning algorithms is presented. Using publicly available gait datasets acquired with vertical ground reaction force (VGRF) sensors, we present a correlation-based feature extraction technique for improved stage classification of PD. Significant biomarkers from spatiotemporal gait features are obtained based on the correlation, and the normality of the gait dataset is assessed using the Shapiro-Wilk test. Subsequently, four supervised machine learning algorithms, namely K-nearest neighbours (KNN), Naive Bayes (NB), Ensemble classifier (EC) and Support vector machine (SVM), are used to rate the severity level of PD according to the Hoehn and Yahr (H&Y) scale. The performance of the classifiers, assessed using the confusion matrix and parallel coordinate plots, shows that SVM can achieve a classification accuracy of 98.4%. Moreover, with a minimal gait feature set acquired based on the rank correlation, the proposed approach outperforms several other state-of-the-art methods that have used the same dataset for PD stage classification.
Affiliation(s)
- Balaji E
  - Department of Biomedical Engineering, PSG College of Technology, Coimbatore 641004, India
- Brindha D
  - Department of Biomedical Engineering, PSG College of Technology, Coimbatore 641004, India
- Vinodh Kumar Elumalai
  - School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamilnadu 632014, India
- Umesh K
  - Department of Biomedical Engineering, PSG College of Technology, Coimbatore 641004, India
8. Wang MWH, Goodman JM, Allen TEH. Machine Learning in Predictive Toxicology: Recent Applications and Future Directions for Classification Models. Chem Res Toxicol 2020; 34:217-239. [PMID: 33356168 DOI: 10.1021/acs.chemrestox.0c00316] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 02/07/2023]
Abstract
In recent times, machine learning has become increasingly prominent in predictive toxicology as it has shifted from in vivo studies toward in silico studies. Currently, in vitro methods together with other computational methods such as quantitative structure-activity relationship modeling and absorption, distribution, metabolism, and excretion calculations are being used. An overview of machine learning and its applications in predictive toxicology is presented here, including support vector machines (SVMs), random forest (RF) and decision trees (DTs), neural networks, regression models, naïve Bayes, k-nearest neighbors, and ensemble learning. The recent successes of these machine learning methods in predictive toxicology are summarized, and a comparison of some models used in predictive toxicology is presented. In predictive toxicology, SVMs, RF, and DTs are the dominant machine learning methods due to the characteristics of the data available. Lastly, this review describes the current challenges facing the use of machine learning in predictive toxicology and offers insights into the possible areas of improvement in the field.
Affiliation(s)
- Marcus W H Wang
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Jonathan M Goodman
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Timothy E H Allen
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
  - MRC Toxicology Unit, University of Cambridge, Hodgkin Building, Lancaster Road, Leicester LE1 7HB, United Kingdom
9. Patel L, Shukla T, Huang X, Ussery DW, Wang S. Machine Learning Methods in Drug Discovery. Molecules 2020; 25:E5277. [PMID: 33198233 PMCID: PMC7696134 DOI: 10.3390/molecules25225277] [Citation(s) in RCA: 114] [Impact Index Per Article: 28.5] [Received: 10/15/2020] [Revised: 11/04/2020] [Accepted: 11/09/2020] [Indexed: 12/30/2022] Open
Abstract
Advances in information technology and related processing techniques have created a fertile base for progress in many scientific fields and industries. In the fields of drug discovery and development, machine learning techniques have been used for the development of novel drug candidates. The methods for designing drug targets and novel drug discovery now routinely combine machine learning and deep learning algorithms to enhance the efficiency, efficacy, and quality of developed outputs. The generation and incorporation of big data, through technologies such as high-throughput screening and high-throughput computational analysis of databases used for both lead and target discovery, has increased the reliability of techniques that incorporate machine learning and deep learning. The use of virtual screening and the encompassing online information has also been highlighted in developing lead synthesis pathways. In this review, machine learning and deep learning algorithms utilized in drug discovery and associated techniques will be discussed. The applications that produce promising results and methods will be reviewed.
Affiliation(s)
- Lauv Patel
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
- Tripti Shukla
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
- Xiuzhen Huang
  - Department of Computer Science, Arkansas State University, Jonesboro, AR 72467, USA
- David W. Ussery
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Shanzhi Wang
  - Chemistry Department, University of Arkansas at Little Rock, Little Rock, AR 72204, USA
10. Hussain W, Rasool N, Khan YD. Insights into Machine Learning-based Approaches for Virtual Screening in Drug Discovery: Existing Strategies and Streamlining Through FP-CADD. Curr Drug Discov Technol 2020; 18:463-472. [PMID: 32767944 DOI: 10.2174/1570163817666200806165934] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/03/2020] [Revised: 07/01/2020] [Accepted: 07/03/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Machine learning is an active area of research in computer science, driven by the availability of big data collections of all sorts, which has prompted interest in the development of novel tools for data mining. Machine learning methods have wide applications in computer-aided drug discovery. Many notable machine learning approaches are used in drug design, which further aid the process of biological modelling in drug discovery. There are two main categories of virtual screening, Ligand-Based Virtual Screening (LBVS) and Structure-Based Virtual Screening (SBVS); however, machine learning approaches fall mostly in the LBVS category.
OBJECTIVES: This study describes the major machine learning approaches being used in LBVS. Moreover, we introduce a protocol named FP-CADD, which sets out a four-step rule of thumb for drug discovery: the four protocols of computer-aided drug discovery (FP-CADD). Various important aspects, along with a SWOT analysis of FP-CADD, are also discussed in this article.
CONCLUSION: Through this study, we have observed that, among LBVS algorithms, Support Vector Machines (SVM) and Random Forest (RF) are the most widely used owing to their high accuracy and efficiency. These virtual screening approaches have the potential to revolutionize the drug design field. We also believe that the process flow presented in this study, named FP-CADD, can streamline the whole process of computer-aided drug discovery. By adopting this rule, studies related to drug discovery can be made homogeneous, and this protocol can also be considered as an evaluation criterion in the peer-review process of research articles.
Affiliation(s)
- Yaser Daanial Khan
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan
11. Korkmaz S. Deep Learning-Based Imbalanced Data Classification for Drug Discovery. J Chem Inf Model 2020; 60:4180-4190. [DOI: 10.1021/acs.jcim.9b01162] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Indexed: 12/19/2022]
Affiliation(s)
- Selçuk Korkmaz
  - Trakya University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Edirne, Turkey
12. Distinguishing drug/non-drug-like small molecules in drug discovery using deep belief network. Mol Divers 2020; 25:827-838. [PMID: 32193758 DOI: 10.1007/s11030-020-10065-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2020] [Accepted: 02/26/2020] [Indexed: 10/24/2022]
Abstract
The advent of computational methods for efficient prediction of the druglikeness of small molecules, and their ever-burgeoning applications in medicinal chemistry and the drug industry, have been a profound scientific development, since only a small fraction of small-molecule libraries has been identified as approvable drugs. In this study, a deep belief network was used to construct a druglikeness classification model. For this purpose, small molecules and approved drugs from the ZINC database were selected for the unsupervised pre-training step and the supervised training step. Various binary fingerprints, such as MACCS 166-bit, PubChem 881-bit, and Morgan 2048-bit, were investigated as data features. The results revealed that an unsupervised pre-training phase can lead to a model with good performance and generalizability. Accuracy, precision, and recall of the model for MACCS features were 97%, 96%, and 99%, respectively. To further assess the generalizability of the model, an external dataset was created, with expression and investigational drugs in drug banks as drug data and randomly selected data from the ZINC database as non-drug data. The results confirmed the good performance and generalizability of the model. The outcomes also showed that a large proportion of the misclassified non-drug small molecules meet the bioavailability conditions and could be investigated as drugs in the future. Furthermore, the model has potential as a drug filter in drug discovery.
13. Saddala MS, Lennikov A, Huang H. Discovery of Small-Molecule Activators for Glucose-6-Phosphate Dehydrogenase (G6PD) Using Machine Learning Approaches. Int J Mol Sci 2020; 21:ijms21041523. [PMID: 32102234 PMCID: PMC7073180 DOI: 10.3390/ijms21041523] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/09/2020] [Revised: 02/18/2020] [Accepted: 02/21/2020] [Indexed: 02/06/2023] Open
Abstract
Glucose-6-Phosphate Dehydrogenase (G6PD) is a ubiquitous cytoplasmic enzyme converting glucose-6-phosphate into 6-phosphogluconate in the pentose phosphate pathway (PPP). G6PD deficiency impairs the ability to regenerate glutathione owing to a lack of nicotinamide adenine dinucleotide phosphate (NADPH) and produces stress conditions that can cause oxidative injury to photoreceptors, retinal cells, and blood barrier function. In this study, we constructed pharmacophore-based models based on the complex of G6PD with compound AG1 (a G6PD activator), followed by virtual screening. Fifty-three hit molecules were mapped with core pharmacophore features. We performed molecular descriptor calculation, clustering, and principal component analysis (PCA) on the pharmacophore hit molecules and further applied statistical machine learning methods. The best-performing pharmacophore modeling and machine learning approaches classified the 53 hits as drug-like (18) and nondrug-like (35) compounds. The drug-like compounds were further evaluated with our established cheminformatics pipeline (molecular docking and in silico ADMET (absorption, distribution, metabolism, excretion and toxicity) analysis). Finally, five lead molecules with different scaffolds were selected by binding energies and in silico ADMET properties. This study proposes that the combination of machine learning methods with traditional structure-based virtual screening can effectively strengthen the ability to find potential G6PD activators for G6PD deficiency diseases. Moreover, these compounds can be considered as safe agents for further validation studies at the cell, animal model, and even clinical levels.
|
14
|
Cerruela-García G, Pérez-Parra Toledano J, de Haro-García A, García-Pedrajas N. Influence of feature rankers in the construction of molecular activity prediction models. J Comput Aided Mol Des 2020; 34:305-325. [PMID: 31893338 DOI: 10.1007/s10822-019-00273-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/30/2019] [Accepted: 12/20/2019] [Indexed: 02/07/2023]
Abstract
In the construction of activity prediction models, feature ranking methods are a useful mechanism for ordering features by their significance to the predictive model. This paper studies the influence of feature rankers in the construction of molecular activity prediction models; for this purpose, a comparative study of fourteen ranking methods for feature selection was conducted. The activity prediction models were constructed using four well-known classifiers and a wide collection of datasets. The ranking algorithms were compared considering the performance of these classifiers using different metrics and the consistency of the ranked features.
Affiliation(s)
- Gonzalo Cerruela-García
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain.
| | - José Pérez-Parra Toledano
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain
| | - Aída de Haro-García
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain
| | - Nicolás García-Pedrajas
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain
|
15
|
Hassanzadeh P, Atyabi F, Dinarvand R. The significance of artificial intelligence in drug delivery system design. Adv Drug Deliv Rev 2019; 151-152:169-190. [PMID: 31071378 DOI: 10.1016/j.addr.2019.05.001] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 01/14/2019] [Revised: 04/14/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]
Abstract
Over the last decade, interest has grown in applying artificial intelligence (AI) technology to the analysis and interpretation of biological or genetic information, to accelerated drug discovery, and to the identification of selective small-molecule modulators or rare molecules and the prediction of their behavior. Applying automated workflows and databases to the rapid analysis of huge amounts of data, and artificial neural networks (ANNs) to the development of novel hypotheses and treatment strategies, the prediction of disease progression, and the evaluation of the pharmacological profiles of drug candidates, may significantly improve treatment outcomes. Target fishing (TF), the rapid prediction or identification of biological targets, might be of great help in linking targets to novel compounds. AI and TF methods, combined with human expertise, may indeed revolutionize current theranostic strategies; meanwhile, validation approaches are necessary to overcome the potential challenges and ensure higher accuracy. In this review, the significance of AI and TF in the development of drugs and delivery systems and the potential challenging issues are highlighted.
Affiliation(s)
- Parichehr Hassanzadeh
- Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran.
| | - Fatemeh Atyabi
- Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran.
| | - Rassoul Dinarvand
- Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran.
|
16
|
Onay A, Onay M. A Drug Decision Support System for Developing a Successful Drug Candidate Using Machine Learning Techniques. Curr Comput Aided Drug Des 2019; 16:407-419. [PMID: 31438830 DOI: 10.2174/1573409915666190716143601] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/02/2019] [Revised: 04/24/2019] [Accepted: 05/06/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Virtual screening of candidate drug molecules using machine learning techniques plays a key role in the pharmaceutical industry in the design and discovery of new drugs. Computational classification methods can determine drug types according to disease groups and distinguish approved drugs from withdrawn ones. INTRODUCTION Classification models developed in this study can be used as a simple filter in drug modelling to eliminate potentially inappropriate molecules in the early stages. In this work, we developed a Drug Decision Support System (DDSS) to classify each drug candidate molecule as potentially drug or non-drug and to predict its disease group. METHODS Molecular descriptors were identified to determine a number of rules for drug molecules. They were derived using the ADRIANA.Code program and Lipinski's rule of five. We used an Artificial Neural Network (ANN) to classify drug molecules according to disease type. Closed frequent molecular structures in the form of subgraph fragments were also obtained with the Gaston algorithm included in the ParMol package to find common molecular fragments for withdrawn drugs. RESULTS We observed that TPSA, XlogP Natoms, HDon_O and TPSA are the most distinctive features in the pool of molecular descriptors. We evaluated the performance of the classifiers on all datasets and found that classification accuracies are very high on all of them. Neural network models achieved 84.6% and 83.3% accuracies on test sets including cardiac therapy, anti-epileptics and anti-parkinson drugs with approved and withdrawn drugs for drug classification problems. CONCLUSION The experimental evaluation shows that the system is promising for identifying potential drug molecules and classifying them correctly according to disease type.
Affiliation(s)
- Aytun Onay
- Department of Computer Engineering, Faculty of Engineering & Architecture, Kafkas University, Kars, 36100, Turkey
| | - Melih Onay
- Department of Environmental Engineering, Computational & Experimental Biochemistry Lab, Faculty of Engineering, Van Yuzuncu Yil University, 65100, Van, Turkey
|
17
|
Recognition of Pharmacological Bi-Heterocyclic Compounds by Using Terahertz Time Domain Spectroscopy and Chemometrics. SENSORS 2019; 19:s19153349. [PMID: 31366175 PMCID: PMC6696483 DOI: 10.3390/s19153349] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/15/2019] [Revised: 07/20/2019] [Accepted: 07/27/2019] [Indexed: 12/22/2022]
Abstract
In this study, we present the concept and implementation of a fully functional system for the recognition of bi-heterocyclic compounds. We investigated the application of machine learning methods to recognize compounds from their THz spectra, and we describe the process of selecting optimal parameters for the kernel support vector machine (KSVM) with an additional 'unknown' class. The chemical compounds used in the study contain a target molecule used in pharmacy to combat inflammatory states in living organisms. Ready-made medical products with similar properties are commonly referred to as non-steroidal anti-inflammatory drugs (NSAIDs) once authorised on the pharmaceutical market. It was crucial to determine clearly whether the tested sample is a chemical compound known to researchers or a completely new structure that should be additionally tested using other spectrometric methods. Our approach achieves 100% classification accuracy on the tested chemical compounds within several milliseconds for the 30 samples of the test set. It fits the concept of rapid recognition of bi-heterocyclic compounds without the need to analyse the percentage composition of compound components, assuming that the sample is classified in a known group. The method allows us to minimize testing costs and significantly reduce the analysis time.
|
18
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 343] [Impact Index Per Article: 68.6] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI that have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles of the various machine learning algorithms, alongside some application notes, the review discusses the current state of the art of AI-assisted pharmaceutical discovery, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Ryan Byrne
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Gisbert Schneider
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
|
19
|
Shen Y, Yuan K, Yang M, Tang B, Li Y, Du N, Lei K. KMR: knowledge-oriented medicine representation learning for drug-drug interaction and similarity computation. J Cheminform 2019; 11:22. [PMID: 30874969 PMCID: PMC6419809 DOI: 10.1186/s13321-019-0342-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/17/2018] [Accepted: 03/01/2019] [Indexed: 02/07/2023] Open
Abstract
Efficient representations of drugs provide important support for healthcare analytics, such as drug-drug interaction (DDI) prediction and drug-drug similarity (DDS) computation. However, incompletely annotated data and drug feature sparseness create substantial barriers for drug representation learning, making it difficult to accurately identify new drug properties prior to public release. To alleviate these deficiencies, we propose KMR, a knowledge-oriented feature-driven method that can learn drug-related knowledge with an accurate representation. We conduct a series of experiments on real-world medical datasets to demonstrate that KMR is capable of drug representation learning. KMR supports the discovery of meaningful DDIs with an accuracy rate of 92.19%, demonstrating that the techniques developed in KMR significantly improve prediction quality for new drugs not seen during training. Experimental results also indicate that KMR can identify DDS with an accuracy rate of 88.7% by exploiting drug knowledge, outperforming existing state-of-the-art drug similarity measures.
Affiliation(s)
- Ying Shen
- The Shenzhen Key Lab for Information Centric Networking and Blockchain Technologies (ICNLab), School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, 518055 Shenzhen, People’s Republic of China
| | - Kaiqi Yuan
- The Shenzhen Key Lab for Information Centric Networking and Blockchain Technologies (ICNLab), School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, 518055 Shenzhen, People’s Republic of China
| | - Min Yang
- SIAT, Chinese Academy of Sciences, 518055 Shenzhen, People’s Republic of China
| | - Buzhou Tang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055 People’s Republic of China
| | | | - Nan Du
- Tencent Medical AI Lab, Palo Alto, USA
| | - Kai Lei
- The Shenzhen Key Lab for Information Centric Networking and Blockchain Technologies (ICNLab), School of Electronics and Computer Engineering, Peking University Shenzhen Graduate School, 518055 Shenzhen, People’s Republic of China
- PCL Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen, China
|
20
|
Maltarollo VG, Kronenberger T, Espinoza GZ, Oliveira PR, Honorio KM. Advances with support vector machines for novel drug discovery. Expert Opin Drug Discov 2018; 14:23-33. [PMID: 30488731 DOI: 10.1080/17460441.2019.1549033] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/31/2022]
Abstract
INTRODUCTION Novel drug discovery remains an enormous challenge, with various computer-aided drug design (CADD) approaches having been widely employed for this purpose. CADD, specifically the commonly used support vector machines (SVMs), can employ machine learning techniques. SVMs and their variations offer numerous drug discovery applications, which range from the classification of substances (as active or inactive) to the construction of regression models and the ranking/virtual screening of databased compounds. Areas covered: Herein, the authors consider some of the applications of SVMs in medicinal chemistry, illustrating their main advantages and disadvantages, as well as trends in their utilization, via the available published literature. The aim of this review is to provide an up-to-date review of the recent applications of SVMs in drug discovery as described by the literature, thereby highlighting their strengths, weaknesses, and future challenges. Expert opinion: Techniques based on SVMs are considered as powerful approaches in early drug discovery. The ability of SVMs to classify active or inactive compounds has enabled the prioritization of substances for virtual screening. Indeed, one of the main advantages of SVMs is related to their potential in the analysis of nonlinear problems. However, despite successes in employing SVMs, the challenges of improving accuracy remain.
Affiliation(s)
- Vinicius Gonçalves Maltarollo
- a Departamento de Produtos Farmacêuticos, Faculdade de Farmácia , Universidade Federal de Minas Gerais , Belo Horizonte , Brazil
| | - Thales Kronenberger
- b Department of Internal Medicine VIII , University Hospital of Tübingen , Tübingen , Germany
| | - Gabriel Zarzana Espinoza
- c Escola de Artes, Ciências e Humanidades , Universidade de São Paulo (USP) , São Paulo , Brazil
| | - Patricia Rufino Oliveira
- c Escola de Artes, Ciências e Humanidades , Universidade de São Paulo (USP) , São Paulo , Brazil
| | - Kathia Maria Honorio
- c Escola de Artes, Ciências e Humanidades , Universidade de São Paulo (USP) , São Paulo , Brazil.,d Centro de Ciências Naturais e Humanas , Universidade Federal do ABC , Santo André , Brazil
|
21
|
A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification. ADV DATA ANAL CLASSI 2018. [DOI: 10.1007/s11634-018-0334-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 10/28/2022]
|
22
|
Petinrin OO, Saeed F. Bioactive molecule prediction using majority voting-based ensemble method. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2018. [DOI: 10.3233/jifs-169596] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Affiliation(s)
| | - Faisal Saeed
- College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
- Department of Information Systems, Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru, Johor, Malaysia
|
23
|
Qasim MK, Algamal ZY, Ali HTM. A binary QSAR model for classifying neuraminidase inhibitors of influenza A viruses (H1N1) using the combined minimum redundancy maximum relevancy criterion with the sparse support vector machine. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2018; 29:517-527. [PMID: 30037283 DOI: 10.1080/1062936x.2018.1491414] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/20/2018] [Indexed: 06/08/2023]
Abstract
Quantitative structure-activity relationship (QSAR) classification modelling with descriptor selection has become increasingly important because of the existence of large datasets in terms of either the number of compounds or the number of descriptors. Descriptor selection can improve the accuracy of QSAR classification studies and reduce their computational complexity by removing irrelevant and redundant descriptors. In this paper, a two-stage classification approach is proposed by combining the minimum redundancy maximum relevancy criterion with the sparse support vector machine. The experimental results of classifying the neuraminidase inhibitors of influenza A (H1N1) viruses show that the proposed method outperforms other sparse alternative methods in terms of classification performance and the number of selected descriptors.
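As an illustration of the first stage named in this abstract, the minimum redundancy maximum relevancy (mRMR) criterion can be sketched as a greedy selection over discrete descriptors. This is a minimal sketch, not the authors' implementation: the abstract does not state which mRMR variant or discretization was used, so the mutual-information-difference score, the greedy loop, and the names `mutual_information` and `mrmr` are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (count / n) * math.log((count * n) / (px[x] * py[y]))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR (hypothetical helper): features maps name -> list of discrete values.

    At each step, pick the descriptor with the largest
    relevance (MI with the class) minus mean redundancy (MI with chosen descriptors).
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a two-stage scheme of the kind the abstract describes, the descriptors returned by such a filter would then be passed to the sparse classifier; a duplicated descriptor is penalised by the redundancy term even when its relevance is high.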
Affiliation(s)
- M K Qasim
- a Department of General Science , University of Mosul , Mosul , Iraq
| | - Z Y Algamal
- b Department of Statistics and Informatics , University of Mosul , Mosul , Iraq
| | - H T Mohammad Ali
- c College of Computers and Information Technology , Nawroz University , Kurdistan region , Iraq
|
24
|
Improving virtual screening predictive accuracy of Human kallikrein 5 inhibitors using machine learning models. Comput Biol Chem 2017; 69:110-119. [DOI: 10.1016/j.compbiolchem.2017.05.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/14/2016] [Revised: 12/18/2016] [Accepted: 05/26/2017] [Indexed: 12/23/2022]
|
25
|
Algamal ZY, Qasim MK, Ali HTM. A QSAR classification model for neuraminidase inhibitors of influenza A viruses (H1N1) based on weighted penalized support vector machine. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2017; 28:415-426. [PMID: 28539063 DOI: 10.1080/1062936x.2017.1326402] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/09/2017] [Accepted: 05/01/2017] [Indexed: 06/07/2023]
Abstract
Descriptor selection is a procedure widely used in chemometrics. The aim is to select the best subset of descriptors relevant to the quantitative structure-activity relationship (QSAR) study being considered. In this paper, a new descriptor selection method for the QSAR classification model is proposed by adding a new weight inside the L1-norm. The experimental results from classifying the neuraminidase inhibitors of influenza A viruses (H1N1) demonstrate that the proposed method performs effectively and competitively compared with other existing penalized methods in terms of classification performance and the number of selected descriptors.
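For orientation, a weighted L1-norm penalty on a support vector machine is commonly written in the generic form below. This is a sketch under stated assumptions rather than the paper's formulation: the abstract does not give the specific weight, so the weights $w_j \ge 0$, the hinge loss, and the symbols $\beta_0$, $\boldsymbol{\beta}$, $\lambda$ are illustrative notation only.

```latex
\min_{\beta_0,\,\boldsymbol{\beta}} \;
\sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_i\,(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta})\bigr)
\;+\; \lambda \sum_{j=1}^{p} w_j\,\lvert \beta_j \rvert ,
\qquad y_i \in \{-1, +1\},\; \lambda > 0 .
```

Setting every $w_j = 1$ recovers the ordinary L1-penalized SVM; a larger $w_j$ shrinks the coefficient of descriptor $j$ more strongly, which is how a weight inside the penalty can steer which descriptors are retained.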
Affiliation(s)
- Z Y Algamal
- a Department of Statistics and Informatics , University of Mosul , Mosul , Iraq
| | - M K Qasim
- b Department of General Science , University of Mosul , Mosul , Iraq
| | - H T M Ali
- c College of Computers and Information Technology , Nawroz University , Duhok , Iraq
|
26
|
Onay A, Onay M, Abul O. Classification of nervous system withdrawn and approved drugs with ToxPrint features via machine learning strategies. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017; 142:9-19. [PMID: 28325450 DOI: 10.1016/j.cmpb.2017.02.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/24/2016] [Revised: 01/20/2017] [Accepted: 02/08/2017] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVES Early-phase virtual screening of candidate drug molecules with data mining and machine learning plays a key role in the pharmaceutical industry in preventing adverse drug effects. Computational classification methods can distinguish approved drugs from withdrawn ones. We focused on 6 data sets, each including at most 110 approved and 110 withdrawn drugs, covering all diseases and nervous system diseases, to distinguish approved drugs from withdrawn ones. METHODS In this study, we used support vector machines (SVMs) and ensemble methods (EMs) such as boosted and bagged trees to classify drugs into approved and withdrawn categories. We also used the CORINA Symphony program to identify ToxPrint chemotypes, which comprise over 700 predefined chemotypes, for the risk and safety assessment of candidate drug molecules. In addition, we studied nervous system withdrawn drugs to determine the key fragments with the ParMol package, which includes the gSpan algorithm. RESULTS According to our results, the descriptors named the number of total chemotypes and bond CN_amine_aliphatic_generic were the more significant descriptors. The developed Medium Gaussian SVM model reached 78% prediction accuracy on the test set for the drug data set including all diseases. Bagged tree and linear SVM models showed 89% accuracy for psycholeptic and psychoanaleptic drugs. A set of discriminative fragments in the nervous system withdrawn drug (NSWD) data sets was obtained. The fragments responsible for the drugs being removed from the market were benzene, toluene, N,N-dimethylethylamine, crotylamine, 5-methyl-2,4-heptadiene, octatriene and the carbonyl group. CONCLUSION This paper covers the development of computational classification methods to distinguish approved drugs from withdrawn ones. In addition, the results indicate that identifying discriminative fragments, together with interpreting the structures of the NSWDs, is significant for designing new approved nervous system drugs.
Affiliation(s)
- Aytun Onay
- Department of Computer Engineering, TOBB University of Economics & Technology, 06560, Ankara, Turkey
| | - Melih Onay
- Department of Environmental Engineering, Computational & Experimental Biochemistry Lab, Yuzuncu Yil University, 65080, Van, Turkey.
| | - Osman Abul
- Department of Computer Engineering, TOBB University of Economics & Technology, 06560, Ankara, Turkey
|
27
|
Lima AN, Philot EA, Trossini GHG, Scott LPB, Maltarollo VG, Honorio KM. Use of machine learning approaches for novel drug discovery. Expert Opin Drug Discov 2016; 11:225-39. [PMID: 26814169 DOI: 10.1517/17460441.2016.1146250] [Citation(s) in RCA: 136] [Impact Index Per Article: 17.0] [Indexed: 12/17/2022]
Abstract
INTRODUCTION The use of computational tools in the early stages of drug development has increased in recent decades. Machine learning (ML) approaches have been of special interest, since they can be applied in several steps of the drug discovery methodology, such as prediction of target structure, prediction of biological activity of new ligands through model construction, discovery or optimization of hits, and construction of models that predict the pharmacokinetic and toxicological (ADMET) profile of compounds. AREAS COVERED This article presents an overview on some applications of ML techniques in drug design. These techniques can be employed in ligand-based drug design (LBDD) and structure-based drug design (SBDD) studies, such as similarity searches, construction of classification and/or prediction models of biological activity, prediction of secondary structures and binding sites docking and virtual screening. EXPERT OPINION Successful cases have been reported in the literature, demonstrating the efficiency of ML techniques combined with traditional approaches to study medicinal chemistry problems. Some ML techniques used in drug design are: support vector machine, random forest, decision trees and artificial neural networks. Currently, an important application of ML techniques is related to the calculation of scoring functions used in docking and virtual screening assays from a consensus, combining traditional and ML techniques in order to improve the prediction of binding sites and docking solutions.
Affiliation(s)
- Angélica Nakagawa Lima
- a Centro de Ciências Naturais e Humanas , Universidade Federal do ABC , São Paulo , Brazil
| | - Eric Allison Philot
- a Centro de Ciências Naturais e Humanas , Universidade Federal do ABC , São Paulo , Brazil
| | | | - Luis Paulo Barbour Scott
- c Centro de Matemática, Computação e Cognição , Universidade Federal do ABC , São Paulo , Brazil
| | | | - Kathia Maria Honorio
- a Centro de Ciências Naturais e Humanas , Universidade Federal do ABC , São Paulo , Brazil.,d Escola de Artes, Ciências e Humanidades , Universidade de São Paulo , São Paulo , Brazil
|
28
|
Takeda S, Kaneko H, Funatsu K. Chemical-Space-Based de Novo Design Method To Generate Drug-Like Molecules. J Chem Inf Model 2016; 56:1885-1893. [PMID: 27632418 DOI: 10.1021/acs.jcim.6b00038] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 01/04/2023]
Abstract
To discover drug compounds in chemical space containing an enormous number of compounds, a structure generator is required to produce virtual drug-like chemical structures. The de novo design algorithm for exploring chemical space (DAECS) visualizes the activity distribution on a two-dimensional plane corresponding to chemical space and generates structures in a target area on a plane selected by the user. In this study, we modify the DAECS to enable the user to select a target area to consider properties other than activity and improve the diversity of the generated structures by visualizing the drug-likeness distribution and the activity distribution, generating structures by substructure-based structural changes, including addition, deletion, and substitution of substructures, as well as the slight structural changes used in the DAECS. Through case studies using ligand data for the human adrenergic alpha2A receptor and the human histamine H1 receptor, the modified DAECS can generate high diversity drug-like structures, and the usefulness of the modification of the DAECS is verified.
Affiliation(s)
- Shunichi Takeda
- Department of Chemical Systems Engineering, The University of Tokyo , 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
| | - Hiromasa Kaneko
- Department of Chemical Systems Engineering, The University of Tokyo , 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
| | - Kimito Funatsu
- Department of Chemical Systems Engineering, The University of Tokyo , 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
|
29
|
Sari S, Karakurt A, Uslu H, Kaynak FB, Çalış Ü, Dalkara S. New (arylalkyl)azole derivatives showing anticonvulsant effects could have VGSC and/or GABA AR affinity according to molecular modeling studies. Eur J Med Chem 2016; 124:407-416. [PMID: 27597416 DOI: 10.1016/j.ejmech.2016.08.032] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 05/23/2016] [Revised: 07/11/2016] [Accepted: 08/14/2016] [Indexed: 01/06/2023]
Abstract
(Arylalkyl)azoles (AAAs) emerged as a novel class of antiepileptic agents with the invention of nafimidone and denzimol. Several AAA derivatives with potent anticonvulsant activities have been reported so far; however, neurotoxicity was usually an issue. We prepared a set of ester derivatives of 1-(2-naphthyl)-2-(1H-1,2,4-triazol-1-yl)ethanone oxime and evaluated their anticonvulsant and neurotoxic effects in mice. Most of our compounds were protective against maximal electroshock (MES)- and/or subcutaneous metrazol (s.c. MET)-induced seizures, whereas none of them showed neurotoxicity. Nafimidone and denzimol have an activity profile similar to that of phenytoin or carbamazepine, both of which are known to inhibit voltage-gated sodium channels (VGSCs) as well as to enhance the γ-aminobutyric acid (GABA)-mediated response. To gain insight into the effects of our compounds on VGSCs and A-type GABA receptors (GABAARs), we performed docking studies using homology models of the Na+ channel inner pore and GABAAR as docking scaffolds. We found that our compounds bind VGSCs in ways similar to phenytoin, carbamazepine, and lamotrigine. They showed strong affinity for the benzodiazepine (BZD) binding site, and their binding interactions largely agreed with the experimental data and the reported BZD binding model.
Affiliation(s)
- Suat Sari
- Hacettepe University, Faculty of Pharmacy, Department of Pharmaceutical Chemistry, 06100, Ankara, Turkey
| | - Arzu Karakurt
- İnönü University, Faculty of Pharmacy, Department of Pharmaceutical Chemistry, 44280, Malatya, Turkey.
| | - Harun Uslu
- İnönü University, Faculty of Pharmacy, Department of Pharmaceutical Chemistry, 44280, Malatya, Turkey
| | - F Betül Kaynak
- Hacettepe University, Faculty of Engineering, Department of Physics Engineering, 06532, Ankara, Turkey
| | - Ünsal Çalış
- Hacettepe University, Faculty of Pharmacy, Department of Pharmaceutical Chemistry, 06100, Ankara, Turkey
| | - Sevim Dalkara
- Hacettepe University, Faculty of Pharmacy, Department of Pharmaceutical Chemistry, 06100, Ankara, Turkey
|
30
|
Kumar H, Raj U, Gupta S, Varadwaj PK. In-silico identification of inhibitors against mutated BCR-ABL protein of chronic myeloid leukemia: a virtual screening and molecular dynamics simulation study. J Biomol Struct Dyn 2016; 34:2171-83. [PMID: 26479578 DOI: 10.1080/07391102.2015.1110046] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/15/2022]
Abstract
Aberrant and proliferative expression of the oncogene BCR-ABL in bone marrow cells has been proven to be the prime cause of chronic myeloid leukemia (CML). It has been established that the tyrosine kinase domain of the BCR-ABL protein is a potential therapeutic target for the treatment of CML. Imatinib is considered a first-generation drug that can inhibit the enzymatic action by blocking ATP binding to the BCR-ABL protein. Later, insensitivity of CML cells towards Imatinib was observed, possibly due to mutation in the tyrosine kinase domain of the ABL receptor. Subsequently, other second-generation drugs have been reported, viz. Bosutinib, Nilotinib, Dasatinib, Ponatinib, Bafetinib, etc., which are able to act against the mutated domain of the ABL tyrosine kinase protein. Taking into account bioavailability and the resistance developed, there is a pressing need to find more inhibitors for the mutated ABL tyrosine kinase protein. For virtual screening, a data set was generated by collecting all available drug-like natural compounds from the ZINC and DrugBank databases. Comparative docking analysis was also carried out on the active site of the ABL tyrosine kinase receptor with reported reference inhibitors. Molecular dynamics simulation of the best screened interacting complex was run for 50 ns to validate the stability of the system. The selected inhibitors were further validated and analyzed through pharmacokinetic properties and a series of ADMET parameters by in silico methods. Considering the above parameters, the proposed molecules are concluded to be potential leads for the drug design pipeline against CML.
Affiliation(s)
- Himansu Kumar
- Department of Bioinformatics, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
| | - Utkarsh Raj
- Department of Bioinformatics, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
| | - Saurabh Gupta
- Department of Bioinformatics, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
| | - Pritish Kumar Varadwaj
- Department of Bioinformatics, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
| |
|
31
|
Fang L, Zhao H, Wang P, Yu M, Yan J, Cheng W, Chen P. Feature selection method based on mutual information and class separability for dimension reduction in multidimensional time series for clinical data. Biomed Signal Process Control 2015. [DOI: 10.1016/j.bspc.2015.05.011] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Indexed: 11/16/2022]
|
32
|
Colak C, Karaman E, Turtay MG. Application of knowledge discovery process on the prediction of stroke. Comput Methods Programs Biomed 2015; 119:181-185. [PMID: 25827533 DOI: 10.1016/j.cmpb.2015.03.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 12/18/2014] [Revised: 02/17/2015] [Accepted: 03/04/2015] [Indexed: 06/04/2023]
Abstract
OBJECTIVE: Stroke is a prominent life-threatening disease worldwide. The current study was performed to predict the outcome of stroke using knowledge discovery process (KDP) methods, artificial neural network (ANN) and support vector machine (SVM) models. MATERIALS AND METHODS: The records of 297 (130 sick and 167 healthy) individuals were acquired from the databases of the department of emergency medicine. Nine predictors (coronary artery disease, diabetes mellitus, hypertension, history of cerebrovascular disease, atrial fibrillation, smoking, the findings of carotid Doppler ultrasonography [normal, plaque, plaque+stenosis≥50%], the levels of cholesterol and C-reactive protein) were used for predicting stroke. Feature selection based on the Cramer's V test was carried out to reduce the number of predictors. Multilayer perceptron (MLP) ANN and SVM with radial basis function (RBF) kernel were used for the prediction based on the selected predictors. RESULTS: The accuracy values were 81.82% for ANN and 80.38% for SVM in the training dataset (n=209), and 85.9% for ANN and 84.62% for SVM in the testing dataset (n=78). ANN and SVM models yielded area under curve (AUC) values of 0.905 and 0.899 in the training dataset, and 0.928 and 0.91 in the testing dataset, respectively. CONCLUSION: The findings of the current study indicate that ANN had better predictive performance than SVM in predicting stroke. The proposed ANN model would be useful when making clinical decisions regarding stroke.
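As an illustration of the Cramér's V statistic named in the abstract above, the following is a minimal sketch, not the authors' implementation. It computes V = sqrt(chi² / (n · (min(r, c) − 1))) for an r × c contingency table of a categorical predictor against the outcome. The function name `cramers_v` and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k = min(len(table), len(table[0])) - 1
    if n == 0 or k < 1:
        raise ValueError("table needs at least 2 rows, 2 columns and a non-zero total")

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells in an empty row or column
                chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * k))


# Hypothetical counts (NOT from the study): rows = predictor absent/present,
# columns = healthy/sick.
example_table = [[120, 50], [47, 80]]
print(round(cramers_v(example_table), 3))
```

A value near 0 indicates little association between the predictor and the outcome, and a value near 1 indicates strong association. Any cut-off used to keep or drop a predictor would be a separate choice that the abstract does not specify.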
Affiliation(s)
- Cemil Colak
- Inonu University, Faculty of Medicine, Department of Biostatistics and Medical Informatics, Malatya, Turkey
| | - Esra Karaman
- Inonu University, Faculty of Medicine, Department of Emergency Medicine, Malatya, Turkey
| | - M Gokhan Turtay
- Inonu University, Faculty of Medicine, Department of Emergency Medicine, Malatya, Turkey
| |
|
33
|
Korkmaz S, Zararsiz G, Goksuluk D. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development. PLoS One 2015; 10:e0124600. [PMID: 25928885 PMCID: PMC4415797 DOI: 10.1371/journal.pone.0124600] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 01/29/2015] [Accepted: 03/03/2015] [Indexed: 12/18/2022] Open
Abstract
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like from nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool that can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared using ten different measures; the ten best-performing algorithms were then selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create a heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.
Affiliation(s)
- Selcuk Korkmaz
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
| | - Gokmen Zararsiz
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
| | - Dincer Goksuluk
- Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
| |
|
34
|
Clustering molecular dynamics trajectories for optimizing docking experiments. Comput Intell Neurosci 2015; 2015:916240. [PMID: 25873944 PMCID: PMC4385651 DOI: 10.1155/2015/916240] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 12/18/2014] [Accepted: 03/05/2015] [Indexed: 12/03/2022]
Abstract
Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threatens the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so that the task becomes feasible. In particular, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments in a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand.
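The general idea in the abstract above, clustering trajectory frames by cavity features with k-means and keeping representative structures, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline. Each frame is assumed to be already reduced to a numeric feature vector, initial centroids are taken from evenly spaced frames (an arbitrary deterministic choice), and the representative of a cluster is assumed to be the frame closest to its centroid. The names `kmeans`, `representatives` and `sqdist` are hypothetical.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of frames")
    # Assumption: deterministic start from k evenly spaced frames.
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each frame goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


def representatives(points, labels, centroids):
    """Index of the frame closest to each non-empty cluster's centroid."""
    reps = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps.append(min(members, key=lambda i: sqdist(points[i], centroid)))
    return reps


# Hypothetical 2-feature frames (not real trajectory data).
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centroids = kmeans(frames, k=2)
print(labels, representatives(frames, labels, centroids))
```

Docking against only the representative frames, rather than every frame, is what would reduce the computational cost. How the number of clusters is chosen (the abstract mentions three clustering validity criteria) is outside this sketch.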
|