1
Liu Q, He D, Fan M, Wang J, Cui Z, Wang H, Mi Y, Li N, Meng Q, Hou Y. Prediction and Interpretation Microglia Cytotoxicity by Machine Learning. J Chem Inf Model 2024. [PMID: 38949724 DOI: 10.1021/acs.jcim.4c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Ameliorating microglia-mediated neuroinflammation is a crucial strategy in developing new drugs for neurodegenerative diseases. Plant compounds are an important screening target for the discovery of drugs for the treatment of neurodegenerative diseases. However, because phytochemicals are spatially complex, it is particularly important to evaluate the effectiveness of compounds while excluding cytotoxic substances in the early stages of compound screening. Traditional high-throughput screening methods suffer from high cost and low efficiency. A computational model based on machine learning provides a novel avenue for cytotoxicity determination. In this study, a microglia cytotoxicity classifier was developed using a machine learning approach. First, we proposed a data splitting strategy based on the Murcko generic scaffold of each molecule. Under this split, three machine learning approaches were coupled with three kinds of molecular representation methods to construct microglia cytotoxicity classifiers, which were then compared and assessed by predictive accuracy (ACC), balanced accuracy (BA), F1-score, and Matthews correlation coefficient (MCC). Then, recursive feature elimination integrated with a support vector machine (RFE-SVC) was applied as a dimension reduction method to the high-dimensional molecular fingerprints to further improve model performance. Among all the microglia cytotoxicity classifiers, the SVM coupled with the ECFP4 fingerprint after feature selection (ECFP4-RFE-SVM) obtained the most accurate classification for the test set (ACC of 0.99, BA of 0.99, F1-score of 0.99, MCC of 0.97). Finally, the Shapley additive explanations (SHAP) method was used to interpret the microglia cytotoxicity classifier, and key substructure SMARTS were identified as structural alerts. Experimental results show that ECFP4-RFE-SVM has reliable classification capability for microglia cytotoxicity, and that SHAP can not only provide a rational explanation for microglia cytotoxicity predictions but also offer a guideline for subsequent molecular cytotoxicity modifications.
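The four scores quoted in this abstract (accuracy, balanced accuracy, F1-score, MCC) can all be derived from one binary confusion matrix. The sketch below is illustrative only: the function name `binary_metrics` and the toy labels are assumptions, and it is not the authors' evaluation code.

```python
import math

def binary_metrics(y_true, y_pred):
    """Return ACC, BA, F1 and MCC for binary labels (1 = cytotoxic, 0 = non-toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    acc = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    ba = (sensitivity + specificity) / 2  # mean of per-class recall
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "BA": ba, "F1": f1, "MCC": mcc}

# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Balanced accuracy and MCC are reported alongside plain accuracy because accuracy alone can look high on class-imbalanced data even when the minority class is poorly predicted.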
Affiliation(s)
- Qing Liu
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Dakuo He
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Mengmeng Fan
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Jinpeng Wang
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Zeyu Cui
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Hao Wang
- College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Yan Mi
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
- Ning Li
- School of Traditional Chinese Materia Medica, Key Laboratory for TCM Material Basis Study and Innovative Drug Development of Shenyang City, Shenyang Pharmaceutical University, Shenyang 110016, P. R. China
- Qingqi Meng
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
- Yue Hou
- Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
2
Julkaew S, Wongsirichot T, Damkliang K, Sangthawan P. Improving accuracy of vascular access quality classification in hemodialysis patients using deep learning with K highest score feature selection. J Int Med Res 2024; 52:3000605241232519. [PMID: 38573764 PMCID: PMC10996358 DOI: 10.1177/03000605241232519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 01/26/2024] [Indexed: 04/05/2024] Open
Abstract
OBJECTIVE To develop and evaluate a novel feature selection technique, using photoplethysmography (PPG) sensors, for enhancing the performance of deep learning models in classifying vascular access quality in hemodialysis patients. METHODS This cross-sectional study involved creating a novel feature selection method based on SelectKBest principles, specifically designed to optimize deep learning models for PPG sensor data in hemodialysis patients. The method's effectiveness was assessed by comparing the performance of multiple deep learning models using the feature selection approach versus the complete feature set. The model with the highest accuracy was then trained and tested using a 70:30 split, with both the full dataset and the SelectKBest dataset. Performance results were compared using Student's paired t-test. RESULTS Data from 398 hemodialysis patients were included. The 1-dimensional convolutional neural network (CNN1D) displayed the highest accuracy among the models. Implementation of the SelectKBest-based feature selection technique resulted in a statistically significant improvement in the CNN1D model's performance, achieving an accuracy of 92.05% (with feature selection) versus 90.79% (with the full feature set). CONCLUSION These findings suggest that the newly developed feature selection approach might aid in accurately predicting vascular access quality in hemodialysis patients. This advancement may contribute to the development of reliable diagnostic tools for identifying vascular complications, such as stenosis, potentially improving patient outcomes and quality of life.
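The general "K highest score" idea behind this abstract can be sketched as follows: score each feature on its own, rank the scores, and keep the top k columns. The scoring function here (absolute class-mean difference divided by the feature's standard deviation) is a stand-in chosen for illustration. The abstract does not state the authors' actual scoring function, and the names `score_feature` and `select_k_best` and the toy matrix are hypothetical.

```python
from statistics import mean, pstdev

def score_feature(values, labels):
    """Univariate score: absolute class-mean difference scaled by overall spread."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values)
    return abs(mean(pos) - mean(neg)) / spread if spread else 0.0

def select_k_best(X, y, k):
    """Keep the k highest-scoring feature columns; return their indices and the reduced matrix."""
    n_features = len(X[0])
    scores = [score_feature([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])  # preserve original column order
    return keep, [[row[j] for j in keep] for row in X]

# Hypothetical toy data: column 0 separates the classes, column 1 does not.
X = [
    [0.9, 5.0, 0.1],
    [1.1, 4.0, 0.2],
    [1.0, 6.0, 0.1],
    [3.0, 5.0, 0.2],
    [3.2, 4.0, 0.1],
    [2.9, 6.0, 0.2],
]
y = [0, 0, 0, 1, 1, 1]
kept, X_reduced = select_k_best(X, y, k=1)
```

In practice the scores should be computed on the training portion only, so that the held-out 30% does not influence which features are kept.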
Affiliation(s)
- Sarayut Julkaew
- College of Digital Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand
- Thakerng Wongsirichot
- Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand
- Kasikrit Damkliang
- Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand
- Pornpen Sangthawan
- Division of Nephrology, Department of Medicine, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla, Thailand
3
Lai J, Chen Z, Liu J, Zhu C, Huang H, Yi Y, Cai G, Liao N. A radiogenomic multimodal and whole-transcriptome sequencing for preoperative prediction of axillary lymph node metastasis and drug therapeutic response in breast cancer: a retrospective, machine learning and international multicohort study. Int J Surg 2024; 110:2162-2177. [PMID: 38215256 PMCID: PMC11019980 DOI: 10.1097/js9.0000000000001082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Accepted: 12/27/2023] [Indexed: 01/14/2024]
Abstract
BACKGROUND Axillary lymph node (ALN) status serves as a crucial prognostic indicator in breast cancer (BC). The aim of this study was to construct a radiogenomic multimodal model, based on machine learning and whole-transcriptome sequencing (WTS), to accurately evaluate the risk of ALN metastasis (ALNM) and drug therapeutic response, and to avoid unnecessary axillary surgery in BC patients. METHODS We conducted a retrospective analysis of 1078 BC patients from The Cancer Genome Atlas (TCGA), The Cancer Imaging Archive (TCIA), and the Foshan cohort. These patients were divided into the TCIA cohort (N = 103), TCIA validation cohort (N = 51), Duke cohort (N = 138), Foshan cohort (N = 106), and TCGA cohort (N = 680). Radiological features were extracted from BC radiological images and differentially expressed gene expression was calibrated using technology. A support vector machine model was employed to screen radiological and genetic features, and a multimodal model was established based on radiogenomic and clinical pathological features to predict ALNM. The accuracy of the model predictions was assessed using the area under the curve (AUC) and the clinical benefit was measured using decision curve analysis. Risk stratification analysis of BC patients was performed by gene set enrichment analysis, differential comparison of immune checkpoint gene expression, and drug sensitivity testing. RESULTS For the prediction of ALNM, the rad-score was able to significantly differentiate between ALN- and ALN+ patients in both the Duke and Foshan cohorts (P < 0.05). Similarly, the gene-score was able to significantly differentiate between ALN- and ALN+ patients in the TCGA cohort (P < 0.05). The radiogenomic multimodal nomogram demonstrated satisfactory performance in the TCIA cohort (AUC 0.82, 95% CI: 0.74-0.91) and the TCIA validation cohort (AUC 0.77, 95% CI: 0.63-0.91). In the risk sub-stratification analysis, there were significant differences in gene pathway enrichment between high- and low-risk groups (P < 0.05). Additionally, different risk groups may exhibit varying treatment responses (P < 0.05). CONCLUSION Overall, the radiogenomic multimodal model employs multimodal data, including radiological images, genetic, and clinicopathological typing. The radiogenomic multimodal nomogram can precisely predict ALNM and drug therapeutic response in BC patients.
Affiliation(s)
- Jianguo Lai
- Department of Breast Cancer, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Yuexiu District, Guangzhou, Guangdong
- Zijun Chen
- The Second Clinical School of Southern Medical University, Guangzhou
- Jie Liu
- Department of Breast Cancer, Affiliated Foshan Maternity and Child Healthcare Hospital, Southern Medical University
- Chao Zhu
- Department of Blood Transfusion, The First Affiliated Hospital of Nanchang University
- Haoxuan Huang
- Department of Urology, Third Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, People’s Republic of China
- Ying Yi
- Department of Radiology, The First People's Hospital of Foshan, Foshan, Guangdong
- Gengxi Cai
- Department of Breast Surgery, The First People’s Hospital of Foshan, Foshan, Guangdong
- Ning Liao
- Department of Breast Cancer, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Yuexiu District, Guangzhou, Guangdong
4
Cabral L, Calabro FJ, Foran W, Parr AC, Ojha A, Rasmussen J, Ceschin R, Panigrahy A, Luna B. Multivariate and regional age-related change in basal ganglia iron in neonates. Cereb Cortex 2024; 34:bhad456. [PMID: 38059685 DOI: 10.1093/cercor/bhad456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 10/31/2023] [Accepted: 11/01/2023] [Indexed: 12/08/2023] Open
Abstract
In the perinatal period, reward and cognitive systems begin trajectories, influencing later psychiatric risk. The basal ganglia is important for reward and cognitive processing, but its early development has not been fully characterized. To assess age-related development, we used a measure of basal ganglia physiology, specifically brain tissue iron, obtained from nT2* signal in resting-state functional magnetic resonance imaging (rsfMRI), associated with dopaminergic processing. We used data from the Developing Human Connectome Project (n = 464) to assess how moving from the prenatal to the postnatal environment affects rsfMRI nT2*, modeling gestational and postnatal age separately for basal ganglia subregions in linear models. We did not find associations between tissue iron and gestational age [range: 24.29-42.29] but found positive associations with postnatal age [range: 0-17.14] in the pallidum and putamen, but not the caudate. We tested whether there was an interaction between preterm birth and postnatal age, finding that early preterm infants (GA < 35 wk) had higher iron levels and changed less over time. To assess multivariate change, we used support vector regression to predict age from voxel-wise nT2* maps. We could predict postnatal but not gestational age when maps were residualized for the other age term. This provides evidence that subregions change differentially with postnatal experience and that preterm birth may disrupt trajectories.
Affiliation(s)
- Laura Cabral
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15224, United States
- Finnegan J Calabro
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Department of Bioengineering, University of Pittsburgh, 15213, United States
- Will Foran
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Ashley C Parr
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Amar Ojha
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Jerod Rasmussen
- Development, Health and Disease Research Program, University of California, Irvine, CA 92697, United States
- Department of Pediatrics, University of California, Irvine, CA 92697, United States
- Rafael Ceschin
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15224, United States
- Ashok Panigrahy
- Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15224, United States
- Beatriz Luna
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, United States
5
Ferdous J, Rahman ME, Sraboni FS, Dutta AK, Rahman MS, Ali MR, Sikdar B, Khan A, Hasan MF. Assessment of the hypoglycemic and anti-hemostasis effects of Paederia foetida (L.) in controlling diabetes and thrombophilia combining in vivo and computational analysis. Comput Biol Chem 2023; 107:107954. [PMID: 37738820 DOI: 10.1016/j.compbiolchem.2023.107954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/03/2023] [Accepted: 09/05/2023] [Indexed: 09/24/2023]
Abstract
Paederia foetida is valued for its folk medicinal properties. This research aimed to assess the acute toxicity, hypoglycemic and anti-hemostasis properties of the methanolic extract of P. foetida leaves (PFLE). Acute toxicity of PFLE was assessed in a mouse model. Hypoglycemic and anti-hemostasis properties of PFLE were investigated in normal and streptozotocin-induced mouse models. Deep learning, molecular docking, density functional theory (DFT), and molecular simulation techniques were employed to understand the underlying mechanisms through in silico study. Oral administration of PFLE at a dosage of 300 µg/kg body weight (BW) showed no signs of toxicity. Treatment with PFLE (300 µg/kg BW) for 14 days resulted in a hypoglycemic condition and a 30.47% increase in body weight. Additionally, PFLE mixed with blood exhibited a 44.6% anti-hemostasis effect. Deep learning predicted the inhibitory concentration (pIC50, nM) of Cleomiscosins against SGLT2 and FXa to be 7.478 and 6.017, respectively. Molecular docking analysis revealed strong binding interactions of Cleomiscosins with crucial residues of the target proteins, exhibiting binding energies of -8.2 kcal/mol and -7.1 kcal/mol, respectively. ADME/Tox predictions indicated favorable pharmacokinetic properties of Cleomiscosins, and DFT calculations of frontier molecular orbitals were used to analyze the stability and reactivity of these compounds. Molecular dynamics simulation, principal component analysis and MM-PBSA calculation demonstrated the stable, compact, and rigid nature of the protein-ligand complexes. The methanolic PFLE exhibited significant hypoglycemic and anti-hemostasis properties. Cleomiscosin may have inhibitory properties for the development of novel drugs to manage diabetes and thrombophilia in the near future.
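For readers who want the predicted pIC50 values above as concentrations, the sketch below applies the conventional definition pIC50 = -log10(IC50 in mol/L). That definition is an assumption here: the abstract's "(pIC50, nM)" label does not state the convention used, and the helper names are hypothetical.

```python
import math

def pic50_to_nanomolar(pic50):
    """Convert pIC50 to IC50 in nM, assuming pIC50 = -log10(IC50 in mol/L)."""
    return 10 ** (9 - pic50)  # 1 mol/L = 1e9 nM

def nanomolar_to_pic50(ic50_nm):
    """Inverse conversion under the same assumption."""
    return 9 - math.log10(ic50_nm)

# Values quoted in the abstract for the SGLT2 and FXa targets.
sglt2_nm = pic50_to_nanomolar(7.478)
fxa_nm = pic50_to_nanomolar(6.017)
print(f"SGLT2: {sglt2_nm:.1f} nM, FXa: {fxa_nm:.1f} nM")
```

Under this convention a higher pIC50 means a lower IC50, so the SGLT2 prediction corresponds to the more potent inhibition of the two.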
Affiliation(s)
- Jannatul Ferdous
- Department of Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Ekhtiar Rahman
- Department of Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Farzana Sayed Sraboni
- Department of Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Amit Kumar Dutta
- Department of Microbiology, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Siddikur Rahman
- Department of Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Roushan Ali
- Department of Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Biswanath Sikdar
- Department of Microbiology, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Alam Khan
- Department of Pharmacy, University of Rajshahi, Rajshahi 6205, Bangladesh
- Md Faruk Hasan
- Department of Microbiology, University of Rajshahi, Rajshahi 6205, Bangladesh.
6
Moreira-Filho JT, Neves BJ, Cajas RA, Moraes JD, Andrade CH. Artificial intelligence-guided approach for efficient virtual screening of hits against Schistosoma mansoni. Future Med Chem 2023; 15:2033-2050. [PMID: 37937522 DOI: 10.4155/fmc-2023-0152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 10/06/2023] [Indexed: 11/09/2023] Open
Abstract
Background: The impact of schistosomiasis, which affects over 230 million people, emphasizes the urgency of developing new antischistosomal drugs. Artificial intelligence is vital in accelerating the drug discovery process. Methodology & results: We developed classification and regression machine learning models to predict the schistosomicidal activity of compounds not experimentally tested. The prioritized compounds were tested on schistosomula and adult stages of Schistosoma mansoni. Four compounds demonstrated significant activity against schistosomula, with 50% effective concentration values ranging from 9.8 to 32.5 μM, while exhibiting no toxicity in animal and human cell lines. Conclusion: These findings represent a significant step forward in the discovery of antischistosomal drugs. Further optimization of these active compounds can pave the way for their progression into preclinical studies.
Affiliation(s)
- José Teófilo Moreira-Filho
- Laboratory of Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, 74605-170, Brazil
- Bruno Junior Neves
- Laboratory of Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, 74605-170, Brazil
- Rayssa Araujo Cajas
- Research Center on Neglected Diseases (NPDN), Universidade Guarulhos, Guarulhos, 07023-070, Brazil
- Josué de Moraes
- Research Center on Neglected Diseases (NPDN), Universidade Guarulhos, Guarulhos, 07023-070, Brazil
- Carolina Horta Andrade
- Laboratory of Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, 74605-170, Brazil
- Center for the Research and Advancement in Fragments and molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirao Preto, University of São Paulo, Ribeirão Preto, SP, Brazil
7
Han R, Yoon H, Kim G, Lee H, Lee Y. Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery. Pharmaceuticals (Basel) 2023; 16:1259. [PMID: 37765069 PMCID: PMC10537003 DOI: 10.3390/ph16091259] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/27/2023] [Revised: 08/24/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023] Open
Abstract
Artificial intelligence (AI) has permeated various sectors, including the pharmaceutical industry and research, where it has been utilized to efficiently identify new chemical entities with desirable properties. The application of AI algorithms to drug discovery presents both remarkable opportunities and challenges. This review article focuses on the transformative role of AI in medicinal chemistry. We delve into the applications of machine learning and deep learning techniques in drug screening and design, discussing their potential to expedite the early drug discovery process. In particular, we provide a comprehensive overview of the use of AI algorithms in predicting protein structures, drug-target interactions, and molecular properties such as drug toxicity. While AI has accelerated the drug discovery process, data quality issues and technological constraints remain challenges. Nonetheless, new relationships and methods have been unveiled, demonstrating AI's expanding potential in predicting and understanding drug interactions and properties. For its full potential to be realized, interdisciplinary collaboration is essential. This review underscores AI's growing influence on the future trajectory of medicinal chemistry and stresses the importance of ongoing synergies between computational and domain experts.
Affiliation(s)
- Yoonji Lee
- College of Pharmacy, Chung-Ang University, Seoul 06974, Republic of Korea
8
Houssein EH, Hassan HN, Samee NA, Jamjoom MM. A Novel Hybrid Runge Kutta Optimizer with Support Vector Machine on Gene Expression Data for Cancer Classification. Diagnostics (Basel) 2023; 13:1621. [PMID: 37175012 PMCID: PMC10178557 DOI: 10.3390/diagnostics13091621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 03/05/2023] [Accepted: 04/18/2023] [Indexed: 05/15/2023] Open
Abstract
It is crucial to accurately categorize cancers using microarray data. Researchers have employed a variety of computational intelligence approaches to analyze gene expression data. It is believed that the most difficult part of the problem of cancer diagnosis is determining which genes are informative. Therefore, selecting genes to study as a starting point for cancer classification is common practice. We offer a novel approach that combines the Runge Kutta optimizer (RUN) with a support vector machine (SVM) as the classifier to select the significant genes in the detection of cancer tissues. As a means of dealing with the high dimensionality that characterizes microarray datasets, the ReliefF method is applied in the preprocessing stage. To assess its efficacy, the proposed RUN-SVM approach is tested on binary-class microarray datasets (Breast2 and Prostate) and multi-class microarray datasets (Brain Tumor1, Brain Tumor2, Breast3, and Lung Cancer). Based on the experimental results obtained from analyzing six different cancer gene expression datasets, the proposed RUN-SVM approach was found to statistically outperform the other competing algorithms due to its innovative search technique.
Affiliation(s)
- Essam H Houssein
- Faculty of Computers and Information, Minia University, Minia 61519, Egypt
- Hager N Hassan
- Faculty of Computers and Information, Minia University, Minia 61519, Egypt
- Nagwan Abdel Samee
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Mona M Jamjoom
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
9
Dutschmann TM, Kinzel L, Ter Laak A, Baumann K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J Cheminform 2023; 15:49. [PMID: 37118768 PMCID: PMC10142532 DOI: 10.1186/s13321-023-00709-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/03/2022] [Accepted: 03/10/2023] [Indexed: 04/30/2023] Open
Abstract
It is insightful to report, in addition to the prediction itself, an estimator that describes how certain a model is in that prediction. For regression tasks, most approaches implement a variation of the ensemble method, with few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by the standard deviation. In theory, ensembles should not only provide uncertainties but also boost the predictive performance by reducing errors arising from variance. Despite the development of novel methods, ensembles are still considered the "gold standard" for quantifying the uncertainty of regression models. Subsampling-based methods to obtain ensembles can be applied to all models, regardless of whether they are related to deep learning or traditional machine learning. However, little attention has been given to the question of whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a widespread and diversified attempt, ensembles are evaluated for 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes with up to 200 members, the predictive performance as well as the applicability as an uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Useful recommendations were derived for practitioners regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is of more importance for the task at hand.
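A minimal sketch of the k-fold ensemble idea described in this abstract: each member is trained on the data with one fold left out, and the spread of the members' predictions serves as the uncertainty estimate. The one-dimensional least-squares line, the interleaved fold assignment, and all names and data below are illustrative assumptions, not the models or datasets evaluated in the paper.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def kfold_ensemble(xs, ys, k):
    """Train k members, each on the data with one (interleaved) fold held out."""
    members = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        members.append(fit_line([xs[i] for i in train], [ys[i] for i in train]))
    return members

def predict_with_uncertainty(members, x):
    """Ensemble mean as the prediction, standard deviation as the uncertainty."""
    preds = [slope * x + intercept for slope, intercept in members]
    return mean(preds), stdev(preds)

# Hypothetical noisy data roughly following y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.8, 7.1, 8.0, 9.1]
ensemble = kfold_ensemble(xs, ys, k=5)
near_mean, near_sd = predict_with_uncertainty(ensemble, 4.5)   # inside the training range
far_mean, far_sd = predict_with_uncertainty(ensemble, 30.0)    # far outside it
```

On this toy data the members disagree more at the extrapolated input than at the interpolated one, which is the behavior an uncertainty estimator is expected to show.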
Affiliation(s)
- Thomas-Martin Dutschmann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Lennart Kinzel
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Antonius Ter Laak
- Bayer AG, Research & Development, Pharmaceuticals, Muellerstrasse 178, 13353, Berlin, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany.
10
Chadha A, Dara R, Pearl DL, Sharif S, Poljak Z. Predictive analysis for pathogenicity classification of H5Nx avian influenza strains using machine learning techniques. Prev Vet Med 2023; 216:105924. [PMID: 37224663 DOI: 10.1016/j.prevetmed.2023.105924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 03/17/2023] [Accepted: 04/21/2023] [Indexed: 05/26/2023]
Abstract
Over the past decades, avian influenza (AI) outbreaks have been reported across different parts of the globe, resulting in large-scale economic and livestock loss and, in some cases, raising concerns about their zoonotic potential. The virulence and pathogenicity of H5Nx (e.g., H5N1, H5N2) AI strains for poultry can be inferred through various approaches, most frequently by detecting certain pathogenicity markers in their haemagglutinin (HA) gene. The utilization of predictive modeling methods represents a possible approach to exploring this genotypic-phenotypic relationship for assisting experts in determining the pathogenicity of circulating AI viruses. Therefore, the main objective of this study was to evaluate the predictive performance of different machine learning (ML) techniques for in-silico prediction of pathogenicity of H5Nx viruses in poultry, using complete genetic sequences of the HA gene. We annotated 2137 H5Nx HA gene sequences based on the presence of the polybasic HA cleavage site (HACS), with 46.33% and 53.67% of sequences previously identified as highly pathogenic (HP) and low pathogenic (LP), respectively. We compared the performance of different ML classifiers (e.g., logistic regression (LR) with the lasso and ridge regularization, random forest (RF), K-nearest neighbor (KNN), Naïve Bayes (NB), support vector machine (SVM), and convolutional neural network (CNN)) for pathogenicity classification of raw H5Nx nucleotide and protein sequences using a 10-fold cross-validation technique. We found that different ML techniques can be successfully used for the pathogenicity classification of H5 sequences with ∼99% classification accuracy. Our results indicate that for pathogenicity classification of (1) aligned deoxyribonucleic acid (DNA) and protein sequences, the NB classifier had the lowest accuracies of 98.41% (+/-0.89) and 98.31% (+/-1.06), respectively; (2) aligned DNA and protein sequences, the LR (L1/L2), KNN, SVM (radial basis function (RBF)) and CNN classifiers had the highest accuracies of 99.20% (+/-0.54) and 99.20% (+/-0.38), respectively; (3) unaligned DNA and protein sequences, CNNs achieved accuracies of 98.54% (+/-0.68) and 99.20% (+/-0.50), respectively. ML methods show potential for regular classification of H5Nx virus pathogenicity for poultry species, particularly when sequences containing regular markers were frequently present in the training dataset.
Affiliation(s)
- Akshay Chadha
- School of Computer Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada.
- Rozita Dara
- School of Computer Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- David L Pearl
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Shayan Sharif
- Department of Pathobiology, University of Guelph, Guelph, Ontario N1G 2W1, Canada
| | - Zvonimir Poljak
- Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario N1G 2W1, Canada
| |
|
11
|
Astray G, Soria-Lopez A, Barreiro E, Mejuto JC, Cid-Samamed A. Machine Learning to Predict the Adsorption Capacity of Microplastics. Nanomaterials (Basel, Switzerland) 2023; 13:1061. [PMID: 36985954 PMCID: PMC10051191 DOI: 10.3390/nano13061061] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/14/2023] [Revised: 03/10/2023] [Accepted: 03/11/2023] [Indexed: 06/18/2023]
Abstract
Nowadays, plastic materials are produced and used extensively in different industrial activities. These plastics, either from their primary production sources or through their own degradation processes, can contaminate ecosystems with micro- and nanoplastics. Once in the aquatic environment, these microplastics can act as a substrate for the adsorption of chemical pollutants, so that the pollutants disperse more quickly in the environment and can affect living beings. Due to the lack of information on adsorption, three machine learning models (random forest, support vector machine, and artificial neural network) were developed to predict different microplastic/water partition coefficients (log Kd) using two different approximations (based on the number of input variables). The best-selected machine learning models present, in general, correlation coefficients above 0.92 in the query phase, which indicates that these types of models could be used for the rapid estimation of the adsorption of organic contaminants on microplastics.
Affiliation(s)
- Gonzalo Astray
- Universidade de Vigo, Departamento de Química Física, Facultade de Ciencias, 32004 Ourense, Spain
| | - Anton Soria-Lopez
- Universidade de Vigo, Departamento de Química Física, Facultade de Ciencias, 32004 Ourense, Spain
| | - Enrique Barreiro
- Universidade de Vigo, Departamento de Informática, Escola Superior de Enxeñaría Informática, 32004 Ourense, Spain
| | - Juan Carlos Mejuto
- Universidade de Vigo, Departamento de Química Física, Facultade de Ciencias, 32004 Ourense, Spain
| | - Antonio Cid-Samamed
- Universidade de Vigo, Departamento de Química Física, Facultade de Ciencias, 32004 Ourense, Spain
| |
|
12
|
Kovačević S, Banjac MK, Podunavac-Kuzmanović S, Ajduković J, Salaković B, Rárová L, Đorđević M, Ivanov M. Local QSAR modeling of cytotoxic activity of newly designed androstane 3-oximes towards malignant melanoma cells. J Mol Struct 2023. [DOI: 10.1016/j.molstruc.2023.135272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
|
13
|
Shi M, Huang Z, Xiao G, Xu B, Ren Q, Zhao H. Estimating the Depth of Anesthesia from EEG Signals Based on a Deep Residual Shrinkage Network. Sensors (Basel, Switzerland) 2023; 23:1008. [PMID: 36679805 PMCID: PMC9865536 DOI: 10.3390/s23021008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/11/2023] [Accepted: 01/12/2023] [Indexed: 06/17/2023]
Abstract
The reliable monitoring of the depth of anesthesia (DoA) is essential to control the anesthesia procedure. Electroencephalography (EEG) has been widely used to estimate DoA since EEG could reflect the effect of anesthetic drugs on the central nervous system (CNS). In this study, we propose that a deep learning model consisting mainly of a deep residual shrinkage network (DRSN) and a 1 × 1 convolution network could estimate DoA in terms of patient state index (PSI) values. First, we preprocessed the four raw channels of EEG signals to remove electrical noise and other physiological signals. The proposed model then takes the preprocessed EEG signals as inputs to predict PSI values. Then we extracted 14 features from the preprocessed EEG signals and implemented three conventional feature-based models as comparisons. A dataset of 18 patients was used to evaluate the models' performances. The results of the five-fold cross-validation show that there is a relatively high similarity between the ground-truth PSI values and the predicted PSI values of our proposed model, which outperforms the conventional models, and further, that the Spearman's rank correlation coefficient is 0.9344. In addition, an ablation experiment was conducted to demonstrate the effectiveness of the soft-thresholding module for EEG-signal processing, and a cross-subject validation was implemented to illustrate the robustness of the proposed method. In summary, the procedure is not merely feasible for estimating DoA by mimicking PSI values but also inspired us to develop a precise DoA-estimation system with more convincing assessments of anesthetization levels.
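The agreement metric reported in this abstract, Spearman's rank correlation coefficient, is the Pearson correlation computed on ranks. The following is a minimal sketch assuming tied values share their average rank; the function names are illustrative and the PSI-like numbers in the usage note are made up.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman([25, 40, 55, 80], [30, 38, 60, 77])` compares a hypothetical series of reference PSI values with hypothetical predictions; because the coefficient depends only on ordering, a model can score highly even when its predictions are offset from the reference values.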
Affiliation(s)
- Meng Shi
- School of Electronics, Peking University, Beijing 100084, China
| | - Ziyu Huang
- Department of Anesthesiology, Peking University People’s Hospital, Beijing 100044, China
| | - Guowen Xiao
- School of Electronics, Peking University, Beijing 100084, China
| | - Bowen Xu
- School of Electronics, Peking University, Beijing 100084, China
| | - Quansheng Ren
- School of Electronics, Peking University, Beijing 100084, China
| | - Hong Zhao
- Department of Anesthesiology, Peking University People’s Hospital, Beijing 100044, China
| |
|
14
|
Machine Learning Models to Predict Protein-Protein Interaction Inhibitors. Molecules (Basel, Switzerland) 2022; 27:7986. [PMID: 36432086 PMCID: PMC9694076 DOI: 10.3390/molecules27227986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/20/2022] [Revised: 11/09/2022] [Accepted: 11/16/2022] [Indexed: 11/19/2022]
Abstract
Protein-protein interaction (PPI) inhibitors have an increasing role in drug discovery. It is hypothesized that machine learning (ML) algorithms can classify or identify PPI inhibitors. This work describes the performance of different algorithms and molecular fingerprints used in chemoinformatics to develop a classification model to identify PPI inhibitors, and makes the code freely available to the community, particularly the medicinal chemistry research groups working with PPI inhibitors. We found that classification algorithms perform differently depending on the features employed in the training process. Random forest (RF) models with the extended connectivity fingerprint of radius 2 (ECFP4) had the best classification abilities compared to models trained with ECFP6 or MACCS keys (166 bits). In general, logistic regression (LR) models had lower performance metrics than RF models, but ECFP4 was the most appropriate representation for LR. ECFP4 also generated models with high performance metrics with support vector machines (SVM). We also constructed ensemble models based on the top-performing models. As part of this work and to help non-computational experts, we developed freely available pipeline code.
|
15
|
Houssein EH, Hosney ME, Mohamed WM, Ali AA, Younis EMG. Fuzzy-based hunger games search algorithm for global optimization and feature selection using medical data. Neural Comput Appl 2022; 35:5251-5275. [PMID: 36340595 PMCID: PMC9628476 DOI: 10.1007/s00521-022-07916-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/10/2022] [Accepted: 09/30/2022] [Indexed: 11/06/2022]
Abstract
Feature selection (FS) is one of the basic data preprocessing steps in data mining and machine learning. It is used to reduce feature size and increase model generalization. In addition to minimizing feature dimensionality, it also enhances classification accuracy and reduces model complexity, which are essential in several applications. Traditional methods for feature selection often fail to find the global optimum due to the large search space. Many hybrid techniques have been proposed that merge several search strategies previously used individually to solve the FS problem. This study proposes a modified hunger games search algorithm (mHGS) for solving optimization and FS problems. The main advantage of the proposed mHGS is that it resolves the following drawbacks of the original HGS: (1) avoiding the local search, (2) solving the problem of premature convergence, and (3) balancing between the exploitation and exploration phases. The mHGS has been evaluated using the IEEE Congress on Evolutionary Computation 2020 (CEC'20) test suite for optimization and ten medical and chemical datasets. The data have dimensions up to 20000 features or more. The results of the proposed algorithm have been compared to a variety of well-known optimization methods, including the improved multi-operator differential evolution algorithm (IMODE), gravitational search algorithm, grey wolf optimization, Harris Hawks optimization, whale optimization algorithm, slime mould algorithm and hunger games search. The experimental results suggest that the proposed mHGS can generate effective search results without increasing the computational cost, while improving the convergence speed. It has also improved the SVM classification performance.
Affiliation(s)
- Essam H. Houssein
- Faculty of Computers and Information, Minia University, Minia, Egypt
| | - Mosa E. Hosney
- Faculty of Computers and Information, Luxor University, Luxor, Egypt
| | - Waleed M. Mohamed
- Faculty of Computers and Information, Minia University, Minia, Egypt
| | - Abdelmgeid A. Ali
- Faculty of Computers and Information, Minia University, Minia, Egypt
| | - Eman M. G. Younis
- Faculty of Computers and Information, Minia University, Minia, Egypt
| |
|
16
|
Deep Transfer Learning for Question Classification Based on Semantic Information Features of Category Labels. Computational Intelligence and Neuroscience 2022; 2022:7178818. [PMID: 36211009 PMCID: PMC9546665 DOI: 10.1155/2022/7178818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Revised: 08/29/2022] [Accepted: 09/06/2022] [Indexed: 11/25/2022]
Abstract
Question classification is an important component of the question answering system (QA system), which is designed to restrict the answer types and accurately locate the answers. Therefore, the classification results of the questions affect the quality and performance of the QA system. Most question classification methods in the past have relied on a large amount of manually labeled training data. However, in real situations, especially in new domains, it is very difficult to obtain a large amount of labeled data. Transfer learning is an effective approach to solve the problem with the scarcity of annotated data in new domains. We compare the effects of different deep transfer learning methods on cross-domain question classification. On the basis of the ALBERT fine-tuning model, we extract the category labels of the source domain, the question text, and the predicted category labels of the target domain as input to extract the category labels. Additionally, the semantic information of the category labels is extracted to achieve cross-domain question classification. Furthermore, WordNet is used to expand the question, which further improves the classification accuracy of the target domain. Experimental results show that the above methods can further improve the classification accuracy in new domains based on deep transfer learning.
|
17
|
Zhang X, Bai Y, Ngando FJ, Qu H, Shang Y, Ren L, Guo Y. Predicting the Weathering Time by the Empty Puparium of Sarcophaga peregrina (Diptera: Sarcophagidae) with the ANN Models. Insects 2022; 13:808. [PMID: 36135509 PMCID: PMC9502838 DOI: 10.3390/insects13090808] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/13/2022] [Revised: 08/20/2022] [Accepted: 08/30/2022] [Indexed: 06/01/2023]
Abstract
Empty puparia are frequently collected at crime scenes and may provide valuable evidence in cases with a long postmortem interval (PMI). Here, we collected the puparia of Sarcophaga peregrina (Diptera: Sarcophagidae) (Robineau-Desvoidy, 1830) for 120 days at three temperatures (10 °C, 25 °C, and 40 °C) with the aim of estimating the weathering time of empty puparia. The cuticular hydrocarbon (CHC) profiles were analyzed by gas chromatography-mass spectrometry (GC-MS). Partial least squares (PLS), support vector regression (SVR), and artificial neural network (ANN) models were used to estimate the weathering time. The analysis identified 49 CHCs with a carbon chain length between 10 and 33 in empty puparia. The three models demonstrate that the variation tendency of hydrocarbons could be used to estimate the weathering time, with the ANN models showing the best predictive ability of the three. This work indicates that puparial hydrocarbon weathering follows a certain regularity with weathering time and can provide insight into estimating PMI in forensic investigations.
Affiliation(s)
- Xiangyan Zhang
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha 410013, China
| | - Yang Bai
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha 410013, China
| | - Fernand Jocelin Ngando
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha 410013, China
| | - Hongke Qu
- School of Basic Medical Sciences, Central South University, Changsha 410013, China
| | - Yanjie Shang
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha 410013, China
| | - Lipin Ren
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha 410013, China
| | - Yadong Guo
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha 410013, China
| |
|
18
|
Asahara R, Miyao T. Extended Connectivity Fingerprints as a Chemical Reaction Representation for Enantioselective Organophosphorus-Catalyzed Asymmetric Reaction Prediction. ACS Omega 2022; 7:26952-26964. [PMID: 35936487 PMCID: PMC9352214 DOI: 10.1021/acsomega.2c03812] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/18/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Predicting the outcomes of organic reactions using data-driven approaches aids in the acceleration of research. In laboratory-scale experiments, only a small number of reaction data can be accessed for machine learning model construction, where reaction representations play a pivotal role in the success of model construction. Nevertheless, representation comparison for a small data set is not adequate. Herein, focusing on the enantioselectivity of phosphoric-acid-catalyzed reactions, various two-dimensional and three-dimensional reaction representations (descriptors) were compared. Overall, the concatenated form of the extended connectivity fingerprints showed the best predictive capability for the two types of data sets: high-throughput experimental data and manually collected literature data sets. Furthermore, highlighting the substructure contribution to the prediction outcome was shown to be informative for guiding catalyst development.
Affiliation(s)
- Ryosuke Asahara
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
| | - Tomoyuki Miyao
- Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
| |
|
19
|
Ikemoto K, Akiyoshi M, Mio T, Nishioka K, Sato S, Isobe H. Synthesis of a Negatively Curved Nanocarbon Molecule with an Octagonal Omphalos via Design-of-Experiments Optimizations Supplemented by Machine Learning. Angew Chem Int Ed Engl 2022; 61:e202204035. [PMID: 35603558 DOI: 10.1002/anie.202204035] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/17/2022] [Indexed: 12/16/2022]
Abstract
A saddle-shaped nanocarbon molecule was synthesized, which revealed the existence of negative Gauss curvatures on a >3-nm molecular structure possessing 192 π-electrons. The synthesis was facilitated by a protocol developed with Design-of-Experiments optimizations and machine-learning predictions, and spectroscopy and crystallography were used to reveal the saddle-shaped structure of the molecule. Solution-phase analyses showed the presence of dimeric assembly, and crystallographic analyses revealed the stacked dimeric structures. The stacked crystal structure was scrutinized by various methods, including Gauss curvatures derived from the discrete surface theory of geometry, to reveal the important role of the molecular Gauss curvature in dimeric assembly.
Affiliation(s)
- Koki Ikemoto
- Department of Chemistry, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Misato Akiyoshi
- Department of Chemistry, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Tatsuru Mio
- Department of Chemistry, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Kaito Nishioka
- Department of Chemistry, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan
| | - Sota Sato
- Department of Chemistry, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan. Present address: Department of Applied Chemistry, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
| | - Hiroyuki Isobe
- Department of Chemistry, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan
| |
|
20
|
Ikemoto K, Akiyoshi M, Mio T, Nishioka K, Sato S, Isobe H. Synthesis of a Negatively Curved Nanocarbon Molecule with an Octagonal Omphalos via Design‐of‐Experiments Optimizations Supplemented by Machine Learning. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.202204035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Koki Ikemoto
- Department of Chemistry The University of Tokyo Hongo 7-3-1, Bunkyo-ku Tokyo 113-0033 Japan
| | - Misato Akiyoshi
- Department of Chemistry The University of Tokyo Hongo 7-3-1, Bunkyo-ku Tokyo 113-0033 Japan
| | - Tatsuru Mio
- Department of Chemistry The University of Tokyo Hongo 7-3-1, Bunkyo-ku Tokyo 113-0033 Japan
| | - Kaito Nishioka
- Department of Chemistry The University of Tokyo Hongo 7-3-1, Bunkyo-ku Tokyo 113-0033 Japan
| | - Sota Sato
- Department of Chemistry The University of Tokyo Hongo 7-3-1, Bunkyo-ku Tokyo 113-0033 Japan
- Present address: Department of Applied Chemistry The University of Tokyo Hongo, Bunkyo-ku, Tokyo 113-8656 Japan
| | - Hiroyuki Isobe
- Department of Chemistry The University of Tokyo Hongo 7-3-1, Bunkyo-ku Tokyo 113-0033 Japan
| |
|
21
|
Cho BH, Kim YH, Lee KB, Hong YK, Kim KC. Potential of Snapshot-Type Hyperspectral Imagery Using Support Vector Classifier for the Classification of Tomatoes Maturity. Sensors 2022; 22:4378. [PMID: 35746159 PMCID: PMC9227650 DOI: 10.3390/s22124378] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/13/2022] [Revised: 06/07/2022] [Accepted: 06/07/2022] [Indexed: 02/01/2023]
Abstract
Tomato hydroponic greenhouses need to be automated because of the aging of farmers, the reduction in agricultural workers as a proportion of the population, COVID-19, and so on. In particular, agricultural robots are an attractive route to automation in a hydroponic greenhouse. However, developing agricultural robots requires crop monitoring techniques. In this study, therefore, we aimed to develop a maturity classification model for tomatoes using both a support vector classifier (SVC) and snapshot-type hyperspectral imaging (VIS: 460–600 nm (16 bands) and Red-NIR: 600–860 nm (15 bands)). The spectral data were obtained from the surfaces of a total of 258 tomatoes harvested in January and February 2022. Spectral data related to the maturity stages of tomatoes were selected by correlation analysis. In addition, four spectral data sets were prepared: VIS data (16 bands), Red-NIR data (15 bands), combined VIS and Red-NIR data (31 bands), and selected spectral data (6 bands). An SVC was trained on each data set, and we evaluated the performance of the trained classification models. As a result, the SVC based on VIS data achieved a classification accuracy of 79% and an F1-score of 88% when classifying tomato maturity into six stages (Green, Breaker, Turning, Pink, Light-red, and Red). In addition, the developed model was tested in a hydroponic greenhouse and was able to classify the maturity stages with a classification accuracy of 75% and an F1-score of 86%.
|
22
|
Packwood D, Nguyen LTH, Cesana P, Zhang G, Staykov A, Fukumoto Y, Nguyen DH. Machine Learning in Materials Chemistry: An Invitation. Machine Learning with Applications 2022. [DOI: 10.1016/j.mlwa.2022.100265] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]
|
23
|
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany. Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
| | - Filip Miljković
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany. Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
| |
|
24
|
Rodríguez-Pérez R, Bajorath J. Evolution of Support Vector Machine and Regression Modeling in Chemoinformatics and Drug Discovery. J Comput Aided Mol Des 2022; 36:355-362. [PMID: 35304657 PMCID: PMC9325859 DOI: 10.1007/s10822-022-00442-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 02/08/2022] [Accepted: 02/15/2022] [Indexed: 11/05/2022]
Abstract
The support vector machine (SVM) algorithm is one of the most widely used machine learning (ML) methods for predicting active compounds and molecular properties. In chemoinformatics and drug discovery, SVM has been a state-of-the-art ML approach for more than a decade. A unique attribute of SVM is that it operates in feature spaces of increasing dimensionality. Hence, SVM conceptually departs from the paradigm of low dimensionality that applies to many other methods for chemical space navigation. The SVM approach is applicable to compound classification and ranking, multi-class predictions, and (in algorithmically modified form) regression modeling. In the emerging era of deep learning (DL), SVM retains its relevance as one of the premier ML methods in chemoinformatics, for reasons discussed herein. We describe the SVM methodology including strengths and weaknesses and discuss selected applications that have contributed to the evolution of SVM as a premier approach for compound classification, property predictions, and virtual compound screening.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115, Bonn, Germany; Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002, Basel, Switzerland
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115, Bonn, Germany; Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002, Basel, Switzerland
| |
|
25
|
Comparison of Rainfall-Runoff Simulation between Support Vector Regression and HEC-HMS for a Rural Watershed in Taiwan. Water 2022. [DOI: 10.3390/w14020191] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/16/2022]
Abstract
To better understand the effect and constraint of different data lengths on data-driven model training for rainfall-runoff simulation, the support vector regression (SVR) approach was applied as the core algorithm of the data-driven model in the present study. Various feature selection strategies and different data lengths were employed in the training phase of the model. The validated results of the SVR were compared with the rainfall-runoff simulation derived from a physically based hydrologic model, the Hydrologic Modeling System (HEC-HMS). The HEC-HMS was considered a conventional approach and was also calibrated with a dataset period identical to the SVR. Our results showed that the SVR and HEC-HMS models could be adopted for short and long periods of rainfall-runoff simulation. However, the SVR model estimated the rainfall-runoff relationship reasonably well even if the observational data of only one year or one typhoon event was used. In contrast, the HEC-HMS model needed more parameter optimization and inference processes to achieve the same performance level as the SVR model. Overall, the SVR model was superior to the HEC-HMS model in the performance of the rainfall-runoff simulation.
|
26
|
Rodríguez-Pérez R, Bajorath J. Explainable Machine Learning for Property Predictions in Compound Optimization. J Med Chem 2021; 64:17744-17752. [PMID: 34902252 DOI: 10.1021/acs.jmedchem.1c01789] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 11/28/2022]
Abstract
The prediction of compound properties from chemical structure is a main task for machine learning (ML) in medicinal chemistry. ML is often applied to large data sets in applications such as compound screening, virtual library enumeration, or generative chemistry. Albeit desirable, a detailed understanding of ML model decisions is typically not required in these cases. By contrast, compound optimization efforts rely on small data sets to identify structural modifications leading to desired property profiles. In this situation, if ML is applied, one usually is reluctant to make decisions based on predictions that cannot be rationalized. Only few ML methods are interpretable. However, to yield insights into complex ML model decisions, explanatory approaches can be applied. Herein, methodologies for better understanding of ML models or explaining individual predictions are reviewed and current challenges in integrating ML into medicinal chemistry programs as well as future opportunities are discussed.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany; Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
| | - Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
| |
|
27
|
Gene Selection for Microarray Cancer Classification based on Manta Rays Foraging Optimization and Support Vector Machines. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-021-06102-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
|
28
|
Pourashraf T, Shokri S, Yousefi M, Ahmadi A, Azar PA. Implementing Machine Learning in Laboratory Synthesis by Hybrid of SVR Model and Optimization Algorithms. Advanced Theory and Simulations 2021. [DOI: 10.1002/adts.202100225] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Affiliation(s)
- Tolou Pourashraf
- Department of Chemistry Science and Research Branch Islamic Azad University Tehran 1477893855 Iran
| | - Saeid Shokri
- Technology and Innovation Group Research Institute of Petroleum Industry (RIPI) Tehran 1485733111 Iran
| | - Mohammad Yousefi
- Department of Chemistry Faculty of Pharmaceutical Chemistry Tehran Medical Sciences Islamic Azad University Tehran 1949635881 Iran
| | - Abbas Ahmadi
- Department of Chemistry Faculty of Science Karaj Branch Islamic Azad University Karaj 3149968111 Iran
| | - Parviz Aberoomand Azar
- Department of Chemistry Science and Research Branch Islamic Azad University Tehran 1477893855 Iran
| |
|
29
|
Jesus B, Cassani R, McGeown WJ, Cecchi M, Fadem KC, Falk TH. Multimodal Prediction of Alzheimer's Disease Severity Level Based on Resting-State EEG and Structural MRI. Front Hum Neurosci 2021; 15:700627. [PMID: 34566600 PMCID: PMC8458963 DOI: 10.3389/fnhum.2021.700627] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/26/2021] [Accepted: 08/05/2021] [Indexed: 11/13/2022] Open
Abstract
While several biomarkers have been developed for the detection of Alzheimer's disease (AD), not many are available for the prediction of disease severity, particularly for patients in the mild stages of AD. In this paper, we explore the multimodal prediction of Mini-Mental State Examination (MMSE) scores using resting-state electroencephalography (EEG) and structural magnetic resonance imaging (MRI) scans. Analyses were carried out on a dataset comprised of EEG and MRI data collected from 89 patients diagnosed with minimal-mild AD. Three feature selection algorithms were assessed alongside four machine learning algorithms. Results showed that while MRI features alone outperformed EEG features, when both modalities were combined, improved results were achieved. The top-selected EEG features conveyed information about amplitude modulation rate-of-change, whereas top-MRI features comprised information about cortical area and white matter volume. Overall, a root mean square error between predicted MMSE values and true MMSE scores of 1.682 was achieved with a multimodal system and a random forest regression model.
Affiliation(s)
- Belmir Jesus
- Institut National de la Recherche Scientifique, University of Quebec, Montreal, QC, Canada
| | - Raymundo Cassani
- Institut National de la Recherche Scientifique, University of Quebec, Montreal, QC, Canada
| | - William J McGeown
- School of Psychological Sciences and Health, University of Strathclyde, Glasgow, United Kingdom
| | | | - K C Fadem
- COGNISION, Louisville, KY, United States
| | - Tiago H Falk
- Institut National de la Recherche Scientifique, University of Quebec, Montreal, QC, Canada
| |
|
30
|
Harnessing artificial intelligence for the next generation of 3D printed medicines. Adv Drug Deliv Rev 2021; 175:113805. [PMID: 34019957 DOI: 10.1016/j.addr.2021.05.015] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 02/19/2021] [Revised: 05/02/2021] [Accepted: 05/13/2021] [Indexed: 02/06/2023]
Abstract
Artificial intelligence (AI) is redefining how we exist in the world. In almost every sector of society, AI is performing tasks with super-human speed and intellect; from the prediction of stock market trends to driverless vehicles, diagnosis of disease, and robotic surgery. Despite this growing success, the pharmaceutical field is yet to truly harness AI. Development and manufacture of medicines remains largely in a 'one size fits all' paradigm, in which mass-produced, identical formulations are expected to meet individual patient needs. Recently, 3D printing (3DP) has illuminated a path for on-demand production of fully customisable medicines. Due to its flexibility, pharmaceutical 3DP presents innumerable options during formulation development that generally require expert navigation. Leveraging AI within pharmaceutical 3DP removes the need for human expertise, as optimal process parameters can be accurately predicted by machine learning. AI can also be incorporated into a pharmaceutical 3DP 'Internet of Things', moving the personalised production of medicines into an intelligent, streamlined, and autonomous pipeline. Supportive infrastructure, such as The Cloud and blockchain, will also play a vital role. Crucially, these technologies will expedite the use of pharmaceutical 3DP in clinical settings and drive the global movement towards personalised medicine and Industry 4.0.
|
31
|
Modeling and Predicting the Cell Migration Properties from Scratch Wound Healing Assay on Cisplatin-Resistant Ovarian Cancer Cell Lines Using Artificial Neural Network. Healthcare (Basel) 2021; 9:healthcare9070911. [PMID: 34356289 PMCID: PMC8305856 DOI: 10.3390/healthcare9070911] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/04/2021] [Revised: 07/14/2021] [Accepted: 07/14/2021] [Indexed: 01/04/2023] Open
Abstract
The study of artificial neural networks (ANN) has undergone a tremendous revolution in recent years, boosted by deep learning tools. The growing number of learning tools and their applications, in particular, favors this revolution. However, there is a significant need for a systematic method during the development phase of an ANN to increase its performance. A multilayer feedforward neural network (FNN) was proposed in this paper to predict the cell migration properties of cisplatin-sensitive and cisplatin-resistant (CisR) ovarian cancer (OC) cell lines measured via scratch wound healing assay. An FNN model was generated using the MATLAB fitting function in a MATLAB script to accomplish this task. The input parameters were types of cell lines, times, and wound area, and the outputs were relative wound area, percentage of wound closure, and wound healing speed. In addition, we tested and compared the initial accuracy of various supervised learning classifiers and support vector regression (SVR) algorithms. The proposed ANN model achieved good agreement with the experimental data and minimized the error between the estimated and experimental values. The conclusions drawn demonstrate that the developed ANN model is a useful, accurate, fast, and inexpensive method to predict cancerous cell migration characteristics evaluated via scratch wound healing assay.
|
32
|
Mekni N, Coronnello C, Langer T, Rosa MD, Perricone U. Support Vector Machine as a Supervised Learning for the Prioritization of Novel Potential SARS-CoV-2 Main Protease Inhibitors. Int J Mol Sci 2021; 22:7714. [PMID: 34299333 PMCID: PMC8305792 DOI: 10.3390/ijms22147714] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/02/2021] [Revised: 07/14/2021] [Accepted: 07/15/2021] [Indexed: 12/04/2022] Open
Abstract
In the last year, the COVID-19 pandemic has profoundly affected the lifestyle of the world population, prompting a great effort by the scientific community to study the molecular mechanisms of the infection. Several vaccine formulations are now available and are helping to reach immunity. Nevertheless, there is growing interest in the development of novel anti-COVID drugs. In this scenario, the main protease (Mpro) represents an appealing target, being the enzyme responsible for the cleavage of polypeptides during the viral genome transcription. With the aim of sharing new insights for the design of novel Mpro inhibitors, our research group developed a machine learning approach using support vector machine (SVM) classification. Starting from a dataset of two million commercially available compounds, the model was able to classify two hundred novel chemotypes as potentially active against the viral protease. The compounds labelled as active by the SVM were next evaluated through consensus docking studies on two PDB structures, and their binding modes were compared to those of well-known protease inhibitors. The best five compounds selected by consensus docking were then submitted to molecular dynamics to assess the stability of their binding interactions. Of note, the compounds selected via SVM retrieved all the most important interactions known in the literature.
Affiliation(s)
- Nedra Mekni
- Department of Pharmaceutical Chemistry, University of Vienna, 1090 Vienna, Austria;
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy; (C.C.); (M.D.R.)
| | - Claudia Coronnello
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy; (C.C.); (M.D.R.)
| | - Thierry Langer
- Department of Pharmaceutical Chemistry, University of Vienna, 1090 Vienna, Austria;
| | - Maria De Rosa
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy; (C.C.); (M.D.R.)
| | - Ugo Perricone
- Drug Discovery Unit, Fondazione Ri.MED, 90128 Palermo, Italy; (C.C.); (M.D.R.)
| |
|
33
|
Cordero JA, He K, Janya K, Echigo S, Itoh S. Predicting formation of haloacetic acids by chlorination of organic compounds using machine-learning-assisted quantitative structure-activity relationships. Journal of Hazardous Materials 2021; 408:124466. [PMID: 33191030 DOI: 10.1016/j.jhazmat.2020.124466] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/24/2020] [Revised: 10/30/2020] [Accepted: 10/31/2020] [Indexed: 06/11/2023]
Abstract
The presence of disinfection byproducts (DBPs) in drinking water is a major public health concern, and an effective strategy to limit the formation of these DBPs is to prevent their precursors. In silico prediction from chemical structure would allow rapid identification of precursors and could be used as a prescreening tool to prioritize testing. We present models using machine learning algorithms (i.e., support vector regressor, random forest regressor, and multilayer perceptron regressor) and chemical descriptors as features to predict the formation of haloacetic acids (HAAs). A robust model with good predictivity (i.e., leave-one-out cross-validated Q² > 0.5) to predict the formation of trichloroacetic acid (TCAA) was developed using a random forest regressor. The number of aromatic bonds, hydrophilicity, and electrotopological descriptors related to electrostatic interactions and the atomic distribution of electronegativity were identified as important predictors of TCAA formation potentials (FPs). However, the prediction of dichloroacetic acid was less accurate, which is congruent with the presence of different types of precursors exhibiting distinct mechanisms. This study demonstrates that nonlinear combinations of general chemical descriptors can adequately estimate HAAFPs, and we hope that our study can be used to predict precursors of other disinfection byproducts based on chemical structures using a similar workflow.
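As an illustration of the leave-one-out cross-validated Q² criterion quoted in the abstract above, the following is a minimal sketch of one common definition, Q² = 1 − PRESS/TSS, where PRESS is the sum of squared leave-one-out prediction errors and TSS is the total sum of squares of the observed values. It is not the authors' code: a one-descriptor least-squares line stands in for their random forest regressor, and the function names `fit_line` and `loo_q2` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out Q^2 = 1 - PRESS / TSS (one common definition; variants exist)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict the held-out sample.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / tss


# Hypothetical toy data: one descriptor value per compound and a noisy response.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
response = [0.1, 1.9, 4.2, 5.8, 8.1]
q2 = loo_q2(descriptor, response)
is_acceptable = q2 > 0.5  # the threshold quoted in the abstract
```

Because every prediction is made for a sample the model has not seen, Q² is typically lower than the fitted R² on the same data, which is why it is used as a predictivity check.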
Affiliation(s)
- José Andrés Cordero
- Department of Environmental Engineering, Graduate School of Engineering, Kyoto University, Nishikyo, Kyoto 6158540, Japan
| | - Kai He
- Research Center for Environmental Quality Management, Kyoto University, 1-2 Yumihama, Otsu, Shiga 5200811, Japan.
| | - Kanjira Janya
- Department of Chemical Engineering, Faculty of Engineering, Mahidol University, Nakorn Pathom 73170, Thailand
| | - Shinya Echigo
- Department of Environmental Engineering, Graduate School of Engineering, Kyoto University, Nishikyo, Kyoto 6158540, Japan
| | - Sadahiko Itoh
- Department of Environmental Engineering, Graduate School of Engineering, Kyoto University, Nishikyo, Kyoto 6158540, Japan
| |
|
34
|
Evaluation of multi-target deep neural network models for compound potency prediction under increasingly challenging test conditions. J Comput Aided Mol Des 2021; 35:285-295. [PMID: 33598870 PMCID: PMC7982389 DOI: 10.1007/s10822-021-00376-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2020] [Accepted: 02/03/2021] [Indexed: 11/25/2022]
Abstract
Machine learning (ML) enables modeling of quantitative structure–activity relationships (QSAR) and compound potency predictions. Recently, multi-target QSAR models have been gaining increasing attention. Simultaneous compound potency predictions for multiple targets can be carried out using ensembles of independently derived target-based QSAR models or in a more integrated and advanced manner using multi-target deep neural networks (MT-DNNs). Herein, single-target and multi-target ML models were systematically compared on a large scale in compound potency value predictions for 270 human targets. By design, this large-magnitude evaluation has been a special feature of our study. To these ends, MT-DNN, single-target DNN (ST-DNN), support vector regression (SVR), and random forest regression (RFR) models were implemented. Different test systems were defined to benchmark these ML methods under conditions of varying complexity. Source compounds were divided into training and test sets in a compound- or analog series-based manner taking target information into account. Data partitioning approaches used for model training and evaluation were shown to influence the relative performance of ML methods, especially for the most challenging compound data sets. For example, MT-DNNs with per-target models yielded superior performance compared to single-target models. For a test compound or its analogs, the availability of potency measurements for multiple targets affected model performance, revealing the influence of ML synergies.
|
35
|
Galati S, Yonchev D, Rodríguez-Pérez R, Vogt M, Tuccinardi T, Bajorath J. Predicting Isoform-Selective Carbonic Anhydrase Inhibitors via Machine Learning and Rationalizing Structural Features Important for Selectivity. ACS Omega 2021; 6:4080-4089. [PMID: 33585783 PMCID: PMC7876851 DOI: 10.1021/acsomega.0c06153] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/17/2020] [Accepted: 01/14/2021] [Indexed: 05/03/2023]
Abstract
Carbonic anhydrases (CAs) catalyze the physiological hydration of carbon dioxide and are among the most intensely studied pharmaceutical target enzymes. A hallmark of CA inhibition is the complexation of the catalytic zinc cation in the active site. Human (h) CA isoforms belonging to different families are implicated in a wide range of diseases and of very high interest for therapeutic intervention. Given the conserved catalytic mechanisms and high similarity of many hCA isoforms, a major challenge for CA-based therapy is achieving inhibitor selectivity for hCA isoforms that are associated with specific pathologies over other widely distributed isoforms such as hCA I or hCA II that are of critical relevance for the integrity of many physiological processes. To address this challenge, we have attempted to predict compounds that are selective for isoform hCA IX, which is a tumor-associated protein and implicated in metastasis, over hCA II on the basis of a carefully curated data set of selective and nonselective inhibitors. Machine learning achieved surprisingly high accuracy in predicting hCA IX-selective inhibitors. The results were further investigated, and compound features determining successful predictions were identified. These features were then studied on the basis of X-ray structures of hCA isoform-inhibitor complexes and found to include substructures that explain compound selectivity. Our findings lend credence to selectivity predictions and indicate that the machine learning models derived herein have considerable potential to aid in the identification of new hCA IX-selective compounds.
Affiliation(s)
- Salvatore Galati
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
| | - Dimitar Yonchev
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
| | - Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
| | - Martin Vogt
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
| | - Tiziano Tuccinardi
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy
- Phone: 39-050-2219595
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
- Phone: 49-228-7369-100
| |
|
36
|
Shibayama S, Funatsu K. Industrial Case Study: Identification of Important Substructures and Exploration of Monomers for the Rapid Design of Novel Network Polymers with Distributed Representation. Bulletin of the Chemical Society of Japan 2021. [DOI: 10.1246/bcsj.20200220] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Affiliation(s)
- Shojiro Shibayama
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
| | - Kimito Funatsu
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
| |
|
37
|
Houssein EH, Hosney ME, Elhoseny M, Oliva D, Mohamed WM, Hassaballah M. Hybrid Harris hawks optimization with cuckoo search for drug design and discovery in chemoinformatics. Sci Rep 2020; 10:14439. [PMID: 32879410 PMCID: PMC7468137 DOI: 10.1038/s41598-020-71502-z] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 12/27/2019] [Accepted: 07/23/2020] [Indexed: 11/09/2022] Open
Abstract
One of the major drawbacks of cheminformatics is the large amount of information present in the datasets. In the majority of cases, this information contains redundant instances that affect the analysis of similarity measurements with respect to drug design and discovery. Therefore, classical methods such as the protein bank database and quantum mechanical calculations are insufficient owing to the dimensionality of the search spaces. In this paper, we introduce a hybrid metaheuristic algorithm called CHHO-CS, which combines the Harris hawks optimizer (HHO) with two operators: cuckoo search (CS) and chaotic maps. The role of CS is to control the main position vectors of the HHO algorithm to maintain the balance between the exploitation and exploration phases, while the chaotic maps are used to update the control energy parameters to avoid falling into local optima and premature convergence. Feature selection (FS) reduces the dimensionality of a dataset by removing redundant and undesired information, which makes it very helpful in cheminformatics. FS methods employ a classifier to identify the best subset of features. Support vector machines (SVMs) are used by the proposed CHHO-CS as the objective function for the classification process in FS. The CHHO-CS-SVM is tested in the selection of appropriate chemical descriptors and compound activities. Various datasets are used to validate the efficiency of the proposed CHHO-CS-SVM approach, including ten from the UCI machine learning repository. Additionally, two chemical datasets (i.e., quantitative structure-activity relation biodegradation and monoamine oxidase) were utilized for selecting the most significant chemical descriptors and chemical compound activities. The extensive experimental and statistical analyses show that the suggested CHHO-CS method achieved better trade-off solutions than the competitor algorithms from the literature, including HHO, CS, particle swarm optimization, moth-flame optimization, grey wolf optimizer, Salp swarm algorithm, and sine-cosine algorithm. The experimental results showed that the complexity associated with cheminformatics can be handled using chaotic maps and hybridizing the meta-heuristic methods.
Affiliation(s)
- Essam H Houssein
- Faculty of Computers and Information, Minia University, Minia, Egypt.
| | - Mosa E Hosney
- Faculty of Computers and Information, Luxor University, Luxor, Egypt
| | - Mohamed Elhoseny
- Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
| | - Diego Oliva
- Depto. de Ciencias Computacionales, Universidad de Guadalajara, CUCEI, Guadalajara, Jal, Mexico.
- IN3 - Computer Science Department, Universitat Oberta de Catalunya, Castelldefels, Spain.
| | - Waleed M Mohamed
- Faculty of Computers and Information, Minia University, Minia, Egypt
| | - M Hassaballah
- Computer Science Department, Faculty of Computers and Information, South Valley University, Qena, Egypt
| |
|
38
|
Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. J Comput Aided Mol Des 2020; 34:1013-1026. [PMID: 32361862 PMCID: PMC7449951 DOI: 10.1007/s10822-020-00314-0] [Citation(s) in RCA: 146] [Impact Index Per Article: 36.5] [Received: 03/06/2020] [Accepted: 04/24/2020] [Indexed: 02/07/2023]
Abstract
Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of and confidence in ML in pharmaceutical research. There is a need for agnostic approaches that aid in the interpretation of ML models regardless of their complexity and that are also applicable to deep neural network (DNN) architectures and model ensembles. To these ends, the SHapley Additive exPlanations (SHAP) methodology has recently been introduced. The SHAP approach enables the identification and prioritization of features that determine compound classification and activity prediction using any ML model. Herein, we further extend the evaluation of the SHAP methodology by investigating a variant for exact calculation of Shapley values for decision tree methods and systematically compare this variant in compound activity and potency value predictions with the model-independent SHAP method. Moreover, new applications of the SHAP analysis approach are presented including interpretation of DNN models for the generation of multi-target activity profiles and ensemble regression models for potency prediction.
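To make the notion of a Shapley value concrete, the following is a minimal sketch that computes exact Shapley values for a single prediction by enumerating every feature coalition. It does not use the SHAP library and is neither the tree-specific variant nor the model-independent approximation discussed in the abstract; brute-force enumeration is only feasible for a handful of features. The scoring function `toy_model`, the all-zero baseline, and the choice to replace absent features by baseline values are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions.

    Features outside a coalition are set to their baseline value
    (an illustrative assumption; other value functions are possible).
    """
    n = len(x)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(bits):
    # Hypothetical scoring function over three fingerprint bits:
    # two additive terms plus one pairwise interaction.
    return 2.0 * bits[0] + 1.0 * bits[1] + 3.0 * bits[0] * bits[2]


x = [1, 1, 1]         # compound with all three bits set
baseline = [0, 0, 0]  # reference with no bits set
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the interaction term is split equally between the two bits involved, and the contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that SHAP explanations rely on.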
|
39
|
Houssein EH, Hosney ME, Oliva D, Mohamed WM, Hassaballah M. A novel hybrid Harris hawks optimization and support vector machines for drug design and discovery. Comput Chem Eng 2020. [DOI: 10.1016/j.compchemeng.2019.106656] [Citation(s) in RCA: 124] [Impact Index Per Article: 31.0] [Indexed: 02/04/2023]
|
40
|
Luchi A, Villafañe RN, Gómez Chávez JL, Bogado ML, Angelina EL, Peruchena NM. Combining Charge Density Analysis with Machine Learning Tools To Investigate the Cruzain Inhibition Mechanism. ACS Omega 2019; 4:19582-19594. [PMID: 31788588 PMCID: PMC6881835 DOI: 10.1021/acsomega.9b01934] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/27/2019] [Accepted: 10/18/2019] [Indexed: 05/28/2023]
Abstract
Trypanosoma cruzi, a flagellate protozoan parasite, is responsible for Chagas disease. The parasite major cysteine protease, cruzain (Cz), plays a vital role at every stage of its life cycle and the active-site region of the enzyme, similar to those of other members of the papain superfamily, is well characterized. Taking advantage of structural information available in public databases about Cz bound to known covalent inhibitors, along with their corresponding activity annotations, in this work, we performed a deep analysis of the molecular interactions at the Cz binding cleft, in order to investigate the enzyme inhibition mechanism. Our toolbox for performing this study consisted of the charge density topological analysis of the complexes to extract the molecular interactions and machine learning classification models to relate the interactions with biological activity. More precisely, such a combination was useful for the classification of molecular interactions as "active-like" or "inactive-like" according to whether they are prevalent in the most active or less active complexes, respectively. Further analysis of interactions with the help of unsupervised learning tools also allowed the understanding of how these interactions come into play together to trigger the enzyme into a particular conformational state. Most active inhibitors induce some conformational changes within the enzyme that lead to an overall better fit of the inhibitor into the binding cleft. Curiously, some of these conformational changes can be considered as a hallmark of the substrate recognition event, which means that most active inhibitors are likely recognized by the enzyme as if they were its own substrate so that the catalytic machinery is arranged as if it is about to break the substrate scissile bond. Overall, these results contribute to a better understanding of the enzyme inhibition mechanism. 
Moreover, the information about main interactions extracted through this work is already being used in our lab to guide docking solutions in ongoing prospective virtual screening campaigns to search for novel noncovalent cruzain inhibitors.
|
41
|
Chemogenomic Analysis of the Druggable Kinome and Its Application to Repositioning and Lead Identification Studies. Cell Chem Biol 2019; 26:1608-1622.e6. [DOI: 10.1016/j.chembiol.2019.08.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/13/2019] [Revised: 07/18/2019] [Accepted: 08/21/2019] [Indexed: 02/06/2023]
|
42
|
Rodríguez-Pérez R, Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J Med Chem 2019; 63:8761-8777. [PMID: 31512867 DOI: 10.1021/acs.jmedchem.9b01101] [Citation(s) in RCA: 137] [Impact Index Per Article: 27.4] [Indexed: 01/27/2023]
Abstract
In qualitative or quantitative studies of structure-activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany.,Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riß, Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
| |
|
43
|
Jabeen A, Ranganathan S. Applications of machine learning in GPCR bioactive ligand discovery. Curr Opin Struct Biol 2019; 55:66-76. [PMID: 31005679 DOI: 10.1016/j.sbi.2019.03.022] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/05/2019] [Revised: 03/14/2019] [Accepted: 03/14/2019] [Indexed: 12/17/2022]
Abstract
GPCRs constitute the largest druggable family, with targets for 475 Food and Drug Administration (FDA) approved drugs. As GPCRs are of great interest to the pharmaceutical industry, enormous efforts are being expended to find relevant and potent GPCR ligands as lead compounds. There are tens of millions of compounds present in different chemical databases. In order to scan this immense chemical space, computational methods, especially machine learning (ML) methods, are essential components of GPCR drug discovery pipelines. ML approaches have applications in both ligand-based and structure-based virtual screening. We present here a cheminformatics overview of ML applications to different stages of GPCR drug discovery. Focusing on olfactory receptors, which are the largest family of GPCRs, a case study for predicting agonists for an ectopic olfactory receptor, OR1G1, compares four classical ML methods.
Affiliation(s)
- Amara Jabeen
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
| | - Shoba Ranganathan
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
| |
|
44
|
Zheng S, Wang Y, Liu H, Chang W, Xu Y, Lin F. Prediction of Hemolytic Toxicity for Saponins by Machine-Learning Methods. Chem Res Toxicol 2019; 32:1014-1026. [DOI: 10.1021/acs.chemrestox.8b00347] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Affiliation(s)
- Suqing Zheng
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
- Chemical Biology Research Center, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| | - Yibing Wang
- Genetic Screening Center, National Institute of Biological Sciences, Beijing 102206, P. R. China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, P. R. China
| | - Hongmei Liu
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| | - Wenping Chang
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| | - Yong Xu
- Center of Chemical Biology, Guangzhou Institute of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, Guangdong, P. R. China
| | - Fu Lin
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| |
|
45
|
Zheng S, Chang W, Xu W, Xu Y, Lin F. e-Sweet: A Machine-Learning Based Platform for the Prediction of Sweetener and Its Relative Sweetness. Front Chem 2019; 7:35. [PMID: 30761295 PMCID: PMC6363693 DOI: 10.3389/fchem.2019.00035] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/22/2018] [Accepted: 01/14/2019] [Indexed: 11/23/2022] Open
Abstract
Artificial sweeteners (AS) can elicit a strong sweet sensation with low or zero calories, and are widely used to replace nutritive sugar in the food and beverage industry. However, the safety of current AS is still controversial. Thus, it is imperative to develop safer and more potent AS. Because experimental screening of AS is costly and laborious, in-silico sweetener/sweetness prediction could provide a good avenue to identify potential sweetener candidates before experiment. In this work, we curate the largest dataset of 530 sweeteners and 850 non-sweeteners, and collect the second largest dataset of 352 sweeteners with relative sweetness (RS) from the literature. In light of these experimental datasets, we adopt five machine-learning methods and conformation-independent molecular fingerprints to derive classification and regression models for the prediction of sweeteners and their RS, respectively, via a consensus strategy. Our best classification model achieves 95% confidence intervals for accuracy (0.91 ± 0.01), precision (0.90 ± 0.01), specificity (0.94 ± 0.01), sensitivity (0.86 ± 0.01), F1-score (0.88 ± 0.01), and NER (Non-error Rate: 0.90 ± 0.01) on the test set, which outperforms the model (NER = 0.85) of Rojas et al. in terms of NER, and our best regression model gives 95% confidence intervals for R2(test set) and ΔR2 [referring to |R2(test set) - R2(cross-validation)|] of 0.77 ± 0.01 and 0.03 ± 0.01, respectively, which is also better than other works based on conformation-independent 2D descriptors (e.g., 2D Dragon) according to R2(test set) and ΔR2. Our models are obtained by averaging over nineteen data-splitting schemes, and fully comply with the guidelines of the Organization for Economic Cooperation and Development (OECD), which are not completely followed by previous relevant works, all of which are based on only one random data-splitting scheme for the cross-validation set and test set.
Finally, we develop a user-friendly platform “e-Sweet” for the automatic prediction of sweeteners and their corresponding RS. To the best of our knowledge, it is the first free platform that enables experimental food scientists to exploit current machine-learning methods to boost the discovery of more AS with low or zero calorie content.
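The summary statistics quoted in this abstract can be cross-checked with a short sketch. It assumes that NER is the mean of sensitivity and specificity (the usual two-class definition, which the abstract does not spell out), takes ΔR2 as defined in the abstract, and uses the standard F1 formula. The function names are illustrative, and the cross-validation R2 of 0.74 is a hypothetical value chosen only to show how a ΔR2 of 0.03 would arise.

```python
def non_error_rate(sensitivity, specificity):
    """Two-class non-error rate: the mean of sensitivity and specificity (assumed definition)."""
    return (sensitivity + specificity) / 2.0


def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2.0 * precision * sensitivity / (precision + sensitivity)


def delta_r2(r2_test, r2_cv):
    """Absolute gap between test-set R2 and cross-validation R2, as defined in the abstract."""
    return abs(r2_test - r2_cv)


# Point estimates taken from the abstract (confidence intervals ignored here).
ner = non_error_rate(sensitivity=0.86, specificity=0.94)
f1 = f1_score(precision=0.90, sensitivity=0.86)

# r2_cv=0.74 is a hypothetical cross-validation value, not a figure from the paper.
gap = delta_r2(r2_test=0.77, r2_cv=0.74)

print(round(ner, 2), round(f1, 2), round(gap, 2))
```

Rounded to two decimals, the first two values agree with the reported NER (0.90) and F1-score (0.88), which suggests the assumed NER definition is consistent with the published numbers.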
Affiliation(s)
- Suqing Zheng
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
- Chemical Biology Research Center, Wenzhou Medical University, Wenzhou, China
| | - Wenping Chang
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Wenxin Xu
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Yong Xu
- Center of Chemical Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
| | - Fu Lin
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
| |
|
46
|
Zheng S, Chang W, Liu W, Liang G, Xu Y, Lin F. Computational Prediction of a New ADMET Endpoint for Small Molecules: Anticommensal Effect on Human Gut Microbiota. J Chem Inf Model 2018; 59:1215-1220. [PMID: 30352151 DOI: 10.1021/acs.jcim.8b00600] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
Abstract
The human gut microbiota (HGM), which are evolutionarily commensal in the human gastrointestinal system, are crucial to our health. However, HGM can be broadly shaped by multifaceted factors such as intake of drugs. About one-quarter of the existing drugs for humans, which are designed to target human cells rather than HGM, can notably alter the composition of HGM. Therefore, the anticommensal effect of human drugs should be avoided to the maximum extent possible in the drug discovery and development process. Nevertheless, the anticommensal effect of small molecules is a new ADMET (absorption, distribution, metabolism, excretion, and toxicity) end point, which has never been predicted computationally before. In this work, we present the first machine-learning based consensus classification model, with an accuracy (0.811 ± 0.012), precision (0.759 ± 0.032), specificity (0.901 ± 0.019), sensitivity (0.628 ± 0.036), F1-score (0.687 ± 0.023), and AUC (0.814 ± 0.030) on the test set. Furthermore, we develop an easy-to-use "e-Commensal" program for automatic prediction. Based on this program, virtual screening of the food-constituent database (FooDB) indicates that 5888 of 23 202 food-relevant compounds are predicted to possess an anticommensal effect on HGM. Several top-ranked anticommensal compounds in our prediction are further scrutinized and confirmed by experiments in the existing literature. To the best of our knowledge, this is the first classification model and stand-alone software for the prediction of commensal or anticommensal compounds impacting HGM.
Affiliation(s)
- Suqing Zheng
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
- Chemical Biology Research Center, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| | - Wenping Chang
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| | - Wenxin Liu
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| | - Guang Liang
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
- Chemical Biology Research Center, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| | - Yong Xu
- Center of Chemical Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, Guangdong 510530, P. R. China
| | - Fu Lin
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, Zhejiang 325035, P. R. China
| |
|
47
|
Zheng S, Jiang M, Zhao C, Zhu R, Hu Z, Xu Y, Lin F. e-Bitter: Bitterant Prediction by the Consensus Voting From the Machine-Learning Methods. Front Chem 2018; 6:82. [PMID: 29651416 PMCID: PMC5885771 DOI: 10.3389/fchem.2018.00082] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 12/17/2017] [Accepted: 03/12/2018] [Indexed: 11/25/2022] Open
Abstract
In-silico bitterant prediction has received considerable attention because experimental screening of bitterants is expensive and laborious. In this work, we collect a fully experimental dataset containing 707 bitterants and 592 non-bitterants, which is distinct from the fully or partially hypothetical non-bitterant datasets used in previous works. Based on this experimental dataset, we harness consensus votes from multiple machine-learning methods (e.g., deep learning) combined with molecular fingerprints to build bitter/bitterless classification models with five-fold cross-validation, which are further inspected by the Y-randomization test and applicability domain analysis. One of the best consensus models affords an accuracy, precision, specificity, sensitivity, F1-score, and Matthews correlation coefficient (MCC) of 0.929, 0.918, 0.898, 0.954, 0.936, and 0.856, respectively, on our test set. For the automatic prediction of bitterants, a graphical program “e-Bitter” is developed so that users can run predictions with a simple mouse click. To the best of our knowledge, this is the first time that a consensus model has been adopted for bitterant prediction and the first free stand-alone software developed for experimental food scientists.
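Consensus voting and the metrics listed in this abstract can be illustrated with a minimal sketch. It assumes a plain majority vote over binary predictions (the abstract does not state the voting rule), with ties resolved to the negative class. The function names and the three prediction lists are hypothetical toy data, not models or results from the paper.

```python
from math import sqrt


def consensus_vote(votes_per_model):
    """Majority vote over equal-length lists of 0/1 predictions (one list per model).

    A compound is labeled 1 only if strictly more than half of the models vote 1,
    so ties fall to 0 (an assumption made for this sketch).
    """
    n_models = len(votes_per_model)
    return [1 if 2 * sum(column) > n_models else 0 for column in zip(*votes_per_model)]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(numerator, denominator):
    """Division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, specificity, sensitivity, F1-score, and MCC."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "specificity": _ratio(tn, tn + fp),
        "sensitivity": sensitivity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical labels (1 = bitter, 0 = non-bitter) and three hypothetical models.
y_true = [1, 1, 1, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 1]
model_b = [1, 0, 1, 0, 1, 0]
model_c = [1, 1, 0, 0, 1, 0]

consensus = consensus_vote([model_a, model_b, model_c])
metrics = classification_metrics(*confusion_counts(y_true, consensus))
print(consensus)
print({name: round(value, 3) for name, value in metrics.items()})
```

An odd number of voters avoids ties altogether; with an even number, the tie rule becomes a modeling decision that should be stated alongside the results.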
Affiliation(s)
- Suqing Zheng
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
- Chemical Biology Research Center, Wenzhou Medical University, Wenzhou, China
| | - Mengying Jiang
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Chengwei Zhao
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Rui Zhu
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Zhicheng Hu
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
| | - Yong Xu
- Center of Chemical Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
| | - Fu Lin
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
| |
|