1
|
Kim D, Jeong J, Choi J. Identification of Optimal Machine Learning Algorithms and Molecular Fingerprints for Explainable Toxicity Prediction Models Using ToxCast/Tox21 Bioassay Data. ACS Omega 2024; 9:37934-37941. [PMID: 39281924 PMCID: PMC11391437 DOI: 10.1021/acsomega.4c04474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/18/2024]
Abstract
Recent studies have primarily focused on introducing novel frameworks to enhance the predictive power of toxicity prediction models by refining molecular representation methods and algorithms. However, these methods are inherently complex and often difficult to understand and explain, which creates barriers to their regulatory adoption and validation. Therefore, it is necessary to select the optimal model by considering not only model performance but also interpretability. This study aimed to identify the optimal combination of molecular fingerprints (pattern-based versus algorithm-based) and machine learning algorithms (simple versus complex) for developing explainable toxicity prediction models through a comprehensive investigation of the ToxCast/Tox21 bioassay data set. For 1092 ToxCast/Tox21 assays, five molecular fingerprints (MACCS, Morgan, RDKit, Layered, and Patterned) and six algorithms (MLP, GBT, Random Forest, kNN, Logistic Regression, and Naïve Bayes) were used to train the models. Results showed that 35 models achieved acceptable performance (F1 score or accuracy of 0.8 or higher). Among the combinations, either MACCS or Morgan, paired with Random Forest, demonstrated robust performance compared with other molecular fingerprints and algorithms. MACCS and Random Forest remain valuable even when interpretability is prioritized. Consequently, MACCS-Random Forest combination models based on four assays, targeting G protein-coupled receptors and kinases, were identified; they can be used to discern specific structural features or patterns in chemical compounds, offering explainable insights into toxicity-related chemical structures. This study indicates the importance of not disregarding simple models when assessing both predictivity and interpretability within the context of chemical feature-based Tox21 data analysis.
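The acceptance criterion quoted in this abstract (F1 score or accuracy of 0.8 or higher) can be illustrated with a minimal sketch. Only the 0.8 threshold comes from the abstract; the function names and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def is_acceptable(tp: int, fp: int, tn: int, fn: int, threshold: float = 0.8) -> bool:
    """Criterion as worded in the abstract: F1 score OR accuracy >= threshold."""
    return f1_score(tp, fp, fn) >= threshold or accuracy(tp, fp, tn, fn) >= threshold


# Hypothetical counts for one imbalanced assay (few actives, many inactives).
counts = dict(tp=10, fp=15, tn=160, fn=15)
print("accuracy:", accuracy(**counts))
print("F1:", f1_score(counts["tp"], counts["fp"], counts["fn"]))
print("acceptable:", is_acceptable(**counts))
```

With imbalanced data such as the hypothetical counts above, accuracy can clear the threshold while F1 stays low, so an "either metric" rule is more permissive than requiring both.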
Affiliation(s)
- Donghyeon Kim
- School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
| | - Jaeseong Jeong
- School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
| | - Jinhee Choi
- School of Environmental Engineering, University of Seoul, 163 Seoulsiripdae-ro, Dongdaemun-gu, Seoul 02504, Republic of Korea
|
2
|
Zhao X, Kong Y, Ji Y, Xin X, Chen L, Chen G, Yu C. Classification models for predicting the bioactivity of pan-TRK inhibitors and SAR analysis. Mol Divers 2024; 28:2077-2097. [PMID: 37910346 DOI: 10.1007/s11030-023-10735-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 09/22/2023] [Indexed: 11/03/2023]
Abstract
Tropomyosin receptor kinases (TRKs) are important broad-spectrum anticancer targets. The oncogenic rearrangement of the NTRK gene disrupts the extracellular structural domain and epitopes for therapeutic antibodies, making small-molecule inhibitors essential for treating NTRK fusion-driven tumors. In this work, several algorithms were used to construct descriptor-based and nondescriptor-based models, and the models were evaluated by outer 10-fold cross-validation. To find a model with good generalization ability, the dataset was partitioned by random and cluster-splitting methods to construct in- and cross-domain models, respectively. Among the 48 models built, the model with the combination of the deep neural network (DNN) algorithm and extended connectivity fingerprints 4 (ECFP4) descriptors achieved excellent performance in both dataset divisions. The results indicate that the DNN algorithm has a strong generalization prediction ability, and the richness of features plays a vital role in predicting unknown spatial molecules. Additionally, we combined the clustering results and decision tree models of fingerprint descriptors to perform structure-activity relationship analysis. It was found that nitrogen-containing aromatic heterocyclic and benzo heterocyclic structures play a crucial role in enhancing the activity of TRK inhibitors.
Affiliation(s)
- Xiaoman Zhao
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- College of Bio engineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
| | - Yue Kong
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
| | - Yueshan Ji
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
| | - Xiulan Xin
- College of Bio engineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
| | - Liang Chen
- College of Bio engineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
| | - Guang Chen
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
| | - Changyuan Yu
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China.
|
3
|
Daghighi A, Casanola-Martin GM, Iduoku K, Kusic H, González-Díaz H, Rasulev B. Multi-Endpoint Acute Toxicity Assessment of Organic Compounds Using Large-Scale Machine Learning Modeling. Environmental Science & Technology 2024; 58:10116-10127. [PMID: 38797941 DOI: 10.1021/acs.est.4c01017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
In recent years, alternatives to animal testing, such as computational and machine learning approaches, have become increasingly crucial for toxicity testing. However, the complexity and scarcity of available biomedical data challenge the development of predictive models. Combining nonlinear machine learning with multicondition descriptors offers a solution for using data from various assays to create a robust model. This work applies multicondition descriptors (MCDs) to develop a QSTR (Quantitative Structure-Toxicity Relationship) model based on a large toxicity data set comprising more than 80,000 compounds and 59 different end points (122,572 data points). The prediction capabilities of the developed single-task multi-end point machine learning models, as well as a novel data analysis approach using Convolutional Neural Networks (CNN), are discussed. The results show that using MCDs significantly improves the model and that using them with CNN-1D yields the best result (R²(train) = 0.93, R²(ext) = 0.70). Several structural features showed a high level of contribution to the toxicity, including van der Waals surface area (VSA), number of nitrogen-containing fragments (nN+), presence of S-P fragments, ionization potential, and presence of C-N fragments. The developed models can be very useful tools to predict the toxicity of various compounds under different conditions, enabling quick toxicity assessment of new compounds.
Affiliation(s)
- Amirreza Daghighi
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
| | - Gerardo M Casanola-Martin
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
| | - Kweeni Iduoku
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
| | - Hrvoje Kusic
- Faculty of Chemical Engineering and Technology, University of Zagreb, Marulicev Trg 19, Zagreb 10000, Croatia
| | - Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa 48940, Spain
- BIOFISIKA, Basque Center for Biophysics CSIC-UPVEH, Leioa 48940, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao, Biscay 48011, Spain
| | - Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
|
4
|
Melo-Filho CC, Su G, Liu K, Muratov EN, Tropsha A, Liu J. Modeling interactions between Heparan sulfate and proteins based on the Heparan sulfate microarray analysis. Glycobiology 2024; 34:cwae039. [PMID: 38836441 PMCID: PMC11180703 DOI: 10.1093/glycob/cwae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 04/30/2024] [Accepted: 05/29/2024] [Indexed: 06/06/2024]
Abstract
Heparan sulfate (HS), a sulfated polysaccharide abundant in the extracellular matrix, plays pivotal roles in various physiological and pathological processes by interacting with proteins. Investigating the binding selectivity of HS oligosaccharides to target proteins is essential, but the exhaustive inclusion of all possible oligosaccharides in microarray experiments is impractical. To address this challenge, we present a hybrid pipeline that integrates microarray and in silico techniques to design oligosaccharides with desired protein affinity. Using fibroblast growth factor 2 (FGF2) as a model protein, we assembled an in-house dataset of HS oligosaccharides on microarrays and developed two structural representations: a standard representation with all atoms explicit and a simplified representation with disaccharide units as "quasi-atoms." Predictive Quantitative Structure-Activity Relationship (QSAR) models for FGF2 affinity were developed using the Random Forest (RF) algorithm. The resulting models, considering the applicability domain, demonstrated high predictivity, with a correct classification rate of 0.81-0.80 and improved positive predictive values (PPV) up to 0.95. Virtual screening of 40 new oligosaccharides using the simplified model identified 15 computational hits, 11 of which were experimentally validated for high FGF2 affinity. This hybrid approach marks a significant step toward the targeted design of oligosaccharides with desired protein interactions, providing a foundation for broader applications in glycobiology.
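The two classification metrics named in this abstract can be sketched as follows. This is an illustration under stated assumptions: the correct classification rate (CCR) is taken here to be the mean of sensitivity and specificity, which is a common definition but is not spelled out in the abstract, and the function names are hypothetical. The only figures taken from the abstract are the 15 computational hits and the 11 experimentally validated ones.

```python
def ccr(tp: int, fp: int, tn: int, fn: int) -> float:
    """Correct classification rate, ASSUMED to be (sensitivity + specificity) / 2."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of predicted positives that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Virtual screening figures from the abstract: 15 computational hits, 11 confirmed.
hits, confirmed = 15, 11
print("experimental hit rate:", ppv(tp=confirmed, fp=hits - confirmed))
```

The hit rate computed this way describes the 15 screened hits only; it is not the same quantity as the cross-validated PPV of up to 0.95 reported for the models.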
Affiliation(s)
- Cleber C Melo-Filho
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Beard Hall, University of North Carolina, Chapel Hill, NC 27599, United States
| | - Guowei Su
- Glycan Therapeutics, 617 Hutton Street, Raleigh, NC 27606, United States
| | - Kevin Liu
- Glycan Therapeutics, 617 Hutton Street, Raleigh, NC 27606, United States
| | - Eugene N Muratov
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Beard Hall, University of North Carolina, Chapel Hill, NC 27599, United States
| | - Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Beard Hall, University of North Carolina, Chapel Hill, NC 27599, United States
| | - Jian Liu
- Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, 1044 Genetic Medicine Bldg., University of North Carolina, Chapel Hill, NC 27599, United States
|
5
|
Puri D, Lee D, Khankal DV, Thakur MS, Alfaisal FM, Alam S, Kumar R, Khan MA. Decision Tree-Based Modeling of the Aeration Effectiveness of Circular Plunging Jets. ACS Omega 2023; 8:38950-38960. [PMID: 37901507 PMCID: PMC10601425 DOI: 10.1021/acsomega.3c03375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 09/13/2023] [Indexed: 10/31/2023]
Abstract
Since soft computing has gained a lot of attention in hydrological studies, this study focuses on predicting aeration efficiency (E20) using circular plunging jets employing soft computing techniques such as reduced error pruning tree (REPTree), random forest (RF), and M5P. The study undertaken required the development and validation of models, which were achieved using 63 experimental data values with input variables, such as angle of inclination of tilt channel (α), number of plunging jets (JN), discharge of each jet (Q), hydraulic radius of each jet (HR), and Froude number (Fr. No), to evaluate the aeration efficiency (E20), which served as the output variable. To evaluate the effectiveness of the developed models, three different statistical indices were used such as the coefficient of correlation (CC), root-mean-square error (RMSE), and mean absolute error (MAE), and it was found that all of the applied techniques possessed good forecasting ability since their correlation coefficient values were greater than 0.8. Upon testing, it was discovered that the M5P model outperformed other soft computing-based models in its ability to predict E20, as demonstrated by its correlation coefficient value of 0.9564 and notably low values of MAE (0.0143) and RMSE (0.0193).
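The three statistical indices named in this abstract (CC, RMSE, MAE) can be computed as in the minimal sketch below. The observed and predicted values are hypothetical placeholders, not data from the study, and CC is assumed to be the Pearson correlation coefficient.

```python
import math


def correlation_coefficient(observed, predicted):
    """Pearson correlation between observed and predicted values (assumed meaning of CC)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (spread_o * spread_p)


def rmse(observed, predicted):
    """Root-mean-square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical aeration-efficiency values, for illustration only.
e20_observed = [0.10, 0.15, 0.22, 0.30, 0.41]
e20_predicted = [0.12, 0.14, 0.25, 0.28, 0.40]
print("CC:", correlation_coefficient(e20_observed, e20_predicted))
print("RMSE:", rmse(e20_observed, e20_predicted))
print("MAE:", mae(e20_observed, e20_predicted))
```

A high CC alone does not rule out systematic bias, which is one reason error measures such as RMSE and MAE are reported alongside it.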
Affiliation(s)
- Diksha Puri
- School of Environmental Science, Shoolini University, Solan 173229, Himachal Pradesh, India
| | - Daeho Lee
- Department of Mechanical Engineering, Gachon University, Seongnam 13120, South Korea
| | - Dhananjay Vasant Khankal
- Department of Mechanical Engineering, Sinhgad College of Engineering, Pune 411041, Maharashtra, India
| | - Mohindra Singh Thakur
- Department of Civil Engineering, Shoolini University, Solan 173229, Himachal Pradesh, India
| | - Faisal M. Alfaisal
- Department of Civil Engineering, College of Engineering, King Saud University, Riyadh 11421, Saudi Arabia
| | - Shamshad Alam
- Department of Civil Engineering, College of Engineering, King Saud University, Riyadh 11421, Saudi Arabia
| | - Raj Kumar
- Department of Mechanical Engineering, Gachon University, Seongnam 13120, South Korea
| | - Mohammad Amir Khan
- Department of Civil Engineering, Galgotia College of Engineering, Greater Noida 201310, India
|
6
|
Dias-Silva JR, Oliveira VM, Sanches-Neto FO, Wilhelms RZ, Queiroz Júnior LHK. SpectraFP: a new spectra-based descriptor to aid in cheminformatics, molecular characterization and search algorithm applications. Phys Chem Chem Phys 2023. [PMID: 37378661 DOI: 10.1039/d3cp00734k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2023]
Abstract
We have developed an algorithm to generate a new spectra-based descriptor, called SpectraFP, in order to digitalize the chemical shifts of 13C NMR spectra, as well as potentially important data from other spectroscopic techniques. This descriptor is a fingerprint vector with defined sizes and values of 0 and 1, with the ability to correct chemical shift fluctuations. To explore the applicability of SpectraFP, we outlined two application scenarios: (1) the prediction of six functional groups by machine learning (ML) models and (2) the search for structures based on the similarity between the query spectrum and spectra in an experimental database, both in the SpectraFP format. For each functional group, five ML models were built and validated following the OECD principles: internal and external validations, applicability domains, and mechanistic interpretations. All the models resulted in high goodness-of-fit for the training and test sets, with MCC between 0.626 and 0.909 and between 0.653 and 0.917, respectively, and J ranging from 0.812 to 0.957 and from 0.825 to 0.961. Using the SHAP (SHapley Additive exPlanations) approach, the mechanistic interpretations of the models were explored; the results indicated that the most important variables for model decision making were coherent with the expected chemical shifts for each functional group. Several metrics, including Tanimoto, geometric, arithmetic, and Tversky, can be used to perform the similarity calculation for the search algorithm. This algorithm can also incorporate additional variables, such as the correction parameter and the difference between the number of signals in the query spectrum and the database spectra, while preserving its high performance speed. We hope that our descriptor can link information from spectroscopic/spectrometric techniques with ML models to expand the possibilities in understanding the field of cheminformatics.
All databases and algorithms developed for this work are open source and freely accessible.
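Two of the similarity metrics named in this abstract, Tanimoto and Tversky, can be sketched for binary fingerprint vectors as shown below. This is a generic illustration, not the SpectraFP implementation: the function names and the example vectors are hypothetical, and the chemical-shift correction described in the abstract is not modeled.

```python
def _overlap_counts(a, b):
    """Return (bits set in both, bits set only in a, bits set only in b)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    return both, only_a, only_b


def tanimoto(a, b):
    """Shared on-bits divided by on-bits present in either vector."""
    both, only_a, only_b = _overlap_counts(a, b)
    denom = both + only_a + only_b
    # Two all-zero vectors are given similarity 0.0 here; this is a convention choice.
    return both / denom if denom else 0.0


def tversky(a, b, alpha=1.0, beta=1.0):
    """Asymmetric generalization: alpha and beta weight the bits unique to a and to b."""
    both, only_a, only_b = _overlap_counts(a, b)
    denom = both + alpha * only_a + beta * only_b
    return both / denom if denom else 0.0


# Hypothetical 5-bit fingerprints (real fingerprints are much longer).
query = [1, 1, 0, 1, 0]
entry = [1, 0, 0, 1, 1]
print("Tanimoto:", tanimoto(query, entry))
print("Tversky (alpha=beta=0.5):", tversky(query, entry, alpha=0.5, beta=0.5))
```

With alpha = beta = 1 the Tversky index reduces to the Tanimoto coefficient, and with alpha = beta = 0.5 it reduces to the Dice coefficient, so one function covers several symmetric metrics as special cases.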
Affiliation(s)
| | - Vitor M Oliveira
- Instituto de Química, Universidade Federal de Goiás, Goiânia, Brazil.
| | - Flávio O Sanches-Neto
- Instituto de Química, Universidade Federal de Goiás, Goiânia, Brazil.
- Instituto Federal de Educação, Ciência e Tecnologia de Goiás, Valparaíso de Goiás, Goiania, GO, CEP: 72876-601, Brazil
| | - Renan Z Wilhelms
- Instituto de Química, Universidade Federal de Goiás, Goiânia, Brazil.
|
7
|
Sharma B, Chenthamarakshan V, Dhurandhar A, Pereira S, Hendler JA, Dordick JS, Das P. Accurate clinical toxicity prediction using multi-task deep neural nets and contrastive molecular explanations. Sci Rep 2023; 13:4908. [PMID: 36966203 PMCID: PMC10039880 DOI: 10.1038/s41598-023-31169-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/28/2022] [Accepted: 03/07/2023] [Indexed: 03/27/2023]
Abstract
Explainable machine learning for molecular toxicity prediction is a promising approach for efficient drug development and chemical safety. A predictive ML model of toxicity can reduce experimental cost and time while mitigating ethical concerns by significantly reducing animal and clinical testing. Herein, we use a deep learning framework for simultaneously modeling in vitro, in vivo, and clinical toxicity data. Two different molecular input representations are used: Morgan fingerprints and pre-trained SMILES embeddings. A multi-task deep learning model accurately predicts toxicity for all endpoints, including clinical, as indicated by the area under the Receiver Operator Characteristic curve and balanced accuracy. In particular, pre-trained molecular SMILES embeddings as input to the multi-task model improved clinical toxicity predictions compared to existing models in the MoleculeNet benchmark. Additionally, our multi-task approach is comprehensive in the sense that it is comparable to state-of-the-art approaches for specific endpoints in in vitro, in vivo and clinical platforms. Through both the multi-task model and transfer learning, we were able to indicate the minimal need for in vivo data for clinical toxicity predictions. To provide confidence and explain the model's predictions, we adapt a post-hoc contrastive explanation method that returns pertinent positive and negative features, which correspond well to known mutagenic and reactive toxicophores, such as unsubstituted bonded heteroatoms, aromatic amines, and Michael acceptors. Furthermore, toxicophore recovery by pertinent feature analysis captures more of the in vitro (53%) and in vivo (56%), rather than of the clinical (8%), endpoints, and indeed uncovers a preference in known toxicophore data towards in vitro and in vivo experimental data. To our knowledge, this is the first contrastive explanation, using both present and absent substructures, for predictions of clinical and in vivo molecular toxicity.
Affiliation(s)
| | - Shiranee Pereira
- ICARE, International Center for Alternatives in Research and Education, Chennai, India
| | - Payel Das
- IBM Research, Yorktown Heights, NY, USA.
|
8
|
Hernandez-Betancur JD, Ruiz-Mercado GJ, Martin M. Predicting Chemical End-of-Life Scenarios Using Structure-Based Classification Models. ACS Sustainable Chemistry & Engineering 2023; 11:3594-3602. [PMID: 36911873 PMCID: PMC9993395 DOI: 10.1021/acssuschemeng.2c05662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 02/10/2023] [Indexed: 06/18/2023]
Abstract
Analyzing chemicals and their effects on the environment from a life cycle viewpoint can produce a thorough analysis that takes end-of-life (EoL) activities into account. Chemical risk assessment, predicting environmental discharges, and finding EoL paths and exposure scenarios all depend on chemical flow data availability. However, it is challenging to gain access to such data and systematically determine EoL activities and potential chemical exposure scenarios. As a result, this work creates quantitative structure-transfer relationship (QSTR) models for aiding environmental management decision-making based on chemical structure-based machine learning (ML) models to predict potential industrial EoL activities, chemical flow allocation, environmental releases, and exposure routes. Further multi-label classification methods may improve the predictability of QSTR models according to the ML experiment tracking. The developed QSTR models will assist stakeholders in predicting and comprehending potential EoL management activities and recycling loops, enabling environmental decision-making and EoL exposure assessment for new or existing chemicals in the global marketplace.
Affiliation(s)
| | - Gerardo J. Ruiz-Mercado
- Office of Research & Development, US Environmental Protection Agency, Cincinnati, Ohio 45268, United States
- Chemical Engineering Graduate Program, Universidad del Atlántico, Puerto Colombia 080007, Colombia
| | - Mariano Martin
- Department of Chemical Engineering, University of Salamanca, Salamanca 37008, Spain
|
9
|
Xu Z, Chughtai H, Tian L, Liu L, Roy JF, Bayen S. Development of quantitative structure-retention relationship models to improve the identification of leachables in food packaging using non-targeted analysis. Talanta 2023; 253:123861. [PMID: 36095943 DOI: 10.1016/j.talanta.2022.123861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Revised: 08/15/2022] [Accepted: 08/17/2022] [Indexed: 12/13/2022]
Abstract
Quantitative structure-retention relationship (QSRR) models can be used to predict the chromatographic retention time of chemicals and facilitate the identification of unknown compounds, notably with non-targeted analysis. In this study, QSRR models were developed from the data obtained for 178 pure chemical standards and four types of analytical columns (C18, phenylhexyl, pentafluorophenyl, cyano) in liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS). First, different data partitioning ratios and feature selection methods [random forest (RF) and support vector machine (SVM)] were tested to build models to predict chromatographic retention times based on 2D molecular descriptors. The internal and external performances of the non-linear (RF) and corresponding linear predictive models were systematically compared, and RF models resulted in better predictive capacities [p < 0.05, with an average PVE (proportion of variance explained) value of 0.89 ± 0.02] than linear models (0.79 ± 0.03). For each column, the resulting model was applied to identify leachables from actual plastic packaging samples. An in-depth investigation of the top 20 most intense molecular features revealed that all false-positives could be identified as outliers in the QSRR models (outside of the 95% prediction bands). Furthermore, analyzing a sample on multiple chromatographic columns and applying the associated QSRR models increased the capacity to filter false positives. Such an approach will contribute to a more effective identification of unknown or unexpected leachables in plastics (e.g. non-intended added substances), therefore refining our understanding of the chemical risks associated with food contact materials.
Affiliation(s)
- Ziyun Xu
- Department of Food Science and Agricultural Chemistry, McGill University, Ste-Anne-de-Bellevue, QC, Canada
| | - Hamza Chughtai
- Department of Food Science and Agricultural Chemistry, McGill University, Ste-Anne-de-Bellevue, QC, Canada
| | - Lei Tian
- Department of Food Science and Agricultural Chemistry, McGill University, Ste-Anne-de-Bellevue, QC, Canada
| | - Lan Liu
- Department of Food Science and Agricultural Chemistry, McGill University, Ste-Anne-de-Bellevue, QC, Canada
| | - Stéphane Bayen
- Department of Food Science and Agricultural Chemistry, McGill University, Ste-Anne-de-Bellevue, QC, Canada.
|
10
|
Nascimben M, Rimondini L. Molecular Toxicity Virtual Screening Applying a Quantized Computational SNN-Based Framework. Molecules 2023; 28:1342. [PMID: 36771009 PMCID: PMC9919191 DOI: 10.3390/molecules28031342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 01/27/2023] [Accepted: 01/29/2023] [Indexed: 02/04/2023]
Abstract
Spiking neural networks are biologically inspired machine learning algorithms attracting researchers' attention for their applicability to alternative energy-efficient hardware other than traditional computers. In the current work, spiking neural networks have been tested in a quantitative structure-activity analysis targeting the toxicity of molecules. Multiple public-domain databases of compounds have been evaluated with spiking neural networks, achieving accuracies compatible with high-quality frameworks presented in the previous literature. The numerical experiments also included an analysis of hyperparameters and tested the spiking neural networks on molecular fingerprints of different lengths. Proposing alternatives to traditional software and hardware for time- and resource-consuming tasks, such as those found in chemoinformatics, may open the door to new research and improvements in the field.
Affiliation(s)
- Mauro Nascimben
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
- Enginsoft SpA, 35129 Padua, Italy
| | - Lia Rimondini
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
|
11
|
Belfield SJ, Cronin MTD, Enoch SJ, Firman JW. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023; 18:e0282924. [PMID: 37163504 PMCID: PMC10171609 DOI: 10.1371/journal.pone.0282924] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Accepted: 02/26/2023] [Indexed: 05/12/2023]
Abstract
Recent years have seen a substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. Such a trend has coincided with a desire to see a shift in the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst techniques applied in this area, the emergence of algorithms trained through machine learning with the objective of toxicity estimation has, quite naturally, arisen. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price, manifesting as the general practical deficits observed with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these elements have served to hinder broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from applicability of) toxicological QSAR have previously been highlighted, accompanied by the forwarding of suggestions for "best practice" aimed at mitigation of their influence. However, the scope of such exercises has remained limited to "classical" QSAR: that conducted through use of linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to employment of machine learning within the field.
In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
Affiliation(s)
- Samuel J Belfield
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - Steven J Enoch
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
| | - James W Firman
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
|
12
|
Abrahamsson D, Siddharth A, Robinson JF, Soshilov A, Elmore S, Cogliano V, Ng C, Khan E, Ashton R, Chiu WA, Fung J, Zeise L, Woodruff TJ. Modeling the transplacental transfer of small molecules using machine learning: a case study on per- and polyfluorinated substances (PFAS). Journal of Exposure Science & Environmental Epidemiology 2022; 32:808-819. [PMID: 36207486 PMCID: PMC9742309 DOI: 10.1038/s41370-022-00481-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/18/2021] [Revised: 09/14/2022] [Accepted: 09/15/2022] [Indexed: 05/10/2023]
Abstract
BACKGROUND Despite their large numbers and widespread use, very little is known about the extent to which per- and polyfluoroalkyl substances (PFAS) can cross the placenta and expose the developing fetus. OBJECTIVE The aim of our study is to develop a computational approach that can be used to evaluate the of extend to which small molecules, and in particular PFAS, can cross to cross the placenta and partition to cord blood. METHODS We collected experimental values of the concentration ratio between cord and maternal blood (RCM) for 260 chemical compounds and calculated their physicochemical descriptors using the cheminformatics package Mordred. We used the compiled database to, train and test an artificial neural network (ANN). And then applied the best performing model to predict RCM for a large dataset of PFAS chemicals (n = 7982). We, finally, examined the calculated physicochemical descriptors of the chemicals to identify which properties correlated significantly with RCM. RESULTS We determined that 7855 compounds were within the applicability domain and 127 compounds are outside the applicability domain of our model. Our predictions of RCM for PFAS suggested that 3623 compounds had a log RCM > 0 indicating preferable partitioning to cord blood. Some examples of these compounds were bisphenol AF, 2,2-bis(4-aminophenyl)hexafluoropropane, and nonafluoro-tert-butyl 3-methylbutyrate. SIGNIFICANCE These observations have important public health implications as many PFAS have been shown to interfere with fetal development. In addition, as these compounds are highly persistent and many of them can readily cross the placenta, they are expected to remain in the population for a long time as they are being passed from parent to offspring. IMPACT Understanding the behavior of chemicals in the human body during pregnancy is critical in preventing harmful exposures during critical periods of development. 
Many chemicals can cross the placenta and expose the fetus; however, the mechanism by which this transport occurs is not well understood. In our study, we developed a machine learning model that describes the transplacental transfer of chemicals as a function of their physicochemical properties. The model was then used to make predictions for a set of 7982 per- and polyfluorinated alkyl substances that are listed on EPA's CompTox Chemicals Dashboard. The model can be applied to make predictions for other chemical categories of interest, such as plasticizers and pesticides. Accurate predictions of RCM can help scientists and regulators to prioritize chemicals that have the potential to cause harm by exposing the fetus.
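The abstract above reports results in terms of log RCM, where RCM is the cord-to-maternal blood concentration ratio and log RCM > 0 is read as preferential partitioning to cord blood. A minimal sketch of that criterion follows; it assumes a base-10 logarithm, and the function names and concentration values are hypothetical illustrations, not taken from the study.

```python
import math


def log_rcm(cord_conc: float, maternal_conc: float) -> float:
    """Return log10 of the cord-to-maternal concentration ratio (RCM).

    Both concentrations must be positive and in the same units.
    The base-10 logarithm is an assumption made for this sketch.
    """
    if cord_conc <= 0 or maternal_conc <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(cord_conc / maternal_conc)


def favors_cord_blood(cord_conc: float, maternal_conc: float) -> bool:
    """True when log RCM > 0, i.e. the cord concentration exceeds the maternal one."""
    return log_rcm(cord_conc, maternal_conc) > 0


# Hypothetical concentrations (same arbitrary units), for illustration only.
print(favors_cord_blood(2.0, 1.0))   # cord higher than maternal
print(favors_cord_blood(0.5, 1.0))   # cord lower than maternal
```

The sign of log RCM depends only on which concentration is larger, so the choice of logarithm base does not change the classification.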
Affiliation(s)
- Dimitri Abrahamsson
- Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, 490 Illinois Street, San Francisco, CA, 94143, USA.
| | - Adi Siddharth
- Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, 490 Illinois Street, San Francisco, CA, 94143, USA
| | - Joshua F Robinson
- Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, 490 Illinois Street, San Francisco, CA, 94143, USA
| | - Anatoly Soshilov
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
| | - Sarah Elmore
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
| | - Vincent Cogliano
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
| | - Carla Ng
- Department of Civil and Environmental Engineering, University of Pittsburgh, 3700 O'Hara St, Pittsburgh, PA, 15261, USA
| | - Elaine Khan
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
| | - Randolph Ashton
- Wisconsin Institute for Discovery, University of Wisconsin, Madison, 330 N Orchard St, Madison, WI, 53715, USA
- The Stem Cell and Regenerative Medicine Center, University of Wisconsin, Madison, 1111 Highland Avenue, Madison, WI, 53705, USA
- Department of Biomedical Engineering, University of Wisconsin - Madison, 1550 Engineering Drive, Madison, WI, 53706, USA
| | - Weihsueh A Chiu
- Department of Veterinary Physiology and Pharmacology, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, 77843, USA
| | - Jennifer Fung
- Department of Obstetrics, Gynecology, and Reproductive Science and the Center of Reproductive Science, University of California, San Francisco, San Francisco, CA, 94143-2240, USA
| | - Lauren Zeise
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1001 I St, Sacramento, CA, 95814, USA
- California Environmental Protection Agency, Office of Environmental Health Hazard Assessment, 1515 Clay St, Oakland, CA, 94612, USA
| | - Tracey J Woodruff
- Department of Obstetrics, Gynecology and Reproductive Sciences, Program on Reproductive Health and the Environment, University of California, San Francisco, 490 Illinois Street, San Francisco, CA, 94143, USA.
13
Gao Z, Xia R, Zhang P. Prediction of anti-proliferation effect of [1,2,3]triazolo[4,5-d]pyrimidine derivatives by random forest and mix-kernel function SVM with PSO. Chem Pharm Bull (Tokyo) 2022; 70:684-693. [PMID: 35922903 DOI: 10.1248/cpb.c22-00376] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
In order to predict the anti-gastric cancer effect of [1,2,3]triazolo[4,5-d]pyrimidine derivatives (1,2,3-TPD), quantitative structure-activity relationship (QSAR) studies were performed. Based on five descriptors selected from a descriptor pool, four QSAR models were established by the heuristic method (HM), random forest (RF), support vector machine with radial basis kernel function (RBF-SVM), and mix-kernel function support vector machine (MIX-SVM), which combines radial basis and polynomial kernel functions. Furthermore, the model built by RF explained the importance of the descriptors selected by HM. Compared with RBF-SVM, the MIX-SVM enhanced both the generalization and the learning ability of the constructed model, and its multi-parameter optimization problem was solved by the particle swarm optimization (PSO) algorithm with very low complexity and fast convergence. In addition, leave-one-out cross-validation (LOO-CV) was adopted to test the robustness of the models, and Q2 was used to describe the results. The MIX-SVM model showed the best prediction ability and the strongest robustness: R2 = 0.927, Q2 = 0.916, MSE = 0.027 for the training set and R2 = 0.946, Q2 = 0.913, MSE = 0.023 for the test set. This study reveals five key descriptors of 1,2,3-TPD and may help in screening for efficient and novel drugs in the future.
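The mix-kernel idea described above can be sketched as a weighted sum of a radial basis kernel and a polynomial kernel. The convex-combination form, the parameter names (`weight`, `gamma`, `degree`, `coef0`), and the example values are assumptions made for illustration; the abstract does not state the exact functional form used by the authors. These are the kind of parameters an optimizer such as PSO would tune jointly.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def poly_kernel(x, y, degree, coef0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def mix_kernel(x, y, weight, gamma, degree, coef0=1.0):
    """Assumed mixed kernel: weight * RBF + (1 - weight) * polynomial.

    With 0 <= weight <= 1 this is a convex combination of two valid
    kernels, which is itself a valid kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * poly_kernel(x, y, degree, coef0))


# Hypothetical descriptor vectors, for illustration only.
a = [0.2, 1.5, -0.7]
b = [0.1, 1.0, -0.2]
print(mix_kernel(a, b, weight=0.6, gamma=0.5, degree=2))
```

Setting `weight` to 1 recovers the plain RBF kernel and setting it to 0 recovers the polynomial kernel, so the mixed form contains both single-kernel models as special cases.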
Affiliation(s)
- Zhan Gao
- College of Computer Science and Technology, Qingdao University
| | - Runze Xia
- College of Computer Science and Technology, Qingdao University
| | - Peijian Zhang
- College of Computer Science and Technology, Qingdao University
14
Yoo JE, Rho M. Large-Scale Survey Data Analysis with Penalized Regression: A Monte Carlo Simulation on Missing Categorical Predictors. MULTIVARIATE BEHAVIORAL RESEARCH 2022; 57:642-657. [PMID: 33703972 DOI: 10.1080/00273171.2021.1891856] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
With the advent of the big data era, machine learning methods have evolved and proliferated. This study focused on penalized regression, a procedure that builds interpretive prediction models among machine learning methods. In particular, penalized regression coupled with large-scale data can explore hundreds or thousands of variables in one statistical model without convergence problems and identify yet uninvestigated important predictors. As one of the first Monte Carlo simulation studies to investigate predictive modeling with missing categorical predictors in the context of social science research, this study endeavored to emulate real social science large-scale data. Likert-scaled variables were simulated as well as multiple-category and count variables. Due to the inclusion of the categorical predictors in modeling, penalized regression methods that consider the grouping effect were employed such as group Mnet. We also examined the applicability of the simulation conditions with a real large-scale dataset that the simulation study referenced. Particularly, the study presented selection counts of variables after multiple iterations of modeling in order to consider the bias resulting from data-splitting in model validation. Selection counts turned out to be a necessary tool when variable selection is of research interest. Efforts to utilize large-scale data to the fullest appear to offer a valid approach to mitigate the effect of nonignorable missingness. Overall, penalized regression which assumes linearity is a viable method to analyze social science large-scale survey data.
Affiliation(s)
- Jin Eun Yoo
- Department of Education, Korea National University of Education
| | - Minjeong Rho
- Department of Education, Korea National University of Education
15
Ji Y, Li R, Tian Y, Chen G, Yan A. Classification models and SAR analysis on thromboxane A2 synthase inhibitors by machine learning methods. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2022; 33:429-462. [PMID: 35678125 DOI: 10.1080/1062936x.2022.2078880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Accepted: 05/11/2022] [Indexed: 06/15/2023]
Abstract
Thromboxane A2 synthase (TXS) is a promising drug target for cardiovascular diseases and cancer. In this work, we conducted a structure-activity relationship (SAR) study on 526 TXS inhibitors for bioactivity prediction. Three types of descriptors (MACCS fingerprints, ECFP4 fingerprints, and MOE descriptors) were utilized to characterize inhibitors, and 24 classification models were developed by support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and deep neural networks (DNN). We then reduced the number of fingerprint bits according to the contribution of descriptors to the models, and constructed 16 extra models on simplified fingerprints. In general, Model_4D, built by the DNN algorithm on 67-bit MACCS fingerprints, performs best. The prediction accuracy of the model on the test set is 0.969, and the Matthews correlation coefficient (MCC) is 0.936. The distance between compound and model (dSTD-PRO) was used to characterize the application domain of the model. In the test set of Model_4D, the dSTD-PRO of 91.5% of the compounds is lower than the corresponding training set threshold (threshold_{0.90} = 0.1055), and the accuracy for these compounds is 0.983. In addition, the important descriptors were summarized and further analyzed. The analysis showed that aromatic nitrogenous heterocyclic groups were beneficial to the bioactivity of TXS inhibitors.
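The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. A minimal sketch of the standard formula follows; the counts in the usage line are hypothetical and are not the study's confusion matrix.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    By the usual convention, 0.0 is returned when the denominator is zero.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
print(matthews_corrcoef(tp=60, tn=65, fp=2, fn=3))
```

Unlike accuracy, MCC uses all four cells, so it stays informative when active and inactive compounds are present in unequal numbers.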
Affiliation(s)
- Y Ji
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
| | - R Li
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
| | - Y Tian
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
| | - G Chen
- College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - A Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, P. R. China
16
Prediction of the Neurotoxic Potential of Chemicals Based on Modelling of Molecular Initiating Events Upstream of the Adverse Outcome Pathways of (Developmental) Neurotoxicity. Int J Mol Sci 2022; 23:ijms23063053. [PMID: 35328472 PMCID: PMC8954925 DOI: 10.3390/ijms23063053] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2022] [Revised: 03/07/2022] [Accepted: 03/08/2022] [Indexed: 12/23/2022] Open
Abstract
Developmental and adult/ageing neurotoxicity is an area needing alternative methods for chemical risk assessment. The formulation of a strategy to screen large numbers of chemicals is highly relevant due to potential exposure to compounds that may have long-term adverse health consequences on the nervous system, leading to neurodegeneration. Adverse Outcome Pathways (AOPs) provide information on relevant molecular initiating events (MIEs) and key events (KEs) that could inform the development of computational alternatives for these complex effects. We propose a screening method integrating multiple Quantitative Structure–Activity Relationship (QSAR) models. The MIEs of existing AOP networks of developmental and adult/ageing neurotoxicity were modelled to predict neurotoxicity. Random Forests were used to model each MIE. Predictions returned by single models were integrated and evaluated for their capability to predict neurotoxicity. Specifically, MIE predictions were used within various types of classifiers and compared with other reference standards (chemical descriptors and structural fingerprints) to benchmark their predictive capability. Overall, classifiers based on MIE predictions returned predictive performances comparable to those based on chemical descriptors and structural fingerprints. The integrated computational approach described here will be beneficial for large-scale screening and prioritisation of chemicals as a function of their potential to cause long-term neurotoxic effects.
17
Carrera GVSM, Inês J, Bernardes CES, Klimenko K, Shimizu K, Canongia Lopes JN. The Solubility of Gases in Ionic Liquids: A Chemoinformatic Predictive and Interpretable Approach. Chemphyschem 2021; 22:2190-2200. [PMID: 34464013 DOI: 10.1002/cphc.202100632] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Indexed: 11/07/2022]
Abstract
This work studies the solubilities of gases in ionic liquids (ILs) using a chemoinformatic approach. It is based on the codification of the atomic inter-component interactions, cation/gas and anion/gas, which are used to obtain a pattern of activation in a Kohonen Neural Network (MOLMAP descriptors). A robust predictive model has been obtained with the Random Forest algorithm, using the maximum proximity as a confidence measure of a given chemical system compared to the training set. The encoding method has been validated with molecular dynamics. This encoding approach is a valuable estimator of attractive/repulsive interactions of a generic chemical system IL+gas. This method has been used as a fast, visual way to identify the reasons behind the differences observed between the solubility of CO2 and O2 in 1-butyl-3-methylimidazolium hexafluorophosphate (BMIM PF6) at identical temperature and pressure (TP) conditions. The effect of varying the cation and the anion has also been evaluated.
Affiliation(s)
- Gonçalo V S M Carrera
- Chemistry Department LAQV-REQUIMTE, NOVA School of Science and Technology, 2829-516, Caparica, Portugal
| | - João Inês
- Chemistry Department LAQV-REQUIMTE, NOVA School of Science and Technology, 2829-516, Caparica, Portugal
| | - Carlos E S Bernardes
- Centro de Química Estrutural, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
| | - Kyrylo Klimenko
- Chemistry Department LAQV-REQUIMTE, NOVA School of Science and Technology, 2829-516, Caparica, Portugal
| | - Karina Shimizu
- Centro de Química Estrutural, Department of Chemical and Biological Engineering, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001, Lisboa, Portugal
| | - José N Canongia Lopes
- Centro de Química Estrutural, Department of Chemical and Biological Engineering, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001, Lisboa, Portugal
18
Gajewicz-Skretna A, Furuhama A, Yamamoto H, Suzuki N. Generating accurate in silico predictions of acute aquatic toxicity for a range of organic chemicals: Towards similarity-based machine learning methods. CHEMOSPHERE 2021; 280:130681. [PMID: 34162070 DOI: 10.1016/j.chemosphere.2021.130681] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/02/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 06/13/2023]
Abstract
There has been an increase in the use of non-animal approaches, such as in silico and/or in vitro methods, for assessing the risks of hazardous chemicals. A number of machine learning algorithms link molecular descriptors that interpret chemical structural properties with their biological activity. These computer-aided methods encounter several challenges, the most significant being the heterogeneity of datasets; more efficient and inclusive computational methods that are able to process large and heterogeneous chemical datasets are needed. In this context, this study verifies the utility of similarity-based machine learning methods in predicting the acute aquatic toxicity of diverse organic chemicals on Daphnia magna and Oryzias latipes. Two similarity-based methods were tested that employ a limited training dataset, most similar to a given fitting point, instead of using the entire dataset that encompasses a wide range of chemicals. The kernel-weighted local polynomial approach had a number of advantages over the distance-weighted k-nearest neighbor (k-NN) algorithm. The results highlight the importance of lipophilicity, electrophilic reactivity, molecular polarizability, and size in determining acute toxicity. The rigorous model validation ensures that this approach is an important tool for estimating toxicity in new or untested chemicals.
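The distance-weighted k-nearest neighbor baseline mentioned above predicts a value for a query chemical from only its most similar training chemicals. A minimal sketch follows, assuming Euclidean distance in descriptor space and inverse-distance weights; both choices, the function name, and the toy data are assumptions for illustration, since the abstract does not specify the weighting scheme.

```python
import math


def knn_predict(train_x, train_y, query, k=3, eps=1e-12):
    """Distance-weighted k-NN regression.

    Each of the k nearest training points contributes its response value
    with weight 1 / (distance + eps), so closer neighbours count more.
    The eps term avoids division by zero for exact matches.
    """
    # Pair each training response with its distance to the query point.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Toy one-descriptor dataset (hypothetical values, for illustration only).
descriptors = [[0.0], [1.0], [10.0]]
toxicity = [0.0, 1.0, 100.0]
print(knn_predict(descriptors, toxicity, [0.5], k=2))
```

Because only the k closest points are used, the distant third chemical in the toy data has no influence on the prediction, which is the local-fitting behaviour the abstract contrasts with models trained on the entire dataset.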
Affiliation(s)
- Agnieszka Gajewicz-Skretna
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland.
| | - Ayako Furuhama
- Center for Health and Environmental Risk Research, National Institute for Environmental Studies (NIES), 16-2 Onogawa, Tsukuba, 305-8506, Japan; Division of Genetics and Mutagenesis, National Institute of Health Sciences (NIHS), 3-25-26 Tonomachi, Kawasaki-ku, Kawasaki City, Kanagawa, 210-9501, Japan
| | - Hiroshi Yamamoto
- Center for Health and Environmental Risk Research, National Institute for Environmental Studies (NIES), 16-2 Onogawa, Tsukuba, 305-8506, Japan
| | - Noriyuki Suzuki
- Center for Health and Environmental Risk Research, National Institute for Environmental Studies (NIES), 16-2 Onogawa, Tsukuba, 305-8506, Japan
19
Fernandes PO, Martins DM, de Souza Bozzi A, Martins JPA, de Moraes AH, Maltarollo VG. Molecular insights on ABL kinase activation using tree-based machine learning models and molecular docking. Mol Divers 2021; 25:1301-1314. [PMID: 34191245 PMCID: PMC8241884 DOI: 10.1007/s11030-021-10261-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Accepted: 06/18/2021] [Indexed: 12/14/2022]
Abstract
Abelson kinase (c-Abl) is a non-receptor tyrosine kinase involved in several biological processes essential for cell differentiation, migration, proliferation, and survival. This enzyme's activation might be an alternative strategy for treating diseases such as neutropenia induced by chemotherapy, prostate, and breast cancer. Recently, a series of compounds that promote the activation of c-Abl has been identified, opening a promising ground for c-Abl drug development. Structure-based drug design (SBDD) and ligand-based drug design (LBDD) methodologies have significantly impacted recent drug development initiatives. Here, we combined SBDD and LBDD approaches to characterize critical chemical properties and interactions of identified c-Abl activators. We used molecular docking simulations combined with tree-based machine learning models (decision tree, AdaBoost, and random forest) to understand the structural features of c-Abl activators required for binding to the myristoyl pocket and, consequently, for promoting enzyme and cellular activation. We obtained predictive and robust models with Matthews correlation coefficient values higher than 0.4 for all endpoints and identified characteristics that led to constructing a structure-activity relationship (SAR) model.
Affiliation(s)
- Philipe Oliveira Fernandes
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Diego Magno Martins
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Aline de Souza Bozzi
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - João Paulo A Martins
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Adolfo Henrique de Moraes
- Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Vinícius Gonçalves Maltarollo
- Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
20
Kuz’min V, Artemenko A, Ognichenko L, Hromov A, Kosinskaya A, Stelmakh S, Sessions ZL, Muratov EN. Simplex representation of molecular structure as universal QSAR/QSPR tool. Struct Chem 2021; 32:1365-1392. [PMID: 34177203 PMCID: PMC8218296 DOI: 10.1007/s11224-021-01793-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2021] [Accepted: 05/07/2021] [Indexed: 10/24/2022]
Abstract
We review the development and application of the Simplex approach for the solution of various QSAR/QSPR problems. The general concept of the simplex method and its varieties are described. The advantages of utilizing this methodology, especially for the interpretation of QSAR/QSPR models, are presented in comparison to other fragmentary methods of molecular structure representation. The utility of SiRMS is demonstrated not only in the standard QSAR/QSPR applications, but also for mixtures, polymers, materials, and other complex systems. In addition to many different types of biological activity (antiviral, antimicrobial, antitumor, psychotropic, analgesic, etc.), toxicity and bioavailability, the review examines the simulation of important properties, such as water solubility, lipophilicity, as well as luminescence, and thermodynamic properties (melting and boiling temperatures, critical parameters, etc.). This review focuses on the stereochemical description of molecules within the simplex approach and details the possibilities of universal molecular stereo-analysis and stereochemical configuration description, along with stereo-isomerization mechanism and molecular fragment "topography" identification.
Affiliation(s)
- Victor Kuz’min
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Anatoly Artemenko
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Luidmyla Ognichenko
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Alexander Hromov
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Anna Kosinskaya
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
- Department of Medical Chemistry, Odessa National Medical University, Odessa, 65082 Ukraine
| | - Sergij Stelmakh
- Department of Molecular Structures and Chemoinformatics, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080 Ukraine
| | - Zoe L. Sessions
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599 USA
| | - Eugene N. Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599 USA
- Department of Pharmaceutical Sciences, Federal University of Paraiba, Joao Pessoa, PB 58059 Brazil
21
Abstract
Toxicity analysis is a major challenge in drug design and discovery. Recently, significant progress has been made through machine learning due to its accuracy, efficiency, and lower cost. US Toxicology in the 21st Century (Tox21) screened a large library of compounds, including approximately 12 000 environmental chemicals and drugs, for different mechanisms responsible for eliciting toxic effects. The Tox21 Data Challenge offered a platform to evaluate different computational methods for toxicity predictions. Inspired by the success of multiscale weighted colored graph (MWCG) theory in protein-ligand binding affinity predictions, we consider MWCG theory for toxicity analysis. In the present work, we develop a geometric graph learning toxicity (GGL-Tox) model by integrating MWCG features and the gradient boosting decision tree (GBDT) algorithm. The benchmark tests of the Tox21 Data Challenge are employed to demonstrate the utility of the proposed GGL-Tox model. An extensive comparison with other state-of-the-art models indicates that GGL-Tox is an accurate and efficient model for toxicity analysis and prediction.
Affiliation(s)
- Jian Jiang
- Research Center of Nonlinear Science, College of Mathematics and Computer Science, Engineering Research Center of Hubei Province for Clothing Information, Wuhan Textile University, Wuhan 430200, P R. China
| | - Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
22
Schmidt F. Computational Toxicology. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11534-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
23
Idakwo G, Thangapandian S, Luttrell J, Li Y, Wang N, Zhou Z, Hong H, Yang B, Zhang C, Gong P. Structure-activity relationship-based chemical classification of highly imbalanced Tox21 datasets. J Cheminform 2020; 12:66. [PMID: 33372637 PMCID: PMC7592558 DOI: 10.1186/s13321-020-00468-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Accepted: 10/13/2020] [Indexed: 12/14/2022] Open
Abstract
The specificity of toxicant-target biomolecule interactions contributes to the very imbalanced nature of many toxicity datasets, causing poor performance in Structure–Activity Relationship (SAR)-based chemical classification. Undersampling and oversampling are representative techniques for handling such an imbalance challenge. However, removing inactive chemical compound instances from the majority class using an undersampling technique can result in information loss, whereas increasing active toxicant instances in the minority class by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to class overlapping and a higher false prediction rate. In this study, in order to improve the prediction accuracy of imbalanced learning, we employed SMOTEENN, a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, to oversample the minority class by creating synthetic samples, followed by cleaning the mislabeled instances. We chose the highly imbalanced Tox21 dataset, which consisted of 12 in vitro bioassays for > 10,000 chemicals that were distributed unevenly between binary classes. With Random Forest (RF) as the base classifier and bagging as the ensemble strategy, we applied four hybrid learning methods, i.e., RF without imbalance handling (RF), RF with Random Undersampling (RUS), RF with SMOTE (SMO), and RF with SMOTEENN (SMN). The performance of the four learning methods was compared using nine evaluation metrics, among which F1 score, Matthews correlation coefficient and Brier score provided a more consistent assessment of the overall performance across the 12 datasets. Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that SMN significantly outperformed the other three methods.
We also found that a strong negative correlation existed between the prediction accuracy and the imbalance ratio (IR), which is defined as the number of inactive compounds divided by the number of active compounds. SMN became less effective when IR exceeded a certain threshold (e.g., > 28). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. This work demonstrates that the performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through the use of data rebalancing.
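Two quantities from the abstract above lend themselves to a short sketch: the imbalance ratio (IR), defined there as the number of inactive compounds divided by the number of active compounds, and the interpolation step by which SMOTE creates a synthetic minority sample between an active compound and one of its minority-class neighbours. The label encoding (1 = active, 0 = inactive), the function names, and the example values are assumptions for illustration; neighbour search and the ENN cleaning step are omitted.

```python
def imbalance_ratio(labels):
    """IR = number of inactive compounds / number of active compounds.

    Assumes labels are encoded as 1 (active) and 0 (inactive).
    """
    n_active = sum(1 for label in labels if label == 1)
    n_inactive = sum(1 for label in labels if label == 0)
    if n_active == 0:
        raise ValueError("no active compounds; IR is undefined")
    return n_inactive / n_active


def smote_interpolate(sample, neighbor, gap):
    """Create one synthetic point on the segment from sample to neighbor.

    gap is a number in [0, 1]; SMOTE draws it at random for each new point.
    """
    if not 0.0 <= gap <= 1.0:
        raise ValueError("gap must lie in [0, 1]")
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


# Hypothetical assay labels: 1 active compound and 28 inactive ones.
labels = [1] + [0] * 28
print(imbalance_ratio(labels))
print(smote_interpolate([0.0, 0.0], [2.0, 4.0], gap=0.5))
```

The interpolation step also shows why synthetic points can drift into the majority-class region when the two minority samples are far apart, which is the overlap that the ENN cleaning step in SMOTEENN is meant to reduce.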
Affiliation(s)
- Gabriel Idakwo
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Sundar Thangapandian
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA
| | - Joseph Luttrell
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Yan Li
- Bennett Aerospace Inc, Cary, NC, 27518, USA
| | - Nan Wang
- Department of Computer Science, New Jersey City University, Jersey City, NJ, 07305, USA
| | - Zhaoxian Zhou
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Centre for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Bei Yang
- School of Information & Engineering, Zhengzhou University, Zhengzhou, 450000, China
| | - Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, 39406, USA.
| | - Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA.
24
Wang Y, Chen X. A joint optimization QSAR model of fathead minnow acute toxicity based on a radial basis function neural network and its consensus modeling. RSC Adv 2020; 10:21292-21308. [PMID: 35518745 PMCID: PMC9054390 DOI: 10.1039/d0ra02701d] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2020] [Accepted: 05/24/2020] [Indexed: 01/07/2023] Open
Abstract
Acute toxicity of the fathead minnow (Pimephales promelas) is an important indicator to evaluate the hazards and risks of compounds in aquatic environments. The aim of our study is to explore the predictive power of the quantitative structure-activity relationship (QSAR) model based on a radial basis function (RBF) neural network with the joint optimization method to study the acute toxicity mechanism, and to develop a potential acute toxicity prediction model, for fathead minnow. To ensure the symmetry and fairness of the data splitting and to generate multiple chemically diverse training and validation sets, we used a self-organizing mapping (SOM) neural network to split the modeling dataset (containing 955 compounds) characterized by PaDEL-descriptors. After preliminary selection of descriptors via the mean decrease impurity method, a hybrid quantum particle swarm optimization (HQPSO) algorithm was used to jointly optimize the parameters of RBF and select the key descriptors. We established 20 RBF-based QSAR models, and the statistical results showed that the 10-fold cross-validation results ($R^2_{cv10}$) and the adjusted coefficients of determination ($R^2_{adj}$) were all greater than 0.7 and 0.8, respectively. The $Q^2_{ext}$ of these models was between 0.6480 and 0.7317, and the $R^2_{ext}$ was between 0.6563 and 0.7318. Combined with the frequency and importance of the descriptors used in RBF-based models, and the correlation between the descriptors and acute toxicity, we concluded that the water distribution coefficient, molar refractivity, and first ionization potential are important factors affecting the acute toxicity of fathead minnow. A consensus QSAR model with RBF-based models was established; this model showed good performance with $R^2$ = 0.9118, $R^2_{cv10}$ = 0.7632, and $Q^2_{ext}$ = 0.7430. A frequency weighted and distance (FWD)-based application domain (AD) definition method was proposed, and the outliers were analyzed carefully.
Compared with previous studies, the method proposed in this paper has clear advantages, and its robustness and external predictive power are also better than those of an XGBoost-based model. It is an effective QSAR modeling method.
Affiliation(s)
- Yukun Wang
  - School of Chemical Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China
  - School of Electronic and Information Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China +864125928367
- Xuebo Chen
  - School of Electronic and Information Engineering, University of Science and Technology Liaoning No. 185, Qianshan Anshan 114051 Liaoning China +864125928367
|
25
|
Mozafari Z, Arab Chamjangali M, Beglari M, Doosti R. The efficiency of ligand-receptor interaction information alone as new descriptors in QSAR modeling via random forest artificial neural network. Chem Biol Drug Des 2020; 96:812-824. [PMID: 32259386 DOI: 10.1111/cbdd.13690] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/27/2019] [Revised: 02/15/2020] [Accepted: 03/15/2020] [Indexed: 11/28/2022]
Abstract
A new approach is introduced for the construction of a predictive quantitative structure-activity relationship model in which only ligand-receptor (LR) interaction features are used as relevant descriptors. This approach combines the benefit of the random forest (RF) as a new variable selection method with the intrinsic capability of the artificial neural network (ANN). The interaction information of the LR complex was used as molecular docking descriptors. The most relevant descriptors were selected using the RF technique and used as inputs to the ANN. The proposed RF ANN (RF-LM-ANN) method was optimized and then evaluated by predicting pEC50 for a set of azine derivatives acting as non-nucleoside reverse transcriptase inhibitors. Under the optimal conditions, the RF-LM-ANN model was evaluated using internal (validation) and external test sets. The determination coefficients of the external test and validation sets were 0.88 and 0.89, respectively. The mean squared error (MSE) values for the prediction of biological activities in the external test and validation sets were 0.10 and 0.11, respectively. The results demonstrated the good prediction ability and high generalizability of the proposed RF-LM-ANN model based on the molecular docking descriptors alone.
Affiliation(s)
- Zeinab Mozafari
  - Department of Chemistry, Shahrood University of Technology, Shahrood, Iran
- Mozhgan Beglari
  - Department of Chemistry, Shahrood University of Technology, Shahrood, Iran
- Rahele Doosti
  - Department of Chemistry, Shahrood University of Technology, Shahrood, Iran
|
26
|
Chen CH, Tanaka K, Kotera M, Funatsu K. Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications. J Cheminform 2020; 12:19. [PMID: 33430997 PMCID: PMC7106596 DOI: 10.1186/s13321-020-0417-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/12/2018] [Accepted: 02/05/2020] [Indexed: 12/23/2022] Open
Abstract
Ensemble learning improves machine learning results by combining several models, yielding better predictive performance than a single model. It also benefits and accelerates research in quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) modeling. With the growing number of ensemble learning models such as random forest, the effectiveness of QSAR/QSPR will be limited by the machine's inability to interpret the predictions for researchers. In fact, many implementations of ensemble learning models are able to quantify the overall magnitude of each feature. For example, feature importance allows us to assess the relative importance of features and to interpret the predictions. However, different ensemble learning methods or implementations may lead to different feature selections for interpretation. In this paper, we compared the predictability and interpretability of four well-established ensemble learning models (random forest, extremely randomized trees, adaptive boosting, and gradient boosting) for regression and binary classification modeling tasks. Blending methods were then built by summarizing the four ensemble learning methods. The blending method led to better performance and a unified interpretation by summarizing individual predictions from the different learning models. The important features in two case studies, which gave valuable information about compound properties, are discussed in detail in this report. QSPR modeling with interpretable machine learning techniques can help chemical design work more efficiently, confirm hypotheses, and establish knowledge for better results.
Affiliation(s)
- Chia-Hsiu Chen
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Kenichi Tanaka
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Masaaki Kotera
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
- Kimito Funatsu
  - Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan.
|
27
|
Jiao Z, Yuan S, Zhang Z, Wang Q. Machine learning prediction of hydrocarbon mixture lower flammability limits using quantitative structure‐property relationship models. PROCESS SAFETY PROGRESS 2019. [DOI: 10.1002/prs.12103] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/11/2022]
Affiliation(s)
- Zeren Jiao
  - Mary Kay O'Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
- Shuai Yuan
  - Mary Kay O'Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
- Zhuoran Zhang
  - Mary Kay O'Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
- Qingsheng Wang
  - Mary Kay O'Connor Process Safety Center, Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas
|
28
|
Toxicity Prediction Method Based on Multi-Channel Convolutional Neural Network. Molecules 2019; 24:molecules24183383. [PMID: 31533341 PMCID: PMC6766985 DOI: 10.3390/molecules24183383] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/15/2019] [Revised: 09/03/2019] [Accepted: 09/13/2019] [Indexed: 02/08/2023] Open
Abstract
Molecular toxicity prediction is one of the key problems in drug design. In this paper, a deep learning network based on a two-dimensional grid of molecules is proposed to predict toxicity. First, the van der Waals force and hydrogen bond were calculated according to different descriptors of molecules, and multi-channel grids were generated, which could reveal more detailed and helpful molecular information for toxicity prediction. The generated grids were fed into a convolutional neural network to obtain the result. A Tox21 dataset containing more than 12,000 molecules was used for the evaluation. The experiments show that the proposed method performs better than other traditional deep learning and machine learning methods.
|
29
|
Zhang Y, Zhao J, Wang Y, Fan Y, Zhu L, Yang Y, Chen X, Lu T, Chen Y, Liu H. Prediction of hERG K+ channel blockage using deep neural networks. Chem Biol Drug Des 2019; 94:1973-1985. [PMID: 31394026 DOI: 10.1111/cbdd.13600] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/24/2019] [Revised: 07/23/2019] [Accepted: 07/30/2019] [Indexed: 01/08/2023]
Abstract
Human ether-a-go-go-related gene (hERG) K+ channel blockage may cause severe cardiac side effects and has become a serious issue in the safety evaluation of drug candidates. Therefore, improving the ability to avoid undesirable hERG activity in the early stage of drug discovery is of significant importance. The purpose of this study was to build predictive models of hERG activity using deep neural networks. For each combination of sampling methods and descriptors, deep neural networks with different architectures were implemented to build classification models. The optimal model, M15, with three hidden layers, an undersampling method, and 2D descriptors, yielded a prediction accuracy of 0.78 and an F1 score of 0.75 on the test set, as well as an accuracy of 0.77 and an F1 score of 0.34 on the external validation set, outperforming the other 35 models, including 9 random forest models. In particular, M15 achieved the highest F1 score and the second highest accuracy when compared, on the same external validation set, with five other methods from four groups using different machine learning algorithms. These results suggest that the model is capable of predicting hERG toxicity, which is of great benefit for developing novel drug candidates.
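The gap between accuracy (0.77) and F1 score (0.34) on the external validation set is typical of imbalanced data, where many correctly classified negatives lift accuracy while few true positives hold F1 down. The following Python sketch uses a hypothetical confusion matrix, chosen only so that it reproduces those two rounded values; the counts are an assumption for illustration and do not come from the paper.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts for an imbalanced set of 100 compounds (11 positives).
# These numbers are illustrative assumptions, not data from the study.
tp, fp, fn, tn = 6, 18, 5, 71

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(round(acc, 2), round(f1, 2))  # 0.77 0.34
```

With only 11 positives out of 100, the 71 true negatives dominate accuracy, while F1 ignores true negatives entirely.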
Affiliation(s)
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Junnan Zhao
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuchen Wang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuanrong Fan
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Lu Zhu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yan Yang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Xingye Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Tao Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
|
30
|
Gadaleta D, Vuković K, Toma C, Lavado GJ, Karmaus AL, Mansouri K, Kleinstreuer NC, Benfenati E, Roncaglioni A. SAR and QSAR modeling of a large collection of LD50 rat acute oral toxicity data. J Cheminform 2019; 11:58. [PMID: 33430989 PMCID: PMC6717335 DOI: 10.1186/s13321-019-0383-2] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 06/05/2019] [Accepted: 08/13/2019] [Indexed: 11/10/2022] Open
Abstract
The median lethal dose for rodent oral acute toxicity (LD50) is a standard piece of information required to categorize chemicals in terms of the potential hazard posed to human health after acute exposure. The exclusive use of in vivo testing is limited by the time and costs required for performing experiments and by the need to sacrifice a number of animals. (Quantitative) structure-activity relationships [(Q)SAR] proved a valid alternative to reduce and assist in vivo assays for assessing acute toxicological hazard. In the framework of a new international collaborative project, the NTP Interagency Center for the Evaluation of Alternative Toxicological Methods and the U.S. Environmental Protection Agency's National Center for Computational Toxicology compiled a large database of rat acute oral LD50 data, with the aim of supporting the development of new computational models for predicting five regulatory relevant acute toxicity endpoints. In this article, a series of regression and classification computational models were developed by employing different statistical and knowledge-based methodologies. External validation was performed to demonstrate the real-life predictability of models. Integrated modeling was then applied to improve performance of single models. Statistical results confirmed the relevance of developed models in regulatory frameworks, and confirmed the effectiveness of integrated modeling. The best integrated strategies reached RMSEs lower than 0.50 and the best classification models reached balanced accuracies over 0.70 for multi-class and over 0.80 for binary endpoints. Computed predictions will be hosted on the EPA's Chemistry Dashboard and made freely available to the scientific community.
Affiliation(s)
- Domenico Gadaleta
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy.
- Kristijan Vuković
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
- Cosimo Toma
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
  - Institute for Risk Assessment Sciences, Utrecht University, PO Box 80177, 3508 TD, Utrecht, The Netherlands
- Giovanna J Lavado
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
- Agnes L Karmaus
  - Integrated Laboratory Systems, Research Triangle Park, NC, 27560, USA
- Kamel Mansouri
  - Integrated Laboratory Systems, Research Triangle Park, NC, 27560, USA
- Nicole C Kleinstreuer
  - NTP Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, 27560, USA
- Emilio Benfenati
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
- Alessandra Roncaglioni
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156, Milan, Italy
|
31
|
Suthar M. Applying several machine learning approaches for prediction of unconfined compressive strength of stabilized pond ashes. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04411-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 10/26/2022]
|
32
|
Majumdar S, Basak SC, Lungu CN, Diudea MV, Grunwald GD. Finding Needles in a Haystack: Determining Key Molecular Descriptors Associated with the Blood-brain Barrier Entry of Chemical Compounds Using Machine Learning. Mol Inform 2019; 38:e1800164. [PMID: 31322827 DOI: 10.1002/minf.201800164] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/01/2018] [Accepted: 04/11/2019] [Indexed: 12/23/2022]
Abstract
In this paper we used two sets of calculated molecular descriptors to predict blood-brain barrier (BBB) entry for a collection of 415 chemicals. The first set of 579 descriptors was calculated with Schrodinger and TopoCluj software. Polly and Triplet software were used to calculate the second set of 198 descriptors. Following this, modelling and a two-deep, repeated external validation method were used for QSAR formulation. Results show that both sets of descriptors, individually and in combination, give models of reasonable prediction accuracy. We also demonstrate the effectiveness of a variable selection approach by showing that, for one of our descriptor sets, the top 5% of predictors in terms of random forest variable importance provide a better performing model than the model with all predictors. The most influential descriptors indicate important aspects of the molecular structural features that govern BBB entry of chemicals.
Affiliation(s)
- Subhabrata Majumdar
  - University of Florida Informatics Institute, 432 Newell Dr, CISE Bldg E251, Gainesville, FL 32611, USA
  - Currently at: AT&T Labs Research
- Subhash C Basak
  - Department of Chemistry and Biochemistry, University of Minnesota, 246 Chemistry Building, 1039 University Drive, Duluth, MN 55812, USA
- Claudiu N Lungu
  - Department of Chemistry, Babes-Bolyai University, Strada Arany János 11, Cluj-Napoca, 400028, Romania
- Mircea V Diudea
  - Department of Chemistry, Babes-Bolyai University, Strada Arany János 11, Cluj-Napoca, 400028, Romania
- Gregory D Grunwald
  - Natural Resources Research Institute, University of Minnesota, 5013 Miller Trunk Highway, Duluth, MN 55811, USA
|
33
|
Assessment of the cardiovascular adverse effects of drug-drug interactions through a combined analysis of spontaneous reports and predicted drug-target interactions. PLoS Comput Biol 2019; 15:e1006851. [PMID: 31323029 PMCID: PMC6668846 DOI: 10.1371/journal.pcbi.1006851] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/04/2019] [Revised: 07/31/2019] [Accepted: 06/29/2019] [Indexed: 12/11/2022] Open
Abstract
Adverse drug effects (ADEs) are one of the leading causes of death in developed countries and are the main reason for drug recalls from the market, whereas the ADEs that are associated with action on the cardiovascular system are the most dangerous and widespread. The treatment of human diseases often requires the intake of several drugs, which can lead to undesirable drug-drug interactions (DDIs), thus causing an increase in the frequency and severity of ADEs. An evaluation of DDI-induced ADEs is a nontrivial task and requires numerous experimental and clinical studies. Therefore, we developed a computational approach to assess the cardiovascular ADEs of DDIs. This approach is based on the combined analysis of spontaneous reports (SRs) and predicted drug-target interactions to estimate the five cardiovascular ADEs that are induced by DDIs, namely, myocardial infarction, ischemic stroke, ventricular tachycardia, cardiac failure, and arterial hypertension. We applied a method based on least absolute shrinkage and selection operator (LASSO) logistic regression to SRs for the identification of interacting pairs of drugs causing corresponding ADEs, as well as noninteracting pairs of drugs. As a result, five datasets containing, on average, 3100 potentially ADE-causing and non-ADE-causing drug pairs were created. The obtained data, along with information on the interaction of drugs with 1553 human targets predicted by PASS Targets software, were used to create five classification models using the Random Forest method. The average area under the ROC curve of the obtained models, sensitivity, specificity and balanced accuracy were 0.837, 0.764, 0.754 and 0.759, respectively. The predicted drug targets were also used to hypothesize the potential mechanisms of DDI-induced ventricular tachycardia for the top-scoring drug pairs. 
The created five classification models can be used for the identification of drug combinations that are potentially the most or least dangerous for the cardiovascular system.
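The reported averages are internally consistent: balanced accuracy is the arithmetic mean of sensitivity and specificity, and because that mean is linear, the average balanced accuracy over the five models equals the mean of the average sensitivity and average specificity. A short Python check, using only the figures quoted in the abstract (the function name is an illustrative choice):

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy: arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Average sensitivity and specificity quoted in the abstract above.
reported = balanced_accuracy(0.764, 0.754)
print(round(reported, 3))  # 0.759
```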
|
34
|
Cortés-Ciriano I, Bender A. KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images. J Cheminform 2019; 11:41. [PMID: 31218493 PMCID: PMC6582521 DOI: 10.1186/s13321-019-0364-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 02/19/2019] [Accepted: 06/09/2019] [Indexed: 02/08/2023] Open
Abstract
The application of convolutional neural networks (ConvNets) to harness high-content screening images or 2D compound representations is gaining increasing attention in drug discovery. However, existing applications often require large data sets for training, or sophisticated pretraining schemes. Here, we show using 33 IC50 data sets from ChEMBL 23 that the in vitro activity of compounds on cancer cell lines and protein targets can be accurately predicted on a continuous scale from their Kekulé structure representations alone by extending existing architectures (AlexNet, DenseNet-201, ResNet152 and VGG-19), which were pretrained on unrelated image data sets. We show that the predictive power of the generated models, which just require standard 2D compound representations as input, is comparable to that of Random Forest (RF) models and fully-connected Deep Neural Networks trained on circular (Morgan) fingerprints. Notably, including additional fully-connected layers further increases the predictive power of the ConvNets by up to 10%. Analysis of the predictions generated by RF models and ConvNets shows that by simply averaging the output of the RF models and ConvNets we obtain significantly lower errors in prediction for multiple data sets, although the effect size is small, than those obtained with either model alone, indicating that the features extracted by the convolutional layers of the ConvNets provide complementary predictive signal to Morgan fingerprints. Lastly, we show that multi-task ConvNets trained on compound images permit to model COX isoform selectivity on a continuous scale with errors in prediction comparable to the uncertainty of the data. Overall, in this work we present a set of ConvNet architectures for the prediction of compound activity from their Kekulé structure representations with state-of-the-art performance, that require no generation of compound descriptors or use of sophisticated image processing techniques. 
The code needed to reproduce the results presented in this study and all the data sets are provided at https://github.com/isidroc/kekulescope .
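The averaging result described above relies on the two model families making partly independent errors. The Python sketch below illustrates that mechanism with invented toy values: `rf_pred` and `convnet_pred` are hypothetical predictions, deliberately contrived so their errors have opposite signs, and they are not taken from the study, which reports a small effect size.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy pIC50-like values (hypothetical, for illustration only).
y_true = [5.0, 6.0, 7.0, 8.0]
rf_pred = [5.5, 6.4, 6.5, 7.4]       # assumed Random Forest outputs
convnet_pred = [4.6, 5.5, 7.4, 8.5]  # assumed ConvNet outputs

# Simple unweighted average of the two models' predictions.
avg_pred = [(a + b) / 2 for a, b in zip(rf_pred, convnet_pred)]

for name, pred in [("RF", rf_pred), ("ConvNet", convnet_pred), ("average", avg_pred)]:
    print(f"{name}: RMSE = {rmse(y_true, pred):.3f}")
```

When the errors are positively correlated instead, averaging helps much less, which is consistent with a small effect size on real data.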
Affiliation(s)
- Isidro Cortés-Ciriano
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW UK
|
35
|
Melo-Filho CC, Braga RC, Muratov EN, Franco CH, Moraes CB, Freitas-Junior LH, Andrade CH. Discovery of new potent hits against intracellular Trypanosoma cruzi by QSAR-based virtual screening. Eur J Med Chem 2018; 163:649-659. [PMID: 30562700 DOI: 10.1016/j.ejmech.2018.11.062] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/26/2018] [Revised: 11/21/2018] [Accepted: 11/23/2018] [Indexed: 12/17/2022]
Abstract
Chagas disease is a neglected tropical disease (NTD) caused by the protozoan parasite Trypanosoma cruzi and is primarily transmitted to humans by the feces of infected Triatominae insects during their blood meal. The disease affects 6-8 million people, mostly in Latin American countries, and kills more people in the region each year than any other parasite-borne disease, including malaria. Moreover, patient numbers are currently increasing in non-endemic, developed countries, such as Australia, Japan, Canada, and the United States. Treatment is limited to one drug, benznidazole, which is only effective in the acute phase of the disease and is very toxic. Thus, there is an urgent need to develop new, safer, and effective drugs against the chronic phase of Chagas disease. Using QSAR-based virtual screening followed by in vitro experimental evaluation, we report herein the identification of novel potent and selective hits against the T. cruzi intracellular stage. We developed and validated binary QSAR models for the prediction of anti-trypanosomal activity and cytotoxicity against mammalian cells using the best practices for QSAR modeling. These models were then used for virtual screening of a commercial database, leading to the identification of 39 virtual hits. Further in vitro assays showed that seven compounds were potent against intracellular T. cruzi at submicromolar concentrations (EC50 < 1 μM) and were very selective (SI > 30). Furthermore, six other compounds also met the hit criteria for Chagas disease, showing activity at low micromolar concentrations (EC50 < 10 μM) against intracellular T. cruzi and good selectivity (SI > 15). Moreover, we performed a multi-parameter analysis to compare the tested compounds regarding their balance between potency, selectivity, and predicted ADMET properties. In future studies, the most promising compounds will be submitted to additional in vitro and in vivo assays in an acute model of Chagas disease, and can be further optimized for the development of new promising drug candidates against this important yet neglected disease.
Affiliation(s)
- Cleber C Melo-Filho
  - LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmacia, Universidade Federal de Goiás - UFG, Rua 240, Qd.87, Goiania, GO, 74605-510, Brazil
- Rodolpho C Braga
  - LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmacia, Universidade Federal de Goiás - UFG, Rua 240, Qd.87, Goiania, GO, 74605-510, Brazil
- Eugene N Muratov
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
  - Department of Chemical Technology, Odessa National Polytechnic University, 1. Shevchenko Ave., Odessa, 65000, Ukraine
- Caio Haddad Franco
  - National Laboratory of Biosciences (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, SP, 13083-970, Brazil
- Carolina B Moraes
  - National Laboratory of Biosciences (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, SP, 13083-970, Brazil
  - Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, SP, 05508-900, Brazil
- Lucio H Freitas-Junior
  - National Laboratory of Biosciences (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, SP, 13083-970, Brazil
  - Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, SP, 05508-900, Brazil
- Carolina Horta Andrade
  - LabMol - Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmacia, Universidade Federal de Goiás - UFG, Rua 240, Qd.87, Goiania, GO, 74605-510, Brazil.
|
36
|
Kensert A, Alvarsson J, Norinder U, Spjuth O. Evaluating parameters for ligand-based modeling with random forest on sparse data sets. J Cheminform 2018; 10:49. [PMID: 30306349 PMCID: PMC6755600 DOI: 10.1186/s13321-018-0304-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 05/17/2018] [Accepted: 10/03/2018] [Indexed: 11/10/2022] Open
Abstract
Ligand-based predictive modeling is widely used to generate predictive models aiding decision making in e.g. drug discovery projects. With growing data sets and requirements on low modeling time comes the necessity to analyze data sets efficiently to support rapid and robust modeling. In this study we analyzed four data sets and studied the efficiency of machine learning methods on sparse data structures, utilizing Morgan fingerprints of different radii and hash sizes, and compared with molecular signatures descriptor of different height. We specifically evaluated the effect these parameters had on modeling time, predictive performance, and memory requirements using two implementations of random forest; Scikit-learn as well as FEST. We also compared with a support vector machine implementation. Our results showed that unhashed fingerprints yield significantly better accuracy than hashed fingerprints ([Formula: see text]), with no pronounced deterioration in modeling time and memory usage. Furthermore, the fast execution and low memory usage of the FEST algorithm suggest that it is a good alternative for large, high dimensional sparse data. Both support vector machines and random forest performed equally well but results indicate that the support vector machine was better at using the extra information from larger values of the Morgan fingerprint's radius.
Affiliation(s)
- Alexander Kensert
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.
- Jonathan Alvarsson
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
- Ulf Norinder
  - Unit of Toxicology Sciences, Karolinska Institutet, Swetox, Forskargatan 20, SE-15136, Södertälje, Sweden
  - Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07, Kista, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
|
37
|
Wu Y, Wang G. Machine Learning Based Toxicity Prediction: From Chemical Structural Description to Transcriptome Analysis. Int J Mol Sci 2018; 19:E2358. [PMID: 30103448 PMCID: PMC6121588 DOI: 10.3390/ijms19082358] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 06/30/2018] [Revised: 07/31/2018] [Accepted: 08/08/2018] [Indexed: 02/07/2023] Open
Abstract
Toxicity prediction is very important to public health. Among its many applications, toxicity prediction is essential to reduce the cost and labor of a drug's preclinical and clinical trials, because a lot of drug evaluations (cellular, animal, and clinical) can be spared due to the predicted toxicity. In the era of Big Data and artificial intelligence, toxicity prediction can benefit from machine learning, which has been widely used in many fields such as natural language processing, speech recognition, image recognition, computational chemistry, and bioinformatics, with excellent performance. In this article, we review machine learning methods that have been applied to toxicity prediction, including deep learning, random forests, k-nearest neighbors, and support vector machines. We also discuss the input parameter to the machine learning algorithm, especially its shift from chemical structural description only to that combined with human transcriptome data analysis, which can greatly enhance prediction accuracy.
Affiliation(s)
- Yunyi Wu
- Department of Biology, Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Southern University of Science and Technology, Shenzhen 518055, China.
| | - Guanyu Wang
- Department of Biology, Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Southern University of Science and Technology, Shenzhen 518055, China.
|
38
|
Majumdar S, Basak SC, Lungu CN, Diudea MV, Grunwald GD. Mathematical structural descriptors and mutagenicity assessment: a study with congeneric and diverse datasets. SAR and QSAR in Environmental Research 2018; 29:579-590. [PMID: 30025481 DOI: 10.1080/1062936x.2018.1496475] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/04/2018] [Accepted: 07/01/2018] [Indexed: 06/08/2023]
Abstract
Quantitative bioactivity and toxicity assessment of chemical compounds plays a central role in drug discovery as it saves a substantial amount of resources. To this end, high-performance computing has enabled researchers and practitioners to leverage hundreds, or even thousands, of computed molecular descriptors for the activity prediction of candidate compounds. In this paper, we evaluate the utility of two large groups of chemical descriptors by such predictive modelling, as well as chemical structure discovery, through empirical analysis. We use a suite of commercially available and in-house software to calculate molecular descriptors for two sets of chemical mutagens - a homogeneous set of 95 amines, and a diverse set of 508 chemicals. Using calculated descriptors, we model the mutagenic activity of these compounds using a number of methods from the statistics and machine-learning literature, and use robust principal component analysis to investigate the low-dimensional subspaces that characterize these chemicals. Our results suggest that combining different sets of descriptors is likely to result in a better predictive model - but that depends on the compounds being modelled and the modelling technique being used.
Affiliation(s)
- S Majumdar
- University of Florida Informatics Institute, Gainesville, USA
| | - S C Basak
- Department of Chemistry and Biochemistry, University of Minnesota, Duluth, MN, USA
| | - C N Lungu
- Department of Chemistry, Babes-Bolyai University, Cluj-Napoca, Romania
| | - M V Diudea
- Department of Chemistry, Babes-Bolyai University, Cluj-Napoca, Romania
| | - G D Grunwald
- Natural Resources Research Institute, University of Minnesota, Duluth, USA
|
39
|
Gadaleta D, Manganelli S, Roncaglioni A, Toma C, Benfenati E, Mombelli E. QSAR Modeling of ToxCast Assays Relevant to the Molecular Initiating Events of AOPs Leading to Hepatic Steatosis. J Chem Inf Model 2018; 58:1501-1517. [PMID: 29949360 DOI: 10.1021/acs.jcim.8b00297] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 12/16/2022]
Abstract
Nonalcoholic hepatic steatosis is a worldwide epidemiological concern since it is among the most prominent hepatic diseases. Indeed, research in toxicology and epidemiology has gathered evidence that exposure to endocrine disruptors can perturb cellular homeostasis and cause this disease. Therefore, assessing the likelihood of a chemical to trigger hepatic steatosis is a matter of the utmost importance. However, systematic in vivo testing of all the chemicals humans are exposed to is not feasible for ethical and economical reasons. In this context, predicting the molecular initiating events (MIE) leading to hepatic steatosis by QSAR modeling is an issue of practical relevance in modern toxicology. In this article, we present QSAR models based on random forest classifiers and DRAGON molecular descriptors for the prediction of in vitro assays that are relevant to MIEs leading to hepatic steatosis. These assays were provided by the ToxCast program and proved to be predictive for the detection of chemical-induced steatosis. During the modeling process, special attention was paid to chemical and toxicological data curation. We adopted two modeling strategies (undersampling and balanced random forests) to develop robust QSAR models from unbalanced data sets. The two modeling approaches gave similar results in terms of predictivity, and most of the models satisfy a minimum percentage of correctly predicted chemicals equal to 75%. Finally, and most importantly, the developed models proved to be useful as an effective in silico screening test for hepatic steatosis.
Affiliation(s)
- Domenico Gadaleta
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
| | - Serena Manganelli
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
| | - Alessandra Roncaglioni
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
| | - Cosimo Toma
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
| | - Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
| | - Enrico Mombelli
- Unité Modèles pour l'Ecotoxicologie et la Toxicologie (METO), Institut National de l'Environnement Industriel et des Risques (INERIS), 60550 Verneuil en Halatte, France
|
40
|
Mayr A, Klambauer G, Unterthiner T, Steijaert M, Wegner JK, Ceulemans H, Clevert DA, Hochreiter S. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci 2018; 9:5441-5451. [PMID: 30155234 PMCID: PMC6011237 DOI: 10.1039/c8sc00148k] [Citation(s) in RCA: 262] [Impact Index Per Article: 43.7] [Received: 01/10/2018] [Accepted: 05/16/2018] [Indexed: 12/24/2022]
Abstract
Deep learning is currently the most successful machine learning technique in a wide range of application areas and has recently been applied successfully in drug discovery research to predict potential drug targets and to screen for active molecules. However, due to (1) the lack of large-scale studies, (2) the compound series bias that is characteristic of drug discovery datasets and (3) the hyperparameter selection bias that comes with the high number of potential deep learning architectures, it remains unclear whether deep learning can indeed outperform existing computational methods in drug discovery tasks. We therefore assessed the performance of several deep learning methods on a large-scale drug discovery dataset and compared the results with those of other machine learning and target prediction methods. To avoid potential biases from hyperparameter selection or compound series, we used a nested cluster-cross-validation strategy. We found (1) that deep learning methods significantly outperform all competing methods and (2) that the predictive performance of deep learning is in many cases comparable to that of tests performed in wet labs (i.e., in vitro assays).
Affiliation(s)
- Andreas Mayr
- LIT AI Lab and Institute of Bioinformatics, Johannes Kepler University Linz, Austria. Tel: +43-732-2468-4521
| | - Günter Klambauer
- LIT AI Lab and Institute of Bioinformatics, Johannes Kepler University Linz, Austria. Tel: +43-732-2468-4521
| | - Thomas Unterthiner
- LIT AI Lab and Institute of Bioinformatics, Johannes Kepler University Linz, Austria. Tel: +43-732-2468-4521
| | - Sepp Hochreiter
- LIT AI Lab and Institute of Bioinformatics, Johannes Kepler University Linz, Austria. Tel: +43-732-2468-4521
|
41
|
Piras P, Sheridan R, Sherer EC, Schafer W, Welch CJ, Roussel C. Modeling and predicting chiral stationary phase enantioselectivity: An efficient random forest classifier using an optimally balanced training dataset and an aggregation strategy. J Sep Sci 2018; 41:1365-1375. [PMID: 29383846 DOI: 10.1002/jssc.201701334] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/09/2017] [Revised: 01/17/2018] [Accepted: 01/17/2018] [Indexed: 11/10/2022]
Abstract
Predicting whether a chiral column will be effective is a daily task for many analysts. Moreover, finding the best chiral column for separating a particular racemic compound is mostly a matter of trial and error that may take up to a week in some cases. In this study we have developed a novel prediction approach based on combining a random forest classifier and an optimized discretization method for dealing with enantioselectivity as a continuous variable. Using the optimization results, models were trained on data sets divided into four enantioselectivity classes. The best model performances were achieved by over-sampling the minority classes (α ≤ 1.10 and α ≥ 2.00), down-sampling the majority class (1.2 ≤ α < 2.0), and aggregating multicategory predictions into binary classifications. We tested our method on 41 chiral stationary phases using layered fingerprints as descriptors. Experimental results show that this learning methodology was successful in terms of average area under the Receiver Operating Characteristic curve, Kappa indices and F-measure for structure-based prediction of the enantioselective behavior of 34 chiral columns.
Affiliation(s)
- Patrick Piras
- Aix Marseille Université, CNRS, Centrale Marseille, iSm2, Marseille, France
| | - Robert Sheridan
- Department of Structural Chemistry, Merck Research Laboratories, Rahway, USA
| | - Edward C Sherer
- Modeling and Informatics Process Research and Development, Merck Research Laboratories, Rahway, USA
| | - Wes Schafer
- Department of Process & Analytical Chemistry, Merck Research Laboratories, Rahway, NJ, USA
| | - Christian Roussel
- Aix Marseille Université, CNRS, Centrale Marseille, iSm2, Marseille, France
|
42
|
Simm J, Klambauer G, Arany A, Steijaert M, Wegner JK, Gustin E, Chupakhin V, Chong YT, Vialard J, Buijnsters P, Velter I, Vapirev A, Singh S, Carpenter AE, Wuyts R, Hochreiter S, Moreau Y, Ceulemans H. Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery. Cell Chem Biol 2018; 25:611-618.e3. [PMID: 29503208 DOI: 10.1016/j.chembiol.2018.01.015] [Citation(s) in RCA: 127] [Impact Index Per Article: 21.2] [Received: 06/23/2017] [Revised: 10/31/2017] [Accepted: 01/29/2018] [Indexed: 12/19/2022]
Abstract
In both academia and the pharmaceutical industry, large-scale assays for drug discovery are expensive and often impractical, particularly for the increasingly important physiologically relevant model systems that require primary cells, organoids, whole organisms, or expensive or rare reagents. We hypothesized that data from a single high-throughput imaging assay can be repurposed to predict the biological activity of compounds in other assays, even those targeting alternate pathways or biological processes. Indeed, quantitative information extracted from a three-channel microscopy-based screen for glucocorticoid receptor translocation was able to predict assay-specific biological activity in two ongoing drug discovery projects. In these projects, repurposing increased hit rates by 50- to 250-fold over that of the initial project assays while increasing the chemical structure diversity of the hits. Our results suggest that data from high-content screens are a rich source of information that can be used to predict and replace customized biological assays.
Affiliation(s)
- Jaak Simm
- ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
| | - Günter Klambauer
- Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria
| | - Adam Arany
- ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
| | - Jörg Kurt Wegner
- Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Emmanuel Gustin
- Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Yolanda T Chong
- Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Jorge Vialard
- Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Peter Buijnsters
- Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Ingrid Velter
- Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Alexander Vapirev
- Facilities for Research, KU Leuven, Willem de Croylaan 52c, Box 5580, 3001 Leuven, Belgium
| | - Shantanu Singh
- Imaging Platform, Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA 02142, USA
| | - Anne E Carpenter
- Imaging Platform, Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, MA 02142, USA
| | - Roel Wuyts
- ExaScience Life Lab, IMEC, Kapeldreef 75, 3001 Leuven, Belgium
| | - Sepp Hochreiter
- Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstrasse 69, 4040 Linz, Austria
| | - Yves Moreau
- ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
| | - Hugo Ceulemans
- Janssen Pharmaceutica NV, Turnhoutseweg 30, 2340 Beerse, Belgium.
|
43
|
Kaneko H. Discussion on Regression Methods Based on Ensemble Learning and Applicability Domains of Linear Submodels. J Chem Inf Model 2018; 58:480-489. [PMID: 29425038 DOI: 10.1021/acs.jcim.7b00649] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
Abstract
To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.
Affiliation(s)
- Hiromasa Kaneko
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
|
44
|
Abstract
Various methods of machine learning, supervised and unsupervised, linear and nonlinear, classification and regression, in combination with various types of molecular descriptors, both "handcrafted" and "data-driven," are considered in the context of their use in computational toxicology. The use of multiple linear regression, variants of naïve Bayes classifier, k-nearest neighbors, support vector machine, decision trees, ensemble learning, random forest, several types of neural networks, and deep learning is the focus of attention of this review. The role of fragment descriptors, graph mining, and graph kernels is highlighted. The application of unsupervised methods, such as Kohonen's self-organizing maps and related approaches, which allow for combining predictions with data analysis and visualization, is also considered. The necessity of applying a wide range of machine learning methods in computational toxicology is underlined.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russian Federation.
- Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russian Federation.
|
45
|
Polishchuk P. Interpretation of Quantitative Structure–Activity Relationship Models: Past, Present, and Future. J Chem Inf Model 2017; 57:2618-2639. [DOI: 10.1021/acs.jcim.7b00274] [Citation(s) in RCA: 120] [Impact Index Per Article: 17.1] [Indexed: 02/01/2023]
Affiliation(s)
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hněvotínská 1333/5, 779 00 Olomouc, Czech Republic
|
46
|
Klimenko K, Lyakhov S, Shibinskaya M, Karpenko A, Marcou G, Horvath D, Zenkova M, Goncharova E, Amirkhanov R, Krysko A, Andronati S, Levandovskiy I, Polishchuk P, Kuz'min V, Varnek A. Virtual screening, synthesis and biological evaluation of DNA intercalating antiviral agents. Bioorg Med Chem Lett 2017; 27:3915-3919. [PMID: 28666733 DOI: 10.1016/j.bmcl.2017.06.035] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/27/2017] [Revised: 06/09/2017] [Accepted: 06/11/2017] [Indexed: 01/01/2023]
Abstract
This paper describes computer-aided design of new antiviral agents against Vaccinia virus (VACV) potentially acting as nucleic acid intercalators. Previously obtained experimental data for DNA intercalation affinities and activities against Vesicular stomatitis virus (VSV) have been used to build, respectively, pharmacophore and QSAR models. These models were used for virtual screening of a database of 245 molecules generated around typical scaffolds of known DNA intercalators. This resulted in 12 hits, which were then synthesized and tested for antiviral activity against VACV together with 43 compounds previously studied against VSV. Two compounds displaying high antiviral activity against VACV and low cytotoxicity were selected for further antiviral activity investigations.
Affiliation(s)
- Kyrylo Klimenko
- Laboratoire de Chemoinformatique, (UMR 7140 CNRS/UniStra), Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France; A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
| | - Sergey Lyakhov
- A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
| | - Marina Shibinskaya
- A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
| | - Alexander Karpenko
- A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
| | - Gilles Marcou
- Laboratoire de Chemoinformatique, (UMR 7140 CNRS/UniStra), Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France
| | - Dragos Horvath
- Laboratoire de Chemoinformatique, (UMR 7140 CNRS/UniStra), Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France
| | - Marina Zenkova
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of Russian Academy of Sciences, 8 Lavrentiev Avenue, Novosibirsk 630090, Russia
| | - Elena Goncharova
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of Russian Academy of Sciences, 8 Lavrentiev Avenue, Novosibirsk 630090, Russia
| | - Rinat Amirkhanov
- Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of Russian Academy of Sciences, 8 Lavrentiev Avenue, Novosibirsk 630090, Russia
| | - Andrei Krysko
- A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
| | - Sergei Andronati
- A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
| | - Igor Levandovskiy
- Department of Organic Chemistry, Kiev Polytechnic Institute, Pr. Pobedy 37, 03056 Kiev, Ukraine
| | - Pavel Polishchuk
- A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine; Institute of Molecular and Translational Medicine, Palacky University Olomouc, Hněvotínská 1333/5, Olomouc 779 00, Czech Republic
| | - Victor Kuz'min
- A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Lyustdorfskaya doroga, 86, Odessa 65080, Ukraine
| | - Alexandre Varnek
- Laboratoire de Chemoinformatique, (UMR 7140 CNRS/UniStra), Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France; Federal University of Kazan, Kremlevskaya str., 18, Kazan, Russia.
|
47
|
Zhao P, Liu B, Wang C. Hepatotoxicity evaluation of traditional Chinese medicines using a computational molecular model. Clin Toxicol (Phila) 2017; 55:996-1000. [PMID: 28594241 DOI: 10.1080/15563650.2017.1333123] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022]
Abstract
BACKGROUND Liver injury caused by traditional Chinese medicines (TCMs) is reported from many countries around the world. TCM hepatotoxicity has attracted worldwide concerns. OBJECTIVE This study aims to develop a more applicable and optimal tool to evaluate TCM hepatotoxicity. METHODS A quantitative structure-activity relationship (QSAR) analysis was performed based on published data and U.S. Food and Drug Administration's Liver Toxicity Knowledge Base (LTKB). RESULTS Eleven herbal ingredients with proven liver toxicity in the literature were added into the dataset besides chemicals from LTKB. The finally generated QSAR model yielded a sensitivity of 83.8%, a specificity of 70.1%, and an accuracy of 80.2%. Among the externally tested 20 ingredients from TCMs, 14 hepatotoxic ingredients were all accurately identified by the QSAR model derived from the dataset containing natural hepatotoxins. CONCLUSIONS Adding natural hepatotoxins into the dataset makes the QSAR model more applicable for TCM hepatotoxicity assessment, which provides a right direction in the methodology study for TCM safety evaluation. The generated QSAR model has the practical value to prioritize the hepatotoxicity risk of TCM compounds. Furthermore, an open-access international specialized database on TCM hepatotoxicity should be quickly established.
Affiliation(s)
- Pan Zhao
- Clinical Trial Center, Beijing 302 Hospital, Beijing, China; Liver Failure Therapy and Research Center, Beijing 302 Hospital, Beijing, China
| | - Bin Liu
- Computer Technology Center, Beijing 302 Hospital, Beijing, China
| | - Chunya Wang
- Emergency & Critical Care Center, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
|
48
|
Muratov E, Lewis M, Fourches D, Tropsha A, Cox WC. Computer-Assisted Decision Support for Student Admissions Based on Their Predicted Academic Performance. American Journal of Pharmaceutical Education 2017; 81:46. [PMID: 28496266 PMCID: PMC5423062 DOI: 10.5688/ajpe81346] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 10/13/2015] [Accepted: 04/20/2016] [Indexed: 05/22/2023]
Abstract
Objective. To develop predictive computational models forecasting the academic performance of students in the didactic-rich portion of a doctor of pharmacy (PharmD) curriculum as admission-assisting tools. Methods. All PharmD candidates over three admission cycles were divided into two groups: those who completed the PharmD program with a GPA ≥ 3; and the remaining candidates. The Random Forest machine learning technique was used to develop a binary classification model based on 11 pre-admission parameters. Results. Robust and externally predictive models were developed that had a particularly high overall accuracy of 77% for candidates with high or low academic performance. These multivariate models were more accurate in predicting these groups than models obtained using undergraduate GPA and composite PCAT scores only. Conclusion. The models developed in this study can be used to improve the admission process as preliminary filters and thus quickly identify candidates who are likely to be successful in the PharmD curriculum.
Affiliation(s)
- Eugene Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Margaret Lewis
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Denis Fourches
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
| | - Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Wendy C. Cox
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
|
49
|
Abstract
It is widely accepted that modern QSAR began in the early 1960s. However, as long ago as 1816 scientists were making predictions about physical and chemical properties. The first investigations into the correlation of biological activities with physicochemical properties such as molecular weight and aqueous solubility began in 1841, almost 60 years before the important work of Overton and Meyer linking aquatic toxicity to lipid-water partitioning. Throughout the 20th century QSAR progressed, though there were many lean years. In 1962 came the seminal work of Corwin Hansch and co-workers, which stimulated a huge interest in the prediction of biological activities. Initially that interest lay largely within medicinal chemistry and drug design, but in the 1970s and 1980s, with increasing ecotoxicological concerns, QSAR modelling of environmental toxicities began to grow, especially once regulatory authorities became involved. Since then QSAR has continued to expand, with over 1400 publications annually from 2011 onwards.
|
50
|
Nattee C, Khamsemanan N, Lawtrakul L, Toochinda P, Hannongbua S. A novel prediction approach for antimalarial activities of Trimethoprim, Pyrimethamine, and Cycloguanil analogues using extremely randomized trees. J Mol Graph Model 2016; 71:13-27. [PMID: 27835827 DOI: 10.1016/j.jmgm.2016.09.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/01/2015] [Revised: 09/19/2016] [Accepted: 09/20/2016] [Indexed: 10/20/2022]
Abstract
Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, Ki, of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the strongest coefficient of determination is selected to be our prediction model. Predicted Ki values from the proposed model show a strong coefficient of determination, R² = 0.996, to experimental Ki values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low Ki values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds which may reduce the time and expenses for new drug development. New compounds with high predicted Ki should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low Ki.
Affiliation(s)
- Cholwich Nattee
- Sirindhorn International Institute of Technology, Thammasat University, Thailand
| | - Luckhana Lawtrakul
- Sirindhorn International Institute of Technology, Thammasat University, Thailand
| | - Pisanu Toochinda
- Sirindhorn International Institute of Technology, Thammasat University, Thailand
|