1
Moreira-Filho JT, Ranganath D, Conway M, Schmitt C, Kleinstreuer N, Mansouri K. Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow. J Cheminform 2024; 16:101. [PMID: 39152469 PMCID: PMC11330086 DOI: 10.1186/s13321-024-00894-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Accepted: 08/06/2024] [Indexed: 08/19/2024] Open
Abstract
With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.
Scientific contributions
This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills. This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.
Affiliation(s)
- José T Moreira-Filho
  - National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Dhruv Ranganath
  - University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Mike Conway
  - National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Charles Schmitt
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Nicole Kleinstreuer
  - National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Kamel Mansouri
  - National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
2
Niazi SK, Mariam Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int J Mol Sci 2023; 24:11488. [PMID: 37511247 PMCID: PMC10380192 DOI: 10.3390/ijms241411488] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/09/2023] [Revised: 06/30/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] Open
Abstract
In modern drug discovery, the combination of chemoinformatics and quantitative structure-activity relationship (QSAR) modeling has emerged as a formidable alliance, enabling researchers to harness the vast potential of machine learning (ML) techniques for predictive molecular design and analysis. This review delves into the fundamental aspects of chemoinformatics, elucidating the intricate nature of chemical data and the crucial role of molecular descriptors in unveiling the underlying molecular properties. Molecular descriptors, including 2D fingerprints and topological indices, in conjunction with the structure-activity relationships (SARs), are pivotal in unlocking the pathway to small-molecule drug discovery. Technical intricacies of developing robust ML-QSAR models, including feature selection, model validation, and performance evaluation, are discussed. Various ML algorithms, such as regression analysis and support vector machines, are showcased for their ability to predict and comprehend the relationships between molecular structures and biological activities. This review serves as a comprehensive guide for researchers, providing an understanding of the synergy between chemoinformatics, QSAR, and ML. By embracing these cutting-edge technologies, predictive molecular analysis holds promise for expediting the discovery of novel therapeutic agents in the pharmaceutical sciences.
Affiliation(s)
- Sarfaraz K Niazi
  - College of Pharmacy, University of Illinois, Chicago, IL 61820, USA
- Zamara Mariam
  - School of Interdisciplinary Engineering & Sciences (SINES), National University of Sciences & Technology (NUST), Islamabad 24090, Pakistan
3
Kumar N, Acharya V. Machine intelligence-driven framework for optimized hit selection in virtual screening. J Cheminform 2022; 14:48. [PMID: 35869511 PMCID: PMC9306080 DOI: 10.1186/s13321-022-00630-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/19/2021] [Accepted: 07/05/2022] [Indexed: 11/10/2022] Open
Abstract
Virtual screening (VS) aids in prioritizing unknown bio-interactions between compounds and protein targets for empirical drug discovery. In a standard VS exercise, roughly 10% of top-ranked molecules exhibit activity when examined in biochemical assays, which accounts for many false positive hits, making it an arduous task. Attempts for conquering false-hit rates were developed through either ligand-based or structure-based VS separately; however, nonetheless performed remarkably well. Here, we present an advanced VS framework, the automated hit identification and optimization tool (A-HIOT), which comprises a chemical space-driven stacked ensemble for identification and protein space-driven deep learning architectures for optimization of an array of specific hits for fixed protein receptors. A-HIOT implements numerous open-source algorithms to integrate chemical and protein space, leading to high-quality predictions. The optimized hits are the selective molecules retrieved after extreme refinement by the chemical space and protein space modules of A-HIOT. Using CXC chemokine receptor 4 (CXCR4), we demonstrated the superior performance of A-HIOT for hit molecule identification and optimization with tenfold cross-validation accuracies of 94.8% and 81.9%, respectively. In comparison with other machine learning algorithms, A-HIOT achieved higher accuracies of 96.2% for hit identification and 89.9% for hit optimization on independent benchmark datasets for CXCR4, and 86.8% for hit identification and 90.2% for hit optimization on an independent test dataset for androgen receptor (AR), which demonstrates its generalizability and robustness. In conclusion, the advantageous features embedded in A-HIOT make it a reliable approach for bridging the long-standing gap between ligand-based and structure-based VS in finding optimized hits for the desired receptor. The complete resource (framework) code is available at https://gitlab.com/neeraj-24/A-HIOT.
4
Thebelt A, Wiebe J, Kronqvist J, Tsay C, Misener R. Maximizing information from chemical engineering data sets: Applications to machine learning. Chem Eng Sci 2022. [DOI: 10.1016/j.ces.2022.117469] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
5
Sharma SR, Singh B, Kaur M. Hybrid SFO and TLBO optimization for biodegradable classification. Soft comput 2021. [DOI: 10.1007/s00500-021-06196-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
6
John L, Soujanya Y, Mahanta HJ, Narahari Sastry G. Chemoinformatics and Machine Learning Approaches for Identifying Antiviral Compounds. Mol Inform 2021; 41:e2100190. [PMID: 34811938 DOI: 10.1002/minf.202100190] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/09/2021] [Accepted: 10/15/2021] [Indexed: 11/06/2022]
Abstract
Current pandemics propelled research efforts in unprecedented fashion, primarily triggering computational efforts towards new vaccine and drug development as well as drug repurposing. There is an urgent need to design novel drugs with targeted biological activity and minimum adverse reactions that may be useful to manage viral outbreaks. Hence an attempt has been made to develop Machine Learning based predictive models that can be used to assess whether a compound has the potency to be antiviral or not. To this end, a set of 2358 antiviral compounds was compiled from the CAS COVID-19 antiviral SAR dataset, whose activity was reported based on IC50 value. A total of 1157 two-dimensional molecular descriptors were computed, among which the most highly correlated descriptors were selected using Tree-based, Correlation-based and Mutual information-based feature selection methods. Seven Machine Learning algorithms, i.e., Random Forest, XGBoost, Support Vector Machine, KNN, Decision Tree, MLP Classifier and Logistic Regression, were benchmarked. The best performance was achieved by the models developed using Random Forest and XGBoost algorithms in all the feature selection methods. The maximum predictive accuracy of both these models was 88% with internal validation. With an external dataset, a maximum accuracy of 93.10% for the XGBoost based model and 100% for the Random Forest based model was achieved. Furthermore, the study demonstrated scaffold analysis of the molecules as a pragmatic approach to explore the importance of structurally diverse compounds in data driven studies.
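As a rough illustration of the correlation-based selection step mentioned in this abstract, the sketch below ranks descriptor columns by the absolute Pearson correlation with an activity label and keeps the top k. The abstract does not give the authors' exact procedure, so the function names, the toy descriptor table, and the choice of Pearson correlation are all assumptions made for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def top_k_by_correlation(
    descriptors: Dict[str, Sequence[float]],
    target: Sequence[float],
    k: int,
) -> List[str]:
    """Return the names of the k descriptors most correlated (in absolute value) with target."""
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson(descriptors[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: three descriptors for four compounds, plus a binary activity label.
toy_descriptors = {
    "logp": [1.0, 2.0, 3.0, 4.0],
    "noise": [5.0, 1.0, 4.0, 2.0],
    "const": [7.0, 7.0, 7.0, 7.0],
}
toy_activity = [0, 0, 1, 1]
```

Calling `top_k_by_correlation(toy_descriptors, toy_activity, 1)` keeps only the `logp` column here. A filter of this kind looks at one descriptor at a time, which is why studies often compare it with tree-based and mutual-information-based alternatives, as this one reports doing.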
Affiliation(s)
- Lijo John
  - Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Tarnaka, Hyderabad, 500 007, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- Yarasi Soujanya
  - Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Tarnaka, Hyderabad, 500 007, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- Hridoy Jyoti Mahanta
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, 785006, Assam, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
- G Narahari Sastry
  - Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, 785006, Assam, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India
7
Selvaraj C, Chandra I, Singh SK. Artificial intelligence and machine learning approaches for drug design: challenges and opportunities for the pharmaceutical industries. Mol Divers 2021; 26:1893-1913. [PMID: 34686947 PMCID: PMC8536481 DOI: 10.1007/s11030-021-10326-z] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 04/05/2021] [Accepted: 09/24/2021] [Indexed: 12/27/2022]
Abstract
The global spread of COVID-19 has raised the importance of pharmaceutical drug development as a challenging and active research area. Developing new drug molecules to overcome any disease is a costly and lengthy process, but the process continues uninterrupted. A critical consideration in drug design is to use the available data resources to find novel leads. Once the drug target is identified, several interdisciplinary areas work together with artificial intelligence (AI) and machine learning (ML) methods to get enriched drugs. These AI and ML methods are applied in every step of computer-aided drug design, and integrating them results in a high success rate of hit compounds. In addition, this AI and ML integration with high-dimension data and its powerful capacity have taken a step forward. Predicting clinical trial outcomes with AI/ML-integrated models could further decrease clinical trial costs while also improving the success rate. Through this review, we discuss the backend of AI and ML methods in supporting computer-aided drug design, along with their challenges and opportunities for the pharmaceutical industry. From the available information or data, AI- and ML-based prediction supports high-throughput virtual screening. With this integration of AI and ML, the success rate of hit identification has gained momentum, yielding novel drugs.
Affiliation(s)
- Chandrabose Selvaraj
  - CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
- Ishwar Chandra
  - CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
- Sanjeev Kumar Singh
  - CADD and Molecular Modelling Lab, Department of Bioinformatics, Alagappa University, Science Block, Karaikudi, Tamil Nadu, 630004, India
8
Karim A, Lee M, Balle T, Sattar A. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. J Cheminform 2021; 13:60. [PMID: 34399849 PMCID: PMC8365955 DOI: 10.1186/s13321-021-00541-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/04/2021] [Accepted: 08/05/2021] [Indexed: 11/10/2022] Open
Abstract
MOTIVATION: Ether-a-go-go-related gene (hERG) channel blockade by small molecules is a major concern during drug development in the pharmaceutical industry. Blockade of hERG channels may cause prolonged QT intervals that potentially could lead to cardiotoxicity. Various in-silico techniques including deep learning models are widely used to screen out small molecules with potential hERG related toxicity. Most of the published deep learning methods utilize a single type of features, which might restrict their performance. Methods based on more than one type of features, such as DeepHIT, struggle with the aggregation of extracted information. DeepHIT shows better performance when evaluated against one or two accuracy metrics such as negative predictive value (NPV) and sensitivity (SEN) but struggles when evaluated against others such as Matthews correlation coefficient (MCC), accuracy (ACC), positive predictive value (PPV) and specificity (SPE). Therefore, there is a need for a method that can efficiently aggregate information gathered from models based on different chemical representations and boost hERG toxicity prediction over a range of performance metrics.
RESULTS: In this paper, we propose a deep learning framework based on step-wise training to predict hERG channel blocking activity of small molecules. Our approach utilizes five individual deep learning base models with their respective base features and a separate neural network to combine the outputs of the five base models. By using three external independent test sets with potency activity of IC50 at a threshold of 10 μM, our method achieves better performance for a combination of classification metrics. We also investigate the effective aggregation of chemical information extracted for robust hERG activity prediction. In summary, CardioTox net can serve as a robust tool for screening small molecules for hERG channel blockade in drug discovery pipelines and performs better than previously reported methods on a range of classification metrics.
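The six classification metrics named in this abstract (MCC, ACC, PPV, NPV, SEN, SPE) all derive from the four cells of a binary confusion matrix. The sketch below applies their standard definitions. The function name and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. A metric whose denominator is zero is returned as NaN.
    """
    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": ratio(tp + tn, tp + fp + tn + fn),         # accuracy
        "SEN": ratio(tp, tp + fn),                        # sensitivity (recall)
        "SPE": ratio(tn, tn + fp),                        # specificity
        "PPV": ratio(tp, tp + fp),                        # positive predictive value
        "NPV": ratio(tn, tn + fn),                        # negative predictive value
        "MCC": ratio(tp * tn - fp * fn, mcc_denominator), # Matthews correlation coefficient
    }


# Hypothetical counts for a blocker / non-blocker test set of 100 compounds.
example = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Because NPV and SEN depend only on how positives and false negatives are handled, a model can score well on those two while doing poorly on MCC, which uses all four cells. That is the kind of imbalance across metrics the abstract describes.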
Affiliation(s)
- Abdul Karim
  - School of Information Communication Technology, Griffith University, 4111 Nathan, Brisbane, Australia
- Matthew Lee
  - School of Information Communication Technology, Griffith University, 4111 Nathan, Brisbane, Australia
- Thomas Balle
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, 2006 Sydney, Australia
  - Brain and Mind Centre, The University of Sydney, 2050 Sydney, Australia
- Abdul Sattar
  - Institute of Integrated and Intelligent Systems, Griffith University, 4111 Nathan, Brisbane, Australia
9
Prediction of antischistosomal small molecules using machine learning in the era of big data. Mol Divers 2021; 26:1597-1607. [PMID: 34351547 DOI: 10.1007/s11030-021-10288-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Accepted: 07/24/2021] [Indexed: 12/13/2022]
Abstract
Schistosomiasis is a neglected tropical disease caused by helminths of the Schistosoma genus. Despite its high morbidity and socio-economic burden, therapeutics are just a handful with praziquantel being the main drug. Praziquantel is an old drug registered for human use in 1982 and has since been administered en masse for chemotherapy, risking the development of resistance, thus the need for new drugs with different mechanisms of action. This review examines the use of machine learning (ML) in this era of big data to aid in the prediction of novel antischistosomal molecules. It first discusses the challenges of drug discovery in schistosomiasis. Explanations are then offered for big data, its characteristics and then, some open databases where large biochemical data on schistosomiasis can be obtained for ML model development are examined. The concepts of artificial intelligence, ML, and deep learning and their drug applications are explored in schistosomiasis. The use of binary classification in predicting antischistosomal compounds and some algorithms that have been applied including random forest and naive Bayesian are discussed. For this review, some deep learning algorithms (deep neural networks) are proposed as novel algorithms for predicting antischistosomal molecules via binary classification. Databases specifically designed for housing bioactivity data on antischistosomal molecules enriched with functional genomic datasets and ontologies are thus urgently needed for developing predictive ML models. This shows the application of machine learning techniques for the discovery of novel antischistosomal small molecules via binary classification in the era of big data.
10
Gallego V, Naveiro R, Roca C, Ríos Insua D, Campillo NE. AI in drug development: a multidisciplinary perspective. Mol Divers 2021; 25:1461-1479. [PMID: 34251580 PMCID: PMC8342381 DOI: 10.1007/s11030-021-10266-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/21/2021] [Accepted: 06/29/2021] [Indexed: 01/09/2023]
Abstract
The introduction of a new drug to the commercial market follows a complex and long process that typically spans over several years and entails large monetary costs due to a high attrition rate. Because of this, there is an urgent need to improve this process using innovative technologies such as artificial intelligence (AI). Different AI tools are being applied to support all four steps of the drug development process (basic research for drug discovery; pre-clinical phase; clinical phase; and postmarketing). Some of the main tasks where AI has proven useful include identifying molecular targets, searching for hit and lead compounds, synthesising drug-like compounds and predicting ADME-Tox. This review, on the one hand, brings in a mathematical vision of some of the key AI methods used in drug development closer to medicinal chemists and, on the other hand, brings the drug development process and the use of different models closer to mathematicians. Emphasis is placed on two aspects not mentioned in similar surveys, namely, Bayesian approaches and their applications to molecular modelling and the eventual final use of the methods to actually support decisions. Promoting a perfect synergy.
Affiliation(s)
- Víctor Gallego
  - Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
- Roi Naveiro
  - Institute of Mathematical Sciences (ICMAT-CSIC), Nicolás Cabrera 13-15, 28049, Madrid, Spain
- Carlos Roca
  - AItenea Biotech S.L. Parque Científico de Madrid, Faraday, 7, 28049, Madrid, Spain
- David Ríos Insua
  - ICMAT-CSIC and Dept. of Statistics and OR, U. Compl. Madrid, Madrid, Spain
- Nuria E Campillo
  - CIB-Margarita Salas (CSIC), Ramiro de Maeztu, 9, 28040, Madrid, Spain
11
Bouhedjar K, Boukelia A, Khorief Nacereddine A, Boucheham A, Belaidi A, Djerourou A. A natural language processing approach based on embedding deep learning from heterogeneous compounds for quantitative structure-activity relationship modeling. Chem Biol Drug Des 2021; 96:961-972. [PMID: 33058460 DOI: 10.1111/cbdd.13742] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/12/2019] [Revised: 05/27/2020] [Accepted: 05/31/2020] [Indexed: 12/15/2022]
Abstract
Over the past decade, rapid development in biological and chemical technologies such as high-throughput screening and parallel synthesis has significantly increased the amount of data, which requires the creation and integration of new analytical methods, especially deep learning models. Recently, there has been increasing interest in using deep learning in computer-aided drug discovery owing to its exceptionally successful application in many fields. The present work proposes a natural language processing approach based on embedding deep neural networks. Our method aims to transform the Simplified Molecular Input Line Entry System (SMILES) format into word embedding vectors to represent the semantics of compounds. These vectors are fed into supervised machine learning algorithms such as convolutional long short-term memory neural network, support vector machine, and random forest to build quantitative structure-activity relationship models on toxicity data sets. The results obtained on toxicity data for the ciliate Tetrahymena pyriformis (IGC50) and on acute rat toxicity data expressed as median lethal dose of treated rats (LD50) show that our approach can be used to predict the activities of chemical compounds efficiently. All material used in this study is available online through the GitHub portal (https://github.com/BoukeliaAbdelbasset/NLPDeepQSAR.git).
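Before a SMILES string can be fed to a word-embedding model it has to be split into tokens and mapped to integer indices. The sketch below shows one simple way to do that. The regular expression, function names, and vocabulary scheme are assumptions for illustration, since the abstract does not say how the authors tokenize, and the embedding training itself is not shown.

```python
import re
from typing import Dict, Iterable, List

# Assumed token pattern: bracket atoms, two-letter halogens, two-digit ring
# closures, single-letter atoms, ring digits, then bond/branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/().@:~*$]")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct token an index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for smiles in smiles_list:
        for token in tokenize(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Map a SMILES string to the integer sequence an embedding layer would consume."""
    return [vocab[token] for token in tokenize(smiles)]
```

For example, `tokenize("ClCCBr")` yields `["Cl", "C", "C", "Br"]`, so two-letter elements stay together instead of being split into characters. The resulting index sequences are what a word2vec-style or neural embedding layer would take as input.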
Affiliation(s)
- Khalid Bouhedjar
  - Laboratoire de Synthèse et Biocatalyse Organique, Département de Chimie, Faculté des Sciences, Université Badji Mokhtar Annaba, Annaba, Algeria
  - Laboratoire Bioinformatique, Centre de Recherche en Biotechnologie (CRBt), Constantine, Algeria
- Abdelbasset Boukelia
  - Laboratoire Bioinformatique, Centre de Recherche en Biotechnologie (CRBt), Constantine, Algeria
  - Computer Science Department, Faculty of NTIC University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria
- Abdelmalek Khorief Nacereddine
  - Laboratory of Physical Chemistry and Biology of Materials, Department of Physics and Chemistry, Higher Normal School of Technological Education-Skikda, Skikda, Algeria
- Anouar Boucheham
  - University Salah Boubnider Constantine, Constantine, Algeria
  - Laboratory of Molecular and Cellular Biology, Constantine, Algeria
- Amine Belaidi
  - Laboratoire Bioinformatique, Centre de Recherche en Biotechnologie (CRBt), Constantine, Algeria
- Abdelhafid Djerourou
  - Laboratoire de Synthèse et Biocatalyse Organique, Département de Chimie, Faculté des Sciences, Université Badji Mokhtar Annaba, Annaba, Algeria
12
Abstract
Quantitative Structure–Activity Relationship (QSAR) modeling aims to correlate molecular structure properties with corresponding bioactivity. Chance correlations and multicollinearity are two major problems often encountered when generating QSAR models. Feature selection can significantly improve the accuracy and interpretability of QSAR models by removing redundant or irrelevant molecular descriptors. The artificial bee colony (ABC) algorithm, which mimics the foraging behavior of a honey bee colony, was originally proposed for continuous optimization problems. It has been applied to feature selection for classification but seldom for regression analysis and prediction. In this paper, a binary ABC algorithm is used to select features (molecular descriptors) in QSAR. Furthermore, we propose an improved ABC-based algorithm for feature selection in QSAR, namely ABC-PLS-1. Crossover and mutation operators are introduced into the employed bee and onlooker bee phases to modify several dimensions of each solution, which not only avoids converting continuous values into discrete values but also reduces the computational resources required. In addition, a novel greedy selection strategy, which selects the feature subsets with higher accuracy and fewer features, helps the algorithm converge quickly. Three QSAR datasets are used to evaluate the proposed algorithm. Experimental results show that ABC-PLS-1 outperforms PSO-PLS, WS-PSO-PLS, and BFDE-PLS in accuracy, root mean square error, and the number of selected features. Moreover, we also study whether to implement the scout bee phase when tackling regression problems and conclude that the scout bee phase is redundant for feature selection in low-dimensional and medium-dimensional regression problems.
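The operators this abstract describes act directly on binary feature masks, where bit i set to 1 means descriptor i is selected. The following is a minimal sketch of one-point crossover, bit-flip mutation, and a greedy rule that prefers a higher score and, on ties, fewer features. It is not the ABC-PLS-1 implementation: the operator details, the function names, and the toy score (which stands in for a PLS model's predictive quality) are assumptions.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = descriptor selected, 0 = descriptor dropped


def crossover(a: Mask, b: Mask, rng: random.Random) -> Mask:
    """One-point crossover: the head of `a` joined to the tail of `b` (needs len >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask: Mask, rate: float, rng: random.Random) -> Mask:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def greedy_select(current: Mask, candidate: Mask, score: Callable[[Mask], float]) -> Mask:
    """Keep the candidate if it scores higher, or ties while using fewer features."""
    current_score, candidate_score = score(current), score(candidate)
    if candidate_score > current_score:
        return candidate
    if candidate_score == current_score and sum(candidate) < sum(current):
        return candidate
    return current


def toy_score(mask: Mask) -> float:
    """Hypothetical fitness: counts how many of the 'relevant' descriptors 0 and 2 are kept.

    A real QSAR run would replace this with a cross-validated model score.
    """
    relevant = (0, 2)
    return float(sum(mask[i] for i in relevant if i < len(mask)))
```

In a search loop, each candidate produced by `crossover` and `mutate` would be passed through `greedy_select` against the current solution. Since the operators work on bits directly, no step is needed to convert continuous positions into discrete selections, which matches the saving the abstract mentions.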
13
iResponse: An AI and IoT-Enabled Framework for Autonomous COVID-19 Pandemic Management. SUSTAINABILITY 2021. [DOI: 10.3390/su13073797] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/23/2022]
Abstract
SARS-CoV-2, a tiny virus, is severely affecting the social, economic, and environmental sustainability of our planet, causing infections and deaths (2,674,151 deaths, as of 17 March 2021), relationship breakdowns, depression, economic downturn, riots, and much more. The lessons that have been learned from good practices by various countries include containing the virus rapidly; enforcing containment measures; growing COVID-19 testing capability; discovering cures; providing stimulus packages to the affected; easing monetary policies; developing new pandemic-related industries; support plans for controlling unemployment; and overcoming inequalities. Coordination and multi-term planning have been found to be the key among the successful national and global endeavors to fight the pandemic. The current research and practice have mainly focused on specific aspects of COVID-19 response. There is a need to automate the learning process such that we can learn from good and bad practices during pandemics and normal times. To this end, this paper proposes a technology-driven framework, iResponse, for coordinated and autonomous pandemic management, allowing pandemic-related monitoring and policy enforcement, resource planning and provisioning, and data-driven planning and decision-making. The framework consists of five modules: Monitoring and Break-the-Chain, Cure Development and Treatment, Resource Planner, Data Analytics and Decision Making, and Data Storage and Management. All modules collaborate dynamically to make coordinated and informed decisions. We provide the technical system architecture of a system based on the proposed iResponse framework along with the design details of each of its five components. The challenges related to the design of the individual modules and the whole system are discussed. We provide six case studies in the paper to elaborate on the different functionalities of the iResponse framework and how the framework can be implemented. 
These include a sentiment analysis case study, a case study on the recognition of human activities, and four case studies using deep learning and other data-driven methods to show how to develop sustainability-related optimal strategies for pandemic management using seven real-world datasets. A number of important findings are extracted from these case studies.
14
Huang DZ, Baber JC, Bahmanyar SS. The challenges of generalizability in artificial intelligence for ADME/Tox endpoint and activity prediction. Expert Opin Drug Discov 2021; 16:1045-1056. [PMID: 33739897 DOI: 10.1080/17460441.2021.1901685] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
INTRODUCTION: Artificial intelligence (AI) has seen a massive resurgence in recent years with wide successes in computer vision, natural language processing, and games. The similar creation of robust and accurate AI models for ADME/Tox endpoint and activity prediction would be revolutionary to drug discovery pipelines. There have been numerous demonstrations of successful applications, but a key challenge remains: how generalizable are these predictive models?
AREAS COVERED: The authors present a summary of current promising components of AI models in the context of early drug discovery where ADME/Tox endpoint and activity prediction is the main driver of the iterative drug design process. Following that is a review of applicability domains and dataset construction considerations which determine generalizability bottlenecks for AI deployment. Further reviewed is the role of promising learning frameworks - multitask, transfer, and meta learning - which leverage auxiliary data to overcome issues of generalizability.
EXPERT OPINION: The authors conclude that the most promising direction toward integrating reliable and informative AI models into the drug discovery pipeline is a conjunction of learned feature representations, deep learning, and novel learning frameworks. Such a solution would address the sparse and incomplete datasets that are available for key endpoints related to drug discovery.
Affiliation(s)
| | - J Christian Baber
- Scientific Informatics, Global Head of Scientific Informatics, Scientific Informatics, Takeda Pharmaceuticals, Cambridge, MA, USA
| | - Sogole Sami Bahmanyar
- Computational Chemistry, Director of Computational Sciences, Computational Chemistry, Takeda Pharmaceuticals, San Diego, USA
|
15
|
Kim H, Kim E, Lee I, Bae B, Park M, Nam H. Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches. BIOTECHNOL BIOPROC E 2021; 25:895-930. [PMID: 33437151 PMCID: PMC7790479 DOI: 10.1007/s12257-020-0049-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/13/2020] [Revised: 05/27/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023]
Abstract
As expenditure on drug development increases exponentially, the overall drug discovery process requires a sustainable revolution. Since artificial intelligence (AI) is leading the fourth industrial revolution, AI can be considered as a viable solution for unstable drug research and development. Generally, AI is applied to fields with sufficient data such as computer vision and natural language processing, but there are many efforts to revolutionize the existing drug discovery process by applying AI. This review provides a comprehensive, organized summary of the recent research trends in the AI-guided drug discovery process, including target identification, hit identification, ADMET prediction, lead optimization, and drug repositioning. The main data sources in each field are also summarized in this review. In addition, the review provides an in-depth analysis of the remaining challenges and limitations and proposes promising future directions in each of the aforementioned areas.
Affiliation(s)
- Hyunho Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Eunyoung Kim
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Bongsung Bae
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Minsu Park
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
| | - Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju, 61005 Korea
|
16
|
de Albuquerque S, Cianni L, de Vita D, Duque C, Gomes ASM, Gomes P, Laughton C, Leitão A, Montanari CA, Montanari R, Ribeiro JFR, da Silva JS, Teixeira C. Molecular design aided by random forests and synthesis of potent trypanocidal agents as cruzain inhibitors for Chagas disease treatment. Chem Biol Drug Des 2020; 96:948-960. [PMID: 33058457 DOI: 10.1111/cbdd.13663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2019] [Revised: 12/13/2019] [Accepted: 12/23/2019] [Indexed: 11/30/2022]
Abstract
Cruzain is an established target for the identification of novel trypanocidal agents, but how good are in vitro/in vivo correlations? This work describes the development of a random forests model for the prediction of the bioavailability of cruzain inhibitors that are Trypanosoma cruzi killers. Some common properties that characterize drug-likeness are poorly represented in many established cruzain inhibitors. This correlates with the evidence that many high-affinity cruzain inhibitors are not trypanocidal agents against T. cruzi. On the other hand, T. cruzi killers that present typical drug-like characteristics are likely to show better trypanocidal action than those without such features. The random forests model was not outperformed by other machine learning methods (such as artificial neural networks and support vector machines), and it was validated with the synthesis of two new trypanocidal agents. Specifically, we report a new lead compound, Neq0565, which was tested on T. cruzi Tulahuen (β-galactosidase) with a pEC50 of 4.9. It is inactive in the host cell line, showing a selectivity index (SI = EC50(cyto)/EC50(T. cruzi)) higher than 50.
Affiliation(s)
- Sérgio de Albuquerque
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
| | - Lorenzo Cianni
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - Daniela de Vita
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - Carla Duque
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
| | - Ana S M Gomes
- LAQV-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
| | - Paula Gomes
- LAQV-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
| | - Charles Laughton
- School of Pharmacy and Centre for Biomolecular Sciences, University of Nottingham, Nottingham, UK
| | - Andrei Leitão
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - Carlos A Montanari
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - Raphael Montanari
- Centro de Robótica de São Carlos, EESC-ICMC, Universidade de São Paulo, São Paulo, Brazil
| | - Jean F R Ribeiro
- Grupo de Química Medicinal, Instituto de Química de São Carlos, Universidade de São Paulo, São Carlos/SP, Brazil
| | - João Santana da Silva
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil
| | - Cátia Teixeira
- LAQV-REQUIMTE, Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
|
17
|
Hajibabaei M, Shafiei F, Abdoli‐Senejani M. Quantitative modeling for prediction of thermodynamic properties of some pyridine derivatives using molecular descriptors and genetic algorithm‐multiple linear regressions. J CHIN CHEM SOC-TAIP 2020. [DOI: 10.1002/jccs.201900283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Maryam Hajibabaei
- Department of Chemistry, Arak Branch, Islamic Azad University, Arak, Iran
| | - Fatemeh Shafiei
- Department of Chemistry, Arak Branch, Islamic Azad University, Arak, Iran
|
18
|
Keyvanpour MR, Shirzad MB. An Analysis of QSAR Research Based on Machine Learning Concepts. Curr Drug Discov Technol 2020; 18:17-30. [PMID: 32178612 DOI: 10.2174/1570163817666200316104404] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/08/2019] [Revised: 08/22/2019] [Accepted: 10/28/2019] [Indexed: 11/22/2022]
Abstract
Quantitative Structure-Activity Relationship (QSAR) is a popular approach developed to correlate chemical molecules with their biological activities based on their chemical structures. Machine learning techniques have proved to be promising solutions to QSAR modeling. Due to the significant role of machine learning strategies in QSAR modeling, this area of research has attracted much attention from researchers. A considerable amount of literature has been published on machine learning-based QSAR modeling methodologies, yet this domain still lacks a recent and comprehensive analysis of these algorithms. This study systematically reviews the application of machine learning algorithms in QSAR, aiming to provide an analytical framework. For this purpose, we present a framework called 'ML-QSAR'. This framework has been designed for future research to: a) facilitate the selection of proper strategies among existing algorithms according to the application area requirements, b) help to develop and improve current methods, and c) provide a platform to study existing methodologies comparatively. In ML-QSAR, a structured categorization is first presented that organizes QSAR modeling research according to the machine learning models used. Then several criteria are introduced in order to assess the models. Finally, a qualitative analysis based on these criteria is carried out.
Affiliation(s)
| | - Mehrnoush Barani Shirzad
- Data Mining Research Laboratory, Department of Computer Engineering, Alzahra University, Tehran, Iran
|
19
|
Cravero F, Schustik SA, Martínez MJ, Vázquez GE, Díaz MF, Ponzoni I. Feature Selection for Polymer Informatics: Evaluating Scalability and Robustness of the FS4RVDD Algorithm Using Synthetic Polydisperse Data Sets. J Chem Inf Model 2020; 60:592-603. [PMID: 31790226 DOI: 10.1021/acs.jcim.9b00867] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/08/2023]
Abstract
The feature selection (FS) process is a key step in the Quantitative Structure-Property Relationship (QSPR) modeling of physicochemical properties in cheminformatics. In particular, the inference of QSPR models for polymeric material properties constitutes a complex problem because of the uncertainty introduced by the polydispersity of these materials. The main challenge is how to capture the polydispersity information from the molecular weight distribution (MWD) curve to achieve a more effective computational representation of polymeric materials. To date, most of the existing QSPR techniques use only a single molecule to represent each of these materials, so polydispersity is not considered. Consequently, QSPR models obtained by these approaches are oversimplified. For this reason, we introduced in a previous work a new FS algorithm called Feature Selection for Random Variables with Discrete Distribution (FS4RVDD), which allows dealing with polydisperse data. In the present paper, we evaluate both the scalability and the robustness of the FS4RVDD algorithm. To this end, we generated synthetic data by varying and combining different parameters: the size of the database, the cardinality of the selected feature subsets, the presence of noise in the data, and the type of correlation (linear and nonlinear). Moreover, the performances obtained by FS4RVDD were contrasted with traditional FS techniques applied to different simplified representations of polymeric materials. The obtained results show that the FS4RVDD algorithm outperformed the traditional FS methods in all proposed scenarios, which suggests the need for an algorithm such as FS4RVDD to deal with the uncertainty that polydispersity introduces in human-made polymers.
Affiliation(s)
- Fiorella Cravero
- Planta Piloto de Ingeniería Química, Universidad Nacional del Sur - CONICET, Camino La Carrindanga 7000, CP 8000 Bahía Blanca, Argentina
| | - Santiago A Schustik
- Planta Piloto de Ingeniería Química, Universidad Nacional del Sur - CONICET, Camino La Carrindanga 7000, CP 8000 Bahía Blanca, Argentina; Comisión de Investigaciones Científicas de la Provincia de Buenos Aires (CIC), CP 1900 La Plata, Argentina
| | - M Jimena Martínez
- Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), San Andrés 800, Campus de Palihue, CP 8000 Bahía Blanca, Argentina
| | - Gustavo E Vázquez
- Facultad de Ingeniería y Tecnologías, Universidad Católica del Uruguay, Av. 8 de Octubre 2788, CP 11600 Montevideo, Uruguay
| | - Mónica F Díaz
- Planta Piloto de Ingeniería Química, Universidad Nacional del Sur - CONICET, Camino La Carrindanga 7000, CP 8000 Bahía Blanca, Argentina; Departamento de Ingeniería Química (DIQ-UNS), CP 8000 Bahía Blanca, Argentina
| | - Ignacio Ponzoni
- Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), San Andrés 800, Campus de Palihue, CP 8000 Bahía Blanca, Argentina; Departamento de Ciencias e Ingeniería de la Computación (DCIC-UNS), CP 8000 Bahía Blanca, Argentina
|
20
|
Nuñez-Borque E, González-Naranjo P, Bartolomé F, Alquézar C, Reinares-Sebastián A, Pérez C, Ceballos ML, Páez JA, Campillo NE, Martín-Requero Á. Targeting Cannabinoid Receptor Activation and BACE-1 Activity Counteracts TgAPP Mice Memory Impairment and Alzheimer's Disease Lymphoblast Alterations. Mol Neurobiol 2020; 57:1938-1951. [PMID: 31898159 DOI: 10.1007/s12035-019-01813-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/31/2019] [Accepted: 10/02/2019] [Indexed: 12/20/2022]
Abstract
Alzheimer's disease (AD), the leading cause of dementia in the elderly, is a neurodegenerative disorder marked by progressive impairment of cognitive ability. Patients with AD display neuropathological lesions including senile plaques, neurofibrillary tangles, and neuronal loss. There are no disease-modifying drugs currently available. With the number of affected individuals increasing dramatically throughout the world, there is an obvious and urgent need for effective treatment strategies for AD. The multifactorial nature of AD encouraged the development of multifunctional compounds, able to interact with several putative targets. Here, we have evaluated the effects of two in-house designed cannabinoid receptor (CB) agonists showing inhibitory actions on β-secretase-1 (BACE-1) (NP137) and BACE-1/butyrylcholinesterase (BuChE) (NP148), on cellular models of AD, including immortalized lymphocytes from late-onset AD patients. Furthermore, the performance of TgAPP mice in a spatial navigation task was investigated following chronic administration of NP137 and NP148. We report here that NP137 and NP148 showed neuroprotective effects in amyloid-β-treated primary cortical neurons, and NP137 in particular rescued the cognitive deficit of TgAPP mice. The latter compound was able to blunt the abnormal cell response to serum addition or withdrawal of lymphoblasts derived from AD patients. It is suggested that NP137 could be a good drug candidate for future treatment of AD.
Affiliation(s)
- Emilio Nuñez-Borque
- Centro de Investigaciones Biológicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain
| | - Fernando Bartolomé
- Neurodegenerative Disorders Group, Instituto de Investigación Hospital 12 de Octubre, Madrid, Spain; Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
| | - Carolina Alquézar
- Centro de Investigaciones Biológicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain; Department of Neurology, Memory and Aging Center, University of California, Box 1207, San Francisco, CA, 94158, USA
| | - Maria L Ceballos
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain; Instituto Cajal (CSIC), Madrid, Spain
| | - Juan A Páez
- Instituto de Química Médica (CSIC), Madrid, Spain
| | - Nuria E Campillo
- Centro de Investigaciones Biológicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain
| | - Ángeles Martín-Requero
- Centro de Investigaciones Biológicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain; Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
|
21
|
Cerruela-García G, Pérez-Parra Toledano J, de Haro-García A, García-Pedrajas N. Influence of feature rankers in the construction of molecular activity prediction models. J Comput Aided Mol Des 2020; 34:305-325. [PMID: 31893338 DOI: 10.1007/s10822-019-00273-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/30/2019] [Accepted: 12/20/2019] [Indexed: 02/07/2023]
Abstract
In the construction of activity prediction models, feature ranking methods are a useful mechanism for ordering features by their significance for developing predictive models. This paper studies the influence of feature rankers in the construction of molecular activity prediction models; for this purpose, a comparative study of fourteen ranking methods for feature selection was conducted. The activity prediction models were constructed using four well-known classifiers and a wide collection of datasets. The ranking algorithms were compared considering the performance of these classifiers using different metrics and the consistency of the ranked features.
Affiliation(s)
- Gonzalo Cerruela-García
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain.
| | - José Pérez-Parra Toledano
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain
| | - Aída de Haro-García
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain
| | - Nicolás García-Pedrajas
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, 14071, Córdoba, Spain
|
22
|
Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105777] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
|
23
|
Park K, Ko YJ, Durai P, Pan CH. Machine learning-based chemical binding similarity using evolutionary relationships of target genes. Nucleic Acids Res 2019; 47:e128. [PMID: 31504818 PMCID: PMC6846180 DOI: 10.1093/nar/gkz743] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/13/2019] [Revised: 08/13/2019] [Accepted: 08/20/2019] [Indexed: 12/21/2022] Open
Abstract
Chemical similarity searching is a basic research tool that can be used to find small molecules which are similar in shape to known active molecules. Despite its popularity, the retrieval of local molecular features that are critical to functional activity related to target binding often fails. To overcome this limitation, we developed a novel machine learning-based chemical binding similarity score by using various evolutionary relationships of binding targets. The chemical similarity was defined by the probability of chemical compounds binding to identical targets. Comprehensive and heterogeneous multiple target-binding chemical data were integrated into a paired data format and processed using multiple classification similarity-learning models with various levels of target evolutionary information. Encoding evolutionary information to chemical compounds through their binding targets substantially expanded available chemical-target interaction data and significantly improved model performance. The output probability of our integrated model, referred to as ensemble evolutionary chemical binding similarity (ensECBS), was effective for finding hidden chemical relationships. The developed method can serve as a novel chemical similarity tool that uses evolutionarily conserved target binding information.
Affiliation(s)
- Keunwan Park
- Natural Product Informatics Research Center, KIST Gangneung Institute of Natural Products, Gangneung 25451, Republic of Korea
| | - Young-Joon Ko
- Natural Product Informatics Research Center, KIST Gangneung Institute of Natural Products, Gangneung 25451, Republic of Korea
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 06978, Republic of Korea
| | - Prasannavenkatesh Durai
- Natural Product Informatics Research Center, KIST Gangneung Institute of Natural Products, Gangneung 25451, Republic of Korea
| | - Cheol-Ho Pan
- Natural Product Informatics Research Center, KIST Gangneung Institute of Natural Products, Gangneung 25451, Republic of Korea
|
24
|
Kwon S, Bae H, Jo J, Yoon S. Comprehensive ensemble in QSAR prediction for drug discovery. BMC Bioinformatics 2019; 20:521. [PMID: 31655545 PMCID: PMC6815455 DOI: 10.1186/s12859-019-3135-4] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 05/02/2019] [Accepted: 10/09/2019] [Indexed: 12/04/2022] Open
Abstract
Background: Quantitative structure-activity relationship (QSAR) is a computational modeling method for revealing relationships between structural properties of chemical compounds and biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based machine learning approaches have been used to overcome constraints and obtain reliable predictions. Ensemble learning builds a set of diversified models and combines them. However, the most prevalent approach, random forest, and other ensemble approaches in QSAR prediction limit their model diversity to a single subject. Results: The proposed ensemble method consistently outperformed thirteen individual models on 19 bioassay datasets and demonstrated superiority over other ensemble approaches that are limited to a single subject. The comprehensive ensemble method is publicly available at http://data.snu.ac.kr/QSAR/. Conclusions: We propose a comprehensive ensemble method that builds multi-subject diversified models and combines them through second-level meta-learning. In addition, we propose an end-to-end neural network-based individual classifier that can automatically extract sequential features from a simplified molecular-input line-entry system (SMILES). The proposed individual classifier did not show impressive results as a single model, but it was considered the most important predictor when combined, according to the interpretation of the meta-learning.
Affiliation(s)
- Sunyoung Kwon
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, South Korea; Clova AI Research, NAVER Corp., Seongnam, 13561, South Korea
| | - Ho Bae
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, South Korea
| | - Jeonghee Jo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, South Korea
| | - Sungroh Yoon
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, South Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, South Korea; Biological Sciences, Seoul National University, Seoul, 08826, South Korea; ASRI and INMC, Seoul National University, Seoul, 08826, South Korea; Institute of Engineering Research, Seoul National University, Seoul, 08826, South Korea
|
25
|
Matsuzaka Y, Uesawa Y. Prediction Model with High-Performance Constitutive Androstane Receptor (CAR) Using DeepSnap-Deep Learning Approach from the Tox21 10K Compound Library. Int J Mol Sci 2019; 20:ijms20194855. [PMID: 31574921 PMCID: PMC6801383 DOI: 10.3390/ijms20194855] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/14/2019] [Revised: 09/23/2019] [Accepted: 09/27/2019] [Indexed: 12/30/2022] Open
Abstract
The constitutive androstane receptor (CAR) plays pivotal roles in drug-induced liver injury through the transcriptional regulation of drug-metabolizing enzymes and transporters. Thus, identifying regulatory factors for CAR activation is important for understanding its mechanisms. Numerous studies conducted previously on CAR activation and its toxicity focused on in vivo or in vitro analyses, which are expensive, time consuming, and require many animals. We developed a computational model that predicts agonists for the CAR using the Toxicology in the 21st Century 10k library. Additionally, we evaluate the prediction performance of novel deep learning (DL)-based quantitative structure-activity relationship analysis called the DeepSnap-DL approach, which is a procedure of generating an omnidirectional snapshot portraying three-dimensional (3D) structures of chemical compounds. The CAR prediction model, which applies a 3D structure generator tool, called CORINA-generated and -optimized chemical structures, in the DeepSnap-DL demonstrated better performance than the existing methods using molecular descriptors. These results indicate that high performance in the prediction model using the DeepSnap-DL approach may be important to prepare suitable 3D chemical structures as input data and to enable the identification of modulators of the CAR.
Affiliation(s)
- Yasunari Matsuzaka
- Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Tokyo 204-8588, Japan
| | - Yoshihiro Uesawa
- Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Tokyo 204-8588, Japan
|
26
|
Dadfar E, Shafiei F. Prediction of some thermodynamic properties of sulfonamide drugs using genetic algorithm‐multiple linear regressions. J CHIN CHEM SOC-TAIP 2019. [DOI: 10.1002/jccs.201900232] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Affiliation(s)
- Etratsadat Dadfar
- Department of Chemistry, Arak Branch, Islamic Azad University, Arak, Iran
| | - Fatemeh Shafiei
- Department of Chemistry, Arak Branch, Islamic Azad University, Arak, Iran
|
27
|
Ahmadinejad N, Shafiei F. Quantitative Structure-Activity Relationship Study of Camptothecin Derivatives as Anticancer Drugs Using Molecular Descriptors. Comb Chem High Throughput Screen 2019; 22:387-399. [DOI: 10.2174/1386207322666190708112251] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/11/2019] [Revised: 05/15/2019] [Accepted: 06/19/2019] [Indexed: 12/12/2022]
Abstract
Aim and Objective: Quantitative Structure-Activity Relationship (QSAR) modeling has been widely developed to derive a correlation between the chemical structures of molecules and their known activities. In the present investigation, QSAR modeling has been carried out on 76 Camptothecin (CPT) derivatives as anticancer drugs to develop a robust model for the prediction of physicochemical properties. Materials and Methods: A training set of 60 structurally diverse CPT derivatives was used to construct QSAR models for the prediction of physicochemical parameters such as Van der Waals surface area (SvdW), Van der Waals Volume (VvdW), Molar Refractivity (MR) and Polarizability (α). The QSAR models were optimized using Multiple Linear Regression (MLR) analysis. A test set of 16 compounds was evaluated using the defined models. The Genetic Algorithm and Multiple Linear Regression (GA-MLR) analysis was used to select the descriptors derived from the Dragon software to generate the correlation models that relate the structural features to the studied properties. Results: QSAR models were used to delineate the important descriptors responsible for the properties of the CPT derivatives. The statistically significant QSAR models derived by GA-MLR analysis were validated by Leave-One-Out Cross-Validation (LOOCV) and test set validation methods. The multicollinearity and autocorrelation properties of the descriptors contributing to the models were tested by calculating the Variance Inflation Factor (VIF) and the Durbin–Watson (DW) statistics. Conclusion: The predictive ability of the models was found to be satisfactory. Thus, QSAR models derived from this study may be helpful for modeling and designing some new CPT derivatives and for predicting their activity.
Affiliation(s)
- Neda Ahmadinejad
- Department of Chemistry, Arak Branch, Islamic Azad University, Arak, Iran
| | - Fatemeh Shafiei
- Department of Chemistry, Arak Branch, Islamic Azad University, Arak, Iran
|
28
|
QSAR Classification Models for Predicting the Activity of Inhibitors of Beta-Secretase (BACE1) Associated with Alzheimer's Disease. Sci Rep 2019; 9:9102. [PMID: 31235739 PMCID: PMC6591229 DOI: 10.1038/s41598-019-45522-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 01/31/2019] [Accepted: 05/30/2019] [Indexed: 12/27/2022] Open
Abstract
Alzheimer’s disease is one of the most common neurodegenerative disorders in the elderly population. The β-site amyloid cleavage enzyme 1 (BACE1) is the major constituent of amyloid plaques and plays a central role in this brain pathogenesis; thus, it constitutes an auspicious pharmacological target for its treatment. In this paper, a QSAR model for identification of potential inhibitors of BACE1 protein is designed by using classification methods. For building this model, a database with 215 molecules collected from different sources has been assembled. This dataset contains diverse compounds with different scaffolds and physical-chemical properties, covering a wide chemical space in the drug-like range. The most distinctive aspect of the applied QSAR strategy is the combination of hybridization with backward elimination of models, which contributes to improving the quality of the final QSAR model. Another relevant step is the visual analysis of the molecular descriptors, which helps to ensure the absence of information redundancy in the model. The QSAR model performances have been assessed by traditional metrics, and the final proposed model has low cardinality and correctly classifies a high percentage of chemical compounds.
|
29
|
Mikolajczyk A, Sizochenko N, Mulkiewicz E, Malankowska A, Rasulev B, Puzyn T. A chemoinformatics approach for the characterization of hybrid nanomaterials: safer and efficient design perspective. NANOSCALE 2019; 11:11808-11818. [PMID: 31184677 DOI: 10.1039/c9nr01162e] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 05/24/2023]
Abstract
In this study, photocatalytic properties and in vitro cytotoxicity of 29 TiO2-based multi-component nanomaterials (i.e., hybrids of more than two composition types of nanoparticles) were evaluated using a combination of experimental testing and supervised machine learning modeling. TiO2-based multi-component nanomaterials with metal clusters of silver, and their mixtures with gold, palladium, and platinum were successfully synthesized. Two activities, photocatalytic activity and cytotoxicity, were studied. A novel cheminformatic approach was developed and applied for the computational representation of the photocatalytic activity and cytotoxicity effect. In this approach, features of investigated TiO2-based hybrid nanomaterials were reflected by a series of novel additive descriptors for hybrid and hybrid nanostructures (denoted as "hybrid nanostructure descriptors"). These descriptors are based on quantum chemical calculations and the Smoluchowski equation. The obtained experimental data and calculated hybrid-nanostructure descriptors were used to develop novel predictive Quantitative Structure-Activity Relationship computational models (called "nano-QSARmix"). The proposed modeling approach is an initial step in the understanding of the relationships between physicochemical properties of hybrid nanoparticles, their toxicity, and photochemical activity under UV-vis irradiation. Acquired knowledge supports the safe-by-design approaches relevant to the development of efficient hybrid nanomaterials with reduced hazardous effects.
Affiliation(s)
- Alicja Mikolajczyk
  - Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland.
30
Martínez MJ, Razuc M, Ponzoni I. MoDeSuS: A Machine Learning Tool for Selection of Molecular Descriptors in QSAR Studies Applied to Molecular Informatics. BIOMED RESEARCH INTERNATIONAL 2019; 2019:2905203. [PMID: 30906770 PMCID: PMC6398071 DOI: 10.1155/2019/2905203] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/11/2018] [Revised: 01/10/2019] [Accepted: 01/19/2019] [Indexed: 01/15/2023]
Abstract
The selection of the most relevant molecular descriptors to describe a target variable in the context of QSAR (Quantitative Structure-Activity Relationship) modelling is a challenging combinatorial optimization problem. In this paper, a novel software tool for addressing this task in the context of regression and classification modelling is presented. The methodology that the tool implements is organized into two phases. The first phase uses a multiobjective evolutionary technique to select subsets of descriptors. The second phase performs an external validation of the chosen descriptor subsets in order to improve reliability. The tool's functionality is illustrated through a case study on the estimation of the ready biodegradation property, as an example of classification QSAR modelling. The results obtained show the usefulness and potential of this novel software tool, which aims to reduce the time and cost of development in the drug discovery process.
Affiliation(s)
- María Jimena Martínez
  - Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), Departamento de Ciencias e Ingeniería de la Computación, Universidad Nacional del Sur (UNS), CP 8000, Bahía Blanca, Argentina
- Marina Razuc
  - Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), Departamento de Ciencias e Ingeniería de la Computación, Universidad Nacional del Sur (UNS), CP 8000, Bahía Blanca, Argentina
  - Comisión de Investigaciones Científicas de la Provincia de Buenos Aires (CIC), Calle 526 between 10 and 11, CP 1900, La Plata, Argentina
- Ignacio Ponzoni
  - Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), Departamento de Ciencias e Ingeniería de la Computación, Universidad Nacional del Sur (UNS), CP 8000, Bahía Blanca, Argentina
31
Sebastián-Pérez V, Martínez MJ, Gil C, Campillo NE, Martínez A, Ponzoni I. QSAR Modelling to Identify LRRK2 Inhibitors for Parkinson's Disease. J Integr Bioinform 2019; 16. [PMID: 30763264 PMCID: PMC6798859 DOI: 10.1515/jib-2018-0063] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/20/2018] [Accepted: 01/14/2019] [Indexed: 01/09/2023] Open
Abstract
Parkinson’s disease is one of the most common neurodegenerative illnesses in older persons, and leucine-rich repeat kinase 2 (LRRK2) is a promising target for its pharmacological treatment. In this work, quantitative structure–activity relationship (QSAR) models for the identification of putative inhibitors of the LRRK2 protein are developed using an in-house chemical library and several machine learning techniques. The methodology applied in this paper has two steps: first, alternative subsets of molecular descriptors useful for characterizing LRRK2 inhibitors are chosen by a multi-objective feature selection method; second, QSAR models are learned using these subsets and three different strategies for supervised learning. The quality of all these QSAR models is compared using classical metrics, and the best models are discussed in statistical and physicochemical terms.
Affiliation(s)
- Víctor Sebastián-Pérez
  - Centro de Investigaciones Biológicas (CIB-CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
- María Jimena Martínez
  - Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), Departamento de Ciencias e Ingeniería de la Computación, Universidad Nacional del Sur (UNS), Bahía Blanca, Argentina
- Carmen Gil
  - Centro de Investigaciones Biológicas (CIB-CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
- Nuria Eugenia Campillo
  - Centro de Investigaciones Biológicas (CIB-CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
- Ana Martínez
  - Centro de Investigaciones Biológicas (CIB-CSIC), Ramiro de Maeztu 9, 28040 Madrid, Spain
- Ignacio Ponzoni
  - Instituto de Ciencias e Ingeniería de la Computación (UNS-CONICET), Departamento de Ciencias e Ingeniería de la Computación, Universidad Nacional del Sur (UNS), Bahía Blanca, Argentina
32
Simões RS, Maltarollo VG, Oliveira PR, Honorio KM. Transfer and Multi-task Learning in QSAR Modeling: Advances and Challenges. Front Pharmacol 2018; 9:74. [PMID: 29467659 PMCID: PMC5807924 DOI: 10.3389/fphar.2018.00074] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 09/22/2017] [Accepted: 01/22/2018] [Indexed: 12/11/2022] Open
Abstract
Medicinal chemistry projects involve several steps aimed at developing a new drug, such as the analysis of biological targets related to a given disease, the discovery and development of drug candidates for these targets, and parallel biological tests to validate drug effectiveness and side effects. Approaches such as the quantitative study of structure-activity relationships (QSAR) involve the construction of predictive models that relate a set of descriptors of a chemical compound series to its biological activities with respect to one or more targets in the human body. Datasets used to perform QSAR analyses are generally characterized by a small number of samples, which makes it harder to build accurate predictive models. In this context, transfer and multi-task learning techniques are well suited, since they transfer information from other QSAR models to the same biological target, reducing the effort and cost of generating new chemical compounds. Therefore, this review presents the main features of transfer and multi-task learning studies, as well as some applications and their potential in drug design projects.
Affiliation(s)
- Rodolfo S Simões
  - School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil
- Vinicius G Maltarollo
  - Department of Pharmaceutical Products, Faculty of Pharmacy, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Patricia R Oliveira
  - School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil
- Kathia M Honorio
  - School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, Brazil
  - Center for Natural and Human Sciences, Federal University of ABC, Santo André, Brazil