1
Boonyarit B, Yamprasert N, Kaewnuratchadasorn P, Kinchagawat J, Prommin C, Rungrotmongkol T, Nutanong S. GraphEGFR: Multi-task and transfer learning based on molecular graph attention mechanism and fingerprints improving inhibitor bioactivity prediction for EGFR family proteins on data scarcity. J Comput Chem 2024; 45:2001-2023. [PMID: 38713612 DOI: 10.1002/jcc.27388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 04/16/2024] [Accepted: 04/19/2024] [Indexed: 05/09/2024]
Abstract
The proteins within the human epidermal growth factor receptor (EGFR) family, members of the tyrosine kinase receptor family, play a pivotal role in the molecular mechanisms driving the development of various tumors. Tyrosine kinase inhibitors, key compounds in targeted therapy, encounter challenges in cancer treatment due to emerging drug resistance mutations. Consequently, machine learning has undergone significant evolution to address the challenges of cancer drug discovery related to EGFR family proteins. However, the application of deep learning in this area is hindered by inherent difficulties associated with small-scale data, particularly the risk of overfitting. Moreover, the design of a model architecture that facilitates learning through multi-task and transfer learning, coupled with appropriate molecular representation, poses substantial challenges. In this study, we introduce GraphEGFR, a deep learning regression model designed to enhance molecular representation and model architecture for predicting the bioactivity of inhibitors against both wild-type and mutant EGFR family proteins. GraphEGFR integrates a graph attention mechanism for molecular graphs with deep and convolutional neural networks for molecular fingerprints. We observed that GraphEGFR models employing multi-task and transfer learning strategies generally achieve predictive performance comparable to existing competitive methods. The integration of molecular graphs and fingerprints adeptly captures relationships between atoms and enables both global and local pattern recognition. We further validated potential multi-targeted inhibitors for wild-type and mutant HER1 kinases, exploring key amino acid residues through molecular dynamics simulations to understand molecular interactions. 
This predictive model offers a robust strategy that could significantly contribute to overcoming the challenges of developing deep learning models for drug discovery with limited data and exploring new frontiers in multi-targeted kinase drug discovery for EGFR family proteins.
Affiliation(s)
- Bundit Boonyarit
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Nattawin Yamprasert
- School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand
- Jiramet Kinchagawat
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Chanatkran Prommin
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
- Thanyada Rungrotmongkol
- Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
- Center of Excellence in Structural and Computational Biology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Sarana Nutanong
- School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
2
Singh K, Nainwal N, Chitme HR. A review on recent advancements in pharmaceutical technology transfer of tablets from an Indian perspective. ANNALES PHARMACEUTIQUES FRANÇAISES 2024:S0003-4509(24)00108-1. [PMID: 39127322 DOI: 10.1016/j.pharma.2024.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 06/25/2024] [Accepted: 08/06/2024] [Indexed: 08/12/2024]
Abstract
OBJECTIVE The healthcare sector is a paramount and rapidly expanding industry in India. The pharmaceutical field in India has experienced substantial growth and transformation in recent times, making significant contributions to the global healthcare market. This comprehensive review delves into the most recent innovations in pharmaceutical technology transfer (TT), particularly in the context of tablet formulations from an Indian standpoint. SIGNIFICANCE The pharmaceutical sector has grappled with various challenging issues, including the escalating costs of medications and the demand for patient-friendly products. METHODS In this technological progress era, various cutting-edge pharmaceutical technologies, such as artificial intelligence (AI), and 3D and 4D printing, play pivotal roles in drug development. Tablets, the most promising and widely utilized dosage form worldwide, require a sophisticated approach to TT. Achieving a successful TT necessitates a dedicated team with well-defined objectives, improved documentation, and effective communication. RESULTS The Indian Pharmaceutical Industry (IPI) possesses the potential to make significant contributions to the global healthcare sector. Moreover, we delve into the various phases of TT, highlighting the pivotal role of formulation development and process optimization in ensuring product quality, efficiency, and cost-effectiveness along with different models of TT. Additionally, we examine the challenges associated with TT and potential solutions, as well as the initiatives of the Indian government to bolster the Indian pharmaceutical sector's position as the "Pharmacy of the World". CONCLUSION It is concluded that there is a need to contextualize and institutionalize the tech transfer policies for successful implementation for the benefit of the global population.
Affiliation(s)
- Kishan Singh
- All India Institute of Ayurveda, Sarita Vihar, New Delhi 110076, India.
- Nidhi Nainwal
- Uttaranchal Institute of Pharmaceutical Sciences, Uttaranchal University, Premnagar, Dehradun, Uttarakhand 248007, India.
- Havagiray R Chitme
- Amity Institute of Pharmacy, Amity University Uttar Pradesh, Noida 201313, India.
3
Peteani G, Huynh MTD, Gerebtzoff G, Rodríguez-Pérez R. Application of machine learning models for property prediction to targeted protein degraders. Nat Commun 2024; 15:5764. [PMID: 38982061 PMCID: PMC11233499 DOI: 10.1038/s41467-024-49979-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 06/21/2024] [Indexed: 07/11/2024] Open
Abstract
Machine learning (ML) systems can model quantitative structure-property relationships (QSPR) using existing experimental data and make property predictions for new molecules. With the advent of modalities such as targeted protein degraders (TPD), the applicability of QSPR models is questioned and ML usage in TPD-centric projects remains limited. Herein, ML models are developed and evaluated for TPDs' property predictions, including passive permeability, metabolic clearance, cytochrome P450 inhibition, plasma protein binding, and lipophilicity. Interestingly, performance on TPDs is comparable to that of other modalities. Predictions for glues and heterobifunctionals often yield lower and higher errors, respectively. For permeability, CYP3A4 inhibition, and human and rat microsomal clearance, misclassification errors into high and low risk categories are lower than 4% for glues and 15% for heterobifunctionals. For all modalities, misclassification errors range from 0.8% to 8.1%. Investigated transfer learning strategies improve predictions for heterobifunctionals. This is the first comprehensive evaluation of ML for the prediction of absorption, distribution, metabolism, and excretion (ADME) and physicochemical properties of TPD molecules, including heterobifunctional and molecular glue sub-modalities. Taken together, our investigations show that ML-based QSPR models are applicable to TPDs and support ML usage for TPDs' design, to potentially accelerate drug discovery.
Affiliation(s)
- Giulia Peteani
- Novartis Biomedical Research, Novartis Campus, 4002, Basel, Switzerland
4
Walter M, Borghardt JM, Humbeck L, Skalic M. Multi-Task ADME/PK prediction at industrial scale: leveraging large and diverse experimental datasets. Mol Inform 2024:e202400079. [PMID: 38973777 DOI: 10.1002/minf.202400079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 04/10/2024] [Accepted: 05/04/2024] [Indexed: 07/09/2024]
Abstract
ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters to judge whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models to predict ADME and animal PK endpoints trained on in-house data generated at Boehringer Ingelheim. Models were evaluated both at the design stage of a compound (i.e., no experimental data of test compounds available) and at the testing stage when a particular assay would be conducted (i.e., experimental data of earlier conducted assays may be available). Using realistic time-splits, we found a clear benefit in performance of multi-task graph-based neural network models over single-task models, which was even stronger when experimental data of earlier assays is available. In an attempt to explain the success of multi-task models, we found that especially endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are responsible for increased predictivity in more complex ADME and PK endpoints. In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can be best leveraged to optimize predictivity of ML models.
Affiliation(s)
- Moritz Walter
- Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Jens M Borghardt
- Drug Discovery Sciences Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Lina Humbeck
- Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Miha Skalic
- Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
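The "realistic time-splits" mentioned in the abstract of entry 4 can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the compound identifiers, and the cutoff date are all hypothetical, and the only idea shown is that compounds measured before a cutoff train the model while later ones are held out.

```python
from datetime import date


def time_split(records, cutoff):
    """Split (compound_id, measured_on, value) records at a cutoff date.

    Records measured before the cutoff go to training; records measured
    on or after it go to the test set, mimicking prospective use.
    """
    train = [r for r in records if r[1] < cutoff]
    test = [r for r in records if r[1] >= cutoff]
    return train, test


# Hypothetical assay records; identifiers, dates and values are made up.
records = [
    ("cpd-1", date(2021, 3, 1), 0.42),
    ("cpd-2", date(2022, 7, 15), 1.10),
    ("cpd-3", date(2023, 1, 9), 0.87),
    ("cpd-4", date(2023, 6, 30), 0.15),
]

train, test = time_split(records, cutoff=date(2023, 1, 1))
print([r[0] for r in train], [r[0] for r in test])
```

Compared with a random split, a split of this kind tends to give a more conservative performance estimate, because the test compounds come from later chemistry than anything seen in training.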
5
Liu J, Gui Y, Rao J, Sun J, Wang G, Ren Q, Qu N, Niu B, Chen Z, Sheng X, Wang Y, Zheng M, Li X. In silico off-target profiling for enhanced drug safety assessment. Acta Pharm Sin B 2024; 14:2927-2941. [PMID: 39027254 PMCID: PMC11252485 DOI: 10.1016/j.apsb.2024.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/21/2024] [Accepted: 02/29/2024] [Indexed: 07/20/2024] Open
Abstract
Ensuring drug safety in the early stages of drug development is crucial to avoid costly failures in subsequent phases. However, the economic burden associated with detecting drug off-targets and potential side effects through in vitro safety screening and animal testing is substantial. Drug off-target interactions, along with the adverse drug reactions they induce, are significant factors affecting drug safety. To assess the liability of candidate drugs, we developed an artificial intelligence model for the precise prediction of compound off-target interactions, leveraging multi-task graph neural networks. The outcomes of off-target predictions can serve as representations for compounds, enabling the differentiation of drugs under various ATC codes and the classification of compound toxicity. Furthermore, the predicted off-target profiles are employed in adverse drug reaction (ADR) enrichment analysis, facilitating the inference of potential ADRs for a drug. Using the withdrawn drug Pergolide as an example, we elucidate the mechanisms underlying ADRs at the target level, contributing to the exploration of the potential clinical relevance of newly predicted off-target interactions. Overall, our work facilitates the early assessment of compound safety/toxicity based on off-target identification, deduces potential ADRs of drugs, and ultimately promotes the secure development of drugs.
Affiliation(s)
- Jin Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- Yike Gui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jingjing Sun
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Gang Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Qun Ren
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- Nanjing University of Chinese Medicine, Nanjing 210023, China
- Ning Qu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Buying Niu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhiyi Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, Hangzhou 330106, China
- Xia Sheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Mingyue Zheng
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Nanjing University of Chinese Medicine, Nanjing 210023, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, Hangzhou 330106, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
6
Tinkov OV, Osipov VN, Kolotaev AV, Khachatryan DS, Grigorev VY. HT_PREDICT: a machine learning-based computational open-source tool for screening HDAC6 inhibitors. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2024; 35:505-530. [PMID: 39007781 DOI: 10.1080/1062936x.2024.2371155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Accepted: 06/17/2024] [Indexed: 07/16/2024]
Abstract
Histone deacetylase 6 (HDAC6) is a promising drug target for the treatment of human diseases such as cancer, neurodegenerative diseases (in particular, Alzheimer's disease), and multiple sclerosis. Considerable attention is paid to the development of selective non-toxic HDAC6 inhibitors. To this end, we successfully formed a set of 3854 compounds and proposed adequate regression QSAR models for HDAC6 inhibitors. The models have been developed using the PubChem, Klekota-Roth, 2D atom pair fingerprints, and RDKit descriptors and the gradient boosting, support vector machines, neural network, and k-nearest neighbours methods. The models are integrated into the developed HT_PREDICT application, which is freely available at https://htpredict.streamlit.app/. In vitro studies have confirmed the predictive ability of the proposed QSAR models integrated into the HT_PREDICT web application. In addition, the virtual screening performed with the HT_PREDICT web application allowed us to propose two promising inhibitors for further investigations.
Affiliation(s)
- O V Tinkov
- Department of Pharmacology and Pharmaceutical Chemistry, Medical Faculty, Shevchenko Transnistria State University, Tiraspol, Moldova
- V N Osipov
- Department of Chemical Synthesis, Blokhin National Medical Research Center of Oncology, Ministry of Health of the Russian Federation, Moscow, Russia
- A V Kolotaev
- Laboratory of Natural Compounds, National Research Centre "Kurchatov Institute", Moscow, Russia
- D S Khachatryan
- Laboratory of Natural Compounds, National Research Centre "Kurchatov Institute", Moscow, Russia
- V Y Grigorev
- Institute of Physiologically Active Compounds, Federal Research Center of Problems of Chemical Physics and Medicinal Chemistry, Russian Academy of Sciences, Chernogolovka, Russia
7
Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of the drug discovery framework. VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence or artificial intelligence has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged as a revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data, which has evolved into "big-data" in the public domain, demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from individual approaches (LB and SB) to integrated LB and SB techniques that explore various ligand and target protein aspects for an enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in the drug discovery domain for screening and optimizing hits or leads with fewer or no false positive hits. Following the big-data drift and tremendous growth in computational architecture, we present this review. Here, the article categorizes and emphasizes individual VS techniques, details the literature on machine learning implementation and modern machine intelligence approaches, and discusses limitations and future prospects.
Affiliation(s)
- Neeraj Kumar
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
- Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research, Ghaziabad, India
8
Umemori Y, Handa K, Yoshimura S, Kageyama M, Iijima T. Development of a Novel In Silico Classification Model to Assess Reactive Metabolite Formation in the Cysteine Trapping Assay and Investigation of Important Substructures. Biomolecules 2024; 14:535. [PMID: 38785942 PMCID: PMC11117661 DOI: 10.3390/biom14050535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 04/25/2024] [Accepted: 04/26/2024] [Indexed: 05/25/2024] Open
Abstract
Predicting whether a compound can cause drug-induced liver injury (DILI) is difficult due to the complexity of drug mechanisms. The cysteine trapping assay is a method for detecting reactive metabolites that bind to microsomes covalently. However, it is cumbersome to use 35S isotope-labeled cysteine for this assay. Therefore, we constructed an in silico classification model for predicting a positive/negative outcome in the cysteine trapping assay. We collected 475 compounds (436 in-house compounds and 39 publicly available drugs) based on experimental data performed in this study, and the composition of the results showed 248 positives and 227 negatives. Using a Message Passing Neural Network (MPNN) and Random Forest (RF) with extended connectivity fingerprint (ECFP) 4, we built machine learning models to predict the covalent binding risk of compounds. In the time-split dataset, AUC-ROC of MPNN and RF were 0.625 and 0.559 in the hold-out test, respectively. This result suggests that the MPNN model has a higher predictivity than RF in the time-split dataset. Hence, we conclude that the in silico MPNN classification model for the cysteine trapping assay has a better predictive power. Furthermore, most of the substructures that contributed positively to the cysteine trapping assay were consistent with previous results.
Affiliation(s)
- Koichi Handa
- DMPK Research Department, Teijin Institute for Bio-Medical Research, TEIJIN PHARMA LIMITED, 4-3-2 Asahigaoka, Hino-shi, Tokyo 191-8512, Japan; (Y.U.); (S.Y.); (M.K.); (T.I.)
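The AUC-ROC values quoted in the abstract of entry 8 (0.625 and 0.559) are rank-based scores: the probability that a randomly chosen positive compound is scored above a randomly chosen negative one. The sketch below computes that quantity from scratch with the pairwise (Mann-Whitney) formulation. It is an illustration only; the labels and scores are invented and are not data from the study.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented assay outcomes (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = auc_roc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking, so under this reading a hold-out AUC-ROC of 0.559 is only slightly better than chance.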
9
Kötter A, Allenspach S, Grebner C, Matter H, Hiss JA, Schneider G, Hessler G. Task-Similarity is a Crucial Factor for Few-Shot Meta-Learning of Structure-Activity Relationships. Chembiochem 2024:e202400095. [PMID: 38682398 DOI: 10.1002/cbic.202400095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/24/2024] [Accepted: 04/25/2024] [Indexed: 05/01/2024]
Abstract
Machine learning models support computer-aided molecular design and compound optimization. However, the initial phases of drug discovery often face a scarcity of training data for these models. Meta-learning has emerged as a potentially promising strategy, harnessing the wealth of structure-activity data available for known targets to facilitate efficient few-shot model training for the specific target of interest. In this study, we assessed the effectiveness of two different meta-learning methods, namely model-agnostic meta-learning (MAML) and adaptive deep kernel fitting (ADKF), specifically in the regression setting. We investigated how factors such as dataset size and the similarity of training tasks impact predictability. The results indicate that ADKF significantly outperformed both MAML and a single-task baseline model on the inhibition data. However, the performance of ADKF varied across different test tasks. Our findings suggest that considerable enhancements in performance can be anticipated primarily when the task of interest is similar to the tasks incorporated in the meta-learning process.
Affiliation(s)
- Alex Kötter
- R&D, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926, Frankfurt am Main, Germany
- Stephan Allenspach
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Christoph Grebner
- R&D, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926, Frankfurt am Main, Germany
- Hans Matter
- R&D, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926, Frankfurt am Main, Germany
- Jan A Hiss
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Gisbert Schneider
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland
- Gerhard Hessler
- R&D, Integrated Drug Discovery, Sanofi-Aventis Deutschland GmbH, Industriepark Höchst, 65926, Frankfurt am Main, Germany
10
Chen Z, Zhang L, Zhang P, Guo H, Zhang R, Li L, Li X. Prediction of Cytochrome P450 Inhibition Using a Deep Learning Approach and Substructure Pattern Recognition. J Chem Inf Model 2024; 64:2528-2538. [PMID: 37864562 DOI: 10.1021/acs.jcim.3c01396] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 10/23/2023]
Abstract
Cytochrome P450 (CYP) is a family of enzymes that are responsible for about 75% of all metabolic reactions. Among them, CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 participate in the metabolism of most drugs and mediate many adverse drug reactions. Therefore, it is necessary to estimate the chemical inhibition of Cytochrome P450 enzymes in drug discovery and the food industry. In the past few decades, many computational models have been reported, and some provided good performance. However, there are still several issues that should be resolved for these models, such as coverage of only a single isoform, unbalanced performance, lack of structural characteristics analysis, and poor availability. In the present study, deep learning models were developed in Python, using the Keras framework and TensorFlow, for the chemical inhibition of each CYP isoform. These models were established based on a large data set containing 85715 compounds extracted from the PubChem bioassay database. On external validation, the models provided good AUC values of 0.97, 0.94, 0.94, 0.96, and 0.94 for CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4, respectively. The models can be freely accessed on the Web server named CYPi-DNNpredictor (cypi.sapredictor.cn), and the codes for the model were made open source in the Supporting Information. In addition, we also analyzed the structural characteristics of chemicals with CYP450 inhibition and detected the structural alerts (SAs), which should be responsible for the inhibition. The SAs were also made available online, named CYPi-SAdetector (cypisa.sapredictor.cn). The models can be used as a powerful tool for the prediction of CYP450 inhibitors, and the SAs should provide useful information for the mechanisms of Cytochrome P450 inhibition.
Affiliation(s)
- Zhaoyang Chen
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Le Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Pei Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Huizhu Guo
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Ruiqiu Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Ling Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
11
Heyndrickx W, Mervin L, Morawietz T, Sturm N, Friedrich L, Zalewski A, Pentina A, Humbeck L, Oldenhof M, Niwayama R, Schmidtke P, Fechner N, Simm J, Arany A, Drizard N, Jabal R, Afanasyeva A, Loeb R, Verma S, Harnqvist S, Holmes M, Pejo B, Telenczuk M, Holway N, Dieckmann A, Rieke N, Zumsande F, Clevert DA, Krug M, Luscombe C, Green D, Ertl P, Antal P, Marcus D, Do Huu N, Fuji H, Pickett S, Acs G, Boniface E, Beck B, Sun Y, Gohier A, Rippmann F, Engkvist O, Göller AH, Moreau Y, Galtier MN, Schuffenhauer A, Ceulemans H. MELLODDY: Cross-pharma Federated Learning at Unprecedented Scale Unlocks Benefits in QSAR without Compromising Proprietary Information. J Chem Inf Model 2024; 64:2331-2344. [PMID: 37642660 PMCID: PMC11005050 DOI: 10.1021/acs.jcim.3c00799] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/25/2023] [Indexed: 08/31/2023]
Abstract
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
Affiliation(s)
| | - Lewis Mervin
- AstraZeneca R&D, Biomedical Campus, 1 Francis Crick Ave, Cambridge CB2 0SL, U.K.
| | - Tobias Morawietz
- Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
| | - Noé Sturm
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
| | - Lukas Friedrich
- Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
| | - Adam Zalewski
- Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Germany
| | - Anastasia Pentina
- Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
| | - Lina Humbeck
- BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
| | - Martijn Oldenhof
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
| | - Ritsuya Niwayama
- Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
| | | | - Nikolas Fechner
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
| | - Jaak Simm
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
| | - Adam Arany
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
| | | | - Rama Jabal
- Iktos, 65 rue de Prony, Paris 75017, France
| | - Arina Afanasyeva
- Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
| | - Regis Loeb
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
| | - Shlok Verma
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
| | - Simon Harnqvist
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
| | - Matthew Holmes
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
| | - Balazs Pejo
- Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
| | | | - Nicholas Holway
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
| | - Arne Dieckmann
- Bayer AG, API Production, Product Supply, Pharmaceuticals, Ernst-Schering-Straße 14, Bergkamen 59192, Germany
| | - Nicola Rieke
- NVIDIA GmbH, Floessergasse 2, Munich 81369, Germany
| | | | - Djork-Arné Clevert
- Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Berlin 10117, Germany
| | - Michael Krug
- Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
| | - Christopher Luscombe
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
| | - Darren Green
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
| | - Peter Ertl
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
| | - Peter Antal
- Budapest University of Technology and Economics, Department of Measurement and Information Systems, Műegyetem rkp. 3, Budapest 1111, Hungary
| | - David Marcus
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
| | | | - Hideyoshi Fuji
- Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21 Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
| | - Stephen Pickett
- GlaxoSmithKline, Computational Sciences, Gunnels Wood Road Stevenage, Herts SG1 2NY, U.K.
| | - Gergely Acs
- Budapest University of Technology and Economics, Department of Networked Systems and Services, Műegyetem rkp. 3, Budapest 1111, Hungary
| | - Eric Boniface
- Substra Foundation - Labelia Labs, 4 rue Voltaire, Nantes 44000, France
| | - Bernd Beck
- BI Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Germany
| | - Yax Sun
- Amgen Research, 1 Amgen Center Drive, Thousand Oaks, California 92130, United States
| | - Arnaud Gohier
- Institut de recherches Servier, 125 chemin de ronde Croissy-sur-Seine, Île-de-France 78290, France
| | - Friedrich Rippmann
- Merck KGaA, Global Research & Development, Frankfurter Strasse 250, Darmstadt 64293, Germany
| | - Ola Engkvist
- AstraZeneca, Molecular AI, Discovery Sciences, R&D, Pepparedsleden 1, Mölndal 431 50, Sweden
| | - Andreas H. Göller
- Bayer Pharma AG, Global Drug Discovery, Chemical Research, Computational Chemistry, Aprather Weg 18 a, Wuppertal 42096, Germany
| | - Yves Moreau
- KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
| | | | - Ansgar Schuffenhauer
- Novartis Institutes for BioMedical Research, Novartis Campus, Basel 4002, Switzerland
| | - Hugo Ceulemans
- Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium
| |
|
12
|
Agu PC, Obulose CN. Piquing artificial intelligence towards drug discovery: Tools, techniques, and applications. Drug Dev Res 2024; 85:e22159. [PMID: 38375772 DOI: 10.1002/ddr.22159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 01/12/2024] [Accepted: 01/29/2024] [Indexed: 02/21/2024]
Abstract
The purpose of this study was to discuss how artificial intelligence (AI) methods have affected the field of drug development. It looks at how AI models and data resources are reshaping the drug development process by offering more affordable and expedient options to conventional approaches. The paper opens with an overview of well-known information sources for drug development. The discussion then moves on to molecular representation techniques that make it possible to convert data into representations that computers can understand. The paper also gives a general overview of the algorithms used in the creation of drug discovery models based on AI. In particular, the paper looks at how AI algorithms might be used to forecast drug toxicity, drug bioactivity, and drug physicochemical properties. De novo drug design, binding affinity prediction, and other AI-based models for drug-target interaction were covered in deeper detail. Modern applications of AI in nanomedicine design and pharmacological synergism/antagonism prediction were also covered. The potential advantages of AI in drug development are highlighted as the evaluation comes to a close. It underlines how AI may greatly speed up and improve the efficiency of drug discovery, resulting in the creation of new and better medicines. To fully realize the promise of AI in drug discovery, the review acknowledges the difficulties that come with its uses in this field and advocates for more study and development.
Affiliation(s)
- Peter Chinedu Agu
- Department of Biochemistry, College of Science, Evangel University, Akaeze, Ebonyi State, Nigeria
| | - Chidiebere Nwiboko Obulose
- Department of Computer Sciences, Our Savior Institute of Science, Agriculture, and Technology (OSISATECH Polytechnic), Enugu, Nigeria
| |
|
13
|
Melo L, Scotti L, Scotti MT. Development of a standardized methodology for transfer learning with QSAR models: a purely data-driven approach for source task selection. SAR QSAR Environ Res 2024; 35:183-198. [PMID: 38312090 DOI: 10.1080/1062936x.2024.2311693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 01/23/2024] [Indexed: 02/06/2024]
Abstract
Transfer learning is a machine learning technique that works well with chemical endpoints, with several papers confirming its efficiency. Although effective, because the choice of source/assistant tasks is non-trivial, the application of this technique is severely limited by the domain knowledge of the modeller. Considering this limitation, we developed a purely data-driven approach for source task selection that abstracts the need for domain knowledge. To achieve this, we created a supervised learning setting in which transfer outcome (positive/negative) is the variable to be predicted, and a set of six transferability metrics, calculated based on information from target and source datasets, are the features for prediction. We used the ChEMBL database to generate 100,000 transfers using random pairing, and with these transfers, we trained and evaluated our transferability prediction model (TP-Model). Our TP-Model achieved a 135-fold increase in precision while achieving a sensitivity of 92%, demonstrating a clear superiority against random search. In addition, we observed that transfer learning could provide considerable performance increases when applicable, with an average Matthews Correlation Coefficient (MCC) increase of 0.19 when using a single source and an average MCC increase of 0.44 when using multiple sources.
Affiliation(s)
- L Melo
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
| | - L Scotti
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
| | - M T Scotti
- Postgraduate Program in Natural and Synthetic Bioactive Products, Federal University of Paraíba, João Pessoa, Brazil
| |
|
14
|
Buterez D, Janet JP, Kiddle SJ, Oglic D, Lió P. Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting. Nat Commun 2024; 15:1517. [PMID: 38409255 PMCID: PMC11258334 DOI: 10.1038/s41467-024-45566-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 01/25/2024] [Indexed: 02/28/2024]
Abstract
We investigate the potential of graph neural networks for transfer learning and improving molecular property prediction on sparse and expensive to acquire high-fidelity data by leveraging low-fidelity measurements as an inexpensive proxy for a targeted property of interest. This problem arises in discovery processes that rely on screening funnels for trading off the overall costs against throughput and accuracy. Typically, individual stages in these processes are loosely connected and each one generates data at different scale and fidelity. We consider this setup holistically and demonstrate empirically that existing transfer learning techniques for graph neural networks are generally unable to harness the information from multi-fidelity cascades. Here, we propose several effective transfer learning strategies and study them in transductive and inductive settings. Our analysis involves a collection of more than 28 million unique experimental protein-ligand interactions across 37 targets from drug discovery by high-throughput screening and 12 quantum properties from the dataset QMugs. The results indicate that transfer learning can improve the performance on sparse tasks by up to eight times while using an order of magnitude less high-fidelity training data. Moreover, the proposed methods consistently outperform existing transfer learning strategies for graph-structured data on drug discovery and quantum mechanics datasets.
Affiliation(s)
- David Buterez
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK.
| | - Jon Paul Janet
- Molecular AI, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
| | - Steven J Kiddle
- Data Science & Advanced Analytics, Data Science & AI, R&D, AstraZeneca, Cambridge, UK
| | - Dino Oglic
- Centre for AI, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
| | - Pietro Lió
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
| |
|
15
|
Hu J, Allen BK, Stathias V, Ayad NG, Schürer SC. Kinome-Wide Virtual Screening by Multi-Task Deep Learning. Int J Mol Sci 2024; 25:2538. [PMID: 38473785 PMCID: PMC10932040 DOI: 10.3390/ijms25052538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/04/2024] [Accepted: 02/17/2024] [Indexed: 03/14/2024]
Abstract
Deep learning is a machine learning technique to model high-level abstractions in data by utilizing a graph composed of multiple processing layers that experience various linear and non-linear transformations. This technique has been shown to perform well for applications in drug discovery, utilizing structural features of small molecules to predict activity. Here, we report a large-scale study to predict the activity of small molecules across the human kinome, a major family of drug targets, particularly in anti-cancer agents. While small-molecule kinase inhibitors exhibit impressive clinical efficacy in several different diseases, resistance often arises through adaptive kinome reprogramming or subpopulation diversity. Polypharmacology and combination therapies offer potential therapeutic strategies for patients with resistant diseases. Their development would benefit from a more comprehensive and dense knowledge of small-molecule inhibition across the human kinome. Leveraging over 650,000 bioactivity annotations for more than 300,000 small molecules, we evaluated multiple machine learning methods to predict the small-molecule inhibition of 342 kinases across the human kinome. Our results demonstrated that multi-task deep neural networks outperformed classical single-task methods, offering the potential for conducting large-scale virtual screening, predicting activity profiles, and bridging the gaps in the available data.
Affiliation(s)
- Jiaming Hu
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Bryce K. Allen
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Institute for Data Science & Computing, University of Miami, Miami, FL 33136, USA
| | - Vasileios Stathias
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Nagi G. Ayad
- Center for Therapeutic Innovation, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Miami Project to Cure Paralysis, Department of Psychiatry and Behavioral Sciences, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Stephan C. Schürer
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Institute for Data Science & Computing, University of Miami, Miami, FL 33136, USA
- Center for Therapeutic Innovation, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| |
|
16
|
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
| | | | | | | | - Artem Cherkasov
- University of British Columbia, Vancouver, BC, Canada.
- Photonic Inc., Coquitlam, BC, Canada.
| |
|
17
|
Zhang C, Zang T, Zhao T. KGE-UNIT: toward the unification of molecular interactions prediction based on knowledge graph and multi-task learning on drug discovery. Brief Bioinform 2024; 25:bbae043. [PMID: 38348746 PMCID: PMC10939374 DOI: 10.1093/bib/bbae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 12/29/2023] [Accepted: 01/23/2024] [Indexed: 02/15/2024]
Abstract
The prediction of molecular interactions is vital for drug discovery. Existing methods often focus on individual prediction tasks and overlook the relationships between them. Additionally, certain tasks encounter limitations due to insufficient data availability, resulting in limited performance. To overcome these limitations, we propose KGE-UNIT, a unified framework that combines knowledge graph embedding (KGE) and multi-task learning for simultaneous prediction of drug-target interactions (DTIs) and drug-drug interactions (DDIs), enhancing the performance of each task even when data availability is limited. Via KGE, we extract heterogeneous features from the drug knowledge graph to enhance the structural features of drug and protein nodes, thereby improving the quality of features. Additionally, employing multi-task learning, we introduce an innovative predictor that comprises a task-aware Convolutional Neural Network-based (CNN-based) encoder and a task-aware attention decoder, which can better fuse multimodal features, capture the contextual interactions of molecular tasks and enhance task awareness, leading to improved performance. Experiments on two imbalanced datasets for DTIs and DDIs demonstrate the superiority of KGE-UNIT, achieving high areas under the receiver operating characteristic curve (AUROCs) (0.942, 0.987) and areas under the precision-recall curve (AUPRs) (0.930, 0.980) for DTIs and high AUROCs (0.975, 0.989) and AUPRs (0.966, 0.988) for DDIs. Notably, on the LUO dataset where the data were more limited, KGE-UNIT exhibited a more pronounced improvement, with increases of 4.32% in AUROC and 3.56% in AUPR for DTIs and 6.56% in AUROC and 8.17% in AUPR for DDIs. The scalability of KGE-UNIT is demonstrated through its extension to protein-protein interaction prediction; ablation studies and case studies further validate its effectiveness.
Affiliation(s)
- Chengcheng Zhang
- Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
| | - Tianyi Zang
- Department of Computer Science, Harbin Institute of Technology, Harbin, 150001, China
| | - Tianyi Zhao
- School of Medicine and Health, Harbin Institute of Technology, Harbin, 150001, China
| |
|
18
|
Zhang R, Xie X, Ni D, Wang H, Li J, Xiao W. MT-EpiPred: Multitask Learning for Prediction of Small-Molecule Epigenetic Modulators. J Chem Inf Model 2024; 64:110-118. [PMID: 38109786 DOI: 10.1021/acs.jcim.3c01368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Epigenetic modulators play an increasingly crucial role in the treatment of various diseases. In this case, it is imperative to systematically investigate the activity of these agents and understand their influence on the entire epigenetic regulatory network rather than solely concentrate on individual targets. This work introduces MT-EpiPred, a multitask learning method capable of predicting the activity of compounds against 78 epigenetic targets. MT-EpiPred demonstrated outstanding performance, boasting an average auROC of 0.915 and the ability to handle few-shot targets. In comparison to the existing method, MT-EpiPred not only expands the target pool but also achieves superior predictive performance with the same data set. MT-EpiPred was then applied to predict the epigenetic target of a newly synthesized compound (1), where the molecular target was unknown. The method identified KDM4D as a potential target, which was subsequently validated through an in vitro enzyme inhibition assay, revealing an IC50 of 4.8 μM. The MT-EpiPred method has been implemented in the web server MT-EpiPred (http://epipred.com), providing free accessibility. In summary, this work presents a convenient and accurate tool for discovering novel small-molecule epigenetic modulators, particularly in the development of selective inhibitors and evaluating the impact of these inhibitors over a broad epigenetic network.
Affiliation(s)
- Ruihan Zhang
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Xingran Xie
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Dongxuan Ni
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Hairong Wang
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Jin Li
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| | - Weilie Xiao
- Key Laboratory of Medicinal Chemistry for Natural Resource, Ministry of Education; Yunnan Key Laboratory of Research and Development for Natural Products; The Cloud Computing Engineering Research Center of Yunnan Province; Key Laboratory of Software Engineering of Yunnan Province; School of Software; School of Pharmacy, Yunnan University, Kunming 650500, P. R. China
| |
|
19
|
Das K, Paltani M, Tripathi PK, Kumar R, Verma S, Kumar S, Jain CK. Current implications and challenges of artificial intelligence technologies in therapeutic intervention of colorectal cancer. Explor Target Antitumor Ther 2023; 4:1286-1300. [PMID: 38213536 PMCID: PMC10776591 DOI: 10.37349/etat.2023.00197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 08/28/2023] [Indexed: 01/13/2024]
Abstract
Colorectal cancer (CRC) is the third most common cancer in both men and women, with more than 1.85 million cases annually. Fewer than 20% of patients survive beyond five years from diagnosis. CRC is a highly preventable disease if diagnosed at an early stage of malignancy. Several screening methods, such as endoscopy (e.g., colonoscopy, the gold standard), imaging examination [computed tomographic colonography (CTC)], guaiac-based fecal occult blood test (gFOBT), fecal immunochemical test, and stool DNA test, are available with different levels of sensitivity and specificity. The available screening methods are associated with certain drawbacks such as invasiveness, cost, or limited sensitivity. In recent years, computer-aided systems for screening, diagnosis, and treatment have been very promising in the early-stage detection and diagnosis of CRC cases. Artificial intelligence (AI) is an in-demand, cost-effective technology that uses tools such as machine learning (ML) and deep learning (DL) to screen, diagnose, and stage CRC, and has great potential to treat it. Moreover, different ML algorithms and neural networks [artificial neural network (ANN), k-nearest neighbors (KNN), and support vector machines (SVMs)] have been deployed to predict precise and personalized treatment options. This review examines and summarizes different ML and DL models used for therapeutic intervention in CRC, along with the gaps and challenges for AI.
Affiliation(s)
- Kriti Das
- Department of Artificial Intelligence and Precision Medicine, School of Allied Health Sciences and Management, Delhi Pharmaceutical Sciences and Research University, New Delhi 110017, India
| | - Maanvi Paltani
- Department of Artificial Intelligence and Precision Medicine, School of Allied Health Sciences and Management, Delhi Pharmaceutical Sciences and Research University, New Delhi 110017, India
| | - Pankaj Kumar Tripathi
- Department of Biotechnology, Jaypee Institute of Information Technology, Noida 201309, Uttar Pradesh, India
| | - Rajnish Kumar
- Department of Medical Laboratory Technology, School of Allied Health Sciences, Delhi Pharmaceutical Sciences and Research University, Delhi 110017, India
| | - Saniya Verma
- Department of Medical Laboratory Technology, School of Allied Health Sciences, Delhi Pharmaceutical Sciences and Research University, Delhi 110017, India
| | - Subodh Kumar
- Department of Medical Laboratory Technology, School of Allied Health Sciences, Delhi Pharmaceutical Sciences and Research University, Delhi 110017, India
| | - Chakresh Kumar Jain
- Department of Biotechnology, Jaypee Institute of Information Technology, Noida 201309, Uttar Pradesh, India
| |
|
20
|
Hua Y, Luo L, Qiu H, Huang D, Zhao Y, Liu H, Lu T, Chen Y, Zhang Y, Jiang Y. Multimodal multi-task deep neural network framework for kinase-target prediction. Mol Divers 2023; 27:2491-2503. [PMID: 36369613 DOI: 10.1007/s11030-022-10565-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 11/01/2022] [Indexed: 11/13/2022]
Abstract
Kinases play a significant role in various disease signaling pathways. Due to the highly conserved sequences of kinase family members, understanding the selectivity profile of kinase inhibitors remains a priority for drug discovery. Previous methods for kinase selectivity identification use biochemical assays, which are very useful but limited by the proteins available. A lack of kinase selectivity can be beneficial but can also cause adverse effects. With the explosion of data on kinase activities, current computational methods can achieve accurate large-scale selectivity predictions. Here, we present a multimodal multi-task deep neural network model for kinase selectivity prediction based on calculated fingerprints and physicochemical descriptors. With multimodal inputs of structural and physicochemical property information, the multi-task framework could accurately predict the kinome map for selectivity analysis. The proposed model displays better performance for kinase-target prediction based on systematic evaluations.
Affiliation(s)
- Yi Hua
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Lin Luo
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Haodi Qiu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Dingfang Huang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Yang Zhao
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
| | - Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
| | - Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
| | - Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
| | - Yulei Jiang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China.
| |
Collapse
|
21
|
Wang K, Amidon GL, Smith DE. Physiological Dynamics in the Upper Gastrointestinal Tract and the Development of Gastrointestinal Absorption Models for the Immediate-Release Oral Dosage Forms in Healthy Adult Human. Pharm Res 2023; 40:2607-2626. [PMID: 37783928 DOI: 10.1007/s11095-023-03597-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 08/26/2023] [Indexed: 10/04/2023]
Abstract
This review revisits various oral drug absorption models developed in the past decades, focusing on how to incorporate the physiological dynamics in the upper gastrointestinal (GI) tract. For immediate-release oral drugs, GI absorption is a critical input of drug exposure and subsequent human body response, yet it is difficult to model, largely due to the complex GI environment. One of the biggest hurdles lies in capturing the high within-subject variability (WSV) of bioavailability measures, which can be mechanistically explained by the GI physiological dynamics. A thorough summary of how GI dynamics are handled in the absorption models would promote the development of mechanism-based oral drug absorption models, aid in the design of clinical studies regarding dosing regimens and bioequivalence studies based on WSV, and advance the decision-making on formulation selection.
Affiliation(s)
- Kai Wang
  - Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI, 48109, USA
- Gordon L Amidon
  - Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI, 48109, USA
- David E Smith
  - Department of Pharmaceutical Sciences, University of Michigan, Ann Arbor, MI, 48109, USA
|
22
|
Schaller D, Christ CD, Chodera JD, Volkamer A. Benchmarking Cross-Docking Strategies for Structure-Informed Machine Learning in Kinase Drug Discovery. bioRxiv 2023:2023.09.11.557138. [PMID: 37745489 PMCID: PMC10515787 DOI: 10.1101/2023.09.11.557138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
In recent years machine learning has transformed many aspects of the drug discovery process, including small molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches, but is fundamentally limited by the accuracy with which protein:ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase:inhibitor complex geometries for downstream machine learning scoring approaches, we present a kinase-centric docking benchmark that assesses how well different classes of docking and pose selection strategies recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures co-crystallized with 423 ATP-competitive ligands. We find that the docking methods biased by the co-crystallized ligand (utilizing shape overlap with or without maximum common substructure matching) are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance of generating a low-RMSD docking pose. Docking with an approach that combines all three methods (Posit) into structures with the most similar co-crystallized ligands according to shape and electrostatics proved to be the most efficient way to reproduce binding poses, achieving a success rate of 66.9% across all included systems. The studied docking and pose selection strategies, which utilize the OpenEye Toolkit, were implemented into pipelines of the KinoML framework, allowing automated and reliable protein:ligand complex generation for future downstream machine learning tasks. Although focused on protein kinases, we believe the general findings can also be transferred to other protein families.
Affiliation(s)
- David Schaller
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Clara D. Christ
  - Molecular Design, Research and Development, Pharmaceuticals, Bayer AG, 13342 Berlin, Germany
- John D. Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Andrea Volkamer
  - In Silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
  - Data Driven Drug Design, Faculty of Mathematics and Computer Sciences, Saarland University, Saarbrücken, Germany
|
23
|
Lui R, Guan D, Matthews S. Mechanistic Task Groupings Enhance Multitask Deep Learning of Strain-Specific Ames Mutagenicity. Chem Res Toxicol 2023; 36:1248-1254. [PMID: 37478285 DOI: 10.1021/acs.chemrestox.2c00385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2023]
Abstract
The Ames test is a gold standard mutagenicity assay that utilizes various Salmonella typhimurium strains with and without S9 fraction to provide insights into the mechanisms by which a chemical can mutate DNA. Multitask deep learning is an ideal framework for developing QSAR models with multiple end points, such as the Ames test, as the joint training of multiple predictive tasks may synergistically improve the prediction accuracy of each task. This work investigated how toxicology domain knowledge can be used to handcraft task groupings that better guide the training of multitask neural networks compared to a naïve ungrouped multitask neural network developed on a complete set of tasks. Sixteen S. typhimurium ± S9 strain tasks were used to generate groupings based on mutagenic and metabolic mechanisms that were reflected in correlation data analyses. Both grouped and ungrouped multitask neural networks predicted the 16 strain tasks with a higher balanced accuracy compared with single task controls, with grouped multitask neural networks consistently featuring incremental increases in predictivity over the ungrouped approach. We conclude that the main variable driving these performance improvements is the general multitask effect with mechanistic task groupings acting as an enhancement step to further concentrate synergistic training signals united by a common biological mechanism. This approach enables incorporation of toxicology domain knowledge into multitask QSAR model development allowing for more transparent and accurate Ames mutagenicity prediction.
Affiliation(s)
- Raymond Lui
  - Computational Pharmacology and Toxicology Laboratory, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
- Davy Guan
  - Computational Pharmacology and Toxicology Laboratory, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
- Slade Matthews
  - Computational Pharmacology and Toxicology Laboratory, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
|
24
|
AbdulHameed MDM, Liu R, Wallqvist A. Using a Graph Convolutional Neural Network Model to Identify Bile Salt Export Pump Inhibitors. ACS Omega 2023; 8:21853-21861. [PMID: 37360478 PMCID: PMC10286257 DOI: 10.1021/acsomega.3c01583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 05/19/2023] [Indexed: 06/28/2023]
Abstract
The bile salt export pump (BSEP) is a key transporter involved in the efflux of bile salts from hepatocytes to bile canaliculi. Inhibition of BSEP leads to the accumulation of bile salts within the hepatocytes, leading to possible cholestasis and drug-induced liver injury. Screening for and identification of chemicals that inhibit this transporter aid in understanding the safety liabilities of these chemicals. Moreover, computational approaches to identify BSEP inhibitors provide an alternative to the more resource-intensive, gold standard experimental approaches. Here, we used publicly available data to develop predictive machine learning models for the identification of potential BSEP inhibitors. Specifically, we analyzed the utility of a graph convolutional neural network (GCNN)-based approach in combination with multitask learning to identify BSEP inhibitors. Our analyses showed that the developed GCNN model performed better than the variable-nearest neighbor and Bayesian machine learning approaches, with a cross-validation receiver operating characteristic area under the curve of 0.86. In addition, we compared GCNN-based single-task and multitask models and evaluated their utility in addressing data limitation challenges commonly observed in bioactivity modeling. We found that multitask models performed better than single-task models and can be utilized to identify active molecules for targets with limited data availability. Overall, our developed multitask GCNN-based BSEP model provides a useful tool for prioritizing hits during early drug discovery and in risk assessment of chemicals.
Affiliation(s)
- Mohamed Diwan M. AbdulHameed
  - Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
  - The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda 20817, Maryland, United States
- Ruifeng Liu
  - Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
  - The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda 20817, Maryland, United States
- Anders Wallqvist
  - Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Development Command, Fort Detrick 21702, Maryland, United States
|
25
|
Fang C, Wang Y, Grater R, Kapadnis S, Black C, Trapa P, Sciabola S. Prospective Validation of Machine Learning Algorithms for Absorption, Distribution, Metabolism, and Excretion Prediction: An Industrial Perspective. J Chem Inf Model 2023. [PMID: 37216672 DOI: 10.1021/acs.jcim.3c00160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Absorption, distribution, metabolism, and excretion (ADME), which collectively define the concentration profile of a drug at the site of action, are of critical importance to the success of a drug candidate. Recent advances in machine learning algorithms and the availability of larger proprietary as well as public ADME data sets have generated renewed interest within the academic and pharmaceutical science communities in predicting pharmacokinetic and physicochemical endpoints in early drug discovery. In this study, we collected 120 internal prospective data sets over 20 months across six ADME in vitro endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. A variety of machine learning algorithms in combination with different molecular representations were evaluated. Our results suggest that gradient boosting decision tree and deep learning models consistently outperformed random forest over time. We also observed better performance when models were retrained on a fixed schedule; more frequent retraining generally resulted in increased accuracy, while hyperparameter tuning improved the prospective predictions only marginally.
Affiliation(s)
- Cheng Fang
  - Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
- Ye Wang
  - Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
- Richard Grater
  - DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Cheryl Black
  - DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Patrick Trapa
  - DMPK, Biogen, Cambridge, Massachusetts 02142, United States
- Simone Sciabola
  - Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States
|
26
|
Zhao Y, Tian Y, Pang X, Li G, Shi S, Yan A. Classification of FLT3 inhibitors and SAR analysis by machine learning methods. Mol Divers 2023:10.1007/s11030-023-10640-8. [PMID: 37142889 DOI: 10.1007/s11030-023-10640-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 03/17/2023] [Indexed: 05/06/2023]
Abstract
FMS-like tyrosine kinase 3 (FLT3) is a type III receptor tyrosine kinase and an important target for anti-cancer therapy. In this work, we conducted a structure-activity relationship (SAR) study on 3867 FLT3 inhibitors that we collected. MACCS fingerprints, ECFP4 fingerprints, and TT fingerprints were used to represent the inhibitors in the dataset. A total of 36 classification models were built based on support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and deep neural network (DNN) algorithms. Model 3D_3, built with a DNN and TT fingerprints, performed best on the test set, with the highest prediction accuracy of 85.83% and a Matthews correlation coefficient (MCC) of 0.72, and also performed well on the external test set. In addition, we clustered the 3867 inhibitors into 11 subsets by the K-Means algorithm to identify the structural characteristics of the reported FLT3 inhibitors. Finally, we analyzed the SAR of FLT3 inhibitors by the RF algorithm based on ECFP4 fingerprints. The results showed that 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycle, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl were typical fragments among highly active inhibitors. In addition, three scaffolds in Subset_A (Subset 4), Subset_B, and Subset_C showed a significant relationship to inhibition activity targeting FLT3.
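As a reminder of how the Matthews correlation coefficient quoted in this abstract is defined, the following minimal sketch computes MCC from binary confusion-matrix counts. The function name and the example counts are illustrative assumptions; they are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not the paper's data).
mcc = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for classification models of this kind.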
Affiliation(s)
- Yunyang Zhao
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
- Yujia Tian
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
- Xiaoyang Pang
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
- Guo Li
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
- Shenghui Shi
  - College of Information Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Aixia Yan
  - State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, P.O. Box 53, Beijing, 100029, People's Republic of China
|
27
|
Qin D, Jiao L, Wang R, Zhao Y, Hao Y, Liang G. Prediction of antioxidant peptides using a quantitative structure-activity relationship predictor (AnOxPP) based on bidirectional long short-term memory neural network and interpretable amino acid descriptors. Comput Biol Med 2023; 154:106591. [PMID: 36701965 DOI: 10.1016/j.compbiomed.2023.106591] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/04/2022] [Revised: 01/15/2023] [Accepted: 01/22/2023] [Indexed: 01/25/2023]
Abstract
Antioxidant peptides can protect against free radical-mediated diseases; food-derived antioxidant peptides in particular are considered potential competitors to synthetic antioxidants because of their safety, high activity, and abundant sources. However, wet-lab experimental methods cannot meet the need for efficient screening and clear elucidation of the structure-activity relationship of antioxidant peptides. Therefore, it is particularly important to build a reliable prediction platform for antioxidant peptides. In this work, we developed a platform, AnOxPP, for prediction of antioxidant peptides using the bidirectional long short-term memory (BiLSTM) neural network. The sequence characteristics of peptides were converted into feature codes based on amino acid descriptors (AADs). Our results showed that the feature conversion ability of the combined-AADs optimized by the forward feature selection method was more accurate than that of the single-AADs. In particular, the model trained with the optimal descriptor SDPZ27 significantly outperformed the existing predictor on two independent test sets (Accuracy = 0.967 and 0.819, respectively). The SDPZ27-based AnOxPP learned four key structure-activity features of antioxidant peptides, ranked in importance as steric properties > hydrophobic properties > electronic properties > hydrogen bond contributions. AnOxPP is a valuable tool for screening and design of peptide drugs, and the web-server is accessible at http://www.cqudfbp.net/AnOxPP/index.jsp.
Affiliation(s)
- Dongya Qin
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400030, China
- Linna Jiao
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400030, China
- Ruihong Wang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400030, China
- Yi Zhao
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400030, China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400030, China
|
28
|
Kour S, Biswas I, Sheoran S, Arora S, Sheela P, Duppala SK, Murthy DK, Pawar SC, Singh H, Kumar D, Prabhu D, Vuree S, Kumar R. Artificial intelligence and nanotechnology for cervical cancer treatment: Current status and future perspectives. J Drug Deliv Sci Technol 2023. [DOI: 10.1016/j.jddst.2023.104392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
|
29
|
Gui C, Li Y, Peng T. Development of predictive QSAR models for the substrates/inhibitors of OATP1B1 by deep neural networks. Toxicol Lett 2023; 376:20-25. [PMID: 36649904 DOI: 10.1016/j.toxlet.2023.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 01/10/2023] [Accepted: 01/12/2023] [Indexed: 01/15/2023]
Abstract
The organic anion transporting polypeptide 1B1 (OATP1B1) is an important hepatic uptake transporter. Inhibition of its normal function could lead to drug-drug interactions. In silico prediction is an effective means to identify potential OATP1B1 inhibitors, and quantitative structure-activity relationship (QSAR) modeling is extensively used. As the structures of OATP1B1 substrates/inhibitors are quite diverse, machine learning based methods should be a good option for their QSAR analysis. In the present study, deep neural networks (DNNs) were employed to develop QSAR models for the substrates/inhibitors of OATP1B1 with different molecular fingerprints. Our results showed that QSAR models based on 4-hidden layer DNNs and ECFP4/FCFP4 fingerprints had the best generalization performance. The correlation coefficients (R²) of the test set for the ECFP4 and FCFP4 models were 0.641 and 0.653, respectively. The model application domain (AD) was calculated with a Euclidean distance-based method; the AD could improve the performance of the ECFP4 model but had little effect on the FCFP4 model. Finally, the prediction of 8 additional compounds that were not included in the data set further demonstrated that our QSAR models had good predictive ability (averaged prediction accuracy >92%). The developed QSAR models could be used to screen large data sets and discover novel inhibitors for OATP1B1.
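The abstract does not specify how the Euclidean distance-based application domain was computed. The sketch below shows one common variant, offered as an assumption rather than the authors' procedure: a query compound is considered inside the domain if its distance to the training-set centroid does not exceed the mean training distance plus `z` standard deviations. All function names, the `z` parameter, and the toy descriptor vectors are hypothetical.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_domain(train, z=3.0):
    """Return (centroid, threshold) estimated from training descriptors."""
    centroid = [mean(col) for col in zip(*train)]
    dists = [euclidean(row, centroid) for row in train]
    # Threshold: mean training distance plus z standard deviations.
    return centroid, mean(dists) + z * pstdev(dists)


def in_domain(x, centroid, threshold):
    """True if the query vector lies within the distance threshold."""
    return euclidean(x, centroid) <= threshold


# Toy 2-D descriptors, for illustration only.
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centroid, threshold = fit_domain(train)
near = in_domain([0.5, 0.5], centroid, threshold)
far = in_domain([5.0, 5.0], centroid, threshold)
```

Predictions for compounds flagged as outside the domain would typically be reported as less reliable rather than discarded outright.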
Affiliation(s)
- Chunshan Gui
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
- Ying Li
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
- Taotao Peng
  - College of Pharmaceutical Sciences, Soochow University, 199 Renai Road, Suzhou Industrial Park, Suzhou 215123, China
|
30
|
Tian Y, Yang Z, Wang H, Yan A. Prediction of bioactivities of microsomal prostaglandin E2 synthase-1 inhibitors by machine learning algorithms. Chem Biol Drug Des 2023; 101:1307-1321. [PMID: 36752697 DOI: 10.1111/cbdd.14214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 12/24/2022] [Accepted: 02/02/2023] [Indexed: 02/09/2023]
Abstract
There is strong interest in the development of microsomal prostaglandin E2 synthase-1 (mPGES-1) inhibitors because of their potential to treat inflammation safely and effectively. Herein, 70 QSAR models were built on a dataset of 735 mPGES-1 inhibitors characterized with RDKit descriptors, using multiple linear regression (MLR), support vector machine (SVM), random forest (RF), deep neural networks (DNN), and eXtreme Gradient Boosting (XGBoost). Three further regression models were built on the dataset represented by SMILES, using self-attention recurrent neural networks (RNN) and Graph Convolutional Networks (GCN). The best model (Model C2), which was developed by SVM with RDKit descriptors, achieved a coefficient of determination (R²) of 0.861 and a root mean squared error (RMSE) of 0.235 on the test set. Additionally, an R² of 0.692 and an RMSE of 0.383 were obtained on the external test set. We investigated the applicability domain (AD) of Model C2 with the rivality index (RI); the predictions of Model C2 were reliable for 78.92% of molecules in the test set and 78.33% of molecules in the external test set. After dissecting the RDKit descriptors of Model C2, we found important physicochemical properties of highly active mPGES-1 inhibitors. In addition, by analyzing the attention weight of each atom of each inhibitor from the attention layer, we found that the benzamide group and the trifluoromethyl cyclohexane group are favorable substructures for mPGES-1 inhibitors.
Affiliation(s)
- Yujia Tian
  - Department of Pharmaceutical Engineering, State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing, People's Republic of China
- Zhenwu Yang
  - Department of Pharmaceutical Engineering, State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing, People's Republic of China
- Hongzhao Wang
  - Department of Pharmaceutical Engineering, State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing, People's Republic of China
- Aixia Yan
  - Department of Pharmaceutical Engineering, State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing, People's Republic of China
|
31
|
Ru C, Wen W, Zhong Y. Raman spectroscopy for on-line monitoring of botanical extraction process using convolutional neural network with background subtraction. Spectrochim Acta A Mol Biomol Spectrosc 2023; 284:121494. [PMID: 35715369 DOI: 10.1016/j.saa.2022.121494] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/07/2022] [Revised: 06/02/2022] [Accepted: 06/07/2022] [Indexed: 06/15/2023]
Abstract
Aqueous extraction is the most common and cost-effective means of obtaining active ingredients from medicinal plants. However, botanical extracts generally have a high pigment content and a complex chemical composition, which poses a challenge for the process analysis of aqueous extraction. Here, we employed Raman spectroscopy to monitor the physical and chemical properties during the extraction process using a convolutional neural network (CNN) with background subtraction. Real-time spectra were first preprocessed to eliminate fluorescence background interference. Next, two types of CNN models, a one-dimensional CNN (1D-CNN) based on one preprocessing method and a two-dimensional CNN (2D-CNN) based on a concatenation of differentially pretreated data blocks, were used to receive the preprocessed spectral data. Two case studies were conducted for 1D- and 2D-CNN: the extraction of Aurantii fructus, and the co-extraction of Radix Salvia miltiorrhiza and Rhizoma Ligusticum chuanxiong. Furthermore, partial least squares (PLS) models and sequential preprocessing through orthogonalization (SPORT) models were developed and compared with 1D-CNN and 2D-CNN, respectively. CNN-based methods were superior to the other models in terms of prediction accuracy, with 2D-CNN yielding the best results. These results indicated that preprocessing and CNN methods were highly complementary and could effectively remove the fluorescence effect and the artefacts introduced by pretreatment in the spectral profile. To the best of our knowledge, this is the first study to demonstrate that a combination of preprocessing and CNN leads to improved prediction performance of analytes when using Raman spectroscopy for online monitoring of highly pigmented samples.
Affiliation(s)
- Chenlei Ru
  - State Key Laboratory of Fluid Power and Mechatronic Systems, School of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China
- Wu Wen
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yi Zhong
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Zhang Boli Intelligent Health Innovation Lab, Hangzhou 311121, China
|
32
|
|
33
|
Umemori Y, Handa K, Sakamoto S, Kageyama M, Iijima T. QSAR model to predict Kp,uu,brain with a small dataset, incorporating predicted values of related parameter. SAR QSAR Environ Res 2022; 33:885-897. [PMID: 36420623 DOI: 10.1080/1062936x.2022.2149619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Accepted: 11/14/2022] [Indexed: 06/16/2023]
Abstract
The unbound brain-to-plasma concentration ratio (Kp,uu,brain) is a parameter that indicates the extent of central nervous system penetration. Pharmaceutical companies build prediction models because many experiments are required to obtain Kp,uu,brain. However, the lack of data hinders the design of an accurate prediction model. To construct a quantitative structure-activity relationship (QSAR) model with a small dataset of Kp,uu,brain, we investigated whether the prediction accuracy could be improved by incorporating software-predicted brain penetration-related parameters (BPrPs) as explanatory variables for pharmacokinetic parameter prediction. We collected 88 compounds with experimental Kp,uu,brain from various official publications. Random forest was used as the machine learning model. First, we developed prediction models using only structural descriptors. Second, we verified the predictive accuracy of each model with the predicted values of BPrPs incorporated in various combinations. Third, the Kp,uu,brain of the in-house compounds was predicted and compared with the experimental values. Incorporating BPrPs improved the prediction accuracy in five-fold cross-validation (RMSE = 0.455, r2 = 0.726). Additionally, this model was verified using an external in-house dataset. The result suggested that using BPrPs as explanatory variables improves the prediction accuracy of the Kp,uu,brain QSAR model when the amount of available data is small.
Affiliation(s)
- Y Umemori
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, Hino-shi, Japan
- K Handa
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, Hino-shi, Japan
- S Sakamoto
  - Pharmaceutical Development Coordination Department, Teijin Pharma Limited, Chiyoda-ku, Japan
- M Kageyama
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, Hino-shi, Japan
- T Iijima
  - Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, Hino-shi, Japan
|
34
|
Moon C, Kim D. Prediction of drug-target interactions through multi-task learning. Sci Rep 2022; 12:18323. [PMID: 36316405 PMCID: PMC9622881 DOI: 10.1038/s41598-022-23203-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/28/2022] [Accepted: 10/26/2022] [Indexed: 12/31/2022]
Abstract
Identifying the binding between the target proteins and molecules is essential in drug discovery. The multi-task learning method has been introduced to facilitate knowledge sharing among tasks when the amount of information for each task is small. However, multi-task learning sometimes worsens the overall performance or generates a trade-off between individual task's performance. In this study, we propose a general multi-task learning scheme that not only increases the average performance but also minimizes individual performance degradation, through group selection and knowledge distillation. The groups are selected on the basis of chemical similarity between ligand sets of targets, and the similar targets in the same groups are trained together. During training, we apply knowledge distillation with teacher annealing. The multi-task learning models are guided by the predictions of the single-task learning models. This method results in higher average performance than that from single-task learning and classic multi-task learning. Further analysis reveals that multi-task learning is particularly effective for low performance tasks, and knowledge distillation helps the model avoid the degradation in individual task performance in multi-task learning.
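As a rough illustration of knowledge distillation with teacher annealing, the training target for the multi-task (student) model can be written as a blend of the single-task (teacher) prediction and the true label, with the weight shifting toward the label as training proceeds. The linear schedule and the function names below are assumptions made for this sketch; the abstract does not give the exact schedule or loss used.

```python
def annealed_target(label, teacher_pred, step, total_steps):
    """Blend the true label with the teacher prediction.

    Assumes a linear schedule: the label weight grows from 0 at the first
    step to 1 at the last, so the teacher dominates early in training.
    """
    lam = min(max(step / total_steps, 0.0), 1.0)
    return lam * label + (1.0 - lam) * teacher_pred


def squared_error(student_pred, target):
    """Per-sample loss of the student against the blended target."""
    return (student_pred - target) ** 2


# Target for one sample at the start, midpoint, and end of training.
for step in (0, 50, 100):
    print(step, annealed_target(label=1.0, teacher_pred=0.0, step=step, total_steps=100))
```

Early in training the student mostly imitates the single-task teacher; by the end it is trained on the labels alone, which is one way to limit the degradation of individual tasks described above.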
Affiliation(s)
- Chaeyoung Moon
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Republic of Korea
- Dongsup Kim
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Republic of Korea
|
35
|
Shavalieva G, Papadokonstantakis S, Peters G. Prior Knowledge for Predictive Modeling: The Case of Acute Aquatic Toxicity. J Chem Inf Model 2022; 62:4018-4031. [PMID: 35998659 PMCID: PMC9472271 DOI: 10.1021/acs.jcim.1c01079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Early assessment of the potential impact of chemicals on health and the environment requires toxicological properties of the molecules. Predictive modeling is often used to estimate the property values in silico from pre-existing experimental data, which is often scarce and uncertain. One of the ways to advance the predictive modeling procedure might be the use of knowledge existing in the field. Scientific publications contain a vast amount of knowledge. However, the amount of manual work required to process the enormous volumes of information gathered in scientific articles might hinder its utilization. This work explores the opportunity of semiautomated knowledge extraction from scientific papers and investigates a few potential ways of its use for predictive modeling. The knowledge extraction and predictive modeling are applied to the field of acute aquatic toxicity. Acute aquatic toxicity is an important parameter of the safety assessment of chemicals. The extensive amount of diverse information existing in the field makes acute aquatic toxicity an attractive area for investigation of knowledge use for predictive modeling. The work demonstrates that the knowledge collection and classification procedure could be useful in hybrid modeling studies concerning the model and predictor selection, addressing data gaps, and evaluation of models’ performance.
Affiliation(s)
- Gulnara Shavalieva
  - Department of Space, Earth and Environment, Division of Energy Technology, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Stavros Papadokonstantakis
  - Department of Space, Earth and Environment, Division of Energy Technology, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
  - Institute of Chemical, Environmental and Bioscience Engineering, TU Wien, Getreidemarkt 9, 1060 Vienna, Austria
- Gregory Peters
  - Department of Technology Management and Economics, Chalmers University of Technology, SE-411 33 Gothenburg, Sweden
|
36
|
Walter M, Allen LN, de la Vega de León A, Webb SJ, Gillet VJ. Analysis of the benefits of imputation models over traditional QSAR models for toxicity prediction. J Cheminform 2022; 14:32. [PMID: 35672779 PMCID: PMC9172131 DOI: 10.1186/s13321-022-00611-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/01/2022] [Accepted: 05/12/2022] [Indexed: 11/21/2022] Open
Abstract
Recently, imputation techniques have been adapted to predict activity values among sparse bioactivity matrices, showing improvements in predictive performance over traditional QSAR models. These models are able to use experimental activity values for auxiliary assays when predicting the activity of a test compound on a specific assay. In this study, we tested three different multi-task imputation techniques on three classification-based toxicity datasets: two of small scale (12 assays each) and one large scale with 417 assays. Moreover, we analyzed in detail the improvements shown by the imputation models. We found that test compounds that were dissimilar to training compounds, as well as test compounds with a large number of experimental values for other assays, showed the largest improvements. We also investigated the impact of sparsity on the improvements seen as well as the relatedness of the assays being considered. Our results show that even a small amount of additional information can provide imputation methods with a strong boost in predictive performance over traditional single task and multi-task predictive models.
|
37
|
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Raquel Rodríguez-Pérez
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
  - Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
|
38
|
Dey V, Machiraju R, Ning X. Improving Compound Activity Classification via Deep Transfer and Representation Learning. ACS Omega 2022; 7:9465-9483. [PMID: 35350358 PMCID: PMC8945064 DOI: 10.1021/acsomega.1c06805] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 02/23/2022] [Indexed: 06/14/2023]
Abstract
Recent advances in molecular machine learning, especially deep neural networks such as graph neural networks (GNNs), for predicting structure-activity relationships (SAR) have shown tremendous potential in computer-aided drug discovery. However, the applicability of such deep neural networks is limited by the requirement of large amounts of training data. In order to cope with limited training data for a target task, transfer learning for SAR modeling has been recently adopted to leverage information from data of related tasks. In this work, in contrast to the popular parameter-based transfer learning such as pretraining, we develop novel deep transfer learning methods TAc and TAc-fc to leverage source domain data and transfer useful information to the target domain. TAc learns to generate effective molecular features that can generalize well from one domain to another and increase the classification performance in the target domain. Additionally, TAc-fc extends TAc by incorporating novel components to selectively learn feature-wise and compound-wise transferability. We used the bioassay screening data from PubChem and identified 120 pairs of bioassays such that the active compounds in each pair are more similar to each other compared to their inactive compounds. Overall, TAc achieves the best performance with an average ROC-AUC of 0.801; it significantly improves the ROC-AUC of 83% of target tasks with an average task-wise performance improvement of 7.102%, compared to the best baseline dmpna. Our experiments clearly demonstrate that TAc achieves significant improvement over all baselines across a large number of target tasks. Furthermore, although TAc-fc achieves slightly worse ROC-AUC on average compared to TAc (0.798 vs 0.801), TAc-fc still achieves the best performance on more tasks in terms of PR-AUC and F1 compared to other methods. 
In summary, TAc-fc is also found to be a strong model with competitive or even better performance than TAc on a notable number of target tasks.
Affiliation(s)
- Vishal Dey
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
- Raghu Machiraju
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
- Xia Ning
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
|
39
|
Wang Y, Gu Y, Lou C, Gong Y, Wu Z, Li W, Tang Y, Liu G. A multitask GNN-based interpretable model for discovery of selective JAK inhibitors. J Cheminform 2022; 14:16. [PMID: 35292114 PMCID: PMC8922399 DOI: 10.1186/s13321-022-00593-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Accepted: 02/26/2022] [Indexed: 11/10/2022] Open
Abstract
The Janus kinase (JAK) family plays a pivotal role in most cytokine-mediated inflammatory and autoimmune responses via JAK/STAT signaling, and administration of JAK inhibitors is a promising therapeutic strategy for several diseases including COVID-19. However, to screen and design selective JAK inhibitors is a daunting task due to the extremely high homology among four JAK isoforms. In this study, we aimed to simultaneously predict pIC50 values of compounds for all JAK subtypes by constructing an interpretable GNN multitask regression model. The final model performance was positive, with R2 values of 0.96, 0.79 and 0.78 on the training, validation and test sets, respectively. Meanwhile, we calculated and visualized atom weights, followed by the rank sum tests and local mean comparisons to obtain key atoms and substructures that could be fine-tuned to design selective JAK inhibitors. Several successful case studies have demonstrated that our approach is feasible and our model could learn the interactions between proteins and small molecules well, which could provide practitioners with a novel way to discover and design JAK inhibitors with selectivity.
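The regression target in this abstract, pIC50, is the negative base-10 logarithm of the IC50 expressed in mol/L. A minimal conversion sketch follows; the function name and the choice of nanomolar input units are assumptions made for illustration, not details taken from the paper.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50.

    pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


# A more potent compound (lower IC50) has a higher pIC50.
for ic50_nm in (1.0, 100.0, 10000.0):
    print(ic50_nm, pic50_from_ic50_nm(ic50_nm))
```

Working on the logarithmic scale keeps potencies that span several orders of magnitude in a compact numeric range, which is why regression models are usually trained on pIC50 rather than on raw IC50 values.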
Affiliation(s)
- Yimeng Wang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yaxin Gu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Chaofeng Lou
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yuning Gong
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Zengrui Wu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Guixia Liu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
|
40
|
Yang Z, Zhong W, Lv Q, Chen CYC. Multitask deep learning with dynamic task balancing for quantum mechanical properties prediction. Phys Chem Chem Phys 2022; 24:5383-5393. [PMID: 35169821 DOI: 10.1039/d1cp05172e] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/04/2023]
Abstract
Predicting quantum mechanical properties (QMPs) is very important for the innovation of material and chemistry science. Multitask deep learning models have been widely used in QMPs prediction. However, existing multitask learning models often train multiple QMPs prediction tasks simultaneously without considering the internal relationships and differences between tasks, which may cause the model to overfit easy tasks. In this study, we first proposed a multiscale dynamic attention graph neural network (MDGNN) for molecular representation learning. The MDGNN was designed in a multitask learning fashion that can solve multiple learning tasks at the same time. We then introduced a dynamic task balancing (DTB) strategy combining task differences and difficulties to reduce overfitting across multiple tasks. Finally, we adopted gradient-weighted class activation mapping (Grad-CAM) to analyze a deep learning model for frontier molecular orbital, highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy level predictions. We evaluated our approach using two large QMPs datasets and compared the proposed method to the state-of-the-art multitask learning models. The MDGNN outperforms other multitask learning approaches on two datasets. The DTB strategy can further improve the performance of MDGNN significantly. Moreover, we show that Grad-CAM creates explanations that are consistent with the molecular orbitals theory. These advantages demonstrate that the proposed method improves the generalization and interpretation capability of QMPs prediction modeling.
Affiliation(s)
- Ziduo Yang
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Weihe Zhong
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Qiujie Lv
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Calvin Yu-Chian Chen
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
  - Department of Medical Research, China Medical University Hospital, Taichung 40447, Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University, Taichung, 41354, Taiwan
|
41
|
Wu J, Lan C, Mei Z, Chen X, Zhu Y, Hu H, Diao Y. Transfer learning with molecular graph convolutional networks for accurate modelling and representation of bioactivities of ligands targeting GPCRs without sufficient data. Comput Biol Chem 2022; 98:107664. [DOI: 10.1016/j.compbiolchem.2022.107664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Revised: 02/23/2022] [Accepted: 03/06/2022] [Indexed: 11/29/2022]
|
42
|
Brown BP, Vu O, Geanes AR, Kothiwale S, Butkiewicz M, Lowe EW, Mueller R, Pape R, Mendenhall J, Meiler J. Introduction to the BioChemical Library (BCL): An Application-Based Open-Source Toolkit for Integrated Cheminformatics and Machine Learning in Computer-Aided Drug Discovery. Front Pharmacol 2022; 13:833099. [PMID: 35264967 PMCID: PMC8899505 DOI: 10.3389/fphar.2022.833099] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/10/2021] [Accepted: 01/24/2022] [Indexed: 01/31/2023] Open
Abstract
The BioChemical Library (BCL) cheminformatics toolkit is an application-based academic open-source software package designed to integrate traditional small molecule cheminformatics tools with machine learning-based quantitative structure-activity/property relationship (QSAR/QSPR) modeling. In this pedagogical article we provide a detailed introduction to core BCL cheminformatics functionality, showing how traditional tasks (e.g., computing chemical properties, estimating druglikeness) can be readily combined with machine learning. In addition, we have included multiple examples covering areas of advanced use, such as reaction-based library design. We anticipate that this manuscript will be a valuable resource for researchers in computer-aided drug discovery looking to integrate modular cheminformatics and machine learning tools into their pipelines.
Affiliation(s)
- Benjamin P. Brown
  - Chemical and Physical Biology Program, Medical Scientist Training Program, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
  - *Correspondence: Jens Meiler; Jeffrey Mendenhall; Benjamin P. Brown
- Oanh Vu
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Alexander R. Geanes
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Sandeepkumar Kothiwale
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Mariusz Butkiewicz
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Edward W. Lowe
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Ralf Mueller
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Richard Pape
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Jeffrey Mendenhall
  - Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
  - *Correspondence: Jens Meiler; Jeffrey Mendenhall; Benjamin P. Brown
- Jens Meiler
  - Department of Chemistry, Departments of Pharmacology and Biomedical Informatics, Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
  - Institute for Drug Discovery, Leipzig University Medical School, Leipzig, Germany
  - *Correspondence: Jens Meiler; Jeffrey Mendenhall; Benjamin P. Brown
|
43
|
Nakarin F, Boonpalit K, Kinchagawat J, Wachiraphan P, Rungrotmongkol T, Nutanong S. Assisting Multitargeted Ligand Affinity Prediction of Receptor Tyrosine Kinases Associated Nonsmall Cell Lung Cancer Treatment with Multitasking Principal Neighborhood Aggregation. Molecules 2022; 27:1226. [PMID: 35209011 PMCID: PMC8878292 DOI: 10.3390/molecules27041226] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/24/2021] [Revised: 01/30/2022] [Accepted: 01/31/2022] [Indexed: 11/16/2022] Open
Abstract
A multitargeted therapeutic approach with hybrid drugs is a promising strategy to enhance anticancer efficiency and overcome drug resistance in nonsmall cell lung cancer (NSCLC) treatment. Estimating the affinities of small molecules against targets of interest is typically a preliminary step in drug discovery in the pharmaceutical industry. In this investigation, we employed machine learning models to provide a computationally affordable means for computer-aided screening to accelerate the discovery of potential drug compounds. In particular, we introduced a quantitative structure–activity relationship (QSAR)-based multitask learning model to facilitate an in silico screening system for multitargeted drug development. Our method combines a recently developed graph-based neural network architecture, principal neighborhood aggregation (PNA), with a descriptor-based deep neural network, supporting synergistic utilization of molecular graph and fingerprint features. The model was built from more than ten thousand ligands with reported affinities for seven crucial receptor tyrosine kinases in NSCLC, drawn from two public data sources. As a result, our multitask model demonstrated better performance than all other benchmark models and achieved satisfactory predictive ability under applicable QSAR criteria for most tasks within the model’s applicability domain. Since our model could potentially be a screening tool for practical use, we have provided a freely accessible model implementation platform with a tutorial, supporting a first step in the long journey of cancer drug development.
Affiliation(s)
- Fahsai Nakarin
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
  - Correspondence: Tel.: +66-33-014-444
- Kajjana Boonpalit
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Jiramet Kinchagawat
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Patcharapol Wachiraphan
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
- Thanyada Rungrotmongkol
  - Center of Excellence in Biocatalyst and Sustainable Biotechnology, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
- Sarana Nutanong
  - School of Information Science and Technology, Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong 21210, Thailand
|
44
|
Oguike OE, Ugwuishiwu CH, Asogwa CN, Nnadi CO, Obonga WO, Attama AA. Systematic review on the application of machine learning to quantitative structure-activity relationship modeling against Plasmodium falciparum. Mol Divers 2022; 26:3447-3462. [PMID: 35064444 PMCID: PMC8782692 DOI: 10.1007/s11030-022-10380-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 01/07/2022] [Indexed: 11/29/2022]
Abstract
Malaria accounts for over two million deaths globally. To flatten this curve, there is a need to develop new and highly potent drugs against Plasmodium falciparum. Some major challenges include the dearth of suitable animal models for anti-P. falciparum assays, resistance to first-line drugs, lack of vaccines and the complex life cycle of Plasmodium. Fortunately, newer approaches to antimalarial drug discovery have emerged due to the release of large datasets by pharmaceutical companies. This review provides insights into these new approaches to drug discovery, covering different machine learning tools that enhance the development of new compounds. It provides a systematic review on the use and prospects of machine learning in predicting, classifying and clustering IC50 values of bioactive compounds against P. falciparum. The authors identified many machine learning tools yet to be applied for this purpose. However, Random Forest and Support Vector Machines have been extensively applied, though on a limited dataset of compounds.
Affiliation(s)
- Osondu Everestus Oguike
  - Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
  - Department of Computer Science, Faculty of Physical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Chikodili Helen Ugwuishiwu
  - Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
  - Department of Computer Science, Faculty of Physical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Caroline Ngozi Asogwa
  - Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
  - Department of Computer Science, Faculty of Physical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Charles Okeke Nnadi
  - Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
  - Department of Pharmaceutical and Medicinal Chemistry, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Wilfred Ofem Obonga
  - Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
  - Department of Pharmaceutical and Medicinal Chemistry, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
- Anthony Amaechi Attama
  - Machine Learning Research Group, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
  - Department of Pharmaceutics, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, 410001, Enugu State, Nigeria
|
45
|
Abstract
Quantitative structure-activity relationship (QSAR) models are routinely applied computational tools in the drug discovery process. QSAR models are regression or classification models that predict the biological activities of molecules based on the features derived from their molecular structures. These models are usually used to prioritize a list of candidate molecules for future laboratory experiments and to help chemists gain better insights into how structural changes affect a molecule's biological activities. Developing accurate and interpretable QSAR models is therefore of the utmost importance in the drug discovery process. Deep neural networks, which are powerful supervised learning algorithms, have shown great promise for addressing regression and classification problems in various research fields, including the pharmaceutical industry. In this chapter, we briefly review the applications of deep neural networks in QSAR modeling and describe commonly used techniques to improve model performance.
|
46
|
Korolev VV, Nevolin YM, Manz TA, Protsenko PV. Parametrization of Nonbonded Force Field Terms for Metal-Organic Frameworks Using Machine Learning Approach. J Chem Inf Model 2021; 61:5774-5784. [PMID: 34787430 DOI: 10.1021/acs.jcim.1c01124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/16/2023]
Abstract
The enormous structural and chemical diversity of metal-organic frameworks (MOFs) forces researchers to actively use simulation techniques as often as experiments. MOFs are widely known for their outstanding adsorption properties, so a precise description of the host-guest interactions is essential for high-throughput screening aimed at ranking the most promising candidates. However, highly accurate ab initio calculations cannot be routinely applied to model thousands of structures due to the demanding computational costs. Furthermore, methods based on force field (FF) parametrization suffer from low transferability. To resolve this accuracy-efficiency dilemma, we applied a machine learning (ML) approach: extreme gradient boosting. The trained models reproduced the atom-in-material quantities, including partial charges, polarizabilities, dispersion coefficients, quantum Drude oscillator, and electron cloud parameters, with accuracy similar to the reference data set. The aforementioned FF precursors make it possible to thoroughly describe noncovalent interactions typical for MOF-adsorbate systems: electrostatic, dispersion, polarization, and short-range repulsion. The presented approach can also readily facilitate hybrid atomistic simulation/ML workflows.
Affiliation(s)
- Vadim V Korolev
  - Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Yuriy M Nevolin
  - Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow 119071, Russia
- Thomas A Manz
  - Department of Chemical & Materials Engineering, New Mexico State University, Las Cruces, New Mexico 88003-8001, United States
- Pavel V Protsenko
  - Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
|
47
|
Grebner C, Matter H, Hessler G. Artificial Intelligence in Compound Design. Methods Mol Biol 2021; 2390:349-382. [PMID: 34731477 DOI: 10.1007/978-1-0716-1787-8_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Artificial intelligence has seen an incredibly fast development in recent years. Many novel technologies for property prediction of drug molecules as well as for the design of novel molecules were introduced by different research groups. These artificial intelligence-based design methods can be applied for suggesting novel chemical motifs in lead generation or scaffold hopping as well as for optimization of desired property profiles during lead optimization. In lead generation, broad sampling of the chemical space for identification of novel motifs is required, while in the lead optimization phase, a detailed exploration of the chemical neighborhood of a current lead series is advantageous. These different requirements for successful design outcomes render different combinations of artificial intelligence technologies useful. Overall, we observe that a combination of different approaches with tailored scoring and evaluation schemes appears beneficial for efficient artificial intelligence-based compound design.
Affiliation(s)
- Christoph Grebner
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
- Hans Matter
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
- Gerhard Hessler
  - Sanofi-Aventis Deutschland GmbH, R&D, Integrated Drug Discovery, Frankfurt am Main, Germany
|
48
|
Machine Learning for In Silico ADMET Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2390:447-460. [PMID: 34731482 DOI: 10.1007/978-1-0716-1787-8_20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
Abstract
ADMET (absorption, distribution, metabolism, excretion, and toxicity) describes a drug molecule's pharmacokinetic and pharmacodynamic properties. The ADMET profile of a bioactive compound can impact its efficacy and safety. Moreover, efficacy and safety are considered some of the major causes of clinical attrition in the development of new chemical entities. In past decades, various machine learning or quantitative structure-activity relationship (QSAR) methods have been successfully integrated into the modeling of ADMET. Recent advances have been made in the collection of data and the development of various in silico methods to assess and predict the ADMET of bioactive compounds in the early stages of the drug discovery and development process.
|
49
|
Parsimonious Optimization of Multitask Neural Network Hyperparameters. Molecules 2021; 26:molecules26237254. [PMID: 34885837 PMCID: PMC8658836 DOI: 10.3390/molecules26237254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/15/2021] [Revised: 11/17/2021] [Accepted: 11/25/2021] [Indexed: 11/29/2022] Open
Abstract
Neural networks are rapidly gaining popularity in chemical modeling and Quantitative Structure–Activity Relationship (QSAR) thanks to their ability to handle multitask problems. However, the outcomes of neural networks depend on the tuning of several hyperparameters, small variations in which can strongly affect performance. Hence, optimization is a fundamental step in training neural networks, although in many cases it can be computationally very expensive. In this study, we compared four of the most widely used approaches for tuning hyperparameters, namely grid search, random search, tree-structured Parzen estimator, and genetic algorithms, on three multitask QSAR datasets. We focused mainly on parsimonious optimization, and thus took into account not only the performance of the neural networks but also the computational time. Furthermore, since the optimization approaches do not directly provide information about the influence of hyperparameters, we applied experimental design strategies to determine their effects on neural network performance. We found that genetic algorithms, tree-structured Parzen estimator, and random search require on average 0.08% of the hours required by grid search; in addition, tree-structured Parzen estimator and genetic algorithms provide better results than random search.
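The cost gap between exhaustive and sampled search described in this abstract can be illustrated with a minimal sketch. This is not the authors' protocol: the objective `toy_validation_loss`, the hyperparameter names, and the grid values in `SEARCH_SPACE` are hypothetical stand-ins for training a multitask network, chosen only so that the example runs with the Python standard library.

```python
import itertools
import random


def toy_validation_loss(learning_rate, hidden_units, dropout):
    """Hypothetical stand-in for 'train a network, return validation loss'."""
    return (
        (learning_rate - 0.01) ** 2 * 1e4
        + (hidden_units - 128) ** 2 / 1e4
        + (dropout - 0.2) ** 2
    )


# Assumed search space: 4 x 4 x 4 = 64 candidate configurations.
SEARCH_SPACE = {
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "hidden_units": [32, 64, 128, 256],
    "dropout": [0.0, 0.2, 0.4, 0.6],
}


def grid_search(space, objective):
    """Evaluate every combination in the grid; cost grows multiplicatively."""
    names = list(space)
    best_params, best_loss, n_evals = None, float("inf"), 0
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        loss = objective(**params)
        n_evals += 1
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss, n_evals


def random_search(space, objective, budget, seed=0):
    """Evaluate a fixed budget of configurations sampled uniformly at random."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(budget):
        params = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss, budget


grid_best, grid_loss, grid_evals = grid_search(SEARCH_SPACE, toy_validation_loss)
rand_best, rand_loss, rand_evals = random_search(
    SEARCH_SPACE, toy_validation_loss, budget=16
)
print(f"grid search:   {grid_evals} evaluations, best loss {grid_loss:.4f}")
print(f"random search: {rand_evals} evaluations, best loss {rand_loss:.4f}")
```

Grid search always finds the best point on the grid but pays for every combination, whereas random search trades that guarantee for a fixed evaluation budget. Tree-structured Parzen estimators and genetic algorithms, also compared in the study, spend a similar budget but choose new configurations based on earlier results; they are not shown here.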
|
50
|
Castro LHE, Sant'Anna CMR. Molecular Modeling Techniques Applied to the Design of Multitarget Drugs: Methods and Applications. Curr Top Med Chem 2021; 22:333-346. [PMID: 34844540 DOI: 10.2174/1568026621666211129140958] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Revised: 10/23/2021] [Accepted: 10/28/2021] [Indexed: 11/22/2022]
Abstract
Multifactorial diseases, such as cancer and diabetes, present a challenge for the traditional "one-target, one disease" paradigm due to their complex pathogenic mechanisms. Although a combination of drugs can be used, a multitarget drug may be a better choice in view of its efficacy, fewer adverse effects, and lower chance of resistance development. The computer-based design of these multitarget drugs can draw on the same techniques used for single-target drug design, but the difficulty of obtaining drugs capable of modulating two or more targets with similar efficacy imposes new challenges, whose solutions involve the adaptation of known techniques and the development of new ones, including machine-learning approaches. In this review, some structure-based drug design (SBDD) and ligand-based drug design (LBDD) techniques for multitarget drug design are discussed, together with some cases where the application of such techniques led to effective multitarget ligands.
Affiliation(s)
- Carlos Mauricio R Sant'Anna
  - Programa de Pós-Graduação em Química, Instituto de Química, Universidade Federal Rural do Rio de Janeiro, Seropédica, Brazil
|