1
Karasev DA, Sobolev BN, Filimonov DA, Lagunin A. Prediction of viral protease inhibitors using proteochemometrics approach. Comput Biol Chem 2024; 110:108061. [PMID: 38574417 DOI: 10.1016/j.compbiolchem.2024.108061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 03/21/2024] [Accepted: 03/23/2024] [Indexed: 04/06/2024]
Abstract
Although widely accepted as tools in computational drug search, (Q)SAR methods have limitations related to data incompleteness. The proteochemometrics (PCM) approach expands the applicability area by using descriptors of both protein and ligand structures. PCM algorithms are urgently required for the development of new antiviral agents. We propose a PCM method using TLMNA descriptors, which combine the MNA descriptors of ligands with protein sequence N-grams. Our method was validated on the viral chymotrypsin-like proteases and their ligands. We have developed an original protocol allowing us to collect a comprehensive set of 15 protein sequences and more than 9000 ligands from the ChEMBL database. The N-grams were derived from the 3D-based alignment, accurately superposing ligand-binding regions. In testing the ligand set in SAR mode with MNA descriptors, an accuracy above 0.95 was obtained, which shows the promise of antiviral drug search in virtual chemical libraries. Effective PCM models were built with the TLMNA descriptor. A stringent validation procedure with pair exclusion simulated the prediction of interactions between new ligands and new targets, resulting in accuracy estimates of up to 0.89. The PCM approach shows slightly lower accuracy than SAR because of greater uncertainty, but it overcomes the problem of data incompleteness.
Affiliation(s)
- Dmitry A Karasev
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Boris N Sobolev
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Dmitry A Filimonov
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Alexey Lagunin
  - Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia; Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow 117997, Russia
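The abstract of entry 1 describes protein sequence N-grams as one half of the TLMNA descriptor. As a minimal sketch of the general N-gram idea only, the snippet below counts overlapping N-grams in a one-letter protein sequence. It does not reproduce the paper's 3D-based alignment or its MNA ligand descriptors; the function name `ngram_counts` and the toy sequence are hypothetical.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int = 3) -> Counter:
    """Count overlapping n-grams in a protein sequence (one-letter codes)."""
    seq = sequence.upper()
    # range() is empty when the sequence is shorter than n, giving an empty Counter.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


# Hypothetical toy sequence, not taken from the paper's data set.
counts = ngram_counts("SGFRKMAF", n=3)
print(sorted(counts.items()))
```

In a PCM setting, such counts would typically be turned into a fixed-length vector over a chosen N-gram vocabulary before being combined with ligand descriptors.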
2
Lee I, Nam H. Sequence-based prediction of protein binding regions and drug-target interactions. J Cheminform 2022; 14:5. [PMID: 35135622 PMCID: PMC8822694 DOI: 10.1186/s13321-022-00584-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/04/2021] [Accepted: 01/20/2022] [Indexed: 12/19/2022] [Open access]
Abstract
Identifying drug-target interactions (DTIs) is important for drug discovery. However, searching all drug-target spaces poses a major bottleneck. Therefore, recently many deep learning models have been proposed to address this problem. However, the developers of these deep learning models have neglected interpretability in model construction, which is closely related to a model's performance. We hypothesized that training a model to predict important regions on a protein sequence would increase DTI prediction performance and provide a more interpretable model. Consequently, we constructed a deep learning model, named Highlights on Target Sequences (HoTS), which predicts binding regions (BRs) between a protein sequence and a drug ligand, as well as DTIs between them. To train the model, we collected complexes of protein-ligand interactions and protein sequences of binding sites and pretrained the model to predict BRs for a given protein sequence-ligand pair via object detection employing transformers. After pretraining the BR prediction, we trained the model to predict DTIs from a compound token designed to assign attention to BRs. We confirmed that training the BRs prediction model indeed improved the DTI prediction performance. The proposed HoTS model showed good performance in BR prediction on independent test datasets even though it does not use 3D structure information in its prediction. Furthermore, the HoTS model achieved the best performance in DTI prediction on test datasets. Additional analysis confirmed the appropriate attention for BRs and the importance of transformers in BR and DTI prediction. The source code is available on GitHub ( https://github.com/GIST-CSBL/HoTS ).
Affiliation(s)
- Ingoo Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005 Republic of Korea
- Hojung Nam
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005 Republic of Korea
3
Rasti B. Quantitative Characterization of the Chemical Space Governed by Human Carbonic Anhydrases and selenium-containing derivatives of solfonamides. Braz J Pharm Sci 2022. [DOI: 10.1590/s2175-97902022e19704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] [Open access]
4
5
Bongers BJ, IJzerman AP, Van Westen GJP. Proteochemometrics - recent developments in bioactivity and selectivity modeling. Drug Discovery Today. Technologies 2019; 32-33:89-98. [PMID: 33386099 DOI: 10.1016/j.ddtec.2020.08.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/28/2020] [Revised: 08/18/2020] [Accepted: 08/28/2020] [Indexed: 06/12/2023]
Abstract
Proteochemometrics is a machine learning based modeling approach relying on a combination of ligand and protein descriptors. With ongoing developments in machine learning and increases in public data the technique is more frequently applied in early drug discovery, typically in ligand-target binding prediction. Common applications include improvements to single target quantitative structure-activity relationship models, protein selectivity and promiscuity modeling, and large-scale deep learning approaches. The increase in predictive power using proteochemometrics is observed in multi-target bioactivity modeling, opening the door to more extensive studies covering whole protein families. On top of that, with deep learning fueling more complex and larger scale models, proteochemometrics allows faster and higher quality computational models supporting the design, make, test cycle.
Affiliation(s)
- Brandon J Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Adriaan P IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Gerard J P Van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
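The abstract of entry 5 defines proteochemometrics as modeling on a combination of ligand and protein descriptors. One common way to realize that combination, shown here purely as an illustrative sketch, is to concatenate the two descriptor vectors for every ligand-protein pair before passing the rows to any machine learning model. The function name `pcm_rows`, the identifiers, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple


def pcm_rows(
    ligand_desc: Dict[str, List[float]],
    protein_desc: Dict[str, List[float]],
    pairs: List[Tuple[str, str]],
) -> List[List[float]]:
    """Concatenate ligand and protein descriptors for each (ligand, protein) pair."""
    return [ligand_desc[lig] + protein_desc[prot] for lig, prot in pairs]


# Hypothetical toy descriptors; real studies use fingerprints, sequence features, etc.
ligand_desc = {"lig_a": [1.0, 0.0], "lig_b": [0.0, 1.0]}
protein_desc = {"prot_x": [0.2, 0.8, 0.1], "prot_y": [0.9, 0.1, 0.4]}
pairs = [("lig_a", "prot_x"), ("lig_a", "prot_y"), ("lig_b", "prot_x")]

rows = pcm_rows(ligand_desc, protein_desc, pairs)
for pair, row in zip(pairs, rows):
    print(pair, row)
```

Because each row carries both ligand and target information, a single model can be trained across several targets at once, which is the multi-target setting the abstract refers to.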
6
Rasti B, Mazraedoost S, Panahi H, Falahati M, Attar F. New insights into the selective inhibition of the β-carbonic anhydrases of pathogenic bacteria Burkholderia pseudomallei and Francisella tularensis: a proteochemometrics study. Mol Divers 2018; 23:263-273. [PMID: 30120657 DOI: 10.1007/s11030-018-9869-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/18/2018] [Accepted: 08/09/2018] [Indexed: 10/28/2022]
Abstract
Nowadays, antibiotic resistance has turned into one of the most important worldwide health problems. Biological end point of critical enzymes induced by potent inhibitors is recently being considered as a highly effective and popular strategy to defeat antibiotic-resistant pathogens. For instance, the simple but critical β-carbonic anhydrase has recently been in the center of attention for anti-pathogen drug discoveries. However, no β-carbonic anhydrase selective inhibitor has yet been developed. Available β-carbonic anhydrase inhibitors are also highly potent with regard to human carbonic anhydrases, leading to severe inevitable side effects in case of usage. Therefore, developing novel inhibitors with high selectivity against pathogenic β-carbonic anhydrases is of great essence. Herein, for the first time, we have conducted a proteochemometric study to explore the structural and the chemical aspects of the interactions governed by bacterial β-carbonic anhydrases and their inhibitors. We have found valuable information which can lead to designing novel inhibitors with better selectivity for bacterial β-carbonic anhydrases.
Affiliation(s)
- Behnam Rasti
  - Department of Microbiology, Faculty of Basic Sciences, Lahijan Branch, Islamic Azad University (IAU), Lahijan, Guilan, Iran
- Sargol Mazraedoost
  - Department of Microbiology, Faculty of Basic Sciences, Lahijan Branch, Islamic Azad University (IAU), Lahijan, Guilan, Iran
- Hanieh Panahi
  - Department of Mathematics and Statistics, Lahijan Branch, Islamic Azad University, Lahijan, Iran
- Mojtaba Falahati
  - Department of Nanotechnology, Faculty of Advance Science and Technology, Pharmaceutical Sciences Branch, Islamic Azad University (IAUPS), Tehran, Iran
- Farnoosh Attar
  - Department of Biology, Faculty of Food Industry and Agriculture, Standard Research Institute (SRI), Karaj, Iran
7
Rasti B, Heravi YE. Probing the chemical interaction space governed by 4-aminosubstituted benzenesulfonamides and carbonic anhydrase isoforms. Res Pharm Sci 2018; 13:192-204. [PMID: 29853929 PMCID: PMC5921400 DOI: 10.4103/1735-5362.228940] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022] [Open access]
Abstract
Isoform diversity, critical physiological roles and involvement in major diseases/disorders such as glaucoma, epilepsy, Alzheimer's disease, obesity, and cancers have made carbonic anhydrase (CA), one of the most interesting case studies in the field of computer aided drug design. Since applying non-selective inhibitors can result in major side effects, there have been considerable efforts so far to achieve selective inhibitors for different isoforms of CA. Using proteochemometrics approach, the chemical interaction space governed by a group of 4-amino-substituted benzenesulfonamides and human CAs has been explored in the present study. Several validation methods have been utilized to assess the validity, robustness and predictivity power of the proposed proteochemometric model. Our model has offered major structural information that can be applied to design new selective inhibitors for distinct isoforms of CA. To prove the applicability of the proposed model, new compounds have been designed based on the offered discriminative structural features.
Affiliation(s)
- Behnam Rasti
  - Department of Microbiology, Faculty of Basic Sciences, Lahijan Branch, Islamic Azad University (IAU), Lahijan, Guilan, I.R. Iran
8
Nazarshodeh E, Sheikhpour R, Gharaghani S, Sarram MA. A novel proteochemometrics model for predicting the inhibition of nine carbonic anhydrase isoforms based on supervised Laplacian score and k-nearest neighbour regression. SAR and QSAR in Environmental Research 2018; 29:419-437. [PMID: 29882433 DOI: 10.1080/1062936x.2018.1447995] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/19/2017] [Accepted: 02/28/2018] [Indexed: 06/08/2023]
Abstract
Carbonic anhydrases (CAs) are essential enzymes in biological processes. Prediction of the activity of compounds towards CA isoforms could be evaluated by computational techniques to discover a novel therapeutic inhibitor. Studies such as quantitative structure-activity relationships (QSARs), molecular docking and pharmacophore modelling have been carried out to design potent inhibitors. Unfortunately, QSAR does not consider the information of target space in the model. We successfully developed an in silico proteochemometrics model that simultaneously uses target and ligand descriptors to predict the activities of CA inhibitors. Herein, a strong predictive model was built for the prediction of protein-ligand binding affinity between nine human CA isoforms and 549 ligands. We applied descriptors obtained from the PROFEAT webserver for the proteins. Ligands were encoded by descriptors from PaDEL-Descriptor software. Supervised Laplacian score (SLS) and particle swarm optimization were used for feature selection. Models were derived using k-nearest neighbour (KNN) regression and a kernel smoother model. The predictive ability of the models was evaluated by an external validation test. Statistical results (Q²ext = 0.7806, r²test = 0.7811 and RMSEtest = 0.5549) showed that the model generated using SLS and KNN regression outperformed the other models. Consequently, the selectivity of compounds towards these enzymes will be predicted prior to synthesis.
Affiliation(s)
- E Nazarshodeh
  - Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- R Sheikhpour
  - Department of Computer Engineering, Yazd University, Yazd, Iran
- S Gharaghani
  - Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- M A Sarram
  - Department of Computer Engineering, Yazd University, Yazd, Iran
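The abstract of entry 8 reports k-nearest neighbour regression evaluated on an external test set with Q², r² and RMSE statistics. The sketch below illustrates that workflow in plain Python on hypothetical toy data. The Q² formula used here is one common external-validation definition (prediction error relative to deviation from the training-set mean); whether the paper uses exactly this definition is an assumption, and the feature selection steps (SLS, particle swarm optimization) are not reproduced.

```python
import math
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[float],
                query: Sequence[float], k: int = 3) -> float:
    """Predict by averaging the responses of the k nearest training points."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = [y for _, y in dists[:k]]
    return sum(nearest) / len(nearest)


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean squared error of the predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def q2_ext(y_true: List[float], y_pred: List[float], train_mean: float) -> float:
    """External Q2: 1 - PRESS / sum of squared deviations from the training mean."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((t - train_mean) ** 2 for t in y_true)
    return 1.0 - press / total


# Hypothetical one-dimensional toy data, not from the paper.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 2.0, 3.0]
test_x = [[0.5], [2.5]]
test_y = [0.5, 2.5]

preds = [knn_predict(train_x, train_y, q, k=2) for q in test_x]
train_mean = sum(train_y) / len(train_y)
print(preds, rmse(test_y, preds), q2_ext(test_y, preds, train_mean))
```

In a real PCM study, each row of `train_x` would be a combined ligand-protein descriptor vector rather than a single number.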
9
Rasti B, Shahangian SS. Proteochemometric modeling of the origin of thymidylate synthase inhibition. Chem Biol Drug Des 2018; 91:1007-1016. [PMID: 29251822 DOI: 10.1111/cbdd.13163] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/05/2017] [Revised: 11/09/2017] [Accepted: 12/01/2017] [Indexed: 12/11/2022]
Affiliation(s)
- Behnam Rasti
  - Department of Microbiology, Faculty of Basic Sciences, Lahijan Branch, Islamic Azad University (IAU), Lahijan, Guilan, Iran
10
Nazarshodeh E, Gharaghani S. Toward a hierarchical virtual screening and toxicity risk analysis for identifying novel CA XII inhibitors. Biosystems 2017; 162:35-43. [PMID: 28899791 DOI: 10.1016/j.biosystems.2017.09.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/23/2016] [Revised: 09/06/2017] [Accepted: 09/07/2017] [Indexed: 12/13/2022]
Abstract
Carbonic anhydrase isoform XII (CA XII) is a potential target for cancer treatment. In this study, pharmacophore modeling, hierarchical virtual screening, and toxicity risk analysis were performed for identifying novel CA XII inhibitors. A pharmacophore model of two classes of CA XII inhibitors was generated. The pharmacophore model indicated the important features of inhibitors for binding to CA XII. The model was then utilized to screen the ZINC and CoCoCo databases for retrieving potential hit compounds of CA XII. For accurate conclusions about the selectivity of inhibitors, the retrieved molecules that obey Lipinski's rule of five (RO5) and have no toxicity risk were docked in a CA XII 3D structure by smina. Finally, on the basis of binding affinity and the binding mode of the molecules, twelve molecules were prioritized as promising hits. It should be noted that two of the hits, H5 and H6, were previously reported in the ChEMBL database. This hierarchical method is valuable because it reduces screening time and uses almost all available information. The final hits may be used as leads to discover novel CA XII inhibitors.
Affiliation(s)
- Elmira Nazarshodeh
  - Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Sajjad Gharaghani
  - Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
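The abstract of entry 10 filters retrieved molecules by Lipinski's rule of five before docking. As a hedged sketch of that filtering step, the snippet below counts rule-of-five violations from precomputed properties using the standard thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The `Props` class, the molecule names, and the property values are hypothetical; the paper's actual property calculation tool is not stated in the abstract.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Props:
    """Precomputed physicochemical properties of one molecule."""
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def ro5_violations(p: Props) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


# Hypothetical candidate molecules with made-up property values.
candidates: Dict[str, Props] = {
    "mol_1": Props(mol_weight=342.4, logp=2.1, h_donors=2, h_acceptors=6),
    "mol_2": Props(mol_weight=612.7, logp=5.8, h_donors=3, h_acceptors=9),
}

# Strict reading of "obey": keep only molecules with zero violations.
passed = [name for name, p in candidates.items() if ro5_violations(p) == 0]
print(passed)
```

Some workflows tolerate a single violation; the strict zero-violation filter here is a design choice for the example only.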
11
Subramanian V, Ain QU, Henno H, Pietilä LO, Fuchs JE, Prusis P, Bender A, Wohlfahrt G. 3D proteochemometrics: using three-dimensional information of proteins and ligands to address aspects of the selectivity of serine proteases. MedChemComm 2017; 8:1037-1045. [PMID: 30108817 PMCID: PMC6072133 DOI: 10.1039/c6md00701e] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/16/2016] [Accepted: 03/14/2017] [Indexed: 11/21/2022]
Abstract
The high similarity between certain sub-pockets of serine proteases may lead to low selectivity of protease inhibitors. Therefore, the application of proteochemometrics (PCM), which quantifies the relationship between protein/ligand descriptors and affinity for multiple ligands and targets simultaneously, is useful to understand and improve the selectivity profiles of potential inhibitors. In this study, protein field-based PCM that uses knowledge-based and WaterMap derived fields to describe proteins in combination with 2D (RDKit and MOE fingerprints) and 3D (4 point pharmacophoric fingerprints and GRIND) ligand descriptors was used to model the bioactivities of 24 homologous serine proteases and 5863 inhibitors in an integrated fashion. Of the multiple field-based PCM models generated based on different ligand descriptors, RDKit fingerprints showed the best performance in terms of external prediction with R²test of 0.72 and RMSEP of 0.81. Further, visual interpretation of the models highlights sub-pocket specific regions that influence affinity and selectivity of serine protease inhibitors.
Affiliation(s)
- Vigneshwari Subramanian
  - Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, 00014 Helsinki, Finland
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
- Qurrat Ul Ain
  - Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW Cambridge, UK
- Helena Henno
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
- Lars-Olof Pietilä
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
- Julian E Fuchs
  - Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW Cambridge, UK
  - Institute of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innrain 82, 6020 Innsbruck, Austria
- Peteris Prusis
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW Cambridge, UK
- Gerd Wohlfahrt
  - Computer-Aided Drug Design, Orion Pharma, Orionintie 1, 02101 Espoo, Finland
12
Rasti B, Schaduangrat N, Shahangian SS, Nantasenamat C. Exploring the origin of phosphodiesterase inhibition via proteochemometric modeling. RSC Adv 2017. [DOI: 10.1039/c7ra02332d] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/14/2022] [Open access]
Abstract
A proteochemometric study of a set of phosphodiesterase 4B and 4D inhibitors sheds light on the origin of their inhibition and selectivities.
Affiliation(s)
- Behnam Rasti
  - Department of Microbiology, Faculty of Basic Sciences, Lahijan Branch, Islamic Azad University (IAU), Lahijan
- Nalini Schaduangrat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- S. Shirin Shahangian
  - Department of Biology, Faculty of Sciences, University of Guilan, Rasht 41938-33697, Iran
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
13
Rasti B, Namazi M, Karimi-Jafari MH, Ghasemi JB. Proteochemometric Modeling of the Interaction Space of Carbonic Anhydrase and its Inhibitors: An Assessment of Structure-based and Sequence-based Descriptors. Mol Inform 2016; 36. [PMID: 27860295 DOI: 10.1002/minf.201600102] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/19/2015] [Accepted: 10/26/2016] [Indexed: 11/08/2022]
Abstract
Due to its physiological and clinical roles, carbonic anhydrase (CA) is one of the most interesting case studies. There are different classes of CA inhibitors including sulfonamides, polyamines, coumarins and dithiocarbamates (DTCs). However, many of them hardly act as a selective inhibitor against a specific isoform. Therefore, finding highly selective inhibitors for different isoforms of CA is still an ongoing project. Proteochemometrics modeling (PCM) is able to model the bioactivity of multiple compounds against different isoforms of a protein. Therefore, it would be extremely applicable when investigating the selectivity of different ligands towards different receptors. Given these facts, we applied PCM to investigate the interaction space and structural properties that lead to the selective inhibition of CA isoforms by some dithiocarbamates. Our models have provided interesting structural information that can be considered to design compounds capable of inhibiting different isoforms of CA in an improved selective manner. Validity and predictivity of the models were confirmed by both internal and external validation methods, while the Y-scrambling approach was applied to assess the robustness of the models. To prove the reliability and the applicability of our findings, we showed how ligand-receptor selectivity can be affected by removing any of these critical findings from the modeling process.
Affiliation(s)
- Behnam Rasti
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Mohsen Namazi
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- M H Karimi-Jafari
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Jahan B Ghasemi
  - Department of Analytical Chemistry, School of Chemistry, College of Science, University of Tehran, Tehran, Iran
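The abstract of entry 13 mentions Y-scrambling as a robustness check. The general idea is to shuffle the response values, refit, and confirm that the scrambled models score much worse than the real one. The sketch below illustrates this with a one-variable linear fit, where the fit quality r² equals the squared Pearson correlation. The data, the function names, and the choice of model are hypothetical and far simpler than the PCM models in the paper.

```python
import random
from typing import List


def r_squared(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation, equal to r2 of a one-variable linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_scramble(x: List[float], y: List[float],
               rounds: int = 20, seed: int = 0) -> List[float]:
    """Return r2 scores obtained after randomly permuting the responses."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = []
    for _ in range(rounds):
        shuffled = y[:]        # copy so the original order is preserved
        rng.shuffle(shuffled)
        scores.append(r_squared(x, shuffled))
    return scores


# Hypothetical toy data: a linear trend with small deterministic noise.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2, -0.15, 0.0]
y = [2.0 * a + 1.0 + e for a, e in zip(x, noise)]

real_score = r_squared(x, y)
scrambled = y_scramble(x, y)
print(real_score, sum(scrambled) / len(scrambled))
```

A real score far above the scrambled scores suggests the model captures genuine structure rather than chance correlation.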