1. Prediction of drug protein interactions based on variable scale characteristic pyramid convolution network. Methods 2023; 211:42-47. [PMID: 36804213 DOI: 10.1016/j.ymeth.2023.02.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/31/2022] [Accepted: 02/13/2023] [Indexed: 02/17/2023]
Abstract
MOTIVATION In drug screening, improving the accuracy of drug-target binding affinity prediction is important. Multilayer convolutional neural networks are among the most popular existing deep learning methods for affinity prediction. They use multiple convolution layers to extract features from the simplified molecular-input line-entry system (SMILES) strings of compounds and the amino acid sequences of proteins, and then predict affinity from these features. However, the semantic information carried by low-level features can be gradually lost as network depth increases, which degrades prediction performance. RESULT We propose a novel method, Pyramid Network Convolution Drug-Target Binding Affinity (PCNN-DTA), for drug-target binding affinity prediction. PCNN-DTA is based on a feature pyramid network (FPN): it fuses the features extracted from each layer of a multilayer convolution network to retain more low-level feature information, thus improving prediction accuracy. PCNN-DTA is compared with other typical algorithms on three benchmark datasets: KIBA, Davis, and BindingDB. Experimental results show that PCNN-DTA outperforms existing regression prediction methods based on convolutional neural networks, demonstrating its effectiveness.
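The core idea of keeping a summary of every convolution layer, rather than only the deepest layer's output, can be illustrated with a minimal pure-Python sketch. It is not the PCNN-DTA architecture: a real feature pyramid network adds top-down upsampling and lateral connections and operates on multi-channel embeddings of SMILES and protein sequences. Here the input is assumed to be an already-encoded scalar sequence, and the kernels and the `multilevel_features` helper are hypothetical.

```python
def conv1d_relu(seq, kernel):
    """Valid 1D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(seq[i + j] * kernel[j] for j in range(k)))
        for i in range(len(seq) - k + 1)
    ]


def multilevel_features(seq, kernels):
    """Stack conv layers and keep a pooled summary of every layer's output.

    A plain deep CNN would keep only the last entry; returning one value per
    layer is a simplified stand-in for fusing low- and high-level features.
    """
    pooled = []
    x = seq
    for kernel in kernels:
        x = conv1d_relu(x, kernel)
        pooled.append(max(x))  # global max-pool of this layer's output
    return pooled


# Hypothetical encoded sequence and two layers of width-2 kernels.
print(multilevel_features([1, 2, 3, 4, 5], [[1, 1], [1, 1]]))  # [9, 16]
```

A last-layer-only model would see just `16`; the fused vector also retains the shallower layer's `9`.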
2. Pinel P, Guichaoua G, Najm M, Labouille S, Drizard N, Gaston-Mathé Y, Hoffmann B, Stoven V. Exploring isofunctional molecules: Design of a benchmark and evaluation of prediction performance. Mol Inform 2023; 42:e2200216. [PMID: 36633361 DOI: 10.1002/minf.202200216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/19/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023]
Abstract
Identifying novel chemotypes with biological activity similar to that of a known active molecule is an important challenge in drug discovery, called 'scaffold hopping'. Small-, medium-, and large-step scaffold hopping may yield increasing degrees of chemical structure novelty with respect to the parent compound. In the present paper, we focus on large-step scaffold hopping. We assembled a high-quality, well-characterized dataset of scaffold hopping examples, comprising pairs of active molecules against a variety of protein targets. This dataset was used to build a benchmark that mirrors real-life applications: one active molecule is known, and the second active is searched for among a set of decoys chosen to avoid statistical bias. This allowed us to evaluate the performance of computational methods on large-step scaffold hopping problems. In particular, we assessed how difficult these problems are for classical 2D and 3D ligand-based methods. We also showed that a machine-learning chemogenomic algorithm outperforms classical methods, and we provide some useful hints for future improvements.
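The benchmark setting (one known active, a second active hidden among decoys) can be scored with a classical 2D ligand-based baseline. The sketch below assumes fingerprints are given as sets of on-bit indices and ranks candidates by Tanimoto similarity to the known active; the fingerprint contents and the `rank_of_active` helper are hypothetical, not the authors' pipeline. Large-step hops are expected to be hard for such a baseline because the second active shares few substructure bits with the query.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_of_active(query, active, decoys):
    """1-based rank of the hidden active when all candidates are sorted by
    decreasing similarity to the query (ties are resolved in the active's favour)."""
    score = tanimoto(query, active)
    return 1 + sum(1 for d in decoys if tanimoto(query, d) > score)


query = {1, 2, 3, 4}
active = {1, 2, 9}            # shares only 2 of 5 bits: a "large-step" hop
decoys = [{1, 2, 3}, {7, 8}]  # hypothetical decoy fingerprints
print(rank_of_active(query, active, decoys))  # 2
```

A lower rank is better; averaging this rank over many query/active pairs gives a simple benchmark score.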
Affiliation(s)
- Philippe Pinel
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75428, Paris, France; Iktos SAS, 75017, Paris, France
- Gwenn Guichaoua
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75428, Paris, France
- Matthieu Najm
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75428, Paris, France
- Véronique Stoven
- Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75428, Paris, France
3. Hamre J, Jafri MS. Optimizing peptide inhibitors of SARS-Cov-2 nsp10/nsp16 methyltransferase predicted through molecular simulation and machine learning. Inform Med Unlocked 2022; 29:100886. [PMID: 35252541 PMCID: PMC8883729 DOI: 10.1016/j.imu.2022.100886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Revised: 02/04/2022] [Accepted: 02/16/2022] [Indexed: 11/30/2022]
Abstract
Coronaviruses, including the recent pandemic strain SARS-Cov-2, use a multifunctional 2′-O-methyltransferase (2′-O-MTase) to restrict the host defense mechanism and to methylate RNA. The nonstructural protein 16 2′-O-MTase (nsp16) becomes active when nonstructural protein 10 (nsp10) and nsp16 interact. Novel peptide drugs have shown promise in the treatment of numerous diseases, and recent research has established that nsp10-derived peptides can disrupt viral methyltransferase activity through interaction with nsp16. This study aimed to optimize new analogous nsp10 peptides that bind nsp16 with affinity equal to or higher than that of the naturally occurring peptides. This work demonstrates that in silico molecular simulations can shed light on peptide structures and predict the potential of new peptides to interrupt methyltransferase activity via the nsp10/nsp16 interface. The simulations suggest that misalignments at residues F68, H80, I81, D94, and Y96, or rotation at H80, abrogate MTase function. We develop a new set of peptides based on conserved regions of the nsp10 protein across Coronaviridae species and test them against known MTase variant values. This leads to the prediction that the H80R variant is a strong candidate for further testing. We envision this lead as the starting point of a new computational approach to combating coronaviruses that benefits peptide drug development.
4. Najm M, Azencott CA, Playe B, Stoven V. Drug Target Identification with Machine Learning: How to Choose Negative Examples. Int J Mol Sci 2021; 22:ijms22105118. [PMID: 34066072 PMCID: PMC8151112 DOI: 10.3390/ijms22105118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 04/30/2021] [Accepted: 05/07/2021] [Indexed: 11/24/2022]
Abstract
Identifying the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search by limiting the number of required experiments. However, the drug-target interaction databases used for training present strong statistical bias, which leads to many false positives and thus increases the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs and, more globally, for 200 approved drugs. For both the three detailed drug examples and the larger set of 200 drugs, training with the proposed scheme for choosing negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased and, overall, the rank of the true targets improved. Our method corrects the statistical bias of the databases and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
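One simple way to satisfy the stated constraint (each drug and each protein appears as often among negatives as among positives) is to permute the protein column of the positive pairs and reject permutations that recreate a known interaction or a duplicate pair. This rejection-sampling sketch is an assumption for illustration, not necessarily the sampling procedure used in the paper, and it can fail on dense interaction sets where few valid permutations exist.

```python
import random
from collections import Counter


def balanced_negatives(positives, max_tries=10_000, seed=0):
    """Return (drug, protein) pairs with the same per-drug and per-protein
    counts as the positives, none of which is a known positive."""
    rng = random.Random(seed)
    known = set(positives)
    drugs = [d for d, _ in positives]
    proteins = [p for _, p in positives]
    for _ in range(max_tries):
        rng.shuffle(proteins)  # a permutation keeps every protein's count
        candidate = list(zip(drugs, proteins))  # unchanged drug order keeps drug counts
        if len(set(candidate)) == len(candidate) and not known.intersection(candidate):
            return candidate
    raise RuntimeError("no valid balanced negative set found; try more tries")


# Hypothetical interactions: d1 has two known targets, d2 and d3 one each.
positives = [("d1", "p1"), ("d1", "p2"), ("d2", "p3"), ("d3", "p4")]
negatives = balanced_negatives(positives)
print(Counter(d for d, _ in negatives) == Counter(d for d, _ in positives))  # True
```

Because only the pairing changes, a drug with many known targets also contributes many negatives, so a model cannot score well simply by learning which drugs or proteins are frequent in the database.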
Affiliation(s)
- Matthieu Najm
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Institut Curie, 75248 Paris, France
- INSERM U900, 75428 Paris, France
- Correspondence:
- Chloé-Agathe Azencott
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Institut Curie, 75248 Paris, France
- INSERM U900, 75428 Paris, France
- Benoit Playe
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Institut Curie, 75248 Paris, France
- INSERM U900, 75428 Paris, France
- Véronique Stoven
- Center for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Institut Curie, 75248 Paris, France
- INSERM U900, 75428 Paris, France
5. Mulligan VK. The emerging role of computational design in peptide macrocycle drug discovery. Expert Opin Drug Discov 2020; 15:833-852. [PMID: 32345066 DOI: 10.1080/17460441.2020.1751117] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022]
Abstract
Drug discovery is a laborious process with rising cost per new drug. Peptide macrocycles are promising therapeutics, though conformational flexibility can reduce target affinity and specificity. Recent computational advancements address this problem by enabling rational design of rigidly folded peptide macrocycles. AREAS COVERED This review summarizes currently approved peptide macrocycle therapeutics and discusses advantages of mesoscale drugs over small molecules or protein therapeutics. It describes the history, rationale, and state of the art of computational tools, such as Rosetta, that allow the design of rigidly structured peptide macrocycles. The emerging pipeline for designing peptide macrocycle drugs is described, including current challenges in designing permeable molecules that can emulate the chameleonic behavior of natural macrocycles. Prospects for reducing computational cost and improving accuracy with emerging computational technologies are also discussed. EXPERT OPINION To embrace computational design of peptide macrocycle drugs, we must shift current attitudes regarding the role of computation in drug discovery, and move beyond Lipinski's rules. This technology has the potential to shift failures to earlier in silico stages of the drug discovery process, improving success rates in costly clinical trials. Given the available tools, now is the time for drug developers to incorporate peptide macrocycle design into drug discovery pipelines.
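For reference, Lipinski's rule of five is a heuristic for oral drug-likeness: poor absorption or permeation is considered more likely when a compound has more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a molecular weight above 500 Da, or a calculated logP above 5, and a common reading tolerates at most one violation. Peptide macrocycles typically exceed the molecular-weight limit, which is the context for the call to move beyond these rules. The sketch below takes precomputed descriptor values; the example numbers are hypothetical and do not describe any real drug.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count rule-of-five violations from precomputed descriptors."""
    checks = [
        mol_weight > 500,  # molecular weight in Da
        logp > 5,          # calculated octanol-water partition coefficient
        h_donors > 5,      # hydrogen-bond donors
        h_acceptors > 10,  # hydrogen-bond acceptors
    ]
    return sum(checks)


def passes_rule_of_five(*descriptors):
    """Common reading of the rule: at most one violation is tolerated."""
    return lipinski_violations(*descriptors) <= 1


print(passes_rule_of_five(350.0, 2.1, 2, 4))    # True  (hypothetical small molecule)
print(passes_rule_of_five(1200.0, 3.0, 6, 12))  # False (hypothetical macrocycle)
```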
Affiliation(s)
- Vikram K Mulligan
- Systems Biology, Center for Computational Biology, Flatiron Institute, New York, NY, USA
6. Playe B, Stoven V. Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity. J Cheminform 2020; 12:11. [PMID: 33431042 PMCID: PMC7011501 DOI: 10.1186/s13321-020-0413-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/07/2019] [Accepted: 01/27/2020] [Indexed: 01/09/2023]
Abstract
Chemogenomics, also called proteochemometrics, covers a range of computational methods that can be used to predict protein–ligand interactions at large scale in both the protein and chemical spaces. They differ from more classical ligand-based methods (also called QSAR), which predict ligands for a given protein receptor. In the drug discovery process, chemogenomics makes it possible to predict off-target proteins for drug candidates; off-target interactions are one of the main causes of undesirable side effects and of failure in drug development. The present study compares shallow and deep machine-learning approaches for chemogenomics and explores data augmentation techniques for deep learning algorithms in chemogenomics. Shallow machine-learning algorithms rely on expert-based chemical and protein descriptors, whereas recent deep learning algorithms can learn abstract numerical representations of molecular graphs and protein sequences that are optimised for the prediction task. We first propose a formulation of chemogenomics with deep learning, called the chemogenomic neural network (CN): a feed-forward neural network whose input combines the molecule and protein representations learnt by molecular graph and protein sequence encoders. We show that, on large datasets, the deep learning CN model outperforms state-of-the-art shallow methods and competes with deep methods that use expert-based descriptors. However, on small datasets, shallow methods achieve better prediction performance than deep learning methods. We then evaluate data augmentation techniques, namely multi-view and transfer learning, to improve the prediction performance of the chemogenomic neural network. We conclude that a promising research direction is to integrate heterogeneous sources of data, such as auxiliary tasks for which large datasets are available or, independently, multiple molecule and protein attribute views.
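The chemogenomic neural network is described as a feed-forward network applied to the combination of a molecule representation and a protein representation. The minimal sketch below assumes both encoders have already produced fixed-length vectors and that "combination" means concatenation, though the paper may combine them differently. It uses one hypothetical hidden layer with hand-set weights in place of trained ones.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, vi) for vi in v]


def pair_score(mol_vec, prot_vec, w1, b1, w2, b2):
    """Interaction probability for one (molecule, protein) pair."""
    x = mol_vec + prot_vec           # concatenate the two learned representations
    hidden = relu(dense(x, w1, b1))  # hidden layer
    logit = dense(hidden, [w2], [b2])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid output


# Hypothetical 2-d molecule and protein embeddings and weights.
p = pair_score([1.0, 0.0], [0.0, 1.0],
               w1=[[1, 0, 0, 1], [0, 1, 1, 0]], b1=[0.0, 0.0],
               w2=[1.0, 1.0], b2=-2.0)
print(p)  # 0.5
```

In a trained model the encoders and these weights would be learned jointly; with a concatenated input, the first weight matrix can mix molecule and protein features.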
Affiliation(s)
- Benoit Playe
- Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75248, Paris, France
- Veronique Stoven
- Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75248, Paris, France
7. Abbasi K, Poso A, Ghasemi J, Amanlou M, Masoudi-Nejad A. Deep Transferable Compound Representation across Domains and Tasks for Low Data Drug Discovery. J Chem Inf Model 2019; 59:4528-4539. [PMID: 31661955 DOI: 10.1021/acs.jcim.9b00626] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 01/23/2023]
Abstract
The main problem of small molecule-based drug discovery is to find a candidate molecule with increased pharmacological activity, proper ADME, and low toxicity. Recently, machine learning has made a significant contribution to drug discovery. However, many machine learning methods, such as deep learning-based approaches, require a large amount of training data to make accurate predictions on unseen data. In the lead optimization step, little biological data is available on small molecule compounds, which makes applying machine learning methods challenging. The main goal of this study is to design a new approach to handle these situations. To this end, knowledge from a source assay (auxiliary assay) is used to learn a better model for predicting the properties of new compounds in the target assay. Current approaches have not considered that source and target assays are adapted to different target groups with different compound distributions. In this paper, we propose a new architecture that combines a graph convolutional network with an adversarial domain adaptation network to tackle this issue. To evaluate the proposed approach, we applied it to the Tox21, ToxCast, SIDER, HIV, and BACE collections. The results showed the effectiveness of the proposed approach in transferring related knowledge from the source to the target dataset.
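The graph convolutional part of such an architecture can be illustrated with one propagation step of the widely used Kipf–Welling rule, `H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)`. Here A is the molecular graph's adjacency matrix, D is the degree matrix of A + I, H holds per-atom features, and W is a learned weight matrix. This is an assumed, generic GCN layer written in pure Python; the paper's exact layer and its adversarial domain adaptation component are not reproduced.

```python
import math


def gcn_layer(adj, features, weights):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W).

    adj: n x n 0/1 adjacency matrix (atoms as nodes, bonds as edges)
    features: n x f node-feature matrix H
    weights: f x o weight matrix W
    """
    n = len(adj)
    # Add self-loops so each atom keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f = len(features[0])
    # Aggregate each node's neighbourhood (including itself).
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    o = len(weights[0])
    # Linear transform, then ReLU.
    return [[max(0.0, sum(agg[i][c] * weights[c][m] for c in range(f))) for m in range(o)]
            for i in range(n)]


# Two bonded atoms with one feature each; identity weight.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

Stacking several such layers and pooling over atoms yields a compound representation, which a domain-adaptation module could then align between source and target assays.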
Affiliation(s)
- Karim Abbasi
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
- Antti Poso
- School of Pharmacy, Faculty of Health Sciences, University of Eastern Finland, Kuopio 80100, Finland
- Jahanbakhsh Ghasemi
- Chemistry Department, Faculty of Sciences, University of Tehran, Tehran 1417614418, Iran
- Massoud Amanlou
- Drug Design and Development Research Center, Department of Medicinal Chemistry, Tehran University of Medical Sciences, Tehran 1416753955, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417614411, Iran
8. A Multi-Label Learning Framework for Drug Repurposing. Pharmaceutics 2019; 11:pharmaceutics11090466. [PMID: 31505805 PMCID: PMC6781509 DOI: 10.3390/pharmaceutics11090466] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/27/2019] [Revised: 08/22/2019] [Accepted: 09/05/2019] [Indexed: 01/10/2023]
Abstract
Drug repurposing plays an important role in screening old drugs for new therapeutic efficacy. Existing methods commonly treat drug-target interaction prediction as a binary classification problem, which requires a large number of randomly sampled drug-target pairs, accounting for over 50% of the entire training dataset. Such a large number of negative examples that do not come from experimental observations inevitably decreases the credibility of predictions. In this study, we propose a multi-label learning framework to find new uses for old drugs and discover new drugs for known target genes. In the framework, each drug is treated as a class label and its target genes are treated as the class-specific training data used to train a supervised l2-regularized logistic regression model. As a result, inter-drug associations are explicitly modelled in the framework, and all the class-specific training data come from experimental observations. In addition, the data requirements are less demanding: for instance, the chemical substructures of a drug are no longer needed, and novel target genes are inferred only from the underlying patterns of the known genes targeted by the drug. Stratified multi-label cross-validation shows that 84.9% of known target genes have at least one drug correctly recognized, and the proposed framework correctly recognizes 86.73% of the independent test drug-target interactions (DTIs) from DrugBank. These results show that the proposed framework could generalize well in the large drug/class space without information on drug chemical structures or target protein structures. Furthermore, we use the trained model to predict new drugs for the known target genes, identify new genes for the old drugs, and infer new associations between old drugs and new disease phenotypes via the OMIM database. Gene ontology (GO) enrichment analyses and the disease associations reported in recent literature provide supporting evidence for the computational results, which may shed light on new clinical therapies for new and/or old disease phenotypes.
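In this framework each drug is a label and an l2-regularized logistic regression is trained over genes. The batch-gradient-descent sketch below trains one independent binary model per drug (a one-vs-rest reading, which is an assumption). The gene feature encoding, learning rate, regularization strength, and toy data are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_l2_logreg(X, y, lam=0.1, lr=0.5, epochs=500):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The bias term is not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_multilabel(X, labels_by_drug, **kwargs):
    """One independent l2-regularized model per drug (class label)."""
    return {drug: train_l2_logreg(X, y, **kwargs) for drug, y in labels_by_drug.items()}


def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical 2-feature gene descriptors; drug_A targets the first two genes.
genes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
models = train_multilabel(genes, {"drug_A": [1, 1, 0, 0], "drug_B": [0, 0, 1, 1]})
print(predict(models["drug_A"], [0.95, 0.05]) > 0.5)  # True
```

Scoring an unseen gene against every drug model gives a ranked list of candidate drugs for that gene, mirroring the "new drugs for known target genes" use described above.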