1
Zhang C, Jørgensen FS, van de Weert M, Bjerregaard S, Rantanen J, Yang M. Amino acids as stabilizers for lysozyme during the spray-drying process and storage. Int J Pharm 2024; 659:124217. [PMID: 38734275 DOI: 10.1016/j.ijpharm.2024.124217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 05/07/2024] [Accepted: 05/08/2024] [Indexed: 05/13/2024]
Abstract
Amino acids (AAs) have been used as excipients in both solid- and liquid-state protein formulations because of their stabilizing effect. However, the mechanisms by which they stabilize proteins have not yet been fully elucidated. The purpose of this study was to investigate the effect of AAs with distinct physicochemical properties on the stability of a model protein (lysozyme, LZM) during the spray-drying process and subsequent storage. Multivariate data analysis based on molecular descriptors was used to select distinct AAs from the 20 natural AAs: arginine (Arg), leucine (Leu), glycine (Gly), tryptophan (Trp), and aspartic acid (Asp) were selected for their distinct properties using principal component analysis (PCA). LZM and each of the five selected AAs (1:1 weight ratio) were then spray-dried (SD). The solid form, residual moisture content (RMC), hygroscopicity, and morphology of the powders, as well as the secondary/tertiary structure and enzymatic activity of LZM, were evaluated before and after storage at 40 °C/75 % RH for 30 days. The SD LZM powders containing Arg, Trp, or Asp were amorphous, while those containing Leu or Gly were crystalline. Recrystallization of Arg, Trp, and Asp and a polymorphic transition of Gly were observed after storage under accelerated conditions. The morphology of the SD particles varied with the AA formulated with LZM, implying different drying kinetics for the five model systems. A change in the tertiary structure of LZM was observed in the SD powder containing Arg, while the enzymatic activity of LZM decreased after storage in the powders containing Arg or Asp. This can be attributed to the extremely basic and acidic conditions that Arg and Asp create, respectively.
This study suggests that when AAs are used as stabilizers instead of traditional disaccharides, not only do classic vitrification theory and water replacement theory play a role, but the microenvironmental pH created by basic or acidic AAs, in the starting solution or during storage of the solid, is also crucial for the stability of SD protein products.
Affiliation(s)
- Chengqian Zhang
- Department of Pharmacy, University of Copenhagen, Copenhagen, Denmark
- Jukka Rantanen
- Department of Pharmacy, University of Copenhagen, Copenhagen, Denmark
- Mingshi Yang
- Department of Pharmacy, University of Copenhagen, Copenhagen, Denmark; Wuya College of Innovation, Shenyang Pharmaceutical University, Shenyang, China
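The amino acid selection step in the Zhang et al. abstract (PCA on molecular descriptors of the 20 natural AAs, then picking five distinct ones) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' workflow. The three descriptors (Kyte-Doolittle hydropathy, molecular weight, isoelectric point; approximate textbook values) and the greedy farthest-point rule used to pick "distinct" AAs are choices made here, so the picks need not match Arg, Leu, Gly, Trp and Asp. Power iteration with deflation stands in for a linear-algebra library.

```python
import math

# Approximate textbook descriptors (assumption: the study's own descriptor set is
# not listed in the abstract). Columns: Kyte-Doolittle hydropathy,
# molecular weight (g/mol), isoelectric point (pI).
DESCRIPTORS = {
    "Ala": (1.8, 89.09, 6.01), "Arg": (-4.5, 174.20, 10.76), "Asn": (-3.5, 132.12, 5.41),
    "Asp": (-3.5, 133.10, 2.77), "Cys": (2.5, 121.16, 5.07), "Gln": (-3.5, 146.15, 5.65),
    "Glu": (-3.5, 147.13, 3.22), "Gly": (-0.4, 75.07, 5.97), "His": (-3.2, 155.16, 7.59),
    "Ile": (4.5, 131.17, 6.02), "Leu": (3.8, 131.17, 5.98), "Lys": (-3.9, 146.19, 9.74),
    "Met": (1.9, 149.21, 5.74), "Phe": (2.8, 165.19, 5.48), "Pro": (-1.6, 115.13, 6.48),
    "Ser": (-0.8, 105.09, 5.68), "Thr": (-0.7, 119.12, 5.87), "Trp": (-0.9, 204.23, 5.89),
    "Tyr": (-1.3, 181.19, 5.66), "Val": (4.2, 117.15, 5.97),
}


def standardize(rows):
    """Scale every column to zero mean and unit (sample) variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def top_eigenpair(c, iters=1000):
    """Power iteration for the dominant eigenvalue/eigenvector of a symmetric matrix."""
    m = len(c)
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(m)) for i in range(m))
    return lam, v


def pca_scores(rows, k=2):
    """Return PC scores (n x k) and the fraction of variance each PC explains."""
    z = standardize(rows)
    n, m = len(z), len(z[0])
    # Correlation matrix of the standardized descriptors.
    c = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)] for i in range(m)]
    comps, eigvals = [], []
    for _ in range(k):
        lam, v = top_eigenpair(c)
        comps.append(v)
        eigvals.append(lam)
        # Deflation: remove the found component so the next pass finds the next PC.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    scores = [[sum(r[j] * v[j] for j in range(m)) for v in comps] for r in z]
    # The trace of a correlation matrix equals the number of variables m.
    return scores, [lam / m for lam in eigvals]


def pick_distinct(names, scores, k=5):
    """Greedy farthest-point selection in PC space (an assumed criterion)."""
    chosen = [max(range(len(names)), key=lambda i: math.hypot(*scores[i]))]
    while len(chosen) < k:
        rest = [i for i in range(len(names)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(math.dist(scores[i], scores[j]) for j in chosen)))
    return [names[i] for i in chosen]


names = list(DESCRIPTORS)
scores, explained = pca_scores([DESCRIPTORS[a] for a in names])
print("explained variance:", [round(e, 2) for e in explained])
print("selected:", pick_distinct(names, scores))
```

Farthest-point selection is only one way to turn "distinct" into a rule; other descriptors or a different criterion (for example, taking the extremes along each PC) would change the selection.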
2
Karolak A, Urbaniak K, Monastyrskyi A, Duckett DR, Branciamore S, Stewart PA. Structure-independent machine-learning predictions of the CDK12 interactome. Biophys J 2024:S0006-3495(24)00344-8. [PMID: 38762754 DOI: 10.1016/j.bpj.2024.05.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 04/24/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024]
Abstract
Cyclin-dependent kinase 12 (CDK12) is a critical regulatory protein involved in transcription and DNA repair. Dysregulation of CDK12 has been implicated in various diseases, including cancer. Understanding the CDK12 interactome is pivotal for elucidating its functional roles and identifying potential therapeutic targets. Traditional methods for interactome prediction often rely on protein structure information, which limits their applicability to CDK12, whose C-terminal region is partly disordered. In this study, we present a structure-independent machine-learning model that uses protein sequence and functional data to predict the CDK12 interactome. This approach is motivated by the disordered character of the CDK12 C-terminal region, which hinders a structure-driven search for binding partners. Our approach incorporates multiple data sources, including protein-protein interaction networks, functional annotations, and sequence-based features, to construct a comprehensive CDK12 interactome prediction model. The ability to predict CDK12 interactions without relying on structural information is a significant advancement, as many potential interaction partners may lack crystallographic data. In conclusion, our structure-independent machine-learning model presents a powerful tool for predicting the CDK12 interactome and holds promise for advancing our understanding of CDK12 biology, identifying potential therapeutic targets, and facilitating precision-medicine approaches for CDK12-associated diseases.
Affiliation(s)
- Konstancja Urbaniak
- Department of Computational and Quantitative Medicine, City of Hope, Duarte, California
- Derek R Duckett
- Department of Drug Discovery, Moffitt Cancer Center, Tampa, Florida
- Sergio Branciamore
- Department of Computational and Quantitative Medicine, City of Hope, Duarte, California
- Paul A Stewart
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida
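As a rough illustration of combining sequence, functional and network data into one feature vector per candidate partner, the sketch below concatenates amino acid composition differences with annotation overlap and PPI-neighbour overlap. All feature choices, the field names (`seq`, `annotations`, `neighbors`) and the toy records are hypothetical; the Karolak et al. abstract does not specify the authors' features or model.

```python
# Toy multi-source pair features (hypothetical; not the authors' feature set).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Amino acid composition: fraction of each of the 20 residues in the sequence."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(query, candidate):
    """Concatenate sequence, functional and network features for one (query, candidate) pair."""
    seq_part = [abs(x - y) for x, y in zip(composition(query["seq"]), composition(candidate["seq"]))]
    return seq_part + [
        jaccard(query["annotations"], candidate["annotations"]),  # shared functional terms
        jaccard(query["neighbors"], candidate["neighbors"]),      # shared PPI-network neighbours
    ]


# Toy records: sequences, annotation terms and neighbour names are invented placeholders.
query = {"seq": "MPKRSSPEKK", "annotations": {"kinase", "transcription"}, "neighbors": {"P1", "P2", "P3"}}
candidate = {"seq": "MSKRTEPKKA", "annotations": {"transcription"}, "neighbors": {"P2", "P3", "P4"}}
features = pair_features(query, candidate)
print(len(features), features[-2:])
```

A vector like this could feed any standard classifier; because no structural coordinates are involved, it applies equally to disordered regions.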
3
Huang J, Osthushenrich T, MacNamara A, Mälarstig A, Brocchetti S, Bradberry S, Scarabottolo L, Ferrada E, Sosnin S, Digles D, Superti-Furga G, Ecker GF. ProteoMutaMetrics: machine learning approaches for solute carrier family 6 mutation pathogenicity prediction. RSC Adv 2024; 14:13083-13094. [PMID: 38655474 PMCID: PMC11034476 DOI: 10.1039/d4ra00748d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024]
Abstract
The solute carrier family 6 (SLC6) transporters are of key interest because of their critical role in transporting small amino acids and amino acid-like molecules. Their dysfunction is strongly associated with human diseases such as schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may provide insights into the structure-function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data were retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As the encoding approach, amino acid descriptors were used to calculate the average sequence properties of both the original and the mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains, defined according to the transmembrane domains of the SLC6 transporters, to analyse each region's contribution to the pathogenicity prediction. Subsequently, several classification models, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were built with hyperparameters optimized through grid search. Model performance was estimated with repeated stratified k-fold cross-validation. The accuracy values of the generated models ranged from 0.72 to 0.80. Feature importance analysis indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk of pathogenicity. When the model was applied to an independent validation set, accuracy dropped to an average of 0.6, with high precision but low sensitivity.
Affiliation(s)
- Jiahui Huang
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
- Tanja Osthushenrich
- Bayer AG, Division Pharmaceuticals, Biomedical Data Science II, Wuppertal, Germany
- Aidan MacNamara
- Bayer AG, Division Pharmaceuticals, Biomedical Data Science II, Wuppertal, Germany
- Anders Mälarstig
- Emerging Science & Innovation, Pfizer Worldwide Research, Development and Medical, Cambridge, MA, USA
- Evandro Ferrada
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Sergey Sosnin
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
- Daniela Digles
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
- Giulio Superti-Furga
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Gerhard F Ecker
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
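The encoding and evaluation steps in the SLC6 abstract can be sketched as follows: average a per-residue descriptor over the original and the mutated sequence, for the full sequence and for each domain, then generate repeated stratified k-fold splits. Kyte-Doolittle hydropathy stands in for the unspecified amino acid descriptors, and the domain boundaries and toy sequence are hypothetical.

```python
import random
from collections import defaultdict

# Kyte-Doolittle hydropathy as a stand-in amino acid descriptor
# (assumption: the abstract does not say which descriptors were used).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def mean_property(seq):
    """Average descriptor value over a (sub)sequence."""
    return sum(KD[a] for a in seq) / len(seq)


def encode_mutation(seq, pos, new_aa, domains):
    """Full-sequence and per-domain averages for the original and mutated sequence.

    pos is 1-based; domains is a list of 1-based inclusive (start, end) ranges
    (hypothetical boundaries standing in for the twelve SLC6 regions).
    """
    mut = seq[: pos - 1] + new_aa + seq[pos:]
    feats = [mean_property(seq), mean_property(mut)]
    for start, end in domains:
        feats += [mean_property(seq[start - 1:end]), mean_property(mut[start - 1:end])]
    return feats


def repeated_stratified_kfold(labels, k=5, repeats=3, seed=0):
    """Yield (train, test) index lists; each fold keeps the class proportions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        folds = [[] for _ in range(k)]
        for idx in by_class.values():
            rng.shuffle(idx)  # reshuffle per repeat so the folds differ
            for j, i in enumerate(idx):
                folds[j % k].append(i)  # deal each class round-robin across folds
        for f in range(k):
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, sorted(folds[f])


seq = "MKTLLVAGLLALSAW"  # toy sequence, not an SLC6 transporter
feats = encode_mutation(seq, 4, "P", [(1, 5), (6, 10), (11, 15)])
print([round(x, 3) for x in feats])
```

A single substitution shifts a full-length average by only (new - old)/L, so the per-domain averages make the affected region far more visible, which fits the abstract's aim of analysing each region's contribution.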
4
Wang MQ, You ZN, Yang BY, Xia ZW, Chen Q, Pan J, Li CX, Xu JH. Machine-Learning-Guided Engineering of an NADH-Dependent 7β-Hydroxysteroid Dehydrogenase for Economic Synthesis of Ursodeoxycholic Acid. J Agric Food Chem 2023; 71:19672-19681. [PMID: 38016669 DOI: 10.1021/acs.jafc.3c06339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
Enzymatic synthesis of ursodeoxycholic acid (UDCA) catalyzed by an NADH-dependent 7β-hydroxysteroid dehydrogenase (7β-HSDH) is more economical than synthesis with an NADPH-dependent 7β-HSDH, given the much higher cost of NADP+/NADPH compared with NAD+/NADH. However, the poor catalytic performance of NADH-dependent 7β-HSDH significantly limits its practical application. Herein, machine-learning-guided protein engineering was performed on an NADH-dependent Rt7β-HSDHM0 from Ruminococcus torques. We combined random forest, a Gaussian naïve Bayes classifier, and Gaussian process regression with limited experimental data, yielding the best variant, Rt7β-HSDHM3 (R40I/R41K/F94Y/S196A/Y253F), with 4.1-fold and 8.3-fold improvements in specific activity and half-life (40 °C), respectively. A preparative biotransformation using a "two stage in one pot" sequential process coupled with Rt7β-HSDHM3 achieved a space-time yield (STY) of 192 g L⁻¹ d⁻¹, which is so far the highest productivity for the biosynthesis of UDCA from chenodeoxycholic acid (CDCA) with NAD+ as a cofactor. More importantly, the raw-material cost of enzymatic UDCA production with Rt7β-HSDHM3 was 22% lower than with Rt7β-HSDHM0, indicating the tremendous potential of Rt7β-HSDHM3 for more efficient and economical UDCA production.
Affiliation(s)
- Mu-Qiang Wang
- Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Zhi-Neng You
- Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Bing-Yi Yang
- Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Zi-Wei Xia
- Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Qi Chen
- Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Shanghai Collaborative Innovation Center for Biomanufacturing, School of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Jiang Pan
- Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Shanghai Collaborative Innovation Center for Biomanufacturing, School of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Chun-Xiu Li
- Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Shanghai Collaborative Innovation Center for Biomanufacturing, School of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Jian-He Xu
- Laboratory of Biocatalysis and Synthetic Biotechnology, State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Shanghai Collaborative Innovation Center for Biomanufacturing, School of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
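The space-time yield quoted in the Wang et al. abstract follows the usual definition of product mass per reaction volume per unit time, and the reported improvements are ratios relative to Rt7β-HSDHM0. In the worked line, the 96 g / 1 L / 12 h batch is hypothetical and only shows how a figure of 192 g L⁻¹ d⁻¹ could arise. It is not the authors' actual batch. The symbols a (specific activity), t_{1/2} (half-life at 40 °C) and C (raw-material cost) are introduced here for illustration.

```latex
\mathrm{STY} = \frac{m_{\mathrm{UDCA}}}{V\,t},
\qquad \text{e.g.}\quad
\frac{96\ \mathrm{g}}{1\ \mathrm{L} \times 0.5\ \mathrm{d}} = 192\ \mathrm{g\,L^{-1}\,d^{-1}}

\frac{a_{\mathrm{M3}}}{a_{\mathrm{M0}}} = 4.1,
\qquad
\frac{t_{1/2,\mathrm{M3}}}{t_{1/2,\mathrm{M0}}} = 8.3,
\qquad
C_{\mathrm{M3}} = (1 - 0.22)\,C_{\mathrm{M0}} = 0.78\,C_{\mathrm{M0}}
```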
5
Kumar A, Rana PS. A deep learning based ensemble approach for protein allergen classification. PeerJ Comput Sci 2023; 9:e1622. [PMID: 37869456 PMCID: PMC10588724 DOI: 10.7717/peerj-cs.1622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 09/07/2023] [Indexed: 10/24/2023]
Abstract
In recent years, population growth has increased the demand for industrially processed foods and other consumable products. To keep up with consumer demand, these industries regularly alter the proteins found in raw materials to generate more commercially viable end-products. Such modifications can yield substances that cause allergic reactions in consumers, that is, protein allergens. Detecting such proteins in various substances is essential for the prevention, diagnosis, and treatment of allergic conditions. Bioinformatics and computational methods can analyze the information contained in amino acid sequences to detect possible allergens. This article presents a deep learning-based ensemble approach to identify protein allergens using Extra Trees, Deep Belief Network (DBN), and CatBoost models. The proposed ensemble achieves higher detection accuracy by combining the predictions of the three models through majority voting. The proposed model was evaluated on a benchmark protein allergen dataset, and the performance analysis showed that it outperforms other state-of-the-art techniques from the literature, with a protein allergen detection accuracy of 89.16%.
Affiliation(s)
- Arun Kumar
- Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
- Prashant Singh Rana
- Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
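The majority-voting step of the Kumar and Rana ensemble reduces to a per-sample vote over the three models' predicted labels. In this minimal sketch the three prediction lists are hypothetical placeholders for Extra Trees, DBN and CatBoost outputs; no models are trained.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-sample class labels from several models by majority vote.

    Each argument is one model's list of predicted labels (same sample order).
    With three models and two classes a tie is impossible; in general,
    Counter.most_common lists equal counts in first-seen order, so ties go to
    the label from the earliest-listed model.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


def accuracy(pred, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical labels (1 = allergen, 0 = non-allergen) standing in for the outputs
# of the Extra Trees, DBN and CatBoost models.
extra_trees = [1, 0, 1, 1]
dbn = [1, 1, 0, 1]
catboost = [0, 0, 1, 0]
ensemble = majority_vote(extra_trees, dbn, catboost)
print(ensemble)  # [1, 0, 1, 1]
```

Using an odd number of binary classifiers is what makes hard voting tie-free here; with an even number, the tie-breaking rule above would matter.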
6
Gorostiola González M, van den Broek RL, Braun TGM, Chatzopoulou M, Jespers W, IJzerman AP, Heitman LH, van Westen GJP. 3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors. J Cheminform 2023; 15:74. [PMID: 37641107 PMCID: PMC10463931 DOI: 10.1186/s13321-023-00745-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 08/10/2023] [Indexed: 08/31/2023]
Abstract
Proteochemometric (PCM) modelling is a powerful computational drug discovery tool used to predict the bioactivity of potential drug candidates from both chemical and protein information. In PCM, features are computed to describe small molecules and proteins, and these features directly affect the quality of the predictive models. State-of-the-art protein descriptors, however, are calculated from the protein sequence and neglect the dynamic nature of proteins. This dynamic nature can be simulated computationally with molecular dynamics (MD). Here, novel 3D dynamic protein descriptors (3DDPDs) were designed for bioactivity prediction tasks with PCM models. As a test case, publicly available G protein-coupled receptor (GPCR) MD data from GPCRmd were used. GPCRs are membrane-bound proteins that are activated by hormones and neurotransmitters and constitute an important target family for drug discovery. GPCRs exist in different conformational states that allow the transmission of diverse signals and that can be modified by ligand interactions, among other factors. To translate the MD-encoded protein dynamics, two types of 3DDPDs were considered: one-hot encoded residue-specific (rs) and embedding-like protein-specific (ps) 3DDPDs. The descriptors were developed by calculating distributions of trajectory coordinates and partial charges, applying dimensionality reduction, and then condensing the results into vectors per residue or per protein, respectively. 3DDPDs were benchmarked on several PCM tasks against state-of-the-art non-dynamic protein descriptors. Our rs- and ps3DDPDs outperformed non-dynamic descriptors in regression tasks using a temporal split and showed comparable performance with a random split and in all classification tasks. Combining non-dynamic descriptors with 3DDPDs did not increase performance. Finally, the ability of 3DDPDs to capture dynamic fluctuations in mutant GPCRs was explored.
The results presented here show the potential of including protein dynamics information in machine learning tasks, specifically bioactivity prediction, and open opportunities for applications in drug discovery, including oncology.
Affiliation(s)
- Marina Gorostiola González
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- ONCODE Institute, Leiden, The Netherlands
- Remco L van den Broek
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Thomas G M Braun
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Magdalini Chatzopoulou
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Willem Jespers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- Laura H Heitman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- ONCODE Institute, Leiden, The Netherlands
- Gerard J P van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
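To illustrate how trajectory distributions can be condensed into per-residue and per-protein vectors, the sketch below summarizes each residue's coordinate fluctuations (per-axis standard deviation and RMSF) and averages them into a protein vector. It is a deliberately simplified stand-in for 3DDPDs and assumes pre-aligned frames with one atom per residue. It also omits the partial charges, the dimensionality-reduction step and the one-hot/embedding encodings described in the abstract.

```python
import math
from statistics import mean, pstdev


def residue_fluctuations(trajectory):
    """Per-residue fluctuation features from a pre-aligned trajectory.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples, one per
    residue (e.g., one C-alpha atom per residue; hypothetical input layout).
    Returns [sd_x, sd_y, sd_z, rmsf] for each residue, computed over frames.
    """
    features = []
    for r in range(len(trajectory[0])):
        xs, ys, zs = zip(*(frame[r] for frame in trajectory))
        sds = [pstdev(xs), pstdev(ys), pstdev(zs)]
        # RMSF = sqrt of the mean squared displacement from the average position.
        features.append(sds + [math.sqrt(sum(s * s for s in sds))])
    return features


def protein_vector(residue_features):
    """Condense residue vectors into one protein-level vector by averaging each column."""
    return [mean(col) for col in zip(*residue_features)]


# Toy 3-frame trajectory of 2 residues: residue 0 is rigid, residue 1 moves along z.
traj = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 3.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 2.0)],
]
res = residue_fluctuations(traj)
print(res, protein_vector(res))
```

The per-residue rows loosely mirror the residue-specific idea, and the averaged row mirrors the protein-specific idea, though the published descriptors are built differently.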
7
Jagota M, Ye C, Albors C, Rastogi R, Koehl A, Ioannidis N, Song YS. Cross-protein transfer learning substantially improves disease variant prediction. Genome Biol 2023; 24:182. [PMID: 37550700 PMCID: PMC10408151 DOI: 10.1186/s13059-023-03024-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/06/2022] [Accepted: 07/27/2023] [Indexed: 08/09/2023]
Abstract
BACKGROUND: Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity.
RESULTS: We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications, and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes.
CONCLUSIONS: Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.
Affiliation(s)
- Milind Jagota
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
- Chengzhong Ye
- Department of Statistics, University of California, Berkeley, 94720, CA, USA
- Carlos Albors
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
- Ruchir Rastogi
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
- Antoine Koehl
- Department of Statistics, University of California, Berkeley, 94720, CA, USA
- Nilah Ioannidis
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
- Chan Zuckerberg Biohub, San Francisco, 94158, CA, USA
- Center for Computational Biology, University of California, Berkeley, 94720, CA, USA
- Yun S Song
- Computer Science Division, University of California, Berkeley, 94720, CA, USA
- Department of Statistics, University of California, Berkeley, 94720, CA, USA
- Center for Computational Biology, University of California, Berkeley, 94720, CA, USA
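The headline metric in the Jagota et al. abstract, specificity at a fixed 95% sensitivity, can be computed from any set of pathogenicity scores by choosing the strictest threshold that still recovers the required fraction of pathogenic variants. The function and toy scores below are an illustrative sketch, not the authors' evaluation code.

```python
import math


def specificity_at_sensitivity(scores, labels, target=0.95):
    """Specificity at the strictest threshold that still reaches `target` sensitivity.

    scores: higher means more likely pathogenic; labels: 1 = pathogenic, 0 = benign.
    A variant is called pathogenic when its score is >= the threshold.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Smallest number of true positives that achieves the target sensitivity
    # (the small epsilon guards against float error, e.g. 0.95 * 20).
    n_needed = max(1, math.ceil(target * len(pos) - 1e-9))
    threshold = pos[n_needed - 1]
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


# Toy scores, not ClinVar data.
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.75, 0.6]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(specificity_at_sensitivity(scores, labels, target=0.75))  # 0.75
print(specificity_at_sensitivity(scores, labels, target=1.0))   # 0.25
```

Raising the required sensitivity forces a lower threshold, which is why specificity at 95% sensitivity is a demanding comparison point between predictors.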
8
Bournez C, Riool M, de Boer L, Cordfunke RA, de Best L, van Leeuwen R, Drijfhout JW, Zaat SAJ, van Westen GJP. CalcAMP: A New Machine Learning Model for the Accurate Prediction of Antimicrobial Activity of Peptides. Antibiotics (Basel) 2023; 12:antibiotics12040725. [PMID: 37107088 PMCID: PMC10135148 DOI: 10.3390/antibiotics12040725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Revised: 03/24/2023] [Accepted: 03/31/2023] [Indexed: 04/29/2023]
Abstract
To combat infection by microorganisms, host organisms possess a primary arsenal in the innate immune system. This arsenal includes defense peptides able to target a wide range of pathogenic organisms, including bacteria, viruses, parasites, and fungi. Here, we present CalcAMP, a novel machine learning model capable of predicting the activity of antimicrobial peptides (AMPs). AMPs, in particular short ones (<35 amino acids), could become an effective solution to the multidrug resistance problem arising worldwide. Whereas finding potent AMPs through classical wet-lab techniques is still a long and expensive process, a machine learning model can help researchers rapidly identify whether peptides have potential. Our prediction model is based on a new data set constructed from the available public data on AMPs and experimental antimicrobial activities. CalcAMP can predict activity against both Gram-positive and Gram-negative bacteria. Different features, concerning either general physicochemical properties or sequence composition, were assessed to achieve higher prediction accuracy. CalcAMP can be used as a promising prediction tool to identify short AMPs among given peptide sequences.
Affiliation(s)
- Colin Bournez
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
- Martijn Riool
- Department of Medical Microbiology and Infection Prevention, Amsterdam Institute for Infection and Immunity, Amsterdam UMC, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands
- Leonie de Boer
- Department of Medical Microbiology and Infection Prevention, Amsterdam Institute for Infection and Immunity, Amsterdam UMC, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands
- Robert A Cordfunke
- Department of Immunology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
- Leonie de Best
- Madam Therapeutics B.V., Pivot Park Life Sciences Community, Kloosterstraat 9, 5349 AB Oss, The Netherlands
- Remko van Leeuwen
- Madam Therapeutics B.V., Pivot Park Life Sciences Community, Kloosterstraat 9, 5349 AB Oss, The Netherlands
- Jan Wouter Drijfhout
- Department of Immunology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
- Sebastian A J Zaat
- Department of Medical Microbiology and Infection Prevention, Amsterdam Institute for Infection and Immunity, Amsterdam UMC, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands
- Gerard J P van Westen
- Computational Drug Discovery, Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA Leiden, The Netherlands
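The two feature families mentioned for CalcAMP, general physicochemical properties and sequence composition, can be illustrated with a few simple descriptors. The specific features (length, a crude net charge, hydrophobic fraction, residue fractions), the hydrophobic residue set and the toy peptide are assumptions for this sketch. They are not CalcAMP's published feature set.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: one common convention for "hydrophobic" residues; CalcAMP's own
# feature definitions are not given in the abstract.
HYDROPHOBIC = set("AILMFVW")


def peptide_features(seq):
    """Simple physicochemical and composition features for one peptide."""
    seq = seq.upper()
    n = len(seq)
    features = {
        "length": n,
        "is_short": n < 35,  # the abstract's "short AMP" cut-off
        # Crude net charge near neutral pH: +1 per K/R, -1 per D/E; His and termini ignored.
        "net_charge": sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE"),
        "hydrophobic_fraction": sum(seq.count(a) for a in HYDROPHOBIC) / n,
    }
    # Amino acid composition: fraction of each of the 20 standard residues.
    features.update({f"frac_{a}": seq.count(a) / n for a in AMINO_ACIDS})
    return features


f = peptide_features("KKLLKKLLKKLLG")  # toy sequence
print(f["length"], f["net_charge"], round(f["hydrophobic_fraction"], 3))
```

Cationic charge and hydrophobicity are the properties most often associated with membrane-active AMPs, which is why they are natural first features, but a real model would use a richer and validated descriptor set.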
9
Atas Guvenilir H, Doğan T. How to approach machine learning-based prediction of drug/compound-target interactions. J Cheminform 2023; 15:16. [PMID: 36747300 PMCID: PMC9901167 DOI: 10.1186/s13321-023-00689-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/02/2022] [Accepted: 01/30/2023] [Indexed: 02/08/2023]
Abstract
The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling uses both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. Representing input samples (i.e., proteins and their ligands) as quantitative feature vectors is crucial for extracting interaction-related properties during learning and the subsequent prediction of DTIs. Lately, representation learning, in which input samples are automatically featurized by training and applying a machine/deep learning model, has been used in the biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation, with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to serve as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide the datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles.
Our main findings can be summarized in three points: (i) random splitting of datasets into train and test folds leads to near-complete data memorization and produces highly over-optimistic results, and should therefore be avoided; (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties of proteins (e.g., structures) are not used during their self-supervised model training; and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily because of the inherent bias in DTI data, indicating the need for new and unbiased datasets. We hope this study will help researchers design robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
Affiliation(s)
- Heval Atas Guvenilir
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
- Tunca Doğan
- Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Institute of Informatics, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
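Finding (i) in the Atas Guvenilir and Doğan abstract concerns leakage under random splits. One generic way to avoid it is to connect samples whose similarity exceeds a cutoff and assign whole connected components to either train or test. This is shown here as an assumption, not as the authors' network-based procedure. Fingerprints are represented as toy sets of bit indices, and the cutoff and test fraction are arbitrary.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_components(fps, cutoff):
    """Connected components of the graph with an edge wherever similarity >= cutoff."""
    parent = list(range(len(fps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if tanimoto(fps[i], fps[j]) >= cutoff:
                parent[find(i)] = find(j)  # union the two clusters
    groups = {}
    for i in range(len(fps)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def component_split(fps, cutoff=0.5, test_fraction=0.2):
    """Assign whole components to the test set, so no similar pair straddles the split."""
    target = test_fraction * len(fps)
    test = []
    for comp in sorted(similarity_components(fps, cutoff), key=len):
        if not test or len(test) + len(comp) <= target:
            test.extend(comp)
    test_set = set(test)
    train = [i for i in range(len(fps)) if i not in test_set]
    return train, sorted(test)


# Toy fingerprints (sets of bit indices), not real compounds.
fps = [{1, 2, 3}, {1, 2, 3, 4}, {10, 11}, {10, 11, 12}, {20, 21, 22}]
train, test = component_split(fps)
print(train, test)  # [0, 1, 2, 3] [4]
```

With a random split, near-duplicates such as samples 0 and 1 could land on opposite sides and inflate test scores; keeping components intact removes that shortcut at the cost of less control over the exact test-set size.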
10
Lin J, Wen L, Zhou Y, Wang S, Ye H, Su J, Li J, Shu J, Huang J, Zhou P. PepQSAR: a comprehensive data source and information platform for peptide quantitative structure-activity relationships. Amino Acids 2023; 55:235-242. [PMID: 36474016 DOI: 10.1007/s00726-022-03219-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/27/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022]
Abstract
Peptide quantitative structure-activity relationships (pQSARs) have been widely applied to the statistical modeling and empirical prediction of peptide activities, properties, and features. In this procedure, the peptide structure is characterized at the sequence level using amino acid descriptors (AADs) and then correlated with observations by machine learning methods (MLMs), resulting in a variety of quantitative regression models used to explain the structural factors that govern peptide activities, to infer the properties of unknown peptides from known samples, and to design new peptides with desired features. In this study, we developed a comprehensive platform, termed the PepQSAR database, which systematically collects and decomposes various data sources and abundant information regarding pQSARs, including AADs, MLMs, data sets, peptide sequences, measured activities, model statistics, and literature. The database also provides a function for comparing previously built pQSAR models reported by different groups using distinct approaches. The structured and searchable PepQSAR database is expected to provide a useful resource and powerful tool for the computational peptidology community and is freely available at http://i.uestc.edu.cn/PQsarDB.
Affiliation(s)
- Jing Lin
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Li Wen
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Yuwei Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Shaozhou Wang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Haiyang Ye
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Jun Su
- College of Music, Chengdu Normal University, Chengdu, 611130, China
- Juelin Li
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Jianping Shu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Jian Huang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Peng Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
11
|
Yue ZX, Yan TC, Xu HQ, Liu YH, Hong YF, Chen GX, Xie T, Tao L. A systematic review on the state-of-the-art strategies for protein representation. Comput Biol Med 2023; 152:106440. [PMID: 36543002 DOI: 10.1016/j.compbiomed.2022.106440] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The study of drug-target protein interaction is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, owing to their automated nature, predictive power, and expected efficiency. Protein representation is a key step in studying drug-target protein interactions by machine learning and plays a fundamental role in achieving accurate results. With the progress of machine learning, protein representation methods have attracted growing attention and have developed rapidly. In this review, we therefore systematically classify current protein representation methods, review them comprehensively, and discuss the latest advances of interest. According to their information extraction methods and information sources, these representation methods are generally divided into structure-based and sequence-based methods, and each primary class can be further divided into specific subcategories. The representation methods covered include both traditional and the latest approaches. This review contains a comprehensive assessment of the various methods, which researchers can use as a reference for their specific protein-related research requirements, including drug research.
Affiliation(s)
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China

12
Janairo JIB. Machine Learning Model for Biomimetic Chromatography Peptide Ligands. ACS Applied Bio Materials 2022; 5:5264-5269. [PMID: 36265018 DOI: 10.1021/acsabm.2c00684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Purification is an essential part of the production of antibodies, which are important therapeutic biomolecules. Common methods of antibody purification rely on affinity chromatography (AC), in which whole proteins are often used as ligands to capture the antibodies to be purified. While AC has been successful in purifying antibodies, it is associated with multiple challenges, such as high cost and low stability. A promising alternative is to use short peptide sequences in place of whole proteins as the stationary phase for the chromatographic separation of antibodies. To accelerate the discovery and development of short peptides for biomimetic chromatography, this study reports a machine learning classification model that was trained and tested on 480 tetrapeptides. The optimized logistic regression model uses Cruciani properties as input variables and categorizes peptides into one of two classes based on their binding affinity for immunoglobulin G (IgG). The externally validated model demonstrates satisfactory predictive performance and excellent discrimination, with AUC = 0.874, Balanced Accuracy = 0.874, F1 = 0.871, Precision = 0.884, and Recall = 0.859. The classifier has also provided valuable insights into important variables that influence the classification, such as electrostatic and hydrophobic interactions. Overall, the classifier is a welcome development for biomimetic chromatography, and this is the first study that aims to integrate machine learning into the biomimetic chromatography peptide development process.
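As a quick consistency check on the reported metrics: F1 is the harmonic mean of precision and recall, and balanced accuracy is the mean of sensitivity (recall) and specificity. The minimal Python sketch below (function names are our own, not from the paper) reproduces F1 = 0.871 from the reported precision and recall. The specificity it back-solves from balanced accuracy is an inference from rounded figures, not a value stated in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the true-positive rate (recall) and the true-negative rate."""
    return (sensitivity + specificity) / 2


precision, recall = 0.884, 0.859              # values reported in the abstract
print(round(f1_score(precision, recall), 3))  # 0.871

# Back-solve balanced accuracy = 0.874 for specificity (inputs are rounded, so approximate).
implied_specificity = 2 * 0.874 - recall
print(round(implied_specificity, 3))          # 0.889
```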
Affiliation(s)
- Jose Isagani B Janairo
- Department of Biology, De La Salle University, 2401 Taft Avenue, 0922 Manila, Philippines

13
Liu Q, van der Stel W, van der Noord VE, Leegwater H, Coban B, Elbertse K, Pruijs JTM, Béquignon OJM, van Westen G, Dévédec SEL, Danen EHJ. Hypoxia Triggers TAZ Phosphorylation in Basal A Triple Negative Breast Cancer Cells. Int J Mol Sci 2022; 23:ijms231710119. [PMID: 36077517 PMCID: PMC9456181 DOI: 10.3390/ijms231710119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 08/31/2022] [Accepted: 09/02/2022] [Indexed: 12/02/2022]
Abstract
Hypoxia and HIF signaling drive cancer progression and therapy resistance and have been demonstrated in breast cancer. To what extent breast cancer subtypes differ in their response to hypoxia has not been resolved. Here, we show that hypoxia similarly triggers HIF1 stabilization in luminal and basal A triple negative breast cancer cells and we use high throughput targeted RNA sequencing to analyze its effects on gene expression in these subtypes. We focus on regulation of YAP/TAZ/TEAD targets and find overlapping as well as distinct target genes being modulated in luminal and basal A cells under hypoxia. We reveal a HIF1 mediated, basal A specific response to hypoxia by which TAZ, but not YAP, is phosphorylated at Ser89. While total YAP/TAZ localization is not affected by hypoxia, hypoxia drives a shift of [p-TAZ(Ser89)/p-YAP(Ser127)] from the nucleus to the cytoplasm in basal A but not luminal breast cancer cells. Cell fractionation and YAP knock-out experiments confirm cytoplasmic sequestration of TAZ(Ser89) in hypoxic basal A cells. Pharmacological and genetic interference experiments identify c-Src and CDK3 as kinases involved in such phosphorylation of TAZ at Ser89 in hypoxic basal A cells. Hypoxia attenuates growth of basal A cells and the effect of verteporfin, a disruptor of YAP/TAZ-TEAD–mediated transcription, is diminished under those conditions, while expression of a TAZ-S89A mutant does not confer basal A cells with a growth advantage under hypoxic conditions, indicating that other hypoxia regulated pathways suppressing cell growth are dominant.

14
Lertampaiporn S, Hongsthong A, Wattanapornprom W, Thammarongtham C. Ensemble-AHTPpred: A Robust Ensemble Machine Learning Model Integrated With a New Composite Feature for Identifying Antihypertensive Peptides. Front Genet 2022; 13:883766. [PMID: 35571042 PMCID: PMC9096110 DOI: 10.3389/fgene.2022.883766] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/25/2022] [Accepted: 04/04/2022] [Indexed: 11/13/2022]
Abstract
Hypertension, or elevated blood pressure, is a serious medical condition that affects people worldwide and significantly increases the risk of cardiovascular disease, heart disease, diabetes, stroke, kidney disease, and other health problems. Thus, hypertension is one of the major global causes of premature death. For the prevention and treatment of hypertension with few or no side effects, antihypertensive peptides (AHTPs) obtained from natural sources might be useful as nutraceuticals. Therefore, the search for alternative or novel AHTPs in food and other natural sources has received much attention, as AHTPs may be functional agents for human health. AHTPs have been observed in diverse organisms, although many of them remain underinvestigated. Identifying peptides with antihypertensive activity in the laboratory is time- and resource-consuming. Alternatively, computational methods based on robust machine learning can identify or screen potential AHTP candidates prior to experimental verification. In this paper, we propose Ensemble-AHTPpred, an ensemble machine learning algorithm composed of a random forest (RF), a support vector machine (SVM), and extreme gradient boosting (XGB), with the aim of integrating diverse heterogeneous algorithms to enhance the robustness of the final predictive model. The selected feature set includes various computed features, such as physicochemical properties, amino acid compositions (AACs), transitions, n-grams, and secondary structure-related information; these features capture information that helps analyze and explain the characteristics of the predicted peptides. In addition, the tool integrates a newly proposed composite feature (generated by a logistic regression function) that combines various feature aspects to improve AHTP characterization. Our tool, Ensemble-AHTPpred, achieved an overall accuracy above 90% on independent test data.
Additionally, the approach was applied to novel, experimentally validated AHTPs from recent studies that did not overlap with the training and test datasets, and the tool accurately predicted these AHTPs.
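Amino acid composition (AAC) and n-grams, two of the feature families listed above, are simple sequence-level counts. The Python sketch below shows common textbook definitions of AAC (20 features) and overlapping 2-gram or dipeptide composition (400 features); the exact definitions, normalisation, and n values used by Ensemble-AHTPpred are not given in the abstract, so treat these as illustrative assumptions, and the function names as hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (one-letter codes)


def amino_acid_composition(seq: str) -> dict[str, float]:
    """Fraction of each standard residue in the peptide (20 features)."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """Fraction of each overlapping residue pair, i.e. 2-grams (400 features)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    return {a + b: (pairs[a + b] / total if total else 0.0)
            for a, b in product(AMINO_ACIDS, repeat=2)}


aac = amino_acid_composition("VPP")
dpc = dipeptide_composition("VPP")
print(round(aac["P"], 3), dpc["VP"], dpc["PP"])  # 0.667 0.5 0.5
```

Concatenating `list(aac.values()) + list(dpc.values())` gives a fixed-length 420-dimensional vector regardless of peptide length, which is what allows tree ensembles and SVMs to consume peptides of different lengths.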
Affiliation(s)
- Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Correspondence: Chinae Thammarongtham

15
Janairo JIB. A Machine Learning Classification Model for Gold-Binding Peptides. ACS Omega 2022; 7:14069-14073. [PMID: 35559171 PMCID: PMC9089360 DOI: 10.1021/acsomega.2c00640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Accepted: 03/31/2022] [Indexed: 06/15/2023]
Abstract
There has been growing interest in using peptides for the controlled synthesis of nanomaterials. Peptides play a crucial role not only in regulating nanostructure formation but also in influencing the resulting properties of the nanomaterials. Leveraging machine learning (ML) in the biomimetic workflow is anticipated to accelerate peptide discovery, make the process more resource-efficient, and reveal associations among attributes that may be useful in peptide design. In this study, a binary ML classifier was formulated and was trained and tested on 1720 peptide examples. The support vector machine classifier uses Kidera factors to categorize peptides into one of two groups based on their binding ability. The classifier exhibits satisfactory performance across various performance metrics. In addition, key variables with a large impact on the model, such as peptide hydrophobicity, were identified. As these trends were derived from a large and diverse dataset, the insights drawn from the data are expected to be generalizable and robust. Thus, the presented ML model is an important step toward rational and predictive peptide design.

16
Li W, Sun T, Li M, He Y, Li L, Wang L, Wang H, Li J, Wen H, Liu Y, Chen Y, Fan Y, Xin B, Zhang J. GNIFdb: a neoantigen intrinsic feature database for glioma. Database (Oxford) 2022; 2022:6527499. [PMID: 35150127 PMCID: PMC9216533 DOI: 10.1093/database/baac004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/08/2021] [Revised: 01/06/2022] [Accepted: 01/29/2022] [Indexed: 12/24/2022]
Abstract
Neoantigens are mutation-containing immunogenic peptides from tumor cells. Neoantigen intrinsic features are sequence-associated features of neoantigens, characterized by different amino acid descriptors and physicochemical properties; they play a crucial role in prioritizing neoantigens with immunogenic potential and in predicting which patients will have better survival. Different intrinsic features may contribute to varying degrees to evaluating the immunogenic potential of neoantigens. Identifying and comparing intrinsic features among neoantigens is particularly important for developing neoantigen-based personalized immunotherapy. However, there is still no public repository hosting the intrinsic features of neoantigens. We therefore developed GNIFdb, a glioma neoantigen intrinsic feature database specifically designed for hosting, exploring and visualizing neoantigens and their intrinsic features. The database provides a comprehensive repository of computationally predicted human leukocyte antigen class I (HLA-I)-restricted neoantigens and their intrinsic features; a systematic annotation of neoantigens, including sequence, neoantigen-associated mutation, gene expression, glioma prognosis, HLA-I subtype and binding affinity between neoantigens and HLA-I; and a genome browser for visualizing them interactively. It represents a valuable resource for the neoantigen research community and is publicly available at http://www.oncoimmunobank.cn/index.php.
Affiliation(s)
- Wendong Li
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Ting Sun
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Muyang Li
- Department of Plant Genetics and Breeding, State Key Laboratory of Plant Physiology and Biochemistry & National Maize Improvement Center, China Agricultural University, No.17 Qinghua East Road, Haidian District, Beijing 100193, P. R. China
- Yufei He
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Lin Li
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Lu Wang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Haoyu Wang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Jing Li
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Hao Wen
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Yong Liu
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Yifan Chen
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Yubo Fan
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
- Beibei Xin
- Department of Plant Genetics and Breeding, State Key Laboratory of Plant Physiology and Biochemistry & National Maize Improvement Center, China Agricultural University, No.17 Qinghua East Road, Haidian District, Beijing 100193, P. R. China
- Jing Zhang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China

17
Lee I, Nam H. Sequence-based prediction of protein binding regions and drug-target interactions. J Cheminform 2022; 14:5. [PMID: 35135622 PMCID: PMC8822694 DOI: 10.1186/s13321-022-00584-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/04/2021] [Accepted: 01/20/2022] [Indexed: 12/19/2022]
Abstract
Identifying drug–target interactions (DTIs) is important for drug discovery. However, searching all drug–target spaces poses a major bottleneck. Therefore, recently many deep learning models have been proposed to address this problem. However, the developers of these deep learning models have neglected interpretability in model construction, which is closely related to a model’s performance. We hypothesized that training a model to predict important regions on a protein sequence would increase DTI prediction performance and provide a more interpretable model. Consequently, we constructed a deep learning model, named Highlights on Target Sequences (HoTS), which predicts binding regions (BRs) between a protein sequence and a drug ligand, as well as DTIs between them. To train the model, we collected complexes of protein–ligand interactions and protein sequences of binding sites and pretrained the model to predict BRs for a given protein sequence–ligand pair via object detection employing transformers. After pretraining the BR prediction, we trained the model to predict DTIs from a compound token designed to assign attention to BRs. We confirmed that training the BRs prediction model indeed improved the DTI prediction performance. The proposed HoTS model showed good performance in BR prediction on independent test datasets even though it does not use 3D structure information in its prediction. Furthermore, the HoTS model achieved the best performance in DTI prediction on test datasets. Additional analysis confirmed the appropriate attention for BRs and the importance of transformers in BR and DTI prediction. The source code is available on GitHub (https://github.com/GIST-CSBL/HoTS).
Affiliation(s)
- Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005, Republic of Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005, Republic of Korea

18
Unsupervised Representation Learning for Proteochemometric Modeling. Int J Mol Sci 2021; 22:ijms222312882. [PMID: 34884688 PMCID: PMC8657702 DOI: 10.3390/ijms222312882] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/15/2021] [Revised: 11/25/2021] [Accepted: 11/26/2021] [Indexed: 11/18/2022]
Abstract
In silico protein–ligand binding prediction is an ongoing area of research in computational chemistry and machine learning based drug discovery, as an accurate predictive model could greatly reduce the time and resources necessary for the detection and prioritization of possible drug candidates. Proteochemometric modeling (PCM) attempts to create an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires the creation of information-rich, uniform and computer interpretable representations of proteins and ligands. Previous studies in PCM modeling rely on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can be used to generate embeddings that outperform complex, human-engineered representations. Several different embedding methods for proteins and molecules have been developed based on various language-modeling methods. Here, we demonstrate the utility of these unsupervised representations and compare three protein embeddings and two compound embeddings in a fair manner. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities and find that unsupervised-learned representations significantly outperform handcrafted representations.
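The core PCM idea of "combining explicit protein and ligand descriptors" can be sketched as building one input vector per protein-ligand pair. A minimal Python sketch, assuming simple concatenation with optional protein-by-ligand cross terms (a variant used in some PCM work); the function name and toy vectors are hypothetical stand-ins for the learned protein and compound embeddings compared in the study.

```python
from typing import Sequence


def pcm_features(protein: Sequence[float], ligand: Sequence[float],
                 cross_terms: bool = False) -> list[float]:
    """Build one PCM input vector from a protein and a ligand representation.

    The two descriptors are concatenated; optionally, all pairwise products
    (protein x ligand cross terms) are appended.
    """
    features = list(protein) + list(ligand)
    if cross_terms:
        features += [a * b for a in protein for b in ligand]
    return features


# Toy vectors standing in for learned protein and compound embeddings.
protein_emb = [0.1, 0.2]
ligand_emb = [1.0, 0.0, 1.0]
print(len(pcm_features(protein_emb, ligand_emb)))                    # 5
print(len(pcm_features(protein_emb, ligand_emb, cross_terms=True)))  # 11
```

Because each pair maps to one fixed-length vector, any standard regressor or classifier can then be trained on the binding activities; swapping one embedding method for another changes only the inputs, which is what makes a like-for-like comparison of representations possible.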

19
Tam C, Kumar A, Zhang KYJ. NbX: Machine Learning-Guided Re-Ranking of Nanobody-Antigen Binding Poses. Pharmaceuticals (Basel) 2021; 14:ph14100968. [PMID: 34681192 PMCID: PMC8537642 DOI: 10.3390/ph14100968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2021] [Revised: 09/17/2021] [Accepted: 09/21/2021] [Indexed: 12/02/2022]
Abstract
Modeling the binding pose of an antibody is a prerequisite for structure-based affinity maturation and design. Without a reliable binding pose, subsequent structural simulation is largely futile. In this study, we have developed a method for machine learning-guided re-ranking of the antigen binding poses of nanobodies, the single-domain antibodies that have recently drawn much interest in antibody drug development. We performed a large-scale self-docking experiment on nanobody–antigen complexes. By training a decision tree classifier that maps a feature set of energy, contact and interface property descriptors to a measure of the docking quality of the refined poses, a significant, eightfold improvement in the median ranking of native-like nanobody poses was achieved compared with ClusPro and an established deep 3D CNN classifier of native protein–protein interactions. We further interpreted our model by identifying features that made relatively important contributions to the prediction performance. This study demonstrates a useful method for improving our current ability to predict nanobody poses.
Affiliation(s)
- Chunlai Tam
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
- Ashutosh Kumar
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Kam Y. J. Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan

20
Melo MCR, Maasch JRMA, de la Fuente-Nunez C. Accelerating antibiotic discovery through artificial intelligence. Commun Biol 2021; 4:1050. [PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotics from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Affiliation(s)
- Marcelo C R Melo
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Jacqueline R M A Maasch
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA

21
Machine Learning for the Cleaner Production of Antioxidant Peptides. Int J Pept Res Ther 2021. [DOI: 10.1007/s10989-021-10232-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
22
Cretin G, Galochkina T, de Brevern AG, Gelly JC. PYTHIA: Deep Learning Approach for Local Protein Conformation Prediction. Int J Mol Sci 2021; 22:ijms22168831. [PMID: 34445537 PMCID: PMC8396346 DOI: 10.3390/ijms22168831] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/06/2021] [Revised: 08/09/2021] [Accepted: 08/10/2021] [Indexed: 02/07/2023]
Abstract
Protein Blocks (PBs) are a widely used structural alphabet that describes local protein backbone conformation as 16 possible conformational states, each adopted by five consecutive amino acids. Representing complex 3D protein structures as 1D PB sequences has previously been applied successfully to protein structure alignment and protein structure prediction. In the current study, we present a new model, PYTHIA (predicting any conformation at high accuracy), that predicts local protein conformations as PBs directly from the amino acid sequence. PYTHIA is based on a deep residual inception-inside-inception neural network with convolutional block attention modules, and it predicts 1 of 16 PB classes from evolutionary information combined with the physicochemical properties of individual amino acids. PYTHIA clearly outperforms the reference method LOCUSTRA for all PB classes and performs well on particularly challenging proteins from the CASP14 free modelling category.
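A PB predictor like this one produces one class per residue. The sketch below shows how such per-residue outputs could be turned into the 1D PB string. It is not PYTHIA's code, and it rests on a few assumptions: the model emits 16 scores per residue, and the PBs are labelled a to p. The function name `decode_pb_sequence` and the use of `Z` for the two terminal positions on each side, which have no full five-residue window, are also illustrative choices.

```python
from typing import Sequence

# The 16 Protein Blocks, conventionally labelled with the letters a..p.
PB_LABELS = "abcdefghijklmnop"


def decode_pb_sequence(probs: Sequence[Sequence[float]], pad: str = "Z") -> str:
    """Turn per-residue PB class scores into a 1D PB string.

    probs has one row per residue, and each row holds 16 scores (one per PB).
    A PB describes a five-residue window centred on a residue, so the first
    two and last two positions have no complete window. Here they are marked
    with `pad` (an illustrative convention).
    """
    n = len(probs)
    out = []
    for i, row in enumerate(probs):
        if len(row) != len(PB_LABELS):
            raise ValueError(f"row {i} has {len(row)} scores, expected 16")
        if i < 2 or i >= n - 2:
            out.append(pad)  # no complete 5-residue window around this position
        else:
            best = max(range(len(row)), key=row.__getitem__)  # argmax over 16 classes
            out.append(PB_LABELS[best])
    return "".join(out)


# Usage with hypothetical one-hot scores for a 6-residue chain.
rows = [[0.0] * 16 for _ in range(6)]
rows[2][3] = 1.0   # class index 3 -> 'd'
rows[3][12] = 1.0  # class index 12 -> 'm'
print(decode_pb_sequence(rows))  # ZZdmZZ
```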
Affiliation(s)
- Gabriel Cretin
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Tatiana Galochkina
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Alexandre G. de Brevern
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
23
Bo W, Chen L, Qin D, Geng S, Li J, Mei H, Li B, Liang G. Application of quantitative structure-activity relationship to food-derived peptides: Methods, situations, challenges and prospects. Trends Food Sci Technol 2021. [DOI: 10.1016/j.tifs.2021.05.031] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 11/26/2022]
24
Sharma G, Rana PS, Bawa S. Hybrid Machine Learning Models for Predicting Types of Human T-cell Lymphotropic Virus. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1524-1534. [PMID: 31567100 DOI: 10.1109/tcbb.2019.2944610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Life-threatening diseases are caused by a group of human retroviruses known as human T-cell lymphotropic viruses (HTLV). These include adult T-cell leukemia, neurodegenerative diseases, demyelinating diseases such as HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP), hypocalcaemia, and bone lesions. Of the four HTLV types, HTLV-1 is the most prevalent, infecting over 20 million people worldwide, yet little effort has been made to understand its epidemiology or control its prevalence. The situation is made worse because most infected individuals remain asymptomatic throughout their lifetime, and diagnostic methods are limited and often unavailable for timely detection. Moreover, there is currently no licensed vaccine against HTLV-1 infection. A faster and more efficient diagnostic method for detecting HTLV-1 is therefore needed. Motivated by the success of machine learning in bioinformatics, this is the first study to propose 64 hybrid machine learning techniques for predicting the type of HTLV (HTLV-1, HTLV-2, or HTLV-3). The hybrid techniques are built from all combinations of four classification methods, four feature-weighting techniques, and four feature-selection techniques (4 × 4 × 4 = 64). When evaluated on various model evaluation metrics, the proposed hybrid models predict the HTLV type efficiently. The best hybrid model achieved an accuracy of 99.85%, an AUROC of 0.99, and an F1 score of 0.99.
Such a system can support current HTLV-1 diagnostics. After HTLV is detected by screening tests such as enzyme-linked immunoassay or particle agglutination assay, confirmatory tests such as western blotting, immunofluorescence assay, or radioimmunoprecipitation assay are always needed to distinguish HTLV-1 from HTLV-2. These confirmatory tests are complex, multistep analytical techniques. The proposed hybrid techniques can be used to support and verify the results of confirmatory tests from the protein mixture. Furthermore, exploring the physicochemical properties of HTLV protein sequences can provide better insight into the virus.
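The count of 64 hybrid models follows directly from crossing the three design choices. The sketch below uses hypothetical placeholder names, because the abstract does not name the individual methods. It also assumes a pipeline order of weighting, then selection, then classification.

```python
from itertools import product

# Hypothetical placeholders: the abstract gives only the counts (four of each).
classifiers = ["classifier_1", "classifier_2", "classifier_3", "classifier_4"]
weightings = ["weighting_1", "weighting_2", "weighting_3", "weighting_4"]
selections = ["selection_1", "selection_2", "selection_3", "selection_4"]

# One hybrid = (feature weighting, feature selection, classifier); order assumed.
hybrids = list(product(weightings, selections, classifiers))

print(len(hybrids))  # 4 * 4 * 4 = 64
print(hybrids[0])    # ('weighting_1', 'selection_1', 'classifier_1')
```

In a real study, each tuple would be instantiated as a pipeline and scored, and the best one kept.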
25
Abbasi WA, Abbas SA, Andleeb S. PANDA: Predicting the change in proteins binding affinity upon mutations by finding a signal in primary structures. J Bioinform Comput Biol 2021; 19:2150015. [PMID: 34126874 DOI: 10.1142/s0219720021500153] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
Accurately determining the change in protein binding affinity upon mutation is important for finding novel therapeutics and assisting mutagenesis studies. Measuring this change requires sophisticated, expensive, and time-consuming wet-lab experiments, which computational methods can support. Most available computational prediction techniques depend on protein structures, which limits their applicability to protein complexes with known 3D structures. In this work, we explore sequence-based prediction of the change in protein binding affinity upon mutation. We also question the effectiveness of k-fold cross-validation (CV) across mutations, as adopted in previous studies, for assessing how well such predictors generalize to mutations not seen during training. We use protein sequence information instead of protein structures, together with machine learning techniques, to accurately predict the change in protein binding affinity upon mutation. Our proposed sequence-based predictor, PANDA, performs comparably to existing methods when assessed with an appropriate CV scheme and an external independent test dataset. On the external test dataset, PANDA gives a maximum Pearson correlation coefficient of 0.52, compared with 0.59 for MutaBind, a state-of-the-art structure-based method. Our sequence-based method therefore has wide applicability and performs comparably to existing structure-based methods. PANDA is available as a cloud-based webserver (https://sites.google.com/view/wajidarshad/software) and as Python code (https://github.com/wajidarshad/panda).
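The concern about cross-validation is that random k-fold splits can put mutations of the same complex into both training and test folds. One common remedy is grouped k-fold CV, in which every mutation of a given complex lands in the same fold. This scheme is assumed here only for illustration, since the abstract does not specify PANDA's CV scheme. The sketch below is dependency-free, and `grouped_kfold` and the complex IDs are hypothetical. scikit-learn's `GroupKFold` offers comparable behaviour.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def grouped_kfold(groups: Sequence[Hashable], k: int) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k folds so that no group spans train and test.

    groups[i] is the group label (e.g. a protein complex ID) of sample i.
    Groups are assigned greedily, largest first, to the currently smallest
    fold to keep fold sizes roughly balanced.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    if k < 2 or k > len(by_group):
        raise ValueError("k must be between 2 and the number of groups")

    folds: List[List[int]] = [[] for _ in range(k)]
    for members in sorted(by_group.values(), key=len, reverse=True):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members)

    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for j, fold in enumerate(folds) if j != f for i in fold)
        splits.append((train, test))
    return splits


# Usage: six mutations from three hypothetical complexes.
complex_ids = ["1ABC", "1ABC", "2XYZ", "2XYZ", "2XYZ", "3DEF"]
for train_idx, test_idx in grouped_kfold(complex_ids, k=3):
    print(train_idx, test_idx)
```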
Affiliation(s)
- Wajid Arshad Abbasi
- Computational Biology and Data Analysis Lab., Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
- Syed Ali Abbas
- Computational Biology and Data Analysis Lab., Department of Computer Sciences & Information Technology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
- Saiqa Andleeb
- Biotechnology Lab., Department of Zoology, King Abdullah Campus, University of Azad Jammu & Kashmir, Muzaffarabad, AJ&K 13100, Pakistan
26
Meher PK, Mohapatra A, Satpathy S, Sharma A, Saini I, Pradhan SK, Rai A. PredCRG: A computational method for recognition of plant circadian genes by employing support vector machine with Laplace kernel. Plant Methods 2021; 17:46. [PMID: 33902670 PMCID: PMC8074503 DOI: 10.1186/s13007-021-00744-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/29/2020] [Accepted: 04/07/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND: Circadian rhythms regulate several physiological and developmental processes in plants, so identifying genes with circadian rhythmic features is pivotal. Although computational methods have been developed to identify circadian genes, all of them are based on gene expression datasets. We found no sequence-based model, which motivated us to develop the present computational method for identifying proteins encoded by circadian genes. RESULTS: Support vector machines (SVM) with seven kernels (linear, polynomial, radial, sigmoid, hyperbolic, Bessel and Laplace) were used for prediction with compositional, transitional and physico-chemical features. The highest accuracy, 62.48%, was achieved with the Laplace kernel under fivefold cross-validation. The model further achieved 62.96% accuracy on an independent dataset. The SVM also outperformed other state-of-the-art machine learning algorithms, i.e., Random Forest, Bagging, AdaBoost, XGBoost and LASSO. We also performed proteome-wide identification of circadian proteins in two cereal crops, Oryza sativa and Sorghum bicolor, and then annotated the predicted circadian proteins with Gene Ontology (GO) terms. CONCLUSIONS: To the best of our knowledge, this is the first computational method to identify circadian genes from sequence data. Based on the proposed method, we have developed an R package, PredCRG (https://cran.r-project.org/web/packages/PredCRG/index.html), which the scientific community can use for proteome-wide identification of circadian genes. The present study supplements existing computational methods as well as wet-lab experiments for the recognition of circadian genes.
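For reference, the sketch below shows the Laplace kernel together with a simple amino-acid-composition feature of the kind the abstract calls compositional. Kernel conventions differ between libraries. To our understanding, R kernlab's `laplacedot` uses the Euclidean distance, while scikit-learn's `laplacian_kernel` uses the L1 distance. Which form PredCRG uses is not stated here. `sigma`, the function names, and the example sequences are illustrative assumptions.

```python
import math
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in a protein sequence."""
    counts = [seq.upper().count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def laplace_kernel(x: Sequence[float], y: Sequence[float],
                   sigma: float = 1.0, norm: str = "l2") -> float:
    """k(x, y) = exp(-sigma * ||x - y||) with a Euclidean ("l2") or
    Manhattan ("l1") distance."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    if norm == "l2":
        dist = math.sqrt(sum(d * d for d in diffs))
    elif norm == "l1":
        dist = sum(abs(d) for d in diffs)
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    return math.exp(-sigma * dist)


# Kernel value between two hypothetical protein fragments.
k = laplace_kernel(aa_composition("MKTAYIAK"), aa_composition("MKLAVLGA"), sigma=0.5)
print(k)
```

An SVM uses such kernel values in place of dot products. Unlike the radial (Gaussian) kernel, the Laplace kernel decays with the distance itself rather than its square.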
Affiliation(s)
- Ansuman Mohapatra
- Orissa University of Agriculture and Technology, Bhubaneswar, Odisha, India
- Subhrajit Satpathy
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anuj Sharma
- Uttarakhand Council for Biotechnology, Pantnagar, Uttarakhand, India
- Isha Saini
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anil Rai
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
27
Wattanapornprom W, Thammarongtham C, Hongsthong A, Lertampaiporn S. Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization. Life (Basel) 2021; 11:life11040293. [PMID: 33808227 PMCID: PMC8066735 DOI: 10.3390/life11040293] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/29/2021] [Revised: 03/16/2021] [Accepted: 03/25/2021] [Indexed: 12/17/2022]
Abstract
The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for predicting protein subcellular localization in plants based on multiple classifiers, with the aim of improving both accuracy and reliability. Predicting plant protein subcellular localization is challenging because it is not only a multiclass but also a multilabel problem. Plant proteins can generally be found in 10–14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than in others (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suited to each type of protein localization, forming a total of 479 feature spaces. Feature selection methods then reduced these features to smaller, informative subsets, which were used to train three different individual models. The outputs of the three classifiers were combined by average voting to return the final probability prediction. Based on the voting probability, the method can predict both single- and multilabel subcellular localizations. On the testing dataset, the proposed ensemble method achieved an overall accuracy of 84.58% for 11 compartments.
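Average voting with a multilabel output can be sketched as follows. The compartment probabilities and the 0.5 threshold for accepting an additional label are hypothetical. The abstract states only that labels are assigned from the averaged voting probability, and the function names are illustrative.

```python
from typing import Dict, List, Sequence


def average_vote(outputs: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each compartment's probability over all classifiers."""
    if not outputs:
        raise ValueError("need at least one classifier output")
    compartments = outputs[0].keys()
    return {c: sum(out[c] for out in outputs) / len(outputs) for c in compartments}


def assign_locations(avg: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multilabel decision: keep every compartment at or above `threshold`,
    most probable first; if none qualifies, return the single best one."""
    chosen = [c for c, p in avg.items() if p >= threshold]
    if not chosen:
        chosen = [max(avg, key=avg.get)]
    return sorted(chosen, key=lambda c: avg[c], reverse=True)


# Hypothetical outputs of three classifiers for one protein.
clf_outputs = [
    {"nucleus": 0.70, "cytoplasm": 0.60, "vacuole": 0.10},
    {"nucleus": 0.80, "cytoplasm": 0.40, "vacuole": 0.20},
    {"nucleus": 0.60, "cytoplasm": 0.65, "vacuole": 0.10},
]
avg = average_vote(clf_outputs)
print(assign_locations(avg))  # ['nucleus', 'cytoplasm']
```

Here the protein is assigned to two compartments. A stricter threshold would reduce it to a single-label prediction.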
Affiliation(s)
- Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
28
Zhou P, Liu Q, Wu T, Miao Q, Shang S, Wang H, Chen Z, Wang S, Wang H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J Chem Inf Model 2021; 61:1718-1731. [DOI: 10.1021/acs.jcim.0c01370] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 02/07/2023]
Affiliation(s)
- Peng Zhou
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Qian Liu
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Ting Wu
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Qingqing Miao
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Shuyong Shang
- College of Chemistry and Life Science, Chengdu Normal University, Chengdu 611130, China
- Heyi Wang
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Zheng Chen
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Shaozhou Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Heyan Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
29
Vander Meersche Y, Cretin G, de Brevern AG, Gelly JC, Galochkina T. MEDUSA: Prediction of Protein Flexibility from Sequence. J Mol Biol 2021; 433:166882. [PMID: 33972018 DOI: 10.1016/j.jmb.2021.166882] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/28/2020] [Revised: 02/12/2021] [Accepted: 02/13/2021] [Indexed: 12/11/2022]
Abstract
Information on protein flexibility is essential for understanding crucial molecular mechanisms such as protein stability, interactions with other molecules, and protein function in general. The B-factor obtained from X-ray crystallography experiments is the most common flexibility descriptor and is available for the majority of resolved protein structures. Because the gap between the number of resolved protein structures and the number of available protein sequences keeps growing, computational tools that predict protein flexibility from amino acid sequence are needed. In the current study, we report MEDUSA (https://www.dsimb.inserm.fr/MEDUSA), a deep learning-based tool for protein flexibility prediction. MEDUSA feeds evolutionary information extracted from homologous protein sequences, together with amino acid physico-chemical properties, into a convolutional neural network that assigns a flexibility class to each position in a protein sequence. Trained on a non-redundant dataset of X-ray structures, MEDUSA provides flexibility predictions in two, three and five classes. MEDUSA is freely available as a web server that clearly visualizes the prediction results, as well as a standalone utility (https://github.com/DSIMB/medusa). Analysis of the MEDUSA output allows users to identify potentially highly deformable protein regions and the general dynamic properties of the protein.
Affiliation(s)
- Yann Vander Meersche
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Gabriel Cretin
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Alexandre G de Brevern
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Jean-Christophe Gelly
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
- Tatiana Galochkina
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
30
Kooistra AJ, Mordalski S, Pándy-Szekeres G, Esguerra M, Mamyrbekov A, Munk C, Keserű GM, Gloriam D. GPCRdb in 2021: integrating GPCR sequence, structure and function. Nucleic Acids Res 2021; 49:D335-D343. [PMID: 33270898 PMCID: PMC7778909 DOI: 10.1093/nar/gkaa1080] [Citation(s) in RCA: 215] [Impact Index Per Article: 71.7] [Received: 09/13/2020] [Revised: 10/20/2020] [Accepted: 10/22/2020] [Indexed: 01/27/2023]
Abstract
G protein-coupled receptors (GPCRs) form both the largest family of membrane proteins and the largest family of drug targets, mediating the action of one-third of medicines. The GPCR database, GPCRdb, serves >4 000 researchers every month and offers reference data, analysis of users' own or literature data, experiment design and dissemination of published datasets. Here, we describe new and updated GPCRdb resources with a particular focus on the integration of sequence, structure and function. GPCRdb contains all human non-olfactory GPCRs (and >27 000 orthologs), G-proteins and arrestins. It includes over 2 000 drug and in-trial agents and nearly 200 000 ligands with activity and availability data. GPCRdb annotates all published GPCR structures (updated monthly), which are also offered in a refined version (with re-modeled missing/distorted regions and reverted mutations). It also provides structure models of all human non-olfactory receptors in inactive, intermediate and active states. Mutagenesis data in GPCRdb span natural genetic variants, GPCR-G protein interfaces, ligand sites and thermostabilising mutations. A new sequence signature tool for identifying functional residue determinants has been added. Two data-driven tools, for designing ligand site mutations and constructs for structure determination, have been updated to extend their coverage of receptors and modifications. GPCRdb is available at https://gpcrdb.org.
Affiliation(s)
- Albert J Kooistra
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Stefan Mordalski
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Gáspár Pándy-Szekeres
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Medicinal Chemistry Research Group, Research Center for Natural Sciences, Budapest H-1117, Hungary
- Mauricio Esguerra
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Alibek Mamyrbekov
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Christian Munk
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- György M Keserű
- Medicinal Chemistry Research Group, Research Center for Natural Sciences, Budapest H-1117, Hungary
- David E Gloriam
- Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
31
32
Burggraaff L, Lenselink EB, Jespers W, van Engelen J, Bongers BJ, González MG, Liu R, Hoos HH, van Vlijmen HWT, IJzerman AP, van Westen GJP. Successive Statistical and Structure-Based Modeling to Identify Chemically Novel Kinase Inhibitors. J Chem Inf Model 2020; 60:4283-4295. [PMID: 32343143 PMCID: PMC7525794 DOI: 10.1021/acs.jcim.9b01204] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
Kinases are frequently studied in the context of anticancer drugs. Their involvement in cell responses, such as proliferation, differentiation, and apoptosis, makes them interesting subjects in multitarget drug design. In this study, a workflow is presented that models the bioactivity spectra for two panels of kinases: (1) inhibition of RET, BRAF, SRC, and S6K, while avoiding inhibition of MKNK1, TTK, ERK8, PDK1, and PAK3, and (2) inhibition of AURKA, PAK1, FGFR1, and LKB1, while avoiding inhibition of PAK3, TAK1, and PIK3CA. Both statistical and structure-based models were included, which were thoroughly benchmarked and optimized. A virtual screening was performed to test the workflow for one of the main targets, RET kinase. This resulted in 5 novel and chemically dissimilar RET inhibitors with remaining RET activity of <60% (at a concentration of 10 μM) and similarities with known RET inhibitors from 0.18 to 0.29 (Tanimoto, ECFP6). The four more potent inhibitors were assessed in a concentration range and proved to be modestly active, with a pIC50 value of 5.1 for the most active compound. The experimental validation of inhibitors for RET strongly indicates that the multitarget workflow is able to detect novel inhibitors for kinases, and hence, this workflow can potentially be applied in polypharmacology modeling. We conclude that this approach can identify new chemical matter for existing targets. Moreover, this workflow can easily be applied to other targets as well.
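Both numerical summaries in this abstract can be reproduced from their definitions. The Tanimoto similarity of two binary fingerprints is |A ∩ B| / |A ∪ B| over their on-bits. pIC50 is defined as -log10(IC50 in mol/L), so a pIC50 of 5.1 corresponds to an IC50 of roughly 7.9 μM. The sketch below uses hypothetical bit sets. In practice, ECFP6 (Morgan radius 3) bits would come from a cheminformatics toolkit such as RDKit. The function names are illustrative.

```python
import math
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(bits_a & bits_b) / len(union)


def ic50_micromolar(pic50: float) -> float:
    """Invert pIC50 = -log10(IC50 [mol/L]) and express IC50 in micromolar."""
    return 10 ** (-pic50) * 1e6


def pic50_from_micromolar(ic50_um: float) -> float:
    """pIC50 from an IC50 given in micromolar."""
    return -math.log10(ic50_um * 1e-6)


print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
print(round(ic50_micromolar(5.1), 1))  # 7.9
```

By this definition, the 10 μM screening concentration corresponds to a pIC50 of 5.0. The most active compound (pIC50 5.1) is therefore only slightly more potent than that cut-off.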
Affiliation(s)
- Lindsey Burggraaff
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Eelke B Lenselink
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Willem Jespers
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands; Department of Cell and Molecular Biology, Uppsala University, Uppsala 75124, Sweden
- Jesper van Engelen
- Department of Computer Science, Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
- Brandon J Bongers
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Marina Gorostiola González
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Rongfang Liu
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Holger H Hoos
- Department of Computer Science, Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
- Herman W T van Vlijmen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands; Janssen Research & Development, Turnhoutseweg 30, 2340 Beerse, Belgium
- Adriaan P IJzerman
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Gerard J P van Westen
- Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
33
Le T, Winter R, Noé F, Clevert DA. Neuraldecipher - reverse-engineering extended-connectivity fingerprints (ECFPs) to their molecular structures. Chem Sci 2020; 11:10378-10389. [PMID: 34094299 PMCID: PMC8162443 DOI: 10.1039/d0sc03115a] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/03/2020] [Accepted: 09/10/2020] [Indexed: 12/22/2022]
Abstract
Protecting molecular structures from disclosure to external parties is highly relevant for industrial and private organizations, such as pharmaceutical companies. In external collaborations, it is common to exchange datasets by encoding the molecular structures into descriptors. Molecular fingerprints such as extended-connectivity fingerprints (ECFPs) are frequently used for such exchanges because they typically perform well on quantitative structure-activity relationship tasks. ECFPs are often considered non-invertible because of the way they are computed. In this paper, we present a fast reverse-engineering method that deduces the molecular structure from revealed ECFPs. Our method includes the Neuraldecipher, a neural network model that predicts a compact vector representation of a compound from its ECFP. We then use another pre-trained model to retrieve the molecular structure as a SMILES representation. We demonstrate that our method can reconstruct molecular structures to some extent, and that its performance improves when ECFPs with larger fingerprint sizes are revealed. For example, given ECFP count vectors of length 4096, our method correctly deduces up to 69% of molecular structures on a validation set (112K unique samples).
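Why longer fingerprints reveal more structure can be illustrated with the modulo-folding step. In this step, hashed substructure identifiers are mapped onto a fixed-length vector, and distinct substructures may collide in the same position. The sketch below assumes modulo folding. The identifiers are hypothetical, and this is not the Neuraldecipher model itself.

```python
from typing import Iterable, List


def fold_counts(identifiers: Iterable[int], length: int) -> List[int]:
    """Fold hashed substructure identifiers into a count vector of `length` bins."""
    if length <= 0:
        raise ValueError("length must be positive")
    vec = [0] * length
    for ident in identifiers:
        vec[ident % length] += 1  # distinct identifiers can share a bin
    return vec


def n_collisions(identifiers: Iterable[int], length: int) -> int:
    """Distinct identifiers lost to collisions: unique IDs minus occupied bins."""
    unique = set(identifiers)
    return len(unique) - len({ident % length for ident in unique})


ids = [5, 1029, 4101, 77]  # hypothetical hashed identifiers
print(n_collisions(ids, 1024), n_collisions(ids, 4096))  # 2 1
```

With 1024 bins, three of the four identifiers merge into bin 5. With 4096 bins, only one collision remains, so each bin preserves more information about the underlying substructures.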
Affiliation(s)
- Tuan Le
- Department of Digital Technologies, Bayer AG, Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Robin Winter
- Department of Digital Technologies, Bayer AG, Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
34
Deep Learning Modeling of Androgen Receptor Responses to Prostate Cancer Therapies. Int J Mol Sci 2020; 21:ijms21165847. [PMID: 32823970 PMCID: PMC7461580 DOI: 10.3390/ijms21165847] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/26/2020] [Revised: 08/06/2020] [Accepted: 08/12/2020] [Indexed: 01/08/2023]
Abstract
Gain-of-function mutations in human androgen receptor (AR) are among the major causes of drug resistance in prostate cancer (PCa). Identifying mutations that cause resistant phenotype is of critical importance for guiding treatment protocols, as well as for designing drugs that do not elicit adverse responses. However, experimental characterization of these mutations is time consuming and costly; thus, predictive models are needed to anticipate resistant mutations and to guide the drug discovery process. In this work, we leverage experimental data collected on 68 AR mutants, either observed in the clinic or described in the literature, to train a deep neural network (DNN) that predicts the response of these mutants to currently used and experimental anti-androgens and testosterone. We demonstrate that the use of this DNN, with general 2D descriptors, provides a more accurate prediction of the biological outcome (inhibition, activation, no-response, mixed-response) in AR mutant-drug pairs compared to other machine learning approaches. Finally, the developed approach was used to make predictions of AR mutant response to the latest AR inhibitor darolutamide, which were then validated by in-vitro experiments.
35
Evidence Supporting an Antimicrobial Origin of Targeting Peptides to Endosymbiotic Organelles. Cells 2020; 9:cells9081795. [PMID: 32731621 PMCID: PMC7463930 DOI: 10.3390/cells9081795] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/19/2020] [Revised: 07/24/2020] [Accepted: 07/24/2020] [Indexed: 12/15/2022]
Abstract
Mitochondria and chloroplasts emerged from primary endosymbiosis. Most proteins of the endosymbiont were subsequently expressed in the nucleo-cytosol of the host and organelle-targeted via the acquisition of N-terminal presequences, whose evolutionary origin remains enigmatic. Using a quantitative assessment of their physico-chemical properties, we show that organelle targeting peptides, which are distinct from signal peptides targeting other subcellular compartments, group with a subset of antimicrobial peptides. We demonstrate that extant antimicrobial peptides target a fluorescent reporter to either the mitochondria or the chloroplast in the green alga Chlamydomonas reinhardtii and, conversely, that extant targeting peptides still display antimicrobial activity. Thus, we provide strong computational and functional evidence for an evolutionary link between organelle-targeting and antimicrobial peptides. Our results support the view that resistance of bacterial progenitors of organelles to the attack of host antimicrobial peptides has been instrumental in eukaryogenesis and in the emergence of photosynthetic eukaryotes.
36
37
Playe B, Stoven V. Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity. J Cheminform 2020; 12:11. [PMID: 33431042 PMCID: PMC7011501 DOI: 10.1186/s13321-020-0413-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/07/2019] [Accepted: 01/27/2020] [Indexed: 01/09/2023]
Abstract
Chemogenomics, also called proteochemometrics, covers a range of computational methods that can be used to predict protein–ligand interactions at large scale across both protein and chemical space. These methods differ from more classical ligand-based methods (also called QSAR), which predict ligands for a given protein receptor. In the drug discovery process, chemogenomics makes it possible to predict off-target proteins for drug candidates; off-target binding is one of the main causes of undesirable side effects and of failure in drug development. The present study compares shallow and deep machine-learning approaches for chemogenomics and explores data augmentation techniques for deep learning algorithms in chemogenomics. Shallow machine-learning algorithms rely on expert-based chemical and protein descriptors, while recent deep learning algorithms can learn abstract numerical representations of molecular graphs and protein sequences that are optimised for the prediction task. We first propose a formulation of chemogenomics with deep learning, called the chemogenomic neural network (CN): a feed-forward neural network whose input combines the molecule and protein representations learnt by molecular graph and protein sequence encoders. We show that, on large datasets, the deep learning CN model outperforms state-of-the-art shallow methods and competes with deep methods that use expert-based descriptors. However, on small datasets, shallow methods perform better than deep learning methods. We then evaluate data augmentation techniques, namely multi-view and transfer learning, to improve the prediction performance of the chemogenomic neural network. We conclude that a promising research direction is to integrate heterogeneous sources of data, such as auxiliary tasks for which large datasets are available or, independently, multiple molecule and protein attribute views.
Affiliation(s)
- Benoit Playe
- Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75248, Paris, France
- Veronique Stoven
- Center for Computational Biology, Mines ParisTech, PSL Research University, 60 Bd Saint-Michel, 75006, Paris, France; Institut Curie, 75248, Paris, France; INSERM U900, 75248, Paris, France
38
Jiang M, Li Z, Zhang S, Wang S, Wang X, Yuan Q, Wei Z. Drug–target affinity prediction using graph neural network and contact maps. RSC Adv 2020; 10:20701-20712. [PMID: 35517730 PMCID: PMC9054320 DOI: 10.1039/d0ra02297g] [Citation(s) in RCA: 112] [Impact Index Per Article: 28.0] [Received: 03/11/2020] [Accepted: 05/07/2020] [Indexed: 02/01/2023]
Abstract
Computer-aided drug design, which uses high-performance computers to simulate tasks in drug design, is a promising research area. Drug–target affinity (DTA) prediction is the most important step of computer-aided drug design and could speed up drug development and reduce resource consumption. With the development of deep learning, applying it to DTA prediction and improving prediction accuracy have become a focus of research. In this paper, using the structural information of molecules and proteins, separate graphs are built for drug molecules and for proteins. Graph neural networks are introduced to obtain their representations, and a method called DGraphDTA is proposed for DTA prediction. Specifically, the protein graph is constructed from the contact map output by a prediction method that predicts the structural characteristics of a protein from its sequence. Tests with various metrics on benchmark datasets show that the proposed method has strong robustness and generalizability. Prediction of drug–target affinity by constructing both molecule and protein graphs.
Affiliation(s)
- Mingjian Jiang
- Department of Computer Science and Technology, Ocean University of China, China
- Zhen Li
- Department of Computer Science and Technology, Ocean University of China, China
- Shugang Zhang
- Department of Computer Science and Technology, Ocean University of China, China
- Shuang Wang
- Department of Computer Science and Technology, Ocean University of China, China
- Xiaofeng Wang
- Department of Computer Science and Technology, Ocean University of China, China
- Qing Yuan
- Department of Computer Science and Technology, Ocean University of China, China
- Zhiqiang Wei
- Department of Computer Science and Technology, Ocean University of China, China
39
Siedhoff NE, Schwaneberg U, Davari MD. Machine learning-assisted enzyme engineering. Methods Enzymol 2020; 643:281-315. [DOI: 10.1016/bs.mie.2020.05.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 02/06/2023]
40
Bongers BJ, IJzerman AP, Van Westen GJP. Proteochemometrics - recent developments in bioactivity and selectivity modeling. Drug Discov Today Technol 2019; 32-33:89-98. [PMID: 33386099 DOI: 10.1016/j.ddtec.2020.08.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/28/2020] [Revised: 08/18/2020] [Accepted: 08/28/2020] [Indexed: 06/12/2023]
Abstract
Proteochemometrics is a machine learning based modeling approach relying on a combination of ligand and protein descriptors. With ongoing developments in machine learning and increases in public data the technique is more frequently applied in early drug discovery, typically in ligand-target binding prediction. Common applications include improvements to single target quantitative structure-activity relationship models, protein selectivity and promiscuity modeling, and large-scale deep learning approaches. The increase in predictive power using proteochemometrics is observed in multi-target bioactivity modeling, opening the door to more extensive studies covering whole protein families. On top of that, with deep learning fueling more complex and larger scale models, proteochemometrics allows faster and higher quality computational models supporting the design, make, test cycle.
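The core idea in this abstract, a single model trained on combined ligand and protein descriptors, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any cited study: the two-number descriptors and the activity values are hypothetical toy numbers, and a k-nearest-neighbour regressor stands in for whatever learner a real proteochemometric model would use. The query pair involves a target with only one measured ligand, yet its prediction also draws on a pair measured for a descriptor-wise similar target.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical 2-number descriptors (toy values, not real chemistry or biology).
LIGANDS: Dict[str, List[float]] = {"L1": [0.1, 1.0], "L2": [0.9, 0.2]}
PROTEINS: Dict[str, List[float]] = {
    "P1": [1.0, 0.0],
    "P2": [0.8, 0.1],  # descriptor-wise close to P1
    "P3": [0.0, 1.0],  # the data-poor target of interest
}

# Hypothetical measured activities for (ligand, protein) pairs.
ACTIVITIES: Dict[Tuple[str, str], float] = {
    ("L1", "P1"): 7.0, ("L2", "P1"): 5.0,
    ("L1", "P2"): 6.8, ("L2", "P2"): 5.2,
    ("L1", "P3"): 4.0,
}


def pcm_features(ligand: Sequence[float], protein: Sequence[float]) -> List[float]:
    """Concatenate ligand and protein descriptors into one PCM feature vector."""
    return list(ligand) + list(protein)


def knn_predict(train_x: List[List[float]], train_y: List[float],
                query: List[float], k: int = 2) -> float:
    """Mean activity of the k training pairs closest to the query (Euclidean)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# One training set spanning all targets, instead of one QSAR model per target.
train_x = [pcm_features(LIGANDS[lig], PROTEINS[prot]) for lig, prot in ACTIVITIES]
train_y = list(ACTIVITIES.values())

query = pcm_features(LIGANDS["L2"], PROTEINS["P3"])  # an unmeasured pair
print(round(knn_predict(train_x, train_y, query, k=2), 2))  # 4.6
```

With k=2 the two nearest pairs are (L1, P3) and (L2, P2), so the estimate for the unmeasured (L2, P3) pair borrows information from a second target. Many PCM studies also append ligand-protein cross terms (for example, products of ligand and protein descriptor values) so the model can capture interactions; this sketch omits them for brevity.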
Affiliation(s)
- Brandon J Bongers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Gerard J P Van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
41
Moumbock AF, Li J, Mishra P, Gao M, Günther S. Current computational methods for predicting protein interactions of natural products. Comput Struct Biotechnol J 2019; 17:1367-1376. [PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 03/19/2019] [Revised: 08/09/2019] [Accepted: 08/23/2019] [Indexed: 01/08/2023]
Abstract
Natural products (NPs) are an indispensable source of drugs, and owing to their high structural diversity they cover pharmacological space better than synthetic compounds. Predicting their interaction profiles with druggable protein targets remains a major challenge in modern drug discovery. Experimental (off-)target predictions for NPs are costly and time-consuming, whereas computational methods are much faster and cheaper. As a result, computational predictions are preferentially used in the first instance for NP profiling, prior to experimental validation. This review covers recent advances in computational approaches developed to aid the annotation of unknown drug-target interactions (DTIs), focusing on three broad classes: ligand-based, target-based, and target-ligand-based (hybrid) approaches. Computational DTI prediction methods have the potential to significantly advance the discovery and development of novel selective drugs with minimal side effects. We highlight some inherent caveats of these methods that must be overcome for them to realize their full potential, and we give a future outlook.
Affiliation(s)
- Stefan Günther
- Institute of Pharmaceutical Sciences, Research Group Pharmaceutical Bioinformatics, Albert-Ludwigs-Universität Freiburg, Germany
42
Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform 2019; 20:1878-1912. [PMID: 30084866 PMCID: PMC6917215 DOI: 10.1093/bib/bby061] [Citation(s) in RCA: 223] [Impact Index Per Article: 44.6] [Received: 01/25/2018] [Revised: 05/25/2018] [Indexed: 01/16/2023]
Abstract
The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems: they are labour intensive, costly and time consuming. A computational field known as 'virtual screening' (VS) has emerged over the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins, along with experimentally verified bio-interaction information, to generate predictive models. Lately, sophisticated machine learning techniques have been applied in VS to improve predictive performance. The objective of this study is to examine and discuss recent applications of machine learning techniques in VS, including deep learning, which became highly popular after producing epochal developments in computer vision and natural language processing. The past 3 years have witnessed an unprecedented amount of research on the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases, and gold-standard data sets for system training and benchmarking. We then review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including current challenges, and suggest future directions.
We believe that this survey will help researchers working in computational drug discovery to understand and develop novel bio-prediction methods.
Affiliation(s)
- Ahmet Sureyya Rifaioglu
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Department of Computer Engineering, İskenderun Technical University, Hatay, Turkey
- Heval Atas
- Cancer System Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
- Maria Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Cambridge, Hinxton, UK
- Rengul Cetin-Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Volkan Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Tunca Doğan
- Cancer System Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Cambridge, Hinxton, UK
43
Lo Monte M, Manelfi C, Gemei M, Corda D, Beccari AR. ADPredict: ADP-ribosylation site prediction based on physicochemical and structural descriptors. Bioinformatics 2019; 34:2566-2574. [PMID: 29554239 PMCID: PMC6061869 DOI: 10.1093/bioinformatics/bty159] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/05/2017] [Accepted: 03/14/2018] [Indexed: 01/27/2023]
Abstract
Motivation: ADP-ribosylation is a post-translational modification (PTM) implicated in several crucial cellular processes, ranging from regulation of DNA repair and chromatin structure to cell metabolism and stress responses. To date, a complete understanding of ADP-ribosylation targets and their modification sites in different tissues and disease states is still lacking. Identification of ADP-ribosylation sites is required to discern the molecular mechanisms regulated by this modification. This motivated us to develop a computational tool for the prediction of ADP-ribosylated sites. Results: Here, we present ADPredict, the first dedicated computational tool for the prediction of ADP-ribosylated aspartic and glutamic acids. This predictive algorithm is based on (i) physicochemical properties, (ii) in-house designed secondary structure-related descriptors and (iii) three-dimensional features of a set of human ADP-ribosylated proteins that have been reported in the literature. ADPredict was developed using principal component analysis and machine learning techniques; its performance was evaluated both internally via intensive bootstrapping and in predicting two external experimental datasets. It outperformed the only other available ADP-ribosylation prediction tool, ModPred. Moreover, a novel secondary structure descriptor, HM-ratio, was introduced and successfully contributed to model development, making it a promising tool for bioinformatics studies such as PTM prediction. Availability and implementation: ADPredict is freely available at www.ADPredict.net. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matteo Lo Monte
- Institute of Protein Biochemistry, National Research Council, Naples, Italy
- Marica Gemei
- Dompé Farmaceutici SpA, L'Aquila, Italy; Dipartimento di Scienze Farmaceutiche, Università degli Studi di Milano, Milano, Italy
- Daniela Corda
- Institute of Protein Biochemistry, National Research Council, Naples, Italy
- Andrea Rosario Beccari
- Institute of Protein Biochemistry, National Research Council, Naples, Italy; Dompé Farmaceutici SpA, L'Aquila, Italy
44
Lee M, Kim H, Joe H, Kim HG. Multi-channel PINN: investigating scalable and transferable neural networks for drug discovery. J Cheminform 2019; 11:46. [PMID: 31289963 PMCID: PMC6617572 DOI: 10.1186/s13321-019-0368-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/02/2018] [Accepted: 07/02/2019] [Indexed: 12/19/2022]
Abstract
Analysis of compound–protein interactions (CPIs) has become a crucial prerequisite for drug discovery and drug repositioning. In vitro experiments are commonly used to identify CPIs, but it is not feasible to explore the molecular and proteomic space through experimental approaches alone. Advances in machine learning for predicting CPIs have made significant contributions to drug discovery. Deep neural networks (DNNs), which have recently been applied to predict CPIs, performed better than shallow classifiers. However, such techniques commonly require a considerable volume of dense data for each training target. Although the amount of publicly available CPI data has grown rapidly, public data are still sparse and contain a large number of measurement errors. In this paper, we propose a novel method, Multi-channel PINN, to make full use of sparse data through representation learning. With representation learning, Multi-channel PINN can use DNNs in three ways: as a classifier, as a feature extractor, and as an end-to-end learner. Multi-channel PINN can be fed with both low- and high-level representations and incorporates each of them by using all three approaches within a single model. To make full use of sparse public data, we additionally explore the potential of transferring representations from training tasks to test tasks. As a proof of concept, Multi-channel PINN was evaluated on fifteen combinations of feature pairs to investigate how they affect performance in terms of highest performance, initial performance, and convergence speed. The experimental results indicate that multi-channel models using protein features performed better than single-channel models or multi-channel models using compound features. Therefore, Multi-channel PINN can be advantageous when used with appropriate representations.
Additionally, we pretrained models on a training task and then fine-tuned them on a test task to determine whether Multi-channel PINN can capture general representations of compounds and proteins. We found significant differences in performance between pretrained and non-pretrained models.
Affiliation(s)
- Munhwan Lee
- Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
- Hyeyeon Kim
- Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
- Hyunwhan Joe
- Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
- Hong-Gee Kim
- Biomedical Knowledge Engineering Laboratory, Seoul National University, 1 Gwanak-ro, Seoul, Republic of Korea
45
DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks. Sci Rep 2019; 9:7344. [PMID: 31089211 PMCID: PMC6517386 DOI: 10.1038/s41598-019-43708-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 08/13/2018] [Accepted: 04/27/2019] [Indexed: 01/22/2023]
Abstract
Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, and accurate prediction methods are still needed. Recently, deep learning based methods have outperformed conventional algorithms in computer vision and natural language processing, owing to the prevention of overfitting and to efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests and benchmarked using three types of protein descriptors, training datasets of varying sizes, and GO terms from different levels. Furthermore, to explore how training with larger but potentially noisy data would change performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed on the CAFA2 and CAFA3 challenge datasets, in comparison with state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study on the ‘biofilm formation process’ in Pseudomonas aeruginosa. This study shows that deep learning algorithms have significant potential in protein function prediction, particularly when the source data are large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred.
46
Abbasi WA, Asif A, Ben-Hur A, Minhas FUAA. Learning protein binding affinity using privileged information. BMC Bioinformatics 2018; 19:425. [PMID: 30442086 PMCID: PMC6238365 DOI: 10.1186/s12859-018-2448-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 07/17/2018] [Accepted: 10/25/2018] [Indexed: 01/04/2023]
Abstract
BACKGROUND: Determining protein-protein interactions and their binding affinity is important for understanding cellular biological processes, for the discovery and design of novel therapeutics, and for protein engineering and mutagenesis studies. Because of the time and effort required by wet lab experiments, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, are limited in their applicability by the limited availability of protein structure data. RESULTS: In this study, we propose a novel machine learning method for predicting binding affinity that uses protein 3D structure as privileged information at training time while requiring only protein sequence information during testing. Using this method, which is based on the framework of learning using privileged information (LUPI), we achieved improved performance over corresponding sequence-based binding affinity prediction methods that do not have access to privileged information during training. Our experiments show that with the proposed framework, which uses structure only during training, it is possible to achieve classification performance comparable to that obtained using structure-based features. Evaluation on an independent test set also shows improved performance over the PPA-Pred2 method. CONCLUSIONS: The proposed method outperforms several baseline learners and a state-of-the-art binding affinity predictor not only in cross-validation but also on an additional validation dataset, demonstrating the utility of the LUPI framework for problems that would benefit from classification using structure-based features. The implementation of LUPI developed for this work is expected to be useful in other areas of bioinformatics as well.
Affiliation(s)
- Wajid Arshad Abbasi
- Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
- Information Technology Center (ITC), University of Azad Jammu & Kashmir, Muzaffarabad, Azad Kashmir, 13100, Pakistan
- Department of Computer Science, Colorado State University (CSU), Fort Collins, CO, 80523, USA
- Amina Asif
- Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
- Asa Ben-Hur
- Department of Computer Science, Colorado State University (CSU), Fort Collins, CO, 80523, USA
- Fayyaz Ul Amir Afsar Minhas
- Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan
47
Saito Y, Oikawa M, Nakazawa H, Niide T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Mutagenesis for Directed Evolution of Fluorescent Proteins. ACS Synth Biol 2018; 7:2014-2022. [PMID: 30103599 DOI: 10.1021/acssynbio.8b00155] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Indexed: 01/21/2023]
Abstract
Molecular evolution based on mutagenesis is widely used in protein engineering. However, optimal proteins are often difficult to obtain because of the large sequence space. Here, we propose a novel approach that combines molecular evolution with machine learning. In this approach, we conduct two rounds of mutagenesis, in which an initial library of protein variants is used to train a machine-learning model that guides mutagenesis for the second-round library. This enables us to prepare a small library, suited to screening experiments, that is highly enriched in functional proteins. We demonstrated a proof of concept by altering the reference green fluorescent protein (GFP) so that its fluorescence changes to yellow. We obtained a number of proteins showing yellow fluorescence, 12 of which had longer wavelengths than the reference yellow fluorescent protein (YFP). These results show the potential of our approach as a powerful method for the directed evolution of fluorescent proteins.
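The two-round loop described in this abstract (fit a model on the first library, then use it to choose the second) can be illustrated with a deliberately simple stand-in model. This is a sketch under stated assumptions, not the authors' method: the abstract does not name the learner, so a first-order additive model is used here (each mutated site contributes an independent effect, estimated from marginal means of the round-1 scores), and the three-site variants and their scores are hypothetical toy values. The round-2 library is formed by ranking untested recombinations of residues already seen in round 1.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical round-1 library: residues at three mutated sites -> screening score.
ROUND1: Dict[str, float] = {
    "TSV": 0.2,
    "YSV": 0.6,
    "TGV": 0.3,
    "TSL": 0.4,
    "YGV": 0.7,
}

Effects = List[Dict[str, float]]


def fit_additive(library: Dict[str, float]) -> Tuple[float, Effects]:
    """Estimate a baseline plus per-site residue effects from marginal means."""
    baseline = mean(library.values())
    n_sites = len(next(iter(library)))
    effects: Effects = []
    for site in range(n_sites):
        scores_by_residue: Dict[str, List[float]] = {}
        for variant, score in library.items():
            scores_by_residue.setdefault(variant[site], []).append(score)
        # Effect of a residue = mean score of variants carrying it, minus baseline.
        effects.append({aa: mean(s) - baseline for aa, s in scores_by_residue.items()})
    return baseline, effects


def predict(variant: str, baseline: float, effects: Effects) -> float:
    """Additive prediction: baseline plus the effect of each site's residue."""
    return baseline + sum(effects[i][aa] for i, aa in enumerate(variant))


def design_round2(library: Dict[str, float], size: int) -> List[str]:
    """Rank untested recombinations of observed residues; keep the top `size`."""
    baseline, effects = fit_additive(library)
    candidates = ("".join(c) for c in product(*(sorted(e) for e in effects)))
    untested = [c for c in candidates if c not in library]
    untested.sort(key=lambda v: predict(v, baseline, effects), reverse=True)
    return untested[:size]


print(design_round2(ROUND1, size=2))  # ['YGL', 'YSL']
```

An additive model ignores interactions (epistasis) between sites, so it is only a placeholder; the point of the sketch is the loop structure: fit on round 1, score combinations that were never screened, and carry the top-ranked ones into round 2.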
Affiliation(s)
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Misaki Oikawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Teppei Niide
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
48
Karlberg M, von Stosch M, Glassey J. Exploiting mAb structure characteristics for a directed QbD implementation in early process development. Crit Rev Biotechnol 2018. [DOI: 10.1080/07388551.2017.1421899] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/17/2022]
Affiliation(s)
- Micael Karlberg
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
- Moritz von Stosch
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
- Jarka Glassey
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
49
Barley MH, Turner NJ, Goodacre R. Improved Descriptors for the Quantitative Structure-Activity Relationship Modeling of Peptides and Proteins. J Chem Inf Model 2018; 58:234-243. [PMID: 29338232 DOI: 10.1021/acs.jcim.7b00488] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Abstract
The ability to model the activity of a protein using quantitative structure-activity relationships (QSAR) requires descriptors for the 20 naturally coded amino acids. In this work we show that by modifying some established descriptors we were able to model the activity data of 140 mutants of the enzyme epoxide hydrolase with improved accuracy. These new descriptors (referred to as physical descriptors) also gave very good results when tested against a series of four dipeptide data sets. The physical descriptors encode the amino acids using only two orthogonal scales: the first is strongly linked to hydrophilicity/hydrophobicity, and the second, to the volume of the amino acid residue. The use of these new amino acid descriptors should result in simpler and more readily interpretable models for the enzyme activity (and potentially other functions of interest, e.g., secondary and tertiary structure) of peptides and proteins.
Affiliation(s)
- Mark H Barley: School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Nicholas J Turner: School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- Royston Goodacre: School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
50
Tresadern G, Trabanco AA, Pérez-Benito L, Overington JP, van Vlijmen HWT, van Westen GJP. Identification of Allosteric Modulators of Metabotropic Glutamate 7 Receptor Using Proteochemometric Modeling. J Chem Inf Model 2017; 57:2976-2985. [PMID: 29172488 PMCID: PMC5755953 DOI: 10.1021/acs.jcim.7b00338] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/06/2017] [Indexed: 01/07/2023]
Abstract
Proteochemometric modeling (PCM) is a computational approach that can be considered an extension of quantitative structure-activity relationship (QSAR) modeling, where a single model incorporates information for a family of targets and all the associated ligands instead of modeling activity versus one target. This is especially useful for situations where bioactivity data exists for similar proteins but is scarce for the protein of interest. Here we demonstrate the application of PCM to identify allosteric modulators of metabotropic glutamate (mGlu) receptors. Given our long-running interest in modulating mGlu receptor function we compiled a matrix of compound-target bioactivity data. Some members of the mGlu family are well explored both internally and in the public domain, while there are far fewer examples of ligands for other targets such as the mGlu7 receptor. Using a PCM approach mGlu7 receptor hits were found. In comparison to conventional single target modeling the identified hits were more diverse, had a better confirmation rate, and provided starting points for further exploration. We conclude that the robust structure-activity relationship from well explored target family members translated to better quality hits for PCM compared to virtual screening (VS) based on a single target.
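The core PCM idea in this abstract, one model trained over joint compound-target descriptors, can be sketched in Python. In this toy example, assumed purely for illustration, each (ligand, target) pair is represented by concatenating a ligand descriptor vector with a target descriptor vector, and a k-nearest-neighbour average stands in for whatever regression model the study actually used (the abstract does not specify it). The data-poor target has no observations of its own, so a single-target model could not be fitted for it, whereas the joint model borrows from its close family members. All descriptors, activities, and names (`joint_features`, `knn_predict`, `TARGETS`, `SPARSE_TARGET`) are hypothetical.

```python
import math

# Toy proteochemometric (PCM) sketch. Descriptors, activities and names are
# hypothetical; a k-nearest-neighbour average stands in for the real model.


def joint_features(ligand: list[float], target: list[float]) -> list[float]:
    """Concatenate ligand and target descriptors into one PCM feature vector."""
    return ligand + target


def knn_predict(train: list[tuple[list[float], float]],
                query: list[float], k: int = 2) -> float:
    """Average the activities of the k training pairs closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(activity for _, activity in nearest) / len(nearest)


# Two well-explored family members with made-up target descriptors.
TARGETS = {"T1": [1.0, 0.0], "T2": [0.9, 0.1]}
# Made-up (ligand descriptor, activity) observations per explored target.
OBSERVED = {
    "T1": [([0.0], 5.0), ([1.0], 6.0), ([2.0], 7.0)],
    "T2": [([0.0], 5.2), ([1.0], 6.2), ([2.0], 7.2)],
}
# One pooled training set: every (ligand, target) pair becomes one joint vector.
TRAIN = [
    (joint_features(ligand, TARGETS[name]), activity)
    for name, pairs in OBSERVED.items()
    for ligand, activity in pairs
]

# A data-poor family member: descriptors are known, activities are not.
SPARSE_TARGET = [0.95, 0.05]

if __name__ == "__main__":
    query = joint_features([1.0], SPARSE_TARGET)
    print(f"{knn_predict(TRAIN, query):.2f}")
```

Running the script prints `6.10`: the two nearest joint vectors are the same ligand measured on T1 and T2, whose target descriptors lie closest to the sparse target. Swapping the neighbour average for a linear or kernel model would not change the PCM principle, which is that target descriptors enter the feature vector alongside ligand descriptors.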
Affiliation(s)
- Gary Tresadern: Computational Chemistry and Neuroscience Medicinal Chemistry, Janssen Research & Development, Janssen-Cilag S.A., Jarama 75A, 45007 Toledo, Spain
- Andres A. Trabanco: Computational Chemistry and Neuroscience Medicinal Chemistry, Janssen Research & Development, Janssen-Cilag S.A., Jarama 75A, 45007 Toledo, Spain
- Laura Pérez-Benito: Computational Chemistry and Neuroscience Medicinal Chemistry, Janssen Research & Development, Janssen-Cilag S.A., Jarama 75A, 45007 Toledo, Spain
- John P. Overington: ChEMBL Group, EMBL-EBI, Wellcome Trust Genome Campus, CB10 1SD Hinxton, United Kingdom
- Herman W. T. van Vlijmen: Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, B-2340 Beerse, Belgium