1
Bitencourt-Ferreira G, Villarreal MA, Quiroga R, Biziukova N, Poroikov V, Tarasova O, de Azevedo Junior WF. Exploring Scoring Function Space: Developing Computational Models for Drug Discovery. Curr Med Chem 2024; 31:2361-2377. [PMID: 36944627 DOI: 10.2174/0929867330666230321103731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 12/15/2022] [Accepted: 12/29/2022] [Indexed: 03/23/2023]
Abstract
BACKGROUND The concept of scoring function space established a systems-level approach to developing models that predict the binding affinity of drug molecules. OBJECTIVE Our goal here is to review the concept of scoring function space and how to explore it to develop machine learning models for protein-ligand binding affinity. METHODS We searched PubMed for articles related to scoring function space. We also used crystallographic structures from the Protein Data Bank (PDB) to represent the protein space. RESULTS Applying systems-level approaches to receptor-drug interactions gives a holistic view of the drug discovery process. The scoring function space adds flexibility to the process because it makes it possible to view drug discovery as a relationship between mathematical spaces. CONCLUSION The concept of scoring function space has provided an integrated view of drug discovery methods. It is useful in drug discovery, where the process can be seen as a computational search of the scoring function space for an adequate model to predict receptor-drug binding affinity.
Affiliation(s)
- Marcos A Villarreal
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
- Rodrigo Quiroga
- CONICET-Departamento de Matemática y Física, Instituto de Investigaciones en Fisicoquímica de Córdoba (INFIQC), Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina
- Nadezhda Biziukova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
- Vladimir Poroikov
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
- Olga Tarasova
- Institute of Biomedical Chemistry, Pogodinskaya Str., 10/8, Moscow, 119121, Russia
- Walter F de Azevedo Junior
- Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
- Specialization Program in Bioinformatics, The Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre / RS 90619-900, Brazil

2
Zhang YW. Designing High Binding Affinity Peptides for MHC Class I Using MAM: An In Silico Approach. Methods Mol Biol 2024; 2809:263-274. [PMID: 38907903 DOI: 10.1007/978-1-0716-3874-3_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2024]
Abstract
The availability of extensive MHC-peptide binding data has boosted machine learning-based approaches for predicting binding affinity and identifying binding motifs. These computational tools leverage the wealth of binding data to extract essential features and generate a multitude of potential peptides, thereby significantly reducing the cost and time required for experimental procedures. MAM is one such tool: it predicts MHC-I-peptide binding affinity, extracts binding motifs, and generates new high-affinity peptides. This manuscript provides step-by-step guidance on installing, configuring, and executing MAM, and discusses best practices for using the tool.
Affiliation(s)
- Yu Wei Zhang
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong.

3
Feng H, Wang F, Li N, Xu Q, Zheng G, Sun X, Hu M, Li X, Xing G, Zhang G. Use of tree-based machine learning methods to screen affinitive peptides based on docking data. Mol Inform 2023; 42:e202300143. [PMID: 37696773 DOI: 10.1002/minf.202300143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 09/03/2023] [Accepted: 09/11/2023] [Indexed: 09/13/2023]
Abstract
Screening peptides with good affinity is an important step in peptide-drug discovery. Recent advances in computer and data science have made machine learning a useful tool for accurately screening affinitive peptides. In the current study, four tree-based algorithms, namely Classification and Regression Trees (CART), the C5.0 decision tree (C50), Bagged CART (BAG), and Random Forest (RF), were employed to explore the relationship between experimental peptide affinities and virtual docking data, and the performance of the models was compared in parallel. All four algorithms performed better on the dataset that had been scaled, centered, and PCA-transformed than on datasets pre-processed in other ways. After model rebuilding and hyperparameter optimization, the optimal C50 model (C50O) performed best in terms of Accuracy, Kappa, Sensitivity, Specificity, F1, MCC, and AUC when validated on the test data and evaluated on an unknown PEDV dataset (Accuracy = 80.4%). BAG and RFO (the optimal RF), the two best models during training, did not perform as expected in testing and unknown-dataset validation. Furthermore, the high correlation of RFO and BAG predictions with those of C50O implied that their predictions are stable and robust. By contrast, although CARTO performed well on the unknown dataset, its poor performance in test-data validation and correlation analysis indicated that it should not be used for future predictions. Aiming at accurate evaluation of peptide affinity, the current study is the first to compare tree-based models for affinitive-peptide prediction using virtual docking data, which should expand the application of machine learning algorithms in studying PepPIs and benefit the development of peptide therapeutics.
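The preprocessing that worked best for all four tree models (centering, scaling, then PCA) can be sketched as follows. This is a minimal illustration assuming NumPy and a numeric matrix of docking-derived features; the function name, the random placeholder data, and the choice of three components are hypothetical, and the abstract does not say which tools or settings the authors used. Statistics are fitted on the training set only and reused for the test set, which keeps test information out of the transform.

```python
import numpy as np

def center_scale_pca(X_train, X_test, n_components):
    """Hypothetical helper: center and scale features with training-set
    statistics, then project both sets onto the training data's top
    principal components."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z_train = (X_train - mean) / std
    Z_test = (X_test - mean) / std
    # Right singular vectors of the standardized training matrix are the PCA loadings
    _, _, vt = np.linalg.svd(Z_train, full_matrices=False)
    components = vt[:n_components].T
    return Z_train @ components, Z_test @ components

# Random placeholder data standing in for docking-derived features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 6))
X_test = rng.normal(size=(10, 6))
P_train, P_test = center_scale_pca(X_train, X_test, n_components=3)
print(P_train.shape, P_test.shape)
```

The reduced matrices would then be passed to whichever tree-based classifier is being trained.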
Affiliation(s)
- Hua Feng
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Fangyu Wang
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Ning Li
- College of Food Science and Technology, Henan Agricultural University, Zhengzhou, China
- Qian Xu
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Guanming Zheng
- Public Health and Preventive Medicine Teaching and Research Center, Henan University of Chinese Medicine, Zhengzhou, Henan, China
- Xuefeng Sun
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Man Hu
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Xuewu Li
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Guangxu Xing
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Gaiping Zhang
- Henan Key Laboratory of Animal Immunology, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Longhu Modern Immunology Laboratory, Zhengzhou, China
- School of Advanced Agricultural Sciences, Peking University, Beijing, China
- Jiangsu Co-Innovation Center for the Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou University, Yangzhou, Jiangsu, China

4
Grasso S, Dabene V, Hendriks MMW, Zwartjens P, Pellaux R, Held M, Panke S, van Dijl JM, Meyer A, van Rij T. Signal Peptide Efficiency: From High-Throughput Data to Prediction and Explanation. ACS Synth Biol 2023; 12:390-404. [PMID: 36649479 PMCID: PMC9942255 DOI: 10.1021/acssynbio.2c00328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
The passage of proteins across biological membranes via the general secretory (Sec) pathway is a universally conserved process with critical functions in cell physiology and important industrial applications. Proteins are directed into the Sec pathway by a signal peptide at their N-terminus. Estimating the impact of physicochemical signal peptide features on protein secretion levels has not been achieved so far, partially due to the extreme sequence variability of signal peptides. To elucidate relevant features of the signal peptide sequence that influence secretion efficiency, an evaluation of ∼12,000 different designed signal peptides was performed using a novel miniaturized high-throughput assay. The results were used to train a machine learning model, and a post-hoc explanation of the model is provided. By describing each signal peptide with a selection of 156 physicochemical features, it is now possible to both quantify feature importance and predict the protein secretion levels directed by each signal peptide. Our analyses allow the detection and explanation of the relevant signal peptide features influencing the efficiency of protein secretion, generating a versatile tool for the de novo design and in silico evaluation of signal peptides.
Affiliation(s)
- Stefano Grasso
- Department of Medical Microbiology, University of Groningen, University Medical Center Groningen, Hanzeplein 1, Groningen 9700 RB, The Netherlands
- DSM Biotechnology Center, Alexander Fleminglaan 1, Delft 2613 AX, Netherlands
- Valentina Dabene
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
- FGen AG, Hochbergerstrasse 60C, Basel 4057, Switzerland
- Priscilla Zwartjens
- DSM Biotechnology Center, Alexander Fleminglaan 1, Delft 2613 AX, Netherlands
- René Pellaux
- FGen AG, Hochbergerstrasse 60C, Basel 4057, Switzerland
- Martin Held
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
- Sven Panke
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
- Jan Maarten van Dijl
- Department of Medical Microbiology, University of Groningen, University Medical Center Groningen, Hanzeplein 1, Groningen 9700 RB, The Netherlands. Phone: +31503615187
- Andreas Meyer
- FGen AG, Hochbergerstrasse 60C, Basel 4057, Switzerland
- Tjeerd van Rij
- DSM Biotechnology Center, Alexander Fleminglaan 1, Delft 2613 AX, Netherlands. Phone: +31628441843

5
MPMABP: A CNN and Bi-LSTM-Based Method for Predicting Multi-Activities of Bioactive Peptides. Pharmaceuticals (Basel) 2022; 15:ph15060707. [PMID: 35745625 PMCID: PMC9231127 DOI: 10.3390/ph15060707] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/22/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 12/30/2022]
Abstract
Bioactive peptides are typically small functional peptides of 2–20 amino acid residues that play versatile roles in metabolic and biological processes. Because bioactive peptides are multi-functional, detecting all of their functions simultaneously is highly challenging. We propose MPMABP, a deep learning method based on a convolutional neural network (CNN) and bi-directional long short-term memory (Bi-LSTM), for recognizing multiple activities of bioactive peptides. MPMABP stacks five CNNs at different scales and uses a residual network to prevent information loss. Empirical results showed that MPMABP outperforms state-of-the-art methods. Analysis of the amino acid distribution indicated that lysine tends to appear in anti-cancer peptides, leucine in anti-diabetic peptides, and proline in anti-hypertensive peptides. The method and analysis should help in recognizing multiple activities of bioactive peptides.
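The amino acid distribution analysis reported for MPMABP rests on per-class residue frequencies. A minimal sketch of that kind of composition count, assuming one-letter sequences and the 20 standard residues; the function name and the toy peptides are hypothetical placeholders, not MPMABP data or code.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def class_composition(peptides):
    """Hypothetical helper: fraction of each standard residue pooled over
    all peptides in one activity class (non-standard letters are skipped)."""
    counts = Counter()
    for seq in peptides:
        counts.update(res for res in seq.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS} if total else {}

# Toy sequences, one list per activity class (placeholders, not real labels)
classes = {
    "anti-cancer": ["KLAKLAKK", "GLKKLLK"],
    "anti-hypertensive": ["VPP", "IPP"],
}
for label, seqs in classes.items():
    comp = class_composition(seqs)
    top = max(comp, key=comp.get)
    print(label, top, round(comp[top], 3))
```

Comparing such per-class fractions against the background composition of all peptides is one way to say a residue "prefers" a class.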

6
Wani MA, Garg P, Roy KK. Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides. Med Biol Eng Comput 2021; 59:2397-2408. [PMID: 34632545 DOI: 10.1007/s11517-021-02443-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/10/2020] [Accepted: 09/14/2021] [Indexed: 10/20/2022]
Abstract
The ubiquitous antimicrobial peptides (AMPs), with a broad range of antimicrobial activities, hold great promise for combating multi-drug-resistant infections. In this study, using a large and diverse set of AMPs (2638) and non-AMPs (3700), we explored a variety of machine learning classifiers to build in silico models for AMP prediction, including Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Quadratic Discriminant Analysis (QDA), and ensemble learning. Among the models generated, the RF classifier-based model performed best in both internal validation [Accuracy: 91.40%, Precision: 89.37%, Sensitivity: 90.05%, Specificity: 92.36%] and external validation [Accuracy: 89.43%, Precision: 88.92%, Sensitivity: 85.21%, Specificity: 92.43%]. In addition, the RF classifier-based model correctly predicted known AMPs and non-AMPs that had been held out as an additional external validation set. The performance assessment revealed three features, namely ChargeD2001, PAAC12 (pseudo amino acid composition), and polarity T13, that are likely to play vital roles in the antimicrobial activity of AMPs. The RF-based classification model may further be useful in the design and prediction of novel potential AMPs.
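The validation figures above (accuracy, precision, sensitivity, specificity) follow the standard confusion-matrix definitions, with AMPs as the positive class. A minimal sketch; the function name is illustrative and the counts are invented for demonstration, not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, treating AMP as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),      # share of predicted AMPs that are AMPs
        "sensitivity": tp / (tp + fn),    # recall on true AMPs
        "specificity": tn / (tn + fp),    # recall on true non-AMPs
    }

# Invented counts for illustration only (not the study's results)
print(binary_metrics(tp=90, fp=10, tn=85, fn=15))
```

Because sensitivity and specificity are computed within each true class, they stay interpretable when AMPs and non-AMPs are unbalanced, as in the 2638 vs 3700 split above.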
Affiliation(s)
- Mushtaq Ahmad Wani
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India
- Prabha Garg
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, 160062, Punjab, India
- Kuldeep K Roy
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Kolkata, 700054, West Bengal, India
- Department of Pharmaceutical Sciences, School of Health Sciences, University of Petroleum and Energy Studies (UPES), P.O. Bidholi, Dehradun, 248007, Uttarakhand, India

7
Baudry J, Bondar AN, Cournia Z, Parks JM, Petridis L, Roux B. Editorial: Advances in computational molecular biophysics. Biochim Biophys Acta Gen Subj 2021; 1865:129888. [PMID: 33662454 DOI: 10.1016/j.bbagen.2021.129888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Jerome Baudry
- The University of Alabama in Huntsville, Department of Biological Sciences, 301 Sparkman Drive, Huntsville, AL 35899, USA.
- Ana-Nicoleta Bondar
- Freie Universität Berlin, Department of Physics, Theoretical Molecular Biophysics, Arnimallee 14, D-14195 Berlin, Germany.
- Zoe Cournia
- Soranou Ephessiou, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece.
- Jerry M Parks
- Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN 37831-6309, USA.
- Loukas Petridis
- Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN 37831-6309, USA.
- Benoit Roux
- Department of Biochemistry and Molecular Biology, The University of Chicago, 929 E57th Street, Chicago, IL 60637, USA.

8
Mei S, Li F, Xiang D, Ayala R, Faridi P, Webb GI, Illing PT, Rossjohn J, Akutsu T, Croft NP, Purcell AW, Song J. Anthem: a user customised tool for fast and accurate prediction of binding between peptides and HLA class I molecules. Brief Bioinform 2021; 22:6102669. [PMID: 33454737 DOI: 10.1093/bib/bbaa415] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 09/30/2020] [Revised: 11/29/2020] [Accepted: 12/16/2020] [Indexed: 12/17/2022]
Abstract
Neopeptide-based immunotherapy has been recognised as a promising approach for the treatment of cancers. For neopeptides to be recognised by CD8+ T cells and induce an immune response, their binding to human leukocyte antigen class I (HLA-I) molecules is a necessary first step. Most epitope prediction tools therefore rely on predicting such binding. Mass spectrometry has expanded the number of naturally presented HLA ligands that can be used to develop such predictors. However, few efforts have focused on integrating these experimental data with computational algorithms to efficiently develop up-to-date predictors. Here, we present Anthem for accurate HLA-I binding prediction. In particular, we have developed a user-friendly framework that supports the development of customisable HLA-I binding prediction models, to meet the challenges posed by the rapidly increasing availability of large amounts of immunopeptidomic data. Our extensive evaluation, using both independent and experimental datasets, shows that Anthem achieves an overall similar or higher area under the curve (AUC) than other contemporary tools. We anticipate that Anthem will give non-expert users a unique opportunity to analyse and interpret their own in-house or publicly deposited datasets.
Affiliation(s)
- Shutao Mei
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Fuyi Li
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia
- Dongxu Xiang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Rochelle Ayala
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Pouya Faridi
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Patricia T Illing
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jamie Rossjohn
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
- Nathan P Croft
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Anthony W Purcell
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jiangning Song
- Monash Biomedicine Discovery Institute and Biochemistry and Molecular Biology, Monash University, Australia

9
Acharya A, Agarwal R, Baker M, Baudry J, Bhowmik D, Boehm S, Byler KG, Chen S, Coates L, Cooper C, Demerdash O, Daidone I, Eblen J, Ellingson S, Forli S, Glaser J, Gumbart JC, Gunnels J, Hernandez O, Irle S, Kneller D, Kovalevsky A, Larkin J, Lawrence T, LeGrand S, Liu SH, Mitchell J, Park G, Parks J, Pavlova A, Petridis L, Poole D, Pouchard L, Ramanathan A, Rogers D, Santos-Martins D, Scheinberg A, Sedova A, Shen Y, Smith J, Smith M, Soto C, Tsaris A, Thavappiragasam M, Tillack A, Vermaas J, Vuong V, Yin J, Yoo S, Zahran M, Zanetti-Polzi L. Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19. J Chem Inf Model 2020; 60:5832-5852. [PMID: 33326239 PMCID: PMC7754786 DOI: 10.1021/acs.jcim.0c01010] [Citation(s) in RCA: 109] [Impact Index Per Article: 27.3] [Received: 08/27/2020] [Indexed: 01/18/2023]
Abstract
We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. Ensemble docking makes use of MD results by docking compound databases into representative protein binding-site conformations, thus taking into account the dynamic properties of the binding sites. We also describe preliminary results obtained for 24 systems involving eight proteins of the SARS-CoV-2 proteome. The MD involves temperature replica exchange enhanced sampling, making use of massively parallel supercomputing to quickly sample the configurational space of protein drug targets. Using the Summit supercomputer at the Oak Ridge National Laboratory, more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble docked repurposing databases to 10 configurations of each of the 24 SARS-CoV-2 systems using AutoDock Vina. Comparison to experiment demonstrates remarkably high hit rates for the top-scoring tranches of compounds identified by our ensemble approach. We also demonstrate that, using AutoDock-GPU on Summit, it is possible to perform exhaustive docking of one billion compounds in under 24 h. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and artificial intelligence (AI) methods to cluster MD trajectories and rescore docking poses.
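In ensemble docking each compound receives one score per receptor conformation, so the scores have to be combined before compounds can be ranked into tranches. The abstract does not state how the pipeline combines them; the sketch below assumes one common choice, keeping each compound's best (most negative, AutoDock Vina-style) score across conformations. The function name, compound names, scores, and tranche size are all hypothetical.

```python
def rank_ensemble(scores, top_n):
    """Hypothetical helper. `scores` maps compound -> list of docking scores
    (kcal/mol), one per receptor conformation; lower is better, as with
    AutoDock Vina. Returns the top_n (compound, best_score) pairs."""
    best = {cpd: min(vals) for cpd, vals in scores.items() if vals}
    return sorted(best.items(), key=lambda item: item[1])[:top_n]

# Invented scores for three compounds docked into four conformations
scores = {
    "cpd_A": [-7.1, -8.4, -6.9, -7.7],
    "cpd_B": [-9.0, -6.2, -6.5, -6.8],
    "cpd_C": [-6.0, -6.3, -5.9, -6.1],
}
for name, score in rank_ensemble(scores, top_n=2):
    print(name, score)
```

Taking the minimum rewards a compound that fits any single conformation well; averaging across conformations is an alternative that instead favours compounds scoring consistently over the whole ensemble.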
Affiliation(s)
- A. Acharya
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- R. Agarwal
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996, USA
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996, USA
- M. Baker
- Computer Science and Mathematics Division, Oak Ridge National Lab, Oak Ridge, TN 37830, USA
- J. Baudry
- The University of Alabama in Huntsville, Department of Biological Sciences. 301 Sparkman Drive, Huntsville, AL 35899, USA
- D. Bhowmik
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- S. Boehm
- Computer Science and Mathematics Division, Oak Ridge National Lab, Oak Ridge, TN 37830, USA
- K. G. Byler
- The University of Alabama in Huntsville, Department of Biological Sciences. 301 Sparkman Drive, Huntsville, AL 35899, USA
- S.Y. Chen
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- L. Coates
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- C.J. Cooper
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996, USA
- O. Demerdash
- Biosciences Division, Oak Ridge National Lab, Oak Ridge, TN 37830, USA
- I. Daidone
- Department of Physical and Chemical Sciences, University of L’Aquila, I-67010 L’Aquila, Italy
- J.D. Eblen
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996, USA
- S. Ellingson
- University of Kentucky, Division of Biomedical Informatics, College of Medicine, UK Medical Center MN 150, Lexington KY, 40536, USA
- S. Forli
- Scripps Research, La Jolla, CA, 92037, USA
- J. Glaser
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- J. C. Gumbart
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- J. Gunnels
- HPC Engineering, Amazon Web Services, Seattle, WA 98121, USA
- O. Hernandez
- Computer Science and Mathematics Division, Oak Ridge National Lab, Oak Ridge, TN 37830, USA
- S. Irle
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN 37996, USA
- D.W. Kneller
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- A. Kovalevsky
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- J. Larkin
- NVIDIA Corporation, Santa Clara, CA 95051, USA
- T.J. Lawrence
- Biosciences Division, Oak Ridge National Lab, Oak Ridge, TN 37830, USA
- S. LeGrand
- NVIDIA Corporation, Santa Clara, CA 95051, USA
- S.-H. Liu
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996, USA
- J.C. Mitchell
- Biosciences Division, Oak Ridge National Lab, Oak Ridge, TN 37830, USA
- G. Park
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- J.M. Parks
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996, USA
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996, USA
- A. Pavlova
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- L. Petridis
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996, USA
- D. Poole
- NVIDIA Corporation, Santa Clara, CA 95051, USA
- L. Pouchard
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- A. Ramanathan
- Data Science and Learning Division, Argonne National Lab, Lemont, IL 60439, USA
- D. Rogers
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- A. Sedova
- Biosciences Division, Oak Ridge National Lab, Oak Ridge, TN 37830, USA
- Y. Shen
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996, USA
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996, USA
- J.C. Smith
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996, USA
- M.D. Smith
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830, USA
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996, USA
- C. Soto
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- A. Tsaris
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- J.V. Vermaas
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- V.Q. Vuong
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
- Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN 37996, USA
- J. Yin
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- S. Yoo
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
- M. Zahran
- Department of Biological Sciences, New York City College of Technology, The City University of New York (CUNY), Brooklyn, NY 11201, USA

10
Aranha MP, Jewel YSM, Beckman RA, Weiner LM, Mitchell JC, Parks JM, Smith JC. Combining Three-Dimensional Modeling with Artificial Intelligence to Increase Specificity and Precision in Peptide-MHC Binding Predictions. J Immunol 2020; 205:1962-1977. [PMID: 32878910 PMCID: PMC7511449 DOI: 10.4049/jimmunol.1900918] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/02/2019] [Accepted: 08/01/2020] [Indexed: 02/06/2023]
Abstract
The reliable prediction of the affinity of candidate peptides for the MHC is important for predicting their potential antigenicity and thus influences medical applications, such as decisions on their inclusion in T cell-based vaccines. In this study, we present a rapid, predictive computational approach that combines a popular sequence-based artificial neural network method, NetMHCpan 4.0, with three-dimensional structural modeling. We find that the ensembles of bound peptide conformations generated by the programs MODELLER and Rosetta FlexPepDock are less variable in geometry for strong binders than for low-affinity peptides. In tests on 1271 peptide sequences whose experimental dissociation constants for binding to the well-characterized murine MHC allele H-2Db are known, applying thresholds for geometric fluctuations allowed the structure-based approach, used standalone, to drastically improve statistical specificity by reducing the number of false positives. Furthermore, filtering candidates generated with NetMHCpan 4.0 through the structure-based predictor increased the positive predictive value (PPV) of peptides correctly predicted to bind very strongly (i.e., Kd < 100 nM) from 40 to 52% (p = 0.027). The combined method also significantly improved the PPV when tested on five human alleles, including some with limited training data. Overall, the PPV increased by an average of 10% over the standalone sequence-based method. The combined method should be useful in the rapid design of effective T cell-based vaccines.
Affiliation(s)
- Michelle P Aranha
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37916
- Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN 37830
| | - Yead S M Jewel
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37916
- Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- Robert A Beckman
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC 20007
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC 20007
- Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057
- Louis M Weiner
- Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057
- Julie C Mitchell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- Jerry M Parks
- Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- Jeremy C Smith
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37916
- Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN 37830
11
Acharya A, Agarwal R, Baker M, Baudry J, Bhowmik D, Boehm S, Byler KG, Coates L, Chen SY, Cooper CJ, Demerdash O, Daidone I, Eblen JD, Ellingson S, Forli S, Glaser J, Gumbart JC, Gunnels J, Hernandez O, Irle S, Larkin J, Lawrence TJ, LeGrand S, Liu SH, Mitchell JC, Park G, Parks JM, Pavlova A, Petridis L, Poole D, Pouchard L, Ramanathan A, Rogers D, Santos-Martins D, Scheinberg A, Sedova A, Shen S, Smith JC, Smith MD, Soto C, Tsaris A, Thavappiragasam M, Tillack AF, Vermaas JV, Vuong VQ, Yin J, Yoo S, Zahran M, Zanetti-Polzi L. Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19. CHEMRXIV : THE PREPRINT SERVER FOR CHEMISTRY 2020:12725465. [PMID: 33200117 PMCID: PMC7668744 DOI: 10.26434/chemrxiv.12725465] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Revised: 07/29/2020] [Indexed: 01/18/2023]
Abstract
We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. We also describe preliminary results obtained for 23 systems involving eight protein targets of the SARS-CoV-2 proteome. The MD performed is temperature replica-exchange enhanced sampling, making use of massively parallel computing on the Summit supercomputer at Oak Ridge National Laboratory, with which more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble docked repurposing databases to ten configurations of each of the 23 SARS-CoV-2 systems using AutoDock Vina. We also demonstrate that, using AutoDock-GPU on Summit, it is possible to perform exhaustive docking of one billion compounds in under 24 hours. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and AI methods to cluster MD trajectories and rescore docking poses.
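In ensemble docking, each compound is docked against several receptor configurations (here, MD-derived snapshots) and the per-configuration scores must be combined into one ranking. The abstract does not say how this is done, so the Python sketch below is an assumption: it contrasts two common aggregates, the best score over the ensemble and the ensemble mean. The compound names and scores are invented, and three configurations stand in for the ten used in the study.

```python
from statistics import mean

# Hypothetical docking scores (kcal/mol, Vina-style: more negative = stronger
# predicted binding) for each compound against three receptor configurations.
scores = {
    "cmpd_A": [-7.1, -8.4, -6.9],
    "cmpd_B": [-7.8, -7.6, -7.9],
    "cmpd_C": [-6.2, -6.5, -9.0],
}


def ensemble_rank(scores, aggregate=min):
    """Return (compound, aggregated score) pairs, best (most negative) first."""
    agg = {name: aggregate(vals) for name, vals in scores.items()}
    return sorted(agg.items(), key=lambda kv: kv[1])


best_pose = [name for name, _ in ensemble_rank(scores)]        # best score over the ensemble
consensus = [name for name, _ in ensemble_rank(scores, mean)]  # average over the ensemble
print(best_pose)  # ['cmpd_C', 'cmpd_A', 'cmpd_B']
print(consensus)  # ['cmpd_B', 'cmpd_A', 'cmpd_C']
```

The two aggregates disagree: cmpd_C wins on a single favorable configuration, whereas cmpd_B scores consistently well across all of them. Which behavior is preferable depends on how much weight one places on individual MD snapshots.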
Affiliation(s)
- A Acharya
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30332
- R Agarwal
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996
- M Baker
- Computer Science and Mathematics Division, Oak Ridge National Lab, Oak Ridge, TN 37830
- J Baudry
- The University of Alabama in Huntsville, Department of Biological Sciences. 301 Sparkman Drive, Huntsville, AL 35899
- D Bhowmik
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
- S Boehm
- Computer Science and Mathematics Division, Oak Ridge National Lab, Oak Ridge, TN 37830
- K G Byler
- The University of Alabama in Huntsville, Department of Biological Sciences. 301 Sparkman Drive, Huntsville, AL 35899
- L Coates
- Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
- S Y Chen
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973
- C J Cooper
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996
- O Demerdash
- Biosciences Division, Oak Ridge National Lab, Oak Ridge, TN 37830
- I Daidone
- Department of Physical and Chemical Sciences, University of L'Aquila, I-67010 L'Aquila, Italy
- J D Eblen
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996
- S Ellingson
- University of Kentucky, Division of Biomedical Informatics, College of Medicine, UK Medical Center MN 150, Lexington KY, 40536
- S Forli
- Scripps Research, La Jolla, CA, 92037
- J Glaser
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- J C Gumbart
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30332
- J Gunnels
- HPC Engineering, Amazon Web Services, Seattle, WA 98121
- O Hernandez
- Computer Science and Mathematics Division, Oak Ridge National Lab, Oak Ridge, TN 37830
- S Irle
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
- Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN 37996
- J Larkin
- NVIDIA Corporation, Santa Clara, CA 95051
- T J Lawrence
- Biosciences Division, Oak Ridge National Lab, Oak Ridge, TN 37830
- S LeGrand
- NVIDIA Corporation, Santa Clara, CA 95051
- S-H Liu
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996
- J C Mitchell
- Biosciences Division, Oak Ridge National Lab, Oak Ridge, TN 37830
- G Park
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973
- J M Parks
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996
- A Pavlova
- School of Physics, Georgia Institute of Technology, Atlanta, GA 30332
- L Petridis
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996
- D Poole
- NVIDIA Corporation, Santa Clara, CA 95051
- L Pouchard
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973
- A Ramanathan
- Data Science and Learning Division, Argonne National Lab, Lemont, IL 60439
- D Rogers
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- A Sedova
- Biosciences Division, Oak Ridge National Lab, Oak Ridge, TN 37830
- S Shen
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996
- J C Smith
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996
- M D Smith
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, 37830
- The University of Tennessee, Knoxville. Department of Biochemistry & Cellular and Molecular Biology, 309 Ken and Blaire Mossman Bldg. 1311 Cumberland Avenue Knoxville, TN, 37996
- C Soto
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973
- A Tsaris
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- J V Vermaas
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- V Q Vuong
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
- Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN 37996
- J Yin
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830
- S Yoo
- Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973
- M Zahran
- Department of Biological Sciences, New York City College of Technology, The City University of New York (CUNY), Brooklyn, NY 11201
12
Abella JR, Antunes DA, Clementi C, Kavraki LE. Large-Scale Structure-Based Prediction of Stable Peptide Binding to Class I HLAs Using Random Forests. Front Immunol 2020; 11:1583. [PMID: 32793224 PMCID: PMC7387700 DOI: 10.3389/fimmu.2020.01583] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/31/2020] [Accepted: 06/15/2020] [Indexed: 01/13/2023]
Abstract
Prediction of stable peptide binding to Class I HLAs is an important component of immunotherapy design. While the best-performing predictors are based on machine learning algorithms trained on peptide-HLA (pHLA) sequences, the use of structure for training predictors deserves further exploration. Given enough pHLA structures, a predictor based on the residue-residue interactions found in these structures has the potential to generalize to alleles with little or no experimental data. We have previously developed APE-Gen, a modeling approach able to produce pHLA structures in a scalable manner. In this work we use APE-Gen to model over 150,000 pHLA structures, the largest dataset of its kind, and use them to train a structure-based pan-allele model. We extract simple, homogeneous features based on residue-residue distances between peptide and HLA, and build a random forest model for predicting stable pHLA binding. Our model achieves competitive AUROC values on leave-one-allele-out validation tests while using significantly less data than popular sequence-based methods. Additionally, our model offers an interpretation analysis that can reveal how the model composes the features to arrive at any given prediction. This interpretation analysis can be used to check whether the model is in line with chemical intuition, and we showcase particular examples. Our work is a significant step toward using structure to achieve generalizable and more interpretable prediction of stable pHLA binding.
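The abstract names three ingredients: distance-based peptide-HLA features, leave-one-allele-out validation, and AUROC as the evaluation metric. The pure-Python sketch below illustrates each under stated assumptions. One representative point per residue, the 4/6/8 Å distance bins, and all function names are hypothetical choices, not the authors' actual feature definition. The random forest itself (for example, scikit-learn's `RandomForestClassifier`) is omitted so that the sketch needs only the standard library.

```python
import math
from itertools import product


def distance_bin_features(peptide_xyz, hla_xyz, edges=(4.0, 6.0, 8.0)):
    """Count peptide-HLA residue pairs falling in distance bins [0, 4), [4, 6), [6, 8) Å.

    Each input is a list of (x, y, z) tuples, one representative point per
    residue (an assumption). Pairs at or beyond the last edge are ignored.
    """
    counts = [0] * len(edges)
    for p, h in product(peptide_xyz, hla_xyz):
        d = math.dist(p, h)
        for i, edge in enumerate(edges):
            if d < edge:
                counts[i] += 1
                break
    return counts


def leave_one_allele_out(alleles):
    """Yield (held_out_allele, train_indices, test_indices), one split per allele."""
    for held_out in sorted(set(alleles)):
        train = [i for i, a in enumerate(alleles) if a != held_out]
        test = [i for i, a in enumerate(alleles) if a == held_out]
        yield held_out, train, test


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy coordinates (Å), invented for illustration.
peptide = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
hla = [(0.0, 3.0, 0.0), (3.8, 5.0, 0.0), (10.0, 0.0, 0.0)]
print(distance_bin_features(peptide, hla))  # [1, 2, 2]

alleles = ["A*02:01", "B*07:02", "A*02:01"]
for held_out, train, test in leave_one_allele_out(alleles):
    print(held_out, train, test)  # A*02:01 [1] [0, 2], then B*07:02 [0, 2] [1]

print(auroc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.4]))  # 0.875
```

Holding out every example of one allele at a time is what makes the validation a test of generalization to unseen alleles: no pHLA structure of the held-out allele is available during training.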
Collapse
Affiliation(s)
- Jayvee R. Abella
- Department of Computer Science, Rice University, Houston, TX, United States
- Dinler A. Antunes
- Department of Computer Science, Rice University, Houston, TX, United States
- Cecilia Clementi
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States
- Department of Chemistry, Rice University, Houston, TX, United States
- Lydia E. Kavraki
- Department of Computer Science, Rice University, Houston, TX, United States