1
Olawade DB, Teke J, Fapohunda O, Weerasinghe K, Usman SO, Ige AO, Clement David-Olawade A. Leveraging artificial intelligence in vaccine development: A narrative review. J Microbiol Methods 2024; 224:106998. [PMID: 39019262 DOI: 10.1016/j.mimet.2024.106998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 07/12/2024] [Accepted: 07/12/2024] [Indexed: 07/19/2024]
Abstract
Vaccine development stands as a cornerstone of public health efforts, pivotal in curbing infectious diseases and reducing global morbidity and mortality. However, traditional vaccine development methods are often time-consuming, costly, and inefficient. The advent of artificial intelligence (AI) has ushered in a new era in vaccine design, offering unprecedented opportunities to expedite the process. This narrative review explores the role of AI in vaccine development, focusing on antigen selection, epitope prediction, adjuvant identification, and optimization strategies. AI algorithms, including machine learning and deep learning, leverage genomic data, protein structures, and immune system interactions to predict antigenic epitopes, assess immunogenicity, and prioritize antigens for experimentation. Furthermore, AI-driven approaches facilitate the rational design of immunogens and the identification of novel adjuvant candidates with optimal safety and efficacy profiles. Challenges such as data heterogeneity, model interpretability, and regulatory considerations must be addressed to realize the full potential of AI in vaccine development. Integrating emerging technologies, such as single-cell omics and synthetic biology, promises to enhance vaccine design precision and scalability. This review underscores the transformative impact of AI on vaccine development and highlights the need for interdisciplinary collaborations and regulatory harmonization to accelerate the delivery of safe and effective vaccines against infectious diseases.
Affiliation(s)
- David B Olawade
  - Department of Allied and Public Health, School of Health, Sport and Bioscience, University of East London, London, United Kingdom; Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom
- Jennifer Teke
  - Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom; Faculty of Medicine, Health and Social Care, Canterbury Christ Church University, United Kingdom
- Kusal Weerasinghe
  - Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, United Kingdom
- Sunday O Usman
  - Department of Systems and Industrial Engineering, University of Arizona, USA
- Abimbola O Ige
  - Department of Chemistry, Faculty of Science, University of Ibadan, Ibadan, Nigeria
2
Friedman RZ, Ramu A, Lichtarge S, Myers CA, Granas DM, Gause M, Corbo JC, Cohen BA, White MA. Active learning of enhancer and silencer regulatory grammar in photoreceptors. bioRxiv 2023:2023.08.21.554146. [PMID: 37662358 PMCID: PMC10473580 DOI: 10.1101/2023.08.21.554146] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/05/2023]
Abstract
Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome. We address this problem using active machine learning to iteratively train models on multiple rounds of synthetic DNA sequences assayed in live mammalian retinas. During each round of training the model actively selects sequence perturbations to assay, thereby efficiently generating informative training data. We iteratively trained a model that predicts the activities of sequences containing binding motifs for the photoreceptor transcription factor Cone-rod homeobox (CRX) using an order of magnitude less training data than current approaches. The model's internal confidence estimates of its predictions are reliable guides for designing sequences with high activity. The model correctly identified critical sequence differences between active and inactive sequences with nearly identical transcription factor binding sites, and revealed order and spacing preferences for combinations of motifs. Our results establish active learning as an effective method to train accurate deep learning models of cis-regulatory function after exhausting naturally occurring training examples in the genome.
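The select-assay-retrain loop described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: a bootstrap ensemble of ridge regressions stands in for the deep learning model, the spread of ensemble predictions stands in for the model's confidence estimate, and a noisy linear function (`assay`) stands in for the retinal assay. It is not the authors' implementation.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def ensemble_uncertainty(X_train, y_train, X_pool, n_models=20, seed=0):
    """Standard deviation of bootstrap-ensemble predictions per pool candidate."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        # Bootstrap resample of the current training set.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        preds.append(X_pool @ fit_ridge(X_train[idx], y_train[idx]))
    return np.std(np.stack(preds), axis=0)

def select_batch(uncertainty, batch_size):
    """Indices of the batch_size candidates the ensemble is least sure about."""
    return np.argsort(-uncertainty)[:batch_size]

# Hypothetical stand-in for the wet-lab assay: hidden linear signal plus noise.
rng = np.random.default_rng(1)
true_w = rng.normal(size=8)

def assay(X):
    return X @ true_w + 0.1 * rng.normal(size=len(X))

X_train = rng.normal(size=(10, 8))   # initial labelled sequences (as features)
y_train = assay(X_train)
X_pool = rng.normal(size=(200, 8))   # unlabelled candidate sequences

for round_number in range(3):
    uncertainty = ensemble_uncertainty(X_train, y_train, X_pool, seed=round_number)
    chosen = select_batch(uncertainty, batch_size=10)
    # "Assay" the chosen candidates and move them from the pool to the training set.
    X_train = np.vstack([X_train, X_pool[chosen]])
    y_train = np.concatenate([y_train, assay(X_pool[chosen])])
    X_pool = np.delete(X_pool, chosen, axis=0)
```

Selecting by predictive spread is only one possible acquisition rule; the abstract does not say which rule the authors used.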
Affiliation(s)
- Ryan Z. Friedman
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Avinash Ramu
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Sara Lichtarge
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Connie A. Myers
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- David M. Granas
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Maria Gause
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- Joseph C. Corbo
  - Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, 63110
- Barak A. Cohen
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
- Michael A. White
  - The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, Saint Louis, MO, 63110
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO, 63110
3
Choi G, Kim W, Koo J. Investigating the Performance of Machine Learning Methods in Predicting Functional Properties of the Hydrogenase Variants. Biotechnol Bioproc E 2023. [DOI: 10.1007/s12257-022-0330-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2023]
4
Jokinen E, Huuhtanen J, Mustjoki S, Heinonen M, Lähdesmäki H. Predicting recognition between T cell receptors and epitopes with TCRGP. PLoS Comput Biol 2021; 17:e1008814. [PMID: 33764977 PMCID: PMC8023491 DOI: 10.1371/journal.pcbi.1008814] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 04/08/2020] [Revised: 04/06/2021] [Accepted: 02/17/2021] [Indexed: 12/31/2022]
Abstract
The adaptive immune system uses T cell receptors (TCRs) to recognize pathogens and consequently to initiate immune responses. TCRs can be sequenced from individuals, and methods that analyze the specificity of the TCRs can help us better understand individuals' immune status in different disorders. For this task, we have developed TCRGP, a novel Gaussian process method that predicts whether TCRs recognize specified epitopes. TCRGP can utilize the amino acid sequences of the complementarity determining regions (CDRs) from TCRα and TCRβ chains and learn which CDRs are important in recognizing different epitopes. Our comprehensive evaluation with epitope-specific TCR sequencing data shows that TCRGP achieves on average higher prediction accuracy in terms of AUROC score than existing state-of-the-art methods in epitope-specificity predictions. We also propose a novel analysis approach for combined single-cell RNA and TCRαβ (scRNA+TCRαβ) sequencing data by quantifying epitope-specific TCRs with TCRGP, and we identify HBV-epitope specific T cells and their transcriptomic states in hepatocellular carcinoma patients.
MESH Headings
- Amino Acid Sequence
- Complementarity Determining Regions
- Computational Biology/methods
- Epitopes, T-Lymphocyte/chemistry
- Epitopes, T-Lymphocyte/genetics
- Epitopes, T-Lymphocyte/metabolism
- Humans
- Normal Distribution
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/metabolism
- Sequence Analysis, Protein/methods
Affiliation(s)
- Emmi Jokinen
  - Department of Computer Science, Aalto University, Espoo, Finland
- Jani Huuhtanen
  - Translational Immunology Research program and Department of Clinical Chemistry and Hematology, University of Helsinki, Helsinki, Finland
  - Hematology Research Unit Helsinki, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland
- Satu Mustjoki
  - Translational Immunology Research program and Department of Clinical Chemistry and Hematology, University of Helsinki, Helsinki, Finland
  - Hematology Research Unit Helsinki, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
- Markus Heinonen
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Harri Lähdesmäki
  - Department of Computer Science, Aalto University, Espoo, Finland
5
Sharma B, Ma Y, Ferguson AL, Liu AP. In search of a novel chassis material for synthetic cells: emergence of synthetic peptide compartment. Soft Matter 2020; 16:10769-10780. [PMID: 33179713 DOI: 10.1039/d0sm01644f] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
Giant lipid vesicles have been used extensively as a synthetic cell model to recapitulate various life-like processes, including in vitro protein synthesis, DNA replication, and cytoskeleton organization. Cell-sized lipid vesicles are mechanically fragile in nature and prone to rupture due to osmotic stress, which limits their usability. Recently, peptide vesicles have been introduced as a synthetic cell model that would potentially overcome the aforementioned limitations. Peptide vesicles are robust, reasonably more stable than lipid vesicles and can withstand harsh conditions including pH, thermal, and osmotic variations. This mini-review summarizes the current state-of-the-art in the design, engineering, and realization of peptide-based chassis materials, including both experimental and computational work. We present an outlook for simulation-aided and data-driven design and experimental realization of engineered and multifunctional synthetic cells.
Affiliation(s)
- Bineet Sharma
  - Department of Mechanical Engineering, University of Michigan, Ann Arbor, Michigan 48109, USA
6
Watson OP, Cortes-Ciriano I, Taylor AR, Watson JA. A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery. Bioinformatics 2020; 35:4656-4663. [PMID: 31070704 PMCID: PMC6853675 DOI: 10.1093/bioinformatics/btz293] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/10/2018] [Revised: 03/22/2019] [Accepted: 04/17/2019] [Indexed: 02/07/2023]
Abstract
Motivation: Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs. Results: The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and ‘memorize’ the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error, highlighting the necessity to define model optimality with respect to the decision task at hand. Availability and implementation: All software and data are available as Jupyter notebooks found at https://github.com/owatson/QuantileBootstrap. Supplementary information: Supplementary data are available at Bioinformatics online.
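The contrast this abstract draws between random partitioning and partitioning on activity quantiles can be made concrete with a small sketch. It is a simplified illustration only: the function names are hypothetical, the choice to hold out the top activity quantile as the test set is an assumption, and the bootstrap resampling and rank-based losses of the published method are not reproduced. The authors' own code is in the repository linked in the abstract.

```python
import numpy as np

def quantile_split(activity, test_quantile=0.9):
    """Train on molecules below the activity quantile, test on those at or above it.

    Assumption for illustration: the held-out set is the high-activity tail.
    """
    activity = np.asarray(activity, dtype=float)
    threshold = np.quantile(activity, test_quantile)
    train_idx = np.flatnonzero(activity < threshold)
    test_idx = np.flatnonzero(activity >= threshold)
    return train_idx, test_idx

def random_split(n_samples, test_fraction=0.1, seed=0):
    """Conventional random partition, shown for comparison."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])

# Toy activity values standing in for measured bioactivities.
activity = np.random.default_rng(0).normal(size=200)

q_train, q_test = quantile_split(activity, test_quantile=0.9)
r_train, r_test = random_split(len(activity), test_fraction=0.1)

# Under the quantile split every test molecule is more active than any training
# molecule, so a model is scored on extrapolation rather than interpolation.
print(activity[q_train].max() < activity[q_test].min())
```

With a random split the test activities fall inside the range seen in training, which is the setting the abstract says favours models that memorize the training set.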
Affiliation(s)
- Isidro Cortes-Ciriano
  - Goring on Thames, Evariste Technologies Ltd., RG8 9AL UK; Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Aimee R Taylor
  - Department of Epidemiology, Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, MA 02115 USA; Infectious Disease Microbiome Program, Broad Institute, Cambridge, MA 02142 USA
- James A Watson
  - Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford OX3 7LF, UK; Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
7
Zhao T, Cheng L, Zang T, Hu Y. Peptide-Major Histocompatibility Complex Class I Binding Prediction Based on Deep Learning With Novel Feature. Front Genet 2019; 10:1191. [PMID: 31850062 PMCID: PMC6892951 DOI: 10.3389/fgene.2019.01191] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/15/2019] [Accepted: 10/28/2019] [Indexed: 12/27/2022]
Abstract
Peptide-based vaccine development requires accurate prediction of the binding affinity between major histocompatibility complex class I (MHC I) proteins and their peptide ligands. A growing number of machine learning methods have been developed to predict binding affinity, and some have become popular tools. However, most of them are built on shallow neural networks. Bengio has argued that deep neural networks can learn better fits with less data than shallow neural networks, which matters in our case because some alleles have only dozens of peptide measurements. In addition, we transform each peptide into a characteristic matrix and feed it to the model. When the input is a matrix, a convolutional neural network (CNN) can find the most critical features by itself, so a CNN is better suited to predicting binding affinity than a traditional neural network model. Unlike previous studies, which are based on the blocks substitution matrix (BLOSUM), we use novel features for the prediction. Because the order of the sequence, the hydropathy index, the polarity, and the length of the peptide could affect binding affinity, and because these amino acid properties are key factors in binding to MHC, we extract this information from each peptide. To make full use of the available data, we convert peptides of different lengths into 15-mers based on the binding mode of peptides to MHC I. To demonstrate that our method predicts peptide-MHC binding reliably, we compared it with several popular methods. The experiments show the superiority of our method.
Affiliation(s)
- Tianyi Zhao
  - Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Tianyi Zang
  - Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yang Hu
  - Department of Computer Science and Technology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
8
Jokinen E, Heinonen M, Lähdesmäki H. mGPfusion: predicting protein stability changes with Gaussian process kernel learning and data fusion. Bioinformatics 2019; 34:i274-i283. [PMID: 29949987 PMCID: PMC6022679 DOI: 10.1093/bioinformatics/bty238] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/31/2022]
Abstract
Motivation: Proteins are commonly used by the biochemical industry for numerous processes. Refining these proteins’ properties via mutations causes stability effects as well. An accurate computational method to predict how mutations affect protein stability is necessary to facilitate efficient protein design. However, the accuracy of predictive models is ultimately constrained by the limited availability of experimental data. Results: We have developed mGPfusion, a novel Gaussian process (GP) method for predicting a protein’s stability changes upon single and multiple mutations. This method complements the limited experimental data with large amounts of molecular simulation data. We introduce a Bayesian data fusion model that re-calibrates the experimental and in silico data sources and then learns a predictive GP model from the combined data. Our protein-specific model requires experimental data only regarding the protein of interest and performs well even with few experimental measurements. mGPfusion models proteins by contact maps and infers the stability effects caused by mutations with a mixture of graph kernels. Our results show that mGPfusion outperforms state-of-the-art methods in predicting protein stability on a dataset of 15 different proteins and that incorporating molecular simulation data improves the model learning and prediction accuracy. Availability and implementation: Software implementation and datasets are available at github.com/emmijokinen/mgpfusion. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Emmi Jokinen
  - Department of Computer Science, Aalto University, Espoo, Finland
- Markus Heinonen
  - Department of Computer Science, Aalto University, Espoo, Finland; Helsinki Institute for Information Technology, Espoo, Finland
- Harri Lähdesmäki
  - Department of Computer Science, Aalto University, Espoo, Finland
9
Spänig S, Heider D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min 2019; 12:7. [PMID: 30867681 PMCID: PMC6399931 DOI: 10.1186/s13040-019-0196-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 12/21/2018] [Accepted: 02/24/2019] [Indexed: 01/10/2023]
Abstract
Antimicrobial peptides (AMPs) are part of the innate immune system. In fact, they occur in almost all organisms including, e.g., plants, animals, and humans. Remarkably, they are also effective against multi-resistant pathogens, with high selectivity. This is especially crucial at a time when society is faced with the major threat of an ever-increasing number of antibiotic-resistant microbes. In addition, AMPs can also exhibit antitumor and antiviral effects, so a variety of scientific studies have dealt with the prediction of active peptides in recent years. Due to their potential, even the pharmaceutical industry is keen on discovering and developing novel AMPs. However, AMPs are difficult to verify in vitro, hence researchers conduct sequence similarity experiments against known, active peptides. Unfortunately, this approach is very time-consuming and limits potential candidates to sequences with a high similarity to known AMPs. Machine learning methods offer the opportunity to explore the huge space of sequence variations in a timely manner. These algorithms have, in principle, paved the way for an automated discovery of AMPs. However, machine learning models require a numerical input, so an informative encoding is very important. Unfortunately, developing an appropriate encoding is a major challenge, which has not been entirely solved so far. For this reason, the development of novel amino acid encodings is established as a stand-alone research branch. The present review introduces state-of-the-art encodings of amino acids as well as their properties in sequence and structure based aggregation. Moreover, although a well-chosen encoding is essential, performant classifiers are also required, which is reflected by a tendency towards specifically designed models in the literature. Furthermore, we introduce these models with a particular focus on encodings derived from support vector machines and deep learning approaches. Although a strong focus has been set on AMP predictions, not all of the mentioned encodings have been elaborated as part of antimicrobial research studies, but rather as general protein or peptide representations.
Affiliation(s)
- Sebastian Spänig
  - Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
- Dominik Heider
  - Department of Bioinformatics, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg, Germany
10
Cichonska A, Pahikkala T, Szedmak S, Julkunen H, Airola A, Heinonen M, Aittokallio T, Rousu J. Learning with multiple pairwise kernels for drug bioactivity prediction. Bioinformatics 2018; 34:i509-i518. [PMID: 29949975 PMCID: PMC6022556 DOI: 10.1093/bioinformatics/bty277] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 11/17/2022]
Abstract
Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially multiple kernel learning (MKL) offers promising benefits as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco. Supplementary information: Supplementary data are available at Bioinformatics online.
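One standard way to work with a pairwise kernel "without explicit computation of the massive pairwise matrices" is the Kronecker product identity sketched below. This is a generic illustration, assuming the pairwise kernel is the Kronecker product of a drug kernel and a target kernel; the abstract does not describe pairwiseMKL's internals, so this should not be read as the authors' algorithm. The names `K_d`, `K_t`, and `A` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_drugs, n_targets = 4, 3

def random_psd(n):
    """Random symmetric positive semi-definite matrix, used as a toy kernel."""
    m = rng.normal(size=(n, n))
    return m @ m.T

K_d = random_psd(n_drugs)      # drug-drug kernel
K_t = random_psd(n_targets)    # target-target kernel
A = rng.normal(size=(n_drugs, n_targets))  # one coefficient per (drug, target) pair

# Explicit route: build the (n_drugs*n_targets) x (n_drugs*n_targets) pairwise kernel.
explicit = np.kron(K_d, K_t) @ A.reshape(-1)

# Implicit route: with row-major flattening, (K_d kron K_t) vec(A) = vec(K_d A K_t^T),
# so only the two small kernels are ever stored.
implicit = (K_d @ A @ K_t.T).reshape(-1)

print(np.allclose(explicit, implicit))
```

The explicit matrix grows with the square of the number of pairs, while the implicit product only needs the two factor kernels, which is why identities of this kind matter at the scale quoted in the abstract.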
Affiliation(s)
- Anna Cichonska
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Tapio Pahikkala
  - Department of Information Technology, University of Turku, Turku, Finland
- Sandor Szedmak
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Heli Julkunen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Antti Airola
  - Department of Information Technology, University of Turku, Turku, Finland
- Markus Heinonen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Tero Aittokallio
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Juho Rousu
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
11
Schumacher FR, Delamarre L, Jhunjhunwala S, Modrusan Z, Phung QT, Elias JE, Lill JR. Building proteomic tool boxes to monitor MHC class I and class II peptides. Proteomics 2017; 17. [PMID: 27928884 DOI: 10.1002/pmic.201600061] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/05/2016] [Revised: 10/13/2016] [Accepted: 11/25/2016] [Indexed: 01/22/2023]
Abstract
Major histocompatibility complex Class I (MHCI) and Class II (MHCII) presented peptides powerfully modulate T cell immunity and play a vital role in generating effective anti-tumor and anti-viral immune responses in mammals. Characterizing these MHCI or MHCII presented peptides can help generate therapeutic treatments, afford information on T cell mediated biomarkers, provide insight into disease progression, and reduce adverse anti-drug side effects from engineered biotherapeutics. Here, we explore the tools and techniques commonly employed to discover both MHCI- and MHCII-presented peptides. We describe complementary strategies that enhance the characterization of these peptides and the informatics tools employed for both predicting and characterizing MHCI- and MHCII-presented epitopes. The evolution of methodologies for isolating MHC-presented peptides is discussed, as are the mass spectrometric workflows that can be employed for their characterization. We provide a perspective on where this field is headed, and how these tools may be applicable to the discovery and monitoring of epitopes in a variety of scenarios.
Affiliation(s)
- Lélia Delamarre
  - Department of Cancer Immunology, Genentech Inc., San Francisco, CA, USA
- Suchit Jhunjhunwala
  - Department of Bioinformatics & Computational Biology, Genentech Inc., San Francisco, CA, USA
- Zora Modrusan
  - Department of Molecular Biology, Genentech Inc., San Francisco, CA, USA
- Qui T Phung
  - Department of Proteomics and Biological Resources, Genentech Inc., San Francisco, CA, USA
- Joshua E Elias
  - Department of Chemical & Systems Biology, School of Medicine, Stanford University, San Francisco, CA, USA
- Jennie R Lill
  - Department of Proteomics & Biological Resources, Genentech Inc., San Francisco, CA, USA
12
A Novel Phosphorylation Site-Kinase Network-Based Method for the Accurate Prediction of Kinase-Substrate Relationships. BioMed Research International 2017; 2017:1826496. [PMID: 29312990 PMCID: PMC5660750 DOI: 10.1155/2017/1826496] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/28/2017] [Revised: 08/14/2017] [Accepted: 09/05/2017] [Indexed: 01/06/2023]
Abstract
Protein phosphorylation is catalyzed by kinases, which regulate many processes that control cell death, movement, and growth. Identification of the phosphorylation site-specific kinase-substrate relationships (ssKSRs) is important for understanding cellular dynamics and provides a fundamental basis for further disease-related research and drug design. Although several computational methods have been developed, most of them mainly use the local sequence of phosphorylation sites and protein-protein interactions (PPIs) to construct the prediction model. Because phosphorylation is a very complicated process and is usually involved in various biological mechanisms, this information is not sufficient for accurate prediction. In this study, we propose a new and powerful computational approach named KSRPred for ssKSR prediction, which introduces novel phosphorylation site-kinase network (pSKN) profiles that can efficiently incorporate the relationships between various protein kinases and phosphorylation sites. The experimental results show that the pSKN profiles can efficiently improve the prediction performance in combination with local sequence and PPI information. Furthermore, we compare our method with the existing ssKSR prediction tools, and the results demonstrate that KSRPred significantly improves on their prediction performance.
13
Computational-experimental approach to drug-target interaction mapping: A case study on kinase inhibitors. PLoS Comput Biol 2017; 13:e1005678. [PMID: 28787438 PMCID: PMC5560747 DOI: 10.1371/journal.pcbi.1005678] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 02/03/2017] [Revised: 08/17/2017] [Accepted: 07/11/2017] [Indexed: 01/09/2023]
Abstract
Due to the relatively high costs and labor required for experimental profiling of the full target space of chemical compounds, various machine learning models have been proposed as cost-effective means to advance this process by predicting the most potent compound-target interactions for subsequent verification. However, most model predictions lack direct experimental validation in the laboratory, making their practical benefits for drug discovery or repurposing applications largely unknown. Here, we therefore introduce and carefully test a systematic computational-experimental framework for the prediction and pre-clinical verification of drug-target interactions, using a well-established kernel-based regression algorithm as the prediction model. To evaluate its performance, we first predicted unmeasured binding affinities in a large-scale kinase inhibitor profiling study, and then experimentally tested 100 compound-kinase pairs. The relatively high correlation of 0.77 (p < 0.0001) between the predicted and measured bioactivities supports the potential of the model for filling the experimental gaps in existing compound-target interaction maps. Further, we subjected the model to the more challenging task of predicting target interactions for a new candidate drug compound that lacks prior binding profile information. As a specific case study, we used tivozanib, an investigational VEGF receptor inhibitor with a currently unknown off-target profile. Among 7 kinases with high predicted affinity, we experimentally validated 4 new off-targets of tivozanib, namely the Src-family kinases FRK and FYN A, the non-receptor tyrosine kinase ABL1, and the serine/threonine kinase SLK. Our subsequent experimental validation protocol effectively avoids any possible information leakage between the training and validation data, and therefore enables rigorous model validation for practical applications. These results demonstrate that the kernel-based modeling approach offers practical benefits for probing novel insights into the mode of action of investigational compounds, and for the identification of new target selectivities for drug repurposing applications.
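The agreement check reported in this abstract (a correlation of 0.77 between predicted and measured bioactivities) can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used; Pearson's is assumed here, and the affinity values below are made-up placeholders, not data from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predicted vs. measured binding affinities (placeholder numbers).
predicted = [5.1, 6.3, 7.8, 6.0, 8.2]
measured = [5.4, 6.0, 7.1, 6.6, 8.5]
r = pearson(predicted, measured)
```

In a setting like the one described, `predicted` would hold model outputs for pairs that were only measured afterwards, which is what keeps the comparison free of training-data leakage.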
|
14
|
Kanshin E, Giguère S, Jing C, Tyers M, Thibault P. Machine Learning of Global Phosphoproteomic Profiles Enables Discrimination of Direct versus Indirect Kinase Substrates. Mol Cell Proteomics 2017; 16:786-798. [PMID: 28265048 PMCID: PMC5417821 DOI: 10.1074/mcp.m116.066233] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 12/05/2016] [Revised: 02/13/2017] [Indexed: 12/12/2022]
Abstract
Mass spectrometry allows quantification of tens of thousands of phosphorylation sites from minute amounts of cellular material. Despite this wealth of information, our understanding of phosphorylation-based signaling is limited, in part because it is not possible to deconvolute substrate phosphorylation that is directly mediated by a particular kinase from phosphorylation that is mediated by downstream kinases. Here, we describe a framework for assignment of direct in vivo kinase substrates using a combination of selective chemical inhibition, quantitative phosphoproteomics, and machine learning techniques. Our workflow allows classification of phosphorylation events following inhibition of an analog-sensitive kinase into kinase-independent effects of the inhibitor, direct effects on cognate substrates, and indirect effects mediated by downstream kinases or phosphatases. We applied this method to identify many direct targets of Cdc28 and Snf1 kinases in the budding yeast Saccharomyces cerevisiae. Global phosphoproteome analysis of acute time-series demonstrated that dephosphorylation of direct kinase substrates occurs more rapidly than that of indirect substrates, both after inhibitor treatment and under a physiological nutrient shift in wt cells. Mutagenesis experiments revealed a high proportion of functionally relevant phosphorylation sites on Snf1 targets. For example, Snf1 itself was inhibited through autophosphorylation on Ser391, and new phosphosites were discovered that modulate the activity of the Reg1 regulatory subunit of the Glc7 phosphatase and the Gal83 β-subunit of the SNF1 complex. This methodology applies to any kinase for which a functional analog-sensitive version can be constructed to facilitate the dissection of the global phosphorylation network.
Affiliation(s)
- Evgeny Kanshin
  - Institute for Research in Immunology and Cancer
- Cheng Jing
  - Institute for Research in Immunology and Cancer
- Mike Tyers
  - Institute for Research in Immunology and Cancer
  - Department of Medicine
- Pierre Thibault
  - Institute for Research in Immunology and Cancer
  - Department of Chemistry, Université de Montréal, C.P. 6128, Succursale centre-ville, Montréal, Québec, H3C 3J7, Canada
|
15
|
Sasse A, de Vries SJ, Schindler CEM, de Beauchêne IC, Zacharias M. Rapid Design of Knowledge-Based Scoring Potentials for Enrichment of Near-Native Geometries in Protein-Protein Docking. PLoS One 2017; 12:e0170625. [PMID: 28118389 PMCID: PMC5261736 DOI: 10.1371/journal.pone.0170625] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/29/2016] [Accepted: 01/07/2017] [Indexed: 01/15/2023]
Abstract
Protein-protein docking protocols aim to predict the structures of protein-protein complexes based on the structures of the individual partners. Docking protocols usually include several steps of sampling, clustering, refinement and re-scoring. The scoring step is one of the bottlenecks in the performance of many state-of-the-art protocols. The performance of scoring functions depends on the quality of the generated structures and on their coupling to the sampling algorithm. A tool kit, GRADSCOPT (GRid Accelerated Directly SCoring OPTimizing), was designed to allow rapid development and optimization of different knowledge-based scoring potentials for specific objectives in protein-protein docking. Different atomistic and coarse-grained potentials can be created by a grid-accelerated directly scoring dependent Monte-Carlo annealing or by a linear regression optimization. We demonstrate that the scoring functions generated by our approach are similar to or even outperform state-of-the-art scoring functions for predicting near-native solutions. Of additional importance, we find that potentials specifically trained to identify the native bound complex perform rather poorly on identifying acceptable or medium quality (near-native) solutions. In contrast, atomistic long-range contact potentials can increase the average fraction of near-native poses by up to a factor of 2.5 among the best-scored 1% of decoys (compared to existing scoring), emphasizing the need for specific docking potentials for different steps in the docking protocol.
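The enrichment figure quoted in this abstract (the fraction of near-native poses among the best-scored 1% of decoys) can be sketched as follows. This is an illustrative calculation, not GRADSCOPT code: the function name, the convention that a lower score is better, and the toy decoy list are all assumptions.

```python
import math

def near_native_fraction(decoys, top_fraction=0.01):
    """Fraction of near-native poses among the best-scored decoys.

    decoys: list of (score, is_near_native) pairs; a lower score is
    treated as better (an assumed convention).
    """
    if not decoys:
        raise ValueError("no decoys given")
    n_top = max(1, math.ceil(len(decoys) * top_fraction))
    top = sorted(decoys, key=lambda d: d[0])[:n_top]
    return sum(1 for _, near in top if near) / n_top

# Toy set: 200 decoys with scores 0..199; every 10th pose is near-native.
toy = [(float(i), i % 10 == 0) for i in range(200)]
baseline = near_native_fraction(toy, top_fraction=1.0)   # whole decoy set
top_1pct = near_native_fraction(toy, top_fraction=0.01)  # best 2 decoys
enrichment = top_1pct / baseline
```

Comparing `top_1pct` between two scoring functions on the same decoy set gives the kind of factor the abstract reports.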
Affiliation(s)
- Alexander Sasse
  - Physik Department T38, Technische Universität München, James-Franck-Straße, Garching, Germany
- Sjoerd J. de Vries
  - Physik Department T38, Technische Universität München, James-Franck-Straße, Garching, Germany
- Martin Zacharias
  - Physik Department T38, Technische Universität München, James-Franck-Straße, Garching, Germany
|
16
|
|
17
|
Sarkes DA, Hurley MM, Stratis-Cullum DN. Unraveling the Roots of Selectivity of Peptide Affinity Reagents for Structurally Similar Ribosomal Inactivating Protein Derivatives. Molecules 2016; 21:E1504. [PMID: 27834872 PMCID: PMC6272918 DOI: 10.3390/molecules21111504] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 09/26/2016] [Revised: 11/02/2016] [Accepted: 11/04/2016] [Indexed: 11/17/2022]
Abstract
Peptide capture agents have become increasingly useful tools for a variety of sensing applications due to their ease of discovery, stability, and robustness. Despite the ability to rapidly discover candidates through biopanning bacterial display libraries and easily mature them to Protein Catalyzed Capture (PCC) agents with even higher affinity and selectivity, an ongoing challenge and critical selection criterion is that the peptide candidates and final reagent be selective enough to replace antibodies, the gold standard across immunoassay platforms. Here, we have discovered peptide affinity reagents against abrax, a derivative of abrin with reduced toxicity. Using on-cell Fluorescence Activated Cell Sorting (FACS) assays, we show that the peptides are highly selective for abrax over RiVax, a similar derivative of ricin originally designed as a vaccine, with significant structural homology to abrax. We rank the newly discovered peptides by affinity and analyze three observed consensus sequences with varying affinity and specificity. The strongest (Tier 1) consensus was FWDTWF, which is highly aromatic and hydrophobic. To better understand the observed selectivity, we use the XPairIt peptide-protein docking protocol to analyze binding location predictions of the individual Tier 1 peptides and consensus on abrax and RiVax. The binding location profiles on the two proteins are quite distinct, which we determine is due to differences in pocket size, pocket environment (including hydrophobicity and electronegativity), and steric hindrance. This study provides a model system to show that peptide capture candidates can be quite selective for a structurally similar protein system, even without further maturation, and offers an in silico method of analysis for understanding binding and down-selecting candidates.
Affiliation(s)
- Deborah A Sarkes
  - Biotechnology Branch, Sensors and Electron Devices Directorate, US Army Research Laboratory, Adelphi, MD 20783, USA.
- Margaret M Hurley
  - Biotechnology Branch, Sensors and Electron Devices Directorate, US Army Research Laboratory, Adelphi, MD 20783, USA.
- Dimitra N Stratis-Cullum
  - Biotechnology Branch, Sensors and Electron Devices Directorate, US Army Research Laboratory, Adelphi, MD 20783, USA.
|
18
|
Li Z, Tang J, Guo F. Identification of 14-3-3 Proteins Phosphopeptide-Binding Specificity Using an Affinity-Based Computational Approach. PLoS One 2016; 11:e0147467. [PMID: 26828594 PMCID: PMC4734684 DOI: 10.1371/journal.pone.0147467] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/16/2015] [Accepted: 01/04/2016] [Indexed: 11/17/2022]
Abstract
The 14-3-3 proteins are a highly conserved family of homodimeric and heterodimeric molecules, expressed in all eukaryotic cells. In human cells, this family consists of seven distinct but highly homologous 14-3-3 isoforms. 14-3-3σ, which is regulated by major tumor suppressor genes, is the only isoform directly linked to cancer in epithelial cells. For each 14-3-3 isoform, we have 1,000 peptide motifs with experimental binding affinity values. In this paper, we present a novel method for identifying peptide motifs that bind to the 14-3-3σ isoform. First, we propose a sampling criterion to build a predictor for each new peptide sequence. Then, we select nine physicochemical properties of amino acids to describe each peptide motif. We also use auto-cross covariance to extract correlated properties of amino acids at any two positions. Finally, we use an elastic net, which combines ridge regression and the least absolute shrinkage and selection operator (LASSO), to predict affinity values of peptide motifs. We test our method on the 1,000 known peptide motifs binding to the seven 14-3-3 isoforms. On the 14-3-3σ isoform, our method has overall Pearson product-moment correlation coefficient (PCC) and root mean squared error (RMSE) values of 0.84 and 252.31 for the N-terminal sublibrary, and 0.77 and 269.13 for the C-terminal sublibrary. We predict affinity values of 16,000 peptide sequences, and the relative binding ability across six permuted positions is similar to experimental values. We identify phosphopeptides that preferentially bind to 14-3-3σ over other isoforms. Several positions on peptide motifs are in the same amino acid category as the experimental substrate specificity of phosphopeptides binding to 14-3-3σ. Our method is fast and reliable and is a general computational method that can be used for peptide-protein binding identification in proteomics research.
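The auto-cross covariance step mentioned in this abstract can be sketched as below. The abstract does not give the exact formula, so one common definition is assumed: the mean-centred product of property a at position i and property b at position i + lag, averaged over the valid positions. The two property scales are made-up placeholder numbers for a four-letter alphabet, not the nine properties used in the paper.

```python
def auto_cross_covariance(seq, scale_a, scale_b, lag):
    """ACC between property a at position i and property b at position i + lag."""
    n = len(seq)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(seq)")
    a = [scale_a[r] for r in seq]
    b = [scale_b[r] for r in seq]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    total = sum((a[i] - mean_a) * (b[i + lag] - mean_b) for i in range(n - lag))
    return total / (n - lag)

def acc_features(seq, scales, max_lag):
    """Fixed-length feature vector: one ACC value per (property, property, lag)."""
    names = sorted(scales)
    return [
        auto_cross_covariance(seq, scales[p], scales[q], lag)
        for p in names
        for q in names
        for lag in range(1, max_lag + 1)
    ]

# Hypothetical property scales (placeholder values, for illustration only).
SCALES = {
    "hydro": {"A": 1.0, "R": -2.0, "S": 0.0, "P": -1.0},
    "size": {"A": 1.0, "R": 4.0, "S": 2.0, "P": 3.0},
}
features = acc_features("RSAPRS", SCALES, max_lag=2)
```

A vector like `features` has the same length for every peptide, which is what lets a linear model such as an elastic net be fitted on it.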
Affiliation(s)
- Zhao Li
  - School of Computer Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, P.R. China
- Jijun Tang
  - School of Computer Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, P.R. China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, United States of America
- Fei Guo
  - School of Computer Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, P.R. China
|
19
|
Kuksa PP, Min MR, Dugar R, Gerstein M. High-order neural networks and kernel methods for peptide-MHC binding prediction. Bioinformatics 2015. [PMID: 26206306 DOI: 10.1093/bioinformatics/btv371] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 12/28/2022]
Abstract
MOTIVATION: Effective computational methods for peptide-protein binding prediction can greatly help clinical peptide vaccine search and design. However, previous computational methods fail to capture key nonlinear high-order dependencies between different amino acid positions. As a result, they often produce low-quality rankings of strong binding peptides. To solve this problem, we propose nonlinear high-order machine learning methods, including high-order neural networks (HONNs) with possible deep extensions and high-order kernel support vector machines, to predict major histocompatibility complex-peptide binding. RESULTS: The proposed high-order methods improve the quality of binding predictions over other prediction methods. With the proposed methods, a significant gain of up to 25-40% is observed on the benchmark and reference peptide datasets and tasks. In addition, for the first time, our experiments show that pre-training with high-order semi-restricted Boltzmann machines significantly improves the performance of feed-forward HONNs. Moreover, our experiments show that the proposed shallow HONN outperforms the popular pre-trained deep neural network on most tasks, which demonstrates the effectiveness of modelling high-order feature interactions for predicting major histocompatibility complex-peptide binding. AVAILABILITY AND IMPLEMENTATION: There is no associated distributable software. CONTACT: renqiang@nec-labs.com or mark.gerstein@yale.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pavel P Kuksa
  - Institute for Biomedical Informatics, Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
  - Department of Machine Learning, NEC Laboratories America, Princeton, NJ 08540, USA
- Martin Renqiang Min
  - Department of Machine Learning, NEC Laboratories America, Princeton, NJ 08540, USA
- Rishabh Dugar
  - Department of Machine Learning, NEC Laboratories America, Princeton, NJ 08540, USA
- Mark Gerstein
  - Program of Computational Biology and Bioinformatics and Department of Molecular Biophysics and Biochemistry and Department of Computer Science, Yale University, New Haven, CT 06511, USA
|
20
|
Li BYS, Yeung LF, Ko KT. Indefinite kernel ridge regression and its application on QSAR modelling. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.01.060] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]
|
21
|
Tang Q, Nie F, Kang J, Ding H, Zhou P, Huang J. NIEluter: Predicting peptides eluted from HLA class I molecules. J Immunol Methods 2015; 422:22-7. [PMID: 25862605 DOI: 10.1016/j.jim.2015.03.021] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/30/2014] [Revised: 02/18/2015] [Accepted: 03/31/2015] [Indexed: 11/30/2022]
Abstract
The immune system has evolved to generate a diverse repertoire of peptides processed from self and foreign proteomes, which are displayed in the antigen-binding grooves of major histocompatibility complex (MHC) proteins at the cell surface for surveillance by T cells. These antigenic peptides are termed Naturally Processed Peptides or Naturally Presented Peptides (NPPs), and they play a major role in cell-mediated immunity and rational vaccine design. Therefore, it is highly desirable to predict NPPs from a given protein antigen, or to predict whether an MHC-binding peptide can be eluted from a given MHC protein. In this paper, we describe NIEluter, an ensemble predictor based on support vector machines (SVMs). It combines five SVM models trained with position-specific amino acid composition, position-specific dipeptide composition, Hidden Markov Model, binary encoding, and BLOSUM62 features. At present, NIEluter can predict NPPs of length 8-11 from six HLA alleles (A0201, B0702, B3501, B4403, B5301, and B5701). Evaluated with five-fold cross-validation and, where available, independent datasets, NIEluter shows good performance. It outperforms MHC-NP in 7 out of 24 types of situation and outperforms NetMHC3.2 in most cases, indicating that it is a helpful complement to available tools. NIEluter has been implemented as a free web service, which can be accessed at http://immunet.cn/nie/cgi-bin/nieluter.pl.
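Of the five feature types listed in this abstract, binary encoding is the simplest to illustrate. The sketch below assumes the common scheme of 20 bits per residue with exactly one bit set; the abstract does not specify NIEluter's exact encoding, and the residue ordering and the example 9-mer are illustrative choices.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues

def binary_encode(peptide):
    """One-hot encode a peptide: 20 bits per residue, exactly one bit set."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    bits = []
    for residue in peptide:
        if residue not in index:
            raise ValueError(f"unknown residue: {residue!r}")
        row = [0] * len(AMINO_ACIDS)
        row[index[residue]] = 1
        bits.extend(row)
    return bits

# Example 9-mer string; the resulting vector has 9 * 20 = 180 entries.
vec = binary_encode("SLYNTVATL")
```

Because the vector length depends on peptide length, a predictor covering lengths 8-11 would need one model per length or some padding scheme; which of these NIEluter uses is not stated in the abstract.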
Affiliation(s)
- Qiang Tang
  - Center of Bioinformatics (COBI), Key Laboratory for NeuroInformation of Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fulei Nie
  - Center of Bioinformatics (COBI), Key Laboratory for NeuroInformation of Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
- Juanjuan Kang
  - Center of Bioinformatics (COBI), Key Laboratory for NeuroInformation of Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Center of Bioinformatics (COBI), Key Laboratory for NeuroInformation of Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Center for Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu 610054, China
- Peng Zhou
  - Center of Bioinformatics (COBI), Key Laboratory for NeuroInformation of Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Center for Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jian Huang
  - Center of Bioinformatics (COBI), Key Laboratory for NeuroInformation of Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Center for Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu 610054, China
|
22
|
Giguère S, Laviolette F, Marchand M, Tremblay D, Moineau S, Liang X, Biron É, Corbeil J. Machine learning assisted design of highly active peptides for drug discovery. PLoS Comput Biol 2015; 11:e1004074. [PMID: 25849257 PMCID: PMC4388847 DOI: 10.1371/journal.pcbi.1004074] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 03/31/2014] [Accepted: 12/05/2014] [Indexed: 01/15/2023]
Abstract
The discovery of peptides possessing high biological activity is very challenging because of the enormous diversity of candidates, of which only a minority have the desired properties. To lower the cost and reduce the time needed to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor with existing data or with a smaller amount of data generation. Unfortunately, once the model is learned, selecting peptides having the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein, since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/.
Part of the complexity of drug discovery is the sheer chemical diversity to explore, combined with all the requirements a compound must meet to become a commercial drug. Hence, it makes sense to automate this chemical exploration endeavor in a wise, informed, and efficient fashion. Here, we focused on peptides, as they have properties that make them excellent drug starting points. Machine learning techniques may replace expensive in vitro laboratory experiments by learning an accurate model of them. However, computational models also suffer from the combinatorial explosion due to the enormous chemical diversity. Indeed, applying the model to every peptide would take an astronomical amount of computer time. Therefore, given a model, is it possible to determine, using reasonable computational time, the peptide that has the best properties and chance for success? This exact question is what motivated our work. We focused on recent advances in kernel methods and machine learning to learn a model that already had excellent results. We demonstrate that this class of model has mathematical properties that make it possible to rapidly identify and sort the best peptides. Finally, in vitro and in silico results are provided to support and validate this theoretical discovery.
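The idea of finding the best-scoring peptide without enumerating all candidates can be illustrated on a deliberately simplified model. The sketch below is not the paper's kernel-based algorithm: it assumes a toy score that decomposes into per-position terms and adjacent-pair terms, for which the maximizer is a longest path through a layered graph and can be found by dynamic programming. The scoring functions `toy_site` and `toy_pair` are hypothetical, and `brute_force` exists only to check the result on a small alphabet.

```python
from itertools import product

def best_peptide(alphabet, length, site_score, pair_score):
    """Longest path in a layered graph: layer i holds one node per residue."""
    # best[a] = (best score of any prefix ending in residue a, that prefix)
    best = {a: (site_score(0, a), a) for a in alphabet}
    for i in range(1, length):
        step = {}
        for b in alphabet:
            score, prefix = max(
                (best[a][0] + pair_score(i - 1, a, b), best[a][1])
                for a in alphabet
            )
            step[b] = (score + site_score(i, b), prefix + b)
        best = step
    return max(best.values())

def brute_force(alphabet, length, site_score, pair_score):
    """Exhaustive search over all len(alphabet) ** length peptides."""
    def total(p):
        sites = sum(site_score(i, a) for i, a in enumerate(p))
        pairs = sum(pair_score(i, p[i], p[i + 1]) for i in range(len(p) - 1))
        return sites + pairs
    return max((total(p), "".join(p)) for p in product(alphabet, repeat=length))

def toy_site(i, a):
    # Hypothetical per-position contribution (integer, for exact comparison).
    return (ord(a) * (i + 1)) % 7

def toy_pair(i, a, b):
    # Hypothetical contribution of adjacent residues at positions i and i + 1.
    return (ord(a) + 2 * ord(b) + i) % 5

dp_score, dp_peptide = best_peptide("ACDE", 4, toy_site, toy_pair)
bf_score, bf_peptide = brute_force("ACDE", 4, toy_site, toy_pair)
```

The dynamic program does work proportional to `length * len(alphabet) ** 2`, whereas exhaustive search grows as `len(alphabet) ** length`; with 20 amino acids and length 10 that is a few thousand steps against roughly 10^13 candidates.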
Affiliation(s)
- Sébastien Giguère
  - Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- François Laviolette
  - Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- Mario Marchand
  - Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- Denise Tremblay
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, Canada
- Sylvain Moineau
  - Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, Canada
- Xinxia Liang
  - Faculty of Pharmacy, Université Laval, Québec, Canada
- Éric Biron
  - Faculty of Pharmacy, Université Laval, Québec, Canada
- Jacques Corbeil
  - Department of Molecular Medicine, Université Laval, Québec, Canada
|
23
|
Xu Y, Luo C, Qian M, Huang X, Zhu S. MHC2MIL: a novel multiple instance learning based method for MHC-II peptide binding prediction by considering peptide flanking region and residue positions. BMC Genomics 2014; 15 Suppl 9:S9. [PMID: 25521198 PMCID: PMC4290625 DOI: 10.1186/1471-2164-15-s9-s9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022]
Abstract
Background: Computational prediction of major histocompatibility complex class II (MHC-II) binding peptides can assist researchers in understanding the mechanism of immune systems and developing peptide-based vaccines. Although many computational methods have been proposed, the performance of these methods is far from satisfactory. The difficulty of MHC-II peptide binding prediction comes mainly from the large length variation of binding peptides. Methods: We develop a novel multiple instance learning based method called MHC2MIL to predict MHC-II binding peptides. MHC2MIL treats each peptide as a bag, and some substrings of the peptide as the instances in the bag. Unlike previous multiple instance learning based methods that consider only instances of fixed length 9 (9 amino acids), MHC2MIL is able to deal with instances of lengths 9 and 11 (11 amino acids) simultaneously. As such, MHC2MIL incorporates important information in the peptide flanking region. Furthermore, when measuring the distances between different instances, MHC2MIL explicitly highlights the amino acids in some important positions. Results: Experimental results on a benchmark dataset have shown that the performance of MHC2MIL is significantly improved by considering instances of both 9 and 11 amino acids, as well as by emphasizing amino acids at key positions in the instance. The results are consistent with those reported in the literature on MHC-II peptide binding. In addition to five important positions (1, 4, 6, 7 and 9) for HLA (human leukocyte antigen, the name of MHC in humans) DR peptide binding, we also find that position 2 may play some role in the binding process. Using 5-fold cross validation on the benchmark dataset, MHC2MIL outperforms two state-of-the-art methods, MHC2SK and NN-align, with statistical significance on 12 HLA DP and DQ molecules. In addition, it achieves comparable performance with MHC2SK and NN-align on 14 HLA DR molecules. MHC2MIL is freely available at http://datamining-iip.fudan.edu.cn/service/MHC2MIL/index.html.
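The bag-of-instances representation described in the Methods part of this abstract can be sketched as follows. The abstract says only that "some substrings" of lengths 9 and 11 serve as instances; this sketch assumes every contiguous substring of those lengths is used, and the function name and example 13-mer are illustrative.

```python
def peptide_to_bag(peptide, lengths=(9, 11)):
    """Return all contiguous substrings (instances) of the given lengths."""
    bag = []
    for k in lengths:
        # range() is empty when the peptide is shorter than k.
        for start in range(len(peptide) - k + 1):
            bag.append(peptide[start:start + k])
    return bag

# Example 13-mer: 5 instances of length 9 plus 3 instances of length 11.
bag = peptide_to_bag("PKYVKQNTLKLAT")
```

The 11-residue instances extend a 9-residue window by one residue on each side when both are centred at the same position, which is one way flanking-region information could enter the model.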
|
24
|
Abstract
The past decade has seen a dramatic expansion in the number and range of techniques available to obtain genome-wide information and to analyze this information so as to infer both the functions of individual molecules and how they interact to modulate the behavior of biological systems. Here, we review these techniques, focusing on the construction of physical protein-protein interaction networks, and highlighting approaches that incorporate protein structure, which is becoming an increasingly important component of systems-level computational techniques. We also discuss how network analyses are being applied to enhance our basic understanding of biological systems and their dysregulation, as well as how these networks are being used in drug development.
Affiliation(s)
- Donald Petrey
  - Center for Computational Biology and Bioinformatics, Department of Systems Biology
|
25
|
Giguère S, Drouin A, Lacoste A, Marchand M, Corbeil J, Laviolette F. MHC-NP: Predicting peptides naturally processed by the MHC. J Immunol Methods 2013; 400-401:30-6. [DOI: 10.1016/j.jim.2013.10.003] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 05/31/2013] [Accepted: 10/05/2013] [Indexed: 10/26/2022]
|
26
|
Guo L, Luo C, Zhu S. MHC2SKpan: a novel kernel based approach for pan-specific MHC class II peptide binding prediction. BMC Genomics 2013; 14 Suppl 5:S11. [PMID: 24564280 PMCID: PMC3852073 DOI: 10.1186/1471-2164-14-s5-s11] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Computational methods for the prediction of Major Histocompatibility Complex (MHC) class II binding peptides play an important role in facilitating the understanding of immune recognition and the process of epitope discovery. To develop an effective computational method, we need to consider two important characteristics of the problem: (1) the length of binding peptides is highly flexible; and (2) MHC molecules are extremely polymorphic, and for the vast majority of them there are insufficient training data. METHODS: We develop a novel string kernel method, MHC2SK (MHC-II String Kernel), to measure the similarities among peptides of variable length. By considering the distinct features of the MHC-II peptide binding prediction problem, MHC2SK differs significantly from the recently developed kernel-based method, the GS (Generic String) kernel, in the way it computes similarities. Furthermore, we extend MHC2SK to MHC2SKpan for pan-specific MHC-II peptide binding prediction by leveraging the binding data of various MHC molecules. RESULTS: MHC2SK outperformed GS in allele-specific prediction using a benchmark dataset, which demonstrates the effectiveness of MHC2SK. Furthermore, we evaluated the performance of MHC2SKpan using various benchmark data sets from several different perspectives: leave-one-allele-out (LOO), 5-fold cross validation, and independent data testing. MHC2SKpan achieved performance comparable to NetMHCIIpan-2.0 and outperformed NetMHCIIpan-1.0, TEPITOPEpan and MultiRTA with statistical significance. MHC2SKpan can be freely accessed at http://datamining-iip.fudan.edu.cn/service/MHC2SKpan/index.html.
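How a string kernel can compare peptides of different lengths can be shown with a much simpler kernel than the one in this abstract. The sketch below is a plain k-mer spectrum kernel, swapped in for illustration; it is neither MHC2SK nor the GS kernel, whose definitions are not given here, and the function names and example peptides are assumptions.

```python
import math
from collections import Counter

def kmer_counts(peptide, k):
    """Count every contiguous k-mer in the peptide."""
    return Counter(peptide[i:i + k] for i in range(len(peptide) - k + 1))

def spectrum_kernel(p, q, k=2):
    """Inner product of the two k-mer count vectors."""
    counts_p, counts_q = kmer_counts(p, k), kmer_counts(q, k)
    return sum(count * counts_q[kmer] for kmer, count in counts_p.items())

def normalized_spectrum_kernel(p, q, k=2):
    """Cosine-normalized kernel, so peptide length does not dominate."""
    denom = math.sqrt(spectrum_kernel(p, p, k) * spectrum_kernel(q, q, k))
    return spectrum_kernel(p, q, k) / denom if denom else 0.0

# Peptides of different lengths can be compared directly.
similarity = normalized_spectrum_kernel("ACDEFGHIK", "CDEFGHIKLMN")
```

A kernel of this kind can be plugged into any kernel method (for example a support vector machine or kernel ridge regression) without first padding or truncating the peptides.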
|