1. Jurczak S, Druchok M. Cancer Immunotherapies Ignited by a Thorough Machine Learning-Based Selection of Neoantigens. Adv Biol (Weinh) 2024; 8:e2400114. [PMID: 38971967 DOI: 10.1002/adbi.202400114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 06/02/2024] [Indexed: 07/08/2024]
Abstract
Identification of neoantigens, derived from somatic DNA alterations, emerges as a promising strategy for cancer immunotherapies. However, not all somatic mutations result in immunogenicity; hence, efficient tools to predict the immunogenicity of neoepitopes are needed. A pipeline is presented that provides a comprehensive solution for the identification of neoepitopes based on genomic sequencing data. The pipeline consists of a data pre-processing step and three machine learning predictive steps. The pre-processing step analyzes genomic data for different types of alterations, produces a list of all possible antigens, and determines the human leukocyte antigen (HLA) type and T-cell receptor (TCR) repertoire. The first predictive step performs a classification into antigens and neoantigens, selecting neoantigens for further consideration. The next step predicts the strength of binding between neoantigens and available major histocompatibility complexes of class I (MHC-I). The third step is engaged to predict the likelihood of inducing an immune response. Neoepitopes satisfying all three predictive stages are assumed to be potent candidates to ensure immunogenicity. The predictive pipeline is used in two regimes: selecting neoantigens from patients' sequencing data and generating novel neoantigen candidates. Two different techniques, Monte Carlo and Reinforcement Learning, are implemented to facilitate the generative regime.
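The three predictive steps described in this abstract act as successive filters on a candidate pool. The sketch below illustrates only that control flow. The `Candidate` type, the three toy scoring functions, and the 0.5 thresholds are hypothetical stand-ins introduced here; the published pipeline uses trained machine learning models whose details are not given in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    peptide: str      # amino-acid sequence of the candidate epitope
    hla_allele: str   # HLA allele the peptide would be presented on


# A stage maps a candidate to a score in [0, 1]. These are placeholders
# for trained models (hypothetical; not the paper's actual predictors).
Stage = Callable[[Candidate], float]


def select_neoepitopes(
    candidates: Iterable[Candidate],
    stages: Sequence[Stage],
    thresholds: Sequence[float],
) -> List[Candidate]:
    """Keep only candidates whose score passes the threshold at every stage."""
    if len(stages) != len(thresholds):
        raise ValueError("need exactly one threshold per stage")
    survivors = list(candidates)
    for stage, threshold in zip(stages, thresholds):
        survivors = [c for c in survivors if stage(c) >= threshold]
    return survivors


# Toy stand-ins for the three predictive steps (illustrative rules only).
def is_neoantigen(c: Candidate) -> float:
    return 0.0 if c.peptide == "AAAAAAAAA" else 1.0


def mhc_binding(c: Candidate) -> float:
    return min(1.0, c.peptide.count("L") / 3)


def immunogenicity(c: Candidate) -> float:
    return 0.9 if c.peptide.endswith("V") else 0.1


pool = [
    Candidate("AAAAAAAAA", "HLA-A*02:01"),  # fails stage 1
    Candidate("LLLGKSTAV", "HLA-A*02:01"),  # passes all three stages
    Candidate("LLLGKSTAK", "HLA-A*02:01"),  # fails stage 3
    Candidate("GLKSTAAAV", "HLA-A*02:01"),  # fails stage 2
]
selected = select_neoepitopes(
    pool, [is_neoantigen, mhc_binding, immunogenicity], [0.5, 0.5, 0.5]
)
```

Ordering the stages from cheapest to most expensive would keep the costly predictors from running on candidates that an earlier stage already rejects; whether the authors order them for that reason is not stated.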
Affiliation(s)
- Sebastian Jurczak
  - SoftServe Inc., 11/13 Building B, Jaworska St., Wroclaw, 53-612, Poland
- Maksym Druchok
  - SoftServe Inc., 2d Sadova St., Lviv, 79021, Ukraine
  - Institute for Condensed Matter Physics, 1 Svientsitskii St., Lviv, 79011, Ukraine
2. Rovenchak A, Druchok M. Machine learning-assisted search for novel coagulants: When machine learning can be efficient even if data availability is low. J Comput Chem 2024; 45:937-952. [PMID: 38174834 DOI: 10.1002/jcc.27292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 12/04/2023] [Accepted: 12/10/2023] [Indexed: 01/05/2024]
Abstract
Design of new drugs is a challenging process: a candidate molecule should satisfy multiple conditions to act properly and cause the fewest side effects. Perfect candidates selectively attach to and influence only their targets, leaving off-targets intact. The amount of experimental data about various properties of molecules constantly grows, promoting data-driven approaches. However, the applicability of typical predictive machine learning techniques can be substantially limited by a lack of experimental data about a particular target. For example, there are many known Thrombin inhibitors (acting as anticoagulants), but a very limited number of known Protein C inhibitors (coagulants). In this study, we present our approach to suggest new inhibitor candidates by building an effective representation of chemical space. For this aim, we developed a deep learning model, an autoencoder, trained on a large set of molecules in the SMILES format to map the chemical space. Further, we applied different sampling strategies to generate novel coagulant candidates. Symmetrically, we tested our approach on anticoagulant candidates, where we were able to predict their inhibition towards Thrombin. We also compare our approach with MegaMolBART, another deep learning generative model that exploits similar principles of navigation in a chemical space.
Affiliation(s)
- Andrij Rovenchak
  - SoftServe, Inc., Lviv, Ukraine
  - Professor Ivan Vakarchuk Department for Theoretical Physics, Ivan Franko National University of Lviv, Lviv, Ukraine
- Maksym Druchok
  - SoftServe, Inc., Lviv, Ukraine
  - Institute for Condensed Matter Physics, Lviv, Ukraine
3. Zhu X, Zhang P, Jiang H, Kuang J, Wu L. Using the Super Learner algorithm to predict risk of major adverse cardiovascular events after percutaneous coronary intervention in patients with myocardial infarction. BMC Med Res Methodol 2024; 24:59. [PMID: 38459490 PMCID: PMC10921576 DOI: 10.1186/s12874-024-02179-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 02/14/2024] [Indexed: 03/10/2024] Open
Abstract
BACKGROUND: The primary treatment for patients with myocardial infarction (MI) is percutaneous coronary intervention (PCI). Despite this, the incidence of major adverse cardiovascular events (MACEs) remains a significant concern. Our study seeks to optimize PCI predictive modeling by employing an ensemble learning approach to identify the most effective combination of predictive variables.
METHODS AND RESULTS: We conducted a retrospective, non-interventional analysis of MI patient data from 2018 to 2021, focusing on those who underwent PCI. Our principal outcome was the occurrence of 1-year postoperative MACEs. Variable selection was performed using lasso regression, and predictive models were developed using the Super Learner (SL) algorithm. Model performance was appraised by the area under the receiver operating characteristic curve (AUC) and the average precision (AP) score. Our cohort included 3,880 PCI patients, with 475 (12.2%) experiencing MACEs within one year. The SL model exhibited superior discriminative performance, achieving a validated AUC of 0.982 and an AP of 0.971, which markedly surpassed the traditional logistic regression models (AUC: 0.826, AP: 0.626) in the test cohort. Thirteen variables were significantly associated with the occurrence of 1-year MACEs.
CONCLUSION: Implementing the Super Learner algorithm has substantially enhanced the predictive accuracy for the risk of MACEs in MI patients. This advancement presents a promising tool for clinicians to craft individualized, data-driven interventions to improve patient outcomes.
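The two evaluation metrics named in this abstract, AUC and average precision, can be computed directly from binary labels and predicted scores. The sketch below is a plain-Python illustration on toy data invented here, not the study's data or code. It assumes no tied scores when computing AP, and it also re-derives the reported event rate from the cohort counts given in the abstract.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive (no tied scores assumed)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits


# Toy example (hypothetical labels and scores, for illustration only).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)

# Event rate from the cohort counts stated in the abstract: 475 of 3,880.
event_rate_percent = round(100 * 475 / 3880, 1)
```

With an event rate near 12%, AP is the more demanding metric: a classifier with no skill scores about 0.5 on AUC, while its AP sits near the event rate.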
Affiliation(s)
- Xiang Zhu
  - Jiangxi Provincial Key Laboratory of Preventive Medicine, School of Public Health, Nanchang University, 461 BaYi St, Nanchang, 330006, People's Republic of China
- Pin Zhang
  - School of Public Health and Management, Nanchang Medical College, Nanchang, People's Republic of China
- Han Jiang
  - Department of Cardiology, Second Affiliated Hospital of Nanchang University, Nanchang, 330006, People's Republic of China
- Jie Kuang
  - Jiangxi Provincial Key Laboratory of Preventive Medicine, School of Public Health, Nanchang University, 461 BaYi St, Nanchang, 330006, People's Republic of China
- Lei Wu
  - Jiangxi Provincial Key Laboratory of Preventive Medicine, School of Public Health, Nanchang University, 461 BaYi St, Nanchang, 330006, People's Republic of China
4. Pandiyan S, Wang L. A comprehensive review on recent approaches for cancer drug discovery associated with artificial intelligence. Comput Biol Med 2022; 150:106140. [PMID: 36179510 DOI: 10.1016/j.compbiomed.2022.106140] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/23/2022] [Revised: 07/20/2022] [Accepted: 09/18/2022] [Indexed: 11/03/2022]
Abstract
With the revolution in artificial intelligence (AI) technologies in clinical research, significant improvement is observed in the diagnosis of cancer. Utilization of these AI technologies, such as machine and deep learning, is imperative for the discovery of novel anticancer drugs and for improving existing and ongoing cancer therapeutics. However, building a model for complicated cancers and their types remains a challenge, because the lack of effective therapeutics hinders the establishment of effective computational tools. In this review, we examine recent approaches and the state of the art in implementing AI methods for anticancer drug discovery, and discuss how advances in these applications need to be considered in current cancer therapeutics. Considering the immense potential of AI, we explore molecular docking and the associated interactions to recognize metabolic activities that support drug design. Finally, we highlight corresponding strategies for applying machine and deep learning methods to various types of cancer, with their pros and cons.
Affiliation(s)
- Sanjeevi Pandiyan
  - Research Center for Intelligent Information Technology, Nantong University, Nantong, China
  - School of Information Science and Technology, Nantong University, Nantong, China
  - Nantong Research Institute for Advanced Communication Technologies, Nantong, China
- Li Wang
  - Research Center for Intelligent Information Technology, Nantong University, Nantong, China
  - School of Information Science and Technology, Nantong University, Nantong, China
  - Nantong Research Institute for Advanced Communication Technologies, Nantong, China
5. Yarish D, Garkot S, Grygorenko OO, Radchenko DS, Moroz YS, Gurbych O. Advancing molecular graphs with descriptors for the prediction of chemical reaction yields. J Comput Chem 2022; 44:76-92. [PMID: 36264601 DOI: 10.1002/jcc.27016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/05/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/08/2022]
Abstract
Chemical yield is the percentage of the reactants converted to the desired products. Chemists use predictive algorithms to select high-yielding reactions and score synthesis routes, saving time and reagents. This study suggests a novel graph neural network architecture for chemical yield prediction. The network combines structural information about participants of the transformation as well as molecular and reaction-level descriptors. It works with incomplete chemical reactions and generates reactants-product atom mapping. We show that the network benefits from advanced information by comparing it with several machine learning models and molecular representations. Models included logistic regression, support vector machine, CatBoost, and Bidirectional Encoder Representations from Transformers. Molecular representations included extended-connectivity fingerprints, Morgan fingerprints, SMILESVec embeddings, and textual representations. Classification and regression objectives were assessed for each model and feature set. The goal of each classification model was to separate zero- and non-zero-yielding reactions. The models were trained and evaluated on a proprietary dataset of 10 reaction types. Also, the models were benchmarked on two public single-reaction-type datasets. The study was supplemented with analysis of data, results, and errors, as well as the impact of steric factors, side reactions, isolation, and purification efficiency. The supplementary code is available at https://github.com/SoftServeInc/yield-paper.
Affiliation(s)
- Sofiya Garkot
  - SoftServe, Inc., Lviv, Ukraine
  - Ukrainian Catholic University, Lviv, Ukraine
- Oleksandr O Grygorenko
  - Enamine Ltd., Kyiv, Ukraine
  - Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Dmytro S Radchenko
  - Enamine Ltd., Kyiv, Ukraine
  - Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Yurii S Moroz
  - Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
  - Chemspace LLC, Kyiv, Ukraine
- Oleksandr Gurbych
  - Lviv Polytechnic National University, Lviv, Ukraine
  - Blackthorn AI, Ltd., London, UK
6. Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
7. Nikolaienko T, Gurbych O, Druchok M. Complex machine learning model needs complex testing: Examining predictability of molecular binding affinity by a graph neural network. J Comput Chem 2022; 43:728-739. [PMID: 35201629 DOI: 10.1002/jcc.26831] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/22/2021] [Revised: 01/04/2022] [Accepted: 02/09/2022] [Indexed: 12/12/2022]
Abstract
Drug discovery pipelines typically involve high-throughput screening of large numbers of compounds in search of potential drug candidates. Because the chemical space of small organic molecules is huge, "navigating" it calls for fast and lightweight computational methods, thus promoting machine-learning approaches for processing huge pools of candidates. In this contribution, we present a graph-based deep neural network for prediction of protein-drug binding affinity and assess its predictive power under thorough testing conditions. Within the suggested approach, both protein and drug molecules are represented as graphs and passed to separate graph sub-networks, then concatenated and regressed towards a binding affinity. The neural network is trained on two binding affinity datasets: PDBbind and data imported from the RCSB Protein Data Bank. In order to explore the generalization capabilities of the model, we go beyond traditional random or leave-cluster-out techniques and demonstrate the need for more elaborate model performance assessment: six different strategies for test/train data partitioning (random, time- and property-arranged, protein- and ligand-clustered) with a k-fold cross-validation are engaged. Finally, we discuss the model performance in terms of a set of metrics for different split strategies and fold arrangement. Our code is available at https://github.com/SoftServeInc/affinity-by-GNN.
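The difference between the partitioning strategies named in this abstract can be illustrated with a small sketch. The record fields (`id`, `year`, `protein_cluster`), the toy data, and the greedy fold assignment are assumptions made here for illustration. The authors' actual implementation is in the repository linked above and may differ.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, object]
Split = Tuple[List[Record], List[Record]]  # (train, test)


def random_split(records: List[Record], n_test: int, seed: int = 0) -> Split:
    """Baseline: shuffle, then hold out n_test records."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def time_split(records: List[Record], n_test: int) -> Split:
    """Time-arranged: train on the oldest records, test on the newest."""
    ordered = sorted(records, key=lambda r: r["year"])
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


def grouped_k_fold(records: List[Record], key: str, k: int) -> Iterator[Split]:
    """Clustered: all records sharing a cluster label land in the same fold."""
    groups: Dict[object, List[Record]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    folds: List[List[Record]] = [[] for _ in range(k)]
    # Greedy balancing: place the largest groups first into the smallest fold.
    for _, members in sorted(groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0]))):
        min(folds, key=len).extend(members)
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]


# Hypothetical toy records (not taken from PDBbind or the RCSB PDB).
records = [
    {"id": "c1", "year": 2005, "protein_cluster": "kinase"},
    {"id": "c2", "year": 2010, "protein_cluster": "kinase"},
    {"id": "c3", "year": 2012, "protein_cluster": "protease"},
    {"id": "c4", "year": 2015, "protein_cluster": "protease"},
    {"id": "c5", "year": 2018, "protein_cluster": "gpcr"},
    {"id": "c6", "year": 2020, "protein_cluster": "gpcr"},
]
time_train, time_test = time_split(records, n_test=2)
cluster_folds = list(grouped_k_fold(records, key="protein_cluster", k=3))
```

A random split can place near-duplicates of a test complex in the training set. The time-arranged and clustered splits remove that overlap, so they usually give a more conservative estimate of how a model generalizes.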
Affiliation(s)
- Tymofii Nikolaienko
  - SoftServe, Inc., Lviv, Ukraine
  - Faculty of Physics, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Oleksandr Gurbych
  - Blackthorn AI Ltd., London, UK
  - Department of Artificial Intelligence Systems, Lviv Polytechnic National University, Lviv, Ukraine
- Maksym Druchok
  - SoftServe, Inc., Lviv, Ukraine
  - Institute for Condensed Matter Physics, NAS of Ukraine, Lviv, Ukraine
8. Basciu A, Callea L, Motta S, Bonvin AM, Bonati L, Vargiu AV. No dance, no partner! A tale of receptor flexibility in docking and virtual screening. Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
9