1. Baskaran MM, Ramanujam J, Sadayappan P. Automatic C-to-CUDA Code Generation for Affine Programs. Lecture Notes in Computer Science 2010. [DOI: 10.1007/978-3-642-11970-5_14] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.7] [Indexed: 12/12/2022]
2. Auer AA, Baumgartner G, Bernholdt DE, Bibireata A, Choppella V, Cociorva D, Gao X, Harrison R, Krishnamoorthy S, Krishnan S, Lam CC, Lu Q, Nooijen M, Pitzer R, Ramanujam J, Sadayappan P, Sibiryakov A. Automatic code generation for many-body electronic structure methods: the tensor contraction engine. Mol Phys 2006. [DOI: 10.1080/00268970500275780] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Indexed: 10/25/2022]
3. Bondhugula U, Hartono A, Ramanujam J, Sadayappan P. A practical automatic polyhedral parallelizer and locality optimizer. 2008. [DOI: 10.1145/1379022.1375595] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Indexed: 10/21/2022]
Abstract
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model -- far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.
4. Henretty T, Stock K, Pouchet LN, Franchetti F, Ramanujam J, Sadayappan P. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures. Lecture Notes in Computer Science 2011. [DOI: 10.1007/978-3-642-19861-8_13] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022]
5. Hartono A, Lu Q, Henretty T, Krishnamoorthy S, Zhang H, Baumgartner G, Bernholdt DE, Nooijen M, Pitzer R, Ramanujam J, Sadayappan P. Performance Optimization of Tensor Contraction Expressions for Many-Body Methods in Quantum Chemistry. J Phys Chem A 2009; 113:12715-23. [DOI: 10.1021/jp9051215] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
6. Fang Y, Ding Y, Feinstein WP, Koppelman DM, Moreno J, Jarrell M, Ramanujam J, Brylinski M. GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing. PLoS One 2016; 11:e0158898. [PMID: 27420300 PMCID: PMC4946785 DOI: 10.1371/journal.pone.0158898] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/01/2016] [Accepted: 06/23/2016] [Indexed: 12/19/2022] [Open access]
Abstract
Computational modeling of drug binding to proteins is an integral component of direct drug design. Particularly, structure-based virtual screening is often used to perform large-scale modeling of putative associations between small organic molecules and their pharmacologically relevant protein targets. Because of a large number of drug candidates to be evaluated, an accurate and fast docking engine is a critical element of virtual screening. Consequently, highly optimized docking codes are of paramount importance for the effectiveness of virtual screening methods. In this communication, we describe the implementation, tuning and performance characteristics of GeauxDock, a recently developed molecular docking program. GeauxDock is built upon the Monte Carlo algorithm and features a novel scoring function combining physics-based energy terms with statistical and knowledge-based potentials. Developed specifically for heterogeneous computing platforms, the current version of GeauxDock can be deployed on modern, multi-core Central Processing Units (CPUs) as well as massively parallel accelerators, Intel Xeon Phi and NVIDIA Graphics Processing Unit (GPU). First, we carried out a thorough performance tuning of the high-level framework and the docking kernel to produce a fast serial code, which was then ported to shared-memory multi-core CPUs yielding a near-ideal scaling. Further, using Xeon Phi gives 1.9× performance improvement over a dual 10-core Xeon CPU, whereas the best GPU accelerator, GeForce GTX 980, achieves a speedup as high as 3.5×. On that account, GeauxDock can take advantage of modern heterogeneous architectures to considerably accelerate structure-based virtual screening applications. GeauxDock is open-sourced and publicly available at www.brylinski.org/geauxdock and https://figshare.com/articles/geauxdock_tar_gz/3205249.
Publication type: Journal Article
7. Hartono A, Sibiryakov A, Nooijen M, Baumgartner G, Bernholdt DE, Hirata S, Lam CC, Pitzer RM, Ramanujam J, Sadayappan P. Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations. Lecture Notes in Computer Science 2005. [DOI: 10.1007/11428831_20] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
8. Kandemir M, Ramanujam J, Choudhary A. Exploiting shared scratch pad memory space in embedded multiprocessor systems. 2002. [DOI: 10.1145/513918.513974] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
9. Liu G, Singha M, Pu L, Neupane P, Feinstein J, Wu HC, Ramanujam J, Brylinski M. GraphDTI: A robust deep learning predictor of drug-target interactions from multiple heterogeneous data. J Cheminform 2021; 13:58. [PMID: 34380569 PMCID: PMC8356453 DOI: 10.1186/s13321-021-00540-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/08/2021] [Accepted: 07/31/2021] [Indexed: 12/22/2022] [Open access]
Abstract
Traditional techniques to identify macromolecular targets for drugs utilize solely the information on a query drug and a putative target. Nonetheless, the mechanisms of action of many drugs depend not only on their binding affinity toward a single protein, but also on the signal transduction through cascades of molecular interactions leading to certain phenotypes. Although using protein-protein interaction networks and drug-perturbed gene expression profiles can facilitate system-level investigations of drug-target interactions, utilizing such large and heterogeneous data poses notable challenges. To improve the state-of-the-art in drug target identification, we developed GraphDTI, a robust machine learning framework integrating the molecular-level information on drugs, proteins, and binding sites with the system-level information on gene expression and protein-protein interactions. In order to properly evaluate the performance of GraphDTI, we compiled a high-quality benchmarking dataset and devised a new cluster-based cross-validation protocol. Encouragingly, GraphDTI not only yields an AUC of 0.996 against the validation dataset, but it also generalizes well to unseen data with an AUC of 0.939, significantly outperforming other predictors. Finally, selected examples of identified drug-target interactions are validated against the biomedical literature. Numerous applications of GraphDTI include the investigation of drug polypharmacological effects, side effects through off-target binding, and repositioning opportunities.
11. Hartono A, Lu Q, Gao X, Krishnamoorthy S, Nooijen M, Baumgartner G, Bernholdt DE, Choppella V, Pitzer RM, Ramanujam J, Rountev A, Sadayappan P. Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations. Computational Science – ICCS 2006; 2006. [DOI: 10.1007/11758501_39] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
12. Krishnan S, Krishnamoorthy S, Baumgartner G, Cociorva D, Lam CC, Sadayappan P, Ramanujam J, Bernholdt DE, Choppella V. Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms. High Performance Computing - HiPC 2003; 2003. [DOI: 10.1007/978-3-540-24596-4_44] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]
13. Singha M, Pu L, Srivastava G, Ni X, Stanfield BA, Uche IK, Rider PJF, Kousoulas KG, Ramanujam J, Brylinski M. Unlocking the Potential of Kinase Targets in Cancer: Insights from CancerOmicsNet, an AI-Driven Approach to Drug Response Prediction in Cancer. Cancers (Basel) 2023; 15:4050. [PMID: 37627077 PMCID: PMC10452340 DOI: 10.3390/cancers15164050] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/12/2023] [Revised: 07/16/2023] [Accepted: 07/26/2023] [Indexed: 08/27/2023] [Open access]
Abstract
Deregulated protein kinases are crucial in promoting cancer cell proliferation and driving malignant cell signaling. Although these kinases are essential targets for cancer therapy due to their involvement in cell development and proliferation, only a small part of the human kinome has been targeted by drugs. A comprehensive scoring system is needed to evaluate and prioritize clinically relevant kinases. We recently developed CancerOmicsNet, an artificial intelligence model employing graph-based algorithms to predict the cancer cell response to treatment with kinase inhibitors. The performance of this approach has been evaluated in large-scale benchmarking calculations, followed by the experimental validation of selected predictions against several cancer types. To shed light on the decision-making process of CancerOmicsNet and to better understand the role of each kinase in the model, we employed a customized saliency map with adjustable channel weights. The saliency map, functioning as an explainable AI tool, allows for the analysis of input contributions to the output of a trained deep-learning model and facilitates the identification of essential kinases involved in tumor progression. The comprehensive survey of biomedical literature for essential kinases selected by CancerOmicsNet demonstrated that it could help pinpoint potential druggable targets for further investigation in diverse cancer types.
Publication type: research-article
14. Goel A, Ramanujam J. A neural architecture for a class of abduction problems. 1996; 26:854-60. [DOI: 10.1109/3477.544299] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
15. Shi W, Lemoine JM, Shawky AEMA, Singha M, Pu L, Yang S, Ramanujam J, Brylinski M. BionoiNet: ligand-binding site classification with off-the-shelf deep neural network. Bioinformatics 2020; 36:3077-3083. [PMID: 32053156 DOI: 10.1093/bioinformatics/btaa094] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/04/2019] [Revised: 01/27/2020] [Accepted: 02/05/2020] [Indexed: 01/08/2023] [Open access]
Abstract
MOTIVATION: Fast and accurate classification of ligand-binding sites in proteins with respect to the class of binding molecules is invaluable not only to the automatic functional annotation of large datasets of protein structures but also to projects in protein evolution, protein engineering and drug development. Deep learning techniques, which have already been successfully applied to address challenging problems across various fields, are inherently suitable to classify ligand-binding pockets. Our goal is to demonstrate that off-the-shelf deep learning models can be employed with minimum development effort to recognize nucleotide- and heme-binding sites with a comparable accuracy to highly specialized, voxel-based methods.
RESULTS: We developed BionoiNet, a new deep learning-based framework implementing a popular ResNet model for image classification. BionoiNet first transforms the molecular structures of ligand-binding sites to 2D Voronoi diagrams, which are then used as the input to a pretrained convolutional neural network classifier. The ResNet model generalizes well to unseen data achieving the accuracy of 85.6% for nucleotide- and 91.3% for heme-binding pockets. BionoiNet also computes significance scores of pocket atoms, called BionoiScores, to provide meaningful insights into their interactions with ligand molecules. BionoiNet is a lightweight alternative to computationally expensive 3D architectures.
AVAILABILITY AND IMPLEMENTATION: BionoiNet is implemented in Python with the source code freely available at: https://github.com/CSBG-LSU/BionoiNet.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Publication type: Research Support, N.I.H., Extramural
16. Feinstein J, Shi W, Ramanujam J, Brylinski M. Bionoi: A Voronoi Diagram-Based Representation of Ligand-Binding Sites in Proteins for Machine Learning Applications. Methods Mol Biol 2021; 2266:299-312. [PMID: 33759134 DOI: 10.1007/978-1-0716-1209-5_17] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Bionoi is new software for generating Voronoi representations of ligand-binding sites in proteins for machine learning applications. Unlike many other deep learning models in biomedicine, Bionoi utilizes off-the-shelf convolutional neural network architectures, reducing the development work without sacrificing the performance. When initially generating images of binding sites, users have the option to color the Voronoi cells based on either one of six structural, physicochemical, and evolutionary properties, or a blend of all six individual properties. Encouragingly, after inputting images generated by Bionoi into the convolutional autoencoder, the network was able to effectively learn the most salient features of binding pockets. The accuracy of the generated model is evaluated both visually and numerically through the reconstruction of binding site images from the latent feature space. The generated feature vectors capture various properties of binding sites well and thus can be applied in a multitude of machine learning projects. As a demonstration, we trained the ResNet-18 architecture from Microsoft on Bionoi images to show that it is capable of effectively classifying nucleotide- and heme-binding pockets against a large dataset of control pockets binding a variety of small molecules. Bionoi is freely available to the research community at https://github.com/CSBG-LSU/BionoiNet.
Publication type: Research Support, N.I.H., Extramural
17. Singha M, Pu L, Stanfield BA, Uche IK, Rider PJF, Kousoulas KG, Ramanujam J, Brylinski M. Artificial intelligence to guide precision anticancer therapy with multitargeted kinase inhibitors. BMC Cancer 2022; 22:1211. [PMID: 36434556 PMCID: PMC9694576 DOI: 10.1186/s12885-022-10293-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/20/2022] [Accepted: 11/07/2022] [Indexed: 11/26/2022] [Open access]
Abstract
BACKGROUND: Vast amounts of rapidly accumulating biological data related to cancer and remarkable progress in the field of artificial intelligence (AI) have paved the way for precision oncology. Our recent contribution to this area of research is CancerOmicsNet, an AI-based system to predict the therapeutic effects of multitargeted kinase inhibitors across various cancers. This approach was previously demonstrated to outperform other deep learning methods, graph kernel models, molecular docking, and drug binding pocket matching.
METHODS: CancerOmicsNet integrates multiple heterogeneous data by utilizing a deep graph learning model with sophisticated attention propagation mechanisms to extract highly predictive features from cancer-specific networks. The AI-based system was devised to provide more accurate and robust predictions than data-driven therapeutic discovery using gene signature reversion.
RESULTS: Selected CancerOmicsNet predictions obtained for "unseen" data are positively validated against the biomedical literature and by live-cell time course inhibition assays performed against breast, pancreatic, and prostate cancer cell lines. Encouragingly, six molecules exhibited dose-dependent antiproliferative activities, with pan-CDK inhibitor JNJ-7706621 and Src inhibitor PP1 being the most potent against the pancreatic cancer cell line Panc 04.03.
CONCLUSIONS: CancerOmicsNet is a promising AI-based platform to help guide the development of new approaches in precision oncology involving a variety of tumor types and therapeutics.
Publication type: research-article
18. Cociorva D, Baumgartner G, Lam CC, Sadayappan P, Ramanujam J, Nooijen M, Bernholdt DE, Harrison R. Space-time trade-off optimization for a class of electronic structure calculations. 2002. [DOI: 10.1145/543552.512551] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022]
Abstract
The accurate modeling of the electronic structure of atoms and molecules is very computationally intensive. Many models of electronic structure, such as the Coupled Cluster approach, involve collections of tensor contractions. There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number of arithmetic operations. In this paper, we present an algorithm that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the form with lowest cost that fits within a specified memory limit. Its utility is demonstrated by applying it to a computation representative of a component in the CCSD(T) formulation in the NWChem quantum chemistry suite from Pacific Northwest National Laboratory.
19. Singha M, Pu L, Shawky A, Busch K, Wu H, Ramanujam J, Brylinski M. GraphGR: A graph neural network to predict the effect of pharmacotherapy on the cancer cell growth. [DOI: 10.1101/2020.05.20.107458] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 09/02/2023]
Abstract
Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to treatment with drugs. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, the accurate prediction of the effect of pharmacotherapy on a specific cell line based on the genetic information alone is problematic. High prediction accuracies reported in the literature likely result from significant overlaps among training, validation, and testing sets, making many predictors inapplicable to new data. To address these issues, we developed GraphGR, a graph neural network with sophisticated attention propagation mechanisms to predict the therapeutic effects of kinase inhibitors across various tumors. Emphasizing the system-level complexity of cancer, GraphGR integrates multiple heterogeneous data, such as biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. In order to construct diverse and information-rich cancer-specific networks, we devised a novel graph reduction protocol based on not only the topological information, but also the biological knowledge. The performance of GraphGR, properly cross-validated at the tissue level, is 0.83 in terms of the area under the receiver operating characteristics, which is notably higher than those measured for other approaches on the same data. Finally, several new predictions are validated against the biomedical literature demonstrating that GraphGR generalizes well to unseen data, i.e. it can predict therapeutic effects across a variety of cancer cell lines and inhibitors. GraphGR is freely available to the academic community at https://github.com/pulimeng/GraphGR.
20. Sadayappan P, Ercal F, Ramanujam J. Partitioning graphs on message-passing machines by pairwise mincut. Inf Sci (N Y) 1998. [DOI: 10.1016/s0020-0255(98)10005-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/16/2022]
21. Liu M, Srivastava G, Ramanujam J, Brylinski M. Augmented drug combination dataset to improve the performance of machine learning models predicting synergistic anticancer effects. Research Square 2023:rs.3.rs-3481858. [PMID: 37961281 PMCID: PMC10635365 DOI: 10.21203/rs.3.rs-3481858/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Combination therapy has gained popularity in cancer treatment as it enhances the treatment efficacy and overcomes drug resistance. Although machine learning (ML) techniques have become an indispensable tool for discovering new drug combinations, the data on drug combination therapy currently available may be insufficient to build high-precision models. We developed a data augmentation protocol to unbiasedly scale up the existing anti-cancer drug synergy dataset. Using a new drug similarity metric, we augmented the synergy data by substituting a compound in a drug combination instance with another molecule that exhibits highly similar pharmacological effects. Using this protocol, we were able to upscale the AZ-DREAM Challenges dataset from 8,798 to 6,016,697 drug combinations. Comprehensive performance evaluations show that Random Forest and Gradient Boosting Trees models trained on the augmented data achieve higher accuracy than those trained solely on the original dataset. Our data augmentation protocol provides a systematic and unbiased approach to generating more diverse and larger-scale drug combination datasets, enabling the development of more precise and effective ML models. The protocol presented in this study could serve as a foundation for future research aimed at discovering novel and effective drug combinations for cancer treatment.
Publication type: Preprint
22. Shi W, Singha M, Srivastava G, Pu L, Ramanujam J, Brylinski M. Pocket2Drug: An Encoder-Decoder Deep Neural Network for the Target-Based Drug Design. Front Pharmacol 2022; 13:837715. [PMID: 35359869 PMCID: PMC8962739 DOI: 10.3389/fphar.2022.837715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Accepted: 02/10/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Computational modeling is an essential component of modern drug discovery. One of its most important applications is to select promising drug candidates for pharmacologically relevant target proteins. Because of continuing advances in structural biology, putative binding sites for small organic molecules are being discovered in numerous proteins linked to various diseases. These valuable data offer new opportunities to build efficient computational models predicting binding molecules for target sites through the application of data mining and machine learning. In particular, deep neural networks are powerful techniques capable of learning from complex data in order to make informed drug binding predictions. In this communication, we describe Pocket2Drug, a deep graph neural network model to predict binding molecules for a given ligand binding site. This approach first learns the conditional probability distribution of small molecules from a large dataset of pocket structures with supervised training, followed by the sampling of drug candidates from the trained model. Comprehensive benchmarking simulations show that using Pocket2Drug significantly improves the chances of finding molecules binding to target pockets compared to traditional drug selection procedures. Specifically, known binders are generated for as many as 80.5% of targets present in the testing set, which consists of data dissimilar to those used to train the deep graph neural network model. Overall, Pocket2Drug is a promising computational approach to inform the discovery of novel biopharmaceuticals.
23. Liu M, Ni X, Ramanujam J, Brylinski M. EC2Vec: A Machine Learning Method to Embed Enzyme Commission (EC) Numbers into Vector Representations. J Chem Inf Model 2025; 65:2173-2179. [PMID: 39981640 PMCID: PMC11898066 DOI: 10.1021/acs.jcim.4c02161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Revised: 02/10/2025] [Accepted: 02/10/2025] [Indexed: 02/22/2025]
Abstract
Enzyme commission (EC) numbers play a vital role in classifying enzymes and understanding their functions in enzyme-related research. Although accurate and informative encoding of EC numbers is essential for enhancing the effectiveness of machine learning applications, simple EC encoding approaches suffer from limitations such as false numerical order and high sparsity. To address these issues, we developed EC2Vec, a multimodal autoencoder that preserves the categorical nature of EC numbers and leverages their hierarchical relationships, resulting in more meaningful and informative representations. EC2Vec encodes each digit of the EC number as a categorical token and then processes these embeddings through a 1D convolutional layer to capture their relationships. Comprehensive benchmarking against a large collection of EC numbers indicates that EC2Vec outperforms simple encoding methods. The t-SNE visualization of EC2Vec embeddings revealed distinct clusters corresponding to different enzyme classes, demonstrating that the hierarchical structure of the EC numbers is effectively captured. In downstream machine learning applications, EC2Vec embeddings outperformed other EC encoding methods in the reaction-EC pair classification task, underscoring its robustness and utility for enzyme-related research and bioinformatics applications.
Publication type: brief-report
24. Liu M, Srivastava G, Ramanujam J, Brylinski M. Augmented drug combination dataset to improve the performance of machine learning models predicting synergistic anticancer effects. Sci Rep 2024; 14:1668. [PMID: 38238448 PMCID: PMC10796434 DOI: 10.1038/s41598-024-51940-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 01/11/2024] [Indexed: 01/22/2024] [Open access]
Abstract
Combination therapy has gained popularity in cancer treatment as it enhances the treatment efficacy and overcomes drug resistance. Although machine learning (ML) techniques have become an indispensable tool for discovering new drug combinations, the data on drug combination therapy currently available may be insufficient to build high-precision models. We developed a data augmentation protocol to unbiasedly scale up the existing anti-cancer drug synergy dataset. Using a new drug similarity metric, we augmented the synergy data by substituting a compound in a drug combination instance with another molecule that exhibits highly similar pharmacological effects. Using this protocol, we were able to upscale the AZ-DREAM Challenges dataset from 8798 to 6,016,697 drug combinations. Comprehensive performance evaluations show that ML models trained on the augmented data consistently achieve higher accuracy than those trained solely on the original dataset. Our data augmentation protocol provides a systematic and unbiased approach to generating more diverse and larger-scale drug combination datasets, enabling the development of more precise and effective ML models. The protocol presented in this study could serve as a foundation for future research aimed at discovering novel and effective drug combinations for cancer treatment.
Publication type: research-article
25. Liu M, Srivastava G, Ramanujam J, Brylinski M. SynerGNet: A Graph Neural Network Model to Predict Anticancer Drug Synergy. Biomolecules 2024; 14:253. [PMID: 38540674 PMCID: PMC10967862 DOI: 10.3390/biom14030253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 02/16/2024] [Accepted: 02/19/2024] [Indexed: 01/03/2025] [Open access]
Abstract
Drug combination therapy shows promise in cancer treatment by addressing drug resistance, reducing toxicity, and enhancing therapeutic efficacy. However, the intricate and dynamic nature of biological systems makes identifying potential synergistic drugs a costly and time-consuming endeavor. To facilitate the development of combination therapy, techniques employing artificial intelligence have emerged as a transformative solution, providing a sophisticated avenue for advancing existing therapeutic approaches. In this study, we developed SynerGNet, a graph neural network model designed to accurately predict the synergistic effect of drug pairs against cancer cell lines. SynerGNet utilizes cancer-specific featured graphs created by integrating heterogeneous biological features into the human protein-protein interaction network, followed by a reduction process to enhance topological diversity. Leveraging synergy data provided by AZ-DREAM Challenges, the model yields a balanced accuracy of 0.68, significantly outperforming traditional machine learning. Encouragingly, augmenting the training data with carefully constructed synthetic instances improved the balanced accuracy of SynerGNet to 0.73. Finally, the results of an independent validation conducted against DrugCombDB demonstrated that SynerGNet exhibits strong performance when applied to unseen data. SynerGNet shows great potential in detecting drug synergy, positioning itself as a valuable tool that could contribute to the advancement of combination therapy for cancer treatment.
Publication type: research-article