Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Singh R, Sledzieski S, Bryson B, Cowen L, Berger B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc Natl Acad Sci U S A 2023;120:e2220778120. [PMID: 37289807 PMCID: PMC10268324 DOI: 10.1073/pnas.2220778120] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2022] [Accepted: 04/10/2023] [Indexed: 06/10/2023] Open

Cited by Other Article(s)

Liu M, Wang K, Zhang Y, Zhou X, Li W, Han W. Mechanistic Study of Protein Interaction with Natto Inhibitory Peptides Targeting Xanthine Oxidase: Insights from Machine Learning and Molecular Dynamics Simulations. J Chem Inf Model 2025;65:3682-3696. [PMID: 40125929 DOI: 10.1021/acs.jcim.5c00126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/25/2025]

Ünlü A, Ulusoy E, Yiğit MG, Darcan M, Doğan T. Protein language models for predicting drug-target interactions: Novel approaches, emerging methods, and future directions. Curr Opin Struct Biol 2025;91:103017. [PMID: 39985946 DOI: 10.1016/j.sbi.2025.103017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2024] [Revised: 01/28/2025] [Accepted: 01/29/2025] [Indexed: 02/24/2025]

Shao Y, Liu T. iNClassSec-ESM: Discovering potential non-classical secreted proteins through a novel protein language model. Comput Struct Biotechnol J 2025;27:1350-1358. [PMID: 40235638 PMCID: PMC11999076 DOI: 10.1016/j.csbj.2025.03.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2024] [Revised: 03/15/2025] [Accepted: 03/26/2025] [Indexed: 04/17/2025] Open

Abstract

Non-classical secreted proteins (NCSPs) are a class of proteins lacking signal peptides, secreted by Gram-positive bacteria through non-classical secretion pathways. With the increasing demand for highly secreted proteins in recent years, non-classical secretion pathways have received more attention due to their advantages over classical secretion pathways (Sec/Tat). However, because the mechanisms of non-classical secretion pathways are not yet clear, identifying NCSPs through biological experiments is expensive and time-consuming, making it imperative to develop computational methods to address this issue. Existing NCSP prediction methods mainly use traditional handcrafted features to represent proteins from sequence information, which limits the models' ability to capture complex protein characteristics. In this study, we proposed a novel NCSP predictor, iNClassSec-ESM, which combined deep learning with traditional classifiers to enhance prediction performance. iNClassSec-ESM integrates an XGBoost model trained on comprehensive handcrafted features and a Deep Neural Network (DNN) trained on hidden layer embeddings from the protein language model (PLM) ESM3. ESM3 is a recently proposed multimodal PLM and has not yet been fully explored in terms of protein representation. Therefore, we extracted hidden layer embeddings from ESM3 as inputs for multiple classifiers and deep learning networks, and compared them with existing PLMs. Benchmark experiments indicate that iNClassSec-ESM outperforms most existing methods across multiple performance metrics and could serve as an effective tool for discovering potential NCSPs. Additionally, the ESM3 hidden layer embeddings, as an innovative protein representation method, show great potential for application in broader protein-related classification tasks. The source code of iNClassSec-ESM and the ESM3 embeddings extraction script are publicly available at https://github.com/AmamiyaHoshie/iNClassSec-ESM/.

Du Z, Fu W, Guo X, Caragea D, Li Y. FusionESP: Improved Enzyme-Substrate Pair Prediction by Fusing Protein and Chemical Knowledge. J Chem Inf Model 2025;65:2806-2817. [PMID: 40035691 DOI: 10.1021/acs.jcim.4c02357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]

NaderiAlizadeh N, Singh R. Aggregating residue-level protein language model embeddings with optimal transport. BIOINFORMATICS ADVANCES 2025;5:vbaf060. [PMID: 40170888 PMCID: PMC11961220 DOI: 10.1093/bioadv/vbaf060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/11/2024] [Revised: 02/13/2025] [Accepted: 03/17/2025] [Indexed: 04/03/2025]

Ullanat V, Jing B, Sledzieski S, Berger B. Learning the language of protein-protein interactions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.03.09.642188. [PMID: 40166198 PMCID: PMC11956943 DOI: 10.1101/2025.03.09.642188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/02/2025]

Zhou F, Zhang S, Zhang H, Liu JK. ProCeSa: Contrast-Enhanced Structure-Aware Network for Thermostability Prediction with Protein Language Models. J Chem Inf Model 2025;65:2304-2313. [PMID: 39988825 PMCID: PMC11898056 DOI: 10.1021/acs.jcim.4c01752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2024] [Revised: 02/14/2025] [Accepted: 02/17/2025] [Indexed: 02/25/2025]

Yin J, Zhang H, Sun X, You N, Mou M, Lu M, Pan Z, Li F, Li H, Zeng S, Zhu F. Decoding Drug Response With Structurized Gridding Map-Based Cell Representation. IEEE J Biomed Health Inform 2025;29:1702-1713. [PMID: 38090819 DOI: 10.1109/jbhi.2023.3342280] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/08/2025]

Wen S, Han Y, Li Y, Zhan D. Therapeutic Mechanisms of Medicine Food Homology Plants in Alzheimer's Disease: Insights from Network Pharmacology, Machine Learning, and Molecular Docking. Int J Mol Sci 2025;26:2121. [PMID: 40076742 PMCID: PMC11899993 DOI: 10.3390/ijms26052121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2025] [Revised: 02/21/2025] [Accepted: 02/24/2025] [Indexed: 03/14/2025] Open

Abstract

Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by a gradual decline in cognitive function. Currently, there are no effective treatments for this condition. Medicine food homology plants have gained increasing attention as potential natural treatments for AD because of their nutritional value and therapeutic benefits. In this work, we aimed to provide a deeper understanding of how medicine food homology plants may help alleviate or potentially treat AD by identifying key targets, pathways, and small molecule compounds from 10 medicine food homology plants that play an important role in this process. Using network pharmacology, we identified 623 common targets between AD and the compounds from the selected 10 plants, including crucial proteins such as STAT3, IL6, TNF, and IL1B. Additionally, the small molecules from the selected plants were grouped into four clusters using hierarchical clustering. The ConPlex algorithm was then applied to predict the binding capabilities of these small molecules to the key protein targets. Cluster 3 showed superior predicted binding capabilities to STAT3, TNF, and IL1B, which was further validated by molecular docking. Scaffold analysis of small molecules in Cluster 3 revealed that those with a steroid-like core (comprising three fused six-membered rings and one five-membered ring with a carbon-carbon double bond) exhibited better predicted binding affinities and were potential triple-target inhibitors. Among them, MOL005439, MOL000953, and MOL005438 were identified as the top-performing compounds. This study highlights the potential of medicine food homology plants as a source of active compounds that could be developed into new drugs for AD treatment. However, further pharmacokinetic studies are essential to assess their efficacy and minimize side effects.

Ge L, Gao Q, He J, Wang X, Huang J, Zhang H, Qin Z. MultiT2: A Tool Connecting the Multimodal Data for Bacterial Aromatic Polyketide Natural Products. ACS OMEGA 2025;10:5105-5110. [PMID: 39959056 PMCID: PMC11822507 DOI: 10.1021/acsomega.4c11266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/13/2024] [Revised: 01/15/2025] [Accepted: 01/23/2025] [Indexed: 02/18/2025]

Creanza TM, Alberga D, Patruno C, Mangiatordi GF, Ancona N. Transformer Decoder Learns from a Pretrained Protein Language Model to Generate Ligands with High Affinity. J Chem Inf Model 2025;65:1258-1277. [PMID: 39871540 DOI: 10.1021/acs.jcim.4c02019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2025]

Talo M, Bozdag S. Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.02.07.637146. [PMID: 39975019 PMCID: PMC11839103 DOI: 10.1101/2025.02.07.637146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 02/21/2025]

Abstract

Motivation

The accurate prediction of drug-target interactions (DTI) is a crucial step in drug discovery, providing a foundation for identifying novel therapeutics. Traditional drug development is both costly and time-consuming, often spanning over a decade. Computational approaches help narrow the pool of compound candidates, offering significant starting points for experimental validation. In this study, we propose the Top-DTI framework for predicting DTI by integrating topological data analysis (TDA) with large language models (LLMs). Top-DTI leverages persistent homology to extract topological features from protein contact maps and drug molecular images. Simultaneously, protein and drug LLMs generate semantically rich embeddings that capture sequential and contextual information from protein sequences and drug SMILES strings. By combining these complementary features, Top-DTI enhances predictive performance and robustness.

Results

Experimental results on the public BioSNAP and Human DTI benchmark datasets demonstrate that the proposed Top-DTI model outperforms state-of-the-art approaches across multiple evaluation metrics, including AUROC, AUPRC, sensitivity, and specificity. Furthermore, the Top-DTI model achieves superior performance in the challenging cold-split scenario, where the test and validation sets contain drugs or targets absent from the training set. This setting simulates real-world scenarios and highlights the robustness of the model. Notably, incorporating topological features alongside LLM embeddings significantly improves predictive performance, underscoring the value of integrating structural and sequence-based representations.

Availability

The data and source code of Top-DTI are available at https://github.com/bozdaglab/Top_DTI under the Creative Commons Attribution Non Commercial 4.0 International Public License.
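
The cold-split scenario mentioned in the Results above (test drugs or targets absent from the training set) can be illustrated with a minimal sketch. This is not the Top-DTI code: the function name `cold_drug_split`, the record layout `(drug, target, label)`, and the toy data are all hypothetical, and only the drug-side split is shown.

```python
import random


def cold_drug_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug, target, label) records so that no test drug occurs in training.

    Hypothetical helper for illustration only; not taken from any cited work.
    """
    # Collect the unique drugs and shuffle them reproducibly.
    drugs = sorted({drug for drug, _, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)

    # Hold out whole drugs, not individual pairs.
    n_test = max(1, int(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])

    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


# Toy interaction records (made up for the example).
pairs = [
    ("D1", "T1", 1), ("D1", "T2", 0),
    ("D2", "T1", 0), ("D3", "T3", 1),
    ("D4", "T2", 1), ("D5", "T3", 0),
]
train, test = cold_drug_split(pairs, test_fraction=0.4, seed=0)
```

Splitting on drug identity rather than on individual pairs is what makes the split "cold": a random pair-level split would let the same drug appear on both sides.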

Schuh MG, Boldini D, Bohne AI, Sieber SA. Barlow Twins deep neural network for advanced 1D drug-target interaction prediction. J Cheminform 2025;17:18. [PMID: 39910404 PMCID: PMC11800607 DOI: 10.1186/s13321-025-00952-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2024] [Accepted: 01/08/2025] [Indexed: 02/07/2025] Open

Abstract

Accurate prediction of drug-target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature-extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive performance against multiple established benchmarks using only one-dimensional input. The use of our hybrid approach of deep learning and gradient boosting machine as the underlying predictor ensures fast and efficient predictions without the need for substantial computational resources. We also propose the use of an influence method to investigate how the model reaches its decision based on individual training samples. By comparing co-crystal structures, we find that BarlowDTI effectively exploits catalytically active and stabilising residues, highlighting the model's ability to generalise from one-dimensional input data. In addition, we further benchmark new baselines against existing methods. Together, these innovations improve the efficiency and effectiveness of drug-target interactions predictions, providing robust tools for accelerating drug development and deepening the understanding of molecular interactions. Therefore, we provide an easy-to-use web interface that can be freely accessed at https://www.bio.nat.tum.de/oc2/barlowdti . SCIENTIFIC CONTRIBUTION: Our computationally efficient and effective hybrid approach, combining the deep learning model Barlow Twins and gradient boosting machines, outperforms state-of-the-art methods across multiple splits and benchmarks using only one-dimensional input. Furthermore, we advance the field by proposing an influence method that elucidates model decision-making, thereby providing deeper insights into molecular interactions and improving the interpretability of drug-target interactions predictions.

Peng Y, Wu J, Sun Y, Zhang Y, Wang Q, Shao S. Contrastive-learning of language embedding and biological features for cross modality encoding and effector prediction. Nat Commun 2025;16:1299. [PMID: 39900608 PMCID: PMC11791096 DOI: 10.1038/s41467-025-56526-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2024] [Accepted: 01/15/2025] [Indexed: 02/05/2025] Open

Yoon MS, Bae B, Kim K, Park H, Baek M. Deep learning methods for proteome-scale interaction prediction. Curr Opin Struct Biol 2025;90:102981. [PMID: 39848140 DOI: 10.1016/j.sbi.2024.102981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2024] [Revised: 11/13/2024] [Accepted: 12/22/2024] [Indexed: 01/25/2025]

McNutt AT, Adduri AK, Ellington CN, Dayao MT, Xing EP, Mohimani H, Koes DR. Scaling Structure Aware Virtual Screening to Billions of Molecules with SPRINT. ARXIV 2025:arXiv:2411.15418v2. [PMID: 39975427 PMCID: PMC11838698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 02/21/2025]

Adduri AK, McNutt AT, Ellington CN, Suraparaju K, Fang N, Yan D, Krummenacher B, Li S, Bodden C, Xing EP, Behsaz B, Koes D, Mohimani H. Interpretable adenylation domain specificity prediction using protein language models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.13.632878. [PMID: 39868251 PMCID: PMC11761653 DOI: 10.1101/2025.01.13.632878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2025]

Tanoli Z, Schulman A, Aittokallio T. Validation guidelines for drug-target prediction methods. Expert Opin Drug Discov 2025;20:31-45. [PMID: 39568436 DOI: 10.1080/17460441.2024.2430955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2024] [Accepted: 11/14/2024] [Indexed: 11/22/2024]

Ouyang X, Feng Y, Cui C, Li Y, Zhang L, Wang H. Improving generalizability of drug-target binding prediction by pre-trained multi-view molecular representations. Bioinformatics 2024;41:btaf002. [PMID: 39776159 PMCID: PMC11751634 DOI: 10.1093/bioinformatics/btaf002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2024] [Revised: 12/12/2024] [Accepted: 01/06/2025] [Indexed: 01/11/2025] Open

Liu XH, Lu ZH, Wang T, Liu F. Large language models facilitating modern molecular biology and novel drug development. Front Pharmacol 2024;15:1458739. [PMID: 39776586 PMCID: PMC11703923 DOI: 10.3389/fphar.2024.1458739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2024] [Accepted: 12/05/2024] [Indexed: 01/11/2025] Open

Heinzinger M, Weissenow K, Sanchez J, Henkel A, Mirdita M, Steinegger M, Rost B. Bilingual language model for protein sequence and structure. NAR Genom Bioinform 2024;6:lqae150. [PMID: 39633723 PMCID: PMC11616678 DOI: 10.1093/nargab/lqae150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2024] [Revised: 08/02/2024] [Accepted: 10/21/2024] [Indexed: 12/07/2024] Open

Yu X, Zhou S, Zang M, Wang Q, Liu C, Liu T. Parallel Convolutional Contrastive Learning Method for Enzyme Function Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024;21:2604-2609. [PMID: 39167509 DOI: 10.1109/tcbb.2024.3447037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/23/2024]

Christensson G, Bocci M, Kazi JU, Durand G, Lanzing G, Pietras K, Gonzalez Velozo H, Hagerling C. Spatial Multiomics Reveals Intratumoral Immune Heterogeneity with Distinct Cytokine Networks in Lung Cancer Brain Metastases. CANCER RESEARCH COMMUNICATIONS 2024;4:2888-2902. [PMID: 39400127 PMCID: PMC11539001 DOI: 10.1158/2767-9764.crc-24-0201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/02/2024] [Revised: 09/06/2024] [Accepted: 10/09/2024] [Indexed: 10/15/2024]

Abstract

The tumor microenvironment of brain metastases has become a focus in the development of immunotherapeutic drugs. However, countless patients with brain metastasis have not experienced clinical benefit. Thus, understanding the immune cell composition within brain metastases and how immune cells interact with each other and other microenvironmental cell types may be critical for optimizing immunotherapy. We applied spatial whole-transcriptomic profiling with extensive multiregional sampling (19-30 regions per sample) and multiplex IHC on formalin-fixed, paraffin-embedded lung cancer brain metastasis samples. We performed deconvolution of gene expression data to infer the abundances of immune cell populations and inferred spatial relationships from the multiplex IHC data. We also described cytokine networks between immune and tumor cells and used a protein language model to predict drug-target interactions. Finally, we performed deconvolution of bulk RNA data to assess the prognostic significance of immune-metastatic tumor cellular networks. We show that immune cell infiltration has a negative prognostic role in lung cancer brain metastases. Our in-depth multiomics analyses further reveal recurring intratumoral immune heterogeneity and the segregation of myeloid and lymphoid cells into distinct compartments that may be influenced by distinct cytokine networks. By using computational modeling, we identify drugs that may target genes expressed in both tumor core and regions bordering immune infiltrates. Finally, we illustrate the potential negative prognostic role of our immune-metastatic tumor cell networks. Our findings advocate for a paradigm shift from focusing on individual genes or cell types toward targeting networks of immune and tumor cells.

SIGNIFICANCE

Immune cell signatures are conserved across lung cancer brain metastases, and immune-metastatic tumor cell networks have a prognostic effect, implying that targeting cytokine networks between immune and metastatic tumor cells may generate more precise immunotherapeutic approaches.

Liu Y, Xia X, Gong Y, Song B, Zeng X. SSR-DTA: Substructure-aware multi-layer graph neural networks for drug-target binding affinity prediction. Artif Intell Med 2024;157:102983. [PMID: 39321746 DOI: 10.1016/j.artmed.2024.102983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Revised: 09/10/2024] [Accepted: 09/13/2024] [Indexed: 09/27/2024]

Abstract

Accurate prediction of drug-target binding affinity (DTA) is essential in the field of drug discovery. Recently, scientists have been attempting to utilize artificial intelligence prediction to screen out a significant number of ineffective compounds, thereby mitigating labor and financial losses. While graph neural networks (GNNs) have been applied to DTA, existing GNNs have limitations in effectively extracting substructural features across various sizes. Functional groups play a crucial role in modulating molecular properties, but existing GNNs struggle with feature extraction from certain motifs due to scale mismatches. Additionally, sequence-based models for target proteins lack the integration of structural information. To address these limitations, we present SSR-DTA, a multi-layer graph network capable of adapting to diverse structural sizes, which can extract richer biological features, thereby improving the robustness and accuracy of predictions. Multi-layer GNNs enable the capture of molecular motifs across different scales, ranging from atomic to macrocyclic motifs. Furthermore, we introduce BiGNN to simultaneously learn sequence and structural information. Sequence information corresponds to the primary structure of proteins, while graph information represents the tertiary structure. BiGNN assimilates richer information compared to sequence-based methods while mitigating the impact of errors from predicted structures, resulting in more accurate predictions. Through rigorous experimental evaluations conducted on four benchmark datasets, we demonstrate the superiority of SSR-DTA over state-of-the-art models. Particularly, in comparison to state-of-the-art models, SSR-DTA demonstrates an impressive 20% reduction in mean squared error on the Davis dataset and a 5% reduction on the KIBA dataset, underscoring its potential as a valuable tool for advancing DTA prediction.

Ghislat G, Hernandez-Hernandez S, Piyawajanusorn C, Ballester PJ. Data-centric challenges with the application and adoption of artificial intelligence for drug discovery. Expert Opin Drug Discov 2024;19:1297-1307. [PMID: 39316009 DOI: 10.1080/17460441.2024.2403639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2024] [Accepted: 09/09/2024] [Indexed: 09/25/2024]

Li Y, Zhang X, Chen Z, Yang H, Liu Y, Wang H, Yan T, Xiang J, Wang B. Accurate prediction of drug-target interactions in Chinese and western medicine by the CWI-DTI model. Sci Rep 2024;14:25054. [PMID: 39443630 PMCID: PMC11499656 DOI: 10.1038/s41598-024-76367-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2024] [Accepted: 10/14/2024] [Indexed: 10/25/2024] Open

Henderson J, Nagano Y, Milighetti M, Tiffeau-Mayer A. Limits on inferring T cell specificity from partial information. Proc Natl Acad Sci U S A 2024;121:e2408696121. [PMID: 39374400 PMCID: PMC11494314 DOI: 10.1073/pnas.2408696121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2024] [Accepted: 09/03/2024] [Indexed: 10/09/2024] Open

Qiao G, Wang G, Li Y. Causal enhanced drug-target interaction prediction based on graph generation and multi-source information fusion. Bioinformatics 2024;40:btae570. [PMID: 39312682 PMCID: PMC11639159 DOI: 10.1093/bioinformatics/btae570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2024] [Revised: 08/17/2024] [Accepted: 09/20/2024] [Indexed: 09/25/2024] Open

Abstract

MOTIVATION

The prediction of drug-target interaction is a vital task in the biomedical field, aiding in the discovery of potential molecular targets of drugs and the development of targeted therapy methods with higher efficacy and fewer side effects. Although there are various methods for drug-target interaction (DTI) prediction based on heterogeneous information networks, these methods face challenges in capturing the fundamental interaction between drugs and targets and ensuring the interpretability of the model. Moreover, they require manually constructed meta-paths or extensive feature engineering (prior knowledge), whereas graph generation can fuse information more flexibly without meta-path selection.

RESULTS

We propose a causal enhanced method for drug-target interaction (CE-DTI) prediction that integrates graph generation and multi-source information fusion. First, we represent drugs and targets by modeling the fusion of their multi-source information through automatic graph generation. Once drugs and targets are combined, a network of drug-target pairs is constructed, transforming the prediction of drug-target interactions into a node classification problem. Specifically, the influence of surrounding nodes on the central node is separated into two groups: causal and non-causal variable nodes. Causal variable nodes significantly impact the central node's classification, while non-causal variable nodes do not. Causal invariance is then used to enhance the contrastive learning of the drug-target pairs network. Our method demonstrates excellent performance compared with other competitive benchmark methods across multiple datasets. At the same time, the experimental results also show that the causal enhancement strategy can explore the potential causal effects between DTPs, and discover new potential targets. Additionally, case studies demonstrate that this method can identify potential drug targets.

AVAILABILITY AND IMPLEMENTATION

The source code of AdaDR is available at: https://github.com/catly/CE-DTI.

Sun D, Macedonia C, Chen Z, Chandrasekaran S, Najarian K, Zhou S, Cernak T, Ellingrod VL, Jagadish HV, Marini B, Pai M, Violi A, Rech JC, Wang S, Li Y, Athey B, Omenn GS. Can Machine Learning Overcome the 95% Failure Rate and Reality that Only 30% of Approved Cancer Drugs Meaningfully Extend Patient Survival? J Med Chem 2024;67:16035-16055. [PMID: 39253942 DOI: 10.1021/acs.jmedchem.4c01684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/11/2024]

Guichaoua G, Pinel P, Hoffmann B, Azencott CA, Stoven V. Drug-Target Interactions Prediction at Scale: The Komet Algorithm with the LCIdb Dataset. J Chem Inf Model 2024;64:6938-6956. [PMID: 39237105 PMCID: PMC11423346 DOI: 10.1021/acs.jcim.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/07/2024]

Abstract

Drug-target interactions (DTIs) prediction algorithms are used at various stages of the drug discovery process. In this context, specific problems such as deorphanization of a new therapeutic target or target identification of a drug candidate arising from phenotypic screens require large-scale predictions across the protein and molecule spaces. DTI prediction heavily relies on supervised learning algorithms that use known DTIs to learn associations between molecule and protein features, allowing for the prediction of new interactions based on learned patterns. The algorithms must be broadly applicable to enable reliable predictions, even in regions of the protein or molecule spaces where data may be scarce. In this paper, we address two key challenges to fulfill these goals: building large, high-quality training datasets and designing prediction methods that can scale, in order to be trained on such large datasets. First, we introduce LCIdb, a curated, large-sized dataset of DTIs, offering extensive coverage of both the molecule and druggable protein spaces. Notably, LCIdb contains a much higher number of molecules than publicly available benchmarks, expanding coverage of the molecule space. Second, we propose Komet (Kronecker Optimized METhod), a DTI prediction pipeline designed for scalability without compromising performance. Komet leverages a three-step framework, incorporating efficient computation choices tailored for large datasets and involving the Nyström approximation. Specifically, Komet employs a Kronecker interaction module for (molecule, protein) pairs, which efficiently captures determinants in DTIs, and whose structure allows for reduced computational complexity and quasi-Newton optimization, ensuring that the model can handle large training sets, without compromising on performance. Our method is implemented in open-source software, leveraging GPU parallel computation for efficiency. 
We demonstrate the interest of our pipeline on various datasets, showing that Komet displays superior scalability and prediction performance compared to state-of-the-art deep learning approaches. Additionally, we illustrate the generalization properties of Komet by showing its performance on an external dataset, and on the publicly available LH benchmark designed for scaffold hopping problems. Komet is available open source at https://komet.readthedocs.io and all datasets, including LCIdb, can be found at https://zenodo.org/records/10731712.

Jiang X, Tan L, Zou Q. DGCL: dual-graph neural networks contrastive learning for molecular property prediction. Brief Bioinform 2024;25:bbae474. [PMID: 39331017 PMCID: PMC11428321 DOI: 10.1093/bib/bbae474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 08/16/2024] [Accepted: 09/13/2024] [Indexed: 09/28/2024] Open

Theisen R, Wang T, Ravikumar B, Rahman R, Cichońska A. Leveraging multiple data types for improved compound-kinase bioactivity prediction. Nat Commun 2024;15:7596. [PMID: 39217147 PMCID: PMC11365929 DOI: 10.1038/s41467-024-52055-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024] Open

Cao A, Zhang L, Bu Y, Sun D. Machine Learning Prediction of On/Off Target-driven Clinical Adverse Events. Pharm Res 2024;41:1649-1658. [PMID: 39095534 DOI: 10.1007/s11095-024-03742-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 07/06/2024] [Indexed: 08/04/2024]

Cesnik A, Schaffer LV, Gaur I, Jain M, Ideker T, Lundberg E. Mapping the Multiscale Proteomic Organization of Cellular and Disease Phenotypes. Annu Rev Biomed Data Sci 2024;7:369-389. [PMID: 38748859 PMCID: PMC11343683 DOI: 10.1146/annurev-biodatasci-102423-113534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]

Piras A, Chenghao S, Sebek M, Ispirova G, Menichetti G. CPIExtract: A software package to collect and harmonize small molecule and protein interactions. bioRxiv 2024:2024.07.03.601957. [PMID: 39005430 PMCID: PMC11245042 DOI: 10.1101/2024.07.03.601957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]

Sledzieski S, Kshirsagar M, Baek M, Dodhia R, Lavista Ferres J, Berger B. Democratizing protein language models with parameter-efficient fine-tuning. Proc Natl Acad Sci U S A 2024;121:e2405840121. [PMID: 38900798 PMCID: PMC11214071 DOI: 10.1073/pnas.2405840121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 05/09/2024] [Indexed: 06/22/2024] Open

Zhou H, Skolnick J. Utility of the Morgan Fingerprint in Structure-Based Virtual Ligand Screening. J Phys Chem B 2024;128:5363-5370. [PMID: 38783525 PMCID: PMC11163432 DOI: 10.1021/acs.jpcb.4c01875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]

Rao J, Xie J, Yuan Q, Liu D, Wang Z, Lu Y, Zheng S, Yang Y. A variational expectation-maximization framework for balanced multi-scale learning of protein and drug interactions. Nat Commun 2024;15:4476. [PMID: 38796523 PMCID: PMC11530528 DOI: 10.1038/s41467-024-48801-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/14/2024] [Indexed: 05/28/2024] Open

Chen H, Bajorath J. Generative design of compounds with desired potency from target protein sequences using a multimodal biochemical language model. J Cheminform 2024;16:55. [PMID: 38778425 PMCID: PMC11110441 DOI: 10.1186/s13321-024-00852-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open

Abstract

Deep learning models adapted from natural language processing offer new opportunities for the prediction of active compounds via machine translation of sequential molecular data representations. For example, chemical language models are often derived for compound string transformation. Moreover, given the principal versatility of language models for translating different types of textual representations, off-the-beaten-path design tasks might be explored. In this work, we have investigated generative design of active compounds with desired potency from target sequence embeddings, representing a rather provoking prediction task. Therefore, a dual-component conditional language model was designed for learning from multimodal data. It comprised a protein language model component for generating target sequence embeddings and a conditional transformer for predicting new active compounds with desired potency. To this end, the designated "biochemical" language model was trained to learn mappings of combined protein sequence and compound potency value embeddings to corresponding compounds, fine-tuned on individual activity classes not encountered during model derivation, and evaluated on compound test sets that were structurally distinct from training sets. The biochemical language model correctly reproduced known compounds with different potency for all activity classes, providing proof-of-concept for the approach. Furthermore, the conditional model consistently reproduced larger numbers of known compounds as well as more potent compounds than an unconditional model, revealing a substantial effect of potency conditioning. The biochemical language model also generated structurally diverse candidate compounds departing from both fine-tuning and test compounds. Overall, generative compound design based on potency value-conditioned target sequence embeddings yielded promising results, rendering the approach attractive for further exploration and practical applications. 
SCIENTIFIC CONTRIBUTION: The approach introduced herein combines protein language model and chemical language model components, representing an advanced architecture, and is the first methodology for predicting compounds with desired potency from conditioned protein sequence data.

Wei H, Gao L, Wu S, Jiang Y, Liu B. DiSMVC: a multi-view graph collaborative learning framework for measuring disease similarity. Bioinformatics 2024;40:btae306. [PMID: 38715444 PMCID: PMC11256965 DOI: 10.1093/bioinformatics/btae306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 04/19/2024] [Accepted: 05/05/2024] [Indexed: 05/30/2024] Open

Ding K, Luo J, Luo Y. Leveraging conformal prediction to annotate enzyme function space with limited false positives. PLoS Comput Biol 2024;20:e1012135. [PMID: 38809942 PMCID: PMC11164347 DOI: 10.1371/journal.pcbi.1012135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 06/10/2024] [Accepted: 05/03/2024] [Indexed: 05/31/2024] Open

Ovek D, Keskin O, Gursoy A. ProInterVal: Validation of Protein-Protein Interfaces through Learned Interface Representations. J Chem Inf Model 2024;64:2979-2987. [PMID: 38526504 PMCID: PMC11040718 DOI: 10.1021/acs.jcim.3c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]

Ozalp MK, Vignaux PA, Puhl AC, Lane TR, Urbina F, Ekins S. Sequential Contrastive and Deep Learning Models to Identify Selective Butyrylcholinesterase Inhibitors. J Chem Inf Model 2024;64:3161-3172. [PMID: 38532612 PMCID: PMC11331448 DOI: 10.1021/acs.jcim.4c00397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024]

Meimetis N, Lauffenburger DA, Nilsson A. Inference of drug off-target effects on cellular signaling using interactome-based deep learning. iScience 2024;27:109509. [PMID: 38591003 PMCID: PMC11000001 DOI: 10.1016/j.isci.2024.109509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 02/04/2024] [Accepted: 03/13/2024] [Indexed: 04/10/2024] Open

Qiu Y, Cheng F. Artificial intelligence for drug discovery and development in Alzheimer's disease. Curr Opin Struct Biol 2024;85:102776. [PMID: 38335558 DOI: 10.1016/j.sbi.2024.102776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/29/2023] [Accepted: 01/15/2024] [Indexed: 02/12/2024]

Luo D, Liu D, Qu X, Dong L, Wang B. Enhancing Generalizability in Protein-Ligand Binding Affinity Prediction with Multimodal Contrastive Learning. J Chem Inf Model 2024;64:1892-1906. [PMID: 38441880 DOI: 10.1021/acs.jcim.3c01961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]

Singh N, Singh AK. In Silico Structural Modeling and Binding Site Analysis of Cerebroside Sulfotransferase (CST): A Therapeutic Target for Developing Substrate Reduction Therapy for Metachromatic Leukodystrophy. ACS Omega 2024;9:10748-10768. [PMID: 38463293 PMCID: PMC10918841 DOI: 10.1021/acsomega.3c09462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 01/26/2024] [Accepted: 01/31/2024] [Indexed: 03/12/2024]

Abstract

Cerebroside sulfotransferase (CST) is emerging as an important therapeutic target to develop substrate reduction therapy (SRT) for metachromatic leukodystrophy (MLD), a rare neurodegenerative lysosomal storage disorder. MLD develops with progressive impairment and destruction of the myelin sheath as a result of accumulation of sulfatide around the nerve cells in the absence of its recycling mechanism with deficiency of arylsulfatase A (ARSA). Sulfatide is the product of the catalytic action of cerebroside sulfotransferase (CST), which needs to be regulated under pathophysiological conditions by inhibitor development. To carry out in silico-based preliminary drug screening or for designing new drug candidates, a high-quality three-dimensional (3D) structure is needed in the absence of an experimentally derived three-dimensional crystal structure. In this study, a 3D model of the protein was developed using a primary sequence with the SWISS-MODEL server by applying the top four GMQE score-based templates belonging to the sulfotransferase family as a reference. The 3D model of CST highlights the features of the protein responsible for its catalytic action. The CST model comprises five β-strands, which are flanked by ten α-helices from both sides as well as form the upside cover of the catalytic pocket of CST. CST has two catalytic regions: PAPS (-sulfo donor) binding and galactosylceramide (-sulfo acceptor) binding. The catalytic action of CST was proposed via molecular docking and molecular dynamics (MD) simulation with PAPS, galactosylceramide (GC), PAPS-galactosylceramide, and PAP. The stability of the model and its catalytic action were confirmed using molecular dynamics simulation-based trajectory analysis.
CST response against the inhibition potential of the experimentally reported competitive inhibitor of CST was confirmed via molecular docking and molecular dynamics simulation, which suggested the suitability of the CST model for future drug discovery to strengthen substrate reduction therapy for MLD.

Smith MD, Darryl Quarles L, Demerdash O, Smith JC. Drugging the entire human proteome: Are we there yet? Drug Discov Today 2024;29:103891. [PMID: 38246414 DOI: 10.1016/j.drudis.2024.103891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 01/23/2024]

Taujale R, Gravel N, Zhou Z, Yeung W, Kochut K, Kannan N. Informatic challenges and advances in illuminating the druggable proteome. Drug Discov Today 2024;29:103894. [PMID: 38266979 DOI: 10.1016/j.drudis.2024.103894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 01/08/2024] [Accepted: 01/17/2024] [Indexed: 01/26/2024]

Scharf MM, Humphrys LJ, Berndt S, Di Pizio A, Lehmann J, Liebscher I, Nicoli A, Niv MY, Peri L, Schihada H, Schulte G. The dark sides of the GPCR tree - research progress on understudied GPCRs. Br J Pharmacol 2024. [PMID: 38339984 DOI: 10.1111/bph.16325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 11/24/2023] [Accepted: 01/08/2024] [Indexed: 02/12/2024] Open