1. Schulman A, Rousu J, Aittokallio T, Tanoli Z. Attention-based approach to predict drug-target interactions across seven target superfamilies. Bioinformatics 2024; 40:btae496. [PMID: 39115379 PMCID: PMC11520408 DOI: 10.1093/bioinformatics/btae496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 06/12/2024] [Accepted: 08/06/2024] [Indexed: 08/29/2024]
Abstract
MOTIVATION: Drug-target interactions (DTIs) play a pivotal role in drug repurposing and in the elucidation of drug mechanisms of action. While single-targeted drugs have demonstrated clinical success, they often exhibit limited efficacy against complex diseases, such as cancers, whose development and treatment depend on several biological processes. Therefore, a comprehensive understanding of primary, secondary and even inactive targets becomes essential in the quest for effective and safe treatments for cancer and other indications. The human proteome offers over a thousand druggable targets, yet most FDA-approved drugs bind to only a small fraction of these targets.
RESULTS: This study introduces an attention-based method (called MMAtt-DTA) to predict drug-target bioactivities across human proteins within seven superfamilies. We examined nine different descriptor sets to identify optimal signature descriptors for predicting novel DTIs. Our testing results demonstrated Spearman correlations exceeding 0.72 (P < 0.001) for six out of seven superfamilies. The proposed method outperformed fourteen state-of-the-art machine learning, deep learning and graph-based methods and maintained relatively high performance for most target superfamilies when tested with independent bioactivity data sources. We computationally validated 185 676 drug-target pairs from ChEMBL-V33 that were not available during model training, achieving a reasonable performance with Spearman correlation >0.57 (P < 0.001) for most superfamilies. This underscores the robustness of the proposed method for predicting novel DTIs. Finally, we applied our method to predict missing bioactivities among 3492 approved molecules in ChEMBL-V33, offering a valuable tool for advancing drug mechanism discovery and repurposing existing drugs for new indications.
AVAILABILITY AND IMPLEMENTATION: https://github.com/AronSchulman/MMAtt-DTA.
Affiliation(s)
- Aron Schulman
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, 02150, Finland
- Tero Aittokallio
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, 00014, Finland
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, 0379, Norway
  - Oslo Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, 0372, Norway
- Ziaurrehman Tanoli
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, 00014, Finland
  - iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital, Helsinki, 00014, Finland
  - Drug Discovery and Chemical Biology (DDCB) Consortium, Biocenter, Helsinki, 00014, Finland
  - BioICAWtech, Helsinki, Helsinki, 00410, Finland
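The abstract of entry 1 reports Spearman correlations between predicted and measured bioactivities. As a minimal sketch of how such a rank correlation can be computed, the following uses only the Python standard library. The function names and the bioactivity values are hypothetical illustrations and are not taken from the paper or from the MMAtt-DTA repository.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(measured, predicted):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(measured), average_ranks(predicted))


# Hypothetical bioactivity values, for illustration only.
measured = [5.1, 6.3, 7.8, 4.9, 8.2]
predicted = [5.4, 6.0, 7.1, 5.6, 7.9]
rho = spearman(measured, predicted)
print(round(rho, 3))
```

Because the measure depends only on ranks, it rewards a model for ordering drug-target pairs correctly even when the predicted values are offset or rescaled. The sketch does not compute the P value quoted alongside the correlations.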
2. Madan S, Lentzen M, Brandt J, Rueckert D, Hofmann-Apitius M, Fröhlich H. Transformer models in biomedicine. BMC Med Inform Decis Mak 2024; 24:214. [PMID: 39075407 PMCID: PMC11287876 DOI: 10.1186/s12911-024-02600-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 07/08/2024] [Indexed: 07/31/2024]
Abstract
Deep neural networks (DNNs) have fundamentally revolutionized the field of artificial intelligence (AI). The transformer model is a type of DNN that was originally used for natural language processing tasks and has since gained increasing attention for processing various kinds of sequential data, including biological sequences and structured electronic health records. Along with this development, transformer-based models such as BioBERT, MedBERT, and MassGenie have been trained and deployed by researchers to answer various scientific questions originating in the biomedical domain. In this paper, we review the development and application of transformer models for analyzing various biomedical datasets, such as biomedical textual data, protein sequences, structured longitudinal medical data, biomedical images, and graphs. We also look at explainable AI strategies that help to interpret the predictions of transformer-based models. Finally, we discuss the limitations and challenges of current models and point out emerging research directions.
Affiliation(s)
- Sumit Madan
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
  - Institute of Computer Science, University of Bonn, Bonn, 53115, Germany
- Manuel Lentzen
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
- Johannes Brandt
  - School of Medicine, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Daniel Rueckert
  - School of Medicine, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
  - School of Computation, Information and Technology, Technical University Munich, Munich, Germany
  - Department of Computing, Imperial College London, London, UK
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
- Holger Fröhlich
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany
  - Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
3. Li H, Jiang L, Yang K, Shang S, Li M, Lv Z. iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features. Int J Mol Sci 2024; 25:7049. [PMID: 39000158 PMCID: PMC11240975 DOI: 10.3390/ijms25137049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Revised: 06/17/2024] [Accepted: 06/25/2024] [Indexed: 07/16/2024]
Abstract
Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding nervous system regulatory mechanisms. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models remains an active area of research. Hence, in this research, we constructed an SVM-based neuropeptide predictor, iNP_ESM, by integrating, for the first time, the protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep). Our model used feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating strong neuropeptide recognition capability. We anticipate improved neuropeptide data in the future, and we believe that the iNP_ESM model will have broader applications in the research and clinical treatment of neurological diseases.
Affiliation(s)
- Honghao Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Liangzhen Jiang
  - College of Food and Biological Engineering, Chengdu University, Chengdu 610106, China
  - Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu 610106, China
- Kaixiang Yang
  - College of Software Engineering, Sichuan University, Chengdu 610041, China
- Shulin Shang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
4. Savva K, Zachariou M, Bourdakou MM, Dietis N, Spyrou GM. DReAmocracy: A Method to Capitalise on Prior Drug Discovery Efforts to Highlight Candidate Drugs for Repurposing. Int J Mol Sci 2024; 25:5319. [PMID: 38791356 PMCID: PMC11121186 DOI: 10.3390/ijms25105319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/26/2024] [Accepted: 05/02/2024] [Indexed: 05/26/2024]
Abstract
In drug research, several computational drug repurposing studies have highlighted candidate drugs for repurposing, and clinical trials have tested or are testing drugs in different phases. To the best of our knowledge, the aggregation of the drug lists proposed by previous studies has not been extensively exploited towards generating a dynamic reference matrix with enhanced resolution. To fill this knowledge gap, we performed weight-modulated majority voting of the modes of action, initial indications and targeted pathways of the drugs in a well-known repository, namely the Drug Repurposing Hub. Our method, DReAmocracy, exploits this body of information to create frequency tables and, finally, a disease suitability score for each drug from the selected library. As a testbed, we applied this method to a group of neurodegenerative diseases (Alzheimer's, Parkinson's, Huntington's disease and Multiple Sclerosis). A super-reference table with drug suitability scores has been created for all four neurodegenerative diseases and can be queried for any drug candidate against them. Top-scored drugs for Alzheimer's Disease include agomelatine, mirtazapine and vortioxetine; for Parkinson's Disease, they include apomorphine, pramipexole and lisuride; for Huntington's, they include chlorpromazine, fluphenazine and perphenazine; and for Multiple Sclerosis, they include zonisamide, disopyramide and priralfimide. Overall, DReAmocracy is a methodology that leverages existing experimental and/or computational drug-related knowledge rather than a predictive model for drug repurposing. It offers a quantified aggregation of existing drug discovery results to (1) reveal trends in selected tracks of drug discovery research with increased resolution that includes modes of action, targeted pathways and initial indications for the investigated drugs and (2) score new candidate drugs for repurposing against a selected disease.
Affiliation(s)
- Kyriaki Savva
  - Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, Nicosia 2370, Cyprus; (K.S.); (M.Z.); (M.M.B.)
- Margarita Zachariou
  - Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, Nicosia 2370, Cyprus; (K.S.); (M.Z.); (M.M.B.)
- Marilena M. Bourdakou
  - Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, Nicosia 2370, Cyprus; (K.S.); (M.Z.); (M.M.B.)
- Nikolas Dietis
  - Experimental Pharmacology Laboratory, Medical School, University of Cyprus, Nicosia 2115, Cyprus
- George M. Spyrou
  - Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, Nicosia 2370, Cyprus; (K.S.); (M.Z.); (M.M.B.)
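The abstract of entry 4 describes weighted voting over drug features, frequency tables, and a per-drug suitability score, but gives no formulas. The sketch below is one possible reading under stated assumptions, not the authors' method: the source weights, the feature names, the drug names, and the normalisation by the table total are all hypothetical.

```python
from collections import Counter


def weighted_frequency_table(proposals, drug_features, source_weights):
    """Tally features of previously proposed drugs, weighted by evidence source.

    proposals: list of (drug, source) pairs from prior studies (hypothetical).
    drug_features: dict mapping drug -> list of features (e.g. modes of action).
    source_weights: dict mapping source -> vote weight (assumed values).
    """
    table = Counter()
    for drug, source in proposals:
        for feature in drug_features.get(drug, ()):
            table[feature] += source_weights[source]
    return table


def suitability_score(drug, drug_features, table):
    """Share of the total weighted votes carried by the drug's own features."""
    total = sum(table.values())
    if total == 0:
        return 0.0
    return sum(table[f] for f in drug_features.get(drug, ())) / total


# Hypothetical inputs, for illustration only.
source_weights = {"computational": 1.0, "clinical_trial": 2.0}
drug_features = {
    "drug_a": ["moa_x"],
    "drug_b": ["moa_x", "moa_y"],
    "drug_c": ["moa_z"],
    "drug_d": ["moa_y"],  # library drug not yet proposed by any study
}
proposals = [
    ("drug_a", "computational"),
    ("drug_b", "clinical_trial"),
    ("drug_c", "computational"),
]

table = weighted_frequency_table(proposals, drug_features, source_weights)
score_d = suitability_score("drug_d", drug_features, table)
print(dict(table), round(score_d, 3))
```

In this toy setup a drug that no study has proposed (`drug_d`) still receives a non-zero score because it shares a mode of action with a drug that was tested clinically, which is the kind of aggregation the abstract describes.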
5. Chen S, Li M, Semenov I. MFA-DTI: Drug-target interaction prediction based on multi-feature fusion adopted framework. Methods 2024; 224:79-92. [PMID: 38430967 DOI: 10.1016/j.ymeth.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 02/16/2024] [Accepted: 02/23/2024] [Indexed: 03/05/2024]
Abstract
The identification of drug-target interactions (DTI) is a valuable step in the drug discovery and repositioning process. However, traditional laboratory experiments are time-consuming and expensive. Computational methods have streamlined research to determine DTIs. The application of deep learning methods has significantly improved the prediction performance for DTIs. Modern deep learning methods can leverage multiple sources of information, including sequence data that contains biological structural information, and interaction data. While useful, these methods cannot be effectively applied to each type of information individually (e.g., chemical structure and interaction network) and do not take into account the specificity of DTI data such as low- or zero-interaction biological entities. To overcome these limitations, we propose a method called MFA-DTI (Multi-feature Fusion Adopted framework for DTI). MFA-DTI consists of three modules: an interaction graph learning module that processes the interaction network to generate interaction vectors, a chemical structure learning module that extracts features from the chemical structure, and a fusion module that combines these features for the final prediction. To validate the performance of MFA-DTI, we conducted experiments on six public datasets under different settings. The results indicate that the proposed method is highly effective in various settings and outperforms state-of-the-art methods.
Affiliation(s)
- Siqi Chen
  - School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, 400074, China
- Minghui Li
  - Beidahuang Industry Group General Hospital, Harbin, 150006, China
- Ivan Semenov
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
6. Bandara D, Riccardi K. Graph Node Classification to Predict Autism Risk in Genes. Genes (Basel) 2024; 15:447. [PMID: 38674382 PMCID: PMC11049455 DOI: 10.3390/genes15040447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 03/28/2024] [Accepted: 03/28/2024] [Indexed: 04/28/2024]
Abstract
This study explores genetic risk associations with autism spectrum disorder (ASD) using graph neural networks (GNNs), leveraging the SFARI dataset and protein interaction network (PIN) data. We built a gene network with genes as nodes, chromosome band location as node features, and gene interactions as edges. Graph models were employed to classify the autism risk associated with newly introduced genes (test set). Three classification tasks were undertaken to test the ability of our models: binary risk association, multi-class risk association, and syndromic gene association. We tested graph convolutional network, GraphSAGE, graph transformer, and multi-layer perceptron (baseline) architectures on this problem. The GraphSAGE model consistently outperformed the other models, showcasing its utility in classifying ASD-related genes. Our ablation studies show that the chromosome band location and protein interactions contain useful information for this problem. The models achieved 85.80% accuracy on the binary risk classification, 81.68% accuracy on the multi-class risk classification, and 90.22% on the syndromic classification.
Affiliation(s)
- Danushka Bandara
  - Department of Computer Science and Engineering, Fairfield University, Fairfield, CT 06824, USA
7. Zhang Y, Zhou C. PfgPDI: Pocket feature-enabled graph neural network for protein-drug interaction prediction. J Bioinform Comput Biol 2024; 22:2450004. [PMID: 38812467 DOI: 10.1142/s0219720024500045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
Recognizing biomolecular interactions between ligands and proteins is an essential task that greatly enhances safety and efficacy in the drug discovery and development stages. Studying the interaction between proteins and ligands can improve the understanding of disease pathogenesis and lead to more effective drug targets. Additionally, it can aid in determining drug parameters, ensuring proper absorption, distribution, and metabolism within the body. Due to incomplete feature representation or the model's inadequate adaptation to protein-ligand complexes, existing methodologies suffer from suboptimal predictive accuracy. To address these pitfalls, in this study, we designed a new deep learning method based on a transformer and a GCN. We first utilized the transformer network to grasp crucial information of the original protein sequences within the SMILES sequences and connected them to prevent falling into a local optimum. Furthermore, a series of dilated convolutions is performed to obtain the pocket features and SMILES features, which are subsequently subjected to graph convolution to optimize the connections. The combined representations are fed into the proposed model for classification prediction. Experiments against various protein-ligand binding prediction methods demonstrate the effectiveness of our proposed method. It is expected that PfgPDI can contribute to drug prediction and accelerate the development of new drugs, while also serving as a valuable partner for drug testing and research and development engineers.
Affiliation(s)
- Yiqian Zhang
  - School of Electrical and Information, Northeast Agricultural University, Harbin 150030, P. R. China
- Changjian Zhou
  - Department of Data and Computing, Northeast Agricultural University, Harbin 150030, P. R. China
8. Hashemi Sheikhshabani S, Ghafouri-Fard S, Amini-Farsani Z, Modarres P, Khazaei Feyzabad S, Amini-Farsani Z, Shaygan N, Omrani MD. In Silico Prediction of Functional SNPs Interrupting Antioxidant Defense Genes in Relation to COVID-19 Progression. Biochem Genet 2024:10.1007/s10528-024-10705-9. [PMID: 38460087 DOI: 10.1007/s10528-024-10705-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 01/16/2024] [Indexed: 03/11/2024]
Abstract
The excessive production of reactive oxygen species and the weakening of the antioxidant defense system play a pivotal role in the pathogenesis of different diseases. The extensive differences observed among individuals in susceptibility to cancer, cardiovascular disorders, diabetes, and bacterial and viral infections, as well as in response to treatments, can be partly due to their genomic variations. In this work, we attempted to predict the effect of SNPs in the key genes of the antioxidant defense system on their structure, function, and expression in relation to COVID-19 pathogenesis using in silico tools. In addition, the effect of SNPs on the target site binding efficiency of miRNAs was investigated as a factor with the potential to change drug response or susceptibility to COVID-19. According to the predicted results, only six missense SNPs with minor allele frequency (MAF) ≥ 0.1 in the coding regions of the genes GPX7, GPX8, TXNRD2, GLRX5, and GLRX were able to strongly affect their structure and function. Our results predicted that 39 SNPs with MAF ≥ 0.1 led to the generation or destruction of miRNA-binding sites on target antioxidant genes from the GPX, PRDX, GLRX, TXN, and SOD families. The results obtained from comparing the expression profiles of mild vs. severe COVID-19 patients using GEO2R demonstrated a significant change in the expression of approximately 250 miRNAs. The binding efficiency of 21 of these miRNAs was changed due to the elimination or generation of target sites in these genes. Altogether, this study reveals the fundamental role of the SNPs of antioxidant defense genes in COVID-19 progression and in the susceptibility of individuals to this virus. In addition, the different responses of COVID-19 patients to drugs that enhance the antioxidant defense system may be due to the presence of these SNPs in different individuals.
Affiliation(s)
- Somayeh Hashemi Sheikhshabani
  - Student Research Committee, Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Soudeh Ghafouri-Fard
  - Student Research Committee, Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Zeinab Amini-Farsani
  - Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Parastoo Modarres
  - Department of Cell and Molecular Biology and Microbiology, University of Isfahan, Isfahan, Iran
- Sharareh Khazaei Feyzabad
  - Department of Laboratory Sciences, School of Paramedical Sciences, Zahedan University of Medical Sciences, Zahedan, Iran
- Zahra Amini-Farsani
  - Bayesian Imaging and Spatial Statistics Group, Institute of Statistics, Ludwig-Maximilian-Universität München, Ludwigstraße 33, 80539, Munich, Germany
- Nasibeh Shaygan
  - Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mir Davood Omrani
  - Department of Medical Genetics, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Urogenital Stem Cell Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
9. Jiang J, Pei H, Li J, Li M, Zou Q, Lv Z. FEOpti-ACVP: identification of novel anti-coronavirus peptide sequences based on feature engineering and optimization. Brief Bioinform 2024; 25:bbae037. [PMID: 38366802 PMCID: PMC10939380 DOI: 10.1093/bib/bbae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/27/2023] [Accepted: 01/17/2024] [Indexed: 02/18/2024]
Abstract
Anti-coronavirus peptides (ACVPs) represent a relatively novel approach to inhibiting the adsorption and fusion of the virus with human cells. Several peptide-based inhibitors have shown promise as potential therapeutic drug candidates. However, identifying such peptides in laboratory experiments is both costly and time-consuming. Therefore, there is growing interest in using computational methods to predict ACVPs. Here, we describe a model for the prediction of ACVPs that is based on the combination of feature engineering (FE) optimization and deep representation learning. FEOpti-ACVP was pre-trained using two feature extraction frameworks. In the next step, several machine learning approaches were tested to construct the final algorithm. The final version of FEOpti-ACVP outperformed existing methods used for ACVP prediction, and it has the potential to become a valuable tool in ACVP drug design. A user-friendly webserver of FEOpti-ACVP can be accessed at http://servers.aibiochem.net/soft/FEOpti-ACVP/.
Affiliation(s)
- Jici Jiang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Hongdi Pei
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jiayu Li
  - College of Life Science, Sichuan University, Chengdu 610065, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
10. Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure and function prediction and in de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in intelligent protein design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors and of sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. We hope that this will help researchers to better understand the limitations and application scenarios of these methods and provide a useful reference for choosing appropriate protein characterization techniques for related research in the field.
Affiliation(s)
- Junjie Ding
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
  - State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
11. Yan J, Ye Z, Yang Z, Lu C, Zhang S, Liu Q, Qiu J. Multi-task bioassay pre-training for protein-ligand binding affinity prediction. Brief Bioinform 2023; 25:bbad451. [PMID: 38084920 PMCID: PMC10783875 DOI: 10.1093/bib/bbad451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/27/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023]
Abstract
Protein-ligand binding affinity (PLBA) prediction is a fundamental task in drug discovery. Recently, various deep learning-based models have predicted binding affinity by taking the three-dimensional (3D) structure of protein-ligand complexes as input and have achieved remarkable progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. Although there is a vast amount of affinity data available in large-scale databases such as ChEMBL, issues such as inconsistent affinity measurement labels (i.e. IC50, Ki, Kd), different experimental conditions, and the lack of available 3D binding structures complicate the development of high-precision affinity prediction models using these data. To address these issues, we (i) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction, and (ii) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked 3D structures. By introducing multi-task pre-training to treat the prediction of different affinity labels as different tasks and by classifying relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP on the structure-based PLBA prediction task. To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development. The MBP web server is now available for free at: https://huggingface.co/spaces/jiaxianustc/mbp.
Affiliation(s)
- Jiaxian Yan
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Zhaofeng Ye
  - Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Ziyi Yang
  - Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Chengqiang Lu
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Shengyu Zhang
  - Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
- Qi Liu
  - Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, JinZhai Road, 230026, Anhui, China
- Jiezhong Qiu
  - Tencent Quantum Laboratory, Tencent, Shennan Road, 518057, Guangdong, China
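The abstract of entry 11 mentions classifying relative rankings between samples from the same bioassay, which avoids comparing affinity labels measured under different conditions. The following is a minimal sketch of how such within-assay training pairs could be built. It covers only pair construction, not the MBP model itself, and the record layout, identifiers, and tie handling are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_ranking_examples(records):
    """Build (sample_a, sample_b, label) pairs within each bioassay.

    records: iterable of (assay_id, sample_id, affinity) tuples (hypothetical layout).
    label is 1 if sample_a has the higher affinity value, else 0.
    Pairs are never formed across assays; exact ties are skipped (an assumption).
    """
    by_assay = defaultdict(list)
    for assay_id, sample_id, affinity in records:
        by_assay[assay_id].append((sample_id, affinity))

    pairs = []
    for samples in by_assay.values():
        for (id_a, aff_a), (id_b, aff_b) in combinations(samples, 2):
            if aff_a == aff_b:
                continue  # no ranking signal in a tie
            pairs.append((id_a, id_b, 1 if aff_a > aff_b else 0))
    return pairs


# Hypothetical records, for illustration only.
records = [
    ("assay_1", "lig_A", 6.2),
    ("assay_1", "lig_B", 7.5),
    ("assay_1", "lig_C", 5.0),
    ("assay_2", "lig_D", 8.1),
    ("assay_2", "lig_E", 8.1),  # tie with lig_D, so that pair is skipped
    ("assay_2", "lig_F", 4.3),
]

pairs = pairwise_ranking_examples(records)
for pair in pairs:
    print(pair)
```

Restricting pairs to a single assay means the label depends only on the relative order of two measurements taken under the same conditions, so systematic offsets between assays do not affect it.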
12. Li J, Ma S, Pei H, Jiang J, Zou Q, Lv Z. Review of T cell proliferation regulatory factors in treatment and prognostic prediction for solid tumors. Heliyon 2023; 9:e21329. [PMID: 37954355 PMCID: PMC10637962 DOI: 10.1016/j.heliyon.2023.e21329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/15/2023] [Accepted: 10/19/2023] [Indexed: 11/14/2023]
Abstract
T cell proliferation regulators (Tcprs), which are positive regulators that promote T cell function, have made great contributions to the development of therapies to improve T cell function. Chimeric antigen receptor (CAR)-T cell therapy, a type of adoptive cell transfer therapy that targets tumor cells and enhances immune lethality, has led to significant progress in the treatment of hematologic tumors. However, the applications of CAR-T in solid tumor treatment remain limited. Therefore, in this review, we focus on the development of Tcprs for solid tumor therapy and prognostic prediction. We summarize potential strategies for targeting different Tcprs to enhance T cell proliferation and activation and to inhibit cancer progression, thereby improving the antitumor activity and persistence of CAR-T. In summary, we propose means of enhancing CAR-T cells by expressing different Tcprs, which may lead to the development of a new generation of cell therapies.
Collapse
Affiliation(s)
- Jiayu Li
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
  - College of Life Science, Sichuan University, Chengdu 610065, China
- Shuhan Ma
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Hongdi Pei
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jici Jiang
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zhibin Lv
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
13
Tisi A, Palaniappan S, Maccarrone M. Advanced Omics Techniques for Understanding Cochlear Genome, Epigenome, and Transcriptome in Health and Disease. Biomolecules 2023; 13:1534. [PMID: 37892216 PMCID: PMC10605747 DOI: 10.3390/biom13101534] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/24/2023] [Revised: 10/10/2023] [Accepted: 10/13/2023] [Indexed: 10/29/2023] Open
Abstract
Advanced genomics, transcriptomics, and epigenomics techniques are providing unprecedented insights into the molecular underpinnings of the central nervous system, including the neuro-sensory cochlea of the inner ear. Here, we report for the first time a comprehensive and updated overview of the most advanced omics techniques for the study of nucleic acids and their applications in cochlear research. We describe the available in vitro and in vivo models for hearing research and the principles of genomics, transcriptomics, and epigenomics, alongside their most advanced technologies (such as single-cell omics and spatial omics), which allow molecular events to be investigated at single-cell resolution while retaining spatial information.
Affiliation(s)
- Annamaria Tisi
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, 67100 L’Aquila, Italy
- Sakthimala Palaniappan
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, 67100 L’Aquila, Italy
- Mauro Maccarrone
  - Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, 67100 L’Aquila, Italy
  - Laboratory of Lipid Neurochemistry, European Center for Brain Research (CERC), Santa Lucia Foundation IRCCS, 00143 Rome, Italy
14
Deng Y, Ma S, Li J, Zheng B, Lv Z. Using the Random Forest for Identifying Key Physicochemical Properties of Amino Acids to Discriminate Anticancer and Non-Anticancer Peptides. Int J Mol Sci 2023; 24:10854. [PMID: 37446031 PMCID: PMC10341712 DOI: 10.3390/ijms241310854] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/08/2023] [Revised: 06/17/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023] Open
Abstract
Anticancer peptides (ACPs) represent a promising new therapeutic approach in cancer treatment. They can target cancer cells without affecting healthy tissues or altering normal physiological functions. Machine learning algorithms have increasingly been utilized for predicting peptide sequences with potential ACP effects. This study analyzed four benchmark datasets using a well-established random forest (RF) algorithm. The peptide sequences were converted into 566 physicochemical features extracted from the amino acid index (AAindex) library, which were then subjected to feature selection using four methods: light gradient-boosting machine (LGBM), analysis of variance (ANOVA), chi-squared test (Chi2), and mutual information (MI). Merging the features selected by these methods with Venn diagrams yielded 19 key amino acid physicochemical properties that can be used to predict the likelihood of a peptide sequence functioning as an ACP. The results were quantified by performance evaluation metrics to determine the accuracy of predictions. This study aims to enhance the efficiency of designing peptide sequences for cancer treatment.
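As a rough illustration of the feature-merging step described in this abstract, the sketch below keeps the features that appear among the top-ranked candidates of several selection methods, which is one plausible reading of a Venn-diagram overlap. The feature names, scores, the cutoff `k`, and the `min_votes` threshold are all hypothetical and are not taken from the paper; the scores stand in for whatever each selector (LGBM, ANOVA, Chi2, MI) would actually output.

```python
from collections import Counter


def top_k(scores: dict[str, float], k: int) -> set[str]:
    """Return the names of the k highest-scoring features."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def consensus_features(
    rankings: dict[str, dict[str, float]], k: int, min_votes: int
) -> list[str]:
    """Keep features that fall in the top k of at least min_votes methods."""
    votes: Counter = Counter()
    for scores in rankings.values():
        votes.update(top_k(scores, k))
    return sorted(name for name, n in votes.items() if n >= min_votes)


# Hypothetical importance scores per selection method (toy values only).
rankings = {
    "LGBM": {"hydrophobicity": 0.9, "charge": 0.7, "volume": 0.2, "polarity": 0.1},
    "ANOVA": {"hydrophobicity": 0.8, "charge": 0.6, "volume": 0.5, "polarity": 0.3},
    "Chi2": {"hydrophobicity": 0.4, "charge": 0.9, "volume": 0.1, "polarity": 0.6},
    "MI": {"hydrophobicity": 0.7, "charge": 0.5, "volume": 0.6, "polarity": 0.2},
}

# Features chosen by at least three of the four methods.
selected = consensus_features(rankings, k=2, min_votes=3)
print(selected)
```

Setting `min_votes` to the number of methods gives the strict intersection (the central region of the Venn diagram); a lower threshold is more permissive.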
Affiliation(s)
- Yiting Deng
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Shuhan Ma
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jiayu Li
  - College of Life Science, Sichuan University, Chengdu 610065, China
- Bowen Zheng
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
15
Chen P, Zheng H. Drug-target interaction prediction based on spatial consistency constraint and graph convolutional autoencoder. BMC Bioinformatics 2023; 24:151. [PMID: 37069493 PMCID: PMC10109239 DOI: 10.1186/s12859-023-05275-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 04/05/2023] [Indexed: 04/19/2023] Open
Abstract
BACKGROUND Drug-target interaction (DTI) prediction plays an important role in drug discovery and repositioning. However, most of the computational methods used for identifying relevant DTIs do not consider the invariance of the nearest neighbour relationships between drugs or targets. In other words, they do not take into account the invariance of the topological relationships between nodes during representation learning. This may limit the performance of DTI prediction methods. RESULTS Here, we propose a novel graph convolutional autoencoder-based model, named SDGAE, to predict DTIs. As the graph convolutional network cannot handle isolated nodes in a network, a pre-processing step was applied to reduce the number of isolated nodes in the heterogeneous network and facilitate effective exploitation of the graph convolutional network. By maintaining the graph structure during representation learning, the nearest neighbour relationships between nodes in the embedding space remained as close as possible to those in the original space. CONCLUSIONS Overall, we demonstrated that SDGAE can automatically learn more informative and robust feature vectors of drugs and targets, thus exhibiting significantly improved predictive accuracy for DTIs.
Affiliation(s)
- Peng Chen
  - School of Computer Science and Technology, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
  - Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
- Haoran Zheng
  - School of Computer Science and Technology, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China
  - Anhui Key Laboratory of Software Engineering in Computing and Communication, University of Science and Technology of China, Jinzhai Road 96, Hefei, 230027, People's Republic of China