Reference Citation Analysis

For: Asgari E, Mofrad MRK. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS One 2015;10:e0141287. [PMID: 26555596 PMCID: PMC4640716 DOI: 10.1371/journal.pone.0141287] [Citation(s) in RCA: 349] [Impact Index Per Article: 38.8] [Received: 07/09/2015] [Accepted: 10/05/2015] [Indexed: 12/22/2022]

Cited by Other Article(s)

Ghazikhani H, Butler G. Exploiting protein language models for the precise classification of ion channels and ion transporters. Proteins 2024;92:998-1055. [PMID: 38656743 DOI: 10.1002/prot.26694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 03/26/2024] [Accepted: 04/08/2024] [Indexed: 04/26/2024]

Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]

Ranjan A, Bess A, Alvin C, Mukhopadhyay S. MDF-DTA: A Multi-Dimensional Fusion Approach for Drug-Target Binding Affinity Prediction. J Chem Inf Model 2024;64:4980-4990. [PMID: 38888163 PMCID: PMC11234358 DOI: 10.1021/acs.jcim.4c00310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 05/15/2024] [Accepted: 05/29/2024] [Indexed: 06/20/2024]

Banerjee P, Eulenstein O, Friedberg I. Discovering genomic islands in unannotated bacterial genomes using sequence embedding. Bioinform Adv 2024;4:vbae089. [PMID: 38911822 PMCID: PMC11193100 DOI: 10.1093/bioadv/vbae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 05/26/2024] [Accepted: 06/11/2024] [Indexed: 06/25/2024]

Zhang B, Hou Z, Yang Y, Wong KC, Zhu H, Li X. SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues. Commun Biol 2024;7:679. [PMID: 38830995 PMCID: PMC11148103 DOI: 10.1038/s42003-024-06332-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 05/15/2024] [Indexed: 06/05/2024]

Abstract

Proteins and nucleic acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic-acid-binding residues in proteins can contribute to a better understanding of protein function. However, the gap between the abundance of protein sequence information and the scarcity of structural and functional data renders most current computational models ineffective. It is therefore vital to design computational models that use protein sequence information to identify nucleic-acid-binding sites in proteins. Here, we implement SOFB, an ensemble deep-learning method for identifying nucleic-acid-binding residues in proteins. SOFB characterizes protein sequences by learning the semantics of biological dynamic contexts and then applies an ensemble deep-learning sequence network that learns feature representations and performs classification by explicitly modeling dynamic semantic information. In this framework, the language model, carried over from natural language to biological language, captures the underlying relationships in protein sequences, while the ensemble sequence network, which combines different convolutional layers with a Bi-LSTM, refines various features for optimal performance. To address class imbalance, we adopt ensemble learning, training multiple models and then combining them. Our experimental results on several DNA- and RNA-binding residue datasets demonstrate that the proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic-acid-binding residue sequences based on the attention weights of the language model, revealing novel insights into the dynamic semantic information that supports the identified binding residues.
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452 .
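The abstract does not specify how the multiple models are trained or combined. One widely used recipe for class imbalance, undersampling-based bagging, trains each ensemble member on all minority-class (binding) examples plus a fresh random draw of equal size from the majority class, then averages the members' probabilities. The Python sketch below illustrates that general recipe only and is not SOFB's code: the nearest-centroid base learner, the function names and the toy 2-D features are hypothetical stand-ins for SOFB's language-model embeddings and CNN/Bi-LSTM networks.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]  # maps a feature vector to a score in (0, 1)


def balanced_subsets(pos: List[Vector], neg: List[Vector], n_models: int,
                     rng: random.Random) -> List[Tuple[List[Vector], List[Vector]]]:
    """Give every ensemble member all minority (binding) examples plus an
    equally sized random draw, without replacement, from the majority class."""
    k = min(len(pos), len(neg))
    return [(list(pos), rng.sample(neg, k)) for _ in range(n_models)]


def fit_centroid_model(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Hypothetical toy base learner standing in for a deep network:
    a sample scores high when it lies closer to the positive centroid."""
    dim = len(pos[0])
    c_pos = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    c_neg = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]

    def score(x: Vector) -> float:
        margin = math.dist(x, c_neg) - math.dist(x, c_pos)
        margin = max(-50.0, min(50.0, margin))  # keep math.exp in range
        return 1.0 / (1.0 + math.exp(-margin))  # logistic squash to (0, 1)

    return score


def fit_ensemble(pos: List[Vector], neg: List[Vector],
                 n_models: int = 5, seed: int = 0) -> Scorer:
    """Train one base model per balanced subset and soft-vote their outputs."""
    rng = random.Random(seed)
    members = [fit_centroid_model(p, n)
               for p, n in balanced_subsets(pos, neg, n_models, rng)]

    def predict_proba(x: Vector) -> float:
        return sum(m(x) for m in members) / len(members)  # average of members

    return predict_proba


# Toy usage: 2 "binding" residues vs. 5 "non-binding" ones in a 2-D feature space.
pos = [[1.0, 1.0], [0.9, 1.1]]
neg = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.0, 0.2], [0.2, 0.0]]
model = fit_ensemble(pos, neg, n_models=3)
print(round(model([1.0, 0.9]), 3), round(model([0.0, 0.1]), 3))
```

Because each member sees a balanced sample, no single model is dominated by the far more numerous non-binding residues, while averaging over several independent draws uses more of the majority class than one undersampled model would.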

Sztuka M, Kotlarz K, Mielczarek M, Hajduk P, Liu J, Szyda J. Nextflow vs. plain bash: different approaches to the parallelization of SNP calling from the whole genome sequence data. NAR Genom Bioinform 2024;6:lqae040. [PMID: 38686136 PMCID: PMC11057021 DOI: 10.1093/nargab/lqae040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 03/28/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]

Tran HN, Nguyen PXQ, Guo F, Wang J. Prediction of Protein-Protein Interactions Based on Integrating Deep Learning and Feature Fusion. Int J Mol Sci 2024;25:5820. [PMID: 38892007 PMCID: PMC11172432 DOI: 10.3390/ijms25115820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 04/27/2024] [Accepted: 04/29/2024] [Indexed: 06/21/2024]

Joubbi S, Micheli A, Milazzo P, Maccari G, Ciano G, Cardamone D, Medini D. Antibody design using deep learning: from sequence and structure design to affinity maturation. Brief Bioinform 2024;25:bbae307. [PMID: 38960409 PMCID: PMC11221890 DOI: 10.1093/bib/bbae307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 05/20/2024] [Accepted: 06/12/2024] [Indexed: 07/05/2024]

Yang Q, Xu L, Dong W, Li X, Wang K, Dong S, Zhang X, Yang T, Jiang F, Zhang B, Luo G, Gao X, Wang G. HLAIImaster: a deep learning method with adaptive domain knowledge predicts HLA II neoepitope immunogenic responses. Brief Bioinform 2024;25:bbae302. [PMID: 38920343 PMCID: PMC11200192 DOI: 10.1093/bib/bbae302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/20/2024] [Accepted: 06/11/2024] [Indexed: 06/27/2024]

Affiliation(s)

Qiang Yang School of Medicine and Health, Harbin Institute of Technology, Yikuang Street, Harbin 150000, China
Long Xu School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi Street, Harbin 150001, China
Weihe Dong College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, Harbin 150004, China
Xiaokun Li School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi Street, Harbin 150001, China; School of Computer Science and Technology, Heilongjiang University, Xuefu Road, Harbin 150080, China; Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Xuefu Road, Harbin 150090, China; Shandong Hengxun Technology Co., Ltd., Miaoling Road, Qingdao 266100, China
Kuanquan Wang School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi Street, Harbin 150001, China
Suyu Dong College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, Harbin 150004, China
Xianyu Zhang Department of Breast Surgery, Harbin Medical University Cancer Hospital, Haping Road, Harbin 150081, China
Tiansong Yang Department of Rehabilitation, The First Affiliated Hospital of Heilongjiang University of Traditional Chinese Medicine, and Traditional Chinese Medicine Informatics Key Laboratory of Heilongjiang Province, Heping Road, Harbin 150040, China
Feng Jiang School of Medicine and Health, Harbin Institute of Technology, Yikuang Street, Harbin 150000, China
Bin Zhang Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal 23955, Saudi Arabia
Gongning Luo Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal 23955, Saudi Arabia
Xin Gao Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal 23955, Saudi Arabia
Guohua Wang College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, Harbin 150004, China

García Sánchez N, Ugarte Carro E, Prieto-Santamaría L, Rodríguez-González A. Protein sequence analysis in the context of drug repurposing. BMC Med Inform Decis Mak 2024;24:122. [PMID: 38741115 DOI: 10.1186/s12911-024-02531-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/08/2024] [Indexed: 05/16/2024]

Wagner A. Genotype sampling for deep-learning assisted experimental mapping of a combinatorially complete fitness landscape. Bioinformatics 2024;40:btae317. [PMID: 38745436 PMCID: PMC11132821 DOI: 10.1093/bioinformatics/btae317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/21/2024] [Accepted: 05/14/2024] [Indexed: 05/16/2024]

Hafezqorani S, Nip KM, Birol I. ntEmbd: Deep learning embedding for nucleotide sequences. bioRxiv 2024:2024.04.30.591806. [PMID: 38746190 PMCID: PMC11092672 DOI: 10.1101/2024.04.30.591806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]

Susanty M, Naim Mursalim MK, Hertadi R, Purwarianti A, Rajab TLE. Classifying alkaliphilic proteins using embeddings from protein language model. Comput Biol Med 2024;173:108385. [PMID: 38547659 DOI: 10.1016/j.compbiomed.2024.108385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]

Lobanov MY, Slizen MV, Dovidchenko NV, Panfilov AV, Surin AA, Likhachev IV, Galzitskaya OV. Comparison of deep learning models with simple method to assess the problem of antimicrobial peptides prediction. Mol Inform 2024;43:e202200181. [PMID: 36961202 DOI: 10.1002/minf.202200181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 03/20/2023] [Accepted: 03/23/2023] [Indexed: 03/25/2023]

Nambiar A, Forsyth JM, Liu S, Maslov S. DR-BERT: A protein language model to annotate disordered regions. Structure 2024:S0969-2126(24)00136-9. [PMID: 38701796 DOI: 10.1016/j.str.2024.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 06/16/2023] [Accepted: 04/08/2024] [Indexed: 05/05/2024]

Kotlarz K, Mielczarek M, Biecek P, Wojdak-Maksymiec K, Suchocki T, Topolski P, Jagusiak W, Szyda J. An Explainable Deep Learning Classifier of Bovine Mastitis Based on Whole-Genome Sequence Data-Circumventing the p >> n Problem. Int J Mol Sci 2024;25:4715. [PMID: 38731932 PMCID: PMC11083318 DOI: 10.3390/ijms25094715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 04/19/2024] [Accepted: 04/23/2024] [Indexed: 05/13/2024]

Affiliation(s)

Krzysztof Kotlarz Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland; University Cancer Diagnostic Center, Poznan University of Medical Science, 61-701 Poznan, Poland
Magda Mielczarek Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland; University Cancer Diagnostic Center, Poznan University of Medical Science, 61-701 Poznan, Poland
Przemysław Biecek Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland
Katarzyna Wojdak-Maksymiec Department of Genetics and Animal Breeding, West Pomeranian University of Technology, Aleja Piastow 45, 70-311 Szczecin, Poland
Tomasz Suchocki Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland; University Cancer Diagnostic Center, Poznan University of Medical Science, 61-701 Poznan, Poland
Piotr Topolski National Research Institute of Animal Production, Krakowska 1, 32-083 Balice, Poland
Wojciech Jagusiak National Research Institute of Animal Production, Krakowska 1, 32-083 Balice, Poland; Faculty of Animal Science, University of Agriculture in Krakow, al. Mickiewicza 24/28, 30-059 Kraków, Poland
Joanna Szyda Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland; University Cancer Diagnostic Center, Poznan University of Medical Science, 61-701 Poznan, Poland

Choi Y, Lee J, Shin K, Lee JW, Kim JW, Lee S, Choi YJ, Park KH, Kim JH. Integrated clinical and genomic models using machine-learning methods to predict the efficacy of paclitaxel-based chemotherapy in patients with advanced gastric cancer. BMC Cancer 2024;24:502. [PMID: 38643078 PMCID: PMC11031899 DOI: 10.1186/s12885-024-12268-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 04/16/2024] [Indexed: 04/22/2024]

Abstract

BACKGROUND

Paclitaxel is commonly used as a second-line therapy for advanced gastric cancer (AGC). The decision to proceed with second-line chemotherapy and select an appropriate regimen is critical for vulnerable patients with AGC progressing after first-line chemotherapy. However, no predictive biomarkers exist to identify patients with AGC who would benefit from paclitaxel-based chemotherapy.

METHODS

This study included 288 patients with AGC who received second-line paclitaxel-based chemotherapy between 2017 and 2022 as part of the K-MASTER project, a nationwide government-funded precision medicine initiative. The data included clinical factors (age [young-onset vs. others], sex, histology [intestinal vs. diffuse type], prior trastuzumab use, and duration of first-line chemotherapy) and genomic factors (pathogenic or likely pathogenic variants). Data were randomly divided into training and validation sets (0.8:0.2). Four machine learning (ML) methods, namely random forest (RF), logistic regression (LR), artificial neural network (ANN), and ANN with genetic embedding (ANN with GE), were used to develop prediction models, which were then validated on the validation set.

RESULTS

The median patient age was 64 years (range 25-91), and 65.6% of patients were male. The 288 patients were divided into training (n = 230) and validation (n = 58) sets, with no significant differences in baseline characteristics between the two sets. In the training set, the areas under the receiver operating characteristic curve (AUROC) for predicting better progression-free survival (PFS) with paclitaxel-based chemotherapy were 0.499, 0.679, 0.618, and 0.732 for the RF, LR, ANN, and ANN with GE models, respectively. The ANN with GE model, which achieved the highest AUROC, recorded an accuracy, sensitivity, specificity, and F1-score of 0.458, 0.912, 0.724, and 0.579, respectively. In the validation set, patients predicted by the ANN with GE model to be paclitaxel-sensitive had significantly longer PFS (median 7.59 vs. 2.07 months, P = 0.020) and overall survival (OS) (median 14.70 vs. 7.50 months, P = 0.008). Patients predicted by the LR model to be paclitaxel-sensitive showed a trend toward longer PFS (median 6.48 vs. 2.33 months, P = 0.078) and OS (median 12.20 vs. 8.61 months, P = 0.099).

CONCLUSIONS

These ML models, which integrate clinical and genomic factors, could help identify patients with AGC who may benefit from paclitaxel chemotherapy.
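The abstract reports a random 0.8:0.2 split and several evaluation quantities (AUROC, accuracy, sensitivity, specificity, F1-score) without defining them. As a minimal sketch in plain Python, the split and metrics could be computed as follows; the function names, the 0.5 decision threshold and the toy data are hypothetical and are not taken from the K-MASTER study. The split is assumed to be a simple unstratified shuffle with floor rounding, which happens to reproduce the reported 230/58 partition of 288 patients; the study itself may have split differently.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_split(n: int, train_frac: float = 0.8,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and cut them into training/validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_frac)  # floor of 0.8 * 288 = 230.4 gives 230 / 58
    return idx[:cut], idx[cut:]


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0   # recall on the positive class
    spec = tn / (tn + fp) if tn + fp else 0.0   # recall on the negative class
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sens,
            "specificity": spec, "f1": f1}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (ties count 1/2); O(P*N), adequate for a few hundred patients."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train_idx, val_idx = random_split(288)
print(len(train_idx), len(val_idx))  # 230 58

# Toy labels (1 = "better PFS") and model scores, thresholded at 0.5.
y = [1, 1, 0, 0, 1, 0]
s = [0.9, 0.4, 0.35, 0.1, 0.8, 0.7]
pred = [1 if v >= 0.5 else 0 for v in s]
print(threshold_metrics(y, pred), round(auroc(y, s), 3))
```

Computing AUROC from pairwise comparisons makes its meaning explicit: it is the chance that a patient with better PFS receives a higher score than one without, so 0.5 (close to the RF model's 0.499) corresponds to random ranking. For larger cohorts, a rank-based formula gives the same value in O(n log n).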

Prabhu H, Bhosale H, Sane A, Dhadwal R, Ramakrishnan V, Valadi J. Protein feature engineering framework for AMPylation site prediction. Sci Rep 2024;14:8695. [PMID: 38622194 DOI: 10.1038/s41598-024-58450-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 03/29/2024] [Indexed: 04/17/2024]

Chen J, Wu H, Wang N. KEGG orthology prediction of bacterial proteins using natural language processing. BMC Bioinformatics 2024;25:146. [PMID: 38600441 PMCID: PMC11007918 DOI: 10.1186/s12859-024-05766-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 04/03/2024] [Indexed: 04/12/2024]

Wu Z, Wang C, Li C, Xu N, Cao X, Chen S, Shi Y, He Y, Zhang P, Ji J. Integrated Computational Pipeline for the High-Throughput Discovery of Cell Adhesion Peptides. J Phys Chem Lett 2024;15:3748-3756. [PMID: 38551401 DOI: 10.1021/acs.jpclett.4c00393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]

Affiliation(s)

Zhiyu Wu College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China; Institute of Zhejiang University-Quzhou, Quzhou 324000, China
Cong Wang MOE Key Laboratory of Macromolecular Synthesis and Functionalization, Department of Polymer Science and Engineering, Zhejiang University, Hangzhou 310058, China
Chen Li College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China; Institute of Zhejiang University-Quzhou, Quzhou 324000, China
Nan Xu College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China; Institute of Zhejiang University-Quzhou, Quzhou 324000, China
Xiaoyong Cao College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China; Institute of Zhejiang University-Quzhou, Quzhou 324000, China
Shengfu Chen College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China
Yao Shi College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China; Key Laboratory of Biomass Chemical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310058, China
Yi He College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310058, China; Institute of Zhejiang University-Quzhou, Quzhou 324000, China; Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, United States
Peng Zhang MOE Key Laboratory of Macromolecular Synthesis and Functionalization, Department of Polymer Science and Engineering, Zhejiang University, Hangzhou 310058, China; State Key Laboratory of Transvascular Implantation Devices, Qidi Road 456, Hangzhou 310058, China
Jian Ji MOE Key Laboratory of Macromolecular Synthesis and Functionalization, Department of Polymer Science and Engineering, Zhejiang University, Hangzhou 310058, China; State Key Laboratory of Transvascular Implantation Devices, Qidi Road 456, Hangzhou 310058, China

Saha G, Sawmya S, Saha A, Akil MA, Tasnim S, Rahman MS, Rahman MS. PRIEST: predicting viral mutations with immune escape capability of SARS-CoV-2 using temporal evolutionary information. Brief Bioinform 2024;25:bbae218. [PMID: 38742520 PMCID: PMC11091746 DOI: 10.1093/bib/bbae218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 04/04/2024] [Accepted: 04/06/2024] [Indexed: 05/16/2024]

Ashrafzadeh S, Golding GB, Ilie S, Ilie L. Scoring alignments by embedding vector similarity. Brief Bioinform 2024;25:bbae178. [PMID: 38695119 PMCID: PMC11063651 DOI: 10.1093/bib/bbae178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 03/20/2024] [Accepted: 03/31/2024] [Indexed: 05/05/2024]

Sun J, Qu J, Zhao C, Zhang X, Liu X, Wang J, Wei C, Liu X, Wang M, Zeng P, Tang X, Ling X, Qing L, Jiang S, Chen J, Chen TSR, Kuang Y, Gao J, Zeng X, Huang D, Yuan Y, Fan L, Yu H, Ding J. Precise prediction of phase-separation key residues by machine learning. Nat Commun 2024;15:2662. [PMID: 38531854 DOI: 10.1038/s41467-024-46901-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 03/13/2024] [Indexed: 03/28/2024]

Affiliation(s)

Jun Sun Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China; RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Jiale Qu RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Cai Zhao RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Xinyao Zhang RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Xinyu Liu RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Jia Wang RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University, Guangzhou, 511436, China
Chao Wei RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Xinyi Liu RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Mulan Wang RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Pengguihang Zeng RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Xiuxiao Tang RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Xiaoru Ling RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Li Qing RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Shaoshuai Jiang RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Jiahao Chen RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
Tara S R Chen Department of Rehabilitation Medicine, The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, Guangdong, 518107, China
Yalan Kuang Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
Jinhang Gao Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
Xiaoxi Zeng Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
Dongfeng Huang Department of Rehabilitation Medicine, The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, Guangdong, 518107, China
Yong Yuan Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
Lili Fan Guangzhou Key Laboratory of Formula-Pattern of Traditional Chinese Medicine, School of Traditional Chinese Medicine, Jinan University, Guangzhou, Guangdong, China
Haopeng Yu Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
Junjun Ding Department of Thoracic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China; RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Department of Rehabilitation Medicine, The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, Guangdong, 518107, China

Pogány D, Antal P. Towards explainable interaction prediction: Embedding biological hierarchies into hyperbolic interaction space. PLoS One 2024;19:e0300906. [PMID: 38512848 PMCID: PMC10956837 DOI: 10.1371/journal.pone.0300906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 03/06/2024] [Indexed: 03/23/2024]

Rehman A, Mujahid M, Saba T, Jeon G. Optimised stacked machine learning algorithms for genomics and genetics disorder detection in the healthcare industry. Funct Integr Genomics 2024;24:23. [PMID: 38305949 DOI: 10.1007/s10142-024-01289-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/22/2023] [Accepted: 01/02/2024] [Indexed: 02/03/2024]

Abstract

With recent advances in precision medicine and healthcare computing, there is strong demand for machine learning algorithms in genomics that speed up the analysis of genetic disorders. Technological advances in genomics and imaging give clinicians enormous amounts of data, but prediction is still largely subjective, which can make medical treatment problematic. Machine learning is used across many areas of healthcare, including clinical research, early disease identification, and medical innovation. The main objective of this study is to identify patients who, based on several medical criteria, are more susceptible to a genetic disorder. A genetic disease prediction algorithm uses the patient's health history to estimate the probability of a genetic disorder diagnosis. We developed a computationally efficient machine learning approach to predict the overall lifespan of patients with a genomic disorder and to classify patients with a genetic disease. The proposed model stacks SVM, RF, and ETC classifiers in a two-layer meta-estimator. The first layer contains the baseline models, which make predictions from the dataset. The second layer is a meta-classifier that combines those predictions. In experiments, the model achieved an accuracy of 90.45% and a recall of 90.19%. The area under the curve (AUC) is 98.1% for mitochondrial diseases, 97.5% for multifactorial diseases, and 98.8% for single-gene inheritance disorders. The proposed approach offers a novel way to predict patient prognosis that is unbiased, accurate, and comprehensive. In identification accuracy, it outperforms human professionals who use the current clinical standard for genetic disease classification.
Implementing the stacked model will significantly improve the anticipation of genetic diseases in biomedical research.
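
The two-layer stacking scheme described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation:

- The three base learners (nearest centroid, 1-nearest neighbor, decision stump) are hypothetical stand-ins for the SVM, RF, and ETC models.
- The meta-classifier is assumed to be a logistic regression.
- Labels are assumed to be binary.

The first-layer predictions are generated out of fold. This keeps the meta-classifier from being trained on predictions that a base model made for its own training rows.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], int]


def fit_nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    # Hypothetical stand-in base learner 1: predict the class with the closest mean vector.
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def fit_one_nn(X: List[Vector], y: List[int]) -> Predictor:
    # Hypothetical stand-in base learner 2: copy the label of the nearest training example.
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda pair: math.dist(x, pair[0]))[1]


def fit_stump(X: List[Vector], y: List[int]) -> Predictor:
    # Hypothetical stand-in base learner 3: the single feature threshold with fewest errors.
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                errs = sum(int(sign * (x[j] - thr) >= 0) != t for x, t in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, j, thr, sign)
    _, j, thr, sign = best
    return lambda x: int(sign * (x[j] - thr) >= 0)


def fit_logistic(Z: List[List[float]], y: List[int], lr: float = 0.5,
                 epochs: int = 500) -> Predictor:
    # Assumed meta-classifier: logistic regression fitted by batch gradient descent.
    w = [0.0] * (len(Z[0]) + 1)  # the last weight is the bias term
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            zb = list(z) + [1.0]
            p = 1.0 / (1.0 + math.exp(-sum(wi * zi for wi, zi in zip(w, zb))))
            for i, zi in enumerate(zb):
                grad[i] += (p - t) * zi
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return lambda z: int(sum(wi * zi for wi, zi in zip(w, list(z) + [1.0])) >= 0)


BASE_LEARNERS = [fit_nearest_centroid, fit_one_nn, fit_stump]


def fit_stacked(X: List[Vector], y: List[int], k: int = 3) -> Predictor:
    n = len(X)
    # Layer 1: out-of-fold predictions, one column per base learner.
    Z = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in BASE_LEARNERS]
        for i in range(fold, n, k):  # rows held out in this fold
            for m, model in enumerate(models):
                Z[i][m] = float(model(X[i]))
    # Layer 2: the meta-classifier learns how to combine base predictions.
    meta = fit_logistic(Z, y)
    # Base learners are refit on all data for use at prediction time.
    bases = [fit(X, y) for fit in BASE_LEARNERS]
    return lambda x: meta([float(b(x)) for b in bases])


# Toy two-cluster data (made up for illustration, not from the study).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6],
     [6, 5], [6, 6], [0.5, 0.5], [5.5, 5.5], [0.2, 0.8], [5.8, 5.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]
model = fit_stacked(X, y)
print(model([0.3, 0.1]), model([5.9, 5.9]))  # 0 1
```

In a real setting, the stand-ins would be replaced by proper SVM, random forest, and extra-trees models. The out-of-fold structure of layer 1 is the part that carries over unchanged.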

Flamholz ZN, Biller SJ, Kelly L. Large language models improve annotation of prokaryotic viral proteins. Nat Microbiol 2024;9:537-549. [PMID: 38287147 DOI: 10.1038/s41564-023-01584-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Accepted: 12/08/2023] [Indexed: 01/31/2024]

Xiang X, Gao J, Ding Y. DeepPPThermo: A Deep Learning Framework for Predicting Protein Thermostability Combining Protein-Level and Amino Acid-Level Features. J Comput Biol 2024;31:147-160. [PMID: 38100126 DOI: 10.1089/cmb.2023.0097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2024]

Huang WC, Lin WT, Hung MS, Lee JC, Tung CW. Decrypting orphan GPCR drug discovery via multitask learning. J Cheminform 2024;16:10. [PMID: 38263092 PMCID: PMC10804799 DOI: 10.1186/s13321-024-00806-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]

Zhao M, Lei C, Zhou K, Huang Y, Fu C, Yang S, Zhang Z. POOE: predicting oomycete effectors based on a pre-trained large protein language model. mSystems 2024;9:e0100423. [PMID: 38078741 PMCID: PMC10804963 DOI: 10.1128/msystems.01004-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 10/23/2023] [Indexed: 01/24/2024]

Abstract

Oomycetes are fungus-like eukaryotic microorganisms that can cause catastrophic diseases in many plants. Successful oomycete infection depends heavily on effector proteins that are secreted into plant cells to subvert plant immunity. Systematic identification of effectors in oomycete proteomes is therefore an early but crucial step in understanding plant-pathogen relationships. However, the number of experimentally identified oomycete effectors is still limited. Only a few bioinformatics predictors currently exist to detect potential effectors, and their prediction performance needs to be improved. Here, we used sequence embeddings from a pre-trained large protein language model (ProtTrans) as input and developed a support vector machine-based method called POOE for predicting oomycete effectors. In fivefold cross-validation, POOE achieved highly accurate performance, with an area under the precision-recall curve of 0.804 (area under the receiver operating characteristic curve = 0.893, accuracy = 0.874, precision = 0.777, recall = 0.684, and specificity = 0.936). This considerably outperformed various combinations of popular machine learning algorithms and other commonly used sequence encoding schemes. Similar prediction performance was observed in the independent test. Compared with existing oomycete effector prediction methods, POOE performed very competitively, suggesting that ProtTrans effectively captures rich protein semantic information and dramatically improves the prediction task. We anticipate that POOE can accelerate the identification of oomycete effectors and provide new clues for systematically understanding the functional roles of effectors in plant-pathogen interactions. The POOE web server is freely accessible at http://zzdlab.com/pooe/index.php.
The corresponding source code and data sets are also available at https://github.com/zzdlabzm/POOE.

IMPORTANCE

In this work, we use the sequence representations from a pre-trained large protein language model (ProtTrans) as input and develop a support vector machine-based method called POOE for predicting oomycete effectors. POOE achieved highly accurate performance on the independent test set, considerably outperforming existing oomycete effector prediction methods. We expect that this new bioinformatics tool will accelerate the identification of oomycete effectors and further guide experimental efforts to interrogate the functional roles of effectors in plant-pathogen interactions.
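
The threshold metrics reported for POOE (accuracy, precision, recall, specificity) are standard functions of the confusion matrix, and ROC AUC has a simple rank interpretation. The sketch below computes them with a few assumptions:

- Labels are binary, with 1 standing for effector.
- A 0.5 decision threshold is chosen here for illustration.
- The labels and scores are invented toy values, not POOE results.

It does not compute the area under the precision-recall curve, which is POOE's headline metric, and it is not POOE's evaluation code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    # Count the four confusion-matrix cells (1 = positive class, e.g. effector).
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # treat 0/0 as 0 instead of raising

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),    # share of predicted positives that are real
        "recall": ratio(tp, tp + fn),       # share of real positives that are found
        "specificity": ratio(tn, tn + fp),  # share of real negatives that are rejected
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    # ROC AUC = probability that a random positive outscores a random negative,
    # counting ties as half; O(P * N), which is fine for a sketch.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores (made up, not POOE data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [int(s >= 0.5) for s in scores]
print(binary_metrics(y_true, y_pred))  # accuracy 0.75, precision 2/3, recall 2/3, specificity 0.8
print(roc_auc(y_true, scores))         # 13/15, about 0.867
```

With three positives and five negatives, a recall of 2/3 can sit next to a specificity of 0.8. This is why POOE reports several metrics rather than accuracy alone.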

Wu S, Feng T, Tang W, Qi C, Gao J, He X, Wang J, Zhou H, Fang Z. metaProbiotics: a tool for mining probiotic from metagenomic binning data based on a language model. Brief Bioinform 2024;25:bbae085. [PMID: 38487846 PMCID: PMC10940841 DOI: 10.1093/bib/bbae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/26/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024]

Abstract

Beneficial bacteria remain largely unexplored. Without systematic methods, it is difficult to characterize the traits of probiotic communities, and different publications have reached varying conclusions about their probiotic effects. We developed metaProbiotics, a language model-based tool that rapidly detects probiotic bins in metagenomes, and it showed superior performance on simulated benchmark datasets. When tested on gut metagenomes from probiotic-treated individuals, it revealed the probioticity of bins derived from the intervention strains. It also revealed other probiotic-associated bins beyond the training data, such as a plasmid-like bin. Analyses of these bins revealed various probiotic mechanisms and identified the bai operon as a potential marker of probiotic Ruminococcaceae. Across different health-disease cohorts, these bins were more common in healthy individuals, signifying their probiotic role. However, health predictions based on the abundance profiles of these bins faced challenges when applied across diseases. To better understand the heterogeneous nature of probiotics, we used metaProbiotics to construct a comprehensive probiotic genome set from global gut metagenomic data. Module analysis of this set shows that diseased individuals often lack certain probiotic gene modules, and that the missing modules vary significantly across diseases. In addition, different gene modules on the same probiotic have heterogeneous effects on various diseases. We therefore believe that the functional integrity of the probiotic community's genes is more crucial for maintaining gut homeostasis than merely increasing the abundance of specific genes, and that adding probiotics indiscriminately might not boost health. We expect that metaProbiotics will promote novel probiotic discovery from large-scale metagenomic data and facilitate systematic research on bacterial probiotic effects. The metaProbiotics program can be freely downloaded at https://github.com/zhenchengfang/metaProbiotics.

Xing H, Cai P, Liu D, Han M, Liu J, Le Y, Zhang D, Hu QN. High-throughput prediction of enzyme promiscuity based on substrate-product pairs. Brief Bioinform 2024;25:bbae089. [PMID: 38487850 PMCID: PMC10940840 DOI: 10.1093/bib/bbae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/20/2024] [Accepted: 02/03/2024] [Indexed: 03/18/2024]

Yang X, Wuchty S, Liang Z, Ji L, Wang B, Zhu J, Zhang Z, Dong Y. Multi-modal features-based human-herpesvirus protein-protein interaction prediction by using LightGBM. Brief Bioinform 2024;25:bbae005. [PMID: 38279649 PMCID: PMC10818167 DOI: 10.1093/bib/bbae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/25/2023] [Accepted: 01/01/2021] [Indexed: 01/28/2024]

Erten M. MehNet: a vigesimal-based model by amino acid melting points generates unique ID numbers for protein sequences. J Biomol Struct Dyn 2024:1-7. [PMID: 38230442 DOI: 10.1080/07391102.2024.2302937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 01/02/2024] [Indexed: 01/18/2024]

Liu J, Yang M, Yu Y, Xu H, Li K, Zhou X. Large language models in bioinformatics: applications and perspectives. ArXiv 2024:arXiv:2401.04155v1. [PMID: 38259343 PMCID: PMC10802675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]

Hosseini S, Golding GB, Ilie L. Seq-InSite: sequence supersedes structure for protein interaction site prediction. Bioinformatics 2024;40:btad738. [PMID: 38212995 PMCID: PMC10796176 DOI: 10.1093/bioinformatics/btad738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 11/17/2023] [Accepted: 01/10/2024] [Indexed: 01/13/2024]

Liu T, Song C, Wang C. NCSP-PLM: An ensemble learning framework for predicting non-classical secreted proteins based on protein language models and deep learning. Math Biosci Eng 2024;21:1472-1488. [PMID: 38303473 DOI: 10.3934/mbe.2024063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]

Michalik I, Kuder KJ. Machine Learning Methods in Protein-Protein Docking. Methods Mol Biol 2024;2780:107-126. [PMID: 38987466 DOI: 10.1007/978-1-0716-3985-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]

Chen HM, Liu JX, Liu D, Hao GF, Yang GF. Human-virus protein-protein interactions maps assist in revealing the pathogenesis of viral infection. Rev Med Virol 2024;34:e2517. [PMID: 38282401 DOI: 10.1002/rmv.2517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 09/12/2023] [Accepted: 01/16/2024] [Indexed: 01/30/2024]

Qiu W, Liang Q, Yu L, Xiao X, Qiu W, Lin W. LSTM-SAGDTA: Predicting Drug-target Binding Affinity with an Attention Graph Neural Network and LSTM Approach. Curr Pharm Des 2024;30:468-476. [PMID: 38323613 PMCID: PMC11071654 DOI: 10.2174/0113816128282837240130102817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/14/2024] [Accepted: 01/19/2024] [Indexed: 02/08/2024]

Alquran H, Al Fahoum A, Zyout A, Abu Qasmieh I. A comprehensive framework for advanced protein classification and function prediction using synergistic approaches: Integrating bispectral analysis, machine learning, and deep learning. PLoS One 2023;18:e0295805. [PMID: 38096313 PMCID: PMC10721063 DOI: 10.1371/journal.pone.0295805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 11/29/2023] [Indexed: 12/17/2023]

Abstract

Proteins are fundamental components of diverse cellular systems and play crucial roles in a variety of disease processes. It is therefore essential to understand their structure, function, and intricate interconnections. A key part of this understanding is classifying proteins into families or groups with comparable structural and functional characteristics. Such classification supports evolutionary research, protein function prediction, and the identification of potential therapeutic targets. Sequence alignment and structure-based alignment are frequently ineffective for identifying protein families. This study addresses the need for a more efficient and accurate technique for feature extraction and protein classification. To overcome the limitations of conventional methods, it proposes a novel method that integrates bispectrum features, deep learning techniques, and machine learning algorithms. The method encodes protein sequences numerically, applies bispectrum analysis, extracts features with convolutional neural networks of different topologies, and selects robust features to classify protein families. The goal is to outperform existing methods for identifying protein families and thereby improve classification metrics. The materials consist of numerous protein datasets, and the methods combine bispectrum features with deep learning strategies. The results demonstrate that the proposed method identifies protein families better than conventional approaches. Significantly improved quality metrics confirm the efficacy of combining bispectrum and deep learning approaches. These findings have the potential to advance protein biology and facilitate pharmaceutical innovation.
In conclusion, this study presents a novel method that uses bispectrum features and deep learning techniques to improve the precision and efficiency of protein family identification. The improvements in classification metrics show that the method is applicable to numerous scientific disciplines. This furthers our understanding of protein function and its implications for disease and treatment.
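
Bispectrum analysis of a numerically encoded protein sequence can be sketched as below. The abstract does not specify the numeric encoding, so the residue-to-index mapping used here is a hypothetical placeholder. The estimator is the textbook direct form B(k1, k2) = X(k1) · X(k2) · conj(X(k1 + k2)), computed from a naive DFT of the mean-removed signal. The magnitude matrix it produces is the kind of 2-D input a CNN could take. This is an illustration only, not the authors' pipeline.

```python
import cmath
from typing import List

# Hypothetical numeric encoding: each residue maps to its index in this alphabet.
# The study's actual encoding is not specified in the abstract.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def encode(seq: str) -> List[float]:
    return [float(ALPHABET.index(aa)) for aa in seq.upper()]


def dft(x: List[float]) -> List[complex]:
    # Naive O(n^2) discrete Fourier transform, to stay dependency-free.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bispectrum(x: List[float]) -> List[List[complex]]:
    # Direct estimate B(k1, k2) = X(k1) * X(k2) * conj(X(k1 + k2)), indices mod n,
    # from the DFT of the mean-removed signal (single segment, no averaging).
    mean = sum(x) / len(x)
    X = dft([v - mean for v in x])
    n = len(X)
    return [[X[k1] * X[k2] * X[(k1 + k2) % n].conjugate() for k2 in range(n)]
            for k1 in range(n)]


B = bispectrum(encode("MKTAYIAKQR"))
# |B| forms a 2-D "image"; a CNN could consume matrices like this one.
magnitude = [[abs(b) for b in row] for row in B]
print(len(magnitude), len(magnitude[0]))  # 10 10
```

Because the mean is removed first, X(0) is zero, so the first row and column of B vanish. B is also symmetric in (k1, k2), so practical pipelines often keep only one triangle.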

Aslam I, Shah S, Jabeen S, ELAffendi M, A Abdel Latif A, Ul Haq N, Ali G. A CNN based m5c RNA methylation predictor. Sci Rep 2023;13:21885. [PMID: 38081880 PMCID: PMC10713599 DOI: 10.1038/s41598-023-48751-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023]

Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. Med Rev (2021) 2023;3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]

Ming Z, Chen X, Wang S, Liu H, Yuan Z, Wu M, Xia H. HostNet: improved sequence representation in deep neural networks for virus-host prediction. BMC Bioinformatics 2023;24:455. [PMID: 38041071 PMCID: PMC10691023 DOI: 10.1186/s12859-023-05582-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 11/24/2023] [Indexed: 12/03/2023]

Abstract

BACKGROUND

The escalation of viral emergence over the past decade has highlighted the need to determine the hosts of viruses, particularly emerging ones that pose a potential threat to human and animal health. Yet the traditional means of determining the host range of viruses, field surveillance and laboratory experiments, are laborious and demanding. A computational tool that can reliably predict host ranges for novel viruses could enable timely responses in the prevention and control of emerging infectious diseases. Virus-host prediction is complicated by issues such as data imbalance and data deficiency. Developing highly accurate computational tools to predict virus-host associations is therefore a challenging and pressing need.

RESULTS

To overcome the challenges of virus-host prediction, we present HostNet, a deep learning framework that uses a Transformer-CNN-BiGRU architecture and two enhanced sequence representation modules. The first module, k-mer to vector, pre-trains a background vector representation of k-mers from a broad range of virus sequences to address data deficiency. The second module, an adaptive sliding window, truncates virus sequences of various lengths into a uniform number of informative and distinct samples per sequence to address data imbalance. We assess HostNet's performance on a benchmark dataset of "Rabies lyssavirus" and an in-house dataset of "Flavivirus". Our results show that HostNet surpasses the state-of-the-art deep learning-based method in host-prediction accuracy and F1 score. The enhanced sequence representation modules significantly improve HostNet's training generalization, its performance on challenging classes, and its stability.
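
HostNet's two preprocessing ideas can be illustrated with the sketch below. The first is turning sequences into k-mer tokens. The second is an adaptive sliding window that yields the same number of samples per sequence. The abstract does not give HostNet's exact window rule, so this sketch makes three assumptions:

- Window starts are evenly spaced from the first to the last possible position.
- A sequence no longer than the window is padded with "N" and repeated.
- The k-mer embedding step (pre-training a vector for each k-mer) is omitted.

`adaptive_windows` is a hypothetical helper name, not HostNet's API.

```python
from typing import List


def kmers(seq: str, k: int = 3) -> List[str]:
    # Overlapping k-mers (stride 1): the tokens a k-mer embedding would map to vectors.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def adaptive_windows(seq: str, window: int, n_samples: int, pad: str = "N") -> List[str]:
    # Assumed rule: n_samples windows with evenly spaced starts, so every
    # sequence, long or short, yields the same number of fixed-length samples.
    if len(seq) <= window:
        # Too short to slide: pad to full length and repeat (an assumption).
        return [seq.ljust(window, pad)] * n_samples
    if n_samples == 1:
        return [seq[:window]]
    span = len(seq) - window  # last valid start position
    starts = [round(i * span / (n_samples - 1)) for i in range(n_samples)]
    return [seq[s:s + window] for s in starts]


genome = "ACGTTGCAAGGCTTACGATCGGATC"  # toy 25-nt sequence, not real data
for w in adaptive_windows(genome, window=10, n_samples=4):
    print(w, kmers(w)[:3])
```

Spreading a fixed number of windows over each sequence means long genomes contribute overlapping but distinct samples, and short genomes still contribute the same count. This equal count per sequence is what counters the length-driven imbalance the abstract describes.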

CONCLUSION

HostNet is a promising framework for predicting virus hosts from genomic sequences, addressing the challenges posed by sparse virus sequence data of varying length. Our results demonstrate its potential as a valuable tool for virus-host prediction in various biological contexts. Virus-host prediction from genomic sequences using deep neural networks is a promising approach for identifying the potential hosts of viruses accurately and efficiently, with significant impacts on public health, disease prevention, and vaccine development.

Przybyszewski J, Malawski M, Lichołai S. GraphTar: applying word2vec and graph neural networks to miRNA target prediction. BMC Bioinformatics 2023;24:436. [PMID: 37978418 PMCID: PMC10657114 DOI: 10.1186/s12859-023-05564-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 11/09/2023] [Indexed: 11/19/2023]

Chebanov DK, Misyurin VA, Shubina IZ. An algorithm for drug discovery based on deep learning with an example of developing a drug for the treatment of lung cancer. Front Bioinform 2023;3:1225149. [PMID: 38025397 PMCID: PMC10666046 DOI: 10.3389/fbinf.2023.1225149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 10/02/2023] [Indexed: 12/01/2023]

Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023;24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023]

Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP allows protein function prediction using function-aware domain embedding representations. Commun Biol 2023;6:1103. [PMID: 37907681 PMCID: PMC10618451 DOI: 10.1038/s42003-023-05476-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023]

Fu L, Li M, Lv J, Yang C, Zhang Z, Qin S, Li W, Wang X, Chen L. Deep neural network for discovering metabolism-related biomarkers for lung adenocarcinoma. Front Endocrinol (Lausanne) 2023;14:1270772. [PMID: 37955007 PMCID: PMC10634586 DOI: 10.3389/fendo.2023.1270772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 10/03/2023] [Indexed: 11/14/2023]

Shan W, Chen L, Xu H, Zhong Q, Xu Y, Yao H, Lin K, Li X. GcForest-based compound-protein interaction prediction model and its application in discovering small-molecule drugs targeting CD47. Front Chem 2023;11:1292869. [PMID: 37927570 PMCID: PMC10623438 DOI: 10.3389/fchem.2023.1292869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023]

Song N, Dong R, Pu Y, Wang E, Xu J, Guo F. Pmf-cpi: assessing drug selectivity with a pretrained multi-functional model for compound-protein interactions. J Cheminform 2023;15:97. [PMID: 37838703 PMCID: PMC10576287 DOI: 10.1186/s13321-023-00767-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/31/2023] [Accepted: 09/28/2023] [Indexed: 10/16/2023]