Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 2019;20:723. [PMID: 31847804 PMCID: PMC6918593 DOI: 10.1186/s12859-019-3220-8] [Citation(s) in RCA: 241] [Impact Index Per Article: 48.2] [Received: 05/03/2019] [Accepted: 11/13/2019] [Indexed: 12/15/2022] Open

Cited by Other Article(s)

Hu Y, Wang Y, Hu X, Chao H, Li S, Ni Q, Zhu Y, Hu Y, Zhao Z, Chen M. T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors. Comput Struct Biotechnol J 2024;23:801-812. [PMID: 38328004 PMCID: PMC10847861 DOI: 10.1016/j.csbj.2024.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 01/20/2024] [Accepted: 01/20/2024] [Indexed: 02/09/2024] Open

Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024;23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open

Barrios-Núñez I, Martínez-Redondo G, Medina-Burgos P, Cases I, Fernández R, Rojas A. Decoding functional proteome information in model organisms using protein language models. NAR Genom Bioinform 2024;6:lqae078. [PMID: 38962255 PMCID: PMC11217674 DOI: 10.1093/nargab/lqae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Revised: 05/31/2024] [Accepted: 06/26/2024] [Indexed: 07/05/2024] Open

Paul D, Saha S, Basu S, Chakraborti T. Computational analysis of pathogen-host interactome for fast and low-risk in-silico drug repurposing in emerging viral threats like Mpox. Sci Rep 2024;14:18736. [PMID: 39134619 PMCID: PMC11319331 DOI: 10.1038/s41598-024-69617-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 08/07/2024] [Indexed: 08/15/2024] Open

Abstract

Monkeypox (Mpox), a zoonotic illness triggered by the monkeypox virus (MPXV), poses a significant threat since it may be transmitted and has no cure. This work introduces a computational method to predict Protein-Protein Interactions (PPIs) during MPXV infection. The objective is to discover prospective drug targets and repurpose current potential Food and Drug Administration (FDA) drugs for therapeutic purposes. In this work, ensemble features, comprising 2-5 node graphlet attributes and protein composition-based features, are utilized for Deep Learning (DL) models to predict PPIs. The technique used here demonstrated excellent prediction performance for PPI on both the Human Integrated Protein-Protein Interaction Reference (HIPPIE) and MPXV-Human PPI datasets. In addition, the human protein targets for MPXV have been identified accurately along with the detection of possible therapeutic targets. Furthermore, the validation process included conducting docking studies on potential FDA drugs like Nicotinamide Adenine Dinucleotide and Hydrogen (NADH), Fostamatinib, Glutamic acid, Cannabidiol, Copper, and Zinc in DrugBank identified via research on drug repurposing and the Drug Consensus Score (DCS) for MPXV. This has been achieved by employing the primary crystal structures of MPXV, which are now accessible. The docking study is also supported by Molecular Dynamics (MD) simulation. The results of our study emphasize the effectiveness of using ensemble feature-based PPI prediction to understand the molecular processes involved in viral infection and to aid in the development of repurposed drugs for emerging infectious diseases such as, but not limited to, Mpox. The source code and link to data used in this work are available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox.

Hong L, Hu Z, Sun S, Tang X, Wang J, Tan Q, Zheng L, Wang S, Xu S, King I, Gerstein M, Li Y. Fast, sensitive detection of protein homologs using deep dense retrieval. Nat Biotechnol 2024:10.1038/s41587-024-02353-6. [PMID: 39123049 DOI: 10.1038/s41587-024-02353-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 07/12/2024] [Indexed: 08/12/2024]

Affiliation(s)

Liang Hong: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Zhihang Hu: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Siqi Sun: Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China
Xiangru Tang: Department of Computer Science, Yale University, New Haven, CT, USA
Jiuming Wang: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; OneAIM Ltd., Hong Kong SAR, China
Qingxiong Tan: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Liangzhen Zheng: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shanghai Zelixir Biotech Company Ltd., Shanghai, China
Sheng Wang: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shanghai Zelixir Biotech Company Ltd., Shanghai, China
Sheng Xu: Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China
Irwin King: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
Mark Gerstein: Department of Computer Science, Yale University, New Haven, CT, USA; Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Statistics and Data Science, Yale University, New Haven, CT, USA
Yu Li: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; Shanghai AI Laboratory, Shanghai, China; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; The Chinese University of Hong Kong Shenzhen Research Institute, Shenzhen, China

Sun Y, Shen Y. Structure-informed protein language models are robust predictors for variant effects. Hum Genet 2024:10.1007/s00439-024-02695-w. [PMID: 39117802 DOI: 10.1007/s00439-024-02695-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 07/20/2024] [Indexed: 08/10/2024]

Bhushan V, Nita-Lazar A. Recent Advancements in Subcellular Proteomics: Growing Impact of Organellar Protein Niches on the Understanding of Cell Biology. J Proteome Res 2024;23:2700-2722. [PMID: 38451675 PMCID: PMC11296931 DOI: 10.1021/acs.jproteome.3c00839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]

Ghazikhani H, Butler G. Exploiting protein language models for the precise classification of ion channels and ion transporters. Proteins 2024;92:998-1055. [PMID: 38656743 DOI: 10.1002/prot.26694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 03/26/2024] [Accepted: 04/08/2024] [Indexed: 04/26/2024]

Wang J, Quan L, Jin Z, Wu H, Ma X, Wang X, Xie J, Pan D, Chen T, Wu T, Lyu Q. MultiModRLBP: A Deep Learning Approach for Multi-Modal RNA-Small Molecule Ligand Binding Sites Prediction. IEEE J Biomed Health Inform 2024;28:4995-5006. [PMID: 38739505 DOI: 10.1109/jbhi.2024.3400521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]

Abstract

This study aims to tackle the intricate challenge of predicting RNA-small molecule binding sites to explore the potential value in the field of RNA drug targets. To address this challenge, we propose the MultiModRLBP method, which integrates multi-modal features using deep learning algorithms. These features include 3D structural properties at the nucleotide base level of the RNA molecule, relational graphs based on overall RNA structure, and rich RNA semantic information. In our investigation, we gathered 851 interactions between RNA and small-molecule ligands from the RNAglib dataset and RLBind training set. Unlike conventional training sets, this collection broadened its scope by including RNA complexes that have the same RNA sequence but change their respective binding sites due to structural differences or the presence of different ligands. This enhancement enables the MultiModRLBP model to more accurately capture subtle changes at the structural level, ultimately improving its ability to discern nuances among similar RNA conformations. Furthermore, we evaluated MultiModRLBP on two classic test sets, Test18 and Test3, highlighting its performance disparities on small molecules based on metal and non-metal ions. Additionally, we conducted a structural sensitivity analysis on specific complex categories, considering RNA instances with varying degrees of structural changes and whether they share the same ligands. The research results indicate that MultiModRLBP outperforms the current state-of-the-art methods on multiple classic test sets, particularly excelling in predicting binding sites for non-metal ions and instances where the binding sites are widely distributed along the sequence. MultiModRLBP can also be used as a potential tool when the RNA structure is perturbed or the RNA experimental tertiary structure is not available. Most importantly, MultiModRLBP exhibits the capability to distinguish binding characteristics of RNA that are structurally diverse yet exhibit sequence similarity. These advancements hold promise in reducing the costs associated with the development of RNA-targeted drugs.

Susanty M, Mursalim MKN, Hertadi R, Purwarianti A, LE Rajab T. Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification. Comput Biol Chem 2024;112:108163. [PMID: 39098138 DOI: 10.1016/j.compbiolchem.2024.108163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 07/02/2024] [Accepted: 07/24/2024] [Indexed: 08/06/2024]

Cosentino S, Sriswasdi S, Iwasaki W. SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models. Genome Biol 2024;25:195. [PMID: 39054525 PMCID: PMC11270883 DOI: 10.1186/s13059-024-03298-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 06/04/2024] [Indexed: 07/27/2024] Open

Breimann S, Kamp F, Steiner H, Frishman D. AAontology: An ontology of amino acid scales for interpretable machine learning. J Mol Biol 2024:168717. [PMID: 39053689 DOI: 10.1016/j.jmb.2024.168717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/15/2024] [Accepted: 07/19/2024] [Indexed: 07/27/2024]

Volzhenin K, Bittner L, Carbone A. SENSE-PPI reconstructs interactomes within, across, and between species at the genome scale. iScience 2024;27:110371. [PMID: 39055916 PMCID: PMC11269938 DOI: 10.1016/j.isci.2024.110371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 05/04/2024] [Accepted: 06/21/2024] [Indexed: 07/28/2024] Open

Zhao L, Li J, Zhan W, Jiang X, Zhang B. Prediction of protein secondary structure by the improved TCN-BiLSTM-MHA model with knowledge distillation. Sci Rep 2024;14:16488. [PMID: 39020005 PMCID: PMC11255250 DOI: 10.1038/s41598-024-67403-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 07/10/2024] [Indexed: 07/19/2024] Open

Abstract

Secondary structure prediction is a key step in understanding protein function and biological properties and is highly important in the fields of new drug development, disease treatment, bioengineering, etc. Accurately predicting the secondary structure of proteins helps to reveal how proteins are folded and how they function in cells. The application of deep learning models in protein structure prediction is particularly important because of their ability to process complex sequence information and extract meaningful patterns and features, thus significantly improving the accuracy and efficiency of prediction. In this study, a combined model integrating an improved temporal convolutional network (TCN), bidirectional long short-term memory (BiLSTM), and a multi-head attention (MHA) mechanism is proposed to enhance the accuracy of protein prediction in both eight-state and three-state structures. One-hot encoding features and word vector representations of physicochemical properties are incorporated. A significant emphasis is placed on knowledge distillation techniques utilizing the ProtT5 pretrained model, leading to performance improvements. The improved TCN, achieved through multiscale fusion and bidirectional operations, allows for better extraction of amino acid sequence features than traditional TCN models. The model demonstrated excellent prediction performance on multiple datasets. For the TS115, CB513 and PDB (2018-2020) datasets, the prediction accuracy of the eight-state structure of the six datasets in this paper reached 88.2%, 84.9%, and 95.3%, respectively, and the prediction accuracy of the three-state structure reached 91.3%, 90.3%, and 96.8%, respectively. This study not only improves the accuracy of protein secondary structure prediction but also provides an important tool for understanding protein structure and function, which is particularly applicable to resource-constrained contexts.

Cuturello F, Celoria M, Ansuini A, Cazzaniga A. Enhancing predictions of protein stability changes induced by single mutations using MSA-based Language Models. Bioinformatics 2024;40:btae447. [PMID: 39012369 PMCID: PMC11269464 DOI: 10.1093/bioinformatics/btae447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/19/2024] [Accepted: 07/10/2024] [Indexed: 07/17/2024] Open

Boadu F, Lee A, Cheng J. Deep learning methods for protein function prediction. Proteomics 2024:e2300471. [PMID: 38996351 DOI: 10.1002/pmic.202300471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 06/15/2024] [Accepted: 06/18/2024] [Indexed: 07/14/2024]

Saar KL, Scrutton RM, Bloznelyte K, Morgunov AS, Good LL, Lee AA, Teichmann SA, Knowles TPJ. Protein Condensate Atlas from predictive models of heteromolecular condensate composition. Nat Commun 2024;15:5418. [PMID: 38987300 PMCID: PMC11237133 DOI: 10.1038/s41467-024-48496-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 05/02/2024] [Indexed: 07/12/2024] Open

Yang S, Xu P. HemoDL: Hemolytic peptides prediction by double ensemble engines from Rich sequence-derived and transformer-enhanced information. Anal Biochem 2024;690:115523. [PMID: 38552762 DOI: 10.1016/j.ab.2024.115523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/02/2024]

Wossnig L, Furtmann N, Buchanan A, Kumar S, Greiff V. Best practices for machine learning in antibody discovery and development. Drug Discov Today 2024;29:104025. [PMID: 38762089 DOI: 10.1016/j.drudis.2024.104025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 04/25/2024] [Accepted: 05/13/2024] [Indexed: 05/20/2024]

Si Y, Zou J, Gao Y, Chuai G, Liu Q, Chen L. Foundation models in molecular biology. Biophys Rep 2024;10:135-151. [PMID: 39027316 PMCID: PMC11252241 DOI: 10.52601/bpr.2024.240006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 03/04/2024] [Indexed: 07/20/2024] Open

Affiliation(s)

Yunda Si: Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
Jiawei Zou: Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
Yicheng Gao: Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Guohui Chuai: Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Qi Liu: Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
Luonan Chen: Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China; Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China

Jahn LR, Marquet C, Heinzinger M, Rost B. Protein embeddings predict binding residues in disordered regions. Sci Rep 2024;14:13566. [PMID: 38866950 PMCID: PMC11169622 DOI: 10.1038/s41598-024-64211-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 06/06/2024] [Indexed: 06/14/2024] Open

Pham NT, Terrance AT, Jeon YJ, Rakkiyappan R, Manavalan B. ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning. Mol Ther Nucleic Acids 2024;35:102192. [PMID: 38779332 PMCID: PMC11108997 DOI: 10.1016/j.omtn.2024.102192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 04/18/2024] [Indexed: 05/25/2024]

Urhan A, Cosma BM, Earl AM, Manson AL, Abeel T. SAFPred: synteny-aware gene function prediction for bacteria using protein embeddings. Bioinformatics 2024;40:btae328. [PMID: 38775729 PMCID: PMC11147799 DOI: 10.1093/bioinformatics/btae328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 04/08/2024] [Accepted: 05/21/2024] [Indexed: 06/04/2024] Open

Abstract

MOTIVATION

Today, we know the function of only a small fraction of the protein sequences predicted from genomic data. This problem is even more salient for bacteria, which represent some of the most phylogenetically and metabolically diverse taxa on Earth. This low rate of bacterial gene annotation is compounded by the fact that most function prediction algorithms have focused on eukaryotes, and conventional annotation approaches rely on the presence of similar sequences in existing databases. However, often there are no such sequences for novel bacterial proteins. Thus, we need improved gene function prediction methods tailored for bacteria. Recently, transformer-based language models, adopted from the natural language processing field, have been used to obtain new representations of proteins, to replace amino acid sequences. These representations, referred to as protein embeddings, have shown promise for improving annotation of eukaryotes, but there have been only limited applications on bacterial genomes.

RESULTS

To predict gene functions in bacteria, we developed SAFPred, a novel synteny-aware gene function prediction tool based on protein embeddings from state-of-the-art protein language models. SAFPred also leverages the unique operon structure of bacteria through conserved synteny. SAFPred outperformed both conventional sequence-based annotation methods and state-of-the-art methods on multiple bacterial species, including for distant homolog detection, where the sequence similarity to the proteins in the training set was as low as 40%. Using SAFPred to identify gene functions across diverse enterococci, of which some species are major clinical threats, we identified 11 previously unrecognized putative novel toxins, with potential significance to human and animal health.

AVAILABILITY AND IMPLEMENTATION

https://github.com/AbeelLab/safpred.

Hamamsy T, Morton JT, Blackwell R, Berenberg D, Carriero N, Gligorijevic V, Strauss CEM, Leman JK, Cho K, Bonneau R. Protein remote homology detection and structural alignment using deep learning. Nat Biotechnol 2024;42:975-985. [PMID: 37679542 PMCID: PMC11180608 DOI: 10.1038/s41587-023-01917-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/07/2022] [Accepted: 07/26/2023] [Indexed: 09/09/2023]

Lin S, Yang M, Liu C, Wang Z, Long X. A pretrain-finetune approach for improving model generalizability in outcome prediction of acute respiratory distress syndrome patients. Int J Med Inform 2024;186:105397. [PMID: 38507979 DOI: 10.1016/j.ijmedinf.2024.105397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/20/2023] [Accepted: 02/25/2024] [Indexed: 03/22/2024]

Abstract

BACKGROUND

Early prediction of acute respiratory distress syndrome (ARDS) in critically ill patients in intensive care units (ICUs) has been intensively studied in the past years. Yet a prediction model trained on data from one hospital might not generalize well to other hospitals. It is therefore essential to develop an accurate and generalizable ARDS prediction model adaptive to different hospitals or medical centers.

METHODS

We analyzed electronic medical records of 200,859 and 50,920 hospitalized patients within 24 h after being diagnosed with ARDS from the Philips eICU Institute (eICU-CRD) and the Medical Information Mart for Intensive Care (MIMIC-IV) dataset, respectively. Patients were sorted into three groups, including rapid death, long stay, and recovery, based on their condition or outcome between 24 and 72 h after ARDS diagnosis. To improve prediction performance and generalizability, a "pretrain-finetune" approach was applied, where we pretrained models on the eICU-CRD dataset and performed model finetuning using only a part (35%) of the MIMIC-IV dataset, and then tested the finetuned models on the remaining data from the MIMIC-IV dataset. Well-known machine-learning algorithms, including logistic regression, random forest, extreme gradient boosting, and multilayer perceptron neural networks, were employed to predict ARDS outcomes. Prediction performance was evaluated using the area under the receiver-operating characteristic curve (AUC).

RESULTS

Results show that, in general, multilayer perceptron neural networks outperformed the other models. The use of pretrain-finetune yielded improved performance in predicting ARDS outcomes, achieving a micro-AUC of 0.870 for the MIMIC-IV dataset, an improvement of 0.046 over the pretrained model.

CONCLUSIONS

The proposed pretrain-finetune approach can effectively improve model generalizability from one dataset to another in ARDS prediction.

Yu H, Luo X. ThermoFinder: A sequence-based thermophilic proteins prediction framework. Int J Biol Macromol 2024;270:132469. [PMID: 38761901 DOI: 10.1016/j.ijbiomac.2024.132469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024]

Bulashevska A, Nacsa Z, Lang F, Braun M, Machyna M, Diken M, Childs L, König R. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy. Front Immunol 2024;15:1394003. [PMID: 38868767 PMCID: PMC11167095 DOI: 10.3389/fimmu.2024.1394003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open

Lin B, Luo X, Liu Y, Jin X. A comprehensive review and comparison of existing computational methods for protein function prediction. Brief Bioinform 2024;25:bbae289. [PMID: 39003530 PMCID: PMC11246557 DOI: 10.1093/bib/bbae289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/18/2024] [Indexed: 07/15/2024] Open

Chen N, Yu J, Zhe L, Wang F, Li X, Wong KC. TP-LMMSG: a peptide prediction graph neural network incorporating flexible amino acid property representation. Brief Bioinform 2024;25:bbae308. [PMID: 38920345 PMCID: PMC11200197 DOI: 10.1093/bib/bbae308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 05/28/2024] [Accepted: 06/10/2024] [Indexed: 06/27/2024] Open

Gillani M, Pollastri G. SCLpred-ECL: Subcellular Localization Prediction by Deep N-to-1 Convolutional Neural Networks. Int J Mol Sci 2024;25:5440. [PMID: 38791479 PMCID: PMC11121631 DOI: 10.3390/ijms25105440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 05/09/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024] Open

García Sánchez N, Ugarte Carro E, Prieto-Santamaría L, Rodríguez-González A. Protein sequence analysis in the context of drug repurposing. BMC Med Inform Decis Mak 2024;24:122. [PMID: 38741115 DOI: 10.1186/s12911-024-02531-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/08/2024] [Indexed: 05/16/2024] Open

Akbar S, Zou Q, Raza A, Alarfaj FK. iAFPs-Mv-BiTCN: Predicting antifungal peptides using self-attention transformer embedding and transform evolutionary based multi-view features with bidirectional temporal convolutional networks. Artif Intell Med 2024;151:102860. [PMID: 38552379 DOI: 10.1016/j.artmed.2024.102860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 02/21/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024]

Abstract

Globally, fungal infections have become a major health concern in humans. Fungal diseases generally occur due to the invading fungus appearing on a specific portion of the body and becoming hard for the human immune system to resist. The recent emergence of COVID-19 has intensely increased different nosocomial fungal infections. The existing wet-laboratory-based medications are expensive, time-consuming, and may have adverse side effects on normal cells. In the last decade, peptide therapeutics have gained significant attention due to their high specificity in targeting affected cells without affecting healthy cells. Motivated by the significance of peptide-based therapies, we developed a highly discriminative prediction scheme called iAFPs-Mv-BiTCN to predict antifungal peptides correctly. The training peptides are encoded using word embedding methods such as skip-gram and attention mechanism-based bidirectional encoder representation using transformer. Additionally, transform-based evolutionary features are generated using the Pseudo position-specific scoring matrix using discrete wavelet transform (PsePSSM-DWT). The fused vector of word embedding and evolutionary descriptors is formed to compensate for the limitations of single encoding methods. A Shapley Additive exPlanations (SHAP) based global interpolation approach is applied to reduce training costs by choosing the optimal feature set. The selected feature set is trained using a bi-directional temporal convolutional network (BiTCN). The proposed iAFPs-Mv-BiTCN model achieved a predictive accuracy of 98.15% and an AUC of 0.99 using training samples. In the case of the independent samples, our model obtained an accuracy of 94.11% and an AUC of 0.98. Our iAFPs-Mv-BiTCN model outperformed existing models with ~4% and ~5% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed iAFPs-Mv-BiTCN model make it a valuable tool for scientists and may perform a beneficial role in pharmaceutical design and research academia.

Ma W, Bi X, Jiang H, Zhang S, Wei Z. CollaPPI: A Collaborative Learning Framework for Predicting Protein-Protein Interactions. IEEE J Biomed Health Inform 2024;28:3167-3177. [PMID: 38466584 DOI: 10.1109/jbhi.2024.3375621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/13/2024]

Susanty M, Naim Mursalim MK, Hertadi R, Purwarianti A, Rajab TLE. Classifying alkaliphilic proteins using embeddings from protein language model. Comput Biol Med 2024;173:108385. [PMID: 38547659 DOI: 10.1016/j.compbiomed.2024.108385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]

Lyu D, Wang X, Chen Y, Wang F. Language model and its interpretability in biomedicine: A scoping review. iScience 2024;27:109334. [PMID: 38495823 PMCID: PMC10940999 DOI: 10.1016/j.isci.2024.109334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/19/2024] Open

Chen J, Wu H, Wang N. KEGG orthology prediction of bacterial proteins using natural language processing. BMC Bioinformatics 2024;25:146. [PMID: 38600441 PMCID: PMC11007918 DOI: 10.1186/s12859-024-05766-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Accepted: 04/03/2024] [Indexed: 04/12/2024] Open

Svensson E, Hoedt PJ, Hochreiter S, Klambauer G. HyperPCM: Robust Task-Conditioned Modeling of Drug-Target Interactions. J Chem Inf Model 2024;64:2539-2553. [PMID: 38185877 PMCID: PMC11005051 DOI: 10.1021/acs.jcim.3c01417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024]

Xu S, Onoda A. Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning. J Chem Inf Model 2024;64:2901-2911. [PMID: 37883249 DOI: 10.1021/acs.jcim.3c01202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2023]

Wang JM, Cui RK, Qian ZK, Yang ZZ, Li Y. Mining channel-regulated peptides from animal venom by integrating sequence semantics and structural information. Comput Biol Chem 2024;109:108027. [PMID: 38340414 DOI: 10.1016/j.compbiolchem.2024.108027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 01/24/2024] [Accepted: 02/04/2024] [Indexed: 02/12/2024]

Zhang J, Durham J, Cong Q. Revolutionizing protein-protein interaction prediction with deep learning. Curr Opin Struct Biol 2024;85:102775. [PMID: 38330793 DOI: 10.1016/j.sbi.2024.102775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Revised: 12/31/2023] [Accepted: 01/05/2024] [Indexed: 02/10/2024]

Ashrafzadeh S, Golding GB, Ilie S, Ilie L. Scoring alignments by embedding vector similarity. Brief Bioinform 2024;25:bbae178. [PMID: 38695119 PMCID: PMC11063651 DOI: 10.1093/bib/bbae178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Revised: 03/20/2024] [Accepted: 03/31/2024] [Indexed: 05/05/2024] Open

Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024;14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open

Yan Y, Li W, Wang S, Huang T. Seq-RBPPred: Predicting RNA-Binding Proteins from Sequence. ACS Omega 2024;9:12734-12742. [PMID: 38524500 PMCID: PMC10955590 DOI: 10.1021/acsomega.3c08381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Revised: 12/18/2023] [Accepted: 12/28/2023] [Indexed: 03/26/2024]

Tayebi Z, Ali S, Murad T, Khan I, Patterson M. PseAAC2Vec protein encoding for TCR protein sequence classification. Comput Biol Med 2024;170:107956. [PMID: 38217977 DOI: 10.1016/j.compbiomed.2024.107956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 12/07/2023] [Accepted: 01/01/2024] [Indexed: 01/15/2024]

Porebski BT, Balmforth M, Browne G, Riley A, Jamali K, Fürst MJLJ, Velic M, Buchanan A, Minter R, Vaughan T, Holliger P. Rapid discovery of high-affinity antibodies via massively parallel sequencing, ribosome display and affinity screening. Nat Biomed Eng 2024;8:214-232. [PMID: 37814006 DOI: 10.1038/s41551-023-01093-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Accepted: 08/23/2023] [Indexed: 10/11/2023]

Mock M, Langmead CJ, Grandsard P, Edavettal S, Russell A. Recent advances in generative biology for biotherapeutic discovery. Trends Pharmacol Sci 2024;45:255-267. [PMID: 38378385 DOI: 10.1016/j.tips.2024.01.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Revised: 12/22/2023] [Accepted: 01/05/2024] [Indexed: 02/22/2024]

Palacios A, Acharya P, Peidl A, Beck M, Blanco E, Mishra A, Bawa-Khalfe T, Pakhrin S. SumoPred-PLM: human SUMOylation and SUMO2/3 sites Prediction using Pre-trained Protein Language Model. NAR Genom Bioinform 2024;6:lqae011. [PMID: 38327870 PMCID: PMC10849187 DOI: 10.1093/nargab/lqae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 11/17/2023] [Accepted: 01/17/2024] [Indexed: 02/09/2024] Open

Flamholz ZN, Biller SJ, Kelly L. Large language models improve annotation of prokaryotic viral proteins. Nat Microbiol 2024;9:537-549. [PMID: 38287147 PMCID: PMC11311208 DOI: 10.1038/s41564-023-01584-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2023] [Accepted: 12/08/2023] [Indexed: 01/31/2024]

Zhao M, Lei C, Zhou K, Huang Y, Fu C, Yang S, Zhang Z. POOE: predicting oomycete effectors based on a pre-trained large protein language model. mSystems 2024;9:e0100423. [PMID: 38078741 PMCID: PMC10804963 DOI: 10.1128/msystems.01004-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Accepted: 10/23/2023] [Indexed: 01/24/2024] Open

Abstract

Oomycetes are fungus-like eukaryotic microorganisms which can cause catastrophic diseases in many plants. Successful infection of oomycetes depends highly on their effector proteins that are secreted into plant cells to subvert plant immunity. Thus, systematic identification of effectors from the oomycete proteomes remains an initial but crucial step in understanding plant-pathogen relationships. However, the number of experimentally identified oomycete effectors is still limited. Currently, only a few bioinformatics predictors exist to detect potential effectors, and their prediction performance needs to be improved. Here, we used the sequence embeddings from a pre-trained large protein language model (ProtTrans) as input and developed a support vector machine-based method called POOE for predicting oomycete effectors. POOE could achieve a highly accurate performance with an area under the precision-recall curve of 0.804 (area under the receiver operating characteristic curve = 0.893, accuracy = 0.874, precision = 0.777, recall = 0.684, and specificity = 0.936) in the fivefold cross-validation, considerably outperforming various combinations of popular machine learning algorithms and other commonly used sequence encoding schemes. A similar prediction performance was also observed in the independent test. Compared with the existing oomycete effector prediction methods, POOE provided very competitive and promising performance, suggesting that ProtTrans effectively captures rich protein semantic information and dramatically improves the prediction task. We anticipate that POOE can accelerate the identification of oomycete effectors and provide new hints to systematically understand the functional roles of effectors in plant-pathogen interactions. The web server of POOE is freely accessible at http://zzdlab.com/pooe/index.php. The corresponding source codes and data sets are also available at https://github.com/zzdlabzm/POOE.

IMPORTANCE: In this work, we use the sequence representations from a pre-trained large protein language model (ProtTrans) as input and develop a Support Vector Machine-based method called POOE for predicting oomycete effectors. POOE could achieve a highly accurate performance in the independent test set, considerably outperforming existing oomycete effector prediction methods. We expect that this new bioinformatics tool will accelerate the identification of oomycete effectors and further guide the experimental efforts to interrogate the functional roles of effectors in plant-pathogen interaction.

Zhu YH, Liu Z, Liu Y, Ji Z, Yu DJ. ULDNA: integrating unsupervised multi-source language models with LSTM-attention network for high-accuracy protein-DNA binding site prediction. Brief Bioinform 2024;25:bbae040. [PMID: 38349057 PMCID: PMC10939370 DOI: 10.1093/bib/bbae040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2023] [Revised: 01/02/2024] [Accepted: 01/22/2024] [Indexed: 02/15/2024] Open