Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, Bhowmik D, Rost B. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans Pattern Anal Mach Intell 2022. [PMID: 34232869 DOI: 10.1101/2020.07.12.199554] [Citation(s) in RCA: 66] [Impact Index Per Article: 33.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]

For:	Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, Bhowmik D, Rost B. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans Pattern Anal Mach Intell 2022. [PMID: 34232869 DOI: 10.1101/2020.07.12.199554] [Citation(s) in RCA: 66] [Impact Index Per Article: 33.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]

Number

Cited by Other Article(s)

Feng Z, Huang W, Li H, Zhu H, Kang Y, Li Z. DGCPPISP: a PPI site prediction model based on dynamic graph convolutional network and two-stage transfer learning. BMC Bioinformatics 2024;25:252. [PMID: 39085781 PMCID: PMC11293074 DOI: 10.1186/s12859-024-05864-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2024] [Accepted: 07/10/2024] [Indexed: 08/02/2024] Open

Volzhenin K, Bittner L, Carbone A. SENSE-PPI reconstructs interactomes within, across, and between species at the genome scale. iScience 2024;27:110371. [PMID: 39055916 PMCID: PMC11269938 DOI: 10.1016/j.isci.2024.110371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2023] [Revised: 05/04/2024] [Accepted: 06/21/2024] [Indexed: 07/28/2024] Open

Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024;64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]

Schmid EW, Walter JC. Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.09.588596. [PMID: 38645019 PMCID: PMC11030396 DOI: 10.1101/2024.04.09.588596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/23/2024]

Wang J, Chen S, Yuan Q, Chen J, Li D, Wang L, Yang Y. Predicting the effects of mutations on protein solubility using graph convolution network and protein language model representation. J Comput Chem 2024;45:436-445. [PMID: 37933773 DOI: 10.1002/jcc.27249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2023] [Revised: 10/11/2023] [Accepted: 10/21/2023] [Indexed: 11/08/2023]

Dong Q, Wang S, Miao Y, Luo H, Weng Z, Yu L. Novel antimicrobial peptides against Cutibacterium acnes designed by deep learning. Sci Rep 2024;14:4529. [PMID: 38402320 PMCID: PMC10894229 DOI: 10.1038/s41598-024-55205-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 02/21/2024] [Indexed: 02/26/2024] Open

Lee J, Jun DW, Song I, Kim Y. DLM-DTI: a dual language model for the prediction of drug-target interaction with hint-based learning. J Cheminform 2024;16:14. [PMID: 38297330 PMCID: PMC10832108 DOI: 10.1186/s13321-024-00808-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2023] [Accepted: 01/22/2024] [Indexed: 02/02/2024] Open

Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.07.570727. [PMID: 38106144 PMCID: PMC10723403 DOI: 10.1101/2023.12.07.570727] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]

Notin P, Marks DS, Weitzman R, Gal Y. ProteinNPT: Improving Protein Property Prediction and Design with Non-Parametric Transformers. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.06.570473. [PMID: 38106034 PMCID: PMC10723423 DOI: 10.1101/2023.12.06.570473] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]

Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. MEDICAL REVIEW (2021) 2023;3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]

Esmaili F, Pourmirzaei M, Ramazi S, Shojaeilangari S, Yavari E. A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023;21:1266-1285. [PMID: 37863385 PMCID: PMC11082408 DOI: 10.1016/j.gpb.2023.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 01/16/2023] [Accepted: 03/23/2023] [Indexed: 10/22/2023]

Urhan A, Cosma BM, Earl AM, Manson AL, Abeel T. SAP: Synteny-aware gene function prediction for bacteria using protein embeddings. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.02.539034. [PMID: 37205418 PMCID: PMC10187222 DOI: 10.1101/2023.05.02.539034] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]

Avraham O, Tsaban T, Ben-Aharon Z, Tsaban L, Schueler-Furman O. Protein language models can capture protein quaternary state. BMC Bioinformatics 2023;24:433. [PMID: 37964216 PMCID: PMC10647083 DOI: 10.1186/s12859-023-05549-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Accepted: 10/27/2023] [Indexed: 11/16/2023] Open

Abstract

BACKGROUND

Determining a protein's quaternary state, i.e. the number of monomers in a functional unit, is a critical step in protein characterization. Many proteins form multimers for their activity, and over 50% are estimated to naturally form homomultimers. Experimental quaternary state determination can be challenging and require extensive work. To complement these efforts, a number of computational tools have been developed for quaternary state prediction, often utilizing experimentally validated structural information. Recently, dramatic advances have been made in the field of deep learning for predicting protein structure and other characteristics. Protein language models, such as ESM-2, that apply computational natural-language models to proteins successfully capture secondary structure, protein cell localization and other characteristics, from a single sequence. Here we hypothesize that information about the protein quaternary state may be contained within protein sequences as well, allowing us to benefit from these novel approaches in the context of quaternary state prediction.

RESULTS

We generated ESM-2 embeddings for a large dataset of proteins with quaternary state labels from the curated QSbio dataset. We trained a model for quaternary state classification and assessed it on a non-overlapping set of distinct folds (ECOD family level). Our model, named QUEEN (QUaternary state prediction using dEEp learNing), performs worse than approaches that include information from solved crystal structures. However, it successfully learned to distinguish multimers from monomers, and predicts the specific quaternary state with moderate success, better than simple sequence similarity-based annotation transfer. Our results demonstrate that complex, quaternary state related information is included in such embeddings.

CONCLUSIONS

QUEEN is the first to investigate the power of embeddings for the prediction of the quaternary state of proteins. As such, it lays out strengths as well as limitations of a sequence-based protein language model approach, compared to structure-based approaches. Since it does not require any structural information and is fast, we anticipate that it will be of wide use both for in-depth investigation of specific systems, as well as for studies of large sets of protein sequences. A simple colab implementation is available at: https://colab.

RESEARCH

google.com/github/Furman-Lab/QUEEN/blob/main/QUEEN_prediction_notebook.ipynb .

Collapse

Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023;24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open

Riedl M, Mukherjee S, Gauthier M. Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma. Mol Pharm 2023;20:4984-4993. [PMID: 37656906 DOI: 10.1021/acs.molpharmaceut.3c00129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/03/2023]

Huang Y, Huang HY, Chen Y, Lin YCD, Yao L, Lin T, Leng J, Chang Y, Zhang Y, Zhu Z, Ma K, Cheng YN, Lee TY, Huang HD. A Robust Drug-Target Interaction Prediction Framework with Capsule Network and Transfer Learning. Int J Mol Sci 2023;24:14061. [PMID: 37762364 PMCID: PMC10531393 DOI: 10.3390/ijms241814061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2023] [Revised: 08/27/2023] [Accepted: 08/28/2023] [Indexed: 09/29/2023] Open

Affiliation(s)

Yixian Huang School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Hsi-Yuan Huang School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Yigang Chen School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Yang-Chi-Dung Lin School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Lantian Yao Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Tianxiu Lin School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Junlin Leng School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Yuan Chang Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Yuntian Zhang Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Zihao Zhu School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Kun Ma School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)
Yeong-Nan Cheng Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (Y.-N.C.)
Tzong-Yi Lee Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan; (Y.-N.C.)
Hsien-Da Huang School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (Y.H.); (Y.C.); (J.L.) Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China; (L.Y.); (Y.C.)

Collapse

Lee CH, Huh J, Buckley PR, Jang M, Pinho MP, Fernandes RA, Antanaviciute A, Simmons A, Koohy H. A robust deep learning workflow to predict CD8 + T-cell epitopes. Genome Med 2023;15:70. [PMID: 37705109 PMCID: PMC10498576 DOI: 10.1186/s13073-023-01225-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2023] [Accepted: 08/30/2023] [Indexed: 09/15/2023] Open

Abstract

BACKGROUND

T-cells play a crucial role in the adaptive immune system by triggering responses against cancer cells and pathogens, while maintaining tolerance against self-antigens, which has sparked interest in the development of various T-cell-focused immunotherapies. However, the identification of antigens recognised by T-cells is low-throughput and laborious. To overcome some of these limitations, computational methods for predicting CD8 + T-cell epitopes have emerged. Despite recent developments, most immunogenicity algorithms struggle to learn features of peptide immunogenicity from small datasets, suffer from HLA bias and are unable to reliably predict pathology-specific CD8 + T-cell epitopes.

METHODS

We developed TRAP (T-cell recognition potential of HLA-I presented peptides), a robust deep learning workflow for predicting CD8 + T-cell epitopes from MHC-I presented pathogenic and self-peptides. TRAP uses transfer learning, deep learning architecture and MHC binding information to make context-specific predictions of CD8 + T-cell epitopes. TRAP also detects low-confidence predictions for peptides that differ significantly from those in the training datasets to abstain from making incorrect predictions. To estimate the immunogenicity of pathogenic peptides with low-confidence predictions, we further developed a novel metric, RSAT (relative similarity to autoantigens and tumour-associated antigens), as a complementary to 'dissimilarity to self' from cancer studies.

RESULTS

TRAP was used to identify epitopes from glioblastoma patients as well as SARS-CoV-2 peptides, and it outperformed other algorithms in both cancer and pathogenic settings. TRAP was especially effective at extracting immunogenicity-associated properties from restricted data of emerging pathogens and translating them onto related species, as well as minimising the loss of likely epitopes in imbalanced datasets. We also demonstrated that the novel metric termed RSAT was able to estimate immunogenic of pathogenic peptides of various lengths and species. TRAP implementation is available at: https://github.com/ChloeHJ/TRAP .

CONCLUSIONS

This study presents a novel computational workflow for accurately predicting CD8 + T-cell epitopes to foster a better understanding of antigen-specific T-cell response and the development of effective clinical therapeutics.

Collapse

Affiliation(s)

Chloe H Lee MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
Jaesung Huh Visual Geometry Group, Department of Engineering Science, University of Oxford, Oxford, OX2 6NN, UK
Paul R Buckley MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
Myeongjun Jang Intelligent Systems Lab, Department of Computer Science, University of Oxford, Oxford, OX1 3QG, UK
Mariana Pereira Pinho MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
Ricardo A Fernandes Chinese Academy of Medical Sciences (CAMS) Oxford Institute (COI), University of Oxford, Oxford, OX3 7BN, UK
Agne Antanaviciute MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK
Alison Simmons MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK Translational Gastroenterology Unit, John Radcliffe Hospital, Oxford, OX3 9DS, UK
Hashem Koohy MRC Human Immunology Unit, Medical Research Council (MRC) Weatherall Institute of Molecular Medicine (WIMM), John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK. MRC WIMM Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford, OX3 9DS, UK. Alan Turning Fellow in Health and Medicine, The Alan Turing Institute, London, UK.

Collapse

Brandes N, Goldman G, Wang CH, Ye CJ, Ntranos V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat Genet 2023;55:1512-1522. [PMID: 37563329 PMCID: PMC10484790 DOI: 10.1038/s41588-023-01465-0] [Citation(s) in RCA: 62] [Impact Index Per Article: 62.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 07/05/2023] [Indexed: 08/12/2023]

Zhao Y, He B, Xu F, Li C, Xu Z, Su X, He H, Huang Y, Rossjohn J, Song J, Yao J. DeepAIR: A deep learning framework for effective integration of sequence and 3D structure to enable adaptive immune receptor analysis. SCIENCE ADVANCES 2023;9:eabo5128. [PMID: 37556545 PMCID: PMC10411891 DOI: 10.1126/sciadv.abo5128] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/16/2022] [Accepted: 07/06/2023] [Indexed: 08/11/2023]

Ghazikhani H, Butler G. Enhanced identification of membrane transport proteins: a hybrid approach combining ProtBERT-BFD and convolutional neural networks. J Integr Bioinform 2023;0:jib-2022-0055. [PMID: 37497772 PMCID: PMC10389051 DOI: 10.1515/jib-2022-0055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2022] [Accepted: 06/21/2023] [Indexed: 07/28/2023] Open

Li Y, Yao Y, Xia Y, Tang M. Searching for protein variants with desired properties using deep generative models. BMC Bioinformatics 2023;24:297. [PMID: 37480001 PMCID: PMC10362698 DOI: 10.1186/s12859-023-05415-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2022] [Accepted: 07/17/2023] [Indexed: 07/23/2023] Open

Abstract

BACKGROUND

Protein engineering aims to improve the functional properties of existing proteins to meet people's needs. Current deep learning-based models have captured evolutionary, functional, and biochemical features contained in amino acid sequences. However, the existing generative models need to be improved when capturing the relationship between amino acid sites on longer sequences. At the same time, the distribution of protein sequences in the homologous family has a specific positional relationship in the latent space. We want to use this relationship to search for new variants directly from the vicinity of better-performing varieties.

RESULTS

To improve the representation learning ability of the model for longer sequences and the similarity between the generated sequences and the original sequences, we propose a temporal variational autoencoder (T-VAE) model. T-VAE consists of an encoder and a decoder. The encoder expands the receptive field of neurons in the network structure by dilated causal convolution, thereby improving the encoding representation ability of longer sequences. The decoder decodes the sampled data into variants closely resembling the original sequence.

CONCLUSION

Compared to other models, the person correlation coefficient between the predicted values of protein fitness obtained by T-VAE and the truth values was higher, and the mean absolute deviation was lower. In addition, the T-VAE model has a better representation learning ability for longer sequences when comparing the encoding of protein sequences of different lengths. These results show that our model has more advantages in representation learning for longer sequences. To verify the model's generative effect, we also calculate the sequence identity between the generated data and the input data. The sequence identity obtained by T-VAE improved by 12.9% compared to the baseline model.

Collapse

Oliveira GB, Pedrini H, Dias Z. TEMPROT: protein function annotation using transformers embeddings and homology search. BMC Bioinformatics 2023;24:242. [PMID: 37291492 DOI: 10.1186/s12859-023-05375-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Accepted: 06/02/2023] [Indexed: 06/10/2023] Open

Abstract

BACKGROUND

Although the development of sequencing technologies has provided a large number of protein sequences, the analysis of functions that each one plays is still difficult due to the efforts of laboratorial methods, making necessary the usage of computational methods to decrease this gap. As the main source of information available about proteins is their sequences, approaches that can use this information, such as classification based on the patterns of the amino acids and the inference based on sequence similarity using alignment tools, are able to predict a large collection of proteins. The methods available in the literature that use this type of feature can achieve good results, however, they present restrictions of protein length as input to their models. In this work, we present a new method, called TEMPROT, based on the fine-tuning and extraction of embeddings from an available architecture pre-trained on protein sequences. We also describe TEMPROT+, an ensemble between TEMPROT and BLASTp, a local alignment tool that analyzes sequence similarity, which improves the results of our former approach.

RESULTS

The evaluation of our proposed classifiers with the literature approaches has been conducted on our dataset, which was derived from CAFA3 challenge database. Both TEMPROT and TEMPROT+ achieved competitive results on [Formula: see text], [Formula: see text], AuPRC and IAuPRC metrics on Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) ontologies compared to state-of-the-art models, with the main results equal to 0.581, 0.692 and 0.662 of [Formula: see text] on BP, CC and MF, respectively.

CONCLUSIONS

The comparison with the literature showed that our model presented competitive results compared the state-of-the-art approaches considering the amino acid sequence pattern recognition and homology analysis. Our model also presented improvements related to the input size that the model can use to train compared to the literature methods.

Collapse

Rios-Martinez C, Bhattacharya N, Amini AP, Crawford L, Yang KK. Deep self-supervised learning for biosynthetic gene cluster detection and product classification. PLoS Comput Biol 2023;19:e1011162. [PMID: 37220151 DOI: 10.1371/journal.pcbi.1011162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2022] [Accepted: 05/07/2023] [Indexed: 05/25/2023] Open

Kim Y, Kwon J. AttSec: protein secondary structure prediction by capturing local patterns from attention map. BMC Bioinformatics 2023;24:183. [PMID: 37142993 PMCID: PMC10161504 DOI: 10.1186/s12859-023-05310-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2023] [Accepted: 04/27/2023] [Indexed: 05/06/2023] Open

Abstract

BACKGROUND

Protein secondary structures that link simple 1D sequences to complex 3D structures can be used as good features for describing the local properties of protein, but also can serve as key features for predicting the complex 3D structures of protein. Thus, it is very important to accurately predict the secondary structure of the protein, which contains a local structural property assigned by the pattern of hydrogen bonds formed between amino acids. In this study, we accurately predict protein secondary structure by capturing the local patterns of protein. For this objective, we present a novel prediction model, AttSec, based on transformer architecture. In particular, AttSec extracts self-attention maps corresponding to pairwise features between amino acid embeddings and passes them through 2D convolution blocks to capture local patterns. In addition, instead of using additional evolutionary information, it uses protein embedding as an input, which is generated by a language model.

RESULTS

For the ProteinNet DSSP8 dataset, our model showed 11.8% better performance on the entire evaluation datasets compared with other no-evolutionary-information-based models. For the NetSurfP-2.0 DSSP8 dataset, it showed 1.2% better performance on average. There was an average performance improvement of 9.0% for the ProteinNet DSSP3 dataset and an average of 0.7% for the NetSurfP-2.0 DSSP3 dataset.

CONCLUSION

We accurately predict protein secondary structure by capturing the local patterns of protein. For this objective, we present a novel prediction model, AttSec, based on transformer architecture. Although there was no dramatic accuracy improvement compared with other models, the improvement on DSSP8 was greater than that on DSSP3. This result implies that using our proposed pairwise feature could have a remarkable effect for several challenging tasks that require finely subdivided classification. Github package URL is https://github.com/youjin-DDAI/AttSec .

Collapse

Jha K, Karmakar S, Saha S. Graph-BERT and language model-based framework for protein-protein interaction identification. Sci Rep 2023;13:5663. [PMID: 37024543 PMCID: PMC10079975 DOI: 10.1038/s41598-023-31612-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2022] [Accepted: 03/14/2023] [Indexed: 04/08/2023] Open

Wang S, You R, Liu Y, Xiong Y, Zhu S. NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023;21:349-358. [PMID: 37075830 PMCID: PMC10626176 DOI: 10.1016/j.gpb.2023.04.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Revised: 02/24/2023] [Accepted: 04/07/2023] [Indexed: 04/21/2023]

Deep learning drives efficient discovery of novel antihypertensive peptides from soybean protein isolate. Food Chem 2023;404:134690. [DOI: 10.1016/j.foodchem.2022.134690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 09/29/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022]

Boadu F, Cao H, Cheng J. Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.17.524477. [PMID: 36711471 PMCID: PMC9882282 DOI: 10.1101/2023.01.17.524477] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]

Chandra A, Tünnermann L, Löfstedt T, Gratz R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023;12:82819. [PMID: 36651724 PMCID: PMC9848389 DOI: 10.7554/elife.82819] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023] Open

Konno N, Iwasaki W. Machine learning enables prediction of metabolic system evolution in bacteria. SCIENCE ADVANCES 2023;9:eadc9130. [PMID: 36630500 PMCID: PMC9833677 DOI: 10.1126/sciadv.adc9130] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]

Sharma L, Deepak A, Ranjan A, Krishnasamy G. A novel hybrid CNN and BiGRU-Attention based deep learning model for protein function prediction. Stat Appl Genet Mol Biol 2023;22:sagmb-2022-0057. [PMID: 37658681 DOI: 10.1515/sagmb-2022-0057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2022] [Accepted: 04/20/2023] [Indexed: 09/03/2023]

Lim H, No KT. Prediction of polyreactive and nonspecific single-chain fragment variables through structural biochemical features and protein language-based descriptors. BMC Bioinformatics 2022;23:520. [PMID: 36471239 PMCID: PMC9720949 DOI: 10.1186/s12859-022-05010-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Accepted: 10/26/2022] [Indexed: 12/09/2022] Open

Abstract

BACKGROUND

Monoclonal antibodies (mAbs) have been used as therapeutic agents, which must overcome many developability issues after the discovery from in vitro display libraries. Especially, polyreactive mAbs can strongly bind to a specific target and weakly bind to off-target proteins, which leads to poor antibody pharmacokinetics in clinical development. Although early assessment of polyreactive mAbs is important in the early discovery stage, experimental assessments are usually time-consuming and expensive. Therefore, computational approaches for predicting the polyreactivity of single-chain fragment variables (scFvs) in the early discovery stage would be promising for reducing experimental efforts.

RESULTS

Here, we made prediction models for the polyreactivity of scFvs with the known polyreactive antibody features and natural language model descriptors. We predicted 19,426 protein structures of scFvs with trRosetta to calculate the polyreactive antibody features and investigated the classifying performance of each factor for polyreactivity. In the known polyreactive features, the net charge of the CDR2 loop, the tryptophan and glycine residues in CDR-H3, and the lengths of the CDR1 and CDR2 loops, importantly contributed to the performance of the models. Additionally, the hydrodynamic features, such as partial specific volume, gyration radius, and isoelectric points of CDR loops and scFvs, were newly added to improve model performance. Finally, we made the prediction model with a robust performance ([Formula: see text]) with an ensemble learning of the top 3 best models.

CONCLUSION

The prediction models for polyreactivity would help assess polyreactive scFvs in the early discovery stage and our approaches would be promising to develop machine learning models with quantitative data from high throughput assays for antibody screening.

Collapse

Anteghini M, Haja A, Martins dos Santos VA, Schomaker L, Saccenti E. OrganelX web server for sub-peroxisomal and sub-mitochondrial protein localization and peroxisomal target signal detection. Comput Struct Biotechnol J 2022;21:128-133. [PMID: 36544474 PMCID: PMC9747352 DOI: 10.1016/j.csbj.2022.11.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 11/28/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022] Open

Ismi DP, Pulungan R, Afiahayati. Deep learning for protein secondary structure prediction: Pre and post-AlphaFold. Comput Struct Biotechnol J 2022;20:6271-6286. [PMID: 36420164 PMCID: PMC9678802 DOI: 10.1016/j.csbj.2022.11.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 11/05/2022] [Accepted: 11/05/2022] [Indexed: 11/13/2022] Open

An J, Weng X. Collectively encoding protein properties enriches protein language models. BMC Bioinformatics 2022;23:467. [DOI: 10.1186/s12859-022-05031-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2022] [Accepted: 10/31/2022] [Indexed: 11/10/2022] Open

Lupo U, Sgarbossa D, Bitbol AF. Protein language models trained on multiple sequence alignments learn phylogenetic relationships. Nat Commun 2022;13:6298. [PMID: 36273003 PMCID: PMC9588007 DOI: 10.1038/s41467-022-34032-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 10/07/2022] [Indexed: 12/25/2022] Open

Salem M, Keshavarzi Arshadi A, Yuan JS. AMPDeep: hemolytic activity prediction of antimicrobial peptides using transfer learning. BMC Bioinformatics 2022;23:389. [PMID: 36163001 PMCID: PMC9511757 DOI: 10.1186/s12859-022-04952-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2022] [Accepted: 09/20/2022] [Indexed: 12/02/2022] Open

Abstract

Background

Deep learning’s automatic feature extraction has proven to give superior performance in many sequence classification tasks. However, deep learning models generally require a massive amount of data to train, which in the case of Hemolytic Activity Prediction of Antimicrobial Peptides creates a challenge due to the small amount of available data.

Results

Three different datasets for hemolysis activity prediction of therapeutic and antimicrobial peptides are gathered and the AMPDeep pipeline is implemented for each. The result demonstrate that AMPDeep outperforms the previous works on all three datasets, including works that use physicochemical features to represent the peptides or those who solely rely on the sequence and use deep learning to learn representation for the peptides. Moreover, a combined dataset is introduced for hemolytic activity prediction to address the problem of sequence similarity in this domain. AMPDeep fine-tunes a large transformer based model on a small amount of peptides and successfully leverages the patterns learned from other protein and peptide databases to assist hemolysis activity prediction modeling.

Conclusions

In this work transfer learning is leveraged to overcome the challenge of small data and a deep learning based model is successfully adopted for hemolysis activity classification of antimicrobial peptides. This model is first initialized as a protein language model which is pre-trained on masked amino acid prediction on many unlabeled protein sequences in a self-supervised manner. Having done so, the model is fine-tuned on an aggregated dataset of labeled peptides in a supervised manner to predict secretion. Through transfer learning, hyper-parameter optimization and selective fine-tuning, AMPDeep is able to achieve state-of-the-art performance on three hemolysis datasets using only the sequence of the peptides. This work assists the adoption of large sequence-based models for peptide classification and modeling tasks in a practical manner.

Collapse

Bernhofer M, Rost B. TMbed: transmembrane proteins predicted through language model embeddings. BMC Bioinformatics 2022;23:326. [PMID: 35941534 PMCID: PMC9358067 DOI: 10.1186/s12859-022-04873-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Accepted: 08/03/2022] [Indexed: 12/30/2022] Open

Tubiana J, Schneidman-Duhovny D, Wolfson HJ. ScanNet: A web server for structure-based prediction of protein binding sites with geometric deep learning. J Mol Biol 2022;434:167758. [DOI: 10.1016/j.jmb.2022.167758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2022] [Revised: 07/18/2022] [Accepted: 07/19/2022] [Indexed: 11/28/2022]

Prediction of protein-protein interaction using graph neural networks. Sci Rep 2022;12:8360. [PMID: 35589837 PMCID: PMC9120162 DOI: 10.1038/s41598-022-12201-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Accepted: 04/18/2022] [Indexed: 01/09/2023] Open

Reaching alignment-profile-based accuracy in predicting protein secondary and tertiary structural properties without alignment. Sci Rep 2022;12:7607. [PMID: 35534620 PMCID: PMC9085874 DOI: 10.1038/s41598-022-11684-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Accepted: 04/25/2022] [Indexed: 11/09/2022] Open

LM-GVP: an extensible sequence and structure informed deep learning framework for protein property prediction. Sci Rep 2022;12:6832. [PMID: 35477726 PMCID: PMC9046255 DOI: 10.1038/s41598-022-10775-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2022] [Accepted: 04/11/2022] [Indexed: 11/27/2022] Open

Learning functional properties of proteins with language models. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00457-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Nambiar A, Rubel T, McCaull J, deVries J, Bedau M. Dropping diversity of products of large US firms: Models and measures. PLoS One 2022;17:e0264330. [PMID: 35294464 PMCID: PMC8926282 DOI: 10.1371/journal.pone.0264330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2021] [Accepted: 02/08/2022] [Indexed: 11/21/2022] Open

Lee BD, Gitter A, Greene CS, Raschka S, Maguire F, Titus AJ, Kessler MD, Lee AJ, Chevrette MG, Stewart PA, Britto-Borges T, Cofer EM, Yu KH, Carmona JJ, Fertig EJ, Kalinin AA, Signal B, Lengerich BJ, Triche TJ, Boca SM. Ten quick tips for deep learning in biology. PLoS Comput Biol 2022;18:e1009803. [PMID: 35324884 PMCID: PMC8946751 DOI: 10.1371/journal.pcbi.1009803] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open

Affiliation(s)

Benjamin D. Lee In-Q-Tel Labs, Arlington, Virginia, United States of America School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
Anthony Gitter Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America Morgridge Institute for Research, Madison, Wisconsin, United States of America
Casey S. Greene Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
Sebastian Raschka Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
Finlay Maguire Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
Alexander J. Titus University of New Hampshire, Manchester, New Hampshire, United States of America Bioeconomy.XYZ, Manchester, New Hampshire, United States of America
Michael D. Kessler Department of Oncology, Johns Hopkins University, Baltimore, Maryland, United States of America Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
Alexandra J. Lee Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
Marc G. Chevrette Wisconsin Institute for Discovery and Department of Plant Pathology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
Paul Allen Stewart Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
Thiago Britto-Borges Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Heidelberg, Germany Department of Internal Medicine III (Cardiology, Angiology, and Pneumology), University Hospital Heidelberg, Heidelberg, Germany
Evan M. Cofer Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, United States of America
Kun-Hsing Yu Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, United States of America Department of Pathology, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
Juan Jose Carmona Philips Healthcare, Cambridge, Massachusetts, United States of America
Elana J. Fertig Department of Oncology, Johns Hopkins University, Baltimore, Maryland, United States of America Department of Biomedical Engineering, Department of Applied Mathematics and Statistics, Convergence Institute, Johns Hopkins University, Baltimore, Maryland, United States of America
Alexandr A. Kalinin Medical Big Data Group, Shenzhen Research Institute of Big Data, Shenzhen, China Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
Brandon Signal School of Medicine, College of Health and Medicine, University of Tasmania, Hobart, Australia
Benjamin J. Lengerich Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
Timothy J. Triche Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, United States of America Department of Pediatrics, College of Human Medicine, Michigan State University, East Lansing, Michigan, United States of America Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
Simina M. Boca Innovation Center for Biomedical Informatics, Georgetown University Medical Center, District of Columbia, United States of America Department of Oncology, Georgetown University Medical Center, Washington, DC, United States of America Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC, United States of America Cancer Prevention and Control Program, Lombardi Comprehensive Cancer Center, Washington, DC, United States of America

Collapse

Lee I, Nam H. Sequence-based prediction of protein binding regions and drug-target interactions. J Cheminform 2022;14:5. [PMID: 35135622 PMCID: PMC8822694 DOI: 10.1186/s13321-022-00584-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2021] [Accepted: 01/20/2022] [Indexed: 12/19/2022] Open

Thumuluri V, Martiny HM, Almagro Armenteros JJ, Salomon J, Nielsen H, Johansen AR. NetSolP: predicting protein solubility in Escherichia coli using language models. Bioinformatics 2022;38:941-946. [PMID: 35088833 DOI: 10.1093/bioinformatics/btab801] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2021] [Revised: 10/13/2021] [Accepted: 11/23/2021] [Indexed: 11/13/2022] Open

Kalakoti Y, Yadav S, Sundar D. TransDTI: Transformer-Based Language Models for Estimating DTIs and Building a Drug Recommendation Workflow. ACS OMEGA 2022;7:2706-2717. [PMID: 35097268 PMCID: PMC8792915 DOI: 10.1021/acsomega.1c05203] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/19/2021] [Accepted: 12/28/2021] [Indexed: 06/09/2023]

Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins. Proc Natl Acad Sci U S A 2022;119:2113348119. [PMID: 35074909 PMCID: PMC8795500 DOI: 10.1073/pnas.2113348119] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/07/2021] [Indexed: 12/12/2022] Open

Hsu C, Nisonoff H, Fannjiang C, Listgarten J. Learning protein fitness models from evolutionary and assay-labeled data. Nat Biotechnol 2022;40:1114-1122. [PMID: 35039677 DOI: 10.1038/s41587-021-01146-5] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2021] [Accepted: 11/02/2021] [Indexed: 01/27/2023]