Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Reeb J, Hecht M, Mahlich Y, Bromberg Y, Rost B. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease. PLoS Comput Biol 2016;12:e1005047. [PMID: 27536940 PMCID: PMC4990455 DOI: 10.1371/journal.pcbi.1005047] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2016] [Accepted: 07/04/2016] [Indexed: 11/19/2022] Open

For:	Reeb J, Hecht M, Mahlich Y, Bromberg Y, Rost B. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease. PLoS Comput Biol 2016;12:e1005047. [PMID: 27536940 PMCID: PMC4990455 DOI: 10.1371/journal.pcbi.1005047] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2016] [Accepted: 07/04/2016] [Indexed: 11/19/2022] Open

Number

Cited by Other Article(s)

Marquet C, Heinzinger M, Olenyi T, Dallago C, Erckert K, Bernhofer M, Nechaev D, Rost B. Embeddings from protein language models predict conservation and variant effects. Hum Genet 2022;141:1629-1647. [PMID: 34967936 PMCID: PMC8716573 DOI: 10.1007/s00439-021-02411-y] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Accepted: 12/06/2021] [Indexed: 12/13/2022]

Abstract

The emergence of SARS-CoV-2 variants stressed the demand for tools allowing to interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient-MCC-for ProtT5 embeddings of 0.596 ± 0.006 vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, neither consistently nor statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. Our method predicted SAV effects for the entire human proteome (~ 20 k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA , and PredictProtein.

Collapse

Affiliation(s)

Céline Marquet Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany. TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Michael Heinzinger Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Tobias Olenyi Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Christian Dallago Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Kyra Erckert Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Michael Bernhofer Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Dmitrii Nechaev Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Burkhard Rost Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany

Collapse

Savojardo C, Babbi G, Martelli PL, Casadio R. Mapping OMIM Disease-Related Variations on Protein Domains Reveals an Association Among Variation Type, Pfam Models, and Disease Classes. Front Mol Biosci 2021;8:617016. [PMID: 34026820 PMCID: PMC8138129 DOI: 10.3389/fmolb.2021.617016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Accepted: 04/09/2021] [Indexed: 12/23/2022] Open

Abstract

Human genome resequencing projects provide an unprecedented amount of data about single-nucleotide variations occurring in protein-coding regions and often leading to observable changes in the covalent structure of gene products. For many of these variations, links to Online Mendelian Inheritance in Man (OMIM) genetic diseases are available and are reported in many databases that are collecting human variation data such as Humsavar. However, the current knowledge on the molecular mechanisms that are leading to diseases is, in many cases, still limited. For understanding the complex mechanisms behind disease insurgence, the identification of putative models, when considering the protein structure and chemico-physical features of the variations, can be useful in many contexts, including early diagnosis and prognosis. In this study, we investigate the occurrence and distribution of human disease–related variations in the context of Pfam domains. The aim of this study is the identification and characterization of Pfam domains that are statistically more likely to be associated with disease-related variations. The study takes into consideration 2,513 human protein sequences with 22,763 disease-related variations. We describe patterns of disease-related variation types in biunivocal relation with Pfam domains, which are likely to be possible markers for linking Pfam domains to OMIM diseases. Furthermore, we take advantage of the specific association between disease-related variation types and Pfam domains for clustering diseases according to the Human Disease Ontology, and we establish a relation among variation types, Pfam domains, and disease classes. We find that Pfam models are specific markers of patterns of variation types and that they can serve to bridge genes, diseases, and disease classes. Data are available as Supplementary Material for 1,670 Pfam models, including 22,763 disease-related variations associated to 3,257 OMIM diseases.

Collapse

Qiu J, Nechaev D, Rost B. Protein-protein and protein-nucleic acid binding residues important for common and rare sequence variants in human. BMC Bioinformatics 2020;21:452. [PMID: 33050876 PMCID: PMC7557062 DOI: 10.1186/s12859-020-03759-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Accepted: 09/16/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Any two unrelated people differ by about 20,000 missense mutations (also referred to as SAVs: Single Amino acid Variants or missense SNV). Many SAVs have been predicted to strongly affect molecular protein function. Common SAVs (> 5% of population) were predicted to have, on average, more effect on molecular protein function than rare SAVs (< 1% of population). We hypothesized that the prevalence of effect in common over rare SAVs might partially be caused by common SAVs more often occurring at interfaces of proteins with other proteins, DNA, or RNA, thereby creating subgroup-specific phenotypes. We analyzed SAVs from 60,706 people through the lens of two prediction methods, one (SNAP2) predicting the effects of SAVs on molecular protein function, the other (ProNA2020) predicting residues in DNA-, RNA- and protein-binding interfaces.

RESULTS

Three results stood out. Firstly, SAVs predicted to occur at binding interfaces were predicted to more likely affect molecular function than those predicted as not binding (p value < 2.2 × 10^-16). Secondly, for SAVs predicted to occur at binding interfaces, common SAVs were predicted more strongly with effect on protein function than rare SAVs (p value < 2.2 × 10^-16). Restriction to SAVs with experimental annotations confirmed all results, although the resulting subsets were too small to establish statistical significance for any result. Thirdly, the fraction of SAVs predicted at binding interfaces differed significantly between tissues, e.g. urinary bladder tissue was found abundant in SAVs predicted at protein-binding interfaces, and reproductive tissues (ovary, testis, vagina, seminal vesicle and endometrium) in SAVs predicted at DNA-binding interfaces.

CONCLUSIONS

Overall, the results suggested that residues at protein-, DNA-, and RNA-binding interfaces contributed toward predicting that common SAVs more likely affect molecular function than rare SAVs.

Collapse

Capriotti E, Montanucci L, Profiti G, Rossi I, Giannuzzi D, Aresu L, Fariselli P. Fido-SNP: the first webserver for scoring the impact of single nucleotide variants in the dog genome. Nucleic Acids Res 2020;47:W136-W141. [PMID: 31114899 PMCID: PMC6602425 DOI: 10.1093/nar/gkz420] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2019] [Revised: 04/19/2019] [Accepted: 05/06/2019] [Indexed: 12/22/2022] Open

Reeb J, Wirth T, Rost B. Variant effect predictions capture some aspects of deep mutational scanning experiments. BMC Bioinformatics 2020;21:107. [PMID: 32183714 PMCID: PMC7077003 DOI: 10.1186/s12859-020-3439-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2019] [Accepted: 03/03/2020] [Indexed: 12/12/2022] Open

Qiu J, Bernhofer M, Heinzinger M, Kemper S, Norambuena T, Melo F, Rost B. ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence. J Mol Biol 2020;432:2428-2443. [PMID: 32142788 DOI: 10.1016/j.jmb.2020.02.026] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2019] [Revised: 02/17/2020] [Accepted: 02/23/2020] [Indexed: 11/29/2022]

Affiliation(s)

Jiajun Qiu Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany.
Michael Bernhofer Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
Michael Heinzinger Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
Sofie Kemper Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
Tomas Norambuena Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
Francisco Melo Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile; Institute of Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
Burkhard Rost Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; Columbia University, Department of Biochemistry and Molecular Biophysics, 701 West, 168th Street, New York, NY, 10032, USA; Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; Germany & Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354 Freising, Germany

Collapse

Cao Y, Sun Y, Karimi M, Chen H, Moronfoye O, Shen Y. Predicting pathogenicity of missense variants with weakly supervised regression. Hum Mutat 2019;40:1579-1592. [PMID: 31144781 PMCID: PMC6744350 DOI: 10.1002/humu.23826] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Revised: 05/23/2019] [Accepted: 05/27/2019] [Indexed: 12/27/2022]

Priyadarshini P, Mishra C, Sabat SS, Mandal M, Jyotiranjan T, Swain L, Sahoo M. Computational analysis of non-synonymous SNPs in bovine Mx1 gene. GENE REPORTS 2018. [DOI: 10.1016/j.genrep.2018.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Pejaver V, Mooney SD, Radivojac P. Missense variant pathogenicity predictors generalize well across a range of function-specific prediction challenges. Hum Mutat 2017;38:1092-1108. [PMID: 28508593 PMCID: PMC5561458 DOI: 10.1002/humu.23258] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2016] [Revised: 03/16/2017] [Accepted: 03/26/2017] [Indexed: 11/08/2022]

Common sequence variants affect molecular function more than rare variants? Sci Rep 2017;7:1608. [PMID: 28487536 PMCID: PMC5431670 DOI: 10.1038/s41598-017-01054-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2016] [Accepted: 02/28/2017] [Indexed: 12/29/2022] Open

Mishra C, Kumar S, Yathish H. Predicting the effect of non synonymous SNPs in bovine TLR4 gene. GENE REPORTS 2017. [DOI: 10.1016/j.genrep.2016.11.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Rost B, Radivojac P, Bromberg Y. Protein function in precision medicine: deep understanding with machine learning. FEBS Lett 2016;590:2327-41. [PMID: 27423136 PMCID: PMC5937700 DOI: 10.1002/1873-3468.12307] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2016] [Revised: 07/12/2016] [Accepted: 07/12/2016] [Indexed: 12/21/2022]