Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Roscoe BP, Bolon DN. Systematic exploration of ubiquitin sequence, E1 activation efficiency, and experimental fitness in yeast. J Mol Biol 2014;426:2854-70. [PMID: 24862281 DOI: 10.1016/j.jmb.2014.05.019] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 04/10/2014] [Revised: 05/13/2014] [Accepted: 05/18/2014] [Indexed: 01/26/2023]

Cited by Other Article(s)

Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024;34:630-647. [PMID: 38969803 PMCID: PMC11369238 DOI: 10.1038/s41422-024-00989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024]

Abstract

Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that renders deoxyadenosine deaminase activity to TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.

Affiliation(s)

Peng Cheng Bioinformatics Center of AMMS, Beijing, China
Cong Mao State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Jin Tang Zhejiang Lab, Hangzhou, Zhejiang, China
Sen Yang Bioinformatics Center of AMMS, Beijing, China
Yu Cheng State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Wuke Wang Zhejiang Lab, Hangzhou, Zhejiang, China
Qiuxi Gu State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Wei Han Zhejiang Lab, Hangzhou, Zhejiang, China
Hao Chen State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Sihan Li State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Yaofeng Chen Bioinformatics Center of AMMS, Beijing, China
Jianglin Zhou Bioinformatics Center of AMMS, Beijing, China
Wuju Li Bioinformatics Center of AMMS, Beijing, China
Aimin Pan Zhejiang Lab, Hangzhou, Zhejiang, China
Suwen Zhao iHuman Institute, ShanghaiTech University, Shanghai, China School of Life Science and Technology, ShanghaiTech University, Shanghai, China
Xingxu Huang Zhejiang Lab, Hangzhou, Zhejiang, China School of Life Science and Technology, ShanghaiTech University, Shanghai, China
Shiqiang Zhu Zhejiang Lab, Hangzhou, Zhejiang, China
Jun Zhang State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
Wenjie Shu Bioinformatics Center of AMMS, Beijing, China
Shengqi Wang Bioinformatics Center of AMMS, Beijing, China

Illig AM, Siedhoff NE, Davari MD, Schwaneberg U. Evolutionary Probability and Stacked Regressions Enable Data-Driven Protein Engineering with Minimized Experimental Effort. J Chem Inf Model 2024;64:6350-6360. [PMID: 39088689 DOI: 10.1021/acs.jcim.4c00704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2024]

Shanker VR, Bruun TUJ, Hie BL, Kim PS. Unsupervised evolution of protein and antibody complexes with a structure-informed language model. Science 2024;385:46-53. [PMID: 38963838 DOI: 10.1126/science.adk8946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 05/29/2024] [Indexed: 07/06/2024]

Cocco S, Posani L, Monasson R. Functional effects of mutations in proteins can be predicted and interpreted by guided selection of sequence covariation information. Proc Natl Acad Sci U S A 2024;121:e2312335121. [PMID: 38889151 PMCID: PMC11214004 DOI: 10.1073/pnas.2312335121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/21/2024] [Indexed: 06/20/2024]

Shanker VR, Bruun TU, Hie BL, Kim PS. Inverse folding of protein complexes with a structure-informed language model enables unsupervised antibody evolution. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.19.572475. [PMID: 38187780 PMCID: PMC10769282 DOI: 10.1101/2023.12.19.572475] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/09/2024]

Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.07.570727. [PMID: 38106144 PMCID: PMC10723403 DOI: 10.1101/2023.12.07.570727] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 12/19/2023]

Qu Y, Niu Z, Ding Q, Zhao T, Kong T, Bai B, Ma J, Zhao Y, Zheng J. Ensemble Learning with Supervised Methods Based on Large-Scale Protein Language Models for Protein Mutation Effects Prediction. Int J Mol Sci 2023;24:16496. [PMID: 38003686 PMCID: PMC10671426 DOI: 10.3390/ijms242216496] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/13/2023] [Revised: 11/11/2023] [Accepted: 11/17/2023] [Indexed: 11/26/2023]

Abstract

Machine learning has been increasingly utilized in the field of protein engineering, and research directed at predicting the effects of protein mutations has attracted increasing attention. Among them, so far, the best results have been achieved by related methods based on protein language models, which are trained on a large number of unlabeled protein sequences to capture the generally hidden evolutionary rules in protein sequences, and are therefore able to predict their fitness from protein sequences. Although numerous similar models and methods have been successfully employed in practical protein engineering processes, the majority of the studies have been limited to how to construct more complex language models to capture richer protein sequence feature information and utilize this feature information for unsupervised protein fitness prediction. There remains considerable untapped potential in these developed models, such as whether the prediction performance can be further improved by integrating different models to further improve the accuracy of prediction. Furthermore, how to utilize large-scale models for prediction methods of mutational effects on quantifiable properties of proteins due to the nonlinear relationship between protein fitness and the quantification of specific functionalities has yet to be explored thoroughly. In this study, we propose an ensemble learning approach for predicting mutational effects of proteins integrating protein sequence features extracted from multiple large protein language models, as well as evolutionarily coupled features extracted in homologous sequences, while comparing the differences between linear regression and deep learning models in mapping these features to quantifiable functional changes. We tested our approach on a dataset of 17 protein deep mutation scans and indicated that the integrated approach together with linear regression enables the models to have higher prediction accuracy and generalization. 
Moreover, we further illustrated the reliability of the integrated approach by exploring the differences in the predictive performance of the models across species and protein sequence lengths, as well as by visualizing clustering of ensemble and non-ensemble features.

Affiliation(s)

Yang Qu Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo 315300, China; (Y.Q.); (Z.N.); (Q.D.); (T.Z.) Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)
Zitong Niu Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo 315300, China; (Y.Q.); (Z.N.); (Q.D.); (T.Z.) Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)
Qiaojiao Ding Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo 315300, China; (Y.Q.); (Z.N.); (Q.D.); (T.Z.) Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)
Taowa Zhao Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo 315300, China; (Y.Q.); (Z.N.); (Q.D.); (T.Z.) Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)
Tong Kong Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)
Bing Bai Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)
Jianwei Ma Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)
Yitian Zhao Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo 315300, China; (Y.Q.); (Z.N.); (Q.D.); (T.Z.) Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)
Jianping Zheng Cixi Biomedical Research Institute, Wenzhou Medical University, Ningbo 315300, China; (Y.Q.); (Z.N.); (Q.D.); (T.Z.) Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315300, China; (T.K.); (B.B.); (J.M.)

Derbel H, Zhao Z, Liu Q. Accurate prediction of functional effect of single amino acid variants with deep learning. Comput Struct Biotechnol J 2023;21:5776-5784. [PMID: 38074467 PMCID: PMC10709104 DOI: 10.1016/j.csbj.2023.11.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 02/12/2024]

Padhy AA, Mavor D, Sahoo S, Bolon DNA, Mishra P. Systematic profiling of dominant ubiquitin variants reveals key functional nodes contributing to evolutionary selection. Cell Rep 2023;42:113064. [PMID: 37656625 DOI: 10.1016/j.celrep.2023.113064] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Revised: 06/30/2023] [Accepted: 08/21/2023] [Indexed: 09/03/2023]

Flynn J, Samant N, Schneider-Nachum G, Tenzin T, Bolon DNA. Mutational fitness landscape and drug resistance. Curr Opin Struct Biol 2023;78:102525. [PMID: 36621152 PMCID: PMC10243218 DOI: 10.1016/j.sbi.2022.102525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 11/29/2022] [Accepted: 12/06/2022] [Indexed: 01/08/2023]

Fu Y, Bedő J, Papenfuss AT, Rubin AF. Integrating deep mutational scanning and low-throughput mutagenesis data to predict the impact of amino acid variants. Gigascience 2022;12:giad073. [PMID: 37721410 PMCID: PMC10506130 DOI: 10.1093/gigascience/giad073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 07/02/2023] [Accepted: 08/23/2023] [Indexed: 09/19/2023]

Schneider-Nachum G, Flynn J, Mavor D, Schiffer CA, Bolon DNA. Analyses of HIV proteases variants at the threshold of viability reveals relationships between processing efficiency and fitness. Virus Evol 2021;7:veab103. [PMID: 35299788 PMCID: PMC8923237 DOI: 10.1093/ve/veab103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/18/2021] [Revised: 11/17/2021] [Accepted: 12/13/2021] [Indexed: 12/13/2022]

Sesta L, Uguzzoni G, Fernandez-de-Cossio-Diaz J, Pagnani A. AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational Approximated Landscape. Int J Mol Sci 2021;22:10908. [PMID: 34681569 PMCID: PMC8535593 DOI: 10.3390/ijms222010908] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/28/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 01/12/2023]

Dunham AS, Beltrao P. Exploring amino acid functions in a deep mutational landscape. Mol Syst Biol 2021;17:e10305. [PMID: 34292650 PMCID: PMC8297461 DOI: 10.15252/msb.202110305] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/19/2021] [Revised: 06/29/2021] [Accepted: 06/30/2021] [Indexed: 12/21/2022]

Yamaguchi H, Saito Y. Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins. Brief Bioinform 2021;22:6309928. [PMID: 34180966 DOI: 10.1093/bib/bbab234] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/15/2021] [Revised: 05/28/2021] [Accepted: 05/30/2021] [Indexed: 12/14/2022]

Fernandez-de-Cossio-Diaz J, Uguzzoni G, Pagnani A. Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan. Mol Biol Evol 2021;38:318-328. [PMID: 32770229 PMCID: PMC7783173 DOI: 10.1093/molbev/msaa204] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/28/2022]

The Roles of Protein Structure, Taxon Sampling, and Model Complexity in Phylogenomics: A Case Study Focused on Early Animal Divergences. BIOPHYSICA 2021. [DOI: 10.3390/biophysica1020008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]

Abstract

Despite the long history of using protein sequences to infer the tree of life, the potential for different parts of protein structures to retain historical signal remains unclear. We propose that it might be possible to improve analyses of phylogenomic datasets by incorporating information about protein structure. We test this idea using the position of the root of Metazoa (animals) as a model system. We examined the distribution of “strongly decisive” sites (alignment positions that support a specific tree topology) in a dataset comprising >1500 proteins and almost 100 taxa. The proportion of each class of strongly decisive sites in different structural environments was very sensitive to the model used to analyze the data when a limited number of taxa were used but they were stable when taxa were added. As long as enough taxa were analyzed, sites in all structural environments supported the same topology regardless of whether standard tree searches or decisive sites were used to select the optimal tree. However, the use of decisive sites revealed a difference between the support for minority topologies for sites in different structural environments: buried sites and sites in sheet and coil environments exhibited equal support for the minority topologies, whereas solvent-exposed and helix sites had unequal numbers of sites, supporting the minority topologies. This suggests that the relatively slowly evolving buried, sheet, and coil sites are giving an accurate picture of the true species tree and the amount of conflict among gene trees. Taken as a whole, this study indicates that phylogenetic analyses using sites in different structural environments can yield different topologies for the deepest branches in the animal tree of life and that analyzing larger numbers of taxa eliminates this conflict. More broadly, our results highlight the desirability of incorporating information about protein structure into phylogenomic analyses.

Nedrud D, Coyote-Maestas W, Schmidt D. A large-scale survey of pairwise epistasis reveals a mechanism for evolutionary expansion and specialization of PDZ domains. Proteins 2021;89:899-914. [PMID: 33620761 DOI: 10.1002/prot.26067] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/15/2020] [Revised: 02/02/2021] [Accepted: 02/18/2021] [Indexed: 12/21/2022]

Balance between promiscuity and specificity in phage λ host range. ISME JOURNAL 2021;15:2195-2205. [PMID: 33589767 DOI: 10.1038/s41396-021-00912-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/24/2020] [Revised: 01/18/2021] [Accepted: 01/25/2021] [Indexed: 01/21/2023]

Strokach A, Lu TY, Kim PM. ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations. J Mol Biol 2021;433:166810. [PMID: 33450251 DOI: 10.1016/j.jmb.2021.166810] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/26/2020] [Revised: 12/19/2020] [Accepted: 01/03/2021] [Indexed: 12/21/2022]

Munro D, Singh M. DeMaSk: a deep mutational scanning substitution matrix and its use for variant impact prediction. Bioinformatics 2020;36:5322-5329. [PMID: 33325500 PMCID: PMC8016454 DOI: 10.1093/bioinformatics/btaa1030] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/21/2020] [Revised: 10/16/2020] [Accepted: 11/30/2020] [Indexed: 01/27/2023]

Pandey A, Braun EL. Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root. BIOLOGY 2020;9:E64. [PMID: 32231097 PMCID: PMC7235752 DOI: 10.3390/biology9040064] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/24/2019] [Revised: 03/09/2020] [Accepted: 03/20/2020] [Indexed: 12/23/2022]

Abstract

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor-fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acid trees estimated using the exposed and buried site both supported placement of ctenophores sister to all other animals. 
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.

Reeb J, Wirth T, Rost B. Variant effect predictions capture some aspects of deep mutational scanning experiments. BMC Bioinformatics 2020;21:107. [PMID: 32183714 PMCID: PMC7077003 DOI: 10.1186/s12859-020-3439-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/12/2019] [Accepted: 03/03/2020] [Indexed: 12/12/2022]

Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019;20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022]

Affiliation(s)

Daniel Esposito Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
Jochen Weile The Donnelly Centre, University of Toronto, Toronto, ON, Canada Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada Department of Computer Science, University of Toronto, Toronto, ON, Canada
Jay Shendure Department of Genome Sciences, University of Washington, Seattle, WA, USA Brotman Baty Institute for Precision Medicine, Seattle, WA, USA Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
Lea M Starita Department of Genome Sciences, University of Washington, Seattle, WA, USA Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
Anthony T Papenfuss Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
Frederick P Roth The Donnelly Centre, University of Toronto, Toronto, ON, Canada. Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada. Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada. Department of Computer Science, University of Toronto, Toronto, ON, Canada. Canadian Institute for Advanced Research, Toronto, ON, Canada.
Douglas M Fowler Department of Genome Sciences, University of Washington, Seattle, WA, USA. Canadian Institute for Advanced Research, Toronto, ON, Canada. Department of Bioengineering, University of Washington, Seattle, WA, USA.
Alan F Rubin Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia. Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia. Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia.

Laine E, Karami Y, Carbone A. GEMME: a simple and fast global epistatic model predicting mutational effects. Mol Biol Evol 2019;36:2604-2619. [PMID: 31406981 PMCID: PMC6805226 DOI: 10.1093/molbev/msz179] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 02/14/2019] [Revised: 06/03/2019] [Accepted: 08/02/2019] [Indexed: 12/15/2022]

Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA, Gauthier NP, Sander C, Marks DS. Inferring protein 3D structure from deep mutation scans. Nat Genet 2019;51:1170-1176. [PMID: 31209393 PMCID: PMC7295002 DOI: 10.1038/s41588-019-0432-9] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 10/09/2018] [Accepted: 04/29/2019] [Indexed: 11/09/2022]

Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods 2018;15:816-822. [PMID: 30250057 DOI: 10.1038/s41592-018-0138-4] [Citation(s) in RCA: 268] [Impact Index Per Article: 44.7] [Received: 05/01/2018] [Accepted: 07/29/2018] [Indexed: 01/05/2023]

Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas. Hum Genet 2018;137:665-678. [PMID: 30073413 PMCID: PMC6153521 DOI: 10.1007/s00439-018-1916-x] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 06/21/2018] [Accepted: 07/21/2018] [Indexed: 12/12/2022]

Gladkova C, Schubert AF, Wagstaff JL, Pruneda JN, Freund SM, Komander D. An invisible ubiquitin conformation is required for efficient phosphorylation by PINK1. EMBO J 2017;36:3555-3572. [PMID: 29133469 PMCID: PMC5730886 DOI: 10.15252/embj.201797876] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 07/27/2017] [Revised: 10/16/2017] [Accepted: 10/18/2017] [Indexed: 11/09/2022]

Analysis of Large-Scale Mutagenesis Data To Assess the Impact of Single Amino Acid Substitutions. Genetics 2017;207:53-61. [PMID: 28751422 PMCID: PMC5586385 DOI: 10.1534/genetics.117.300064] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 05/20/2017] [Accepted: 07/24/2017] [Indexed: 11/18/2022]

Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CP, Springer M, Sander C, Marks DS. Mutation effects predicted from sequence co-variation. Nat Biotechnol 2017;35:128-135. [PMID: 28092658 PMCID: PMC5383098 DOI: 10.1038/nbt.3769] [Citation(s) in RCA: 384] [Impact Index Per Article: 54.9] [Received: 06/10/2016] [Accepted: 12/09/2016] [Indexed: 01/09/2023]

A Statistical Guide to the Design of Deep Mutational Scanning Experiments. Genetics 2016;204:77-87. [PMID: 27412710 DOI: 10.1534/genetics.116.190462] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2016] [Accepted: 06/29/2016] [Indexed: 12/21/2022] Open

Mavor D, Barlow K, Thompson S, Barad BA, Bonny AR, Cario CL, Gaskins G, Liu Z, Deming L, Axen SD, Caceres E, Chen W, Cuesta A, Gate RE, Green EM, Hulce KR, Ji W, Kenner LR, Mensa B, Morinishi LS, Moss SM, Mravic M, Muir RK, Niekamp S, Nnadi CI, Palovcak E, Poss EM, Ross TD, Salcedo EC, See SK, Subramaniam M, Wong AW, Li J, Thorn KS, Conchúir SÓ, Roscoe BP, Chow ED, DeRisi JL, Kortemme T, Bolon DN, Fraser JS. Determination of ubiquitin fitness landscapes under different chemical stresses in a classroom setting. eLife 2016;5. [PMID: 27111525 PMCID: PMC4862753 DOI: 10.7554/elife.15802] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2016] [Accepted: 04/06/2016] [Indexed: 12/31/2022] Open

Abstract

Ubiquitin is essential for eukaryotic life and varies in only 3 amino acid positions between yeast and humans. However, recent deep sequencing studies indicate that ubiquitin is highly tolerant to single mutations. We hypothesized that this tolerance would be reduced by chemically induced physiologic perturbations. To test this hypothesis, a class of first year UCSF graduate students employed deep mutational scanning to determine the fitness landscape of all possible single residue mutations in the presence of five different small molecule perturbations. These perturbations uncover 'shared sensitized positions' localized to areas around the hydrophobic patch and the C-terminus. In addition, we identified perturbation specific effects such as a sensitization of His68 in HU and a tolerance to mutation at Lys63 in DTT. Our data show how chemical stresses can reduce buffering effects in the ubiquitin proteasome system. Finally, this study demonstrates the potential of lab-based interdisciplinary graduate curriculum.

DOI:http://dx.doi.org/10.7554/eLife.15802.001

The ability of an organism to grow and reproduce, that is, its “fitness”, is determined by how its genes interact with the environment. Yeast is a model organism in which researchers can control the exact mutations present in the yeast’s genes (its genotype) and the conditions in which the yeast cells live (their environment). This allows researchers to measure how a yeast cell’s genotype and environment affect its fitness.

Ubiquitin is a protein that many organisms depend on to manage cell stress by acting as a tag that targets other proteins for degradation. Essential proteins such as ubiquitin often remain unchanged by mutation over long periods of time. As a result, these proteins evolve very slowly. Like all proteins, ubiquitin is built from a chain of amino acid molecules linked together, and the ubiquitin proteins of yeast and humans are made of almost identical sequences of amino acids.

Although ubiquitin has barely changed its sequence over evolution, previous studies have shown that – under normal growth conditions in the laboratory – most amino acids in ubiquitin can be mutated without any loss of cell fitness. This led Mavor et al. to hypothesize that treating the yeast cells with chemicals that cause cell stress might lead to amino acids in ubiquitin becoming more sensitive to mutation.

To test this idea, a class of graduate students at the University of California, San Francisco grew yeast cells with different ubiquitin mutations together, and with different chemicals that induce cell stress, and measured their growth rates. Sequencing the ubiquitin gene in the thousands of tested yeast cells revealed that three of the chemicals cause a shared set of amino acids in ubiquitin to become more sensitive to mutation.

This result suggests that these amino acids are important for the stress response, possibly by altering the ability of yeast cells to target certain proteins for degradation. Conversely, another chemical causes yeast to become more tolerant to changes in the ubiquitin sequence. The experiments also link changes in particular amino acids in ubiquitin to specific stress responses.

Mavor et al. show that many of ubiquitin’s amino acids are sensitive to mutation under different stress conditions, while others can be mutated to form different amino acids without affecting fitness. By testing the effects of other chemicals, future experiments could further characterize how the yeast’s genotype and environment interact.

DOI:http://dx.doi.org/10.7554/eLife.15802.002

Affiliation(s)

David Mavor Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Kyle Barlow Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Samuel Thompson Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Benjamin A Barad Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Alain R Bonny Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Clinton L Cario Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Garrett Gaskins Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Zairan Liu Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Laura Deming Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, United States
Seth D Axen Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Elena Caceres Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Weilin Chen Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Adolfo Cuesta Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Rachel E Gate Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Evan M Green Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Kaitlin R Hulce Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Weiyue Ji Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Lillian R Kenner Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Bruk Mensa Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Leanna S Morinishi Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Steven M Moss Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Marco Mravic Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Ryan K Muir Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Stefan Niekamp Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Chimno I Nnadi Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Eugene Palovcak Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Erin M Poss Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Tyler D Ross Biophysics Graduate Group, University of California, San Francisco, San Francisco, United States
Eugenia C Salcedo Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Stephanie K See Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Meena Subramaniam Bioinformatics Graduate Group, University of California, San Francisco, San Francisco, United States
Allison W Wong Chemistry and Chemical Biology Graduate Program, University of California, San Francisco, San Francisco, United States
Jennifer Li UCSF Science and Health Education Partnership, University of California, San Francisco, San Francisco, United States
Kurt S Thorn Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, United States
Shane Ó Conchúir Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biology, University of California, San Francisco, San Francisco, United States
Benjamin P Roscoe Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, United States
Eric D Chow Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, United States; Center for Advanced Technology, University of California, San Francisco, San Francisco, United States
Joseph L DeRisi Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, United States; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, United States
Tanja Kortemme Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biology, University of California, San Francisco, San Francisco, United States
Daniel N Bolon Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, United States
James S Fraser Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biology, University of California, San Francisco, San Francisco, United States

Kurt Yilmaz N, Swanstrom R, Schiffer CA. Improving Viral Protease Inhibitors to Counter Drug Resistance. Trends Microbiol 2016;24:547-557. [PMID: 27090931 DOI: 10.1016/j.tim.2016.03.010] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2016] [Revised: 03/18/2016] [Accepted: 03/30/2016] [Indexed: 12/13/2022]

A Balance between Inhibitor Binding and Substrate Processing Confers Influenza Drug Resistance. J Mol Biol 2015;428:538-553. [PMID: 26656922 DOI: 10.1016/j.jmb.2015.11.027] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2015] [Revised: 11/23/2015] [Accepted: 11/24/2015] [Indexed: 11/22/2022]

Viewing protein fitness landscapes through a next-gen lens. Genetics 2015;198:461-71. [PMID: 25316787 DOI: 10.1534/genetics.114.168351] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Gavrilov Y, Hagai T, Levy Y. Nonspecific yet decisive: Ubiquitination can affect the native-state dynamics of the modified protein. Protein Sci 2015;24:1580-92. [PMID: 25970168 DOI: 10.1002/pro.2688] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2015] [Accepted: 04/05/2015] [Indexed: 11/10/2022]

Abstract

Ubiquitination is one of the most common post-translational modifications of proteins, and mediates regulated protein degradation among other cellular processes. A fundamental question regarding the mechanism of protein ubiquitination is whether and how ubiquitin affects the biophysical nature of the modified protein. For some systems, it was shown that the position of ubiquitin within the attachment site is quite flexible and ubiquitin does not specifically interact with its substrate. Nevertheless, it was revealed that polyubiquitination can decrease the thermal stability of the modified protein in a site-specific manner because of alterations of the thermodynamic properties of the folded and unfolded states. In this study, we used detailed atomistic simulations to focus on the molecular effects of ubiquitination on the native structure of the modified protein. As a model, we used Ubc7, which is an E2 enzyme whose in vivo ubiquitination process is well characterized and known to lead to degradation. We found that, despite the lack of specific direct interactions between the ubiquitin moiety and Ubc7, ubiquitination decreases the conformational flexibility of certain regions of the substrate Ubc7 protein, which reduces its entropy and thus destabilizes it. The strongest destabilizing effect was observed for systems in which Lys48-linked tetra-ubiquitin was attached to sites used for in vivo degradation. These results reveal how changes in the configurational entropy of the folded state may modulate the stability of the protein's native state. Overall, our results imply that ubiquitination can modify the biophysical properties of the attached protein in the folded state and that, in some proteins, different ubiquitination sites will lead to different biophysical outcomes. 
We propose that this destabilizing effect of polyubiquitin on the substrate is linked to the functions carried out by the modification, and in particular, regulatory control of protein half-life through proteasomal degradation.
