1
Guclu TF, Atilgan AR, Atilgan C. Deciphering GB1's Single Mutational Landscape: Insights from MuMi Analysis. J Phys Chem B 2024; 128:7987-7996. [PMID: 39115184 DOI: 10.1021/acs.jpcb.4c04916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2024]
Abstract
Mutational changes that affect the binding of the C2 fragment of Streptococcal protein G (GB1) to the Fc domain of human IgG (IgG-Fc) have been extensively studied using deep mutational scanning (DMS), and the binding affinity of all single mutations has been measured experimentally in the literature. To investigate the underlying molecular basis, we perform in silico mutational scanning for all possible single mutations, along with 2 μs-long molecular dynamics (WT-MD) of the wild-type (WT) GB1 in both unbound and IgG-Fc bound forms. We compute the hydrogen bonds between GB1 and IgG-Fc in WT-MD to identify the dominant hydrogen bonds for binding, which we then assess in conformations produced by Mutation and Minimization (MuMi) to explain the fitness landscape of GB1 and IgG-Fc binding. Furthermore, we analyze MuMi and WT-MD to investigate the dynamics of binding, focusing on the relative solvent accessibility of residues and the probability of residues being located at the binding interface. With these analyses, we explain the interactions between GB1 and IgG-Fc and display the structural features of binding. In sum, our findings highlight the potential of MuMi as a reliable and computationally efficient tool for predicting protein fitness landscapes, offering significant advantages over traditional methods. The methodologies and results presented in this study pave the way for improved predictive accuracy in protein stability and interaction studies, which are crucial for advancements in drug design and synthetic biology.
Affiliation(s)
- Tandac F Guclu
  - Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Ali Rana Atilgan
  - Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
- Canan Atilgan
  - Faculty of Natural Sciences and Engineering, Sabanci University, Tuzla, Istanbul 34956, Turkey
2
Arutyunyan A, Seuma M, Faure AJ, Bolognesi B, Lehner B. Energetic portrait of the amyloid beta nucleation transition state. bioRxiv 2024:2024.07.24.604935. [PMID: 39091732 PMCID: PMC11291115 DOI: 10.1101/2024.07.24.604935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Amyloid protein aggregates are pathological hallmarks of more than fifty human diseases including the most common neurodegenerative disorders. The atomic structures of amyloid fibrils have now been determined, but the process by which soluble proteins nucleate to form amyloids remains poorly characterised and difficult to study, even though this is the key step to understand to prevent the formation and spread of aggregates. Here we use massively parallel combinatorial mutagenesis, a kinetic selection assay, and machine learning to reveal the transition state of the nucleation reaction of amyloid beta, the protein that aggregates in Alzheimer's disease. By quantifying the nucleation of >140,000 proteins we infer the changes in activation energy for all 798 amino acid substitutions in amyloid beta and the energetic couplings between >600 pairs of mutations. This unprecedented dataset provides the first comprehensive view of the energy landscape and the first large-scale measurement of energetic couplings for a protein transition state. The energy landscape reveals that the amyloid beta nucleation transition state contains a short structured C-terminal hydrophobic core with a subset of interactions similar to mature fibrils. This study demonstrates the feasibility of using mutation-selection-sequencing experiments to study transition states and identifies the key molecular species that initiates amyloid beta aggregation and, potentially, Alzheimer's disease.
3
Chitra U, Arnold BJ, Raphael BJ. Quantifying higher-order epistasis: beware the chimera. bioRxiv 2024:2024.07.17.603976. [PMID: 39071303 PMCID: PMC11275791 DOI: 10.1101/2024.07.17.603976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Epistasis, or interactions in which alleles at one locus modify the fitness effects of alleles at other loci, plays a fundamental role in genetics, protein evolution, and many other areas of biology. Epistasis is typically quantified by computing the deviation from the expected fitness under an additive or multiplicative model using one of several formulae. However, these formulae are not all equivalent. Importantly, one widely used formula - which we call the chimeric formula - measures deviations from a multiplicative fitness model on an additive scale, thus mixing two measurement scales. We show that for pairwise interactions, the chimeric formula yields a different magnitude, but the same sign (synergistic vs. antagonistic) of epistasis compared to the multiplicative formula that measures both fitness and deviations on a multiplicative scale. However, for higher-order interactions, we show that the chimeric formula can have both different magnitude and sign compared to the multiplicative formula - thus confusing negative epistatic interactions with positive interactions, and vice versa. We resolve these inconsistencies by deriving fundamental connections between the different epistasis formulae and the parameters of the multivariate Bernoulli distribution. Our results demonstrate that the additive and multiplicative epistasis formulae are more mathematically sound than the chimeric formula. Moreover, we demonstrate that the mathematical issues with the chimeric epistasis formula lead to markedly different biological interpretations of real data. Analyzing multi-gene knockout data in yeast, multi-way drug interactions in E. coli, and deep mutational scanning (DMS) of several proteins, we find that 10-60% of higher-order interactions have a change in sign with the multiplicative or additive epistasis formula. These sign changes result in qualitatively different findings on functional divergence in the yeast genome, synergistic vs. antagonistic drug interactions, and epistasis between protein mutations. In particular, in the yeast data, the more appropriate multiplicative formula identifies nearly 500 additional negative three-way interactions, thus extending the trigenic interaction network by 25%.
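To make the distinction between the three pairwise formulae concrete, the following is a minimal sketch and not the authors' code. It assumes strictly positive fitness values, normalizes to the wild type, and expresses the multiplicative formula as a log ratio so that its sign can be read directly. The function name `pairwise_epistasis` and the toy fitness values are hypothetical.

```python
import math


def pairwise_epistasis(f00, f10, f01, f11):
    """Return (additive, multiplicative, chimeric) pairwise epistasis.

    f00 is the wild-type fitness, f10 and f01 the two single mutants,
    and f11 the double mutant. All values are assumed to be positive.
    """
    # Relative fitness with respect to the wild type.
    w10, w01, w11 = f10 / f00, f01 / f00, f11 / f00

    # Additive model: deviation measured on the additive scale.
    additive = f11 - f10 - f01 + f00

    # Multiplicative model: deviation measured on the multiplicative scale,
    # written here as a log ratio (positive means synergistic).
    multiplicative = math.log(w11) - math.log(w10) - math.log(w01)

    # Chimeric formula (assumed form): multiplicative expectation,
    # but the deviation is taken as a plain difference.
    chimeric = w11 - w10 * w01

    return additive, multiplicative, chimeric


# Hypothetical toy values: two deleterious singles and two possible doubles.
print(pairwise_epistasis(1.0, 0.5, 0.5, 0.4))
print(pairwise_epistasis(1.0, 0.5, 0.5, 0.1))
```

In both toy cases the multiplicative and chimeric values share a sign, consistent with the pairwise statement in the abstract. In the second case the additive value is positive while the other two are negative, which illustrates that the additive and multiplicative formulae test different null models. The higher-order sign disagreements described in the abstract are not reproduced by this pairwise sketch.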
4
Mermet-Meillon F, Mercan S, Bauer-Probst B, Allard C, Bleu M, Calkins K, Knehr J, Altorfer M, Naumann U, Sprouffske K, Barys L, Sesterhenn F, Galli GG. Protein destabilization underlies pathogenic missense mutations in ARID1B. Nat Struct Mol Biol 2024; 31:1018-1022. [PMID: 38347147 PMCID: PMC11257965 DOI: 10.1038/s41594-024-01229-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/10/2023] [Accepted: 01/18/2024] [Indexed: 07/20/2024]
Abstract
ARID1B is a SWI/SNF subunit frequently mutated in human Coffin-Siris syndrome (CSS) and it is necessary for proliferation of ARID1A mutant cancers. While most CSS ARID1B aberrations introduce frameshifts or stop codons, the functional consequence of missense mutations found in ARID1B is unclear. We here perform saturated mutagenesis screens on ARID1B and demonstrate that protein destabilization is the main mechanism associated with pathogenic missense mutations in patients with Coffin-Siris Syndrome.
Affiliation(s)
- Samuele Mercan
  - Disease Area Oncology, Novartis Biomedical Research, Basel, Switzerland
- Cyril Allard
  - Disease Area Immunology, Novartis Biomedical Research, Basel, Switzerland
- Melusine Bleu
  - Disease Area Oncology, Novartis Biomedical Research, Basel, Switzerland
- Keith Calkins
  - Disease Area Oncology, Novartis Biomedical Research, Basel, Switzerland
- Judith Knehr
  - Discovery Sciences, Novartis Biomedical Research, Basel, Switzerland
- Marc Altorfer
  - Discovery Sciences, Novartis Biomedical Research, Basel, Switzerland
- Ulrike Naumann
  - Discovery Sciences, Novartis Biomedical Research, Basel, Switzerland
- Louise Barys
  - Disease Area Oncology, Novartis Biomedical Research, Basel, Switzerland
- Fabian Sesterhenn
  - Discovery Sciences, Novartis Biomedical Research, Basel, Switzerland
- Giorgio G Galli
  - Disease Area Oncology, Novartis Biomedical Research, Basel, Switzerland
5
Cocco S, Posani L, Monasson R. Functional effects of mutations in proteins can be predicted and interpreted by guided selection of sequence covariation information. Proc Natl Acad Sci U S A 2024; 121:e2312335121. [PMID: 38889151 PMCID: PMC11214004 DOI: 10.1073/pnas.2312335121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/21/2024] [Indexed: 06/20/2024]
Abstract
Predicting the effects of one or more mutations on the in vivo or in vitro properties of a wild-type protein is a major computational challenge, due to the presence of epistasis, that is, of interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models to predict mutational effects by combining evolutionary (homologous sequence) data with a small amount of mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and the edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline performs comparably to state-of-the-art deep networks trained on many more data, while requiring far fewer parameters and hence being more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property experimentally measured, mostly focus on key functional sites, and are not necessarily related to structural contacts. Therefore, our method is able to extract information relevant for one mutational experiment from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution.
Affiliation(s)
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Lorenzo Posani
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
6
Chen SK, Liu J, Van Nynatten A, Tudor-Price BM, Chang BSW. Sampling Strategies for Experimentally Mapping Molecular Fitness Landscapes Using High-Throughput Methods. J Mol Evol 2024:10.1007/s00239-024-10179-8. [PMID: 38886207 DOI: 10.1007/s00239-024-10179-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 05/20/2024] [Indexed: 06/20/2024]
Abstract
Empirical studies of genotype-phenotype-fitness maps of proteins are fundamental to understanding the evolutionary process, in elucidating the space of possible genotypes accessible through mutations in a landscape of phenotypes and fitness effects. Yet, comprehensively mapping molecular fitness landscapes remains challenging since all possible combinations of amino acid substitutions for even a few protein sites are encoded by an enormous genotype space. High-throughput mapping of genotype space can be achieved using large-scale screening experiments known as multiplexed assays of variant effect (MAVEs). However, to accommodate such multi-mutational studies, the size of MAVEs has grown to the point where a priori determination of sampling requirements is needed. To address this problem, we propose calculations and simulation methods to approximate minimum sampling requirements for multi-mutational MAVEs, which we combine with a new library construction protocol to experimentally validate our approximation approaches. Analysis of our simulated data reveals how sampling trajectories differ between simulations of nucleotide versus amino acid variants and among mutagenesis schemes. For this, we show quantitatively that marginal gains in sampling efficiency demand increasingly greater sampling effort when sampling for nucleotide sequences over their encoded amino acid equivalents. We present a new library construction protocol that efficiently maximizes sequence variation, and demonstrate using ultradeep sequencing that the library encodes virtually all possible combinations of mutations within the experimental design. Insights learned from our analyses together with the methodological advances reported herein are immediately applicable toward pooled experimental screens of arbitrary design, enabling further assay upscaling and expanded testing of genotype space.
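As a rough illustration of what an a priori sampling estimate can look like, the sketch below uses two textbook results for uniform sampling with replacement: the coupon-collector expectation for observing every variant at least once, and the number of draws needed for a target expected coverage. This is not necessarily the calculation used by the authors. The function names are hypothetical, and the assumption that every variant is equally likely is a simplification that real libraries usually violate.

```python
import math


def expected_draws_full_coverage(n_variants):
    """Expected number of uniform draws needed to see all variants once.

    Coupon-collector result: N * H_N, where H_N is the N-th harmonic number.
    """
    return n_variants * sum(1.0 / i for i in range(1, n_variants + 1))


def draws_for_coverage(n_variants, fraction):
    """Smallest number of uniform draws whose expected coverage >= fraction.

    After m draws the expected fraction of variants seen at least once is
    1 - (1 - 1/N)**m, which is solved here for m.
    """
    if n_variants < 2 or not 0.0 < fraction < 1.0:
        raise ValueError("need n_variants >= 2 and 0 < fraction < 1")
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - 1.0 / n_variants))


# Hypothetical two-site library, counted at two levels:
# 20 amino acids per site versus 32 codons per site (as in an NNK scheme).
for label, size in (("amino acid", 20 ** 2), ("nucleotide", 32 ** 2)):
    print(label, size, draws_for_coverage(size, 0.95),
          round(expected_draws_full_coverage(size)))
```

Under these assumptions the larger nucleotide space needs more reads than its amino acid equivalent for the same coverage, and the last few percent of coverage cost far more reads than the first, which matches the qualitative point made in the abstract.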
Affiliation(s)
- Steven K Chen
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Jing Liu
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Alexander Van Nynatten
  - Department of Biological Science, University of Toronto Scarborough, Toronto, ON, Canada
- Belinda S W Chang
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, ON, Canada
  - Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
7
Judge A, Sankaran B, Hu L, Palaniappan M, Birgy A, Prasad BVV, Palzkill T. Network of epistatic interactions in an enzyme active site revealed by large-scale deep mutational scanning. Proc Natl Acad Sci U S A 2024; 121:e2313513121. [PMID: 38483989 PMCID: PMC10962969 DOI: 10.1073/pnas.2313513121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 02/14/2024] [Indexed: 03/19/2024]
Abstract
Cooperative interactions between amino acids are critical for protein function. A genetic reflection of cooperativity is epistasis, which is when a change in the amino acid at one position changes the sequence requirements at another position. To assess epistasis within an enzyme active site, we utilized CTX-M β-lactamase as a model system. CTX-M hydrolyzes β-lactam antibiotics to provide antibiotic resistance, allowing a simple functional selection for rapid sorting of modified enzymes. We created all pairwise mutations across 17 active site positions in the β-lactamase enzyme and quantitated the function of variants against two β-lactam antibiotics using next-generation sequencing. Context-dependent sequence requirements were determined by comparing the antibiotic resistance function of double mutations across the CTX-M active site to their predicted function based on the constituent single mutations, revealing both positive epistasis (synergistic interactions) and negative epistasis (antagonistic interactions) between amino acid substitutions. The resulting trends demonstrate that positive epistasis is present throughout the active site, that epistasis between residues is mediated through substrate interactions, and that residues more tolerant to substitutions serve as generic compensators which are responsible for many cases of positive epistasis. Additionally, we show that a key catalytic residue (Glu166) is amenable to compensatory mutations, and we characterize one such double mutant (E166Y/N170G) that acts by an altered catalytic mechanism. These findings shed light on the unique biochemical factors that drive epistasis within an enzyme active site and will inform enzyme engineering efforts by bridging the gap between amino acid sequence and catalytic function.
Affiliation(s)
- Allison Judge
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Banumathi Sankaran
  - Department of Molecular Biophysics and Integrated Bioimaging, Berkeley Center for Structural Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Liya Hu
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Murugesan Palaniappan
  - Department of Pathology and Immunology, Center for Drug Discovery, Baylor College of Medicine, Houston, TX 77030
- André Birgy
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
  - Infections, Antimicrobials, Modelling, Evolution, UMR 1137, French Institute for Medical Research (INSERM), Faculty of Health, Université Paris Cité, Paris 75006, France
- B. V. Venkataram Prasad
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Timothy Palzkill
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
8
Ding D, Shaw AY, Sinai S, Rollins N, Prywes N, Savage DF, Laub MT, Marks DS. Protein design using structure-based residue preferences. Nat Commun 2024; 15:1639. [PMID: 38388493 PMCID: PMC10884402 DOI: 10.1038/s41467-024-45621-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 01/29/2024] [Indexed: 02/24/2024]
Abstract
Recent developments in protein design rely on large neural networks with up to hundreds of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R² ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural contexts around a residue could be sufficient to predict mutation preferences, and developed an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES not only outperforms model-free methods but also performs similarly to complex models for creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.
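The idea that per-residue preferences alone can explain combinatorial effects can be illustrated with a purely additive, site-wise model. The sketch below is not CoVES and not the authors' fitting procedure. It assumes a complete, balanced combinatorial library, for which the least-squares additive fit reduces to the grand mean plus per-site marginal mean deviations. All function names and the toy measurements are hypothetical.

```python
def fit_sitewise(fitness):
    """Fit an additive site-wise model to a complete combinatorial library.

    fitness maps a tuple of residues (one per mutated site) to a measurement.
    Returns the grand mean and, per site, each residue's mean deviation.
    """
    n_sites = len(next(iter(fitness)))
    grand = sum(fitness.values()) / len(fitness)
    effects = []
    for site in range(n_sites):
        sums, counts = {}, {}
        for variant, value in fitness.items():
            aa = variant[site]
            sums[aa] = sums.get(aa, 0.0) + value
            counts[aa] = counts.get(aa, 0) + 1
        effects.append({aa: sums[aa] / counts[aa] - grand for aa in sums})
    return grand, effects


def predict(grand, effects, variant):
    """Predicted value: grand mean plus one independent term per site."""
    return grand + sum(effects[i][aa] for i, aa in enumerate(variant))


def r_squared(fitness, grand, effects):
    """Fraction of variance explained by the additive site-wise model."""
    ss_res = sum((y - predict(grand, effects, v)) ** 2 for v, y in fitness.items())
    ss_tot = sum((y - grand) ** 2 for y in fitness.values())
    return 1.0 - ss_res / ss_tot


# Hypothetical two-site library; the second dataset adds an interaction
# by lowering only the double mutant ("V", "L").
additive_data = {("A", "G"): 1.0, ("A", "L"): 0.6, ("V", "G"): 0.7, ("V", "L"): 0.3}
epistatic_data = dict(additive_data)
epistatic_data[("V", "L")] = 0.1

for data in (additive_data, epistatic_data):
    grand, effects = fit_sitewise(data)
    print(round(r_squared(data, grand, effects), 3))
```

The gap between the explained variance and 1 is one simple measure of how much of a dataset is left for interaction terms to explain. Incomplete or unbalanced libraries would need a regression-based fit rather than marginal means.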
Affiliation(s)
- David Ding
  - Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- Ada Y Shaw
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Sam Sinai
  - Dyno Therapeutics, Watertown, MA, 02472, USA
- Nathan Rollins
  - Seismic Therapeutics, Lab Central, Cambridge, MA, 02142, USA
- Noam Prywes
  - Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- David F Savage
  - Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, CA, 94720, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
9
Zheng L, Shi S, Sun X, Lu M, Liao Y, Zhu S, Zhang H, Pan Z, Fang P, Zeng Z, Li H, Li Z, Xue W, Zhu F. MoDAFold: a strategy for predicting the structure of missense mutant protein based on AlphaFold2 and molecular dynamics. Brief Bioinform 2024; 25:bbae006. [PMID: 38305456 PMCID: PMC10835750 DOI: 10.1093/bib/bbae006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 12/26/2023] [Accepted: 01/01/2024] [Indexed: 02/03/2024]
Abstract
Protein structure prediction is a longstanding issue crucial for identifying new drug targets and providing a mechanistic understanding of protein functions. To enhance the progress in this field, a spectrum of computational methodologies has been cultivated. AlphaFold2 has exhibited exceptional precision in predicting wild-type protein structures, with performance exceeding that of other methods. However, predicting the structures of missense mutant proteins using AlphaFold2 remains challenging due to the intricate and substantial structural alterations caused by minor sequence variations in the mutant proteins. Molecular dynamics (MD) has been validated for precisely capturing changes in amino acid interactions attributed to protein mutations. Therefore, for the first time, a strategy entitled 'MoDAFold' was proposed to improve the accuracy and reliability of missense mutant protein structure prediction by combining AlphaFold2 with MD. Multiple case studies have confirmed the superior performance of MoDAFold compared to other methods, particularly AlphaFold2.
Affiliation(s)
- Lingyan Zheng
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
- Shuiyang Shi
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
- Mingkun Lu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
- Yang Liao
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Sisi Zhu
  - Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Hongning Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Pan Fang
  - Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhenyu Zeng
  - Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Honglin Li
  - School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Zhaorong Li
  - Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
  - Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou 330110, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
10
Nemoto T, Ocari T, Planul A, Tekinsoy M, Zin EA, Dalkara D, Ferrari U. ACIDES: on-line monitoring of forward genetic screens for protein engineering. Nat Commun 2023; 14:8504. [PMID: 38148337 PMCID: PMC10751290 DOI: 10.1038/s41467-023-43967-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 11/24/2023] [Indexed: 12/28/2023]
Abstract
Forward genetic screens of mutated variants are a versatile strategy for protein engineering and investigation, which has been successfully applied to various studies like directed evolution (DE) and deep mutational scanning (DMS). While next-generation sequencing can track millions of variants during the screening rounds, the vast and noisy nature of the sequencing data impedes the estimation of the performance of individual variants. Here, we propose ACIDES that combines statistical inference and in-silico simulations to improve performance estimation in the library selection process by attributing accurate statistical scores to individual variants. We tested ACIDES first on a random-peptide-insertion experiment and then on multiple public datasets from DE and DMS studies. ACIDES allows experimentalists to reliably estimate variant performance on the fly and can aid protein engineering and research pipelines in a range of applications, including gene therapy.
Affiliation(s)
- Takahiro Nemoto
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
  - Graduate School of Informatics, Kyoto University, Yoshida Hon-machi, Sakyo-ku, Kyoto, 606-8501, Japan
  - Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, Osaka, 565-0871, Japan
- Tommaso Ocari
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Arthur Planul
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Muge Tekinsoy
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Emilia A Zin
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Deniz Dalkara
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
- Ulisse Ferrari
  - Institut de la Vision, Sorbonne Université, INSERM, CNRS, 17 rue Moreau, 75012, Paris, France
11
Beal MA, Meier MJ, Dykes A, Yauk CL, Lambert IB, Marchetti F. The functional mutational landscape of the lacZ gene. iScience 2023; 26:108407. [PMID: 38058303 PMCID: PMC10696112 DOI: 10.1016/j.isci.2023.108407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 09/23/2023] [Accepted: 11/03/2023] [Indexed: 12/08/2023]
Abstract
The lacZ gene of Escherichia coli encodes β-galactosidase (β-gal), a lactose metabolism enzyme of the lactose operon. Previous chemical modification or site-directed mutagenesis experiments have identified 21 amino acids that are essential for β-gal catalytic activity. We have assembled over 10,000 lacZ mutations from published studies that were collected using a positive selection assay to identify mutations in lacZ that disrupted β-gal function. We analyzed 6,465 independent lacZ mutations that resulted in 2,732 missense mutations that impaired β-gal function. Those mutations affected 492 of the 1,023 lacZ codons, including most of the 21 previously known residues critical for catalytic activity. Most missense mutations occurred near the catalytic site and in regions important for subunit tetramerization. Overall, our work provides a comprehensive and detailed map of the amino acid residues affecting the structure and catalytic activity of the β-gal enzyme.
Affiliation(s)
- Marc A. Beal
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON K1A 0K9, Canada
- Matthew J. Meier
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON K1A 0K9, Canada
- Angela Dykes
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON K1A 0K9, Canada
  - Department of Biology, Carleton University, Ottawa, ON K1S 5B6, Canada
- Carole L. Yauk
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON K1A 0K9, Canada
  - Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Iain B. Lambert
  - Department of Biology, Carleton University, Ottawa, ON K1S 5B6, Canada
- Francesco Marchetti
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON K1A 0K9, Canada
  - Department of Biology, Carleton University, Ottawa, ON K1S 5B6, Canada
12
Curatolo AI, Kimchi O, Goodrich CP, Krueger RK, Brenner MP. A computational toolbox for the assembly yield of complex and heterogeneous structures. Nat Commun 2023; 14:8328. [PMID: 38097568 PMCID: PMC10721878 DOI: 10.1038/s41467-023-43168-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 11/02/2023] [Indexed: 12/17/2023]
Abstract
The self-assembly of complex structures from a set of non-identical building blocks is a hallmark of soft matter and biological systems, including protein complexes, colloidal clusters, and DNA-based assemblies. Predicting the dependence of the equilibrium assembly yield on the concentrations and interaction energies of building blocks is highly challenging, owing to the difficulty of computing the entropic contributions to the free energy of the many structures that compete with the ground state configuration. While these calculations yield well known results for spherically symmetric building blocks, they do not hold when the building blocks have internal rotational degrees of freedom. Here we present an approach for solving this problem that works with arbitrary building blocks, including proteins with known structure and complex colloidal building blocks. Our algorithm combines classical statistical mechanics with recently developed computational tools for automatic differentiation. Automatic differentiation allows efficient evaluation of equilibrium averages over configurations that would otherwise be intractable. We demonstrate the validity of our framework by comparison to molecular dynamics simulations of simple examples, and apply it to calculate the yield curves for known protein complexes and for the assembly of colloidal shells.
Affiliation(s)
- Agnese I Curatolo
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA
- Ofer Kimchi
- Lewis-Sigler Institute, Princeton University, Princeton, NJ, 08544, USA
- Carl P Goodrich
- Institute of Science and Technology Austria, A-3400, Klosterneuburg, Austria
- Ryan K Krueger
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA
- Michael P Brenner
- School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA.
- Department of Physics, Harvard University, Cambridge, MA, 02138, USA.
13
Hoffmann MD, Zdechlik AC, He Y, Nedrud D, Aslanidi G, Gordon W, Schmidt D. Multiparametric domain insertional profiling of adeno-associated virus VP1. Mol Ther Methods Clin Dev 2023; 31:101143. [PMID: 38027057 PMCID: PMC10661864 DOI: 10.1016/j.omtm.2023.101143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 10/21/2023] [Indexed: 12/01/2023]
Abstract
Several evolved properties of adeno-associated virus (AAV), such as broad tropism and immunogenicity in humans, are barriers to AAV-based gene therapy. Most efforts to re-engineer these properties have focused on variable regions near AAV's 3-fold protrusions and capsid protein termini. To comprehensively survey AAV capsids for engineerable hotspots, we determined multiple AAV fitness phenotypes upon insertion of six structured protein domains into the entire AAV-DJ capsid protein VP1. This is the largest and most comprehensive AAV domain insertion dataset to date. Our data revealed a surprising robustness of AAV capsids to accommodate large domain insertions. Insertion permissibility depended strongly on insertion position, domain type, and measured fitness phenotype, which clustered into contiguous structural units that we could link to distinct roles in AAV assembly, stability, and infectivity. We also identified engineerable hotspots of AAV that facilitate the covalent attachment of binding scaffolds, which may represent an alternative approach to re-direct AAV tropism.
Affiliation(s)
- Mareike D. Hoffmann
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455, USA
- Alina C. Zdechlik
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Yungui He
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455, USA
- David Nedrud
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Wendy Gordon
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Daniel Schmidt
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455, USA
14
Hu X, Xu Y, Wang C, Liu Y, Zhang L, Zhang J, Wang W, Chen Q, Liu H. Combined prediction and design reveals the target recognition mechanism of an intrinsically disordered protein interaction domain. Proc Natl Acad Sci U S A 2023; 120:e2305603120. [PMID: 37722056 PMCID: PMC10523638 DOI: 10.1073/pnas.2305603120] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/12/2023] [Accepted: 08/14/2023] [Indexed: 09/20/2023] Open
Abstract
An increasing number of protein interaction domains and their targets are being found to be intrinsically disordered proteins (IDPs). The corresponding target recognition mechanisms are mostly elusive because of challenges in performing detailed structural analysis of highly dynamic IDP-IDP complexes. Here, we show that by combining recently developed computational approaches with experiments, the structure of the complex between the intrinsically disordered C-terminal domain (CTD) of protein 4.1G and its target IDP region in NuMA can be dissected at high resolution. First, we carry out systematic mutational scanning using dihydrofolate reductase-based protein complementarity analysis to identify essential interaction regions and key residues. The results are found to be highly consistent with an α/β-type complex structure predicted by AlphaFold2 (AF2). We then design mutants based on the predicted structure using a deep learning protein sequence design method. The solved crystal structure of one mutant presents the same core structure as predicted by AF2. Further computational prediction and experimental assessment indicate that the well-defined core structure is conserved across complexes of 4.1G CTD with other potential targets. Thus, we reveal that an intrinsically disordered protein interaction domain uses an α/β-type structure module formed through synergistic folding to recognize broad IDP targets. Moreover, we show that computational prediction and experiment can be jointly applied to segregate true IDP regions from the core structural domains of IDP-IDP complexes and to uncover the structure-dependent mechanisms of some otherwise elusive IDP-IDP interactions.
Affiliation(s)
- Xiuhong Hu
- Department of Rheumatology and Immunology, Division of Life Sciences and Medicine, The First Affiliated Hospital, University of Science and Technology of China, Hefei, Anhui 230001, China
- Ministry of Education Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Yang Xu
- Department of Rheumatology and Immunology, Division of Life Sciences and Medicine, The First Affiliated Hospital, University of Science and Technology of China, Hefei, Anhui 230001, China
- Ministry of Education Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Chenchen Wang
- Ministry of Education Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Yufeng Liu
- Department of Rheumatology and Immunology, Division of Life Sciences and Medicine, The First Affiliated Hospital, University of Science and Technology of China, Hefei, Anhui 230001, China
- Ministry of Education Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Lu Zhang
- Department of Rheumatology and Immunology, Division of Life Sciences and Medicine, The First Affiliated Hospital, University of Science and Technology of China, Hefei, Anhui 230001, China
- Ministry of Education Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Jiahai Zhang
- Ministry of Education Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Wenning Wang
- Department of Chemistry, Institutes of Biomedical Sciences and Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200438, China
- Quan Chen
- Department of Rheumatology and Immunology, Division of Life Sciences and Medicine, The First Affiliated Hospital, University of Science and Technology of China, Hefei, Anhui 230001, China
- Ministry of Education Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- Haiyan Liu
- Ministry of Education Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- School of Data Science, University of Science and Technology of China, Hefei, Anhui 230027, China
15
McConnell A, Hackel BJ. Protein engineering via sequence-performance mapping. Cell Syst 2023; 14:656-666. [PMID: 37494931 PMCID: PMC10527434 DOI: 10.1016/j.cels.2023.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 05/10/2023] [Accepted: 06/21/2023] [Indexed: 07/28/2023]
Abstract
Discovery and evolution of new and improved proteins has empowered molecular therapeutics, diagnostics, and industrial biotechnology. Discovery and evolution both require efficient screens and effective libraries, although they differ in their challenges because of the absence or presence, respectively, of an initial protein variant with the desired function. A host of high-throughput technologies, both experimental and computational, enable efficient screens to identify performant protein variants. In partnership, an informed search of sequence space is needed to overcome the immensity, sparsity, and complexity of the sequence-performance landscape. Early in the historical trajectory of protein engineering, these elements aligned with distinct approaches to identify the most performant sequence: selection from large, randomized combinatorial libraries versus rational computational design. Substantial advances have now emerged from the synergy of these perspectives. Rational design of combinatorial libraries aids the experimental search of sequence space, and high-throughput, high-integrity experimental data inform computational design. At the core of the collaborative interface, efficient protein characterization (rather than mere selection of optimal variants) maps sequence-performance landscapes. Such quantitative maps elucidate the complex relationships between protein sequence and performance (e.g., binding, catalytic efficiency, biological activity, and developability), thereby advancing fundamental protein science and facilitating protein discovery and evolution.
Affiliation(s)
- Adam McConnell
- Department of Biomedical Engineering, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA
- Benjamin J Hackel
- Department of Biomedical Engineering, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA; Department of Chemical Engineering and Materials Science, University of Minnesota - Twin Cities, 421 Washington Avenue SE, Minneapolis, MN 55455, USA.
16
Chen L, Zhang Z, Li Z, Li R, Huo R, Chen L, Wang D, Luo X, Chen K, Liao C, Zheng M. Learning protein fitness landscapes with deep mutational scanning data from multiple sources. Cell Syst 2023; 14:706-721.e5. [PMID: 37591206 DOI: 10.1016/j.cels.2023.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 05/30/2023] [Accepted: 07/18/2023] [Indexed: 08/19/2023]
Abstract
One of the key points of machine learning-assisted directed evolution (MLDE) is the accurate learning of the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages the existing deep mutational scanning data from diverse proteins to aid in understanding the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three aspects: random and positional extrapolation for single-variant effects, zero-shot fitness predictions for new proteins, and extrapolation for higher-order variant effects from single-variant effects. Moreover, our study identified previously overlooked strong baselines, and their unexpectedly good performance brings our attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Lin Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zehong Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhenghao Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; Shanghai Institute for Advanced Immunochemical Studies, School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Rui Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Ruifeng Huo
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Lifan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China
- Cangsong Liao
- University of Chinese Academy of Sciences, Beijing 100049, China; Chemical Biology Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China.
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Pharmacy, China Pharmaceutical University, Nanjing 211198, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China.
17
Fowler DM, Adams DJ, Gloyn AL, Hahn WC, Marks DS, Muffley LA, Neal JT, Roth FP, Rubin AF, Starita LM, Hurles ME. An Atlas of Variant Effects to understand the genome at nucleotide resolution. Genome Biol 2023; 24:147. [PMID: 37394429 PMCID: PMC10316620 DOI: 10.1186/s13059-023-02986-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 06/02/2023] [Accepted: 06/13/2023] [Indexed: 07/04/2023] Open
Abstract
Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants only after and, in most cases long after, their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps and transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.
Affiliation(s)
- Douglas M. Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- Department of Bioengineering, University of Washington, Seattle, WA USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA USA
- Anna L. Gloyn
- Department of Pediatrics & Department of Genetics, Division of Endocrinology, Stanford School of Medicine, Stanford University, Stanford, CA USA
- William C. Hahn
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA USA
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Debora S. Marks
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Department of Systems Biology, Harvard Medical School, Cambridge, USA
- Lara A. Muffley
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- James T. Neal
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease at Broad Institute, Cambridge, MA USA
- Frederick P. Roth
- Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON Canada
- Alan F. Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC Australia
- Lea M. Starita
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- Department of Bioengineering, University of Washington, Seattle, WA USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA USA
18
Yee SW, Macdonald C, Mitrovic D, Zhou X, Koleske ML, Yang J, Silva DB, Grimes PR, Trinidad D, More SS, Kachuri L, Witte JS, Delemotte L, Giacomini KM, Coyote-Maestas W. The full spectrum of OCT1 (SLC22A1) mutations bridges transporter biophysics to drug pharmacogenomics. bioRxiv 2023:2023.06.06.543963. [PMID: 37333090 PMCID: PMC10274788 DOI: 10.1101/2023.06.06.543963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Membrane transporters play a fundamental role in the tissue distribution of endogenous compounds and xenobiotics and are major determinants of efficacy and side-effect profiles. Polymorphisms within these drug transporters result in inter-individual variation in drug response, with some patients not responding to the recommended dosage of a drug whereas others experience catastrophic side effects. For example, variants within the major hepatic human organic cation transporter OCT1 (SLC22A1) can change the levels of endogenous organic cations and many prescription drugs. To understand how variants mechanistically impact drug uptake, we systematically study how all known and possible single missense and single amino acid deletion variants impact expression and substrate uptake of OCT1. We find that human variants primarily disrupt function via folding rather than substrate uptake. Our study revealed that the major determinants of folding reside in the first 300 amino acids, including the first 6 transmembrane domains and the extracellular domain (ECD), with a highly conserved stabilizing helical motif making key interactions between the ECD and transmembrane domains. Using the functional data combined with computational approaches, we determine and validate a structure-function model of OCT1's conformational ensemble without experimental structures. Using this model and molecular dynamics simulations of key mutants, we determine biophysical mechanisms for how specific human variants alter transport phenotypes. We identify differences in frequencies of reduced-function alleles across populations, with East Asian and European populations having the lowest and highest frequencies of reduced-function variants, respectively. Mining human population databases reveals that reduced-function alleles of OCT1 identified in this study associate significantly with high LDL cholesterol levels. Our general approach, broadly applied, could transform the landscape of precision medicine by producing a mechanistic basis for understanding the effects of human mutations on disease and drug response.
Affiliation(s)
- Sook Wah Yee
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Christian Macdonald
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Darko Mitrovic
- Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, 12121 Solna, Sweden
- Xujia Zhou
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Megan L Koleske
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Jia Yang
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Dina Buitrago Silva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Patrick Rockefeller Grimes
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Donovan Trinidad
- Department of Medicine, Division of Infectious Disease, University of California, San Francisco, United States
- Swati S More
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Current address: Center for Drug Design (CDD), College of Pharmacy, University of Minnesota, Minnesota, United States
- Linda Kachuri
- Epidemiology and Population Health, Stanford University, California, United States
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
- John S Witte
- Epidemiology and Population Health, Stanford University, California, United States
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
- Lucie Delemotte
- Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, 12121 Solna, Sweden
- Kathleen M Giacomini
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Willow Coyote-Maestas
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
- Quantitative Biosciences Institute, University of California, San Francisco, United States
19
Mészáros B, Park E, Malinverni D, Sejdiu BI, Immadisetty K, Sandhu M, Lang B, Babu MM. Recent breakthroughs in computational structural biology harnessing the power of sequences and structures. Curr Opin Struct Biol 2023; 80:102608. [PMID: 37182396 DOI: 10.1016/j.sbi.2023.102608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/12/2023] [Accepted: 04/17/2023] [Indexed: 05/16/2023]
Abstract
Recent advances in computational approaches and their integration into structural biology enable tackling increasingly complex questions. Here, we discuss several key areas, highlighting breakthroughs and remaining challenges. Theoretical modeling has provided tools to accurately predict and design protein structures on a scale currently difficult to achieve using experimental approaches. Molecular Dynamics simulations have become faster and more precise, delivering actionable information inaccessible by current experimental methods. Virtual screening workflows allow a high-throughput approach to discover ligands that bind and modulate protein function, while Machine Learning methods enable the design of proteins with new functionalities. Integrative structural biology combines several of these approaches, pushing the frontiers of structural and functional characterization to ever larger systems, advancing towards a complete understanding of the living cell. These breakthroughs will accelerate and significantly impact diverse areas of science.
Affiliation(s)
- Bálint Mészáros
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA.
- Electa Park
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA.
- Duccio Malinverni
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/DucMalinverni
- Besian I Sejdiu
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/bisejdiu
- Kalyan Immadisetty
- Department of Bone Marrow Transplantation & Cellular Therapy, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/k_immadisetty
- Manbir Sandhu
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/M5andhu
- Benjamin Lang
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA. https://twitter.com/langbnj
- M Madan Babu
- Department of Structural Biology and Center of Excellence for Data Driven Discovery, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN, 38105, USA.
20
Huh E, Agosto MA, Wensel TG, Lichtarge O. Coevolutionary signals in metabotropic glutamate receptors capture residue contacts and long-range functional interactions. J Biol Chem 2023; 299:103030. [PMID: 36806686 PMCID: PMC10060750 DOI: 10.1016/j.jbc.2023.103030] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/30/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 02/18/2023] Open
Abstract
Upon ligand binding to a G protein-coupled receptor, extracellular signals are transmitted into a cell through sets of residue interactions that translate ligand binding into structural rearrangements. These interactions needed for functions impose evolutionary constraints so that, on occasion, mutations in one position may be compensated by other mutations at functionally coupled positions. To quantify the impact of amino acid substitutions in the context of major evolutionary divergence in the G protein-coupled receptor subfamily of metabotropic glutamate receptors (mGluRs), we combined two phylogenetic-based algorithms, Evolutionary Trace and covariation Evolutionary Trace, to infer potential structure-function couplings and roles in mGluRs. We found a subset of evolutionarily important residues at known functional sites and evidence of coupling among distinct structural clusters in mGluR. In addition, experimental mutagenesis and functional assays confirmed that some highly covariant residues are coupled, revealing their synergy. Collectively, these findings inform a critical step toward understanding the molecular and structural basis of amino acid variation patterns within mGluRs and provide insight for drug development, protein engineering, and analysis of naturally occurring variants.
Affiliation(s)
- Eunna Huh
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA
- Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Retina and Optic Nerve Research Laboratory, Department of Physiology and Biophysics, Dalhousie University, Halifax, Canada
- Theodore G Wensel
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
- Olivier Lichtarge
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.
21
Coelho MA, Cooper S, Strauss ME, Karakoc E, Bhosle S, Gonçalves E, Picco G, Burgold T, Cattaneo CM, Veninga V, Consonni S, Dinçer C, Vieira SF, Gibson F, Barthorpe S, Hardy C, Rein J, Thomas M, Marioni J, Voest EE, Bassett A, Garnett MJ. Base editing screens map mutations affecting interferon-γ signaling in cancer. Cancer Cell 2023; 41:288-303.e6. [PMID: 36669486 PMCID: PMC9942875 DOI: 10.1016/j.ccell.2022.12.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 07/01/2022] [Revised: 11/14/2022] [Accepted: 12/22/2022] [Indexed: 01/20/2023]
Abstract
Interferon-γ (IFN-γ) signaling mediates host responses to infection, inflammation and anti-tumor immunity. Mutations in the IFN-γ signaling pathway cause immunological disorders, hematological malignancies, and resistance to immune checkpoint blockade (ICB) in cancer; however, the function of most clinically observed variants remains unknown. Here, we systematically investigate the genetic determinants of IFN-γ response in colorectal cancer cells using CRISPR-Cas9 screens and base editing mutagenesis. Deep mutagenesis of JAK1 with cytidine and adenine base editors, combined with pathway-wide screens, reveals loss-of-function and gain-of-function mutations, including causal variants in hematological malignancies and mutations detected in patients refractory to ICB. We functionally validate variants of uncertain significance in primary tumor organoids, where engineering missense mutations in JAK1 enhanced or reduced sensitivity to autologous tumor-reactive T cells. We identify more than 300 predicted missense mutations altering IFN-γ pathway activity, generating a valuable resource for interpreting gene variant function.
Affiliation(s)
- Matthew A Coelho
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK; Open Targets, Cambridge, UK
- Sarah Cooper
  - Gene Editing and Cellular Research and Development, Wellcome Sanger Institute, Hinxton, UK; Open Targets, Cambridge, UK
- Emre Karakoc
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK; Open Targets, Cambridge, UK
- Shriram Bhosle
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK
- Emanuel Gonçalves
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK; Instituto Superior Técnico, Universidade de Lisboa, 1049-001, and, INESC-ID, 1000-029, Lisbon, Portugal
- Gabriele Picco
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK; Open Targets, Cambridge, UK
- Thomas Burgold
  - Gene Editing and Cellular Research and Development, Wellcome Sanger Institute, Hinxton, UK
- Chiara M Cattaneo
  - Department of Immunology and Molecular Oncology, Netherlands Cancer Institute, Amsterdam, the Netherlands; Open Targets, Cambridge, UK
- Vivien Veninga
  - Department of Immunology and Molecular Oncology, Netherlands Cancer Institute, Amsterdam, the Netherlands; Open Targets, Cambridge, UK
- Sarah Consonni
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK; Open Targets, Cambridge, UK
- Cansu Dinçer
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK
- Sara F Vieira
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK; Open Targets, Cambridge, UK
- Freddy Gibson
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK
- Syd Barthorpe
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK
- Claire Hardy
  - Cancer, Ageing and Somatic Mutation, Wellcome Sanger Institute, Hinxton, UK
- Joel Rein
  - Cellular Operations and Stem Cell Informatics, Wellcome Sanger Institute, Hinxton, UK
- Mark Thomas
  - Cellular Operations and Stem Cell Informatics, Wellcome Sanger Institute, Hinxton, UK
- John Marioni
  - EMBL-European Bioinformatics Institute, Cambridge, UK
- Emile E Voest
  - Department of Immunology and Molecular Oncology, Netherlands Cancer Institute, Amsterdam, the Netherlands; Oncode Institute, Utrecht, the Netherlands; Open Targets, Cambridge, UK
- Andrew Bassett
  - Gene Editing and Cellular Research and Development, Wellcome Sanger Institute, Hinxton, UK; Open Targets, Cambridge, UK
- Mathew J Garnett
  - Translational Cancer Genomics, Wellcome Sanger Institute, Hinxton, UK; Open Targets, Cambridge, UK.
22
Chandra A, Tünnermann L, Löfstedt T, Gratz R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023; 12:82819. [PMID: 36651724 PMCID: PMC9848389 DOI: 10.7554/elife.82819] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023] Open
Abstract
Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and proteins with known properties based on lab experiments. Language models from the field of natural language processing have gained popularity for protein property predictions and have led to a new computational revolution in biology, where old prediction results are being improved regularly. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences and can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models based on a particular model, the Transformer. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics and how such models can be used to predict, for example, post-translational modifications. We review shortcomings of other deep learning models and explain how the Transformer models have quickly proven to be a very promising way to unravel information hidden in the sequences of amino acids.
Affiliation(s)
- Abel Chandra
  - Department of Computing Science, Umeå University, Umeå, Sweden
- Laura Tünnermann
  - Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Tommy Löfstedt
  - Department of Computing Science, Umeå University, Umeå, Sweden
- Regina Gratz
  - Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
  - Department of Forest Ecology and Management, Swedish University of Agricultural Sciences, Umeå, Sweden
23
Wei H, Li X. Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes. Front Genet 2023; 14:1087267. [PMID: 36713072 PMCID: PMC9878224 DOI: 10.3389/fgene.2023.1087267] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/02/2022] [Accepted: 01/02/2023] [Indexed: 01/13/2023] Open
Abstract
Unveiling how genetic variations lead to phenotypic variations is one of the key questions in evolutionary biology, genetics, and biomedical research. Deep mutational scanning (DMS) technology has allowed the mapping of tens of thousands of genetic variations to phenotypic variations efficiently and economically. Since its first systematic introduction about a decade ago, we have witnessed the use of deep mutational scanning in many research areas leading to scientific breakthroughs. Also, the methods in each step of deep mutational scanning have become much more versatile thanks to the oligo-synthesizing technology, high-throughput phenotyping methods and deep sequencing technology. However, each specific possible step of deep mutational scanning has its pros and cons, and some limitations still await further technological development. Here, we discuss recent scientific accomplishments achieved through the deep mutational scanning and describe widely used methods in each step of deep mutational scanning. We also compare these different methods and analyze their advantages and disadvantages, providing insight into how to design a deep mutational scanning study that best suits the aims of the readers' projects.
Affiliation(s)
- Huijin Wei
  - Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
- Xianghua Li
  - Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
  - Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, United Kingdom
  - The Second Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang, China
  - Biomedical and Health Translational Centre of Zhejiang Province, Haining, Zhejiang, China
24
Draghi JA, Ogbunugafor CB. Exploring the expanse between theoretical questions and experimental approaches in the modern study of evolvability. J Exp Zool B Mol Dev Evol 2023; 340:8-17. [PMID: 35451559 PMCID: PMC10083935 DOI: 10.1002/jez.b.23134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2021] [Revised: 03/04/2022] [Accepted: 03/11/2022] [Indexed: 12/16/2022]
Abstract
Despite several decades of computational and experimental work across many systems, evolvability remains on the periphery with regard to its status as a widely accepted and regularly applied theoretical concept. Here we propose that its marginal status is partly a result of large gaps between the diverse but disconnected theoretical treatments of evolvability and the relatively narrower range of studies that have tested it empirically. To make this case, we draw on a range of examples, from experimental evolution in microbes to molecular evolution in proteins, where attempts have been made to mend this disconnect. We highlight some examples of progress that has been made and point to areas where synthesis and translation of existing theory can lead to further progress in the still-new field of empirical measurements of evolvability.
Affiliation(s)
- Jeremy A Draghi
  - Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, USA
- C Brandon Ogbunugafor
  - Department of Ecology & Evolutionary Biology, Yale University, New Haven, Connecticut, USA
25
Tabet D, Parikh V, Mali P, Roth FP, Claussnitzer M. Scalable Functional Assays for the Interpretation of Human Genetic Variation. Annu Rev Genet 2022; 56:441-465. [PMID: 36055970 DOI: 10.1146/annurev-genet-072920-032107] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 12/24/2022]
Abstract
Scalable sequence-function studies have enabled the systematic analysis and cataloging of hundreds of thousands of coding and noncoding genetic variants in the human genome. This has improved clinical variant interpretation and provided insights into the molecular, biophysical, and cellular effects of genetic variants at an astonishing scale and resolution across the spectrum of allele frequencies. In this review, we explore current applications and prospects for the field and outline the principles underlying scalable functional assay design, with a focus on the study of single-nucleotide coding and noncoding variants.
Affiliation(s)
- Daniel Tabet
  - Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Victoria Parikh
  - Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, USA
- Prashant Mali
  - Department of Bioengineering, University of California, San Diego, California, USA
- Frederick P Roth
  - Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Melina Claussnitzer
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Center for Genomic Medicine and Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, USA
  - Harvard Medical School, Harvard University, Boston, Massachusetts, USA
26
Nakashima N, Nakashima A, Nakashima K, Takano M. Olfactory marker protein contains a leucine-rich domain in the Ω-loop important for nuclear export. Mol Brain 2022; 15:89. [PMID: 36333725 PMCID: PMC9636679 DOI: 10.1186/s13041-022-00973-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 10/15/2022] [Indexed: 11/06/2022] Open
Abstract
Olfactory marker protein (OMP) is a cytosolic protein expressed in mature olfactory receptor neurons (ORNs). OMP modulates cAMP signalling and regulates olfactory sensation and axonal targeting. OMP is a small soluble protein, and passive diffusion between nucleus and cytoplasm is expected. However, OMP is mostly situated in the cytosol and is only sparsely detected in the nuclei of a subset of ORNs, hypothalamic neurons and heterologously OMP-expressing cultured cells. OMP can enter the nucleus in association with transcription factors. However, how OMP is retained in the cytosol at rest is unclear. Because OMP is proposed to affect cell differentiation, it is important to understand how OMP is distributed between cytoplasm and nucleus. To elucidate the structural profile of OMP, we applied several bioinformatics methods to a multiple sequence alignment (MSA) of OMP protein sequences and ranked the evolutionarily conserved residues. In addition to the previously reported cAMP-binding domain, we identified a leucine-rich domain in the Ω-loop of OMP. We introduced mutations into the leucine-rich region and heterologously expressed the mutant OMP in HEK293T cells. Mutations to alanine increased the nuclear distribution of OMP, as quantified by immunocytochemistry and western blotting. Therefore, we concluded that OMP contains a leucine-rich domain important for nuclear transport.
Affiliation(s)
- Noriyuki Nakashima
  - Department of Physiology, Kurume University School of Medicine, 67 Asahi-Machi, Kurume-Shi, Fukuoka 830-0011 Japan
- Akiko Nakashima
  - Department of Physiology, Kurume University School of Medicine, 67 Asahi-Machi, Kurume-Shi, Fukuoka 830-0011 Japan
- Kie Nakashima
  - Department of Physiology and Cell Biology, Kobe University School of Medicine, 7-5-1 Kusunoki-Cho, Chuo-Ku, Kobe, 650-0017 Japan
- Makoto Takano
  - Department of Physiology, Kurume University School of Medicine, 67 Asahi-Machi, Kurume-Shi, Fukuoka 830-0011 Japan
27
Azbukina N, Zharikova A, Ramensky V. Intragenic compensation through the lens of deep mutational scanning. Biophys Rev 2022; 14:1161-1182. [PMID: 36345285 PMCID: PMC9636336 DOI: 10.1007/s12551-022-01005-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Accepted: 09/26/2022] [Indexed: 12/20/2022] Open
Abstract
A significant fraction of mutations in proteins are deleterious and result in adverse consequences for protein function, stability, or interaction with other molecules. Intragenic compensation is a specific case of positive epistasis in which a neutral missense mutation cancels the effect of a deleterious mutation in the same protein. Permissive compensatory mutations facilitate protein evolution, since without them all sequences would be extremely conserved. Understanding compensatory mechanisms is an important scientific challenge at the intersection of protein biophysics and evolution. In human genetics, intragenic compensatory interactions are important since they may result in variable penetrance of pathogenic mutations or fixation of pathogenic human alleles in orthologous proteins from related species. The latter phenomenon complicates computational and clinical inference of an allele's pathogenicity. Deep mutational scanning is a relatively new technique that enables experimental studies of functional effects of thousands of mutations in proteins. We review the important aspects of the field and discuss existing limitations of current datasets. We reviewed ten published DMS datasets with quantified functional effects of single and double mutations and described rates and patterns of intragenic compensation in eight of them. Supplementary Information: the online version contains supplementary material available at 10.1007/s12551-022-01005-w.
Affiliation(s)
- Nadezhda Azbukina
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
- Anastasia Zharikova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
  - National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
- Vasily Ramensky
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 1-73, Leninskie Gory, 119991 Moscow, Russia
  - National Medical Research Center for Therapy and Preventive Medicine, Petroverigsky per., 10, Bld.3, 101000 Moscow, Russia
28
Seuma M, Bolognesi B. Understanding and evolving prions by yeast multiplexed assays. Curr Opin Genet Dev 2022; 75:101941. [PMID: 35777350 DOI: 10.1016/j.gde.2022.101941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 05/19/2022] [Accepted: 05/27/2022] [Indexed: 11/15/2022]
Abstract
Yeast genetics made it possible to derive the first fundamental insights into prion composition, conformation, and propagation. Fast-forward 30 years and the same model organism is now proving an extremely powerful tool to comprehensively explore the impact of mutations in prion sequences on their function, toxicity, and physical properties. Here, we provide an overview of novel multiplexed strategies where deep mutagenesis is combined with a range of tailored selection assays in yeast, which are particularly amenable to investigating prions and prion-like sequences. By mimicking evolution in a flask, these multiplexed approaches are revealing mechanistic insights into the consequences of prion self-assembly, while also reporting on the structure prion sequences adopt in vivo.
Affiliation(s)
- Mireia Seuma
  - Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Baldiri Reixac 10-12, 08028 Barcelona, Spain. https://twitter.com/@mseumaar
- Benedetta Bolognesi
  - Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Baldiri Reixac 10-12, 08028 Barcelona, Spain.
29
Braberg H, Echeverria I, Kaake RM, Sali A, Krogan NJ. From systems to structure - using genetic data to model protein structures. Nat Rev Genet 2022; 23:342-354. [PMID: 35013567 PMCID: PMC8744059 DOI: 10.1038/s41576-021-00441-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Accepted: 11/25/2021] [Indexed: 12/11/2022]
Abstract
Understanding the effects of genetic variation is a fundamental problem in biology that requires methods to analyse both physical and functional consequences of sequence changes at systems-wide and mechanistic scales. To achieve a systems view, protein interaction networks map which proteins physically interact, while genetic interaction networks inform on the phenotypic consequences of perturbing these protein interactions. Until recently, understanding the molecular mechanisms that underlie these interactions often required biophysical methods to determine the structures of the proteins involved. The past decade has seen the emergence of new approaches based on coevolution, deep mutational scanning and genome-scale genetic or chemical-genetic interaction mapping that enable modelling of the structures of individual proteins or protein complexes. Here, we review the emerging use of large-scale genetic datasets and deep learning approaches to model protein structures and their interactions, and discuss the integration of structural data from different sources.
Affiliation(s)
- Hannes Braberg
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Ignacia Echeverria
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Robyn M Kaake
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Gladstone Institutes, San Francisco, CA, USA
- Andrej Sali
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Nevan J Krogan
  - Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA, USA.
  - Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA.
  - Gladstone Institutes, San Francisco, CA, USA.
  - Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
30
Coyote-Maestas W, Nedrud D, He Y, Schmidt D. Determinants of trafficking, conduction, and disease within a K+ channel revealed through multiparametric deep mutational scanning. eLife 2022; 11:e76903. [PMID: 35639599 PMCID: PMC9273215 DOI: 10.7554/elife.76903] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/08/2022] [Accepted: 05/27/2022] [Indexed: 01/04/2023] Open
Abstract
A long-standing goal in protein science and clinical genetics is to develop quantitative models of sequence, structure, and function relationships to understand how mutations cause disease. Deep mutational scanning (DMS) is a promising strategy to map how amino acids contribute to protein structure and function and to advance clinical variant interpretation. Here, we introduce 7429 single-residue missense mutations into the inward rectifier K+ channel Kir2.1 and determine how this affects folding, assembly, and trafficking, as well as regulation by allosteric ligands and ion conduction. Our data provide high-resolution information on a cotranslationally folded biogenic unit, trafficking and quality control signals, and segregated roles of different structural elements in fold stability and function. We show that Kir2.1 surface trafficking mutants are underrepresented in variant effect databases, which has implications for clinical practice. By comparing fitness scores with expert-reviewed variant effects, we can predict the pathogenicity of 'variants of unknown significance' and disease mechanisms of known pathogenic mutations. Our study in Kir2.1 provides a blueprint for how multiparametric DMS can help us understand the mechanistic basis of genetic disorders and the structure-function relationships of proteins.
Affiliation(s)
- Willow Coyote-Maestas
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, United States
- David Nedrud
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, United States
- Yungui He
  - Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
- Daniel Schmidt
  - Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
31
Wang C, Anglès F, Balch WE. Triangulating variation in the population to define mechanisms for precision management of genetic disease. Structure 2022; 30:1190-1207.e5. [PMID: 35714602 PMCID: PMC9357173 DOI: 10.1016/j.str.2022.05.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Revised: 04/18/2022] [Accepted: 05/17/2022] [Indexed: 10/18/2022]
Abstract
To understand mechanistically how the protein fold is shaped by therapeutics to inform precision management of disease, we developed variation-capture (VarC) mapping. VarC triangulates sparse sequence variation information found in the population using Gaussian process regression (GPR)-based machine learning to define the combined pairwise-residue interactions contributing to dynamic protein function in the individual in response to therapeutics. Using VarC mapping, we now reveal the pairwise-residue covariant relationships across the entire protein fold of cystic fibrosis (CF) transmembrane conductance regulator (CFTR) to define the molecular mechanisms of clinically approved CF chemical modulators. We discover an energetically destabilized covariant core containing a di-acidic YKDAD endoplasmic reticulum (ER) exit code that is only weakly corrected by current therapeutics. Our results illustrate that VarC provides a generalizable tool to triangulate information from genetic variation in the population to mechanistically discover therapeutic strategies that guide precision management of the individual.
Affiliation(s)
- Chao Wang
  - Department of Molecular Medicine, Scripps Research, La Jolla, CA 92037, USA
- Frédéric Anglès
  - Department of Molecular Medicine, Scripps Research, La Jolla, CA 92037, USA
- William E Balch
  - Department of Molecular Medicine, Scripps Research, La Jolla, CA 92037, USA.
32
Ding D, Green AG, Wang B, Lite TLV, Weinstein EN, Marks DS, Laub MT. Co-evolution of interacting proteins through non-contacting and non-specific mutations. Nat Ecol Evol 2022; 6:590-603. [PMID: 35361892 PMCID: PMC9090974 DOI: 10.1038/s41559-022-01688-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/25/2021] [Accepted: 01/31/2022] [Indexed: 01/08/2023]
Abstract
Proteins often accumulate neutral mutations that do not affect current functions but can profoundly influence future mutational possibilities and functions. Understanding such hidden potential has major implications for protein design and evolutionary forecasting but has been limited by a lack of systematic efforts to identify potentiating mutations. Here, through the comprehensive analysis of a bacterial toxin-antitoxin system, we identified all possible single substitutions in the toxin that enable it to tolerate otherwise interface-disrupting mutations in its antitoxin. Strikingly, the majority of enabling mutations in the toxin do not contact, and promote tolerance non-specifically to, many different antitoxin mutations, despite covariation in homologues occurring primarily between specific pairs of contacting residues across the interface. In addition, the enabling mutations we identified expand future mutational paths that both maintain old toxin-antitoxin interactions and form new ones. These non-specific mutations are missed by widely used covariation and machine learning methods. Identifying such enabling mutations will be critical for ensuring continued binding of therapeutically relevant proteins, such as antibodies, aimed at evolving targets.
Affiliation(s)
- David Ding
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Anna G Green
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Boyuan Wang
  - Department of Pharmacology, UT Southwestern Medical Center, Dallas, TX, USA
- Thuy-Lan Vo Lite
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
33
Staller MV, Ramirez E, Kotha SR, Holehouse AS, Pappu RV, Cohen BA. Directed mutational scanning reveals a balance between acidic and hydrophobic residues in strong human activation domains. Cell Syst 2022; 13:334-345.e5. [PMID: 35120642 PMCID: PMC9241528 DOI: 10.1016/j.cels.2022.01.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 04/15/2021] [Revised: 10/20/2021] [Accepted: 01/05/2022] [Indexed: 01/01/2023]
Abstract
Acidic activation domains are intrinsically disordered regions of the transcription factors that bind coactivators. The intrinsic disorder and low evolutionary conservation of activation domains have made it difficult to identify the sequence features that control activity. To address this problem, we designed thousands of variants in seven acidic activation domains and measured their activities with a high-throughput assay in human cell culture. We found that strong activation domain activity requires a balance between the number of acidic residues and aromatic and leucine residues. These findings motivated a predictor of acidic activation domains that scans the human proteome for clusters of aromatic and leucine residues embedded in regions of high acidity. This predictor identifies known activation domains and accurately predicts previously unidentified ones. Our results support a flexible acidic exposure model of activation domains in which the acidic residues solubilize hydrophobic motifs so that they can interact with coactivators. A record of this paper's transparent peer review process is included in the supplemental information.
Collapse
Affiliation(s)
- Max V Staller
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Center for Computational Biology, University of California Berkeley, Berkeley, CA 94720, USA.
| | - Eddie Ramirez
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA
| | - Sanjana R Kotha
- Center for Computational Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Center for Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Rohit V Pappu
- Center for Science and Engineering of Living Systems, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Barak A Cohen
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine in St. Louis, Saint Louis, MO 63110, USA.
|
34
|
Lin WC, Tang HC, Wang HY, Kao CY, Chang YC, Li AH, Yang SB, Mou KY. Fragment-Directed Random Mutagenesis by the Reverse Kunkel Method. ACS Synth Biol 2022; 11:1658-1668. [PMID: 35324156 DOI: 10.1021/acssynbio.2c00086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
Two fundamentally different approaches are routinely used for protein engineering: user-defined mutagenesis and random mutagenesis, each with its own strengths and weaknesses. Here, we invent a unique mutagenesis protocol, which combines the advantages of user-defined mutagenesis and random mutagenesis. The new method, termed the reverse Kunkel method, allows the user to create random mutations at multiple specified regions in a one-pot reaction. We demonstrated the reverse Kunkel method by mimicking the somatic hypermutation in antibodies that introduces random mutations concentrated in complementarity-determining regions. Coupling with the phage display and yeast display selections, we successfully generated dramatically improved antibodies against a model protein and a neurotransmitter peptide in terms of affinity and immunostaining performance. The reverse Kunkel method is especially suitable for engineering proteins whose activities are determined by multiple variable regions, such as antibodies and adeno-associated virus capsids, or whose functional domains are composed of several discontinuous sequences, such as Cas9 and Cas12a.
Affiliation(s)
- Wen-Ching Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
| | - Hao-Cheng Tang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
| | - Han Ying Wang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
| | - Chia-Yi Kao
- Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
- Taiwan International Graduate Program in Molecular Medicine, National Yang Ming Chiao Tung University and Academia Sinica, Taipei 11529, Taiwan
| | - You-Chiun Chang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
- Taiwan International Graduate Program in Chemical Biology and Molecular Biophysics, National Taiwan University and Academia Sinica, Taipei 11529, Taiwan
| | - Athena Hsu Li
- Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
- Taiwan International Graduate Program in Interdisciplinary Neuroscience, National Yang Ming Chiao Tung University and Academia Sinica, Taipei 11529, Taiwan
| | - Shi-Bing Yang
- Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
| | - Kurt Yun Mou
- Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
|
35
|
Echeverria I, Braberg H, Krogan NJ, Sali A. Integrative structure determination of histones H3 and H4 using genetic interactions. FEBS J 2022; 290:2565-2575. [PMID: 35298864 PMCID: PMC9481981 DOI: 10.1111/febs.16435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Revised: 02/11/2022] [Accepted: 03/15/2022] [Indexed: 11/28/2022]
Abstract
Integrative structure modeling is increasingly used for determining the architectures of biological assemblies, especially those that are structurally heterogeneous. Recently, we reported on how to convert in vivo genetic interaction measurements into spatial restraints for structural modeling: first, phenotypic profiles are generated for each point mutation and thousands of gene deletions or environmental perturbations. Following, the phenotypic profile similarities are converted into distance restraints on the pairs of mutated residues. We illustrate the approach by determining the structure of the histone H3-H4 complex. The method is implemented in our open-source IMP program, expanding the structural biology toolbox by allowing structural characterization based on in vivo data without the need to purify the target system. We compare genetic interaction measurements to other sources of structural information, such as residue coevolution and deep-learning structure prediction of complex subunits. We also suggest that determining genetic interactions could benefit from new technologies, such as CRISPR-Cas9 approaches to gene editing, especially for mammalian cells. Finally, we highlight the opportunity for using genetic interactions to determine recalcitrant biomolecular structures, such as those of disordered proteins, transient protein assemblies, and host-pathogen protein complexes.
Affiliation(s)
- Ignacia Echeverria
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- Quantitative Biosciences Institute, University of California, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Hannes Braberg
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- Quantitative Biosciences Institute, University of California, San Francisco, CA, USA
| | - Nevan J. Krogan
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- Quantitative Biosciences Institute, University of California, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, J. David Gladstone Institutes, San Francisco, CA, USA
| | - Andrej Sali
- Quantitative Biosciences Institute, University of California, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
|
36
|
Spielmann M, Kircher M. Computational and experimental methods for classifying variants of unknown clinical significance. Cold Spring Harb Mol Case Stud 2022; 8:mcs.a006196. [PMID: 35483875 PMCID: PMC9059783 DOI: 10.1101/mcs.a006196] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022] Open
Abstract
The increase in sequencing capacity, reduction in costs, and national and international coordinated efforts have led to the widespread introduction of next-generation sequencing (NGS) technologies in patient care. More generally, human genetics and genomic medicine are gaining importance for more and more patients. Some communities are already discussing the prospect of sequencing each individual's genome at time of birth. Together with digital health records, this shall enable individualized treatments and preventive measures, so-called precision medicine. A central step in this process is the identification of disease causal mutations or variant combinations that make us more susceptible for diseases. Although various technological advances have improved the identification of genetic alterations, the interpretation and ranking of the identified variants remains a major challenge. Based on our knowledge of molecular processes or previously identified disease variants, we can identify potentially functional genetic variants and, using different lines of evidence, we are sometimes able to demonstrate their pathogenicity directly. However, the vast majority of variants are classified as variants of uncertain clinical significance (VUSs) with not enough experimental evidence to determine their pathogenicity. In these cases, computational methods may be used to improve the prioritization and an increasing toolbox of experimental methods is emerging that can be used to assay the molecular effects of VUSs. Here, we discuss how computational and experimental methods can be used to create catalogs of variant effects for a variety of molecular and cellular phenotypes. We discuss the prospects of integrating large-scale functional data with machine learning and clinical knowledge for the development of accurate pathogenicity predictions for clinical applications.
Affiliation(s)
- Malte Spielmann
- Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany; Institute of Human Genetics, Christian-Albrechts-Universität, 24105 Kiel, Germany; Human Molecular Genomics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
| | - Martin Kircher
- Institute of Human Genetics, University of Lübeck, 23562 Lübeck, Germany; Berlin Institute of Health at Charité—Universitätsmedizin Berlin, 10117 Berlin, Germany; DZHK (German Centre for Cardiovascular Research), partner site Berlin, 10115 Berlin, Germany
|
37
|
Linking protein structural and functional change to mutation using amino acid networks. PLoS One 2022; 17:e0261829. [PMID: 35061689 PMCID: PMC8782487 DOI: 10.1371/journal.pone.0261829] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/07/2021] [Accepted: 12/11/2021] [Indexed: 11/30/2022] Open
Abstract
The function of a protein is strongly dependent on its structure. During evolution, proteins acquire new functions through mutations in the amino-acid sequence. Given the advance in deep mutational scanning, recent findings have found functional change to be position dependent, notwithstanding the chemical properties of mutant and mutated amino acids. This could indicate that structural properties of a given position are potentially responsible for the functional relevance of a mutation. Here, we looked at the relation between structure and function of positions using five proteins with experimental data of functional change available. In order to measure structural change, we modeled mutated proteins via amino-acid networks and quantified the perturbation of each mutation. We found that structural change is position dependent, and strongly related to functional change. Strong changes in protein structure correlate with functional loss, and positions with functional gain due to mutations tend to be structurally robust. Finally, we constructed a computational method to predict functionally sensitive positions to mutations using structural change that performs well on all five proteins with a mean precision of 74.7% and recall of 69.3% of all functional positions.
|
38
|
Scheele RA, Lindenburg LH, Petek M, Schober M, Dalby KN, Hollfelder F. Droplet-based screening of phosphate transfer catalysis reveals how epistasis shapes MAP kinase interactions with substrates. Nat Commun 2022; 13:844. [PMID: 35149678 PMCID: PMC8837617 DOI: 10.1038/s41467-022-28396-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/07/2021] [Accepted: 01/10/2022] [Indexed: 11/20/2022] Open
Abstract
The combination of ultrahigh-throughput screening and sequencing informs on function and intragenic epistasis within combinatorial protein mutant libraries. Establishing a droplet-based, in vitro compartmentalised approach for robust expression and screening of protein kinase cascades (>10^7 variants/day) allowed us to dissect the intrinsic molecular features of the MKK-ERK signalling pathway, without interference from endogenous cellular components. In a six-residue combinatorial library of the MKK1 docking domain, we identified 29,563 sequence permutations that allow MKK1 to efficiently phosphorylate and activate its downstream target kinase ERK2. A flexibly placed hydrophobic sequence motif emerges which is defined by higher order epistatic interactions between six residues, suggesting synergy that enables high connectivity in the sequence landscape. Through positive epistasis, MKK1 maintains function during mutagenesis, establishing the importance of co-dependent residues in mammalian protein kinase-substrate interactions, and creating a scenario for the evolution of diverse human signalling networks. Here, the authors use a droplet-based screen for phosphate transfer catalysis, testing variants of the human protein kinase MKK1 for its ability to activate its downstream target ERK2. Data reveal a flexible motif in the MKK1 docking domain that promotes efficient activation of ERK2, and suggest epistasis between the residues within that sequence.
Affiliation(s)
- Remkes A Scheele
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
| | - Maya Petek
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK; Faculty of Medicine, University of Maribor, SI-2000, Maribor, Slovenia
| | - Markus Schober
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
| | - Kevin N Dalby
- Division of Chemical Biology and Medicinal Chemistry, The University of Texas at Austin, Austin, TX, 78712, USA
| | - Florian Hollfelder
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK.
|
39
|
Barbon L, Offord V, Radford EJ, Butler AP, Gerety SS, Adams DJ, Tan HK, Waters AJ. Variant Library Annotation Tool (VaLiAnT): an oligonucleotide library design and annotation tool for saturation genome editing and other deep mutational scanning experiments. Bioinformatics 2022; 38:892-899. [PMID: 34791067 PMCID: PMC8796380 DOI: 10.1093/bioinformatics/btab776] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/05/2021] [Revised: 07/13/2021] [Accepted: 11/10/2021] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: CRISPR/Cas9-based technology allows for the functional analysis of genetic variants at single nucleotide resolution whilst maintaining genomic context. This approach, known as saturation genome editing (SGE), a form of deep mutational scanning, systematically alters each position in a target region to explore its function. SGE experiments require the design and synthesis of oligonucleotide variant libraries which are introduced into the genome. This technology is applicable to diverse fields such as disease variant identification, drug development, structure-function studies, synthetic biology, evolutionary genetics and host-pathogen interactions. Here, we present the Variant Library Annotation Tool (VaLiAnT) which can be used to generate variant libraries from user-defined genomic coordinates and standard input files. The software can accommodate user-specified species, reference sequences and transcript annotations. RESULTS: Coordinates for a genomic range are provided by the user to retrieve a corresponding oligonucleotide reference sequence. A user-specified range within this sequence is then subject to systematic, nucleotide and/or amino acid saturating mutator functions. VaLiAnT provides a novel way to retrieve, mutate and annotate genomic sequences for oligonucleotide library generation. Specific features for SGE library generation can be employed. In addition, VaLiAnT is configurable, allowing for cDNA and prime editing saturation library generation, with other diverse applications possible. AVAILABILITY AND IMPLEMENTATION: VaLiAnT is a command line tool written in Python. Source code, testing data, example input and output files and executables are available (https://github.com/cancerit/VaLiAnT) in addition to a detailed user manual (https://github.com/cancerit/VaLiAnT/wiki). VaLiAnT is licensed under AGPLv3. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luca Barbon
- Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
| | - Victoria Offord
- Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
| | - Elizabeth J Radford
- Human Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Department of Paediatrics, University of Cambridge, Cambridge CB2 0QQ, UK
| | - Adam P Butler
- Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
| | - Sebastian S Gerety
- Human Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - David J Adams
- Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
| | - Hong Kee Tan
- Human Genetics Programme, Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
| | - Andrew J Waters
- Cancer, Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
|
40
|
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Many studies have been devoted over the past decades to build new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases toward the training set, their generalizability, and interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.
|
41
|
Coyote-Maestas W, Nedrud D, Suma A, He Y, Matreyek KA, Fowler DM, Carnevale V, Myers CL, Schmidt D. Probing ion channel functional architecture and domain recombination compatibility by massively parallel domain insertion profiling. Nat Commun 2021; 12:7114. [PMID: 34880224 PMCID: PMC8654947 DOI: 10.1038/s41467-021-27342-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/28/2021] [Accepted: 11/16/2021] [Indexed: 11/10/2022] Open
Abstract
Protein domains are the basic units of protein structure and function. Comparative analysis of genomes and proteomes showed that domain recombination is a main driver of multidomain protein functional diversification and some of the constraining genomic mechanisms are known. Much less is known about biophysical mechanisms that determine whether protein domains can be combined into viable protein folds. Here, we use massively parallel insertional mutagenesis to determine compatibility of over 300,000 domain recombination variants of the Inward Rectifier K+ channel Kir2.1 with channel surface expression. Our data suggest that genomic and biophysical mechanisms acted in concert to favor gain of large, structured domain at protein termini during ion channel evolution. We use machine learning to build a quantitative biophysical model of domain compatibility in Kir2.1 that allows us to derive rudimentary rules for designing domain insertion variants that fold and traffic to the cell surface. Positional Kir2.1 responses to motif insertion clusters into distinct groups that correspond to contiguous structural regions of the channel with distinct biophysical properties tuned towards providing either folding stability or gating transitions. This suggests that insertional profiling is a high-throughput method to annotate function of ion channel structural regions.
Affiliation(s)
- Willow Coyote-Maestas
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
| | - David Nedrud
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
| | - Antonio Suma
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
| | - Yungui He
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455 USA
| | - Kenneth A. Matreyek
- Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
| | - Douglas M. Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98115 USA; Department of Bioengineering, University of Washington, Seattle, WA 98115 USA
| | - Vincenzo Carnevale
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
| | - Chad L. Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA
| | - Daniel Schmidt
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN, 55455, USA.
|
42
|
Chu HY, Wong ASL. Facilitating Machine Learning-Guided Protein Engineering with Smart Library Design and Massively Parallel Assays. Advanced Genetics (Hoboken, N.J.) 2021; 2:2100038. [PMID: 36619853 PMCID: PMC9744531 DOI: 10.1002/ggn2.202100038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Revised: 11/08/2021] [Indexed: 01/11/2023]
Abstract
Protein design plays an important role in recent medical advances from antibody therapy to vaccine design. Typically, exhaustive mutational screens or directed evolution experiments are used for the identification of the best design or for improvements to the wild-type variant. Even with a high-throughput screening on pooled libraries and Next-Generation Sequencing to boost the scale of read-outs, surveying all the variants with combinatorial mutations for their empirical fitness scores is still of magnitudes beyond the capacity of existing experimental settings. To tackle this challenge, in-silico approaches using machine learning to predict the fitness of novel variants based on a subset of empirical measurements are now employed. These machine learning models turn out to be useful in many cases, with the premise that the experimentally determined fitness scores and the amino-acid descriptors of the models are informative. The machine learning models can guide the search for the highest fitness variants, resolve complex epistatic relationships, and highlight bio-physical rules for protein folding. Using machine learning-guided approaches, researchers can build more focused libraries, thus relieving themselves from labor-intensive screens and fast-tracking the optimization process. Here, we describe the current advances in massive-scale variant screens, and how machine learning and mutagenesis strategies can be integrated to accelerate protein engineering. More specifically, we examine strategies to make screens more economical, informative, and effective in discovery of useful variants.
Affiliation(s)
- Hoi Yee Chu
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong 852, China
| | - Alan S. L. Wong
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Hong Kong 852, China
- Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong 852, China
|
43
|
Modulating Glycoside Hydrolase Activity between Hydrolysis and Transfer Reactions Using an Evolutionary Approach. Molecules 2021; 26:6586. [PMID: 34770995 PMCID: PMC8587830 DOI: 10.3390/molecules26216586] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/23/2021] [Revised: 10/27/2021] [Accepted: 10/28/2021] [Indexed: 01/02/2023] Open
Abstract
The proteins within the CAZy glycoside hydrolase family GH13 catalyze the hydrolysis of polysaccharides such as glycogen and starch. Many of these enzymes also perform transglycosylation in various degrees, ranging from secondary to predominant reactions. Identifying structural determinants associated with GH13 family reaction specificity is key to modifying and designing enzymes with increased specificity towards individual reactions for further applications in industrial, chemical, or biomedical fields. This work proposes a computational approach for decoding the determinant structural composition defining the reaction specificity. This method is based on the conservation of coevolving residues in spatial contacts associated with reaction specificity. To evaluate the algorithm, mutants of α-amylase (TmAmyA) and glucanotransferase (TmGTase) from Thermotoga maritima were constructed to modify the reaction specificity. The K98P/D99A/H222Q variant from TmAmyA doubled the transglycosylation/hydrolysis (T/H) ratio while the M279N variant from TmGTase increased the hydrolysis/transglycosylation ratio five-fold. Molecular dynamics simulations of the variants indicated changes in flexibility that can account for the modified T/H ratio. An essential contribution of the presented computational approach is its capacity to identify residues outside of the active center that affect the reaction specificity.
|
44
|
Zutz A, Hamborg L, Pedersen LE, Kassem MM, Papaleo E, Koza A, Herrgård MJ, Jensen SI, Teilum K, Lindorff-Larsen K, Nielsen AT. A dual-reporter system for investigating and optimizing protein translation and folding in E. coli. Nat Commun 2021; 12:6093. [PMID: 34667164 PMCID: PMC8526717 DOI: 10.1038/s41467-021-26337-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/13/2020] [Accepted: 10/01/2021] [Indexed: 01/29/2023] Open
Abstract
Strategies for investigating and optimizing the expression and folding of proteins for biotechnological and pharmaceutical purposes are in high demand. Here, we describe a dual-reporter biosensor system that simultaneously assesses in vivo protein translation and protein folding, thereby enabling rapid screening of mutant libraries. We have validated the dual-reporter system on five different proteins and find an excellent correlation between reporter signals and the levels of protein expression and solubility of the proteins. We further demonstrate the applicability of the dual-reporter system as a screening assay for deep mutational scanning experiments. The system enables high throughput selection of protein variants with high expression levels and altered protein stability. Next generation sequencing analysis of the resulting libraries of protein variants show a good correlation between computationally predicted and experimentally determined protein stabilities. We furthermore show that the mutational experimental data obtained using this system may be useful for protein structure calculations.
Affiliation(s)
- Ariane Zutz
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Louise Hamborg
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Lasse Ebdrup Pedersen
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Maher M Kassem
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Elena Papaleo
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Anna Koza
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Markus J Herrgård
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Sheila Ingemann Jensen
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark
| | - Kaare Teilum
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
| | - Alex Toftgaard Nielsen
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 220, 2800 Kgs, Lyngby, Denmark.
|
45
|
Sesta L, Uguzzoni G, Fernandez-de-Cossio-Diaz J, Pagnani A. AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational Approximated Landscape. Int J Mol Sci 2021; 22:10908. [PMID: 34681569 PMCID: PMC8535593 DOI: 10.3390/ijms222010908] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/28/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 01/12/2023] Open
Abstract
We present Annealed Mutational approximated Landscape (AMaLa), a new method to infer fitness landscapes from Directed Evolution experiments sequencing data. Such experiments typically start from a single wild-type sequence, which undergoes Darwinian in vitro evolution via multiple rounds of mutation and selection for a target phenotype. In the last years, Directed Evolution is emerging as a powerful instrument to probe fitness landscapes under controlled experimental conditions and as a relevant testing ground to develop accurate statistical models and inference algorithms (thanks to high-throughput screening and sequencing). Fitness landscape modeling either uses the enrichment of variants abundances as input, thus requiring the observation of the same variants at different rounds or assuming the last sequenced round as being sampled from an equilibrium distribution. AMaLa aims at effectively leveraging the information encoded in the whole time evolution. To do so, while assuming statistical sampling independence between sequenced rounds, the possible trajectories in sequence space are gauged with a time-dependent statistical weight consisting of two contributions: (i) an energy term accounting for the selection process and (ii) a generalized Jukes-Cantor model for the purely mutational step. This simple scheme enables accurately describing the Directed Evolution dynamics and inferring a fitness landscape that correctly reproduces the measures of the phenotype under selection (e.g., antibiotic drug resistance), notably outperforming widely used inference strategies. In addition, we assess the reliability of AMaLa by showing how the inferred statistical model could be used to predict relevant structural properties of the wild-type sequence.
Affiliation(s)
- Luca Sesta
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Guido Uguzzoni
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Jorge Fernandez-de-Cossio-Diaz
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Sorbonne Université, 24 rue Lhomond, 75005 Paris, France
- Center of Molecular Immunology, Systems Biology Department, Playa, Havana CP 11600, Cuba
- Andrea Pagnani
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo, Italy
- INFN, Sezione di Torino, I-10125 Torino, Italy
46
Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, co-evolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution-based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
Affiliation(s)
- Luis Sanchez-Pulido
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
- Chris P Ponting
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
47
Findlay GM. Linking genome variants to disease: scalable approaches to test the functional impact of human mutations. Hum Mol Genet 2021; 30:R187-R197. [PMID: 34338757 PMCID: PMC8490018 DOI: 10.1093/hmg/ddab219] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/19/2021] [Revised: 07/19/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022]
Abstract
The application of genomics to medicine has accelerated the discovery of mutations underlying disease and has enhanced our knowledge of the molecular underpinnings of diverse pathologies. As the amount of human genetic material queried via sequencing has grown exponentially in recent years, so too has the number of rare variants observed. Despite progress, our ability to distinguish which rare variants have clinical significance remains limited. Over the last decade, however, powerful experimental approaches have emerged to characterize variant effects orders of magnitude faster than before. Fueled by improved DNA synthesis and sequencing and, more recently, by CRISPR/Cas9 genome editing, multiplex functional assays provide a means of generating variant effect data in wide-ranging experimental systems. Here, I review recent applications of multiplex assays that link human variants to disease phenotypes and I describe emerging strategies that will enhance their clinical utility in coming years.
Affiliation(s)
- Gregory M Findlay
- The Francis Crick Institute, The Genome Function Laboratory, London NW1 1AT, UK
48
Luo Y, Jiang G, Yu T, Liu Y, Vo L, Ding H, Su Y, Qian WW, Zhao H, Peng J. ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat Commun 2021; 12:5743. [PMID: 34593817 PMCID: PMC8484459 DOI: 10.1038/s41467-021-25976-8] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 04/10/2021] [Accepted: 09/09/2021] [Indexed: 11/28/2022]
Abstract
Machine learning has been increasingly used for protein engineering. However, because the general sequence contexts they capture are not specific to the protein being engineered, the accuracy of existing machine learning algorithms is rather limited. Here, we report ECNet (evolutionary context-integrated neural network), a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. This algorithm integrates local evolutionary context from homologous sequences that explicitly model residue-residue epistasis for the protein of interest with the global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. As such, it enables accurate mapping from sequence to function and provides generalization from low-order mutants to higher-order mutants. Using ~50 deep mutational scanning and random mutagenesis datasets, we show that ECNet predicts the sequence-function relationship more accurately than existing machine learning algorithms. Moreover, we used ECNet to guide the engineering of TEM-1 β-lactamase and identified variants with improved ampicillin resistance with high success rates.
Affiliation(s)
- Yunan Luo
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
- Guangde Jiang
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
- Tianhao Yu
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
- Yang Liu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
- Lam Vo
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
- Hantian Ding
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
- Yufeng Su
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
- Wesley Wei Qian
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA
- Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA.
- Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA.
49
Atilgan AR, Atilgan C. Computational strategies for protein conformational ensemble detection. Curr Opin Struct Biol 2021; 72:79-87. [PMID: 34563946 DOI: 10.1016/j.sbi.2021.08.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/15/2021] [Revised: 08/13/2021] [Accepted: 08/17/2021] [Indexed: 01/18/2023]
Abstract
Protein function is constrained by the three-dimensional structure but is delineated by its dynamics. This framework must satisfy specificity of function along with adaptability to changing environments and evolvability under external constraints. The accessibility of the available regions of the energy landscape for a set of conditions and shifts in the populations upon their modulation have effects propagating across scales, from biomolecular interactions, to organisms, to populations. Developing the ability to detect and juggle protein conformations, supplemented by a physics-based understanding, has implications not only for in vivo problems but also for resistance impeding drug discovery and for bionano-sensor design.
Affiliation(s)
- Ali Rana Atilgan
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956, Istanbul, Turkey
- Canan Atilgan
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956, Istanbul, Turkey.
50
Griffith D, Holehouse AS. PARROT is a flexible recurrent neural network framework for analysis of large protein datasets. eLife 2021; 10:e70576. [PMID: 34533455 PMCID: PMC8448528 DOI: 10.7554/elife.70576] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/21/2021] [Accepted: 09/06/2021] [Indexed: 11/29/2022]
Abstract
The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
Affiliation(s)
- Daniel Griffith
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States