1
Wirnsberger G, Pritišanac I, Oberdorfer G, Gruber K. Flattening the curve-How to get better results with small deep-mutational-scanning datasets. Proteins 2024; 92:886-902. [PMID: 38501649 DOI: 10.1002/prot.26686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 02/24/2024] [Accepted: 03/07/2024] [Indexed: 03/20/2024]
Abstract
Proteins are used in various biotechnological applications, often requiring the optimization of protein properties by introducing specific amino-acid exchanges. Deep mutational scanning (DMS) is an effective high-throughput method for evaluating the effects of these exchanges on protein function. DMS data can then inform the training of a neural network to predict the impact of mutations. Most approaches use some representation of the protein sequence for training and prediction. As proteins are characterized by complex structures and intricate residue interaction networks, directly providing structural information as input reduces the need to learn these features from the data. We introduce a method for encoding protein structures as stacked 2D contact maps, which capture residue interactions, their evolutionary conservation, and mutation-induced interaction changes. Furthermore, we explored techniques to augment neural network training performance on smaller DMS datasets. To validate our approach, we trained three neural network architectures originally used for image analysis on three DMS datasets, and we compared their performances with networks trained solely on protein sequences. The results confirm the effectiveness of the protein structure encoding in machine learning efforts on DMS data. Using structural representations as direct input to the networks, along with data augmentation and pretraining, significantly reduced demands on training data size and improved prediction performance, especially on smaller datasets, while performance on large datasets was on par with state-of-the-art sequence convolutional neural networks. The methods presented here have the potential to provide the same workflow as DMS without the experimental and financial burden of testing thousands of mutants. 
Additionally, we present an open-source, user-friendly software tool to make these data analysis techniques accessible, particularly to biotechnology and protein engineering researchers who wish to apply them to their mutagenesis data.
Affiliation(s)
- Iva Pritišanac
  - Institute of Molecular Biology and Biochemistry, Medical University of Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
- Gustav Oberdorfer
  - BioTechMed-Graz, Graz, Austria
  - Institute of Biochemistry, Graz University of Technology, Graz, Austria
- Karl Gruber
  - Institute of Molecular Biosciences, University of Graz, Graz, Austria
  - BioTechMed-Graz, Graz, Austria
  - Field of Excellence BioHealth, University of Graz, Graz, Austria
2
Scrima S, Lambrughi M, Tiberti M, Fadda E, Papaleo E. ASM variants in the spotlight: A structure-based atlas for unraveling pathogenic mechanisms in lysosomal acid sphingomyelinase. Biochim Biophys Acta Mol Basis Dis 2024:167260. [PMID: 38782304 DOI: 10.1016/j.bbadis.2024.167260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 04/30/2024] [Accepted: 05/18/2024] [Indexed: 05/25/2024]
Abstract
Lysosomal acid sphingomyelinase (ASM), a critical enzyme in lipid metabolism encoded by the SMPD1 gene, plays a crucial role in sphingomyelin hydrolysis in lysosomes. ASM deficiency leads to acid sphingomyelinase deficiency, a rare genetic disorder with diverse clinical manifestations, and the protein can be found mutated in other diseases. We employed a structure-based framework to comprehensively understand the functional implications of ASM variants, integrating pathogenicity predictions with molecular insights derived from a molecular dynamics simulation in a lysosomal membrane environment. Our analysis, encompassing over 400 variants, establishes a structural atlas of missense variants of lysosomal ASM, associating mechanistic indicators with pathogenic potential. Our study highlights variants that influence structural stability or exert local and long-range effects at functional sites. To validate our predictions, we compared them to available experimental data on residual catalytic activity in 135 ASM variants. Notably, our findings also suggest applications of the resulting data for identifying cases suited for enzyme replacement therapy. This comprehensive approach enhances the understanding of ASM variants and provides valuable insights for potential therapeutic interventions.
Affiliation(s)
- Simone Scrima
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark; Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Lambrughi
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Matteo Tiberti
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elisa Fadda
  - Department of Chemistry and Hamilton Institute, Maynooth University, Maynooth, Co. Kildare, Ireland
- Elena Papaleo
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark; Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark
3
Hoskins I, Rao S, Tante C, Cenik C. Integrated multiplexed assays of variant effect reveal determinants of catechol-O-methyltransferase gene expression. Mol Syst Biol 2024; 20:481-505. [PMID: 38355921 PMCID: PMC11066095 DOI: 10.1038/s44320-024-00018-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 01/16/2024] [Accepted: 01/18/2024] [Indexed: 02/16/2024] Open
Abstract
Multiplexed assays of variant effect are powerful methods to profile the consequences of rare variants on gene expression and organismal fitness. Yet, few studies have integrated several multiplexed assays to map variant effects on gene expression in coding sequences. Here, we pioneered a multiplexed assay based on polysome profiling to measure variant effects on translation at scale, uncovering single-nucleotide variants that increase or decrease ribosome load. By combining high-throughput ribosome load data with multiplexed mRNA and protein abundance readouts, we mapped the cis-regulatory landscape of thousands of catechol-O-methyltransferase (COMT) variants from RNA to protein and found numerous coding variants that alter COMT expression. Finally, we trained machine learning models to map signatures of variant effects on COMT gene expression and uncovered both directional and divergent impacts across expression layers. Our analyses reveal expression phenotypes for thousands of variants in COMT and highlight variant effects on both single and multiple layers of expression. Our findings prompt future studies that integrate several multiplexed assays for the readout of gene expression.
Affiliation(s)
- Ian Hoskins
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Shilpa Rao
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Charisma Tante
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Can Cenik
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
4
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024] Open
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or a gold standard for comparison. Hence, benchmarking the different tools depends highly on the input data, indicating that overfitting is still a massive problem. One of the solutions is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While the knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or alterations in a vacuum without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, the research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we suggest and recommend directions in the field to avoid silo-research, moving towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
5
Araujo NA, Bubis J. Analysis of a Novel Peptide That Is Capable of Inhibiting the Enzymatic Activity of the Protein Kinase A Catalytic Subunit-Like Protein from Trypanosoma equiperdum. Protein J 2023; 42:709-727. [PMID: 37713008 DOI: 10.1007/s10930-023-10153-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/15/2023] [Indexed: 09/16/2023]
Abstract
A 26-residue peptide possessing the αN-helix motif of the protein kinase A (PKA) regulatory subunit-like proteins from the Trypanozoon subgenus (VAP26, sequence = VAPYFEKSEDETALILKLLTYNVLFS) was shown to inhibit the enzymatic activity of the Trypanosoma equiperdum PKA catalytic subunit-like protein, in a manner similar to the mammalian heat-stable soluble PKA inhibitor known as PKI. However, VAP26 does not contain the PKI inhibitory sequence. Bioinformatics analyses of the αN-helix motif from various Trypanozoon PKA regulatory subunit-like proteins suggested that the sequence could form favorable peptide-protein interactions of hydrophobic nature with the PKA catalytic subunit-like protein, which may represent an alternative PKA inhibitory mechanism. The sequence of the αN-helix motif of the Trypanozoon proteins was shown to be highly homologous but significantly divergent from the corresponding αN-helix motifs of their Leishmania and mammalian counterparts. This sequence divergence contrasted with the proposed secondary structure of the αN-helix motif, which appeared conserved in every analyzed regulatory subunit-like protein. In silico mutation experiments at positions I234, L238 and F244 of the αN-helix motif from the Trypanozoon proteins destabilized both the specific motif and the protein. In contrast, mutations at positions T239 and Y240 stabilized the motif and the protein. These results suggested that the αN-helix motif from the Trypanozoon proteins probably followed a different evolutionary path than their Leishmania and mammalian counterparts. Moreover, finding stabilizing mutations indicated that new inhibitory peptides may be designed based on the αN-helix motif from the Trypanozoon PKA regulatory subunit-like proteins.
Affiliation(s)
- Nelson A Araujo
  - Escuela de Ciencias Agroalimentarias, Animales y Ambientales, Universidad de O'Higgins, Campus Colchagua, ruta I-90, Km 3, San Fernando, Chile
- José Bubis
  - Unidad de Polimorfismo Genético, Genómica y Proteómica, Dirección de Salud, Fundación Instituto de Estudios Avanzados IDEA, Caracas, 1015-A, Venezuela
  - Unidad de Señalización Celular y Bioquímica de Parásitos, Dirección de Salud, Fundación Instituto de Estudios Avanzados IDEA, Caracas, 1015-A, Venezuela
  - Departamento de Biología Celular, Universidad Simón Bolívar, Apartado 89.000, Caracas, 1081‑A, Venezuela
6
Luo Y, Ma X, Qiu Y, Lu Y, Shen S, Li Y, Gao H, Chen K, Zhou J, Hu T, Tu L, Zhao H, Li D, Leng F, Gao W, Jiang T, Liu C, Huang L, Wu R, Tong Y. Structural and Catalytic Insight into the Unique Pentacyclic Triterpene Synthase TwOSC. Angew Chem Int Ed Engl 2023; 62:e202313429. [PMID: 37840440 DOI: 10.1002/anie.202313429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/17/2023]
Abstract
The oxidosqualene cyclase (OSC) catalyzed cyclization of the linear substrate (3S)-2,3-oxidosqualene to form diverse pentacyclic triterpenoid (PT) skeletons is one of the most complex reactions in nature. Friedelin has a unique PT skeleton involving a fascinating nine-step cation shuttle run (CSR) cascade rearrangement reaction, in which the carbocation formed at C2 moves to the other side of the skeleton, runs back to C3 to yield a friedelin cation, which is finally deprotonated. However, as crystal structure data of plant OSCs are lacking, it remains unknown why the CSR cascade reactions occur in friedelin biosynthesis, as does the exact catalytic mechanism of the CSR. In this study, we determined the first cryogenic electron microscopy structure of a plant OSC, friedelin synthase, from Tripterygium wilfordii Hook. f (TwOSC). We also performed quantum mechanics/molecular mechanics simulations to reveal the energy profile for the CSR cascade reaction and identify key residues crucial for PT skeleton formation. Furthermore, we semirationally designed two TwOSC mutants, which significantly improved the yields of friedelin and β-amyrin, respectively.
Affiliation(s)
- Yunfeng Luo
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Xiaoli Ma
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Yufan Qiu
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China
- Yun Lu
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Siyu Shen
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Yang Li
  - School of Pharmaceutical Sciences, Capital Medical University, Beijing, 100069, China
- Haiyun Gao
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Kang Chen
  - State Key Laboratory Breeding Base of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Jiawei Zhou
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Tianyuan Hu
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Lichan Tu
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Huan Zhao
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Dan Li
  - School of Pharmaceutical Sciences, Capital Medical University, Beijing, 100069, China
- Faqiang Leng
  - School of Pharmaceutical Sciences, Capital Medical University, Beijing, 100069, China
- Wei Gao
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Tao Jiang
  - National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Changli Liu
  - School of Traditional Chinese Medicine, Capital Medical University, Beijing, 100069, China
- Luqi Huang
  - State Key Laboratory Breeding Base of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Ruibo Wu
  - School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou, 510006, China
- Yuru Tong
  - School of Pharmaceutical Sciences, Capital Medical University, Beijing, 100069, China
7
Pan Q, Portelli S, Nguyen TB, Ascher DB. Characterization on the oncogenic effect of the missense mutations of p53 via machine learning. Brief Bioinform 2023; 25:bbad428. [PMID: 38018912 PMCID: PMC10685404 DOI: 10.1093/bib/bbad428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open
Abstract
Dysfunctions caused by missense mutations in the tumour suppressor p53 have been extensively shown to be a leading driver of many cancers. Unfortunately, it is time-consuming and labour-intensive to experimentally elucidate the effects of all possible missense variants. Recent works presented a comprehensive dataset and machine learning model to predict the functional outcome of mutations in p53. Despite the well-established dataset and precise predictions, this tool was trained on a complicated model with limited predictions on p53 mutations. In this work, we first used computational biophysical tools to investigate the functional consequences of missense mutations in p53, revealing that deleterious mutations are biased toward destabilizing effects. Combining these insights with experimental assays, we present two interpretable machine learning models leveraging both experimental assays and in silico biophysical measurements to accurately predict the functional consequences on p53 and validate their robustness on clinical data. Our final model, based on nine features, obtained predictive performance comparable to the state-of-the-art p53-specific method and outperformed other generalized, widely used predictors. Interpreting our models revealed that information on residue p53 activity, polar atom distances and changes in p53 stability were instrumental in the decisions, consistent with the biased properties of deleterious mutations. Our predictions have been computed for all possible missense mutations in p53, offering clinical diagnostic utility, which is crucial for patient monitoring and the development of personalized cancer treatment.
Affiliation(s)
- Qisheng Pan
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Thanh Binh Nguyen
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- David B Ascher
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
8
Hoskins I, Rao S, Tante C, Cenik C. Integrated multiplexed assays of variant effect reveal cis-regulatory determinants of catechol-O-methyltransferase gene expression. bioRxiv 2023:2023.08.02.551517. [PMID: 38014045 PMCID: PMC10680568 DOI: 10.1101/2023.08.02.551517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Multiplexed assays of variant effect are powerful methods to profile the consequences of rare variants on gene expression and organismal fitness. Yet, few studies have integrated several multiplexed assays to map variant effects on gene expression in coding sequences. Here, we pioneered a multiplexed assay based on polysome profiling to measure variant effects on translation at scale, uncovering single-nucleotide variants that increase and decrease ribosome load. By combining high-throughput ribosome load data with multiplexed mRNA and protein abundance readouts, we mapped the cis-regulatory landscape of thousands of catechol-O-methyltransferase (COMT) variants from RNA to protein and found numerous coding variants that alter COMT expression. Finally, we trained machine learning models to map signatures of variant effects on COMT gene expression and uncovered both directional and divergent impacts across expression layers. Our analyses reveal expression phenotypes for thousands of variants in COMT and highlight variant effects on both single and multiple layers of expression. Our findings prompt future studies that integrate several multiplexed assays for the readout of gene expression.
Affiliation(s)
- Ian Hoskins
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Shilpa Rao
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Charisma Tante
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Can Cenik
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
9
van Loggerenberg W, Sowlati-Hashjin S, Weile J, Hamilton R, Chawla A, Sheykhkarimli D, Gebbia M, Kishore N, Frésard L, Mustajoki S, Pischik E, Di Pierro E, Barbaro M, Floderus Y, Schmitt C, Gouya L, Colavin A, Nussbaum R, Friesema ECH, Kauppinen R, To-Figueras J, Aarsand AK, Desnick RJ, Garton M, Roth FP. Systematically testing human HMBS missense variants to reveal mechanism and pathogenic variation. Am J Hum Genet 2023; 110:1769-1786. [PMID: 37729906 PMCID: PMC10577081 DOI: 10.1016/j.ajhg.2023.08.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/07/2023] [Revised: 08/15/2023] [Accepted: 08/21/2023] [Indexed: 09/22/2023] Open
Abstract
Defects in hydroxymethylbilane synthase (HMBS) can cause acute intermittent porphyria (AIP), an acute neurological disease. Although sequencing-based diagnosis can be definitive, ∼⅓ of clinical HMBS variants are missense variants, and most clinically reported HMBS missense variants are designated as "variants of uncertain significance" (VUSs). Using saturation mutagenesis, en masse selection, and sequencing, we applied a multiplexed validated assay to both the erythroid-specific and ubiquitous isoforms of HMBS, obtaining confident functional impact scores for >84% of all possible amino acid substitutions. The resulting variant effect maps generally agreed with biochemical expectations and provide further evidence that HMBS can function as a monomer. Additionally, the maps implicated specific residues as having roles in active site dynamics, which was further supported by molecular dynamics simulations. Most importantly, these maps can help discriminate pathogenic from benign HMBS variants, proactively providing evidence even for yet-to-be-observed clinical missense variants.
Affiliation(s)
- Warren van Loggerenberg
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Jochen Weile
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Rayna Hamilton
  - Advanced Academic Programs, Johns Hopkins University, Washington, DC 20036, USA
- Aditya Chawla
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Dayag Sheykhkarimli
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Marinella Gebbia
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Nishka Kishore
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
- Sami Mustajoki
  - Research Program in Molecular Medicine, Biomedicum-Helsinki, University of Helsinki, 00290 Helsinki, Finland
- Elena Pischik
  - Research Program in Molecular Medicine, Biomedicum-Helsinki, University of Helsinki, 00290 Helsinki, Finland
- Elena Di Pierro
  - Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Unit of Medicine and Metabolic Diseases, 20122 Milano, Italy
- Michela Barbaro
  - Porphyria Centre Sweden, Centre for Inherited Metabolic Diseases, Karolinska Institutet, Karolinska University Hospital, 17176 Stockholm, Sweden
- Ylva Floderus
  - Porphyria Centre Sweden, Centre for Inherited Metabolic Diseases, Karolinska Institutet, Karolinska University Hospital, 17176 Stockholm, Sweden
- Caroline Schmitt
  - Centre français des porphyries, hôpital Louis-Mourier, Assistance Publique-Hopitaux de Paris, 92701 Colombes, France; Centre de recherche sur l'inflammation, Université Paris Cité, UMR1149 INSERM, 75018 Paris, France
- Laurent Gouya
  - Centre français des porphyries, hôpital Louis-Mourier, Assistance Publique-Hopitaux de Paris, 92701 Colombes, France; Centre de recherche sur l'inflammation, Université Paris Cité, UMR1149 INSERM, 75018 Paris, France
- Edith C H Friesema
  - Porphyria Expertcenter Rotterdam, Center for Lysosomal and Metabolic Diseases, Department of Internal Medicine, Erasmus MC, 3015 Rotterdam, the Netherlands
- Raili Kauppinen
  - Research Program in Molecular Medicine, Biomedicum-Helsinki, University of Helsinki, 00290 Helsinki, Finland
- Jordi To-Figueras
  - Biochemistry and Molecular Genetics Department, Hospital Clínic, IDIBAPS, University of Barcelona, 08036 Barcelona, Spain
- Aasne K Aarsand
  - Norwegian Porphyria Centre, Department of Medical Biochemistry and Pharmacology, Haukeland University Hospital, 5021 Bergen, Norway
- Robert J Desnick
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Michael Garton
  - Institute of Biomedical Engineering, University of Toronto, Toronto, ON M5S 3G9, Canada
- Frederick P Roth
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
10
Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, Pritzel A, Wong LH, Zielinski M, Sargeant T, Schneider RG, Senior AW, Jumper J, Hassabis D, Kohli P, Avsec Ž. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 2023; 381:eadg7492. [PMID: 37733863 DOI: 10.1126/science.adg7492] [Citation(s) in RCA: 189] [Impact Index Per Article: 189.0] [Received: 01/19/2023] [Accepted: 08/23/2023] [Indexed: 09/23/2023]
Abstract
The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.
11
Jessen-Howard D, Pan Q, Ascher DB. Identifying the Molecular Drivers of Pathogenic Aldehyde Dehydrogenase Missense Mutations in Cancer and Non-Cancer Diseases. Int J Mol Sci 2023; 24:10157. [PMID: 37373306 DOI: 10.3390/ijms241210157] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
Human aldehyde dehydrogenases (ALDHs), comprising 19 isoenzymes, play a vital role in both endogenous and exogenous aldehyde metabolism. This NAD(P)-dependent catalytic process relies on intact cofactor binding, substrate interaction, and oligomerization of ALDHs. Disruptions of ALDH activity, however, could result in the accumulation of cytotoxic aldehydes, which have been linked with a wide range of diseases, including cancers as well as neurological and developmental disorders. In our previous works, we have successfully characterised the structure-function relationships of the missense variants of other proteins. We, therefore, applied a similar analysis pipeline to identify potential molecular drivers of pathogenic ALDH missense mutations. Variant data were first carefully curated and labelled as cancer-risk, non-cancer diseases, and benign. We then leveraged various computational biophysical methods to describe the changes caused by missense mutations, revealing that detrimental mutations are biased toward destabilising effects. Building on these insights, several machine learning approaches were further utilised to investigate the combination of features, revealing the necessity of the conservation of ALDHs. Our work aims to provide important biological perspectives on pathogenic consequences of missense mutations of ALDHs, which could be invaluable resources in the development of cancer treatment.
Affiliation(s)
- Dana Jessen-Howard
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Qisheng Pan
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- David B Ascher
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
|
12
|
Fu Y, Bedő J, Papenfuss AT, Rubin AF. Integrating deep mutational scanning and low-throughput mutagenesis data to predict the impact of amino acid variants. Gigascience 2023; 12:giad073. [PMID: 37721410 PMCID: PMC10506130 DOI: 10.1093/gigascience/giad073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 07/02/2023] [Accepted: 08/23/2023] [Indexed: 09/19/2023] Open
Abstract
BACKGROUND Evaluating the impact of amino acid variants has been a critical challenge for studying protein function and interpreting genomic data. High-throughput experimental methods like deep mutational scanning (DMS) can measure the effect of large numbers of variants in a target protein, but because DMS studies have not been performed on all proteins, researchers also model DMS data computationally to estimate variant impacts by predictors. RESULTS In this study, we extended a linear regression-based predictor to explore whether incorporating data from alanine scanning (AS), a widely used low-throughput mutagenesis method, would improve prediction results. To evaluate our model, we collected 146 AS datasets, mapping to 54 DMS datasets across 22 distinct proteins. CONCLUSIONS We show that improved model performance depends on the compatibility of the DMS and AS assays, and the scale of improvement is closely related to the correlation between DMS and AS results.
Affiliation(s)
- Yunfan Fu
  - The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
  - The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
- Justin Bedő
  - The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
  - The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
- Anthony T Papenfuss
  - The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
  - The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
  - Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia
- Alan F Rubin
  - The Walter and Eliza Hall Institute of Medical Research, Bioinformatics Division, 1G Royal Pde, Parkville, Victoria 3052, Australia
  - The University of Melbourne, Department of Medical Biology, Parkville, Victoria 3010, Australia
|
13
|
Anderson CL, Munawar S, Reilly L, Kamp TJ, January CT, Delisle BP, Eckhardt LL. How Functional Genomics Can Keep Pace With VUS Identification. Front Cardiovasc Med 2022; 9:900431. [PMID: 35859585 PMCID: PMC9291992 DOI: 10.3389/fcvm.2022.900431] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/20/2022] [Accepted: 06/09/2022] [Indexed: 01/03/2023] Open
Abstract
Over the last two decades, an exponentially expanding number of genetic variants have been identified associated with inherited cardiac conditions. These tremendous gains also present challenges in deciphering the clinical relevance of unclassified variants or variants of uncertain significance (VUS). This review provides an overview of the advancements (and challenges) in functional and computational approaches to characterize variants and help keep pace with VUS identification related to inherited heart diseases.
Affiliation(s)
- Corey L. Anderson
  - Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Saba Munawar
  - Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Louise Reilly
  - Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Timothy J. Kamp
  - Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Craig T. January
  - Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Brian P. Delisle
  - Department of Physiology, University of Kentucky College of Medicine, Lexington, KY, United States
- Lee L. Eckhardt
  - Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI, United States
|
14
|
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Kevin Wilhelm
  - Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
|
15
|
Trivedi VD, Chappell TC, Krishna NB, Shetty A, Sigamani GG, Mohan K, Ramesh A, R PK, Nair NU. In-Depth Sequence–Function Characterization Reveals Multiple Pathways to Enhance Enzymatic Activity. ACS Catal 2022. [DOI: 10.1021/acscatal.1c05508] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022]
Affiliation(s)
- Vikas D. Trivedi
  - Department of Chemical and Biological Engineering, Tufts University, Medford, Massachusetts 02155, United States
- Todd C. Chappell
  - Department of Chemical and Biological Engineering, Tufts University, Medford, Massachusetts 02155, United States
- Anuj Shetty
  - Kcat Enzymatic Private Limited, Bengaluru, Karnataka, India 560005
- Karishma Mohan
  - Department of Chemical and Biological Engineering, Tufts University, Medford, Massachusetts 02155, United States
- Athreya Ramesh
  - Department of Chemical and Biological Engineering, Tufts University, Medford, Massachusetts 02155, United States
- Pravin Kumar R
  - Kcat Enzymatic Private Limited, Bengaluru, Karnataka, India 560005
- Nikhil U. Nair
  - Department of Chemical and Biological Engineering, Tufts University, Medford, Massachusetts 02155, United States
|
16
|
Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022; 23:40-55. [PMID: 34518686 DOI: 10.1038/s41580-021-00407-0] [Citation(s) in RCA: 468] [Impact Index Per Article: 234.0] [Accepted: 07/23/2021] [Indexed: 02/08/2023]
Abstract
The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed.
Affiliation(s)
- Joe G Greener
  - Department of Computer Science, University College London, London, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, London, UK
- Lewis Moffat
  - Department of Computer Science, University College London, London, UK
- David T Jones
  - Department of Computer Science, University College London, London, UK
|
17
|
Intelligent host engineering for metabolic flux optimisation in biotechnology. Biochem J 2021; 478:3685-3721. [PMID: 34673920 PMCID: PMC8589332 DOI: 10.1042/bcj20210535] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/25/2021] [Revised: 09/22/2021] [Accepted: 09/24/2021] [Indexed: 12/13/2022]
Abstract
Optimising the function of a protein of length N amino acids by directed evolution involves navigating a 'search space' of possible sequences of some 20^N. Optimising the expression levels of P proteins that materially affect host performance, each of which might also take 20 (logarithmically spaced) values, implies a similar search space of 20^P. In this combinatorial sense, then, the problems of directed protein evolution and of host engineering are broadly equivalent. In practice, however, they have different means for avoiding the inevitable difficulties of implementation. The spare capacity exhibited in metabolic networks implies that host engineering may admit substantial increases in flux to targets of interest. Thus, we rehearse the relevant issues for those wishing to understand and exploit those modern genome-wide host engineering tools and thinking that have been designed and developed to optimise fluxes towards desirable products in biotechnological processes, with a focus on microbial systems. The aim throughout is 'making such biology predictable'. Strategies have been aimed at both transcription and translation, especially for regulatory processes that can affect multiple targets. However, because there is a limit on how much protein a cell can produce, increasing kcat in selected targets may be a better strategy than increasing protein expression levels for optimal host engineering.
|
18
|
Harnessing the Genetic Plasticity of Porcine Circovirus Type 2 to Target Suicidal Replication. Viruses 2021; 13:1676. [PMID: 34578257 PMCID: PMC8473201 DOI: 10.3390/v13091676] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/29/2021] [Revised: 08/17/2021] [Accepted: 08/19/2021] [Indexed: 12/22/2022] Open
Abstract
Porcine circovirus type 2 (PCV2), the causative agent of a wasting disease in weanling piglets, has periodically evolved into several new subtypes since its discovery, indicating that the efficacy of current vaccines can be improved. Although PCV2 is a DNA virus, its mutation rates resemble those of RNA viruses. We tested the hypothesis that recoding selected serine and leucine codons in the PCV2b capsid gene could produce stop codons through mutations occurring during viral replication, and thus result in rapid attenuation. Vaccination of weanling pigs with the suicidal vaccine constructs elicited strong virus-neutralizing antibody responses. Vaccination prevented lesions, body-weight loss, and viral replication on challenge with a heterologous PCV2d strain. The suicidal PCV2 vaccine construct was not detectable in the sera of vaccinated pigs at 14 days post-vaccination, indicating that the attenuated vaccine was very safe. Exposure of the modified virus to immune selection pressure with sub-neutralizing levels of antibodies resulted in 5 of the 22 target codons mutating to a stop signal. Thus, the described approach for the rapid attenuation of PCV2 was both effective and safe. It can be readily adapted to newly emerging viruses with high mutation rates to meet the current need for improved platforms for rapid-response vaccines.
|