1
Gnanaolivu R, Hart SN. Using AI-predicted protein structures as a reference to predict loss-of-function activity in tumor suppressor breast cancer genes. Comput Struct Biotechnol J 2024; 23:3472-3480. [PMID: 39430403 PMCID: PMC11490748 DOI: 10.1016/j.csbj.2024.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 10/03/2024] [Accepted: 10/03/2024] [Indexed: 10/22/2024]
Abstract
Background: Most missense variants in the tumor suppressor breast cancer genes BRCA1, BRCA2, PALB2, and RAD51C remain unclassified with respect to loss-of-function (LOF) activity, which confounds clinical actionability. Because these variants are rare, they are difficult to classify, and clinicians rely on in silico predictive methods. Changes in protein stability are associated with function, which makes stability predictors valuable. However, predicting the stability changes caused by missense variants requires high-resolution protein structures, and such structures are often unavailable. This study explores using generative AI to predict high-resolution protein structures, which can then be analyzed with in silico protein stability prediction methods to assess LOF activity in ordered regions of the protein. It also identifies which in silico protein stability methods and which dedicated in silico missense prediction methods in the dbNSFP v4.7 database are best suited to predicting LOF activity in ordered regions of these four genes. Functional classifications from homology-directed DNA repair (HDR) assays and variant classifications from the ClinVar database provide a reliable dataset for evaluating the performance of these methods.
Results: When analyzed with the protein stability tool FoldX, complex AlphaFold2 structures of the BRCA1 C-terminal (BRCT) domain and the DNA-binding (DB) domain of BRCA2 predicted LOF activity from missense variants in ordered regions significantly better than experimentally derived structures. With AlphaFold2 structures, the BRCT domain achieved areas under the curve (AUCs) of 0.861 (95% CI: 0.858-0.863) and 0.842 (95% CI: 0.840-0.845), and the DB domain achieved an AUC of 0.836 (95% CI: 0.832-0.841). With experimentally derived structures, the BRCT domain achieved AUCs of 0.847 (95% CI: 0.844-0.850) and 0.835 (95% CI: 0.832-0.837), and the DB domain achieved an AUC of 0.830 (95% CI: 0.821-0.832). Protein stability did not predict LOF activity from missense variants better than dedicated in silico missense predictors. Among all in silico missense predictors in the dbNSFP database, AlphaMissense ranked highest, with an average AUC of 0.890 (95% CI: 0.886-0.895) in ordered regions across the four genes.
Conclusions: AI-predicted protein structures can outperform experimentally derived structures for evaluating LOF activity from predicted protein stability in ordered regions of BRCA1, BRCA2, PALB2, and RAD51C. The study also identifies AlphaMissense as the leading in silico missense prediction method for predicting LOF activity from missense variants in these four tumor suppressor breast cancer genes. The code for this study is freely available on GitHub (https://github.com/rohandavidg/CarePred).
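The AUCs and 95% confidence intervals above summarize how well a continuous score separates LOF variants from functional ones. Examples of such a score are a FoldX ΔΔG and an AlphaMissense score. The sketch below is an illustration only and is not the CarePred code. It computes the AUC as the Mann-Whitney probability that a randomly chosen LOF variant scores higher than a randomly chosen functional variant. It assumes a percentile bootstrap for the interval, because the abstract does not state how the CIs were obtained. The function names and the example scores and labels are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive (label 1, e.g. LOF)
    scores higher than a random negative (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) CI for the AUC. The resampling scheme
    is an assumption made for illustration."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]  # resample variants with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples that lost one class entirely
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lo_idx = int(round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Hypothetical ddG values (higher = more destabilizing, as in FoldX) and LOF labels.
    ddg = [3.1, 0.2, 2.4, -0.5, 1.8, 0.9, 4.0, 0.1]
    lof = [1, 0, 1, 0, 1, 0, 1, 0]
    print(auc(ddg, lof))  # 1.0: every LOF variant outscores every functional one
    print(bootstrap_auc_ci(ddg, lof))
```

The pairwise loop is quadratic in the number of variants. That cost is fine for domain-sized variant sets, and it keeps the link between the AUC and its rank interpretation explicit.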
Affiliation(s)
- Rohan Gnanaolivu
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Steven N. Hart
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States
2
Li D, Zhu Y, Zhang W, Liu J, Yang X, Liu Z, Wei D. AI Prediction of Structural Stability of Nanoproteins Based on Structures and Residue Properties by Mean Pooled Dual Graph Convolutional Network. Interdiscip Sci 2024:10.1007/s12539-024-00662-7. [PMID: 39367992 DOI: 10.1007/s12539-024-00662-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Revised: 09/18/2024] [Accepted: 09/22/2024] [Indexed: 10/07/2024]
Abstract
The structural stability of proteins is an important topic in fields such as biotechnology, pharmaceuticals, and enzymology, and understanding it is crucial for protein design. In pursuing high thermodynamic stability and rigidity, artificial design inevitably sacrifices biological functions that depend on protein flexibility. The highest thermodynamic stability is not always optimal for a protein to perform its biological function. Obtaining stable protein structures often requires extensive theoretical and experimental screening. It is therefore critically important to develop a stability prediction model that balances protein stability and bioactivity. To design protein drugs with better functionality across a broader structural space, this study developed a novel protein structural stability predictor, PSSP. PSSP is a mean-pooled dual graph convolutional network (GCN) model. It uses the sequence characteristics, secondary structure, distance matrix, graph, and residue properties of a nanoprotein to predict and judge its stability rapidly. The model is highly robust in predicting the structural stability of nanoproteins. Compared with previous artificial intelligence algorithms, it can rapidly and accurately assess the structural stability of artificially designed proteins, which shows great promise for advancing robust protein design.
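The abstract names PSSP's architecture, a mean-pooled dual GCN, without giving its layers. The sketch below shows only two generic building blocks under that name. The first is one graph-convolution layer in the widely used symmetric-normalized form, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The second is mean pooling of the residue embeddings into a single protein-level vector. The layer form, the toy contact map, the features and the weights are illustrative assumptions. The sketch does not reproduce the dual-branch PSSP model.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution, ReLU(D^-1/2 (A + I) D^-1/2 H W), over a residue graph."""
    n = len(adj)
    # Self-loops let each residue keep its own features during message passing.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization keeps feature scales comparable across residues.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]


def mean_pool(h: Matrix) -> List[float]:
    """Average the residue embeddings into a single graph-level vector."""
    return [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]


if __name__ == "__main__":
    # Toy 3-residue chain: residue 1 contacts residues 0 and 2 (hypothetical contact map).
    adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # two made-up features per residue
    w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights stand in for learned ones
    print(mean_pool(gcn_layer(adj, h, w)))
```

Mean pooling makes the protein-level vector independent of residue ordering and of the number of residues. That property is what lets one predictor handle proteins of different lengths.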
Affiliation(s)
- Daixi Li
  - Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
  - Pengcheng Laboratory, Shenzhen, 518055, China
- Yuqi Zhu
  - Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Wujie Zhang
  - Chemical and Biomolecular Engineering Program, Physics and Chemistry Department, Milwaukee School of Engineering, Milwaukee, 53202, USA
- Jing Liu
  - Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Xiaochen Yang
  - Institute of Biothermal Engineering, University of Shanghai for Science and Technology, Shanghai, 20093, China
- Zhihong Liu
  - Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
- Dongqing Wei
  - Pengcheng Laboratory, Shenzhen, 518055, China
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
3
Sun X, Yang S, Wu Z, Su J, Hu F, Chang F, Li C. PMSPcnn: Predicting protein stability changes upon single point mutations with convolutional neural network. Structure 2024; 32:838-848.e3. [PMID: 38508191 DOI: 10.1016/j.str.2024.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/19/2023] [Accepted: 02/22/2024] [Indexed: 03/22/2024]
Abstract
Protein missense mutations and the resulting changes in protein stability are important causes of many human genetic diseases. However, accurately predicting the stability changes caused by mutations remains challenging. To address this problem, we developed PMSPcnn, an unbiased and effective model based on a convolutional neural network. We used the anti-symmetry property to build a balanced training dataset, which improves prediction, in particular for stabilizing mutations. Persistent homology, an effective approach for characterizing protein structures, is used to obtain topological features. In addition, we propose a regression stratification cross-validation scheme to improve prediction for mutations with extreme ΔΔG. On three test datasets (Ssym, p53, and myoglobin), PMSPcnn performs better than currently existing predictors. PMSPcnn also outperforms currently available methods for membrane proteins. Overall, PMSPcnn is a promising method for predicting protein stability changes caused by single-point mutations.
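The anti-symmetry property says that the reverse mutation should produce the opposite stability change: ΔΔG(mutant→wild type) = -ΔΔG(wild type→mutant). A common way to use this property for balancing is to add each mutation's reverse with the sign of ΔΔG flipped. The sketch below illustrates that idea. The `Mutation` record layout, the field names and the example values are hypothetical. The sketch also leaves out something a real pipeline such as PMSPcnn would need: features computed from the mutant structure for each reverse entry.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mutation:
    structure_id: str  # hypothetical identifier, e.g. a PDB code
    position: int      # residue number
    wild_type: str     # one-letter code of the original residue
    mutant: str        # one-letter code of the substituted residue
    ddg: float         # stability change; sign convention depends on the dataset


def add_reverse_mutations(data: List[Mutation]) -> List[Mutation]:
    """Append, for every mutation, its reverse (mutant -> wild type) with the sign
    of ddG flipped, as the anti-symmetry property ddG(B->A) = -ddG(A->B) implies."""
    reverse = [
        Mutation(m.structure_id, m.position, m.mutant, m.wild_type, -m.ddg)
        for m in data
    ]
    return data + reverse


if __name__ == "__main__":
    forward = [  # made-up values, assuming positive ddG = destabilizing
        Mutation("1ABC", 42, "L", "A", 1.2),
        Mutation("1ABC", 57, "G", "V", 0.8),
        Mutation("1ABC", 90, "K", "R", -0.3),
    ]
    balanced = add_reverse_mutations(forward)
    print(sum(m.ddg > 0 for m in balanced), sum(m.ddg < 0 for m in balanced))  # 3 3
```

The forward set here is skewed toward positive ΔΔG (two of three entries). After augmentation, positive and negative entries are equally represented. This is the imbalance the abstract says the anti-symmetry property helps correct for stabilizing mutations.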
Affiliation(s)
- Xiaohan Sun
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Shuang Yang
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fubin Chang
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
4
Liu P, Cai J, Tian H, Li J, Lu L, Xu M, Zhu X, Fu X, Wang X, Zhong H, Jia R, Ge Y, Zhu Y, Zeng M, Xu J. Characteristics of SARS-CoV-2 Omicron BA.5 variants in Shanghai after ending the zero-COVID policy in December 2022: a clinical and genomic analysis. Front Microbiol 2024; 15:1372078. [PMID: 38605705 PMCID: PMC11007228 DOI: 10.3389/fmicb.2024.1372078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 03/15/2024] [Indexed: 04/13/2024]
Abstract
Introduction: An unprecedented surge of Omicron infections appeared nationwide in China in December 2022 after the adjustment of the COVID-19 response policy. Here, we report the clinical and genomic characteristics of SARS-CoV-2 infections among children in Shanghai during this outbreak.
Methods: A total of 64 children with symptomatic COVID-19 were enrolled. SARS-CoV-2 whole-genome sequences were obtained using next-generation sequencing (NGS). Patient demographics and clinical characteristics were compared between variants. The phylogenetic tree, the mutation spectrum, and the impact of unique mutations on SARS-CoV-2 proteins were analysed in silico.
Results: Genomic monitoring revealed that the emerging BA.5.2.48 and BF.7.14 were the dominant variants. Among patients without comorbidities, those infected with BA.5.2.48 more frequently experienced vomiting/diarrhea and less frequently presented with cough than those infected with BF.7.14. Relative to their parental lineages, BA.5.2.48 carried a high-frequency unique non-synonymous mutation (N:Q241K), and BF.7.14 carried four (nsp2:V94L, nsp12:L247F, S:C1243F, ORF7a:H47Y). Of these mutations, S:C1243F, nsp12:L247F, and ORF7a:H47Y were predicted to have a deleterious effect on protein function. In addition, nsp2:V94L and nsp12:L247F were predicted to destabilize their respective proteins.
Discussion: Further in vitro and in vivo studies are needed to verify the role of these specific mutations in viral fitness. Continuous genomic monitoring and clinical assessment of emerging variants will also remain crucial for an effective response to the ongoing COVID-19 pandemic.
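The mutation labels above use a compact protein:XnY pattern. X is the wild-type residue, n is its position and Y is the substituted residue, so nsp12:L247F replaces leucine 247 of nsp12 with phenylalanine. A minimal parser like the one below can turn such labels into structured records before in silico analysis. It assumes exactly this one-letter, single-substitution format. The names `parse_change` and `AminoAcidChange` are hypothetical and are not taken from the study.

```python
import re
from typing import NamedTuple

# One-letter codes of the 20 standard amino acids.
_AA = "ACDEFGHIKLMNPQRSTVWY"
# protein name, ":", wild-type residue, position, mutant residue (e.g. "nsp12:L247F")
_LABEL = re.compile(rf"^(?P<protein>[A-Za-z0-9]+):(?P<wt>[{_AA}])(?P<pos>\d+)(?P<mut>[{_AA}])$")


class AminoAcidChange(NamedTuple):
    protein: str
    wild_type: str
    position: int
    mutant: str


def parse_change(label: str) -> AminoAcidChange:
    """Split a 'protein:XnY' label into its parts; raise ValueError otherwise."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return AminoAcidChange(match["protein"], match["wt"], int(match["pos"]), match["mut"])


if __name__ == "__main__":
    for label in ["N:Q241K", "nsp2:V94L", "nsp12:L247F", "S:C1243F", "ORF7a:H47Y"]:
        print(parse_change(label))
```

Rejecting anything outside the 20 standard one-letter codes is deliberately strict. A label with a typo fails loudly instead of being silently mis-parsed.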
Affiliation(s)
- Pengcheng Liu
  - Department of Clinical Laboratory, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Jiehao Cai
  - Department of Infectious Diseases, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- He Tian
  - Department of Infectious Diseases, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Jingjing Li
  - Department of Infectious Diseases, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Lijuan Lu
  - Department of Clinical Laboratory, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Menghua Xu
  - Department of Clinical Laboratory, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Xunhua Zhu
  - Department of Clinical Laboratory, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Xiaomin Fu
  - Department of Infectious Diseases, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Xiangshi Wang
  - Department of Infectious Diseases, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Huaqing Zhong
  - Department of Clinical Laboratory, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Ran Jia
  - Department of Clinical Laboratory, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Yanling Ge
  - Department of Infectious Diseases, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Yanfeng Zhu
  - Department of Infectious Diseases, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Mei Zeng
  - Department of Infectious Diseases, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Jin Xu
  - Department of Clinical Laboratory, National Children’s Medical Center, Children’s Hospital of Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China
5
Rodrigues CHM, Portelli S, Ascher DB. Exploring the effects of missense mutations on protein thermodynamics through structure-based approaches: findings from the CAGI6 challenges. Hum Genet 2024:10.1007/s00439-023-02623-4. [PMID: 38227011 DOI: 10.1007/s00439-023-02623-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/15/2023] [Accepted: 11/18/2023] [Indexed: 01/17/2024]
Abstract
Missense mutations introduce subtle single amino acid changes into the resulting protein and are known contributors to diverse genetic disorders. Understanding their impact on protein stability and function is therefore crucial for unravelling disease mechanisms and developing targeted therapies. The Critical Assessment of Genome Interpretation (CAGI) provides a valuable platform for benchmarking state-of-the-art computational methods that predict the impact of disease-related mutations on protein thermodynamics. Here we report how our comprehensive platform of structure-based computational approaches performed in evaluating mutations that affect protein structure and function. We assessed it on three CAGI6 challenges: Calmodulin, MAPK1, and MAPK3. Our stability predictors achieved correlations of up to 0.74 and AUCs of 1 when predicting changes in ΔΔG for MAPK1 and MAPK3, respectively, and an AUC of up to 0.75 in the Calmodulin challenge. Overall, our study highlights the importance of structure-based approaches for understanding the effects of missense mutations on protein thermodynamics. The results from the CAGI6 challenges contribute to ongoing efforts to understand disease mechanisms and to develop personalised medicine approaches.
Affiliation(s)
- Carlos H M Rodrigues
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
- Stephanie Portelli
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
6
Zhang R, Jia H, Chang Q, Zhang Z, Peng C, Ma Q, Liang Y, Yang S, Jiao Y. Two novel CHN1 variants identified in Duane retraction syndrome pedigrees disrupt development of ocular motor nerves in zebrafish. J Hum Genet 2024; 69:33-39. [PMID: 37853116 DOI: 10.1038/s10038-023-01201-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/03/2023] [Accepted: 10/04/2023] [Indexed: 10/20/2023]
Abstract
Duane retraction syndrome (DRS) is a rare congenital eye movement disorder caused by dysplasia of the abducens nerve, and its phenotype is highly variable. MRI can reveal the endophenotype of DRS. Most DRS cases are sporadic and isolated, while some are familial or accompanied by other ocular disorders and systemic congenital abnormalities. CHN1 is the most common causative gene for familial DRS, and 13 missense variants of CHN1 have been reported to date. In this study, we enrolled two unrelated pedigrees with DRS. Detailed clinical examinations, MRI, and whole-exome sequencing (WES) were performed to reveal their clinical and genetic characteristics. Patients from pedigree 1 presented with isolated DRS, and a novel heterozygous variant, c.650A>G (p.His217Arg), was identified in the CHN1 gene. Patients from pedigree 2 presented with classic DRS and abnormal auricle morphology, and another novel heterozygous CHN1 variant, c.637T>C (p.Phe213Leu), segregated with the disease in this pedigree. Multiple bioinformatics tools predicted that both variants were deleterious or disease-causing. After the two mutant CHN1 mRNAs were injected into zebrafish embryos, dysplasia of the ocular motor nerves (OMN) was observed. These findings expand the phenotypic and genotypic spectrum of CHN1-related DRS and provide new insights into the role of CHN1 in OMN development. Genetic testing is strongly recommended for patients with a family history of DRS or accompanying systemic congenital abnormalities.
Affiliation(s)
- Ranran Zhang
  - Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, 100730, Beijing, China
  - Beijing Ophthalmology and Visual Science Key Lab, 100730, Beijing, China
- Hongyan Jia
  - Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, 100730, Beijing, China
  - Beijing Ophthalmology and Visual Science Key Lab, 100730, Beijing, China
- Qinglin Chang
  - Department of Radiology, Beijing Tongren Hospital, Capital Medical University, 100730, Beijing, China
- Zongrui Zhang
  - Department of Radiology, Beijing Tongren Hospital, Capital Medical University, 100730, Beijing, China
- Chuzhi Peng
  - Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, 100730, Beijing, China
  - Beijing Ophthalmology and Visual Science Key Lab, 100730, Beijing, China
- Qian Ma
  - Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, 100730, Beijing, China
  - Beijing Ophthalmology and Visual Science Key Lab, 100730, Beijing, China
- Yi Liang
  - Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, 100730, Beijing, China
  - Beijing Ophthalmology and Visual Science Key Lab, 100730, Beijing, China
- Shuyan Yang
  - Beijing Municipal Key Laboratory of Child Development and Nutriomics, Capital Institute of Pediatrics, 100020, Beijing, China
- Yonghong Jiao
  - Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, 100730, Beijing, China
  - Beijing Ophthalmology and Visual Science Key Lab, 100730, Beijing, China
7
Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. However, these methods are difficult to compare because they were trained on different data, and they tend to predict destabilizing mutations better than stabilizing ones. Here, we carefully compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations with 27 computational methods, including the latest ones that use mega-scale stability datasets and transfer learning. To ensure fairness, we excluded entries that overlapped with or were similar to the methods' training datasets. On this unseen data, Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53. None of the methods accurately predicted stabilizing mutations, not even those that performed well in anti-symmetric property analysis. Most methods show consistent trends for destabilizing mutations across properties such as solvent exposure and secondary conformation, but stabilizing mutations show no clear pattern. Our study also suggests that addressing training dataset bias alone may not significantly improve the accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
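The evaluation above rests on two kinds of metrics that can be computed directly from a tool's outputs. The first is the Pearson correlation between predicted and experimental ΔΔG. The second comes from anti-symmetry analysis. It has two parts: the correlation between a tool's predictions for forward and reverse mutations, and their mean bias, ⟨δ⟩ = mean(ΔΔG_forward + ΔΔG_reverse). A perfectly anti-symmetric predictor gives a correlation of -1 and ⟨δ⟩ = 0. The functions below are a plain-Python sketch of these commonly used definitions. The abstract does not state which exact variants the study computed, and the example values are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def antisymmetry(forward: Sequence[float], reverse: Sequence[float]) -> Tuple[float, float]:
    """Return (r between forward and reverse predictions, mean bias delta).
    A perfectly anti-symmetric predictor gives r = -1 and delta = 0."""
    r = pearson(forward, reverse)  # also checks that the lengths match
    delta = mean(f + rv for f, rv in zip(forward, reverse))
    return r, delta


if __name__ == "__main__":
    experimental = [1.5, -0.4, 2.2, 0.3]  # hypothetical measured ddG values
    predicted = [1.1, 0.1, 1.8, 0.2]      # hypothetical tool output
    print(pearson(experimental, predicted))
    # Hypothetical forward/reverse predictions from a tool with a slight bias.
    print(antisymmetry([1.1, 0.1, 1.8], [-0.9, 0.2, -1.5]))
```

A tool can correlate well with experiment and still fail the anti-symmetry check. That is why the abstract reports the two analyses separately.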
Affiliation(s)
- Feifan Zheng
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
  - MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
8
Rollo C, Pancotti C, Birolo G, Rossi I, Sanavia T, Fariselli P. Influence of Model Structures on Predictors of Protein Stability Changes from Single-Point Mutations. Genes (Basel) 2023; 14:2228. [PMID: 38137050 PMCID: PMC10742815 DOI: 10.3390/genes14122228] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/25/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 12/24/2023]
Abstract
Missense variation in genomes can affect protein structural stability and, in turn, cell physiology. Predicting the impact of these variations is important, and the best-performing computational tools exploit protein structure information. However, the structures of most current protein sequence variants are unresolved, in which case comparative or ab initio modeling tools can provide a structure. Here, we evaluate how using model structures instead of experimental structures affects predictors of protein stability changes upon single-point mutations. In this setting, no significant structural changes are expected between the original and the mutated proteins. We show that the computational tools differ substantially. Methods that rely on a coarse-grained representation are less sensitive to the underlying protein structure. In contrast, tools that exploit more detailed molecular representations are sensitive to structures generated by comparative modeling, even for single-residue substitutions.
Affiliation(s)
- Cesare Rollo
  - Department of Medical Sciences, University Torino, 10126 Torino, Italy (G.B.); (I.R.); (T.S.); (P.F.)
9
Saygılı S, Koşukcu C, Baştuğ T, Doğan ÖA, Yılmaz EK, Kalyoncu AU, Ağbaş A, Canpolat N, Çalışkan S, Ozaltin F. A novel homozygous missense variant in TBC1D31 in a consanguineous family with congenital anomalies of the kidney and urinary tract (CAKUT). Clin Genet 2023; 104:679-685. [PMID: 37468454 DOI: 10.1111/cge.14406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 06/28/2023] [Accepted: 07/08/2023] [Indexed: 07/21/2023]
Abstract
Congenital anomalies of the kidney and urinary tract (CAKUT) are the leading cause of chronic kidney disease in the first three decades of life. To date, more than 180 monogenic causes of isolated or syndromic CAKUT have been described, and copy number variants (CNVs) have also been implicated. However, all of these causative factors explain only a small fraction of patients with CAKUT, which suggests that additional, as yet undiscovered genes are involved. Herein, we report three siblings (two of whom are monozygotic twins) from a consanguineous family with CAKUT. Whole-exome sequencing identified a homozygous variant in TBC1D31. Three-dimensional protein modeling and molecular dynamics simulations predicted it to be pathogenic. We thus show, for the first time, an association between a homozygous TBC1D31 variant and CAKUT in humans, expanding the genetic spectrum of CAKUT.
Affiliation(s)
- Seha Saygılı
  - Department of Pediatric Nephrology, Cerrahpasa Faculty of Medicine, Istanbul University-Cerrahpasa, Istanbul, Türkiye
- Can Koşukcu
  - Department of Bioinformatics, Hacettepe University Institute of Health Sciences, Ankara, Türkiye
- Turgut Baştuğ
  - Department of Biophysics, Faculty of Medicine, Hacettepe University, Ankara, Türkiye
- Özlem Akgün Doğan
  - Department of Pediatric Genetics, Faculty of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Türkiye
- Esra Karabağ Yılmaz
  - Department of Pediatric Nephrology, Cerrahpasa Faculty of Medicine, Istanbul University-Cerrahpasa, Istanbul, Türkiye
- Ayşe Uçar Kalyoncu
  - Department of Pediatric Radiology, Cerrahpasa Faculty of Medicine, Istanbul University-Cerrahpasa, Istanbul, Türkiye
- Ayşe Ağbaş
  - Department of Pediatric Nephrology, Cerrahpasa Faculty of Medicine, Istanbul University-Cerrahpasa, Istanbul, Türkiye
- Nur Canpolat
  - Department of Pediatric Nephrology, Cerrahpasa Faculty of Medicine, Istanbul University-Cerrahpasa, Istanbul, Türkiye
- Salim Çalışkan
  - Department of Pediatric Nephrology, Cerrahpasa Faculty of Medicine, Istanbul University-Cerrahpasa, Istanbul, Türkiye
- Fatih Ozaltin
  - Department of Bioinformatics, Hacettepe University Institute of Health Sciences, Ankara, Türkiye
  - Department of Pediatric Nephrology, Hacettepe University Faculty of Medicine, Ankara, Türkiye
  - Nephrogenetics Laboratory, Department of Pediatric Nephrology, Hacettepe University Faculty of Medicine, Ankara, Türkiye
  - Center for Genomics and Rare Diseases, Hacettepe University, Ankara, Türkiye
10
Pan Q, Portelli S, Nguyen TB, Ascher DB. Characterization on the oncogenic effect of the missense mutations of p53 via machine learning. Brief Bioinform 2023; 25:bbad428. [PMID: 38018912 PMCID: PMC10685404 DOI: 10.1093/bib/bbad428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023]
Abstract
Dysfunctions caused by missense mutations in the tumour suppressor p53 have been extensively shown to be a leading driver of many cancers. Unfortunately, experimentally elucidating the effects of all possible missense variants is time-consuming and labour-intensive. Recent work presented a comprehensive dataset and a machine learning model to predict the functional outcome of mutations in p53. That tool has a well-established dataset and makes precise predictions, but it relies on a complicated model and offers only limited predictions for p53 mutations. In this work, we first used computational biophysical tools to investigate the functional consequences of missense mutations in p53. This analysis showed that deleterious mutations are biased toward destabilizing effects. Combining these insights with experimental assays, we present two interpretable machine learning models that use both experimental assays and in silico biophysical measurements. The models accurately predict the functional consequences of mutations in p53, and we validated their robustness on clinical data. Our final model, based on nine features, achieved predictive performance comparable to the state-of-the-art p53-specific method and outperformed other generalized, widely used predictors. Interpreting our models revealed that information on residue p53 activity, polar atom distances, and changes in p53 stability was instrumental in their decisions, consistent with the biased properties of deleterious mutations. We have computed predictions for all possible missense mutations in p53, offering clinical diagnostic utility that is crucial for patient monitoring and the development of personalized cancer treatment.
Affiliation(s)
- Qisheng Pan
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Thanh Binh Nguyen
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- David B Ascher
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
11
Hoffman J, Tan H, Sandoval-Cooper C, de Villiers K, Reed SM. GTExome: Modeling commonly expressed missense mutations in the human genome. bioRxiv 2023:2023.11.14.567143. [PMID: 38014287 PMCID: PMC10680684 DOI: 10.1101/2023.11.14.567143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
A web application, GTExome, is described that quickly identifies, classifies, and models missense mutations in commonly expressed human proteins. GTExome can be used to categorize genomic mutation data by tissue-specific expression data from the Genotype-Tissue Expression (GTEx) project. Commonly expressed missense mutations in proteins from a wide range of tissue types can be selected and assessed for modeling suitability. The user is given information about the consequences of each mutation, including whether disulfide bonds, hydrogen bonds, or salt bridges are broken, buried prolines are introduced, buried charges are created or lost, charge is swapped, a buried glycine is replaced, or the residue to be removed is a proline in the cis configuration. If the mutation site is in a binding pocket, the number of pockets and their volumes are also reported. The user can assess this information and then select from available experimental or computationally predicted structures of native proteins to create, visualize, and download a model of the mutated protein using Fast and Accurate Side-chain Protein Repacking (FASPR). For AlphaFold-modeled proteins, confidence scores for the native proteins are provided. Using this tool, we explored a set of 9,666 common missense mutations from a variety of GTEx tissues and show that most of these mutations can be modeled, facilitating studies of protein-protein and protein-drug interactions. The open-source tool is freely available at https://pharmacogenomics.clas.ucdenver.edu/gtexome/.
Affiliation(s)
- Scott M. Reed
- Department of Chemistry, University of Colorado Denver, 1151 Arapahoe St., Denver, CO 80204 USA
12
Kurniawan J, Ishida T. Comparing Supervised Learning and Rigorous Approach for Predicting Protein Stability upon Point Mutations in Difficult Targets. J Chem Inf Model 2023; 63:6778-6788. [PMID: 37897811 DOI: 10.1021/acs.jcim.3c00750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2023]
Abstract
Accurate prediction of protein stability upon a point mutation has important applications in drug discovery and personalized medicine, yet it remains a challenging problem in computational biology. Existing computational prediction methods, which range from mechanistic to supervised learning approaches, have made limited progress over the last few decades. This stagnation is largely due to their heavy reliance on both the quantity and quality of the training data. This is evident in recent state-of-the-art methods, which continue to yield substantial errors on two challenging blind test sets, frataxin and p53, with average root-mean-square errors exceeding 3 and 1.5 kcal/mol, respectively, both still above the theoretical 1 kcal/mol prediction barrier. Rigorous approaches, on the other hand, offer greater potential for accuracy without relying on training data, but they are computationally demanding and require both wild-type and mutant structure information. Although they show high accuracy for conserving mutations, their performance is still limited for charge-changing mutations. This might be due to the lack of an available mutant structure, which is often represented by a simplified capped peptide. Recent advances in protein structure prediction now make it possible to obtain structures comparable to experimental ones, including complete mutant structure information. In this work, we compare the performance of supervised learning-based methods and rigorous approaches for predicting protein stability upon point mutations in difficult targets: frataxin and p53. The rigorous alchemical method significantly surpasses state-of-the-art techniques in terms of both root-mean-square error and Pearson correlation coefficient on these two challenging blind test sets. 
Additionally, we propose an improved alchemical method that employs the pmx double-system/single-box approach to accurately predict the folding free energy change upon both conserving and charge-changing mutations. The enhanced protocol can accurately predict both types of mutations, thereby outperforming existing state-of-the-art methods in overall performance.
Affiliation(s)
- Jason Kurniawan
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan
13
Jessen-Howard D, Pan Q, Ascher DB. Identifying the Molecular Drivers of Pathogenic Aldehyde Dehydrogenase Missense Mutations in Cancer and Non-Cancer Diseases. Int J Mol Sci 2023; 24:10157. [PMID: 37373306 DOI: 10.3390/ijms241210157] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023]
Abstract
Human aldehyde dehydrogenases (ALDHs), comprising 19 isoenzymes, play a vital role in both endogenous and exogenous aldehyde metabolism. This NAD(P)-dependent catalytic process relies on intact cofactor binding, substrate interaction, and oligomerization of ALDHs. Disruption of ALDH activity, however, can result in the accumulation of cytotoxic aldehydes, which has been linked with a wide range of diseases, including cancers as well as neurological and developmental disorders. In our previous work, we successfully characterised the structure-function relationships of missense variants in other proteins. We therefore applied a similar analysis pipeline to identify potential molecular drivers of pathogenic ALDH missense mutations. Variant data were first carefully curated and labelled as cancer-risk, non-cancer disease, or benign. We then leveraged various computational biophysical methods to describe the changes caused by missense mutations, revealing a bias of detrimental mutations towards destabilising effects. Building on these insights, several machine learning approaches were used to investigate combinations of features, revealing the importance of conservation in ALDHs. Our work aims to provide important biological perspectives on the pathogenic consequences of ALDH missense mutations, which could be an invaluable resource in the development of cancer treatment.
Affiliation(s)
- Dana Jessen-Howard
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Qisheng Pan
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- David B Ascher
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
14
Coppa C, Bazzoli A, Barkhordari M, Contini A. Accelerated Molecular Dynamics for Peptide Folding: Benchmarking Different Combinations of Force Fields and Explicit Solvent Models. J Chem Inf Model 2023; 63:3030-3042. [PMID: 37163419 DOI: 10.1021/acs.jcim.3c00138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
Accelerated molecular dynamics (aMD) protocols were assessed on predicting the secondary structure of eight peptides, of which two are helical, three are β-hairpins, and three are disordered. Protocols consisted of combinations of three force fields (ff99SB, ff14SB, ff19SB) and two explicit solvation models (TIP3P and OPC), and were evaluated in two independent aMD simulations, one starting from an extended conformation, the other starting from a misfolded conformation. The results of these analyses indicate that all three combinations performed well on helical peptides. As for β-hairpins, ff19SB performed well with both solvation methods, with a slight preference for the TIP3P solvation model, even though performance was dependent on both peptide sequence and initial conformation. The ff19SB/OPC combination had the best performance on intrinsically disordered peptides. In general, ff14SB/TIP3P suffered the strongest helical bias.
Affiliation(s)
- Crescenzo Coppa
- Dipartimento di Scienze Farmaceutiche - Sezione di Chimica Generale e Organica "Alessandro Marchesini", Università degli Studi di Milano, Via Venezian, 21, 20133 Milano, Italy
- Andrea Bazzoli
- Dipartimento di Scienze Farmaceutiche - Sezione di Chimica Generale e Organica "Alessandro Marchesini", Università degli Studi di Milano, Via Venezian, 21, 20133 Milano, Italy
- Maral Barkhordari
- Dipartimento di Scienze Farmaceutiche - Sezione di Chimica Generale e Organica "Alessandro Marchesini", Università degli Studi di Milano, Via Venezian, 21, 20133 Milano, Italy
- Alessandro Contini
- Dipartimento di Scienze Farmaceutiche - Sezione di Chimica Generale e Organica "Alessandro Marchesini", Università degli Studi di Milano, Via Venezian, 21, 20133 Milano, Italy
15
Li C, Hou I, Ma M, Wang G, Bai Y, Liu X. Orthogonal analysis of variants in APOE gene using in-silico approaches reveals novel disrupting variants. Frontiers in Bioinformatics 2023; 3:1122559. [PMID: 37091907 PMCID: PMC10117898 DOI: 10.3389/fbinf.2023.1122559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 03/31/2023] [Indexed: 04/08/2023]
Abstract
Introduction: Alzheimer's disease (AD) is one of the most prominent medical conditions in the world. Understanding the genetic component of the disease can greatly advance our knowledge of its progression, treatment and prognosis. Single amino-acid variants (SAVs) in the APOE gene have been widely investigated as a risk factor for AD. Studies, including genome-wide association studies, meta-analysis-based studies, and in-vivo animal studies, have been carried out to investigate the functional importance and pathogenic potential of APOE SAVs. However, given the high cost of such large-scale or experimental studies, only a handful of reported variants have definite explanations. The recent development of in-silico analytical approaches, especially large-scale deep learning models, has opened new opportunities to probe the structural and functional importance of APOE variants extensively. Method: In this study, we take an ensemble approach that simultaneously uses large-scale protein sequence-based models, including Evolutionary Scale Model and AlphaFold, together with several in-silico functional prediction web services to investigate known and possibly disease-causing SAVs in APOE and evaluate their likelihood of being functionally and structurally disruptive. Results: Using this ensemble approach with little to no prior field-specific knowledge, we report 5 SAVs in the APOE gene as potentially disruptive, one of which (C112R) was classified by previous studies as a key risk factor for AD. Discussion: Our study provides a novel framework to analyze and prioritize the functional and structural importance of SAVs for future experimental and functional validation.
Affiliation(s)
- Chang Li
- USF Genomics and College of Public Health, University of South Florida, Tampa, FL, United States
- Ian Hou
- The John Cooper School, The Woodlands, TX, United States
- Mingjia Ma
- Novi High School, Novi, MI, United States
- Grace Wang
- Del Norte High School, San Diego, CA, United States
- Yongsheng Bai
- Next-Gen Intelligent Science Training, Ann Arbor, MI, United States
- Department of Biology, Eastern Michigan University, Ypsilanti, MI, United States
- Xiaoming Liu
- USF Genomics and College of Public Health, University of South Florida, Tampa, FL, United States
16
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 104] [Impact Index Per Article: 104.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023]
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of more than 200 million protein structures predicted by AF2 further aroused great enthusiasm in the scientific community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and on research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in biology and medicine, many of which have preliminarily demonstrated its potential. To better understand AF2 and promote its applications, in this article we summarize the principle and system architecture of AF2 as well as the recipe of its success, focusing in particular on its applications in biology and medicine. Limitations of current AF2 prediction are also discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
17
Abbasian MH, Mahmanzar M, Rahimian K, Mahdavi B, Tokhanbigli S, Moradi B, Sisakht MM, Deng Y. Global landscape of SARS-CoV-2 mutations and conserved regions. J Transl Med 2023; 21:152. [PMID: 36841805 PMCID: PMC9958328 DOI: 10.1186/s12967-023-03996-w] [Citation(s) in RCA: 40] [Impact Index Per Article: 40.0] [Received: 12/06/2022] [Accepted: 02/15/2023] [Indexed: 02/27/2023]
Abstract
BACKGROUND At the end of December 2019, a novel coronavirus disease (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified in Wuhan, a central city in China, and then spread to every corner of the globe. As of October 8, 2022, the total number of COVID-19 cases had reached over 621 million worldwide, with more than 6.56 million confirmed deaths. Because SARS-CoV-2 genome sequences change through mutation and recombination, it is pivotal to surveil emerging variants and monitor changes to improve pandemic management. METHODS In total, 10,287,271 SARS-CoV-2 genome sequence samples were downloaded in FASTA format from the GISAID databases, covering February 24, 2020, to April 2022. Python (version 3.8.0) was used to process the FASTA files to identify variants and sequence conservation. The NCBI RefSeq SARS-CoV-2 genome (accession no. NC_045512.2) was used as the reference sequence. RESULTS Six mutations had a frequency of more than 50% in global SARS-CoV-2 sequences: P323L (99.3%) in NSP12, D614G (97.6%) in S, T492I (70.4%) in NSP4, R203M (62.8%) in N, T60A (61.4%) in Orf9b, and P1228L (50.0%) in NSP3. In the SARS-CoV-2 genome, no mutation was observed in more than 90% of the nsp11, nsp7, nsp10, nsp9, nsp8, and nsp16 regions. On the other hand, N, nsp3, S, nsp4, nsp12, and M had the highest mutation rates. In the S protein, the highest mutation frequency was observed in aa 508-635 (0.77%) and aa 381-508 (0.43%). In the M, E, and N proteins, the highest mutation frequencies were observed in aa 66-88 (2.19%), aa 7-14, and aa 164-246 (2.92%), respectively. CONCLUSION Therefore, monitoring SARS-CoV-2 proteomic changes and detecting mutation hotspots and conserved regions could help improve SARS-CoV-2 diagnostic efficiency and design safe and effective vaccines against emerging variants.
Affiliation(s)
- Mohammad Hadi Abbasian
- Department of Medical Genetics, National Institute for Genetic Engineering and Biotechnology, Tehran, Iran
- Mohammadamin Mahmanzar
- Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, 96813, USA
- Karim Rahimian
- Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Bahar Mahdavi
- Department of Computer Science, Tarbiat Modares University, Tehran, Iran
- Samaneh Tokhanbigli
- Discipline of Pharmacy, Graduate School of Health, University of Technology Sydney, Sydney, Australia
- Bahman Moradi
- Department of Biology, Faculty of Sciences, Shahid Bahonar University of Kerman, Kerman, Iran
- Mahsa Mollapour Sisakht
- Department of Biochemistry, Erasmus University Medical Center, 2040, 3000 CA, Rotterdam, The Netherlands
- Youping Deng
- Department of Quantitative Health Sciences, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, 96813, USA
18
Chandra A, Tünnermann L, Löfstedt T, Gratz R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023; 12:e82819. [PMID: 36651724 PMCID: PMC9848389 DOI: 10.7554/elife.82819] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023]
Abstract
Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and the number of proteins whose properties are known from lab experiments. Language models from the field of natural language processing have gained popularity for protein property prediction and have led to a new computational revolution in biology, in which earlier prediction results are regularly improved upon. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences, and these representations can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models based on a particular architecture, the Transformer. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics, and how such models can be used to predict, for example, post-translational modifications. We review shortcomings of other deep learning models and explain how Transformer models have quickly proven to be a very promising way to unravel information hidden in amino acid sequences.
Affiliation(s)
- Abel Chandra
- Department of Computing Science, Umeå University, Umeå, Sweden
- Laura Tünnermann
- Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Tommy Löfstedt
- Department of Computing Science, Umeå University, Umeå, Sweden
- Regina Gratz
- Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Department of Forest Ecology and Management, Swedish University of Agricultural Sciences, Umeå, Sweden
19
Boer JC, Pan Q, Holien JK, Nguyen TB, Ascher DB, Plebanski M. A bias of Asparagine to Lysine mutations in SARS-CoV-2 outside the receptor binding domain affects protein flexibility. Front Immunol 2022; 13:954435. [PMID: 36569921 PMCID: PMC9788125 DOI: 10.3389/fimmu.2022.954435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 11/14/2022] [Indexed: 12/14/2022]
Abstract
Introduction The COVID-19 pandemic has threatened public health and economic development worldwide for over two years. Compared with the original SARS-CoV-2 strain reported in 2019, the Omicron variant (B.1.1.529.1) is more transmissible. This variant has 34 mutations in its Spike protein, 15 of which are present in the Receptor Binding Domain (RBD), facilitating viral internalization via binding to the angiotensin-converting enzyme 2 (ACE2) receptor on endothelial cells as well as promoting increased immune evasion capacity. Methods Herein we compared SARS-CoV-2 proteins (including ORF3a, ORF7, ORF8, Nucleoprotein (N), membrane protein (M) and Spike (S) proteins) from multiple ancestral strains. We included the currently designated original Variant of Concern (VOC) Omicron, its subsequently emerged variants BA.1, BA.2, BA.3, BA.4 and BA.5, and the two currently emerging variants BQ.1 and XBB.1, and compared these with the previously circulating VOCs Alpha, Beta, Gamma, and Delta, to better understand the nature and potential impact of Omicron-specific mutations. Results Only in Omicron and its subvariants was a bias toward Asparagine to Lysine (N to K) mutations evident within the Spike protein, including regions outside the RBD, while no regions outside the Spike protein were characterized by this mutational bias. Computational structural analysis revealed that three of these mutations, located in the central core region, contribute to a preference for altered conformations of the Spike protein. Several RBD mutations that have circulated across most Omicron subvariants were also analysed, and these showed greater potential for immune escape. Conclusion This study emphasizes the importance of understanding how specific N to K mutations outside the RBD affect SARS-CoV-2 conformational changes, and the need for neutralizing antibodies against Omicron to target a subset of conformationally dependent B cell epitopes.
Affiliation(s)
- Jennifer C. Boer
- School of Health and Biomedical Science, Royal Melbourne Institute of Technology, Melbourne, VIC, Australia
- Qisheng Pan
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Jessica K. Holien
- School of Science, Royal Melbourne Institute of Technology (RMIT) University, Melbourne, VIC, Australia
- Thanh-Binh Nguyen
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- David B. Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Magdalena Plebanski (corresponding author)
- School of Health and Biomedical Science, Royal Melbourne Institute of Technology, Melbourne, VIC, Australia
20
Masson P, Lushchekina S. Conformational Stability and Denaturation Processes of Proteins Investigated by Electrophoresis under Extreme Conditions. Molecules 2022; 27:6861. [PMID: 36296453 PMCID: PMC9610776 DOI: 10.3390/molecules27206861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 10/10/2022] [Accepted: 10/10/2022] [Indexed: 11/17/2022]
Abstract
The functional structure of proteins results from marginally stable folded conformations. Reversible unfolding, irreversible denaturation, and deterioration can be caused by chemical and physical agents, owing to changes in the physicochemical conditions of pH, ionic strength, temperature, pressure, and electric field, or to the presence of a cosolvent that perturbs the delicate balance between stabilizing and destabilizing interactions and eventually induces chemical modifications. For most proteins, denaturation is a complex process involving transient intermediates in several reversible and eventually irreversible steps. Knowledge of protein stability and denaturation processes is essential for the development of enzymes as industrial catalysts, biopharmaceuticals, analytical and medical bioreagents, and safe industrial food. Electrophoresis techniques operating under extreme conditions are convenient tools for analyzing unfolding transitions, trapping transient intermediates, and gaining insight into the mechanisms of denaturation processes. Moreover, quantitative analysis of electrophoretic mobility transition curves allows the conformational stability of proteins to be estimated. These approaches include polyacrylamide gel electrophoresis and capillary zone electrophoresis under cold, heat, and hydrostatic pressure and in the presence of non-ionic denaturing agents or stabilizers such as polyols and heavy water. Lastly, after exposure to extremes of physical conditions, electrophoresis under standard conditions provides information on irreversible processes, slow conformational drifts, and slow renaturation processes. The impressive development of enzyme technology, with multiple applications in fine chemistry, biopharmaceutics, and nanomedicine, prompted us to revisit the potential of these electrophoretic approaches. 
This feature review is illustrated with published and unpublished results obtained by the authors on cholinesterases and paraoxonase, two physiologically and toxicologically important enzymes.
Affiliation(s)
- Patrick Masson
- Biochemical Neuropharmacology Laboratory, Kazan Federal University, Kremlievskaya Str. 18, 420111 Kazan, Russia
- Sofya Lushchekina
- Emanuel Institute of Biochemical Physics, Russian Academy of Sciences, Kosygin Str. 4, 119334 Moscow, Russia