1
Redhwan A, Adnan M, Bakhsh HR, Alshammari N, Surti M, Parashar M, Patel M, Patel M, Manjegowda DS, Sharma S. Computational Identification and Functional Analysis of Potentially Pathogenic nsSNPs in the NLRP3 Gene Linked to Alzheimer's Disease. Cell Biochem Biophys 2025; 83:357-375. [PMID: 39167281 DOI: 10.1007/s12013-024-01465-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/25/2024] [Indexed: 08/23/2024]
Abstract
Single Nucleotide Polymorphisms (SNPs) are key in understanding complex diseases. Nonsynonymous single-nucleotide polymorphisms (nsSNPs) occur in protein-coding regions, potentially altering amino acid sequences, protein structure and function. Computational methods are vital for distinguishing deleterious nsSNPs from neutral ones. We investigated the role of the NLRP3 gene in neuroinflammation associated with Alzheimer's disease (AD) pathogenesis. A total of 893 missense nsSNPs were obtained from the dbSNP database and subjected to rigorous filtering using the bioinformatics tools SIFT, Align GVGD, PolyPhen-2, and PANTHER to identify potentially damaging variants. Of these, 18 nsSNPs were consistently predicted to have deleterious effects across all tools. Notably, 16 of these variants exhibited reduced protein stability, while only 4 were predicted to be buried within the protein structure. Among the identified nsSNPs, rs180177442 (R262L and R262P), rs201875324 (T659I), and rs139814109 (T897M) were classified as high-risk variants due to their significant deleterious impact, probable damaging effects, and association with decreased protein stability. Molecular docking and simulation analyses were conducted with memantine, a standard drug used in AD treatment, to investigate potential interactions with the altered protein structures. Additional clinical and genetic investigations are necessary to elucidate the underlying mechanisms that link NLRP3 polymorphisms with the initiation of AD.
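The consensus step described in this abstract (keeping only variants that every tool predicts to be deleterious) can be illustrated with a set intersection. The following is a minimal sketch, not the authors' pipeline: the function name `consensus_deleterious` and the variant labels are hypothetical placeholders, and real tool outputs would first need to be parsed into per-tool sets.

```python
from typing import Dict, Set


def consensus_deleterious(calls: Dict[str, Set[str]]) -> Set[str]:
    """Return the variants flagged as deleterious by every tool in `calls`."""
    if not calls:
        return set()
    # Intersect the per-tool sets; a variant survives only if all tools agree.
    return set.intersection(*calls.values())


# Hypothetical per-tool calls; the labels are placeholders, not study data.
calls = {
    "SIFT": {"varA", "varB", "varC"},
    "Align GVGD": {"varA", "varC"},
    "PolyPhen-2": {"varA", "varB", "varC", "varD"},
    "PANTHER": {"varA", "varC", "varD"},
}
consensus = consensus_deleterious(calls)
print(sorted(consensus))
```

A strict intersection is the most conservative choice; a majority vote across tools would retain more candidates at the cost of more false positives.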
Affiliation(s)
- Alya Redhwan
  - Department of Health, College of Health and Rehabilitation Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia
- Mohd Adnan
  - Department of Biology, College of Science, University of Ha'il, Ha'il, Saudi Arabia
- Hadeel R Bakhsh
  - Department of Rehabilitation Sciences, College of Health and Rehabilitation Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Nawaf Alshammari
  - Department of Health, College of Health and Rehabilitation Sciences, Princess Nourah bint Abdulrahman University, Riyadh, 11671, Saudi Arabia
- Malvi Surti
  - Research and Development Cell, Department of Biotechnology, Parul Institute of Applied Sciences, Parul University, Vadodara, Gujarat, India
- Mansi Parashar
  - Research and Development Cell, Department of Biotechnology, Parul Institute of Applied Sciences, Parul University, Vadodara, Gujarat, India
- Mirav Patel
  - Research and Development Cell, Department of Biotechnology, Parul Institute of Applied Sciences, Parul University, Vadodara, Gujarat, India
- Mitesh Patel
  - Research and Development Cell, Department of Biotechnology, Parul Institute of Applied Sciences, Parul University, Vadodara, Gujarat, India
- Dinesh Sosalagere Manjegowda
  - Department of Human Genetics, School of Basic and Applied Sciences, Dayananda Sagar University, Bangalore, 560078, India
- Sameer Sharma
  - Department of Bioinformatics, BioNome, Bangalore, India.
2
Tondar A, Irfan M, Sánchez-Herrero S, Athar H, Haqqi A, Bepari AK, Liñán LC, Hervás Marin D. In-silico structural and functional analysis of nonsynonymous single nucleotide polymorphisms in human FOLH1 gene. In Silico Pharmacol 2025; 13:32. [PMID: 40018382 PMCID: PMC11861814 DOI: 10.1007/s40203-025-00319-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Accepted: 02/03/2025] [Indexed: 03/01/2025] Open
Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs), also known as missense SNPs, can seriously affect an individual's vulnerability to numerous diseases, including cancer. In this study, we conducted a comprehensive in-silico analysis to examine the structural and functional implications of nsSNPs within the Folate Hydrolase 1 (FOLH1) gene, which encodes the Prostate-Specific Membrane Antigen (PSMA). A total of 504 SNPs were retrieved, and after filtering, 15 pathogenic nsSNPs were identified using five different in-silico tools. Three of these SNPs, R255H (rs375565491), R255C (rs201789325), and G168E (rs267602926), were consistently predicted to be pathogenic across all in-silico tools. MutPred2 was used to predict the structural and functional consequences of the identified mutations. The analysis revealed multiple alterations in the PSMA protein, including changes in helical conformations, glycosylation patterns, transmembrane properties, and solvent accessibility. Furthermore, I-Mutant 2.0 analysis demonstrated a decrease in protein stability for most nsSNPs, except for rs267602926 (G168E), which was predicted to increase stability. Conservation analysis using ConSurf revealed varying degrees of amino acid conservation, with R255H and R255C identified as highly conserved residues, indicating their potential functional and structural significance. Additionally, post-translational modification (PTM) analysis indicated that while phosphorylation and methylation sites remained unchanged, specific glycosylation sites were lost in two pathogenic mutant variants (R255H and R255C), potentially affecting PSMA function and adversely impacting prostate cancer. Our findings highlight the importance of in silico studies to investigate the structural and functional impacts of FOLH1 nsSNPs on the PSMA protein. Such in silico studies can deepen our understanding of the roles of nsSNPs in prostate cancer onset, progression, and drug resistance.
Supplementary Information: The online version contains supplementary material available at 10.1007/s40203-025-00319-3.
Affiliation(s)
- Abtin Tondar
  - Department of Computer Science, Multimedia and Telecommunication, Interuniversity Doctoral Program in Bioinformatics, Universitat Oberta de Catalunya, Barcelona (UOC), Spain
  - Stanford Deep Data Research Center, Stanford University, Stanford, USA
- Muhammad Irfan
  - Atta-Ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Punjab Pakistan
- Sergio Sánchez-Herrero
  - Department of Computer Science, Multimedia and Telecommunication, Interuniversity Doctoral Program in Bioinformatics, Universitat Oberta de Catalunya, Barcelona (UOC), Spain
- Hafsa Athar
  - Atta-Ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Punjab Pakistan
- Aleena Haqqi
  - Atta-Ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Punjab Pakistan
  - School of Medical Laboratory Technology, Minhaj University Lahore (MUL), Lahore, Punjab, Pakistan
- Asim Kumar Bepari
  - Department of Pharmaceutical Sciences, North South University (NSU), Dhaka, Bangladesh
- Laura Calvet Liñán
  - Telecommunications and Systems Engineering Department, Universitat Autònoma de Barcelona (UAB), Sabadell, Spain
- David Hervás Marin
  - Department of Applied Statistics and Operational Research, and Quality Alcoy, Universitat Politècnica de València (UPV), Alcoy, Spain
3
Michels J, Bandarupalli R, Ahangar Akbari A, Le T, Xiao H, Li J, Hom EFY. Natural Language Processing Methods for the Study of Protein-Ligand Interactions. J Chem Inf Model 2025. [PMID: 39993834 DOI: 10.1021/acs.jcim.4c01907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/26/2025]
Abstract
Natural Language Processing (NLP) has revolutionized the way computers are used to study and interact with human languages and is increasingly influential in the study of protein and ligand binding, which is critical for drug discovery and development. This review examines how NLP techniques have been adapted to decode the "language" of proteins and small molecule ligands to predict protein-ligand interactions (PLIs). We discuss how methods such as long short-term memory (LSTM) networks, transformers, and attention mechanisms can leverage different protein and ligand data types to identify potential interaction patterns. Significant challenges are highlighted including the scarcity of high-quality negative data, difficulties in interpreting model decisions, and sampling biases in existing data sets. We argue that focusing on improving data quality, enhancing model robustness, and fostering both collaboration and competition could catalyze future advances in machine-learning-based predictions of PLIs.
Affiliation(s)
- James Michels
  - Department of Computer and Information Science, University of Mississippi, University, Mississippi 38677, United States
- Ramya Bandarupalli
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States
- Amin Ahangar Akbari
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States
- Thai Le
  - Department of Computer Science, Indiana University, Bloomington, Indiana 47408, United States
- Hong Xiao
  - Department of Computer and Information Science and Institute for Data Science, University of Mississippi, University, Mississippi 38677, United States
- Jing Li
  - Department of BioMolecular Sciences, School of Pharmacy, University of Mississippi, University, Mississippi 38677, United States
- Erik F Y Hom
  - Department of Biology and Center for Biodiversity and Conservation Research, University of Mississippi, University, Mississippi 38677, United States
4
Joshi D, Pradhan S, Sajeed R, Srinivasan R, Rana S. An augmented transformer model trained on protein family specific variant data leads to improved prediction of variants of uncertain significance. Hum Genet 2025:10.1007/s00439-025-02727-z. [PMID: 39869148 DOI: 10.1007/s00439-025-02727-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Accepted: 01/12/2025] [Indexed: 01/28/2025]
Abstract
Variants of uncertain significance (VUS) represent variants that lack sufficient evidence to be confidently associated with a disease, thus posing a challenge in the interpretation of genetic testing results. Here we report an improved method for predicting the VUS of Arylsulfatase A (ARSA) gene as part of the Critical Assessment of Genome Interpretation challenge (CAGI6). Our method uses a transfer learning approach that leverages a pre-trained protein language model to predict the impact of mutations on the activity of the ARSA enzyme, whose deficiency is known to cause a rare genetic disorder, metachromatic leukodystrophy. Our innovative framework combines zero-shot log odds scores and embeddings from the ESM, an evolutionary scale model as features for training a supervised model on gene variants functionally related to the ARSA gene. The zero-shot log odds score feature captures the generic properties of the proteins learned due to its pre-training on millions of sequences in the UniProt data, while the ESM embeddings for the proteins in the ARSA family capture features specific to the family. We also tested our approach on another enzyme, N-acetyl-glucosaminidase (NAGLU), that belongs to the same superfamily as ARSA. Our results demonstrate that the performance of our family models (augmented ESM models) is either comparable or better than the ESM models. The ARSA model compares favorably with the majority of state-of-the-art predictors on area under precision and recall curve (AUPRC) performance metric. However, the NAGLU model outperforms all pathogenicity predictors evaluated in this study on AUPRC metric. The improved AUPRC has relevance in a diagnostic setting where variant prioritization generally entails identifying a small number of pathogenic variants from a larger number of benign variants. 
Our results also indicate that, for genes with sparse or no experimental variant impact data, family variant data can serve as proxy training data for making accurate predictions. Attention analysis of active sites and binding sites in ARSA and NAGLU proteins sheds light on probable mechanisms of pathogenicity for positions that are highly attended.
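The abstract does not spell out how the zero-shot log odds score is computed. One common formulation for protein language models, assumed here, is the log probability of the mutant residue minus the log probability of the wild-type residue at the mutated position. The sketch below uses a hypothetical per-position probability table in place of real model output; the function name, the toy sequence, and the probabilities are all illustrative.

```python
import math
from typing import Dict, List


def zero_shot_log_odds(
    position_probs: List[Dict[str, float]],
    wild_type: str,
    pos: int,
    mutant: str,
) -> float:
    """Return log p(mutant) - log p(wild type) at 1-based position `pos`.

    `position_probs[i]` maps each amino acid to the probability a language
    model assigns to it at sequence index i (assumed input, not real output).
    """
    probs = position_probs[pos - 1]
    wt_residue = wild_type[pos - 1]
    return math.log(probs[mutant]) - math.log(probs[wt_residue])


# Toy two-residue sequence with made-up probabilities (placeholders only).
sequence = "MK"
position_probs = [
    {"M": 0.90, "K": 0.05, "E": 0.05},
    {"M": 0.15, "K": 0.80, "E": 0.05},
]
score = zero_shot_log_odds(position_probs, sequence, pos=2, mutant="E")
print(round(score, 4))
```

Under this formulation a negative score means the model finds the substitution less likely than the wild-type residue, which is often read as a hint of functional impact.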
Affiliation(s)
- Dinesh Joshi
  - TCS Research, Tata Consultancy Services, Hyderabad, India
- Sadhna Rana
  - TCS Research, Tata Consultancy Services, Hyderabad, India.
5
Sarker DK, Ray P, Salam FBA, Uddin SJ. Exploring the impact of deleterious missense nonsynonymous single nucleotide polymorphisms in the DRD4 gene using computational approaches. Sci Rep 2025; 15:3150. [PMID: 39856236 PMCID: PMC11761060 DOI: 10.1038/s41598-025-86916-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Accepted: 01/15/2025] [Indexed: 01/27/2025] Open
Abstract
Dopamine receptor D4 (DRD4) plays a vital role in regulating various physiological functions, including attention, impulse control, and sleep, as well as being associated with various neurological diseases, including attention deficit hyperactivity disorder, novelty seeking, and so on. However, a comprehensive analysis of harmful nonsynonymous single nucleotide polymorphisms (nsSNPs) of the DRD4 gene and their effects remains unexplored. The aim of this study is to uncover novel damaging missense nsSNPs and their structural and functional effects on the DRD4 receptor. From the dbSNP database, we found 677 nsSNPs, and then we analyzed their functional consequences, disease associations, and effects on protein stability with fifteen in silico tools. Five variants, including L65P (ICL1; rs1459150721), V116D (3.33; rs761875546), I129S (3.46; rs751467198), I156T (4.46; rs757732258), and F201S (5.47; rs199609858), were identified as the most deleterious mutations that were also present in the conserved region and showed lower interactions with neighboring residues. To comprehensively understand their impact, we docked agonist dopamine and antagonist nemonapride at the binding site of the receptor, followed by 200 ns molecular dynamics simulations. We identified the V116D and I129S mutations as the most damaging, followed by F201S in the dopamine-bound states. Both the V116D and I129S variants demonstrated significantly high RMSD, Rg, and SASA, and low thermodynamic stability. The F201S-dopamine complex exhibited lower compactness and higher motions, along with a significant loss of hydrogen bonds and active site interactions. By contrast, while interacting with nemonapride, the impact of the I156T and L65P mutations was highly deleterious; both showed lower stability, higher flexibility, and higher motions. Additionally, nemonapride significantly lost interactions with the active site, notably in the I156T variant.
We also found the V116D-nemonapride complex as structurally damaging; however, the interaction patterns of nemonapride were less altered in the MMPBSA analysis. Overall, this study revealed five novel deleterious variants along with a comprehensive understanding of their effect in the presence of an agonist and antagonist, which could be helpful for understanding disease susceptibility, precision medicine, and developing potential drugs.
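Two of the trajectory metrics named in this abstract, RMSD and radius of gyration (Rg), have simple closed forms. The sketch below is an illustration under stated assumptions rather than the study's protocol: it skips the structural superposition that MD analysis tools normally perform before RMSD, it weights all atoms equally instead of by mass, and the coordinates are made up.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: List[Coord], b: List[Coord]) -> float:
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes the structures are already superimposed (no fitting is done).
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def radius_of_gyration(coords: List[Coord]) -> float:
    """Unweighted radius of gyration: RMS distance of atoms from the centroid."""
    if not coords:
        raise ValueError("coordinate set must be non-empty")
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    total = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    )
    return math.sqrt(total / n)


# Made-up two-atom "structures" for illustration only.
reference = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(reference, frame), radius_of_gyration(reference))
```

A higher RMSD relative to the starting structure indicates larger conformational drift, and a higher Rg indicates a less compact fold, which is how the abstract uses both quantities.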
Affiliation(s)
- Dipto Kumer Sarker
  - Pharmacy Discipline, Life Science School, Khulna University, Khulna, 9208, Bangladesh
  - Department of Pharmacy, Atish Dipankar University of Science & Technology, Dhaka, 1230, Bangladesh
- Pallobi Ray
  - Pharmacy Discipline, Life Science School, Khulna University, Khulna, 9208, Bangladesh
- Fayad Bin Abdus Salam
  - Pharmacy Discipline, Life Science School, Khulna University, Khulna, 9208, Bangladesh
- Shaikh Jamal Uddin
  - Pharmacy Discipline, Life Science School, Khulna University, Khulna, 9208, Bangladesh.
6
Aspromonte MC, Del Conte A, Zhu S, Tan W, Shen Y, Zhang Y, Li Q, Wang MH, Babbi G, Bovo S, Martelli PL, Casadio R, Althagafi A, Toonsi S, Kulmanov M, Hoehndorf R, Katsonis P, Williams A, Lichtarge O, Xian S, Surento W, Pejaver V, Mooney SD, Sunderam U, Srinivasan R, Murgia A, Piovesan D, Tosatto SCE, Leonardi E. CAGI6 ID panel challenge: assessment of phenotype and variant predictions in 415 children with neurodevelopmental disorders (NDDs). Hum Genet 2025:10.1007/s00439-024-02722-w. [PMID: 39786577 DOI: 10.1007/s00439-024-02722-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 12/13/2024] [Indexed: 01/12/2025]
Abstract
The Genetics of Neurodevelopmental Disorders Lab in Padua provided a new intellectual disability (ID) Panel challenge for computational methods to predict patient phenotypes and their causal variants in the context of the Critical Assessment of the Genome Interpretation, 6th edition (CAGI6). Eight research teams submitted a total of 30 models to predict phenotypes based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions, with onset in infant age. Here, we assess the ability and accuracy of computational methods to predict comorbid phenotypes based on clinical features described in each patient and their causal variants. We also evaluated predictions for possible genetic causes in patients without a clear genetic diagnosis. Like the previous ID Panel challenge in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia), and variants (Pathogenic/Likely Pathogenic, Variants of Uncertain Significance and Risk Factors) were provided. The phenotypic traits and variant data of 150 patients from the CAGI5 ID Panel Challenge were provided as training set for predictors. The CAGI6 challenge confirms CAGI5 results that predicting phenotypes from gene panel data is highly challenging, with AUC values close to random, and no method able to predict relevant variants with both high accuracy and precision. However, a significant improvement is noted for the best method, with recall increasing from 66% to 82%. Several groups also successfully predicted difficult-to-detect variants, emphasizing the importance of variants initially excluded by the Padua NDD Lab.
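The recall and precision figures quoted in this abstract follow the standard definitions, which can be shown on a toy example. This is a generic sketch, not the assessors' evaluation code: the function name and the variant identifiers are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], causal: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted variants against known causal ones."""
    true_positives = len(predicted & causal)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(causal) if causal else 0.0
    return precision, recall


# Placeholder variant identifiers, not challenge data.
predicted = {"v1", "v2", "v3", "v4"}
causal = {"v1", "v2", "v5"}
precision, recall = precision_recall(predicted, causal)
print(precision, recall)
```

A method can raise recall simply by predicting more variants, which lowers precision; this trade-off is why the abstract reports that no method achieved both at once.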
Affiliation(s)
- Maria Cristina Aspromonte
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
  - Department of Women's and Children's Health, University of Padova, Padova, Italy
- Alessio Del Conte
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Shaowen Zhu
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- Wuwei Tan
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- Yexian Zhang
  - CUHK Shenzhen Research Institute, Shenzhen, China
  - JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, SAR, China
- Qi Li
  - CUHK Shenzhen Research Institute, Shenzhen, China
  - JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, SAR, China
- Maggie Haitian Wang
  - CUHK Shenzhen Research Institute, Shenzhen, China
  - JC School of Public Health and Primary Care, Chinese University of Hong Kong, Hong Kong, SAR, China
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Samuele Bovo
  - Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Azza Althagafi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
  - Computer Science Department, College of Computers and Information Technology, Taif University, Taif, 26571, Saudi Arabia
- Sumyyah Toonsi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Maxat Kulmanov
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Su Xian
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98195, USA
- Wesley Surento
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98195, USA
- Vikas Pejaver
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98195, USA
- Uma Sunderam
  - Innovation Labs, Tata Consultancy Services, Hyderabad, India
- Alessandra Murgia
  - Department of Women's and Children's Health, University of Padova, Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padova, Padova, Italy.
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR- IBIOM), Bari, Italy.
- Emanuela Leonardi
  - Department of Biomedical Sciences, University of Padova, Padova, Italy.
  - Department of Women's and Children's Health, University of Padova, Padova, Italy.
7
Francisco S, Lamacchia L, Turco A, Ermondi G, Caron G, Rossi Sebastiano M. Restoring adapter protein complex 4 function with small molecules: an in silico approach to spastic paraplegia 50. Protein Sci 2025; 34:e70006. [PMID: 39723768 DOI: 10.1002/pro.70006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 11/22/2024] [Accepted: 12/06/2024] [Indexed: 12/28/2024]
Abstract
This study focuses on spastic paraplegia type 50 (SPG50), an adapter protein complex 4 deficiency syndrome caused by mutations in the adapter protein complex 4 subunit mu-1 (AP4M1) gene, and on the downstream alterations of the AP4M1 protein. We applied a battery of heterogeneous computational resources, encompassing two in-house tools described here for the first time, to (a) assess the druggability potential of AP4M1, (b) characterize SPG50-associated mutations and their 3D scenario, (c) identify mutation-tailored drug candidates for SPG50, and (d) elucidate their mechanisms of action by means of structural considerations on homology models of the adapter protein complex 4 core. Altogether, the collected results indicate R367Q as the mutation with the most promising potential of being corrected by small-molecule drugs, and the flavonoid rutin as best candidate for this purpose. Rutin shows promise in rescuing the interaction between the AP4M1 and adapter protein complex subunit beta-1 (AP4B1) subunits by means of a glue-like mode of action. Overall, this approach offers a framework that could be systematically applied to the investigation of mutation-wise molecular mechanisms in different hereditary spastic paraplegias, too.
Affiliation(s)
- Serena Francisco
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Lorenzo Lamacchia
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Attilio Turco
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Giuseppe Ermondi
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Giulia Caron
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
- Matteo Rossi Sebastiano
  - Department of Molecular Biotechnology and Health Sciences, University of Torino, Torino, Italy
8
Bhuyan P, Bharali V, Basumatary S, Lego A, Sarma J, Borbora D. Computational analysis of MYC gene variants: structural and functional impact of non-synonymous SNPs. J Appl Genet 2024:10.1007/s13353-024-00929-1. [PMID: 39673052 DOI: 10.1007/s13353-024-00929-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2024] [Revised: 11/15/2024] [Accepted: 11/22/2024] [Indexed: 12/15/2024]
Abstract
The MYC proto-oncogene encodes a basic helix-loop-helix leucine zipper (HLH-LZ) transcription factor, acting as a master regulator of genes involved in cellular proliferation, differentiation, and immune surveillance. Dysregulation of MYC is implicated in over 70% of human cancers, driving oncogenic processes through altered gene expression and disrupted cellular functions. Non-synonymous single nucleotide polymorphisms (nsSNPs) within coding regions can significantly impact protein structure and function, leading to abnormal cellular behaviours. This study employed 29 in silico tools to systematically evaluate the deleteriousness of nsSNPs within the MYC gene. These tools assessed the variants' effects on protein structure, disease association, functional domains, and post-translational modification sites. This study investigated if these variants may disrupt protein-protein interactions, critical for MYC's oncogenic roles and normal cellular functions. Our analysis identified 21 nsSNPs that were predicted to be deleterious and pathogenic. These variants correspond to residues D63H, D63Y, P74L, P75L, N375D, N375I, E378K, E378Q, E378A, E378G, E378V, R379P, R381K, R381T, R382W, L392P, R393C, R393H, R393P, L411H, and L411P. Stability assessments indicated that these variants could destabilise the MYC protein. None of the variants affected post-translational modifications. Protein-protein interaction and docking analysis revealed that variants within bHLH and LZ domains may disrupt MYC/MAX binding, potentially impacting MYC's oncogenic activity and transcriptional regulation. This computational assessment enhances our understanding of genetic variations within the MYC gene and prioritises candidate nsSNPs for experimental validation and therapeutic exploration.
Affiliation(s)
- Plabita Bhuyan
  - Department of Biotechnology, Gauhati University, Guwahati, Assam, 781014, India
- Varshabi Bharali
  - Department of Biotechnology, Gauhati University, Guwahati, Assam, 781014, India
- Sangju Basumatary
  - Department of Biotechnology, Gauhati University, Guwahati, Assam, 781014, India
- Aido Lego
  - Department of Biotechnology, Gauhati University, Guwahati, Assam, 781014, India
- Juman Sarma
  - Department of Biotechnology, Gauhati University, Guwahati, Assam, 781014, India
- Debasish Borbora
  - Department of Biotechnology, Gauhati University, Guwahati, Assam, 781014, India.
  - Institutional Biotech Hub, Gauhati University, Guwahati, Assam, 781014, India.
9
Manfredi M, Savojardo C, Martelli PL, Casadio R. E-pRSA: Embeddings Improve the Prediction of Residue Relative Solvent Accessibility in Protein Sequence. J Mol Biol 2024; 436:168494. [PMID: 39237207 DOI: 10.1016/j.jmb.2024.168494] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/10/2023] [Revised: 02/09/2024] [Accepted: 02/10/2024] [Indexed: 09/07/2024]
Abstract
Knowledge of the solvent accessibility of residues in a protein is essential for different applications, including the identification of interacting surfaces in protein-protein interactions and the characterization of variations. We describe E-pRSA, a novel web server to estimate Relative Solvent Accessibility values (RSAs) of residues directly from a protein sequence. The method exploits two complementary Protein Language Models to provide fast and accurate predictions. When benchmarked on different blind test sets, E-pRSA scores at the state-of-the-art, and outperforms a previous method we developed, DeepREx, which was based on sequence profiles after Multiple Sequence Alignments. The E-pRSA web server is freely available at https://e-prsa.biocomp.unibo.it/main/ where users can submit single-sequence and batch jobs.
Affiliation(s)
- Matteo Manfredi
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy.
- Rita Casadio
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
10
Jain S, Trinidad M, Nguyen TB, Jones K, Neto SD, Ge F, Glagovsky A, Jones C, Moran G, Wang B, Rahimi K, Çalıcı SZ, Cedillo LR, Berardelli S, Özden B, Chen K, Katsonis P, Williams A, Lichtarge O, Rana S, Pradhan S, Srinivasan R, Sajeed R, Joshi D, Faraggi E, Jernigan R, Kloczkowski A, Xu J, Song Z, Özkan S, Padilla N, de la Cruz X, Acuna-Hidalgo R, Grafmüller A, Jiménez Barrón LT, Manfredi M, Savojardo C, Babbi G, Martelli PL, Casadio R, Sun Y, Zhu S, Shen Y, Pucci F, Rooman M, Cia G, Raimondi D, Hermans P, Kwee S, Chen E, Astore C, Kamandula A, Pejaver V, Ramola R, Velyunskiy M, Zeiberg D, Mishra R, Sterling T, Goldstein JL, Lugo-Martinez J, Kazi S, Li S, Long K, Brenner SE, Bakolitsa C, Radivojac P, Suhr D, Suhr T, Clark WT. Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A. bioRxiv 2024:2024.05.16.594558. [PMID: 38798479 PMCID: PMC11118473 DOI: 10.1101/2024.05.16.594558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Continued advances in variant effect prediction are necessary to demonstrate the ability of machine learning methods to accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by utilizing 219 experimentally assayed missense VUS in the Arylsulfatase A (ARSA) gene to assess the performance of community-submitted predictions of variant functional effects. The challenge involved 15 teams, and evaluated additional predictions from established and recently released models. Notably, a model developed by participants of a genetics and coding bootcamp, trained with standard machine-learning tools in Python, demonstrated superior performance among submissions. Furthermore, the study observed that state-of-the-art deep learning methods provided small but statistically significant improvement in predictive performance compared to less elaborate techniques. These findings underscore the utility of variant effect prediction, and the potential for models trained with modest resources to accurately classify VUS in genetic and clinical research.
Affiliation(s)
- Shantanu Jain
- The Institute for Experiential AI, Northeastern University, Boston, MA, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Marena Trinidad
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, USA
| | - Thanh Binh Nguyen
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Australia
| | | | | | - Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, Nanjing, China
| | | | | | | | - Boqi Wang
- Department of Bioinformatics and System Biology, University of California, San Diego, La Jolla, CA, USA
| | - Kobra Rahimi
- Department of Computational Biology, School of Life Sciences, Ochanomizu University, Tokyo, Japan
| | - Sümeyra Zeynep Çalıcı
- Department of Genomics, Faculty of Aquatic Science, Istanbul University, Istanbul, Türkiye
| | | | - Silvia Berardelli
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- enGenome srl, Pavia, Italy
| | - Buse Özden
- Program of Molecular Biotechnology and Genetics, Institute of Science, Istanbul University, Istanbul, Türkiye
| | - Ken Chen
- University of California, Berkeley, Berkeley, CA, USA
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Amanda Williams
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | | | | | - Eshel Faraggi
- Research and Information Systems LLC, Indianapolis, IN, USA
- Physics Department, Indiana University-Purdue University, Indianapolis, IN, USA
| | - Robert Jernigan
- Roy J. Carver Department of Biochemistry, Iowa State University, Ames, IA, USA
| | - Andrzej Kloczkowski
- Institute for Genomic Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA
| | - Jierui Xu
- University of California, Berkeley, Berkeley, CA, USA
| | | | - Selen Özkan
- Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Natàlia Padilla
- Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Xavier de la Cruz
- Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Universitat Autònoma de Barcelona, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | | | | | | | | | | | - Giulia Babbi
- Biocomputing Group, University of Bologna, Bologna, Italy
| | | | - Rita Casadio
- Biocomputing Group, University of Bologna, Bologna, Italy
| | - Yuanfei Sun
- Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Shaowen Zhu
- Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Yang Shen
- Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA
| | - Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Bruxelles, Belgium
| | - Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Bruxelles, Belgium
| | - Gabriel Cia
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Bruxelles, Belgium
| | | | - Pauline Hermans
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Bruxelles, Belgium
| | - Sofia Kwee
- University of California, Berkeley, Berkeley, CA, USA
| | - Ella Chen
- University of California, Berkeley, Berkeley, CA, USA
| | | | - Akash Kamandula
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Rashika Ramola
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Michelle Velyunskiy
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Daniel Zeiberg
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Reet Mishra
- Department of Bioengineering, University of California, Berkeley, CA, USA
- Department of Bioengineering, University of California, San Francisco, CA, USA
| | | | - Jennifer L Goldstein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jose Lugo-Martinez
- Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | | | - Sindy Li
- University of California, Berkeley, Berkeley, CA, USA
| | - Kinsey Long
- University of California, Berkeley, Berkeley, CA, USA
| | | | | | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
|
11
|
Yu H, Luo X. ThermoFinder: A sequence-based thermophilic proteins prediction framework. Int J Biol Macromol 2024; 270:132469. [PMID: 38761901 DOI: 10.1016/j.ijbiomac.2024.132469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024]
Abstract
Thermophilic proteins are important for academic research and industrial processes, and various computational methods have been developed to identify and screen them. However, their performance has been limited by the lack of high-quality labeled data and of efficient models for representing proteins. Here, we propose a novel sequence-based thermophilic protein prediction framework, called ThermoFinder. The results demonstrate that ThermoFinder outperforms previous state-of-the-art tools on two benchmark datasets, and feature ablation experiments confirm the effectiveness of our approach. Additionally, ThermoFinder exhibited exceptional performance and consistency across two newly constructed datasets, one of which was built specifically for the regression-based prediction of temperature optimum values directly from protein sequences. The feature importance analysis, using Shapley additive explanations, further validated the advantages of ThermoFinder. We believe that ThermoFinder will be a valuable and comprehensive framework for predicting thermophilic proteins, and we have made our model open source and available on GitHub at https://github.com/Luo-SynBioLab/ThermoFinder.
Affiliation(s)
- Han Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
|
12
|
Ahmed EM, Elangeeb ME, Adam KM, Abuagla HA, MohamedAhmed AAE, Ali EW, Eltieb EI, Edris AM, Ali Osman HM, Idris ES, Khalil KAA. Computational Analysis of Deleterious nsSNPs in INS Gene Associated with Permanent Neonatal Diabetes Mellitus. J Pers Med 2024; 14:425. [PMID: 38673052 PMCID: PMC11051494 DOI: 10.3390/jpm14040425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 04/06/2024] [Accepted: 04/10/2024] [Indexed: 04/28/2024] Open
Abstract
Insulin gene mutations affect the structure of insulin and are considered a leading cause of neonatal diabetes and permanent neonatal diabetes mellitus (PNDM). These mutations can affect the production and secretion of insulin, resulting in inadequate insulin levels and subsequent hyperglycemia. Early discovery or prediction of PNDM can aid in better management and treatment. The current study identified potentially deleterious non-synonymous single nucleotide polymorphisms (nsSNPs) in the INS gene. The nsSNPs were analyzed with bioinformatics tools, including SIFT, PolyPhen2, SNAP2, SNPs&GO, PhD-SNP, MutPred2, I-Mutant, MuPro, and HOPE, to predict potential associations between nsSNPs in the INS gene and PNDM. Three mutations, C96Y, P52R, and C96R, were shown to potentially reduce the stability and function of the INS protein. These mutants were subjected to molecular dynamics simulations (MDSs) for structural analysis. Results suggested that these three potentially pathogenic mutations may affect the stability and functionality of the insulin protein encoded by the INS gene. Therefore, these changes may influence the development of PNDM. Further research is required to fully understand the various effects of mutations in the INS gene on insulin synthesis and function. These data can aid in genetic testing for PNDM to evaluate its risk and create treatment and prevention strategies in personalized medicine.
Affiliation(s)
- Elsadig Mohamed Ahmed
- Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Bisha, P.O. Box 551, Bisha 61922, Saudi Arabia; (M.E.E.); (K.M.A.); (H.A.A.); (A.A.E.M.); (E.W.A.); (E.I.E.); (A.M.E.); (H.M.A.O.); (E.S.I.); (K.A.A.K.)
|
13
|
Cox SN, Lo Giudice C, Lavecchia A, Poeta ML, Chiara M, Picardi E, Pesole G. Mitochondrial and Nuclear DNA Variants in Amyotrophic Lateral Sclerosis: Enrichment in the Mitochondrial Control Region and Sirtuin Pathway Genes in Spinal Cord Tissue. Biomolecules 2024; 14:411. [PMID: 38672428 PMCID: PMC11048214 DOI: 10.3390/biom14040411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 03/19/2024] [Accepted: 03/23/2024] [Indexed: 04/28/2024] Open
Abstract
Amyotrophic Lateral Sclerosis (ALS) is a progressive disease with prevalent mitochondrial dysfunctions affecting both upper and lower motor neurons in the motor cortex, brainstem, and spinal cord. Although mitochondria have their own genome (mtDNA), in humans most mitochondrial genes are encoded by the nuclear genome (nDNA). Our study aimed to screen the nDNA and mtDNA genomes simultaneously to assess specific variant enrichment in ALS compared to control tissues. Here, we analysed whole exome (WES) and whole genome (WGS) sequencing data from spinal cord tissues of 6 and 12 human donors, respectively. A total of 31,257 and 301,241 variants in nuclear-encoded mitochondrial genes were identified from WES and WGS, respectively, while mtDNA reads accounted for 73 and 332 variants. Despite technical differences, both datasets consistently revealed a specific enrichment of variants within ALS tissues in the mitochondrial Control Region (CR) and in several nuclear-encoded genes directly associated with mitochondrial dynamics or with the Sirtuin pathway. Overall, our data support the hypothesis of a variant burden in specific genes, highlighting potential actionable targets for therapeutic interventions in ALS.
Affiliation(s)
- Sharon Natasha Cox
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70126 Bari, Italy; (A.L.); (M.L.P.); (E.P.)
| | - Claudio Lo Giudice
- Institute of Biomedical Technologies, National Research Council, 70126 Bari, Italy;
| | - Anna Lavecchia
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70126 Bari, Italy; (A.L.); (M.L.P.); (E.P.)
| | - Maria Luana Poeta
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70126 Bari, Italy; (A.L.); (M.L.P.); (E.P.)
| | - Matteo Chiara
- Department of Biosciences, University of Milan, 20133 Milan, Italy;
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, National Research Council, 70126 Bari, Italy
| | - Ernesto Picardi
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70126 Bari, Italy; (A.L.); (M.L.P.); (E.P.)
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, National Research Council, 70126 Bari, Italy
| | - Graziano Pesole
- Department of Biosciences, Biotechnology and Environment, University of Bari “Aldo Moro”, 70126 Bari, Italy; (A.L.); (M.L.P.); (E.P.)
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, National Research Council, 70126 Bari, Italy
|
14
|
Bai K, Yang L, Xue J, Zhao L, Hao F. Pathogenicity classification of missense mutations based on deep generative model. Comput Biol Med 2024; 170:107980. [PMID: 38242017 DOI: 10.1016/j.compbiomed.2024.107980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 01/12/2024] [Accepted: 01/12/2024] [Indexed: 01/21/2024]
Abstract
Missense mutations affect the function of human proteins and are closely associated with multiple acute and chronic diseases. The identification of disease-associated missense mutations and their classification for pathogenicity can provide insights into the genetic basis of disease and protein function. This paper proposes MLAE (Method based on LSTM-Ladder AutoEncoder), a deep learning classification model for identifying disease-associated missense mutations and classifying their pathogenicity based on the Variational AutoEncoder (VAE) framework. MLAE overcomes the limitations of the VAE framework by introducing the Ladder structure, combined with LSTM networks. This reduces the loss of original information during the transmission process, thereby making the model more effective in learning. In the experiment, MLAE classified all 27572 possible missense variants of the three input proteins with an average classification AUC of 0.941. This result provides evidence that MLAE is effective in predicting pathogenicity. Additionally, MLAE provides results for multi-label classification, with an average Hamming loss of 0.196, supporting the classification of complex variants. The proposed MLAE method provides an insightful approach to effectively capture amino acid sequence information and accurately predict the pathogenicity of mutations, thereby providing an analytical basis for the study and prevention of related diseases.
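
For readers unfamiliar with the multi-label metric quoted above, Hamming loss is the fraction of individual label assignments that are predicted incorrectly, counted over all samples and all labels. The sketch below is a minimal pure-Python version of that definition. The label matrices are made up for illustration and are not MLAE output.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) pairs where prediction and truth differ."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    if total == 0:
        raise ValueError("no labels to compare")
    return wrong / total


# Hypothetical binary label matrices: 2 variants x 3 labels.
truth = [[1, 0, 1], [0, 1, 0]]
preds = [[1, 1, 1], [0, 1, 1]]
print(hamming_loss(truth, preds))  # 2 mismatches out of 6 label slots
```

Lower is better: a loss of 0 means every label of every sample was predicted correctly.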
Affiliation(s)
- Ke Bai
- Shandong Jianzhu University, Jinan, 250101, PR China
| | - Lu Yang
- Shandong Jianzhu University, Jinan, 250101, PR China
| | - Jian Xue
- Shandong Jianzhu University, Jinan, 250101, PR China
| | - Lin Zhao
- Shandong Jianzhu University, Jinan, 250101, PR China
| | - Fanchang Hao
- Shandong Jianzhu University, Jinan, 250101, PR China.
|
15
|
Yan Z, Ge F, Liu Y, Zhang Y, Li F, Song J, Yu DJ. TransEFVP: A Two-Stage Approach for the Prediction of Human Pathogenic Variants Based on Protein Sequence Embedding Fusion. J Chem Inf Model 2024; 64:1407-1418. [PMID: 38334115 DOI: 10.1021/acs.jcim.3c02019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose a novel computational approach, TransEFVP, based on large-scale protein language model embeddings and a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: the first stage is designed to fuse different feature embeddings through a transformer encoder. In the second stage, a support vector machine model is employed to quantify the pathogenicity of SAVs after dimensionality reduction. The prediction performance of TransEFVP on blind test data achieves a Matthews correlation coefficient of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve of 0.871, higher than the existing state-of-the-art methods. The benchmark results demonstrate that TransEFVP can be explored as an accurate and effective SAV pathogenicity prediction method. The data and codes for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
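
Two of the metrics reported above, the Matthews correlation coefficient (MCC) and the F1-score, can both be computed from a binary confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical and are not the TransEFVP test results, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion counts for a pathogenic-vs-benign classifier.
tp, tn, fp, fn = 40, 45, 5, 10
print(f1_score(tp, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside F1 on imbalanced variant datasets.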
Affiliation(s)
- Zihao Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
| | - Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, PR China
| | - Yan Liu
- Department of Computer Science, Yangzhou University, Yangzhou 225100, PR China
| | - Yumeng Zhang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Fuyi Li
- South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia 5005, Australia
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria 3000, Australia
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
|
16
|
Aspromonte MC, Conte AD, Zhu S, Tan W, Shen Y, Zhang Y, Li Q, Wang MH, Babbi G, Bovo S, Martelli PL, Casadio R, Althagafi A, Toonsi S, Kulmanov M, Hoehndorf R, Katsonis P, Williams A, Lichtarge O, Xian S, Surento W, Pejaver V, Mooney SD, Sunderam U, Srinivasan R, Murgia A, Piovesan D, Tosatto SCE, Leonardi E. CAGI6 ID-Challenge: Assessment of phenotype and variant predictions in 415 children with Neurodevelopmental Disorders (NDDs). Research Square 2023:rs.3.rs-3209168. [PMID: 37577579 PMCID: PMC10418555 DOI: 10.21203/rs.3.rs-3209168/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
In the context of the Critical Assessment of Genome Interpretation, 6th edition (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-challenge to offer the opportunity to develop computational methods for predicting patients' phenotypes and causal variants. Eight research teams, submitting 30 models, had access to the phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions, with onset in infancy. In this study we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes, based on the clinical features described for each patient, and causal variants. Finally, we asked participants to develop a method to find new possible genetic causes for patients without a genetic diagnosis. As was done for CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (causative, putative pathogenic and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we provided the variant data and phenotypic traits of the 150 patients from the CAGI5 ID-Challenge as training and validation data for the development of prediction methods.
Affiliation(s)
| | | | - Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
| | - Wuwei Tan
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
| | | | - Qi Li
- CUHK Shenzhen Research Institute, Shenzhen
| | | | - Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
| | - Samuele Bovo
- Department of Agricultural and Food Sciences, University of Bologna
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
| | - Azza Althagafi
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
| | - Sumyyah Toonsi
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
| | - Maxat Kulmanov
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
| | - Amanda Williams
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
| | - Su Xian
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
| | - Wesley Surento
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
| | - Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
| | - Uma Sunderam
- Innovation Labs, Tata Consultancy Services, Hyderabad
|
17
|
Madeo G, Savojardo C, Manfredi M, Martelli PL, Casadio R. CoCoNat: a novel method based on deep learning for coiled-coil prediction. Bioinformatics 2023; 39:btad495. [PMID: 37540220 PMCID: PMC10425188 DOI: 10.1093/bioinformatics/btad495] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/11/2023] [Revised: 07/31/2023] [Accepted: 08/03/2023] [Indexed: 08/05/2023] Open
Abstract
MOTIVATION: Coiled-coil domains (CCD) are widespread in all organisms and perform several crucial functions. Given their relevance, the computational detection of CCD is very important for protein functional annotation. State-of-the-art prediction methods include the precise identification of CCD boundaries, the annotation of the typical heptad repeat pattern along the coiled-coil helices as well as the prediction of the oligomerization state. RESULTS: In this article, we describe CoCoNat, a novel method for predicting coiled-coil helix boundaries, residue-level register annotation, and oligomerization state. Our method encodes sequences with the combination of two state-of-the-art protein language models and implements a three-step deep learning procedure concatenated with a Grammatical-Restrained Hidden Conditional Random Field for CCD identification and refinement. A final neural network predicts the oligomerization state. When tested on a blind test set routinely adopted, CoCoNat obtains a performance superior to the current state-of-the-art both for residue-level and segment-level CCD. CoCoNat significantly outperforms the most recent state-of-the-art methods on register annotation and prediction of oligomerization states. AVAILABILITY AND IMPLEMENTATION: CoCoNat web server is available at https://coconat.biocomp.unibo.it. Standalone version is available on GitHub at https://github.com/BolognaBiocomp/coconat.
Affiliation(s)
- Giovanni Madeo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
|
18
|
Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023] Open
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Roma, Italy
| | - Allegra Via
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
| | - Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Giulia Babbi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | | | - Claudio Carta
- National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
| | - Rita Casadio
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Andrea Cicconardi
- Department of Physics, University of Genova, Genova, Italy
- Istituto Italiano di Tecnologia (IIT), Genova, Italy
- Angelo Facchiano
- National Research Council, Institute of Food Science, Avellino, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
- Deborah Giordano
- National Research Council, Institute of Food Science, Avellino, Italy
- Federica Isidori
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Anna Marabotti
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Stefano Pascarella
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Michele Pinelli
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- Tommaso Pippucci
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Roberta Russo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
- Castrense Savojardo
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Bernardina Scafuri
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
19
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
20
Yu H, Luo X. IPPF-FE: an integrated peptide and protein function prediction framework based on fused features and ensemble models. Brief Bioinform 2023; 24:6834141. [PMID: 36403184 DOI: 10.1093/bib/bbac476] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/05/2022] [Revised: 09/23/2022] [Accepted: 10/05/2022] [Indexed: 11/21/2022]
Abstract
The prediction of peptide and protein function is important for research and industrial applications, and many machine learning methods have been developed for this purpose. The existing models have encountered many challenges, including the lack of effective and comprehensive features and the limited applicability of each model. Here, we introduce an Integrated Peptide and Protein function prediction Framework based on Fused features and Ensemble models (IPPF-FE), which can accurately capture the relationship between features and labels. The results indicated that IPPF-FE outperformed existing state-of-the-art (SOTA) models on more than 8 different categories of peptide and protein tasks. In addition, t-distributed Stochastic Neighbour Embedding demonstrated the advantages of IPPF-FE. We anticipate that our method will become a versatile tool for peptide and protein prediction tasks and shed light on the future development of related models. The model is open source and available in the GitHub repository https://github.com/Luo-SynBioLab/IPPF-FE.
Affiliation(s)
- Han Yu
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China