1
Breimann S, Kamp F, Steiner H, Frishman D. AAontology: An Ontology of Amino Acid Scales for Interpretable Machine Learning. J Mol Biol 2024; 436:168717. [PMID: 39053689 DOI: 10.1016/j.jmb.2024.168717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/15/2024] [Accepted: 07/19/2024] [Indexed: 07/27/2024]
Abstract
Amino acid scales are crucial for protein prediction tasks, many of them being curated in the AAindex database. Despite various clustering attempts to organize them and to better understand their relationships, these approaches lack the fine-grained classification necessary for satisfactory interpretability in many protein prediction problems. To address this issue, we developed AAontology, a two-level classification for 586 amino acid scales (mainly from AAindex) together with an in-depth analysis of their relations, using bag-of-word-based classification, clustering, and manual refinement over multiple iterations. AAontology organizes physicochemical scales into 8 categories and 67 subcategories, enhancing the interpretability of scale-based machine learning methods in protein bioinformatics. It thereby enables researchers to gain deeper biological insight. We anticipate that AAontology will be a building block to link amino acid properties with protein function and dysfunction, as well as aid informed decision-making in mutation analysis or protein drug design.
2
Abbas Q, Wilhelm M, Kuster B, Poppenberger B, Frishman D. Correction: Exploring crop genomes: assembly features, gene prediction accuracy, and implications for proteomics studies. BMC Genomics 2024; 25:881. [PMID: 39300357 DOI: 10.1186/s12864-024-10796-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2024] Open
3
Newaz K, Schaefers C, Weisel K, Baumbach J, Frishman D. Prognostic importance of splicing-triggered aberrations of protein complex interfaces in cancer. NAR Genom Bioinform 2024; 6:lqae133. [PMID: 39328266 PMCID: PMC11426328 DOI: 10.1093/nargab/lqae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 08/30/2024] [Accepted: 09/13/2024] [Indexed: 09/28/2024] Open
Abstract
Aberrant alternative splicing (AS) is a prominent hallmark of cancer. AS can perturb protein-protein interactions (PPIs) by adding or removing interface regions encoded by individual exons. Identifying prognostic exon-exon interactions (EEIs) from PPI interfaces can help discover AS-affected cancer-driving PPIs that can serve as potential drug targets. Here, we assessed the prognostic significance of EEIs across 15 cancer types by integrating RNA-seq data with three-dimensional (3D) structures of protein complexes. By analyzing the resulting EEI network we identified patient-specific perturbed EEIs (i.e., EEIs present in healthy samples but absent from the paired cancer samples or vice versa) that were significantly associated with survival. We provide the first evidence that EEIs can be used as prognostic biomarkers for cancer patient survival. Our findings provide mechanistic insights into AS-affected PPI interfaces. Given the ongoing expansion of available RNA-seq data and the number of 3D structurally-resolved (or confidently predicted) protein complexes, our computational framework will help accelerate the discovery of clinically important cancer-promoting AS events.
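The notion of a patient-specific perturbed EEI (present in the healthy sample but absent from the paired cancer sample, or vice versa) can be illustrated with a minimal sketch. It assumes each sample has already been reduced to a set of exon pairs considered present; the exon identifiers and the function name `perturbed_eeis` are hypothetical and are not taken from the authors' framework.

```python
from typing import FrozenSet, Set, Tuple

# An exon-exon interaction (EEI) modeled as an unordered pair of exon identifiers.
EEI = FrozenSet[str]


def perturbed_eeis(healthy: Set[EEI], cancer: Set[EEI]) -> Tuple[Set[EEI], Set[EEI]]:
    """Return (lost, gained) EEIs for one patient's paired samples.

    lost:   present in the healthy sample, absent from the cancer sample
    gained: absent from the healthy sample, present in the cancer sample
    """
    lost = healthy - cancer
    gained = cancer - healthy
    return lost, gained


# Hypothetical exon identifiers, for illustration only.
healthy_sample = {frozenset({"exonA", "exonB"}), frozenset({"exonC", "exonD"})}
cancer_sample = {frozenset({"exonC", "exonD"}), frozenset({"exonE", "exonF"})}
lost, gained = perturbed_eeis(healthy_sample, cancer_sample)
```

Modeling each EEI as a `frozenset` makes the pair order-independent, so the same interaction is recognized regardless of which exon is listed first.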
4
Trgovec-Greif L, Hellinger HJ, Mainguy J, Pfundner A, Frishman D, Kiening M, Webster NS, Laffy PW, Feichtinger M, Rattei T. VOGDB-Database of Virus Orthologous Groups. Viruses 2024; 16:1191. [PMID: 39205165 PMCID: PMC11360334 DOI: 10.3390/v16081191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 07/21/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] Open
Abstract
Computational models of homologous protein groups are essential in sequence bioinformatics. Due to the diversity and rapid evolution of viruses, the grouping of protein sequences from virus genomes is particularly challenging. The low sequence similarities of homologous genes in viruses require specific approaches for sequence- and structure-based clustering. Furthermore, the annotation of virus genomes in public databases is not as consistent and up to date as for many cellular genomes. To tackle these problems, we have developed VOGDB, which is a database of virus orthologous groups. VOGDB is a multi-layer database that progressively groups viral genes into groups connected by increasingly remote similarity. The first layer is based on pair-wise sequence similarities, the second layer is based on the sequence profile alignments, and the third layer uses predicted protein structures to find the most remote similarity. VOGDB groups allow for more sensitive homology searches of novel genes and increase the chance of predicting annotations or inferring phylogeny. VOGDB uses all virus genomes from RefSeq and partially reannotates them. VOGDB is updated with every RefSeq release. The unique feature of VOGDB is the inclusion of both prokaryotic and eukaryotic viruses in the same clustering process, which makes it possible to explore old evolutionary relationships of the two groups. VOGDB is freely available at vogdb.org under the CC BY 4.0 license.
5
Abbas Q, Wilhelm M, Kuster B, Poppenberger B, Frishman D. Exploring crop genomes: assembly features, gene prediction accuracy, and implications for proteomics studies. BMC Genomics 2024; 25:619. [PMID: 38898442 PMCID: PMC11186247 DOI: 10.1186/s12864-024-10521-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024] Open
Abstract
Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. As the number of sequenced genomes grows and the accuracy and contiguity of genome assemblies improve, structural annotation of plant genomes continues to be a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that impact their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.
6
Aßfalg M, Güner G, Müller SA, Breimann S, Langosch D, Muhle-Goll C, Frishman D, Steiner H, Lichtenthaler SF. Cleavage efficiency of the intramembrane protease γ-secretase is reduced by the palmitoylation of a substrate's transmembrane domain. FASEB J 2024; 38:e23442. [PMID: 38275103 DOI: 10.1096/fj.202302152r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 12/20/2023] [Accepted: 01/09/2024] [Indexed: 01/27/2024]
Abstract
The intramembrane protease γ-secretase has broad physiological functions, but also contributes to Notch-dependent tumors and Alzheimer's disease. While γ-secretase cleaves numerous membrane proteins, only a few nonsubstrates are known. Thus, a fundamental open question is how γ-secretase distinguishes substrates from nonsubstrates and whether sequence-based features or post-translational modifications of membrane proteins contribute to substrate recognition. Using mass spectrometry-based proteomics, we identified several type I membrane proteins with short ectodomains that were inefficiently or not cleaved by γ-secretase, including 'pituitary tumor-transforming gene 1-interacting protein' (PTTG1IP). To analyze the mechanism preventing cleavage of these putative nonsubstrates, we used the validated substrate FN14 as a backbone and replaced its transmembrane domain (TMD), where γ-cleavage occurs, with those of nonsubstrates. Surprisingly, some nonsubstrate TMDs were efficiently cleaved in the FN14 backbone, demonstrating that a cleavable TMD is necessary, but not sufficient for cleavage by γ-secretase. Cleavage efficiencies varied by up to 200-fold. Other TMDs, including that of PTTG1IP, were still barely cleaved within the FN14 backbone. Pharmacological and mutational experiments revealed that the PTTG1IP TMD is palmitoylated, which prevented cleavage by γ-secretase. We conclude that the TMD sequence of a membrane protein and its palmitoylation can be key factors determining substrate recognition and cleavage efficiency by γ-secretase.
7
Aschenbrenner I, Siebenmorgen T, Lopez A, Parr M, Ruckgaber P, Kerle A, Rührnößl F, Catici D, Haslbeck M, Frishman D, Sattler M, Zacharias M, Feige MJ. Assembly-dependent Structure Formation Shapes Human Interleukin-23 versus Interleukin-12 Secretion. J Mol Biol 2023; 435:168300. [PMID: 37805067 DOI: 10.1016/j.jmb.2023.168300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 09/29/2023] [Accepted: 10/01/2023] [Indexed: 10/09/2023]
Abstract
Interleukin 12 (IL-12) family cytokines connect the innate and adaptive branches of the immune system and regulate immune responses. A unique characteristic of this family is that each member is an α:β heterodimer. For human α subunits it has been shown that they depend on their β subunit for structure formation and secretion from cells. Since subunits are shared within the family and IL-12 as well as IL-23 use the same β subunit, subunit competition may influence cytokine secretion and thus downstream immunological functions. Here, we rationally design a folding-competent human IL-23α subunit that does not depend on its β subunit for structure formation. This engineered variant still forms a functional heterodimeric cytokine but shows less chaperone dependency and stronger affinity in assembly with its β subunit. It forms IL-23 more efficiently than its natural counterpart, skewing the balance of IL-12 and IL-23 towards more IL-23 formation. Together, our study shows that folding-competent human IL-12 family α subunits are obtainable by only few mutations and compatible with assembly and function of the cytokine. These findings might suggest that human α subunits have evolved for assembly-dependent folding to maintain and regulate correct IL-12 family member ratios in the light of subunit competition.
8
Tsai WY, Breimann S, Shen TW, Frishman D. Photoacoustic and absorption spectroscopy imaging analysis of human blood. PLoS One 2023; 18:e0289704. [PMID: 37540721 PMCID: PMC10403132 DOI: 10.1371/journal.pone.0289704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 07/25/2023] [Indexed: 08/06/2023] Open
Abstract
Photoacoustic and absorption spectroscopy imaging are safe and non-invasive molecular quantification techniques, which do not utilize ionizing radiation and allow for repeated probing of samples without them being contaminated or damaged. Here we assessed the potential of these techniques for measuring biochemical parameters. We investigated the statistical association between 31 time and frequency domain features derived from photoacoustic and absorption spectroscopy signals and 19 biochemical blood parameters. We found that photoacoustic and absorption spectroscopy imaging features are significantly correlated with 14 and 17 individual biochemical parameters, respectively. Moreover, some of the biochemical blood parameters can be accurately predicted based on photoacoustic and absorption spectroscopy imaging features by polynomial regression. In particular, the levels of uric acid and albumin can be accurately explained by a combination of photoacoustic and absorption spectroscopy imaging features (adjusted R-squared > 0.75), while creatinine levels can be accurately explained by the features of the photoacoustic system (adjusted R-squared > 0.80). We identified a number of imaging features that inform on the biochemical blood parameters and can be potentially useful in clinical diagnosis. We also demonstrated that linear and non-linear combinations of photoacoustic and absorption spectroscopy imaging features can accurately predict some of the biochemical blood parameters. These results demonstrate that photoacoustic and absorption spectroscopy imaging systems show promise for future applications in clinical practice.
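The adjusted R-squared values quoted above penalize the ordinary R-squared for the number of predictors in the regression. As a minimal sketch of the standard formula (the function name and the toy numbers are illustrative assumptions, not data or code from the study):

```python
def adjusted_r_squared(y_true, y_pred, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n is the number of observations and p the number of predictors
    (for a polynomial regression, p counts the polynomial terms
    excluding the intercept).
    """
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Toy values for illustration only (not measurements from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = adjusted_r_squared(observed, predicted, n_predictors=1)
```

Because the correction grows with the number of predictors, a model with many imaging features needs a correspondingly better fit to reach the same adjusted value.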
9
Mao L, Wang Y, An L, Zeng B, Wang Y, Frishman D, Liu M, Chen Y, Tang W, Xu H. Molecular Mechanisms and Clinical Phenotypes of GJB2 Missense Variants. Biology 2023; 12:505. [PMID: 37106706 PMCID: PMC10135792 DOI: 10.3390/biology12040505] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/18/2022] [Revised: 01/19/2023] [Accepted: 01/20/2023] [Indexed: 03/29/2023]
Abstract
The GJB2 gene is the most common gene responsible for hearing loss (HL) worldwide, and missense variants are the most abundant type. GJB2 pathogenic missense variants cause nonsyndromic HL (autosomal recessive and dominant) and syndromic HL combined with skin diseases. However, the mechanism by which these different missense variants cause the different phenotypes is unknown. Over 2/3 of the GJB2 missense variants have yet to be functionally studied and are currently classified as variants of uncertain significance (VUS). Based on these functionally determined missense variants, we reviewed the clinical phenotypes and investigated the molecular mechanisms that affected hemichannel and gap junction functions, including connexin biosynthesis, trafficking, oligomerization into connexons, permeability, and interactions between other coexpressed connexins. We predict that all possible GJB2 missense variants will be described in the future by deep mutational scanning technology and optimizing computational models. Therefore, the mechanisms by which different missense variants cause different phenotypes will be fully elucidated.
10
Bloemeke N, Meighen‐Berger K, Hitzenberger M, Bach NC, Parr M, Coelho JPL, Frishman D, Zacharias M, Sieber SA, Feige MJ. Intramembrane client recognition potentiates the chaperone functions of calnexin. EMBO J 2022; 41:e110959. [PMID: 36314723 PMCID: PMC9753464 DOI: 10.15252/embj.2022110959] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2022] [Revised: 10/11/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022] Open
Abstract
One-third of the human proteome is comprised of membrane proteins, which are particularly vulnerable to misfolding and often require folding assistance by molecular chaperones. Calnexin (CNX), which engages client proteins via its sugar-binding lectin domain, is one of the most abundant ER chaperones, and plays an important role in membrane protein biogenesis. Based on mass spectrometric analyses, we here show that calnexin interacts with a large number of nonglycosylated membrane proteins, indicative of additional nonlectin binding modes. We find that calnexin preferentially binds misfolded membrane proteins and that it uses its single transmembrane domain (TMD) for client recognition. Combining experimental and computational approaches, we systematically dissect signatures for intramembrane client recognition by calnexin, and identify sequence motifs within the calnexin TMD region that mediate client binding. Building on this, we show that intramembrane client binding potentiates the chaperone functions of calnexin. Together, these data reveal a widespread role of calnexin client recognition in the lipid bilayer, which synergizes with its established lectin-based substrate binding. Molecular chaperones thus can combine different interaction modes to support the biogenesis of the diverse eukaryotic membrane proteome.
11
Kulandaisamy A, Ridha F, Frishman D, Gromiha MM. Computational approaches for investigating disease-causing mutations in membrane proteins: database development, analysis and prediction. Curr Top Med Chem 2022; 22:1766-1775. [PMID: 35894475 DOI: 10.2174/1568026622666220726124705] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/28/2022] [Revised: 05/27/2022] [Accepted: 06/03/2022] [Indexed: 11/22/2022]
Abstract
Membrane proteins (MPs) play an essential role in a broad range of cellular functions, serving as transporters, enzymes, receptors, and communicators, and about 60% of membrane proteins are primarily used as drug targets. These proteins adopt either α-helical or β-barrel structures in the lipid bilayer of a cell/organelle membrane. Mutations in membrane proteins alter their structure and function and may lead to diseases. Accumulated data on disease-causing and neutral mutations in membrane proteins are available in the MutHTP and TMSNP databases, which provide additional features based on sequence, structure, topology, and diseases. These databases have been effectively utilized for analysing sequence and structure-based features in disease-causing and neutral mutations in membrane proteins, exploring disease-causing mechanisms, elucidating the relationship between sequence/structural parameters and diseases, and developing computational tools. Further, machine learning based tools have been developed for identifying disease-causing mutations using diverse features such as evolutionary information, physicochemical properties, atomic contacts, contact potentials, and the contribution of different energetic terms. These membrane protein-specific tools are helpful to characterize the effect of new variants in the whole human membrane proteome. In this review, we provide a discussion of the available databases for disease-causing mutations in membrane proteins followed by a statistical analysis of membrane protein mutations using sequence and structural features. In addition, available prediction tools for identifying disease-causing and neutral mutations in membrane proteins will be described with their performances. This comprehensive review provides deep insights to design mutation-specific strategies for different diseases.
12
Liu H, Bergant V, Frishman G, Ruepp A, Pichlmair A, Vincendeau M, Frishman D. Influenza A Virus Infection Reactivates Human Endogenous Retroviruses Associated with Modulation of Antiviral Immunity. Viruses 2022; 14:1591. [PMID: 35891571 PMCID: PMC9320126 DOI: 10.3390/v14071591] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Revised: 07/18/2022] [Accepted: 07/20/2022] [Indexed: 02/06/2023] Open
Abstract
Human endogenous retroviruses (HERVs), normally silenced by methylation or mutations, can be reactivated by multiple environmental factors, including infections with exogenous viruses. In this work, we investigated the transcriptional activity of HERVs in human A549 cells infected by two wild-type (PR8M, SC35M) and one mutated (SC35MΔNS1) strains of influenza A virus (IAV). We found that the majority of differentially expressed HERVs (DEHERVs) and genes (DEGs) were up-regulated in the infected cells, with the most significantly enriched biological processes associated with the genes differentially expressed exclusively in SC35MΔNS1 being linked to the immune system. Most DEHERVs in PR8M and SC35M are mammalian apparent LTR retrotransposons, while in SC35MΔNS1, more HERV loci from the HERVW9 group were differentially expressed. Furthermore, up-regulated pairs of HERVs and genes in close chromosomal proximity to each other tended to be associated with immune responses, which implies that specific HERV groups might have the potential to trigger specific gene networks and influence host immunological pathways.
13
Mösch A, Frishman D. TCRpair: prediction of functional pairing between HLA-A*02:01-restricted T cell receptor α and β chains. Bioinformatics 2021; 37:3938-3940. [PMID: 34487137 DOI: 10.1093/bioinformatics/btab573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2020] [Revised: 05/21/2021] [Accepted: 09/01/2021] [Indexed: 11/13/2022] Open
Abstract
Summary: The ability of a T cell to recognize foreign peptides is defined by a single α and a single β hypervariable complementarity determining region (CDR3), which together form the T cell receptor (TCR) heterodimer. In ∼30%-35% of T cells, two α chains are expressed at the mRNA level but only one α chain is part of the functional TCR. This effect can also be observed for β chains, although it is less common. The identification of functional α/β chain pairs is instrumental in high-throughput characterization of therapeutic TCRs. TCRpair is the first method that predicts whether an α and β chain pair forms a functional, HLA-A*02:01 specific TCR without requiring the sequence of a recognized peptide. By taking additional amino acids flanking the CDR3 regions into account, TCRpair achieves an AUC of 0.71. Availability: TCRpair is implemented in Python using TensorFlow 2.0 and is freely available at https://www.github.com/amoesch/TCRpair. Supplementary information: Supplementary data are available at Bioinformatics online.
14
Nair VP, Liu H, Ciceri G, Jungverdorben J, Frishman G, Tchieu J, Cederquist GY, Rothenaigner I, Schorpp K, Klepper L, Walsh RM, Kim TW, Cornacchia D, Ruepp A, Mayer J, Hadian K, Frishman D, Studer L, Vincendeau M. Activation of HERV-K(HML-2) disrupts cortical patterning and neuronal differentiation by increasing NTRK3. Cell Stem Cell 2021; 28:1671-1673. [PMID: 34478629 DOI: 10.1016/j.stem.2021.05.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
15
Zakh R, Churkin A, Totzeck F, Parr M, Tuller T, Etzion O, Dahari H, Roggendorf M, Frishman D, Barash D. A Mathematical Analysis of HDV Genotypes: From Molecules to Cells. Mathematics (Basel, Switzerland) 2021; 9:2063. [PMID: 34540628 PMCID: PMC8445514 DOI: 10.3390/math9172063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Hepatitis D virus (HDV) is classified according to eight genotypes. The various genotypes are included in the HDVdb database, where each HDV sequence is specified by its genotype. In this contribution, a mathematical analysis is performed on RNA sequences in HDVdb. The RNA folding predicted structures of the Genbank HDV genome sequences in HDVdb are classified according to their coarse-grain tree-graph representation. The analysis allows discarding in a simple and efficient way the vast majority of the sequences that exhibit a rod-like structure, which is important for the virus replication, to attempt to discover other biological functions by structure consideration. After the filtering, there remain only a small number of sequences that can be checked for their additional stem-loops besides the main one that is known to be responsible for virus replication. It is found that a few sequences contain an additional stem-loop that is responsible for RNA editing or other possible functions. These few sequences are grouped into two main classes, one that is well-known experimentally belonging to genotype 3 for patients from South America associated with RNA editing, and the other that is not known at present belonging to genotype 7 for patients from Cameroon. The possibility that another function besides virus replication reminiscent of the editing mechanism in HDV genotype 3 exists in HDV genotype 7 has not been explored before and is predicted by eigenvalue analysis. Finally, when comparing native and shuffled sequences, it is shown that HDV sequences belonging to all genotypes are accentuated in their mutational robustness and thermodynamic stability as compared to other viruses that were subjected to such an analysis.
16
Afridi SQ, Usman Z, Donakonda S, Wettengel JM, Velkov S, Beck R, Gerhard M, Knolle P, Frishman D, Protzer U, Moeini H, Hoffmann D. Prolonged norovirus infections correlate to quasispecies evolution resulting in structural changes of surface-exposed epitopes. iScience 2021; 24:102802. [PMID: 34355146 PMCID: PMC8324856 DOI: 10.1016/j.isci.2021.102802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2020] [Revised: 05/13/2021] [Accepted: 06/24/2021] [Indexed: 11/19/2022] Open
Abstract
In this study, we analyzed norovirus (NoV) evolution in sequential samples of six chronically infected patients. The capsid gene was amplified from stool samples, and deep sequencing was performed. The role of amino acid flexibility in structural changes and ligand binding was studied with molecular dynamics (MD) simulations. Concentrations of capsid-specific antibodies increased in sequential sera. Capsid sequences accumulated mutations during chronic infection, particularly in the surface-exposed antigenic epitopes A, D, and E. The number of quasispecies increased in infections lasting for >1 month. Interestingly, high genetic complexity and distances were followed by ongoing NoV replication, whereas lower genetic complexity and distances preceded cure. MD simulation revealed that surface-exposed amino acid substitutions of the P2 domain caused fluctuation of blockade epitopes. In conclusion, the capsid protein accumulates numerous mutations during chronic infection; however, only those on the protein surface change the protein structure substantially and may lead to immune escape.
17
Wang Z, Xu H, Xiang T, Liu D, Xu F, Zhao L, Feng Y, Xu L, Liu J, Fang Y, Liu H, Li R, Hu X, Guan J, Liu L, Feng G, Shen Q, Xu H, Frishman D, Tang W, Guo J, Rao J, Shang W. An accessible insight into genetic findings for transplantation recipients with suspected genetic kidney disease. NPJ Genom Med 2021; 6:57. [PMID: 34215756 PMCID: PMC8253729 DOI: 10.1038/s41525-021-00219-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/20/2021] [Accepted: 06/10/2021] [Indexed: 02/07/2023] Open
Abstract
Determining the etiology of end-stage renal disease (ESRD) constitutes a great challenge in the context of renal transplantation. Evidence is lacking on the genetic findings for adult renal transplant recipients through exome sequencing (ES). Adult patients on the kidney transplant waitlist were recruited from 2017 to 2019. Trio-ES was conducted for the families who had multiple affected individuals with nephropathy or clinical suspicion of a genetic kidney disease owing to early onset or extrarenal features. Pathogenic variants were confirmed in 62 of 115 families after sequencing of 421 individuals, including 195 healthy family members as potential living donors. Seventeen distinct genetic disorders were identified, confirming the a priori diagnosis in 33 (28.7%) families, modifying or reclassifying the clinical diagnosis in 27 (23.5%) families, and establishing a diagnosis in two families with ESRD of unknown etiology. In 14.8% of the families, we detected promising variants of uncertain significance in candidate genes associated with renal development or renal disease. Furthermore, we reported the secondary findings of oncogenes in 4.4% of the patients and known single-nucleotide polymorphisms associated with pharmacokinetics in our cohort to predict the drug levels of tacrolimus and mycophenolate. The diagnostic utility of the genetic findings has provided new clinical insight in most families, which helps with preplanned renal transplantation.
18
Krafczyk R, Qi F, Sieber A, Mehler J, Jung K, Frishman D, Lassak J. Proline codon pair selection determines ribosome pausing strength and translation efficiency in bacteria. Commun Biol 2021; 4:589. [PMID: 34002016 PMCID: PMC8129111 DOI: 10.1038/s42003-021-02115-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/04/2020] [Accepted: 04/16/2021] [Indexed: 02/03/2023] Open
Abstract
The speed of mRNA translation depends in part on the amino acid to be incorporated into the nascent chain. Peptide bond formation is especially slow with proline and two adjacent prolines can even cause ribosome stalling. While previous studies focused on how the amino acid context of a Pro-Pro motif determines the stalling strength, we extend this question to the mRNA level. Bioinformatics analysis of the Escherichia coli genome revealed significantly differing codon usage between single and consecutive prolines. We therefore developed a luminescence reporter to detect ribosome pausing in living cells, enabling us to dissect the roles of codon choice and tRNA selection as well as to explain the genome scale observations. Specifically, we found a strong selective pressure against CCC/U-C, a sequon causing ribosomal frameshifting even under wild-type conditions. On the other hand, translation efficiency as positive evolutionary driving force led to an overrepresentation of CCG. This codon is not only translated the fastest, but the corresponding prolyl-tRNA reaches almost saturating levels. By contrast, CCA, for which the cognate prolyl-tRNA amounts are limiting, is used to regulate pausing strength. Thus, codon selection at discrete positions, and especially in proline codon pairs, can tune protein copy numbers.
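The comparison of codon usage between single and consecutive prolines can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' pipeline: it takes one in-frame coding sequence, treats any proline codon with a proline neighbor as "consecutive", and uses a hypothetical function name and a made-up toy sequence.

```python
from collections import Counter

# The four proline codons in RNA notation (DNA input is converted T -> U).
PROLINE_CODONS = {"CCA", "CCC", "CCG", "CCU"}


def proline_codon_usage(cds: str):
    """Count proline codons at isolated positions versus in runs of prolines.

    Returns (single, consecutive) Counters keyed by codon. The sequence is
    read in frame from its first nucleotide; trailing bases that do not
    complete a codon are ignored.
    """
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    is_pro = [codon in PROLINE_CODONS for codon in codons]
    single, consecutive = Counter(), Counter()
    for i, codon in enumerate(codons):
        if not is_pro[i]:
            continue
        prev_pro = i > 0 and is_pro[i - 1]
        next_pro = i + 1 < len(codons) and is_pro[i + 1]
        if prev_pro or next_pro:
            consecutive[codon] += 1
        else:
            single[codon] += 1
    return single, consecutive


# Toy sequence: Met-Pro-Ala-Pro-Pro-stop (illustrative only).
single, consecutive = proline_codon_usage("ATGCCGGCTCCACCGTAA")
```

Summing such counters over all coding sequences of a genome would give the two codon usage tables whose difference the abstract refers to.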
19
Padmanabhan Nair V, Liu H, Ciceri G, Jungverdorben J, Frishman G, Tchieu J, Cederquist GY, Rothenaigner I, Schorpp K, Klepper L, Walsh RM, Kim TW, Cornacchia D, Ruepp A, Mayer J, Hadian K, Frishman D, Studer L, Vincendeau M. Activation of HERV-K(HML-2) disrupts cortical patterning and neuronal differentiation by increasing NTRK3. Cell Stem Cell 2021; 28:1566-1581.e8. [PMID: 33951478 DOI: 10.1016/j.stem.2021.04.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/13/2020] [Revised: 03/05/2021] [Accepted: 04/12/2021] [Indexed: 12/20/2022]
Abstract
The biological function and disease association of human endogenous retroviruses (HERVs) are largely elusive. HERV-K(HML-2) has been associated with neurotoxicity, but there is no clear understanding of its role or mechanistic basis. We addressed the physiological functions of HERV-K(HML-2) in neuronal differentiation using CRISPR engineering to activate or repress its expression levels in a human-pluripotent-stem-cell-based system. We found that elevated HERV-K(HML-2) transcription is detrimental for the development and function of cortical neurons. These effects are cell-type-specific, as dopaminergic neurons are unaffected. Moreover, high HERV-K(HML-2) transcription alters cortical layer formation in forebrain organoids. HERV-K(HML-2) transcriptional activation leads to hyperactivation of NTRK3 expression and other neurodegeneration-related genes. Direct activation of NTRK3 phenotypically resembles HERV-K(HML-2) induction, and reducing NTRK3 levels in the context of HERV-K(HML-2) induction restores cortical neuron differentiation. Hence, these findings unravel a cell-type-specific role for HERV-K(HML-2) in cortical neuron development.
|
20
|
Sun J, Frishman D. Improved sequence-based prediction of interaction sites in α-helical transmembrane proteins by deep learning. Comput Struct Biotechnol J 2021; 19:1512-1530. [PMID: 33815689 PMCID: PMC7985279 DOI: 10.1016/j.csbj.2021.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2020] [Revised: 03/02/2021] [Accepted: 03/02/2021] [Indexed: 11/10/2022] Open
Abstract
Fast and accurate prediction of transmembrane protein interaction sites. First ever computational survey of interaction sites in membrane proteins. 10-30% of amino acid positions predicted to be involved in interactions.
Interactions between transmembrane (TM) proteins are fundamental for a wide spectrum of cellular functions, but precise molecular details of these interactions remain largely unknown due to the scarcity of experimentally determined three-dimensional complex structures. Computational techniques are therefore required for a large-scale annotation of interaction sites in TM proteins. Here, we present a novel deep-learning approach, DeepTMInter, for sequence-based prediction of interaction sites in α-helical TM proteins based on their topological, physicochemical, and evolutionary properties. Using a combination of ultra-deep residual neural networks with a stacked generalization ensemble technique, DeepTMInter significantly outperforms existing methods, achieving AUC/AUCPR values of 0.689/0.598. Across the main functional families of human transmembrane proteins, the percentage of amino acid sites predicted to be involved in interactions typically ranges between 10% and 25%, and up to 30% in ion channels. DeepTMInter is available as a standalone package at https://github.com/2003100127/deeptminter. The training and benchmarking datasets are available at https://data.mendeley.com/datasets/2t8kgwzp35.
|
21
|
Kataka E, Zaucha J, Frishman G, Ruepp A, Frishman D. Author Correction: Edgetic perturbation signatures represent known and novel cancer biomarkers. Sci Rep 2021; 11:3582. [PMID: 33547345 PMCID: PMC7864967 DOI: 10.1038/s41598-021-82646-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022] Open
|
22
|
Zaucha J, Softley CA, Sattler M, Frishman D, Popowicz GM. Deep learning model predicts water interaction sites on the surface of proteins using limited-resolution data. Chem Commun (Camb) 2020; 56:15454-15457. [PMID: 33237041 DOI: 10.1039/d0cc04383d] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
We develop a residual deep learning model, hotWater (https://pypi.org/project/hotWater/), to identify key water interaction sites on proteins for binding models and drug discovery. This is tested on new crystal structures, as well as cryo-EM and NMR structures from the PDB and in crystallographic refinement with promising results.
|
23
|
Tüshaus J, Müller SA, Kataka ES, Zaucha J, Sebastian Monasor L, Su M, Güner G, Jocher G, Tahirovic S, Frishman D, Simons M, Lichtenthaler SF. An optimized quantitative proteomics method establishes the cell type-resolved mouse brain secretome. EMBO J 2020; 39:e105693. [PMID: 32954517 PMCID: PMC7560198 DOI: 10.15252/embj.2020105693] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 05/20/2020] [Revised: 08/12/2020] [Accepted: 08/14/2020] [Indexed: 12/21/2022] Open
Abstract
To understand how cells communicate in the nervous system, it is essential to define their secretome, which is challenging for primary cells because of large cell numbers being required. Here, we miniaturized secretome analysis by developing the "high-performance secretome protein enrichment with click sugars" (hiSPECS) method. To demonstrate its broad utility, hiSPECS was used to identify the secretory response of brain slices upon LPS-induced neuroinflammation and to establish the cell type-resolved mouse brain secretome resource using primary astrocytes, microglia, neurons, and oligodendrocytes. This resource allowed mapping the cellular origin of CSF proteins and revealed that an unexpectedly high number of secreted proteins in vitro and in vivo are proteolytically cleaved membrane protein ectodomains. Two examples are neuronally secreted ADAM22 and CD200, which we identified as substrates of the Alzheimer-linked protease BACE1. hiSPECS and the brain secretome resource can be widely exploited to systematically study protein secretion and brain function and to identify cell type-specific biomarkers for CNS diseases.
|
24
|
Tüshaus J, Kataka ES, Zaucha J, Frishman D, Müller SA, Lichtenthaler SF. Neuronal Differentiation of LUHMES Cells Induces Substantial Changes of the Proteome. Proteomics 2020; 21:e2000174. [PMID: 32951307 DOI: 10.1002/pmic.202000174] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/29/2020] [Revised: 09/09/2020] [Indexed: 12/14/2022]
Abstract
Neuronal cell lines are important model systems to study mechanisms of neurodegenerative diseases. One example is the Lund Human Mesencephalic (LUHMES) cell line, which can differentiate into dopaminergic-like neurons and is frequently used to study mechanisms of Parkinson's disease and neurotoxicity. Neuronal differentiation of LUHMES cells is commonly verified with selected neuronal markers, but little is known about the proteome-wide protein abundance changes during differentiation. Using mass spectrometry and label-free quantification (LFQ), the proteomes of differentiated and undifferentiated LUHMES cells and of primary murine midbrain neurons are compared. Neuronal differentiation induced substantial changes of the LUHMES cell proteome, with proliferation-related proteins being strongly down-regulated and neuronal and dopaminergic proteins, such as L1CAM and α-synuclein (SNCA), being up to 1,000-fold up-regulated. Several of these proteins, including MAPT and SYN1, may be useful as new markers for experimentally validating neuronal differentiation of LUHMES cells. Primary midbrain neurons are slightly more closely related to differentiated than to undifferentiated LUHMES cells, in particular with respect to the abundance of proteins related to neurodegeneration. In summary, the analysis demonstrates that differentiated LUHMES cells are a suitable model for studies on neurodegeneration and provides a resource of the proteome-wide changes during neuronal differentiation. (ProteomeXchange identifier PXD020044).
|
25
|
Xiao Y, Zeng B, Berner N, Frishman D, Langosch D, George Teese M. Experimental determination and data-driven prediction of homotypic transmembrane domain interfaces. Comput Struct Biotechnol J 2020; 18:3230-3242. [PMID: 33209210 PMCID: PMC7649602 DOI: 10.1016/j.csbj.2020.09.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/08/2020] [Revised: 09/22/2020] [Accepted: 09/24/2020] [Indexed: 12/22/2022] Open
Abstract
Homotypic TMD interfaces identified by different techniques share strong similarities. The GxxxG motif is the feature most strongly associated with interfaces. Other features include conservation, polarity, coevolution, and depth in the membrane. The role of each feature strongly depends on the individual protein. Machine learning helps predict interfaces from evolutionary sequence data.
Interactions between their transmembrane domains (TMDs) frequently support the assembly of single-pass membrane proteins to non-covalent complexes. Yet, the TMD-TMD interactome remains largely uncharted. With a view to predicting homotypic TMD-TMD interfaces from primary structure, we performed a systematic analysis of their physical and evolutionary properties. To this end, we generated a dataset of 50 self-interacting TMDs. This dataset contains interfaces of nine TMDs from bitopic human proteins (Ire1, Armcx6, Tie1, ATP1B1, PTPRO, PTPRU, PTPRG, DDR1, and Siglec7) that were experimentally identified here and combined with literature data. We show that interfacial residues of these homotypic TMD-TMD interfaces tend to be more conserved, coevolved and polar than non-interfacial residues. Further, we suggest for the first time that interface positions are deficient in β-branched residues, and likely to be located deep in the hydrophobic core of the membrane. Overrepresentation of the GxxxG motif at interfaces is strong, but that of (small)xxx(small) motifs is weak. The multiplicity of these features and the individual character of TMD-TMD interfaces, as uncovered here, prompted us to train a machine learning algorithm. The resulting prediction method, THOIPA (www.thoipa.org), excels in the prediction of key interface residues from evolutionary sequence data.
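The GxxxG and (small)xxx(small) motifs mentioned in this abstract are simple sequence patterns, and scanning a TMD for them can be sketched in a few lines of Python. This is an illustrative sketch only and is unrelated to the THOIPA implementation. The helper name `find_motif_positions`, the example sequence, and the choice of G, A and S as the "small" residues are assumptions.

```python
import re

# Motif patterns as regular expressions; "." stands for any residue.
GXXXG = "G...G"
# Assumption for this sketch: "small" residues are Gly, Ala and Ser.
SMALL_XXX_SMALL = "[GAS]...[GAS]"


def find_motif_positions(sequence, pattern):
    """Return 0-based start positions of all (overlapping) motif matches."""
    # A lookahead matches without consuming characters, so motifs that
    # overlap each other are all reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Illustrative transmembrane-like sequence used only as an example.
tmd = "ITLIIFGVMAGVIGTILLISYGI"
print(find_motif_positions(tmd, GXXXG))            # [6]
print(find_motif_positions(tmd, SMALL_XXX_SMALL))  # [6, 9]
```

Motif presence of this kind would be only one input feature among the several properties the abstract lists (conservation, polarity, coevolution, membrane depth).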
|