101
|
Waury K, Willemse EAJ, Vanmechelen E, Zetterberg H, Teunissen CE, Abeln S. Bioinformatics tools and data resources for assay development of fluid protein biomarkers. Biomark Res 2022; 10:83. [DOI: 10.1186/s40364-022-00425-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Accepted: 10/25/2022] [Indexed: 11/16/2022] Open
Abstract
Fluid protein biomarkers are important tools in clinical research and health care to support diagnosis and to monitor patients. Especially within the field of dementia, novel biomarkers could address the current challenges of providing an early diagnosis and of selecting trial participants. While the great potential of fluid biomarkers is recognized, their implementation in routine clinical use has been slow. One major obstacle is the often unsuccessful translation of biomarker candidates from explorative high-throughput techniques to sensitive antibody-based immunoassays. In this review, we propose the incorporation of bioinformatics into the workflow of novel immunoassay development to overcome this bottleneck and thus facilitate the development of novel biomarkers towards clinical laboratory practice. Due to the rapid progress within the field of bioinformatics, many freely available and easy-to-use tools and data resources exist that can aid the researcher at various stages. Current prediction methods and databases can support the selection of suitable biomarker candidates, as well as the choice of appropriate commercial affinity reagents. Additionally, we examine methods that can determine or predict the epitope (an antibody's binding region on its antigen) and can help to make an informed choice on the immunogenic peptide used for novel antibody production. Selected use cases for biomarker candidates help illustrate the application and interpretation of the introduced tools.
|
102
|
Binda O, Juillard F, Ducassou JN, Kleijwegt C, Paris G, Didillon A, Baklouti F, Corpet A, Couté Y, Côté J, Lomonte P. SMA-linked SMN mutants prevent phase separation properties and SMN interactions with FMRP family members. Life Sci Alliance 2022; 6:6/1/e202201429. [PMID: 36375840 PMCID: PMC9684302 DOI: 10.26508/lsa.202201429] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/28/2022] [Revised: 10/24/2022] [Accepted: 10/25/2022] [Indexed: 11/16/2022] Open
Abstract
Although recent advances in gene therapy provide hope for spinal muscular atrophy (SMA) patients, the pathology remains the leading genetic cause of infant mortality. SMA is a monogenic pathology that originates from the loss of the SMN1 gene in most cases or mutations in rare cases. Interestingly, several SMN1 mutations occur within the TUDOR methylarginine reader domain of SMN. We hypothesized that in SMN1 mutant cases, SMA may emerge from aberrant protein-protein interactions between SMN and key neuronal factors. Using a BioID proteomic approach, we have identified and validated a number of SMN-interacting proteins, including fragile X mental retardation protein (FMRP) family members (FMRFM). Importantly, SMA-linked SMNTUDOR mutant forms (SMNST) failed to interact with FMRFM. In agreement with recent work, we define biochemically that SMN forms droplets in vitro and that these droplets are stabilized by RNA, suggesting that SMN could be involved in the formation of membraneless organelles, such as Cajal nuclear bodies. Finally, we found that SMN and FMRP co-fractionate with polysomes in an RNA-dependent manner, suggesting a potential role in localized translation in motor neurons.
Affiliation(s)
- Olivier Binda
  - Université Claude Bernard Lyon 1, CNRS UMR 5261, INSERM U1315, LabEx DEV2CAN, Institut NeuroMyoGène-Pathophysiology and Genetics of Neuron and Muscle, Team Chromatin Dynamics, Nuclear Domains, Virus, Lyon, France
  - University of Ottawa, Faculty of Medicine, Department of Cellular and Molecular Medicine, Ottawa, Canada
- Franceline Juillard
  - Université Claude Bernard Lyon 1, CNRS UMR 5261, INSERM U1315, LabEx DEV2CAN, Institut NeuroMyoGène-Pathophysiology and Genetics of Neuron and Muscle, Team Chromatin Dynamics, Nuclear Domains, Virus, Lyon, France
- Julia Novion Ducassou
  - Université Grenoble Alpes, INSERM, CEA, UMR BioSanté U1292, CNRS, CEA, FR2048, Grenoble, France
- Constance Kleijwegt
  - Université Claude Bernard Lyon 1, CNRS UMR 5261, INSERM U1315, LabEx DEV2CAN, Institut NeuroMyoGène-Pathophysiology and Genetics of Neuron and Muscle, Team Chromatin Dynamics, Nuclear Domains, Virus, Lyon, France
  - Université de Montpellier, CNRS UMR 9002, Institut de Génétique Humaine, Montpellier, France
- Geneviève Paris
  - University of Ottawa, Faculty of Medicine, Department of Cellular and Molecular Medicine, Ottawa, Canada
- Andréanne Didillon
  - University of Ottawa, Faculty of Medicine, Department of Cellular and Molecular Medicine, Ottawa, Canada
- Faouzi Baklouti
  - Université Claude Bernard Lyon 1, CNRS UMR 5261, INSERM U1315, LabEx DEV2CAN, Institut NeuroMyoGène-Pathophysiology and Genetics of Neuron and Muscle, Team Chromatin Dynamics, Nuclear Domains, Virus, Lyon, France
- Armelle Corpet
  - Université Claude Bernard Lyon 1, CNRS UMR 5261, INSERM U1315, LabEx DEV2CAN, Institut NeuroMyoGène-Pathophysiology and Genetics of Neuron and Muscle, Team Chromatin Dynamics, Nuclear Domains, Virus, Lyon, France
- Yohann Couté
  - Université Grenoble Alpes, INSERM, CEA, UMR BioSanté U1292, CNRS, CEA, FR2048, Grenoble, France
- Jocelyn Côté
  - University of Ottawa, Faculty of Medicine, Department of Cellular and Molecular Medicine, Ottawa, Canada
- Patrick Lomonte
  - Université Claude Bernard Lyon 1, CNRS UMR 5261, INSERM U1315, LabEx DEV2CAN, Institut NeuroMyoGène-Pathophysiology and Genetics of Neuron and Muscle, Team Chromatin Dynamics, Nuclear Domains, Virus, Lyon, France
|
103
|
Ismi DP, Pulungan R, Afiahayati. Deep learning for protein secondary structure prediction: Pre and post-AlphaFold. Comput Struct Biotechnol J 2022; 20:6271-6286. [PMID: 36420164 PMCID: PMC9678802 DOI: 10.1016/j.csbj.2022.11.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/06/2022] [Revised: 11/05/2022] [Accepted: 11/05/2022] [Indexed: 11/13/2022] Open
Abstract
This paper aims to provide a comprehensive review of the trends and challenges of deep neural networks for protein secondary structure prediction (PSSP). In recent years, deep neural networks have become the primary method for protein secondary structure prediction. Previous studies showed that deep neural networks have raised the accuracy of three-state secondary structure prediction to more than 80%. Favored deep learning methods, such as convolutional neural networks, recurrent neural networks, inception networks, and graph neural networks, have been implemented in protein secondary structure prediction. Methods adapted from natural language processing (NLP) and computer vision are also employed, including attention mechanisms, ResNet, and U-shape networks. In the post-AlphaFold era, PSSP studies focus on different objectives, such as enhancing the quality of evolutionary information and exploiting protein language models as the PSSP input. The recent trend to utilize pre-trained language models as input features for secondary structure prediction provides a new direction for PSSP studies. Moreover, the state-of-the-art accuracy achieved by previous PSSP models is still below its theoretical limit. There is still room for improvement in the field.
Affiliation(s)
- Dewi Pramudi Ismi
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
  - Department of Informatics, Faculty of Industrial Technology, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
- Reza Pulungan
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Afiahayati
  - Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
|
104
|
An J, Weng X. Collectively encoding protein properties enriches protein language models. BMC Bioinformatics 2022; 23:467. [DOI: 10.1186/s12859-022-05031-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 10/31/2022] [Indexed: 11/10/2022] Open
Abstract
Natural language processing models pre-trained on a large natural language corpus can transfer learned knowledge to protein domains by fine-tuning on specific in-domain tasks. However, few studies have focused on enriching such protein language models by jointly learning protein properties from strongly correlated protein tasks. Here we designed a multi-task learning (MTL) architecture, aiming to decipher implicit structural and evolutionary information from three sequence-level classification tasks for protein family, superfamily and fold. Considering the co-existing contextual relevance between human words and protein language, we employed BERT, pre-trained on a large natural language corpus, as our backbone to handle protein sequences. More importantly, the encoded knowledge obtained in the MTL stage can be well transferred to more fine-grained downstream tasks of TAPE. Experiments on structure- or evolution-related applications demonstrate that our approach outperforms many state-of-the-art Transformer-based protein models, especially in remote homology detection.
|
105
|
Saxena S, Krishna Murthy TP, Chandrashekhar CR, Patil LS, Aditya A, Shukla R, Yadav AK, Singh TR, Samantaray M, Ramaswamy A. A bioinformatics approach to the identification of novel deleterious mutations of human TPMT through validated screening and molecular dynamics. Sci Rep 2022; 12:18872. [PMID: 36344599 PMCID: PMC9640560 DOI: 10.1038/s41598-022-23488-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/13/2022] [Accepted: 11/01/2022] [Indexed: 11/09/2022] Open
Abstract
Polymorphisms of Thiopurine S-methyltransferase (TPMT) are known to be associated with leukemia, inflammatory bowel diseases, and more. The objective of the present study was to identify novel deleterious missense SNPs of TPMT through a comprehensive in silico protocol. The initial SNP screening protocol used to identify deleterious SNPs from the pool of all TPMT SNPs in the dbSNP database yielded an accuracy of 83.33% in identifying extremely dangerous variants. Five novel deleterious missense SNPs (W33G, W78R, V89E, W150G, and L182P) of TPMT were identified through the aforementioned screening protocol. These 5 SNPs were then subjected to conservation analysis, interaction analysis, oncogenic and phenotypic analysis, structural analysis, PTM analysis, and molecular dynamics simulations (MDS) analysis to further assess and analyze their deleterious nature. Oncogenic analysis revealed that all five SNPs are oncogenic. MDS analysis revealed that all SNPs are deleterious due to the alterations they cause in the binding energy of the wild-type protein. Plasticity-induced instability caused by most of the mutations as indicated by the MDS results has been hypothesized to be the reason for this alteration. While in vivo or in vitro protocols are more conclusive, they are often more challenging and expensive. Hence, future research endeavors targeted at TPMT polymorphisms and/or their consequences in relevant disease progressions or treatments, through in vitro or in vivo means can give a higher priority to these SNPs rather than considering the massive pool of all SNPs of TPMT.
Affiliation(s)
- Sidharth Saxena
  - Department of Biotechnology, Ramaiah Institute of Technology, Bengaluru, Karnataka, 560054, India
- T P Krishna Murthy
  - Department of Biotechnology, Ramaiah Institute of Technology, Bengaluru, Karnataka, 560054, India
- C R Chandrashekhar
  - Department of Biotechnology, Ramaiah Institute of Technology, Bengaluru, Karnataka, 560054, India
- Lavan S Patil
  - Department of Biotechnology, Ramaiah Institute of Technology, Bengaluru, Karnataka, 560054, India
- Abhinav Aditya
  - Department of Biotechnology, Ramaiah Institute of Technology, Bengaluru, Karnataka, 560054, India
- Rohit Shukla
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Solan, Himachal Pradesh, 173234, India
- Arvind Kumar Yadav
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Solan, Himachal Pradesh, 173234, India
- Tiratha Raj Singh
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Solan, Himachal Pradesh, 173234, India
- Mahesh Samantaray
  - Department of Bioinformatics, Pondicherry University, Pondicherry, 605014, India
- Amutha Ramaswamy
  - Department of Bioinformatics, Pondicherry University, Pondicherry, 605014, India
|
106
|
An D, Song L, Li Y, Shen L, Miao P, Wang Y, Liu D, Jiang L, Wang F, Yang J. Comprehensive analysis of lysine lactylation in Frankliniella occidentalis. Front Genet 2022; 13:1014225. [PMID: 36386791 PMCID: PMC9663987 DOI: 10.3389/fgene.2022.1014225] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/08/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022] Open
Abstract
Western flower thrips (Frankliniella occidentalis) are among the most important pests globally that transmit destructive plant viruses and infest multiple commercial crops. Lysine lactylation (Klac) is a recently discovered novel post-translational modification (PTM). We used liquid chromatography-mass spectrometry to identify the global lactylated proteome of F. occidentalis, and further enriched the identified lactylated proteins using Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO). In the present study, we identified 1,458 Klac sites in 469 proteins from F. occidentalis. Bioinformatics analysis showed that Klac was widely distributed in F. occidentalis proteins, and these Klac modified proteins participated in multiple biological processes. GO and KEGG enrichment analysis revealed that Klac proteins were significantly enriched in multiple cellular compartments and metabolic pathways, such as the ribosome and carbon metabolism pathways. Two Klac proteins were found to be involved in the regulation of the TSWV (Tomato spotted wilt virus) transmission in F. occidentalis. This study provides a systematic report and a rich dataset of lactylation in F. occidentalis proteome for potential studies on the Klac protein of this notorious pest.
Affiliation(s)
- Dong An
  - Key Laboratory of Tobacco Pest Monitoring, Controlling and Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao, China
- Liyun Song
  - Key Laboratory of Tobacco Pest Monitoring, Controlling and Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao, China
- Ying Li
  - Key Laboratory of Tobacco Pest Monitoring, Controlling and Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao, China
- Lili Shen
  - Key Laboratory of Tobacco Pest Monitoring, Controlling and Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao, China
- Pu Miao
  - Luoyang City Company of Henan Province Tobacco Company, Luoyang, China
- Yujie Wang
  - Luoyang City Company of Henan Province Tobacco Company, Luoyang, China
- Dongyang Liu
  - Liangshan State Company of Sichuan Province Tobacco Company, Mile, China
- Lianqiang Jiang
  - Liangshan State Company of Sichuan Province Tobacco Company, Mile, China
- Fenglong Wang
  - Key Laboratory of Tobacco Pest Monitoring, Controlling and Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao, China
  - *Correspondence: Fenglong Wang; Jinguang Yang
- Jinguang Yang
  - Key Laboratory of Tobacco Pest Monitoring, Controlling and Integrated Management, Tobacco Research Institute of Chinese Academy of Agricultural Sciences, Qingdao, China
  - *Correspondence: Fenglong Wang; Jinguang Yang
|
107
|
Amos RA, Atmodjo MA, Huang C, Gao Z, Venkat A, Taujale R, Kannan N, Moremen KW, Mohnen D. Polymerization of the backbone of the pectic polysaccharide rhamnogalacturonan I. Nat Plants 2022; 8:1289-1303. [PMID: 36357524 PMCID: PMC10115348 DOI: 10.1038/s41477-022-01270-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/21/2022] [Accepted: 10/05/2022] [Indexed: 06/10/2023]
Abstract
Rhamnogalacturonan I (RG-I) is a major plant cell wall pectic polysaccharide defined by its repeating disaccharide backbone structure of [→4)-α-D-GalA-(1→2)-α-L-Rha-(1→]. A family of RG-I:Rhamnosyltransferases (RRT) has previously been identified, but synthesis of the RG-I backbone has not been demonstrated in vitro because the identity of Rhamnogalacturonan I:Galacturonosyltransferase (RG-I:GalAT) was unknown. Here a putative glycosyltransferase, At1g28240/MUCI70, is shown to be an RG-I:GalAT. The name RGGAT1 is proposed to reflect the catalytic activity of this enzyme. When incubated together with the rhamnosyltransferase RRT4, the combined activities of RGGAT1 and RRT4 result in elongation of RG-I acceptors in vitro into a polymeric product. RGGAT1 is a member of a new GT family categorized as GT116, which does not group into existing GT-A clades and is phylogenetically distinct from the GALACTURONOSYLTRANSFERASE (GAUT) family of GalA transferases that synthesize the backbone of the pectin homogalacturonan. RGGAT1 has a predicted GT-A fold structure but employs a metal-independent catalytic mechanism that is rare among glycosyltransferases with this fold type. The identification of RGGAT1 and the 8-member Arabidopsis GT116 family provides a new avenue for studying the mechanism of RG-I synthesis and the function of RG-I in plants.
Affiliation(s)
- Robert A Amos
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Melani A Atmodjo
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Chin Huang
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Zhongwei Gao
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Aarya Venkat
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Rahil Taujale
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Natarajan Kannan
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Kelley W Moremen
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
- Debra Mohnen
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA
|
108
|
Zhang H, Xu MS, Fan X, Chung WK, Shen Y. Predicting functional effect of missense variants using graph attention neural networks. Nat Mach Intell 2022; 4:1017-1028. [PMID: 37484202 PMCID: PMC10361701 DOI: 10.1038/s42256-022-00561-w] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 07/09/2021] [Accepted: 10/07/2022] [Indexed: 11/16/2022]
Abstract
Accurate prediction of damaging missense variants is critically important for interpreting a genome sequence. Although many methods have been developed, their performance has been limited. Recent advances in machine learning and the availability of large-scale population genomic sequencing data provide new opportunities to considerably improve computational predictions. Here we describe the graphical missense variant pathogenicity predictor (gMVP), a new method based on graph attention neural networks. Its main component is a graph with nodes that capture predictive features of amino acids and edges weighted by co-evolution strength, enabling effective pooling of information from the local protein context and functionally correlated distal positions. Evaluation of deep mutational scan data shows that gMVP outperforms other published methods in identifying damaging variants in TP53, PTEN, BRCA1 and MSH2. Furthermore, it achieves the best separation of de novo missense variants in neurodevelopmental disorder cases from those in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.
Affiliation(s)
- Haicang Zhang
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Xiao Fan
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Pediatrics, Columbia University, New York, NY, USA
- Wendy K. Chung
  - Department of Pediatrics, Columbia University, New York, NY, USA
  - Department of Medicine, Columbia University, New York, NY, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA
|
109
|
Pacheco-Olvera DL, Saint Remy-Hernández S, García-Valeriano MG, Rivera-Hernández T, López-Macías C. Bioinformatic Analysis of B- and T-cell Epitopes from SARS-CoV-2 Structural Proteins and their Potential Cross-reactivity with Emerging Variants and other Human Coronaviruses. Arch Med Res 2022; 53:694-710. [PMID: 36336501 PMCID: PMC9633039 DOI: 10.1016/j.arcmed.2022.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Revised: 08/23/2022] [Accepted: 10/21/2022] [Indexed: 11/06/2022]
Abstract
BACKGROUND The mutations in SARS-CoV-2 variants of concern (VOC) facilitate the virus' escape from the neutralizing antibodies induced by vaccines. However, the protection from hospitalization and death is not significantly diminished. Both vaccine boosters and infection improve immune responses and provide protection, suggesting that conserved and/or cross-reactive epitopes could be involved. While several important T- and B-cell epitopes have been identified, mainly in the S protein, the M and N proteins and their potential cross-reactive epitopes with other coronaviruses remain largely unexplored. AIMS To identify and map new potential B- and T-cell epitopes within the SARS-CoV-2 S, M and N proteins, as well as cross-reactive epitopes with human coronaviruses. METHODS Different bioinformatics tools were used to: i) identify new and compile previously reported B- and T-cell epitopes from SARS-CoV-2 S, M and N proteins; ii) determine the mutations in S protein from VOC that affect B- and T-cell epitopes; and iii) identify cross-reactive epitopes with coronaviruses relevant to human health. RESULTS New potential B- and T-cell epitopes from S, M and N proteins, as well as cross-reactive epitopes with other coronaviruses, were found and mapped within the proteins' structures. CONCLUSION Numerous potential B- and T-cell epitopes were found in S, M and N proteins, some of which are conserved between coronaviruses. VOCs present mutations within important epitopes in the S protein; however, a significant number of other epitopes remain unchanged. The epitopes identified here may contribute to augmenting the protective response to SARS-CoV-2 and its variants induced by infection and/or vaccination, and may also be used for the rational design of novel broad-spectrum coronavirus vaccines.
Affiliation(s)
- Diana Laura Pacheco-Olvera
  - Unidad de Investigación Médica en Inmunoquímica, Hospital de Especialidades del Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México
- Stephanie Saint Remy-Hernández
  - Unidad de Investigación Médica en Inmunoquímica, Hospital de Especialidades del Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México
- María Guadalupe García-Valeriano
  - Unidad de Investigación Médica en Inmunoquímica, Hospital de Especialidades del Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México
- Tania Rivera-Hernández
  - Unidad de Investigación Médica en Inmunoquímica, Hospital de Especialidades del Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México
  - Consejo Nacional de Ciencia y Tecnología, Ciudad de México, México
- Constantino López-Macías
  - Unidad de Investigación Médica en Inmunoquímica, Hospital de Especialidades del Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Ciudad de México, México
|
110
|
Thibau A, Vaca DJ, Bagowski M, Hipp K, Bender D, Ballhorn W, Linke D, Kempf VAJ. Adhesion of Bartonella henselae to Fibronectin Is Mediated via Repetitive Motifs Present in the Stalk of Bartonella Adhesin A. Microbiol Spectr 2022; 10:e0211722. [PMID: 36165788 PMCID: PMC9602544 DOI: 10.1128/spectrum.02117-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 09/06/2022] [Indexed: 12/31/2022] Open
Abstract
Adhesion to host cells is the first and most crucial step in infections with pathogenic Gram-negative bacteria and is often mediated by trimeric autotransporter adhesins (TAAs). Bartonella henselae targets the extracellular matrix glycoprotein fibronectin (Fn) via the Bartonella adhesin A (BadA), attaching the bacteria to the host cell. The TAA BadA is characterized by a highly repetitive passenger domain consisting of 30 neck/stalk domains with various degrees of similarity. To elucidate the motif sequences mediating Fn binding, we generated 10 modified BadA constructs and verified their expression via Western blotting, confocal laser scanning, and electron microscopy. We analyzed their ability to bind human plasma Fn using quantitative whole-cell enzyme-linked immunosorbent assays (ELISAs) and fluorescence microscopy. Polyclonal antibodies targeting a 15-mer amino acid motif sequence proved to reduce Fn binding. We suggest that BadA adheres to Fn in a cumulative effort with quick saturation primarily via unpaired β-strands appearing in motifs repeatedly present throughout the neck/stalk region. In addition, we demonstrated that the length of truncated BadA constructs correlates with the immunoreactivity of human patient sera. The identification of BadA-Fn binding regions will support the development of new "antiadhesive" compounds inhibiting the initial adherence of B. henselae and other TAA-expressing pathogens to host cells.
IMPORTANCE Trimeric autotransporter adhesins (TAAs) are important virulence factors and are widely present in various pathogenic Gram-negative bacteria. TAA-expressing bacteria cause a wide spectrum of human diseases, such as cat scratch disease (Bartonella henselae), enterocolitis (Yersinia enterocolitica), meningitis (Neisseria meningitidis), and bloodstream infections (multidrug-resistant Acinetobacter baumannii). TAA-targeted antiadhesive strategies (against, e.g., Bartonella adhesin A [BadA], Yersinia adhesin A [YadA], Neisseria adhesin A [NadA], and Acinetobacter trimeric autotransporter [Ata]) might represent a universal strategy to counteract such bacterial infections. BadA is one of the best characterized TAAs, and because of its high number of (sub)domains, it serves as an attractive adhesin to study the domain-function relationship of TAAs in the infection process. The identification of common binding motifs between TAAs (here, BadA) and their major binding partner (here, fibronectin) provides a basis toward the design of novel "antiadhesive" compounds preventing the initial adherence of Gram-negative bacteria in infections.
Affiliation(s)
- Arno Thibau
  - Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt, Germany
- Diana J. Vaca
  - Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt, Germany
- Marlene Bagowski
  - Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt, Germany
- Katharina Hipp
  - Electron Microscopy Facility, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Daniela Bender
  - Federal Institute for Vaccines and Biomedicines, Department of Virology, Paul-Ehrlich-Institut, Langen, Germany
- Wibke Ballhorn
  - Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt, Germany
- Dirk Linke
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, Oslo, Norway
- Volkhard A. J. Kempf
  - Institute for Medical Microbiology and Infection Control, University Hospital, Goethe University, Frankfurt, Germany
|
111
|
Porosk L, Langel Ü. Approaches for evaluation of novel CPP-based cargo delivery systems. Front Pharmacol 2022; 13:1056467. [PMID: 36339538 PMCID: PMC9634181 DOI: 10.3389/fphar.2022.1056467] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/28/2022] [Accepted: 10/10/2022] [Indexed: 08/05/2023] Open
Abstract
Cell-penetrating peptides (CPPs) can be broadly defined as relatively short synthetic, protein-derived, or chimeric peptides. Their most remarkable property is their ability to cross cell barriers and facilitate the translocation of cargo, such as drugs, nucleic acids, peptides, small molecules, dyes, and many others, across the plasma membrane. Over the years, several approaches have been used, adapted, and developed for evaluating the efficacy of CPPs as delivery systems, with fluorophore attachment the most widely used. It has become increasingly evident that, to lead to a successful outcome, the evaluation method should match the specifics of the delivery. For characterization and assessment of CPP-cargo, a combination of research tools from chemistry, physics, molecular biology, engineering, and other fields has been applied. In this review, we summarize the diverse in silico, in vitro, and in vivo approaches used for evaluation and characterization of CPP-based cargo delivery systems.
Affiliation(s)
- Ly Porosk
- Laboratory of Drug Delivery, Institute of Technology, Faculty of Science and Technology, University of Tartu, Tartu, Estonia
| | - Ülo Langel
- Laboratory of Drug Delivery, Institute of Technology, Faculty of Science and Technology, University of Tartu, Tartu, Estonia
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| |
|
112
|
Howe JG, Stack G. Relationship between B-cell epitope structural properties and the immunogenicity of blood group antigens: Outlier properties of the Kell K1 antigen. Transfusion 2022; 62:2349-2362. [PMID: 36205403 DOI: 10.1111/trf.17110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 08/17/2022] [Accepted: 08/20/2022] [Indexed: 11/27/2022]
Abstract
BACKGROUND: The immunogenicities of polypeptide blood group antigens vary, despite most being created by single amino acid (AA) substitutions. To study the basis of these differences, we employed an immunoinformatics approach to determine whether AA substitution sites of blood group antigens have structural features typical of B-cell epitopes and whether the extent of B-cell epitope properties is positively related to immunogenicity. STUDY DESIGN AND METHODS: Fifteen structural property prediction programs were used to determine the likelihood of β-turns, surface accessibility, flexibility, hydrophilicity, particular AA composition and AA pairs, and other B-cell epitope properties at AA substitution sites of polypeptide blood group antigens. RESULTS: AA substitution sites of Lua, Jka, E, c, M, Fya, C, and S were each located in regions with at least two structural features typical of B-cell epitopes. The substitution site of K, the most immunogenic non-ABO/D antigen, scored the lowest for most B-cell epitope properties and was the only one not predicted to be part of a linear B-cell epitope. The most immunogenic antigens studied (K, Jka, Lua, E) had B-cell epitope structural properties determined by the fewest programs; the least immunogenic antigens (e.g., Fya, S, C, c) had B-cell epitope properties according to the most programs. DISCUSSION: Counter to prediction, the immunogenicity of polypeptide blood group antigens was not positively related to B-cell epitope structural features present at their AA-substitution sites. Instead, it tended to be negatively related. The AA-substitution site of the most immunogenic non-ABO/D antigen, K, had the fewest B-cell epitope features.
Affiliation(s)
- John G Howe
- Department of Laboratory Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
| | - Gary Stack
- Department of Laboratory Medicine, Yale University School of Medicine, New Haven, Connecticut, USA; Pathology and Laboratory Medicine Service, VA Connecticut Healthcare System, West Haven, Connecticut, USA
| |
|
113
|
Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, Bhowmik D, Rost B. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:7112-7127. [PMID: 34232869 DOI: 10.1109/tpami.2021.3095381] [Citation(s) in RCA: 450] [Impact Index Per Article: 225.0] [Indexed: 05/10/2023]
Abstract
Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models (LMs) taken from Natural Language Processing (NLP). These LMs reach for new prediction frontiers at low inference costs. Here, we trained two auto-regressive models (Transformer-XL, XLNet) and four auto-encoder models (BERT, Albert, Electra, T5) on data from UniRef and BFD containing up to 393 billion amino acids. The protein LMs (pLMs) were trained on the Summit supercomputer using 5616 GPUs and a TPU Pod with up to 1024 cores. Dimensionality reduction revealed that the raw pLM-embeddings from unlabeled data captured some biophysical features of protein sequences. We validated the advantage of using the embeddings as exclusive input for several subsequent tasks: (1) a per-residue (per-token) prediction of protein secondary structure (3-state accuracy Q3=81%-87%); (2) per-protein (pooling) predictions of protein sub-cellular location (ten-state accuracy: Q10=81%) and membrane versus water-soluble (2-state accuracy Q2=91%). For secondary structure, the most informative embeddings (ProtT5) for the first time outperformed the state-of-the-art without multiple sequence alignments (MSAs) or evolutionary information, thereby bypassing expensive database searches. Taken together, the results implied that pLMs learned some of the grammar of the language of life. All our models are available through https://github.com/agemagician/ProtTrans.
|
114
|
Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, Bhowmik D, Rost B. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022. [PMID: 34232869 DOI: 10.1101/2020.07.12.199554] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Indexed: 05/15/2023]
|
115
|
Fujii Y, Masatani T, Nishiyama S, Okajima M, Izumi F, Okazaki K, Sakoda Y, Takada A, Ozawa M, Sugiyama M, Ito N. Molecular characterisation of a novel avian rotavirus A strain detected from a gull species (Larus sp.). J Gen Virol 2022; 103. [PMID: 36223171 DOI: 10.1099/jgv.0.001792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open
Abstract
A recent study demonstrated the possibility that migratory birds are responsible for the global spread of avian rotavirus A (RVA). However, little is known about what types of RVAs are retained in migratory birds. In this study, to obtain information on RVA strains in migratory birds, we characterised an RVA strain, Ho374, that was detected in a faecal sample from a gull species (Larus sp.). Genetic analysis revealed that all 11 genes of this strain were classified as new genotypes (G28-P[39]-I21-R14-C14-M13-A24-N14-T16-E21-H16). This clearly indicates that the genetic diversity of avian RVAs is greater than previously recognised. Our findings highlight the need for investigations of RVA strains retained in migratory birds, including gulls.
Affiliation(s)
- Yuji Fujii
- Joint Graduate School of Veterinary Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan
| | - Tatsunori Masatani
- Joint Graduate School of Veterinary Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan; Laboratory of Zoonotic Diseases, Faculty of Applied Biological Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan
| | - Shoko Nishiyama
- Laboratory of Zoonotic Diseases, Faculty of Applied Biological Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan
| | - Misuzu Okajima
- Joint Graduate School of Veterinary Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan
| | - Fumiki Izumi
- Joint Graduate School of Veterinary Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan
| | - Katsunori Okazaki
- Laboratory of Microbiology and Immunology, Faculty of Pharmaceutical Sciences, Health Sciences University of Hokkaido, 1757 Kanazawa, Ishikari-Tobetsu, Hokkaido 061-0293, Japan
| | - Yoshihiro Sakoda
- Laboratory of Microbiology, Faculty of Veterinary Medicine, Hokkaido University, Kita-18, Nishi-9, Kita-ku, Sapporo, Hokkaido 060-0818, Japan
| | - Ayato Takada
- Division of Global Epidemiology, International Institute for Zoonosis Control, Hokkaido University, Kita-20, Nishi-10, Kita-ku, Sapporo, Hokkaido 001-0020, Japan; International Collaboration Unit, International Institute for Zoonosis Control, Hokkaido University, Kita-20, Nishi-10, Kita-ku, Sapporo, Hokkaido 001-0020, Japan
| | - Makoto Ozawa
- Laboratory of Animal Hygiene, Joint Faculty of Veterinary Medicine, Kagoshima University, 1-21-24 Korimoto, Kagoshima, Kagoshima 890-0065, Japan
| | - Makoto Sugiyama
- Laboratory of Zoonotic Diseases, Faculty of Applied Biological Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan
| | - Naoto Ito
- Joint Graduate School of Veterinary Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan; Laboratory of Zoonotic Diseases, Faculty of Applied Biological Sciences, Gifu University, 1-1 Yanagido, Gifu, Gifu 501-1193, Japan
| |
|
116
|
Design of a multi-epitope vaccine against the pathogenic fungi Candida tropicalis using an in silico approach. J Genet Eng Biotechnol 2022; 20:140. [PMID: 36175808 PMCID: PMC9521867 DOI: 10.1186/s43141-022-00415-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/28/2022] [Accepted: 08/25/2022] [Indexed: 12/02/2022]
Abstract
Background: Candida tropicalis causes tropical invasive fungal infections with high mortality. Several studies have found this fungus to be resistant to antifungal classes such as azoles, echinocandins, and polyenes. As a result, it is vital to identify novel approaches to prevent and treat C. tropicalis infections. In this study, an in silico technique was used to deduce and evaluate a powerful multivalent epitope-based vaccine against C. tropicalis that targets the secreted aspartic protease 2 (SAP2) protein. This protein is implicated in virulence and host invasion. Results: By focusing on the Sap2 protein, 11 highly antigenic, non-allergic, non-toxic, and conserved epitopes were identified. These were subsequently paired with RS09 and flagellin adjuvants, as well as a pan HLA DR-binding epitope (PADRE) sequence, to create a vaccine candidate that elicited both cell-mediated and humoral immune responses. The vaccine design was projected to be soluble, stable, antigenic, and non-allergic. Ramachandran plot analysis was applied to validate the vaccine construct’s 3-dimensional model. The vaccine construct was tested (at 100 ns) using molecular docking and molecular dynamics simulations, which demonstrated that it can stably connect with MHC-I and Toll-like receptor molecules. Based on in silico studies, we have shown that the vaccine construct can be expressed in E. coli. We surmise that the vaccine design is unrelated to any human proteins, indicating that it is safe to use. Conclusions: Based on the outcomes of these studies, the vaccine design appears to be an effective option for preventing C. tropicalis infections. A fungal vaccine can be proposed as prophylactic medicine and could provide initial protection, because diagnosing infection can sometimes be challenging. However, more in vitro and in vivo research is needed to prove the efficacy and safety of the proposed vaccine design.
Supplementary Information The online version contains supplementary material available at 10.1186/s43141-022-00415-3.
|
117
|
Capel H, Weiler R, Dijkstra M, Vleugels R, Bloem P, Feenstra KA. ProteinGLUE multi-task benchmark suite for self-supervised protein modeling. Sci Rep 2022; 12:16047. [PMID: 36163232 PMCID: PMC9512797 DOI: 10.1038/s41598-022-19608-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/17/2021] [Accepted: 08/31/2022] [Indexed: 11/09/2022] Open
Abstract
Self-supervised language modeling is a rapidly developing approach for the analysis of protein sequence data. However, work in this area is heterogeneous and diverse, making comparison of models and methods difficult. Moreover, models are often evaluated only on one or two downstream tasks, making it unclear whether the models capture generally useful properties. We introduce the ProteinGLUE benchmark for the evaluation of protein representations: a set of seven per-amino-acid tasks for evaluating learned protein representations. We also offer reference code, and we provide two baseline models with hyperparameters specifically trained for these benchmarks. Pre-training was done on two tasks, masked symbol prediction and next sentence prediction. We show that pre-training yields higher performance on a variety of downstream tasks, such as secondary structure and protein interaction interface prediction, compared to no pre-training. However, the larger base model does not outperform the smaller medium model. We expect the ProteinGLUE benchmark dataset introduced here, together with the two baseline pre-trained models and their performance evaluations, to be of great value to the field of protein sequence-based property prediction. Availability: code and datasets from https://github.com/ibivu/protein-glue.
Affiliation(s)
- Henriette Capel
- Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
| | - Robin Weiler
- Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
| | - Maurits Dijkstra
- Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
| | - Reinier Vleugels
- Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
| | - Peter Bloem
- Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
| | - K Anton Feenstra
- Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands.
| |
|
118
|
Lasker K, Boeynaems S, Lam V, Scholl D, Stainton E, Briner A, Jacquemyn M, Daelemans D, Deniz A, Villa E, Holehouse AS, Gitler AD, Shapiro L. The material properties of a bacterial-derived biomolecular condensate tune biological function in natural and synthetic systems. Nat Commun 2022; 13:5643. [PMID: 36163138 PMCID: PMC9512792 DOI: 10.1038/s41467-022-33221-z] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 06/06/2022] [Accepted: 09/09/2022] [Indexed: 11/17/2022] Open
Abstract
Intracellular phase separation is emerging as a universal principle for organizing biochemical reactions in time and space. It remains incompletely resolved how biological function is encoded in these assemblies and whether this depends on their material state. The conserved intrinsically disordered protein PopZ forms condensates at the poles of the bacterium Caulobacter crescentus, which in turn orchestrate cell-cycle regulating signaling cascades. Here we show that the material properties of these condensates are determined by a balance between attractive and repulsive forces mediated by a helical oligomerization domain and an expanded disordered region, respectively. A series of PopZ mutants disrupting this balance results in condensates that span the material properties spectrum, from liquid to solid. A narrow range of condensate material properties supports proper cell division, linking emergent properties to organismal fitness. We use these insights to repurpose PopZ as a modular platform for generating tunable synthetic condensates in human cells.
Affiliation(s)
- Keren Lasker
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA.
| | - Steven Boeynaems
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Vinson Lam
- Department of Molecular Biology, School of Biological Sciences, University of California San Diego, La Jolla, CA, USA
| | - Daniel Scholl
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Emma Stainton
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
| | - Adam Briner
- Clem Jones Centre for Ageing Dementia Research (CJCADR), Queensland Brain Institute (QBI), The University of Queensland, Brisbane, QLD, Australia
| | - Maarten Jacquemyn
- KU Leuven Department of Microbiology, Immunology, and Transplantation, Laboratory of Virology and Chemotherapy, Rega Institute, KU Leuven, Leuven, Belgium
| | - Dirk Daelemans
- KU Leuven Department of Microbiology, Immunology, and Transplantation, Laboratory of Virology and Chemotherapy, Rega Institute, KU Leuven, Leuven, Belgium
| | - Ashok Deniz
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Elizabeth Villa
- Department of Molecular Biology, School of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Howard Hughes Medical Institute, University of California San Diego, La Jolla, CA, USA
| | - Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, MO, USA
- Center for Science and Engineering of Living Systems (CSELS), Washington University in St. Louis, St. Louis, MO, USA
| | - Aaron D Gitler
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
| | - Lucy Shapiro
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA.
| |
|
119
|
Geffen Y, Ofran Y, Unger R. DistilProtBert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts. Bioinformatics 2022; 38:ii95-ii98. [PMID: 36124789 DOI: 10.1093/bioinformatics/btac474] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/25/2022] Open
Abstract
SUMMARY: Recently, deep learning models, initially developed in the field of natural language processing (NLP), were applied successfully to analyze protein sequences. A major drawback of these models is their size in terms of the number of parameters needed to be fitted and the amount of computational resources they require. Recently, 'distilled' models using the concept of student and teacher networks have been widely used in NLP. Here, we adapted this concept to the problem of protein sequence analysis, by developing DistilProtBert, a distilled version of the successful ProtBert model. Implementing this approach, we reduced the size of the network and the running time by 50%, and the computational resources needed for pretraining by 98% relative to the ProtBert model. Using two published tasks, we showed that the performance of the distilled model approaches that of the full model. We next tested the ability of DistilProtBert to distinguish between real and random protein sequences. The task is highly challenging if the composition is maintained on the level of singlet, doublet and triplet amino acids. Indeed, traditional machine-learning algorithms have difficulties with this task. Here, we show that DistilProtBert performs very well on singlet, doublet and even triplet-shuffled versions of the human proteome, with AUC of 0.92, 0.91 and 0.87, respectively. Finally, we suggest that by examining the small number of false-positive classifications (i.e. shuffled sequences classified as proteins by DistilProtBert), we may be able to identify de novo potential natural-like proteins based on random shuffling of amino acid sequences. AVAILABILITY AND IMPLEMENTATION: https://github.com/yarongef/DistilProtBert.
Affiliation(s)
- Yaron Geffen
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 5290002, Israel
| | - Yanay Ofran
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 5290002, Israel
| | - Ron Unger
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 5290002, Israel
| |
|
120
|
Moutinho AF, Eyre-Walker A, Dutheil JY. Strong evidence for the adaptive walk model of gene evolution in Drosophila and Arabidopsis. PLoS Biol 2022; 20:e3001775. [PMID: 36099311 PMCID: PMC9470001 DOI: 10.1371/journal.pbio.3001775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 08/01/2022] [Indexed: 11/19/2022] Open
Abstract
Understanding the dynamics of species adaptation to their environments has long been a central focus of the study of evolution. Theories of adaptation propose that populations evolve by “walking” in a fitness landscape. This “adaptive walk” is characterised by a pattern of diminishing returns, where populations further away from their fitness optimum take larger steps than those closer to their optimal conditions. Hence, we expect young genes to evolve faster and experience mutations with stronger fitness effects than older genes because they are further away from their fitness optimum. Testing this hypothesis, however, constitutes an arduous task. Young genes are small, encode proteins with a higher degree of intrinsic disorder, are expressed at lower levels, and are involved in species-specific adaptations. Since all these factors lead to increased protein evolutionary rates, they could be masking the effect of gene age. While controlling for these factors, we used population genomic data sets of Arabidopsis and Drosophila and estimated the rate of adaptive substitutions across genes from different phylostrata. We found that a gene’s evolutionary age significantly impacts the molecular rate of adaptation. Moreover, we observed that substitutions in young genes tend to have larger physicochemical effects. Our study, therefore, provides strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale. This study uses population genomic datasets from Arabidopsis and Drosophila to show that young genes adapt faster and are subject to mutations of larger fitness effects, providing strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale.
Affiliation(s)
- Ana Filipa Moutinho
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | - Adam Eyre-Walker
- School of Life Sciences, University of Sussex, Brighton, United Kingdom
| | - Julien Y. Dutheil
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Unité Mixte de Recherche 5554 Institut des Sciences de l’Evolution, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
| |
|
121
|
Genotyping and In Silico Analysis of Delmarva (DMV/1639) Infectious Bronchitis Virus (IBV) Spike 1 (S1) Glycoprotein. Genes (Basel) 2022; 13:genes13091617. [PMID: 36140785 PMCID: PMC9498812 DOI: 10.3390/genes13091617] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/31/2022] [Revised: 09/04/2022] [Accepted: 09/06/2022] [Indexed: 11/17/2022] Open
Abstract
Genetic diversity and evolution of infectious bronchitis virus (IBV) are mainly impacted by mutations in the spike 1 (S1) gene. This study focused on whole genome sequencing of an IBV isolate (IBV/Ck/Can/2558004), which represents strains highly prevalent in Canadian commercial poultry, especially concerning features related to its S1 gene and protein sequences. Based on the phylogeny of the S1 gene, IBV/Ck/Can/2558004 belongs to the GI-17 lineage. According to S1 gene and protein pairwise alignment, IBV/Ck/Can/2558004 had 99.44–99.63% and 98.88–99.25% nucleotide (nt) and deduced amino acid (aa) identities, respectively, with five Canadian Delmarva (DMV/1639) IBVs isolated in 2019, and it also shared 96.63–97.69% and 94.78–97.20% nt and aa similarities with US DMV/1639 IBVs isolated in 2011 and 2019, respectively. Further homology analysis of aa sequences showed the existence of some aa substitutions in the hypervariable regions (HVRs) of the S1 protein of IBV/Ck/Can/2558004 compared to US DMV/1639 isolates; most of these variant aa residues have been subjected to positive selection pressure. Predictive analysis of potential N-glycosylation and phosphorylation motifs showed either loss or acquisition in the S1 glycoprotein of IBV/Ck/Can/2558004 compared to S1 of US DMV/1639 IBV. Furthermore, bioinformatic analysis showed some of the aa changes within the S1 protein of IBV/Ck/Can/2558004 have been predicted to impact the function and structure of the S1 protein, potentially leading to a lower binding affinity of the S1 protein to its relevant ligand (sialic acid). In conclusion, these findings revealed that the DMV/1639 IBV isolates are under continuous evolution among Canadian poultry.
|
122
|
Selvaraj C, Pravin MA, Alhoqail WA, Nayarisseri A, Singh SK. Intrinsically disordered proteins in viral pathogenesis and infections. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 132:221-242. [PMID: 36088077 DOI: 10.1016/bs.apcsb.2022.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
Disordered proteins play a crucial part in many biological processes that go beyond the capabilities of ordered proteins. Many viruses have extremely condensed proteomes and genomes, which results in highly disordered proteins. The presence of these intrinsically disordered proteins (IDPs) allows viruses to adapt rapidly to changes in their biological environment and plays a significant role in viral replication and down-regulation of host defense mechanisms. Since viruses undergo rapid evolution and have a high rate of mutation and accumulation in their proteome, insights into viral IDPs are critical for understanding how viruses hijack cells and cause disease. IDPs can adopt many conformations in order to interact with different protein partners, stabilizing a particular fold and withstanding high mutation rates. This chapter explains the molecular mechanism behind viral IDPs, as well as the significance of recent research in the field of IDPs, with the goal of gaining a deeper comprehension of the essential roles and functions played by viral proteins.
Affiliation(s)
- Chandrabose Selvaraj
- Computer Aided Drug Design and Molecular Modeling Lab, Department of Bioinformatics, Science Block, Alagappa University, Karaikudi, Tamil Nadu, India.
| | - Muthuraja Arun Pravin
- Computer Aided Drug Design and Molecular Modeling Lab, Department of Bioinformatics, Science Block, Alagappa University, Karaikudi, Tamil Nadu, India
| | - Wardah A Alhoqail
- Department of Biology, College of Education, Majmaah University, Al Majma'ah, Saudi Arabia
| | - Anuraj Nayarisseri
- In Silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
| | - Sanjeev Kumar Singh
- Computer Aided Drug Design and Molecular Modeling Lab, Department of Bioinformatics, Science Block, Alagappa University, Karaikudi, Tamil Nadu, India.
| |
|
123
|
Hong Y, Song J, Ko J, Lee J, Shin WH. S-Pred: protein structural property prediction using MSA transformer. Sci Rep 2022; 12:13891. [PMID: 35974061 PMCID: PMC9381718 DOI: 10.1038/s41598-022-18205-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 08/08/2022] [Indexed: 11/10/2022] Open
Abstract
Predicting the local structural features of a protein from its amino acid sequence helps reveal its function and assists in three-dimensional structural modeling. As the sequence-structure gap increases, prediction methods have been developed to bridge this gap. Additionally, as the size of the structural database and computing power increase, the performance of these methods has also significantly improved. Herein, we present a powerful new tool called S-Pred, which can predict eight-state secondary structures (SS8), accessible surface areas (ASAs), and intrinsically disordered regions (IDRs) from a given sequence. For feature prediction, S-Pred uses a multiple sequence alignment (MSA) of a query sequence as input. The MSA input is converted to features by the MSA Transformer, which is a protein language model that uses an attention mechanism. A long short-term memory (LSTM) network was employed to produce the final prediction. The performance of S-Pred was evaluated on several test sets, and the program consistently provided accurate predictions. The accuracy of the SS8 prediction was approximately 76%, and the Pearson’s correlation between the experimental and predicted ASAs was 0.84. Additionally, an IDR could be accurately predicted with an F1-score of 0.514. The program is freely available as code at https://github.com/arontier/S_Pred_Paper and as a web server at https://ad3.io.
Affiliation(s)
- Yiyu Hong
- Arontier Co., Seoul, 06735, Republic of Korea
| | - Jinung Song
- Arontier Co., Seoul, 06735, Republic of Korea
| | - Junsu Ko
- Arontier Co., Seoul, 06735, Republic of Korea
| | - Juyong Lee
- Arontier Co., Seoul, 06735, Republic of Korea; Division of Chemistry and Biochemistry, Department of Chemistry, Kangwon National University, Chuncheon, 24341, Republic of Korea
| | - Woong-Hee Shin
- Arontier Co., Seoul, 06735, Republic of Korea; Department of Chemistry Education, Sunchon National University, Suncheon, 57922, Republic of Korea; Department of Advanced Components and Materials Engineering, Sunchon National University, Suncheon, 57922, Republic of Korea
| |
|
124
|
Prediction of B cell epitopes in proteins using a novel sequence similarity-based method. Sci Rep 2022; 12:13739. [PMID: 35962028 PMCID: PMC9374694 DOI: 10.1038/s41598-022-18021-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/10/2022] [Accepted: 08/03/2022] [Indexed: 11/29/2022] Open
Abstract
Prediction of B cell epitopes that can replace the antigen for antibody production and detection is of great interest for research and the biotech industry. Here, we developed a novel BLAST-based method to predict linear B cell epitopes. To that end, we generated a BLAST-formatted database from a dataset of 62,730 known linear B cell epitope sequences and considered as a B cell epitope any peptide sequence producing ungapped BLAST hits to this database with identity ≥ 80% and length ≥ 8. We examined B cell epitope predictions by this method in tenfold cross-validations in which we considered various types of non-B cell epitopes, including 62,730 peptide sequences with verified negative B cell assays. As a result, we obtained values of accuracy, specificity and sensitivity of 72.54 ± 0.27%, 81.59 ± 0.37% and 63.49 ± 0.43%, respectively. In an independent dataset incorporating 503 B cell epitopes, this method reached accuracy, specificity and sensitivity of 74.85%, 99.20% and 50.50%, respectively, outperforming state-of-the-art methods to predict linear B cell epitopes. We implemented this BLAST-based approach to predict B cell epitopes at http://imath.med.ucm.es/bepiblast.
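The decision rule described in this abstract (an ungapped hit with identity ≥ 80% and length ≥ 8) can be sketched as a simple filter over alignment hits. The `Hit` record, the function names, and the example counts below are hypothetical and only illustrate the rule; they do not reproduce the authors' implementation or BLAST's output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    identity: float  # percent identity of the alignment
    length: int      # alignment length in residues
    gaps: int        # number of gaps; 0 means the hit is ungapped


def is_predicted_epitope(hits, min_identity=80.0, min_length=8):
    """True if any ungapped hit meets both thresholds from the abstract."""
    return any(
        h.gaps == 0 and h.identity >= min_identity and h.length >= min_length
        for h in hits
    )


def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical hits, invented for illustration.
print(is_predicted_epitope([Hit(identity=87.5, length=8, gaps=0)]))   # meets both thresholds
print(is_predicted_epitope([Hit(identity=100.0, length=7, gaps=0)]))  # too short
print(evaluate(tp=6, tn=9, fp=1, fn=4))
```

When positives and negatives are equally numerous, accuracy is the mean of sensitivity and specificity. The reported figures fit that pattern, since (81.59 + 63.49) / 2 = 72.54 and (99.20 + 50.50) / 2 = 74.85, which is consistent with balanced evaluation sets, although the abstract does not state the composition of the independent set beyond its 503 epitopes.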
|
125
|
Debnath P, Khan U, Khan MS. Characterization and Structural Prediction of Proteins in SARS-CoV-2 Bangladeshi Variant Through Bioinformatics. Microbiol Insights 2022; 15:11786361221115595. [PMID: 35966939 PMCID: PMC9373114 DOI: 10.1177/11786361221115595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 06/30/2022] [Indexed: 11/15/2022] Open
Abstract
The respiratory disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) became a global epidemic in less than a year, by the first half of 2020. The subsequent efficient human-to-human transmission of this virus eventually affected millions of people worldwide. Most devastatingly, the infection rate is continuously rising and resulting in significant mortality, especially among older people and those with health co-morbidities. This enveloped, positive-sense RNA virus is chiefly responsible for infection of the upper respiratory system. The virulence of SARS-CoV-2 is mostly regulated by its proteins, which are involved in processes such as entry into the host cell through a fusion mechanism, fusion of infected cells with neighboring uninfected cells to spread the virus, inhibition of host gene expression, cellular differentiation, apoptosis, and mitochondrial biogenesis. However, very little is known about the structures and functions of these proteins. Therefore, the main purpose of this study is to learn more about these proteins through bioinformatics approaches. In this study, ORF10, ORF7b, ORF7a, ORF6, membrane glycoprotein, and envelope protein were selected from a Bangladeshi coronavirus strain G039392, and a number of bioinformatics tools (MEGA-X-V10.1.7, PONDR, ProtScale, ProtParam, SCRIBER, NetSurfP v2.0, IntFOLD, UCSF Chimera, and PyMol) and strategies were used for multiple sequence alignment and phylogeny analysis with 9 different variants, and for predicting hydropathicity, amino acid compositions, protein-binding propensity, protein disorder, and 2D and 3D protein models. The selected proteins were characterized as highly flexible, structurally and electrostatically extremely stable, ordered, biologically active, hydrophobic, and closely related to proteins of different variants.
This detailed characterization of the proteins of the SARS-CoV-2 Bangladeshi variant was performed for the first time to unveil the mechanisms behind its virulence features. This appraisal also paves the way for future molecular docking and vaccine development targeting these characterized proteins.
Affiliation(s)
- Pinky Debnath
- Chemical Biotechnology Department, Technical University of Munich, Straubing, Germany
| | - Umama Khan
- Biotechnology and Genetic Engineering Discipline, Khulna University, Bangladesh
| | | |
|
126
|
A Neural Networks Approach for the Analysis of Reproducible Ribo–Seq Profiles. Algorithms 2022. [DOI: 10.3390/a15080274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
In recent years, the ribosome profiling technique (Ribo–seq) has emerged as a powerful method for globally monitoring the translation process in vivo at single-nucleotide resolution. Based on deep sequencing of mRNA fragments, Ribo–seq makes it possible to obtain profiles that reflect the time spent by ribosomes in translating each part of an open reading frame. Unfortunately, the profiles produced by this method can vary significantly between experimental setups and are characterized by poor reproducibility. To address this problem, we employed a statistical method for the identification of highly reproducible Ribo–seq profiles, which was tested on a set of E. coli genes. State-of-the-art artificial neural network models were used to validate the quality of the produced sequences. Moreover, a statistical analysis of the obtained sequences provided new insights into the dynamics of ribosome translation.
|
127
|
Roca-Martinez J, Lazar T, Gavalda-Garcia J, Bickel D, Pancsa R, Dixit B, Tzavella K, Ramasamy P, Sanchez-Fornaris M, Grau I, Vranken WF. Challenges in describing the conformation and dynamics of proteins with ambiguous behavior. Front Mol Biosci 2022; 9:959956. [PMID: 35992270 PMCID: PMC9382080 DOI: 10.3389/fmolb.2022.959956] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/02/2022] [Accepted: 06/27/2022] [Indexed: 11/13/2022] Open
Abstract
Traditionally, our understanding of how proteins operate and how evolution shapes them is based on two main data sources: the overall protein fold and the protein amino acid sequence. However, a significant part of the proteome shows highly dynamic and/or structurally ambiguous behavior, which cannot be correctly represented by the traditional fixed set of static coordinates. Representing such protein behaviors remains challenging and necessarily involves a complex interpretation of conformational states, including probabilistic descriptions. Relating protein dynamics and multiple conformations to their function as well as their physiological context (e.g., post-translational modifications and subcellular localization), therefore, remains elusive for much of the proteome, with studies to investigate the effect of protein dynamics relying heavily on computational models. We here investigate the possibility of delineating three classes of protein conformational behavior: order, disorder, and ambiguity. These definitions are explored based on three different datasets, using interpretable machine learning from a set of features, from AlphaFold2 to sequence-based predictions, to understand the overlap and differences between these datasets. This forms the basis for a discussion on the current limitations in describing the behavior of dynamic and ambiguous proteins.
Affiliation(s)
- Joel Roca-Martinez
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
| | - Tamas Lazar
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- VIB-VUB Center for Structural Biology, Brussels, Belgium
| | - Jose Gavalda-Garcia
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
| | - David Bickel
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
| | - Rita Pancsa
- Research Centre for Natural Sciences, Institute of Enzymology, Budapest, Hungary
| | - Bhawna Dixit
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- IBiTech-Biommeda, Universiteit Gent, Gent, Belgium
| | - Konstantina Tzavella
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
| | - Pathmanaban Ramasamy
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- VIB-UGent Center for Medical Biotechnology, Universiteit Gent, Gent, Belgium
| | - Maite Sanchez-Fornaris
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Department of Computer Sciences, University of Camagüey, Camagüey, Cuba
| | - Isel Grau
- Information Systems, Eindhoven University of Technology, Eindhoven, Netherlands
| | - Wim F. Vranken
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
| |
|
128
|
Holcomb DD, Jankowska KI, Hernandez N, Laurie K, Kames J, Hamasaki-Katagiri N, Komar AA, DiCuccio M, Kimchi-Sarfaty C. Protocol to identify host-viral protein interactions between coagulation-related proteins and their genetic variants with SARS-CoV-2 proteins. STAR Protoc 2022; 3:101648. [PMID: 36052345 PMCID: PMC9345850 DOI: 10.1016/j.xpro.2022.101648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022] Open
Abstract
Here, we describe a bioinformatics pipeline that evaluates the interactions of coagulation-related proteins and their genetic variants with SARS-CoV-2 proteins. The pipeline searches for host proteins that may bind to viral proteins, then identifies and scores genetic variants of those proteins to predict disease pathogenesis in specific subpopulations. Additionally, it can find structurally similar motifs and identify potential binding sites within the host-viral protein complexes to unveil viral impact on regulated biological processes and/or host-protein impact on viral invasion or reproduction. For complete details on the use and execution of this protocol, please refer to Holcomb et al. (2021).
Affiliation(s)
- David D. Holcomb
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA (corresponding author)
| | - Katarzyna I. Jankowska
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
| | - Nancy Hernandez
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
| | - Kyle Laurie
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
| | - Jacob Kames
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
| | - Nobuko Hamasaki-Katagiri
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA
| | - Anton A. Komar
- Center for Gene Regulation in Health and Disease, Department of Biological, Geological and Environmental Sciences, Cleveland State University, Cleveland, OH, USA
| | - Michael DiCuccio
- National Center of Biotechnology Information, National Institutes of Health, Bethesda, MD, USA
| | - Chava Kimchi-Sarfaty
- Center for Biologics Evaluation and Research, Office of Tissues and Advanced Therapies, Division of Plasma Protein Therapeutics, Food and Drug Administration, Silver Spring, MD, USA (corresponding author)
| |
|
129
|
Nelis JLD, Broadbent JA, Bose U, Anderson A, Colgrave ML. Targeted proteomics for rapid and robust peanut allergen quantification. Food Chem 2022; 383:132592. [PMID: 35413757 DOI: 10.1016/j.foodchem.2022.132592] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/10/2021] [Revised: 02/01/2022] [Accepted: 02/26/2022] [Indexed: 11/26/2022]
Abstract
This study improves LC-MS-based trace-level peanut allergen quantification in processed food by refining method robustness, total analysis time and method sensitivity. Extraction buffer (six compared) and peptide choice were optimised and found to profoundly affect method robustness. A rapid extraction and in-solution digestion method was developed that omits the subsequent reduction, alkylation and sample clean-up steps, effectively reducing total analysis time from the previously reported ∼5.5-20 h to ∼2.5 h. For the three best performing peptides, accurate quantification (CVs < 15%) with matrix-matched calibration curves (R² = 0.97-0.99) was achieved for peanut muffin and ice-cream with excellent linearity (0.25-1000 mg kg⁻¹). The best performing peptide enabled excellent recovery rates in ice-cream (106.0 ± 15.1%) and peanut muffin (72.7 ± 13.4%). Sensitivity (LOD = 0.25-0.5 mg kg⁻¹; LOQ = 0.5-1.0 mg kg⁻¹) was 2- to 20-fold improved compared to previous methods, depending on the peptide. These methodological improvements contribute to robust peanut detection in food and can be translated to additional food-borne allergens.
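The quality figures quoted in this abstract (CV below 15%, recovery in percent) follow from simple formulas. The sketch below shows the usual definitions; the function names and the replicate values are invented for illustration and are not data from the study.

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation: sample standard deviation over mean, in percent."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate measurements")
    m = mean(replicates)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(replicates) / m * 100.0


def recovery_percent(measured, spiked):
    """Measured concentration as a percentage of the spiked concentration."""
    if spiked <= 0:
        raise ValueError("spiked concentration must be positive")
    return measured / spiked * 100.0


# Hypothetical replicate measurements in mg/kg, for illustration only.
replicates = [9.0, 10.0, 11.0]
print(cv_percent(replicates))
print(recovery_percent(measured=7.27, spiked=10.0))
```

A recovery above 100%, as reported for ice-cream, means the measured amount exceeded the spiked amount, which can happen within the stated uncertainty.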
Affiliation(s)
- Joost L D Nelis
- CSIRO Agriculture and Food, 306 Carmody Rd, St Lucia, QLD 4067, Australia.
| | - James A Broadbent
- CSIRO Agriculture and Food, 306 Carmody Rd, St Lucia, QLD 4067, Australia
| | - Utpal Bose
- CSIRO Agriculture and Food, 306 Carmody Rd, St Lucia, QLD 4067, Australia
| | - Alisha Anderson
- CSIRO Health & Biosecurity, Black Mountain, Canberra, ACT 2600, Australia
| | | |
|
130
|
Niemann M, Matern BM, Spierings E. Snowflake: A deep learning-based human leukocyte antigen matching algorithm considering allele-specific surface accessibility. Front Immunol 2022; 13:937587. [PMID: 35967374 PMCID: PMC9372366 DOI: 10.3389/fimmu.2022.937587] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/06/2022] [Accepted: 07/11/2022] [Indexed: 12/12/2022] Open
Abstract
Histocompatibility in solid-organ transplantation has a strong impact on long-term graft survival. Although recent advances in matching of both B-cell epitopes and T-cell epitopes have improved understanding of allorecognition, the immunogenic determinants are still not fully understood. We hypothesized that HLA solvent accessibility is allele-specific, thus supporting refinement of HLA B-cell epitope prediction. We developed a computational pipeline named Snowflake to calculate solvent accessibility of HLA Class I proteins for deposited HLA crystal structures, supplemented by HLA structures constructed with the AlphaFold protein folding predictor and peptide binding predictions of the APE-Gen docking framework. This dataset was used to train a four-layer long short-term memory bidirectional recurrent neural network, which in turn inferred solvent accessibility of all known HLA Class I proteins. We extracted 676 HLA Class I experimental structures from the Protein Data Bank and supplemented them with 37 Class I alleles for which structures were predicted. For each of the predicted structures, 10 known binding peptides as reported by the Immune Epitope DataBase were rendered into the binding groove. Although HLA Class I proteins are predominantly folded similarly, we found higher variation in root mean square difference of solvent accessibility between experimental structures of different HLAs compared to structures with identical amino acid sequence, suggesting that HLA’s solvent accessible surface is protein specific. Hence, residues may be surface-accessible on e.g. HLA-A*02:01, but not on HLA-A*01:01. Mapping these data to antibody-verified epitopes as defined by the HLA Epitope Registry reveals patterns of (1) consistently accessible residues, (2) only subsets of an epitope’s residues being consistently accessible and (3) varying surface accessibility of residues of epitopes.
Our data suggest B-cell epitope definitions can be refined by considering allele-specific solvent-accessibility, rather than aggregating HLA protein surface maps by HLA class or locus. To support studies on epitope analyses in organ transplantation, the calculation of donor-allele-specific solvent-accessible amino acid mismatches was implemented as a cloud-based web service.
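The comparison described in this abstract rests on a root mean square difference between per-residue solvent accessibility profiles of two structures. The following is a minimal sketch of that calculation, assuming two aligned profiles of equal length with relative accessibility values between 0 and 1; the 0.25 cutoff for calling a residue accessible is an assumption chosen for illustration, not a value taken from the paper.

```python
from math import sqrt


def rms_difference(profile_a, profile_b):
    """Root mean square difference between two aligned accessibility profiles."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return sqrt(squared / len(profile_a))


def differing_positions(profile_a, profile_b, cutoff=0.25):
    """1-based positions that are accessible in one profile but not the other.

    The cutoff is a hypothetical threshold on relative accessibility.
    """
    return [
        i
        for i, (a, b) in enumerate(zip(profile_a, profile_b), start=1)
        if (a >= cutoff) != (b >= cutoff)
    ]


# Toy profiles for two hypothetical alleles, invented for illustration.
allele_x = [0.10, 0.60, 0.30]
allele_y = [0.40, 0.70, 0.10]
print(rms_difference(allele_x, allele_y))
print(differing_positions(allele_x, allele_y))
```

A larger value between structures of different alleles than between structures of the same sequence is the kind of pattern the authors report as evidence for allele-specific accessibility.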
Affiliation(s)
- Matthias Niemann
- Research and Development, PIRCHE AG, Berlin, Germany
- *Correspondence: Matthias Niemann,
| | - Benedict M. Matern
- Center for Translational Immunology, University Medical Center, Utrecht, Netherlands
| | - Eric Spierings
- Center for Translational Immunology, University Medical Center, Utrecht, Netherlands
- Central Diagnostic Laboratory, University Medical Center, Utrecht, Netherlands
| |
|
131
|
Han H, Xu M, Wen L, Chen J, Liu Q, Wang J, Li MD, Yang Z. Identification of a Novel Functional Non-synonymous Single Nucleotide Polymorphism in Frizzled Class Receptor 6 Gene for Involvement in Depressive Symptoms. Front Mol Neurosci 2022; 15:882396. [PMID: 35875672 PMCID: PMC9302575 DOI: 10.3389/fnmol.2022.882396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 06/16/2022] [Indexed: 12/05/2022] Open
Abstract
Although numerous susceptibility loci for depression have been identified in recent years, their biological function and molecular mechanism remain largely unknown. By using an exome-wide association study for depressive symptoms assessed by the Center for Epidemiological Studies Depression (CES-D) score, we discovered a novel missense single nucleotide polymorphism (SNP), rs61753730 (Q152E), located in the fourth exon of the frizzled class receptor 6 gene (FZD6), which is a potential causal variant and is significantly associated with the CES-D score. In silico analysis revealed that the protein configuration and stability, as well as the secondary structure of FZD6, differed greatly between the wild-type (WT) and Q152E mutant. We further found that rs61753730 significantly affected the luciferase activity and expression of FZD6 in an allele-specific way. Finally, we generated Fzd6-knockin (Fzd6-KI) mice with the rs61753730 mutation using the CRISPR/Cas9 genome editing system and found that, after exposure to chronic social defeat stress, these mice presented greater immobility in the forced swimming test, less preference for sucrose in the sucrose preference test, as well as decreased center entries, center time, and distance traveled in the open field test compared with WT mice. These results indicate the involvement of rs61753730 in depression. Taken together, our findings demonstrate that SNP rs61753730 is a novel functional variant and plays an important role in depressive symptoms.
Affiliation(s)
- Haijun Han
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Mengxiang Xu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Li Wen
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Jiali Chen
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Qiang Liu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Ju Wang
- Department of Medical Engineering, Tianjin Medical University, Tianjin, China
| | - Ming D. Li
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China
- *Correspondence: Ming D. Li,
| | - Zhongli Yang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhongli Yang,
| |
|
132
|
Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit-explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring "the state of the art" in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI-PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI-PS? How do protein folding pathways evolve? What are the rules that dictate the folds? 
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI-PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the "state of the art" on research in the AI-PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | | | - Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| | - Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
| | - Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
| |
|
133
|
Yasmin T. In silico comprehensive analysis of coding and non-coding SNPs in human mTOR protein. PLoS One 2022; 17:e0270919. [PMID: 35788771 PMCID: PMC9255762 DOI: 10.1371/journal.pone.0270919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 06/17/2022] [Indexed: 11/21/2022] Open
Abstract
The mammalian/mechanistic target of rapamycin (mTOR) protein is an important growth regulator and has been linked with multiple diseases including cancer and diabetes. Non-synonymous mutations of this gene have already been found in patients with renal clear cell carcinoma, melanoma, and acute lymphoid leukemia among many others. Such mutations can potentially affect a protein’s structure and hence its functions. In this study, therefore, the most deleterious SNPs of mTOR protein have been determined to identify potential biomarkers for various disease treatments. The aim is to generate a structured dataset of the mTOR gene’s SNPs that may prove to be an asset for the identification and treatment of multiple diseases associated with the target gene. Both sequence and structure-based approaches were adopted and a wide variety of bioinformatics tools were applied to analyze the SNPs of mTOR protein. In total 11 nsSNPs have been filtered out of 2178 nsSNPs along with two non-coding variations. All of the nsSNPs were found to destabilize the protein structure and disrupt its function. While R619C, A1513D, and T1977R mutations were shown to alter C alpha distances and bond angles of the mTOR protein, L509Q, R619C and N2043S were predicted to disrupt the mTOR protein’s interaction with NBS1 protein and FKBP1A/rapamycin complex. In addition, one of the non-coding SNPs was shown to alter miRNA binding sites. Characterizing nsSNPs and non-coding SNPs and their harmful effects on a protein’s structure and functions will enable researchers to understand the critical impact of mutations on the molecular mechanisms of various diseases. This will ultimately lead to the identification of potential targets for disease diagnosis and therapeutic interventions.
Affiliation(s)
- Tahirah Yasmin
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
|
134
|
Venkata Subbiah H, Ramesh Babu P, Subbiah U. Determination of deleterious single-nucleotide polymorphisms of human LYZ C gene: an in silico study. J Genet Eng Biotechnol 2022; 20:92. [PMID: 35776277 PMCID: PMC9247897 DOI: 10.1186/s43141-022-00383-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 06/14/2022] [Indexed: 11/26/2022]
Abstract
Background Single-nucleotide polymorphisms (SNPs) have a crucial function in affecting the susceptibility of individuals to diseases and also determine how an individual responds to different treatment options. The present study aimed to predict and characterize deleterious missense nonsynonymous SNPs (nsSNPs) of lysozyme C (LYZ C) gene using different computational methods. Lyz C is an important antimicrobial peptide capable of damaging the peptidoglycan layer of bacteria leading to osmotic shock and cell death. The nsSNPs were first analyzed by SIFT and PolyPhen v2 tools. The nsSNPs predicted as deleterious were then assessed by other in silico tools: SNAP, PROVEAN, PhD-SNP, and SNPs & GO. These SNPs were further examined by I-Mutant 3.0 and ConSurf. GeneMANIA and STRING tools were used to study the interaction network of the LYZ C gene. NetSurfP 2.0 was used to predict the secondary structure of Lyz C protein. The impact of variations on the structural characteristics of the protein was studied by HOPE analysis. The structures of wild type and variants were predicted by SWISS-MODEL web server, and energy minimization was carried out using XenoPlot software. TM-align tool was used to predict root-mean-square deviation (RMSD) and template modeling (TM) scores.
Results Eight missense nsSNPs (T88N, I74T, F75I, D67H, W82R, D85H, R80C, and R116S) were found to be potentially deleterious. I-Mutant 3.0 determined that the variants decreased the stability of the protein. ConSurf predicted rs121913547, rs121913549, and rs387906536 nsSNPs to be conserved. Interaction network tools showed that LYZ C protein interacted with lactoferrin (LTF). HOPE tool analyzed differences in physicochemical properties between wild type and variants. TM-align tool predicted the alignment score, and the protein folding was found to be identical. PyMOL was used to visualize the superimposition of variants over wild type.
Conclusion This study ascertained the deleterious missense nsSNPs of the LYZ C gene and could be used in further experimental analysis. These high-risk nsSNPs could be used as molecular targets for diagnostic and therapeutic interventions.
Affiliation(s)
- Harini Venkata Subbiah
- Human Genetics Research Centre, Sree Balaji Dental College & Hospital, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, India
- Polani Ramesh Babu
- Center for Materials Engineering and Regenerative Medicine, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, India
- Usha Subbiah
- Human Genetics Research Centre, Sree Balaji Dental College & Hospital, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, India.
|
135
|
Boßelmann CM, Hedrich UBS, Müller P, Sonnenberg L, Parthasarathy S, Helbig I, Lerche H, Pfeifer N. Predicting the functional effects of voltage-gated potassium channel missense variants with multi-task learning. EBioMedicine 2022; 81:104115. [PMID: 35759918 PMCID: PMC9250003 DOI: 10.1016/j.ebiom.2022.104115] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/16/2021] [Revised: 05/30/2022] [Accepted: 05/31/2022] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Variants in genes encoding voltage-gated potassium channels are associated with a broad spectrum of neurological diseases including epilepsy, ataxia, and intellectual disability. Knowledge of the resulting functional changes, characterized as overall ion channel gain- or loss-of-function, is essential to guide clinical management including precision medicine therapies. However, for an increasing number of variants, little to no experimental data is available. New tools are needed to evaluate variant functional effects. METHODS We catalogued a comprehensive dataset of 959 functional experiments across 19 voltage-gated potassium channels, leveraging data from 782 unique disease-associated and synthetic variants. We used these data to train a taxonomy-based multi-task learning support vector machine (MTL-SVM), and compared performance to several baseline methods. FINDINGS MTL-SVM maintains channel family structure during model training, improving overall predictive performance (mean balanced accuracy 0·718 ± 0·041, AU-ROC 0·761 ± 0·063) over baseline (mean balanced accuracy 0·620 ± 0·045, AU-ROC 0·711 ± 0·022). We can obtain meaningful predictions even for channels with few known variants (KCNC1, KCNQ5). INTERPRETATION Our model enables functional variant prediction for voltage-gated potassium channels. It may assist in tailoring current and future precision therapies for the increasing number of patients with ion channel disorders. FUNDING This work was supported by intramural funding of the Medical Faculty, University of Tuebingen (PATE F.1315137.1), the Federal Ministry for Education and Research (Treat-ION, 01GM1907A/B/G/H) and the German Research Foundation (FOR-2715, Le1030/16-2, He8155/1-2).
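As a reading aid for the metric reported in this abstract: balanced accuracy is the mean of the per-class recalls, so a rare class (for example, gain-of-function variants) weighs as much as a common one. The sketch below uses made-up labels and a hypothetical helper name, `balanced_accuracy`; it is not the authors' evaluation code and says nothing about how their MTL-SVM was scored in detail.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; every class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")
    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[cls] / total[cls] for cls in total]
    return sum(recalls) / len(recalls)


# Illustrative labels only (assumption): four loss-of-function, two gain-of-function.
y_true = ["LOF", "LOF", "LOF", "LOF", "GOF", "GOF"]
y_pred = ["LOF", "LOF", "LOF", "GOF", "GOF", "LOF"]
score = balanced_accuracy(y_true, y_pred)
```

In this toy case the LOF recall is 3/4 and the GOF recall is 1/2, so the balanced accuracy is lower than the plain accuracy of 4/6, which is the reason the metric is preferred for imbalanced classes.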
Affiliation(s)
- Christian Malte Boßelmann
- Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Hoppe-Seyler-Str. 3, D-72076 Tuebingen, Germany; Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Sand 14, D-72076 Tuebingen, Germany
- Ulrike B S Hedrich
- Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Hoppe-Seyler-Str. 3, D-72076 Tuebingen, Germany
- Peter Müller
- Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Hoppe-Seyler-Str. 3, D-72076 Tuebingen, Germany
- Lukas Sonnenberg
- Institute for Neurobiology, University of Tuebingen, Tuebingen, Germany
- Shridhar Parthasarathy
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
- Holger Lerche
- Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Hoppe-Seyler-Str. 3, D-72076 Tuebingen, Germany.
- Nico Pfeifer
- Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Sand 14, D-72076 Tuebingen, Germany; Interfaculty Institute for Biomedical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany; Faculty of Medicine, University of Tuebingen, Tuebingen, Germany; German Center for Infection Research, Partner Site Tuebingen, Tuebingen, Germany.
|
136
|
Towards the First Multiepitope Vaccine Candidate against Neospora caninum in Mouse Model: Immunoinformatic Standpoint. BIOMED RESEARCH INTERNATIONAL 2022; 2022:2644667. [PMID: 35722460 PMCID: PMC9204498 DOI: 10.1155/2022/2644667] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/17/2021] [Accepted: 05/09/2022] [Indexed: 11/28/2022]
Abstract
Neospora caninum is an economically significant parasite among livestock, particularly in dairy cattle herds, causing storm abortions. Vaccination seems necessary to limit the infection and its harsh consequences. This study takes the first steps towards developing a multiepitope vaccine candidate against N. caninum using in silico approaches. High-ranked mouse MHC-binding and shared linear B-cell epitopes from six proteins (SRS2, MIC3, MIC6, GRA1, IMP-1, and profilin) as well as IFN-γ-inducing epitopes (from SAG1) were predicted, screened, and connected together through appropriate linkers. Finally, RS-09 protein (TLR4 agonist) and a histidine tag were added to the N- and C-termini of the vaccine sequence, yielding 486 residues in length. Physicochemical properties showed a stable (instability index: 27.23), highly soluble, antigenic (VaxiJen score: 0.9554), and nonallergenic candidate. Secondary structure of the multiepitope protein included 58.85% random coil, 20.99% extended strand, and 20.16% alpha helix. Also, the tertiary structure was predicted, and further analyses validated a stable interaction between the vaccine model and mouse TLR4 (binding score: -1261.6). Virtual simulation of the immune profile demonstrated potently stimulated humoral (IgG+IgM) and cell-mediated (IFN-γ) responses upon multiepitope vaccine injection. Altogether, a potentially immunogenic vaccine candidate was developed using several N. caninum proteins, with the capability to elicit an IFN-γ upsurge and other components of cellular immunity; it can be used for prophylactic purposes against neosporosis.
|
137
|
Multi-task learning to leverage partially annotated data for PPI interface prediction. Sci Rep 2022; 12:10487. [PMID: 35729253 PMCID: PMC9213449 DOI: 10.1038/s41598-022-13951-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 05/31/2022] [Indexed: 11/29/2022] Open
Abstract
Protein-protein interactions (PPI) are crucial for protein functioning; nevertheless, predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as the PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. If we want to use a deep learning strategy, we have to overcome the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start with the multi-task model architecture and adapt it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residue. Our results show that the multi-task learning strategy significantly outperforms single-task approaches. Moreover, only the multi-task strategy is able to effectively learn over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations (with data extension) reaches the same performance as the single-task learner on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset where the protein’s functional properties of interest are only partially annotated.
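One common way to handle missing annotations in a multi-task cost function is to mask them out, so that an unlabelled residue contributes nothing to the loss of that task. The sketch below illustrates that general idea in plain Python with binary tasks only; it is an assumption about the kind of masking the abstract alludes to, not the authors' implementation, and the names `masked_bce`, `multitask_loss`, and the task keys are hypothetical.

```python
import math


def masked_bce(preds, labels, eps=1e-12):
    """Mean binary cross-entropy over annotated positions only.

    labels holds 0, 1, or None; None marks a residue without annotation.
    """
    losses = []
    for p, y in zip(preds, labels):
        if y is None:
            continue  # a missing annotation contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses) if losses else 0.0


def multitask_loss(task_preds, task_labels, weights=None):
    """Weighted sum of the masked per-task losses (weight defaults to 1.0)."""
    weights = weights or {}
    return sum(
        weights.get(task, 1.0) * masked_bce(task_preds[task], task_labels[task])
        for task in task_preds
    )


# Toy protein of two residues (assumption): no interface labels, full burial labels.
preds = {"interface": [0.7, 0.1], "buried": [0.9, 0.2]}
labels = {"interface": [None, None], "buried": [1, 0]}
total = multitask_loss(preds, labels)
```

Here the protein still provides a training signal through the "buried" task even though it has no interface annotation, which is the situation the abstract describes for datasets extended with structural features only.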
|
138
|
Prasanna D, Runthala A. Computationally Decoding NudF Residues To Enhance the Yield of the DXP Pathway. ACS OMEGA 2022; 7:19898-19912. [PMID: 35721994 PMCID: PMC9202048 DOI: 10.1021/acsomega.2c01677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 05/18/2022] [Indexed: 06/15/2023]
Abstract
Terpenoids form a large pool of highly diverse organic compounds possessing several economically important properties, including nutritional, aromatic, and pharmacological properties. The 1-deoxy-d-xylulose 5-phosphate (DXP) pathway's end enzyme, nuclear distribution protein (NudF), interacting with isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP), is critical for the synthesis of isoprenol/prenol/downstream compounds. The enzyme is yet to be thoroughly investigated to increase the overall yield of terpenoids in Bacillus subtilis, which is widely used in industry and is a generally regarded as safe (GRAS) bacterium. The study aims to analyze the evolutionary conservation across the active site for mapping the key residues for mutagenesis studies. The 37-sequence data set, extracted from 103 Bacillus subtilis entries, shows a high phylogenetic divergence, and only six one-motif sequences ASB92783.1, ASB69297.1, ASB56714.1, AOR97677.1, AOL97023.1, and OAZ71765.1 show a monophyly relationship, unlike a complete polyphyly relationship between the other 31 three-motif sequences. Furthermore, only 47 of 179 residues of the representative sequence CUB50584.1 are observed to be significantly conserved. Docking analysis suggests a preferential bias of adenosine diphosphate (ADP)-ribose pyrophosphatase toward IPP, and a nearly threefold energetic difference is observed between IPP and DMAPP. The loops are hereby shown to play a regulatory role in guiding the promiscuity of NudF toward a specific ligand. Computational saturation mutagenesis of the seven hotspot residues identifies two key positions, LYS78 and PHE116, encoded within loop1 and loop7, respectively, and mainly interacting with the ligands DMAPP and IPP; their mutants K78I/K78L and PHE116D/PHE116E are found to stabilize the overall conformation. Molecular dynamics analysis shows that the IPP complex is significantly more stable than the DMAPP complex, and the NudF structure is very unstable.
Besides showing a promiscuous binding of NudF with ligands, the analysis suggests its rate-limiting nature. The study would allow us to customize the metabolic load toward the synthesis of any of the downstream molecules. The findings would pave the way for the development of catalytically improved NudF mutants for the large-scale production of specific terpenoids with significant nutraceutical or commercial value.
|
139
|
Vaish S, Parveen R, Gupta D, Basantani MK. Genome-wide identification and characterization of glutathione S-transferase gene family in Musa acuminata L. AAA group and gaining an insight to their role in banana fruit development. J Appl Genet 2022; 63:609-631. [PMID: 35689012 DOI: 10.1007/s13353-022-00707-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 05/31/2022] [Accepted: 06/02/2022] [Indexed: 10/18/2022]
Abstract
Glutathione S-transferases are a multifunctional protein superfamily that is involved in diverse plant functions such as defense mechanisms, signaling, stress response, secondary metabolism, and plant growth and development. Although the banana whole-genome sequence is available, the distribution of GST genes on banana chromosomes, their subcellular localization, gene structure, their evolutionary relation with each other, conserved motifs, and their roles in banana are still unknown. A total of 62 full-length GST genes with the canonical thioredoxin fold have been identified belonging to nine GST classes, namely tau, phi, theta, zeta, lambda, DHAR, EF1G, GHR, and TCHQD. The 62 GST genes were distributed across 11 banana chromosomes. The MaGSTs were predominantly localized in the cytoplasm. Gene architecture showed the conservation of exon numbers in individual GST classes. Multiple Em for Motif Elicitation analyses revealed a few class-specific motifs, while many motifs were found in all the GST classes. Multiple sequence alignment of banana GST amino acid sequences with rice, Arabidopsis, and soybean sequences revealed Ser and Cys as conserved catalytic residues. Gene duplication analyses showed tandem duplication as a driving force for GST gene family expansion in banana. Cis-regulatory element analysis showed the dominance of light-responsive elements followed by stress- and hormone-responsive elements. Expression profiling analyses were also performed using RNA-seq data. It was observed that MaGSTs are involved in various stages of fruit development. MaGSTU1 was highly upregulated. The comprehensive and organized studies of the MaGST gene family provide groundwork for further functional analysis of MaGST genes in banana at the molecular level and for plant breeding approaches.
Affiliation(s)
- Swati Vaish
- Faculty of Biosciences, Institute of Biosciences and Technology, Shri Ramswaroop Memorial University, Lucknow-Deva Road, Barabanki, 225003, Uttar Pradesh, India
- Reshma Parveen
- Faculty of Biosciences, Institute of Biosciences and Technology, Shri Ramswaroop Memorial University, Lucknow-Deva Road, Barabanki, 225003, Uttar Pradesh, India
- Divya Gupta
- Faculty of Biosciences, Institute of Biosciences and Technology, Shri Ramswaroop Memorial University, Lucknow-Deva Road, Barabanki, 225003, Uttar Pradesh, India
- Mahesh Kumar Basantani
- Faculty of Biosciences, Institute of Biosciences and Technology, Shri Ramswaroop Memorial University, Lucknow-Deva Road, Barabanki, 225003, Uttar Pradesh, India.
|
140
|
Høie MH, Kiehl EN, Petersen B, Nielsen M, Winther O, Nielsen H, Hallgren J, Marcatili P. NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning. Nucleic Acids Res 2022; 50:W510-W515. [PMID: 35648435 PMCID: PMC9252760 DOI: 10.1093/nar/gkac439] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Received: 03/25/2022] [Revised: 05/04/2022] [Accepted: 05/27/2022] [Indexed: 11/23/2022] Open
Abstract
Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.
Affiliation(s)
- Magnus Haraldson Høie
- Department of Health Technology, Technical University of Denmark, DK Lyngby, Denmark
- Erik Nicolas Kiehl
- Department of Health Technology, Technical University of Denmark, DK Lyngby, Denmark
- Bent Petersen
- Center for Evolutionary Hologenomics, GLOBE Institute, University of Copenhagen, Denmark; Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia
- Morten Nielsen
- Department of Health Technology, Technical University of Denmark, DK Lyngby, Denmark
- Ole Winther
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark (DTU), Denmark; Center for Genomic Medicine, Rigshospitalet (Copenhagen University Hospital), Copenhagen, Denmark; Department of Biology, Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, DK Lyngby, Denmark
- Paolo Marcatili
- Department of Health Technology, Technical University of Denmark, DK Lyngby, Denmark
|
141
|
Yang Y, Wu S, Zhu Y, Yang J, Liu J. Global Profiling of Lysine Succinylation in Human Lungs. Proteomics 2022; 22:e2100381. [DOI: 10.1002/pmic.202100381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Revised: 05/11/2022] [Accepted: 05/25/2022] [Indexed: 11/08/2022]
Affiliation(s)
- Ye‐Hong Yang
- State Key Laboratory of Medical Molecular Biology Department of Biochemistry and Molecular Biology Institute of Basic Medical Sciences Chinese Academy of Medical Sciences & Peking Union Medical College Beijing 100005 China
- Song‐Feng Wu
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Institute of Lifeomics Research Unit of Proteomics & Research and Development of New Drug of Chinese Academy of Medical Sciences Beijing 102206 China
- Yun‐Ping Zhu
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Institute of Lifeomics Research Unit of Proteomics & Research and Development of New Drug of Chinese Academy of Medical Sciences Beijing 102206 China
- Jun‐Tao Yang
- State Key Laboratory of Medical Molecular Biology Department of Biochemistry and Molecular Biology Institute of Basic Medical Sciences Chinese Academy of Medical Sciences & Peking Union Medical College Beijing 100005 China
- Jiang‐Feng Liu
- State Key Laboratory of Medical Molecular Biology Department of Biochemistry and Molecular Biology Institute of Basic Medical Sciences Chinese Academy of Medical Sciences & Peking Union Medical College Beijing 100005 China
|
142
|
Al-Numan HH, Jan RM, Al-Saud NBS, Rashidi OM, Alrayes NM, Alsufyani HA, Mujalli A, Shaik NA, Mosli MH, Elango R, Saadah OI, Banaganapalli B. Exome Sequencing Identifies the Extremely Rare ITGAV and FN1 Variants in Early Onset Inflammatory Bowel Disease Patients. Front Pediatr 2022; 10:895074. [PMID: 35692981 PMCID: PMC9178107 DOI: 10.3389/fped.2022.895074] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/12/2022] [Accepted: 04/04/2022] [Indexed: 12/12/2022] Open
Abstract
Background Molecular diagnosis of early onset inflammatory bowel disease (IBD) is very important for adopting suitable treatment strategies. Owing to the sparse data available, this study aims to identify the molecular basis of early onset IBD in Arab patients. Methods A consanguineous Arab family with monozygotic twins presenting early onset IBD was screened by whole exome sequencing (WES). Functional characterization of the variants was performed by a series of computational biology methods. The IBD variants were further screened in in-house whole exome data of 100 Saudi cohorts to ensure their rare prevalence in the population. Results Genetic screening has identified the digenic autosomal recessive mode of inheritance of ITGAV (G58V) and FN1 (G313V) variants in twins with early onset IBD. Findings from pathogenicity predictions, stability and molecular dynamics have confirmed the deleterious nature of both variants on structural features of the corresponding proteins. Functional biology data suggested that both genes show abundant expression in the gastrointestinal tract and immune organs and are involved in immune cell restriction and regulation of different immune related pathways. Data from knockout mouse models for the ITGAV gene have revealed that the dysregulated expression of this gene impacts intestinal immune homeostasis. The defective ITGAV and FN1, involved in the integrin pathway, are likely to induce intestinal inflammation by disturbing immune homeostasis. Conclusions Our findings provide novel insights into the molecular etiology of pediatric onset IBD and may pave the way for developing genomic medicine.
Affiliation(s)
- Huda Husain Al-Numan
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
- Rana Mohammed Jan
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
- Najla bint Saud Al-Saud
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Nuha Mohammad Alrayes
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Hadeel A. Alsufyani
- Department of Medical Physiology, Faculty of Medicine, King Abdulaziz University Hospital, Jeddah, Saudi Arabia
- Abdulrahman Mujalli
- Department of Laboratory Medicine, Faculty of Applied Medical Sciences, Umm Al-Qura University, Makkah, Saudi Arabia
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Noor Ahmad Shaik
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Mahmoud Hisham Mosli
- Department of Internal Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Inflammatory Bowel Disease Research Group, King Abdulaziz University, Jeddah, Saudi Arabia
- Ramu Elango
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Omar I. Saadah
- Inflammatory Bowel Disease Research Group, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Pediatrics, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Babajan Banaganapalli
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
|
143
|
Trulsson F, Akimov V, Robu M, van Overbeek N, Berrocal DAP, Shah RG, Cox J, Shah GM, Blagoev B, Vertegaal ACO. Deubiquitinating enzymes and the proteasome regulate preferential sets of ubiquitin substrates. Nat Commun 2022; 13:2736. [PMID: 35585066 PMCID: PMC9117253 DOI: 10.1038/s41467-022-30376-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 08/16/2021] [Accepted: 04/27/2022] [Indexed: 12/24/2022] Open
Abstract
The ubiquitin-proteasome axis has been extensively explored at a system-wide level, but the impact of deubiquitinating enzymes (DUBs) on the ubiquitinome remains largely unknown. Here, we compare the contributions of the proteasome and DUBs on the global ubiquitinome, using UbiSite technology, inhibitors and mass spectrometry. We uncover large dynamic ubiquitin signalling networks with substrates and sites preferentially regulated by DUBs or by the proteasome, highlighting the role of DUBs in degradation-independent ubiquitination. DUBs regulate substrates via at least 40,000 unique sites. Regulated networks of ubiquitin substrates are involved in autophagy, apoptosis, genome integrity, telomere integrity, cell cycle progression, mitochondrial function, vesicle transport, signal transduction, transcription, pre-mRNA splicing and many other cellular processes. Moreover, we show that ubiquitin conjugated to SUMO2/3 forms a strong proteasomal degradation signal. Interestingly, PARP1 is hyper-ubiquitinated in response to DUB inhibition, which increases its enzymatic activity. Our study uncovers key regulatory roles of DUBs and provides a resource of endogenous ubiquitination sites to aid the analysis of substrate specific ubiquitin signalling.
Affiliation(s)
- Fredrik Trulsson
- Cell and Chemical Biology, Leiden University Medical Centre, Leiden, The Netherlands
- Vyacheslav Akimov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Mihaela Robu
- Laboratory for Skin Cancer Research, CHU de Québec Laval University Hospital Research Centre, Québec, QC, Canada
- Nila van Overbeek
- Cell and Chemical Biology, Leiden University Medical Centre, Leiden, The Netherlands
- Rashmi G Shah
- Laboratory for Skin Cancer Research, CHU de Québec Laval University Hospital Research Centre, Québec, QC, Canada
- Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany
- Girish M Shah
- Laboratory for Skin Cancer Research, CHU de Québec Laval University Hospital Research Centre, Québec, QC, Canada
- Blagoy Blagoev
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark.
- Alfred C O Vertegaal
- Cell and Chemical Biology, Leiden University Medical Centre, Leiden, The Netherlands.
|
144
|
Bzówka M, Mitusińska K, Raczyńska A, Skalski T, Samol A, Bagrowska W, Magdziarz T, Góra A. Evolution of tunnels in α/β-hydrolase fold proteins—What can we learn from studying epoxide hydrolases? PLoS Comput Biol 2022; 18:e1010119. [PMID: 35580137 PMCID: PMC9140254 DOI: 10.1371/journal.pcbi.1010119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/25/2021] [Revised: 05/27/2022] [Accepted: 04/19/2022] [Indexed: 12/27/2022] Open
Abstract
The evolutionary variability of a protein’s residues is highly dependent on protein region and function. Solvent-exposed residues, excluding those at interaction interfaces, are more variable than buried residues, whereas active site residues are considered to be conserved. These rules also apply to α/β-hydrolase fold proteins, one of the oldest and largest superfamilies of enzymes, with buried active sites equipped with tunnels linking the reaction site with the exterior. We selected soluble epoxide hydrolases as representatives of this family to conduct the first systematic study on the evolution of tunnels. We hypothesised that tunnels are lined by mostly conserved residues, and are equipped with a number of specific variable residues that are able to respond to evolutionary pressure. The hypothesis was confirmed, and we suggested a general and a detailed way of analysing tunnel evolution based on entropy values calculated for the tunnels’ residues. We also found three different cases of entropy distribution among tunnel-lining residues. These observations can be applied to protein reengineering that mimics the natural evolution process. We propose a ‘perforation’ mechanism for new tunnel design via the merging of internal cavities or protein surface perforation. Based on the literature data, such a strategy of new tunnel design could significantly improve the enzyme’s performance and can be applied widely for enzymes with buried active sites. So far very little is known about the evolution of protein tunnels. The goal of this study is to evaluate the evolution of tunnels in the family of soluble epoxide hydrolases, representatives of numerous α/β-hydrolase fold enzymes. As a result, two types of tunnel evolution analysis were proposed (a general and a detailed approach), as well as a ‘perforation’ mechanism which can mimic native evolution in proteins and can be used as an additional strategy for enzyme redesign.
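To make the notion of per-residue entropy concrete: a frequently used measure of positional variability is the Shannon entropy of each column of a multiple sequence alignment, where 0 means a fully conserved position. The abstract does not state which entropy definition the authors used, so Shannon entropy is an assumption here, and the toy alignment and the helper name `column_entropy` are purely illustrative.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy alignment of four sequences (assumption, not real epoxide hydrolase data).
alignment = ["MKDLA", "MKELA", "MRDLG", "MKEIA"]

# zip(*alignment) iterates over alignment columns rather than sequences.
entropies = [column_entropy(col) for col in zip(*alignment)]
```

In this toy case the first column is fully conserved (entropy 0), while the third column, split evenly between two residues, reaches 1 bit; restricting such a profile to tunnel-lining positions would give the kind of entropy distribution the abstract refers to.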
Affiliation(s)
- Maria Bzówka
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Karolina Mitusińska
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Agata Raczyńska
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Tomasz Skalski
  - Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Aleksandra Samol
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Weronika Bagrowska
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Tomasz Magdziarz
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
- Artur Góra
  - Tunneling Group, Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
|
145
|
Elkhaligy H, Balbin CA, Siltberg-Liberles J. Comparative Analysis of Structural Features in SLiMs from Eukaryotes, Bacteria, and Viruses with Importance for Host-Pathogen Interactions. Pathogens 2022; 11:583. [PMID: 35631103 PMCID: PMC9147284 DOI: 10.3390/pathogens11050583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 05/06/2022] [Accepted: 05/11/2022] [Indexed: 11/19/2022] Open
Abstract
Protein-protein interactions drive functions in eukaryotes that can be described by short linear motifs (SLiMs). Conservation of SLiMs helps to identify functional SLiMs in eukaryotic protein families. However, eukaryotic SLiMs are so simple that they can arise by chance through mutational processes, not only in eukaryotes but also in pathogenic bacteria and viruses. Further, functional eukaryotic SLiMs are often found in disordered regions. Although proteomes from pathogenic bacteria and viruses have less disorder than eukaryotic proteomes, their proteins can successfully mimic eukaryotic SLiMs and disrupt host cellular function. Identifying important SLiMs in pathogens is difficult but essential for understanding potential host-pathogen interactions. We performed a comparative analysis of structural features for experimentally verified SLiMs from the Eukaryotic Linear Motif (ELM) database across viruses, bacteria, and eukaryotes. Our results revealed that many viral SLiMs, as well as specific motifs found across viruses and eukaryotes such as some glycosylation motifs, have less disorder. Analyzing the disorder and coil properties of equivalent SLiMs from pathogens and eukaryotes revealed that some motifs are more structured in pathogens than in their eukaryotic counterparts and vice versa. These results support a varying mechanism of interaction between pathogens and their eukaryotic hosts for some of the same motifs.
|
146
|
Singh J, Paliwal K, Litfin T, Singh J, Zhou Y. Reaching alignment-profile-based accuracy in predicting protein secondary and tertiary structural properties without alignment. Sci Rep 2022; 12:7607. [PMID: 35534620 PMCID: PMC9085874 DOI: 10.1038/s41598-022-11684-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/19/2022] [Accepted: 04/25/2022] [Indexed: 11/09/2022] Open
Abstract
Protein language models have emerged as an alternative to multiple sequence alignment for enriching sequence information and improving downstream prediction tasks, such as the prediction of biophysical, structural, and functional properties. Here we show that a method called SPOT-1D-LM, which combines traditional one-hot encoding with the embeddings from two different language models (ProtTrans and ESM-1b) as input, yields a leap in accuracy over single-sequence-based techniques in predicting protein 1D secondary and tertiary structural properties, including backbone torsion angles, solvent accessibility and contact numbers, for all six test sets (TEST2018, TEST2020, Neff1-2020, CASP12-FM, CASP13-FM and CASP14-FM). More significantly, its performance is comparable to that of profile-based methods for proteins with homologous sequences. For example, the accuracies of three-state secondary structure (SS3) prediction for TEST2018 and TEST2020 proteins are 86.7% and 79.8% with SPOT-1D-LM, compared to 74.3% and 73.4% with the single-sequence-based method SPOT-1D-Single and 86.2% and 80.5% with the profile-based method SPOT-1D, respectively. For proteins without homologous sequences (Neff1-2020), SS3 accuracy is 80.41% with SPOT-1D-LM, which is 3.8% and 8.3% higher than with SPOT-1D-Single and SPOT-1D, respectively. SPOT-1D-LM is expected to be useful for genome-wide analysis given its fast performance. Moreover, high-accuracy prediction of both secondary and tertiary structural properties, such as backbone angles and solvent accessibility, without sequence alignment suggests that highly accurate prediction of protein structures may be possible without homologous sequences, the remaining obstacle in the post-AlphaFold2 era.
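The input construction described in this abstract (one-hot encoding combined with embeddings from two language models) can be sketched as a per-residue feature concatenation. This is an illustrative assumption about how such features might be joined, not the SPOT-1D-LM implementation: the function names are hypothetical, and the embedding values below are small hand-written placeholders rather than real ProtTrans or ESM-1b outputs.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue: str) -> list[float]:
    """20-dimensional one-hot vector; unknown residues map to all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def build_features(sequence: str, *embeddings: list[list[float]]) -> list[list[float]]:
    """Concatenate one-hot vectors with per-residue embeddings, one row per residue."""
    for emb in embeddings:
        if len(emb) != len(sequence):
            raise ValueError("each embedding needs one vector per residue")
    features = []
    for i, residue in enumerate(sequence):
        row = one_hot(residue)
        for emb in embeddings:
            row.extend(emb[i])  # append this model's vector for residue i
        features.append(row)
    return features


# Placeholder embeddings (hypothetical values and dimensions).
sequence = "AC"
embedding_a = [[0.1, 0.2], [0.3, 0.4]]  # stands in for one language model
embedding_b = [[1.0], [2.0]]            # stands in for a second language model
features = build_features(sequence, embedding_a, embedding_b)
```

Each row of `features` would then be fed to a downstream predictor; real language-model embeddings are much wider than the toy vectors used here.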
Affiliation(s)
- Jaspreet Singh
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Kuldip Paliwal
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Thomas Litfin
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Jaswinder Singh
  - Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia
- Yaoqi Zhou
  - Institute for Glycomics, Griffith University, Parklands Dr. Southport, Gold Coast, QLD, 4222, Australia
  - Shenzhen Bay Laboratory, Institute for Systems and Physical Biology, Shenzhen, 518055, People's Republic of China
  - Peking University Shenzhen Graduate School, Shenzhen, 518055, People's Republic of China
|
147
|
Shinwari K, Rehman HM, Liu G, Bolkov MA, Tuzankina IA, Chereshnev VA. Novel Disease-Associated Missense Single-Nucleotide Polymorphisms Variants Predication by Algorithms Tools and Molecular Dynamics Simulation of Human TCIRG1 Gene Causing Congenital Neutropenia and Osteopetrosis. Front Mol Biosci 2022; 9:879875. [PMID: 35573728 PMCID: PMC9095858 DOI: 10.3389/fmolb.2022.879875] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/20/2022] [Accepted: 04/04/2022] [Indexed: 12/16/2022] Open
Abstract
The T Cell Immune Regulator 1, ATPase H+ Transporting V0 Subunit A3 (TCIRG1) gene provides instructions for making one part, the a3 subunit, of a large protein complex known as a vacuolar H+-ATPase (V-ATPase). V-ATPases are a group of similar complexes that act as pumps to move positively charged hydrogen atoms (protons) across membranes. Single amino acid changes in highly conserved areas of the TCIRG1 protein have been linked to autosomal recessive osteopetrosis and severe congenital neutropenia. We used multiple computational approaches to classify disease-prone single nucleotide polymorphisms (SNPs) in TCIRG1. We used molecular dynamics analysis to identify the deleterious nsSNPs, build mutant protein structures, and assess the impact of mutation. Our results show that fifteen nsSNP variants (rs199902030, rs200149541, rs372499913, rs267605221, rs374941368, rs375717418, rs80008675, rs149792489, rs116675104, rs121908250, rs121908251, rs121908251, rs149792489 and rs116675104) are likely to be highly deleterious mutations, as incorporating them into the wild-type protein destabilizes its structure and function. They are also located in the V-ATPase I domain, where they may destabilize the structure and impair TCIRG1 protein activation, as well as reduce its ATPase effectiveness. These mutants have not yet been identified in patients suffering from CN and osteopetrosis, whereas G405R, R444L, and D517N, reported in our study, are already associated with osteopetrosis. Mutation V52L, reported in our study, was identified in a patient suspected of having CN. Finally, these mutants can help to further understand the broad pool of illness susceptibilities associated with TCIRG1 catalytic kinase domain activation and aid in the development of an effective treatment for associated diseases.
Affiliation(s)
- Khyber Shinwari
  - Institute of Chemical Engineering, Department of Immunochemistry, Ural Federal University, Yekaterinburg, Russia
- Hafiz Muzzammel Rehman
  - School of Biochemistry and Biotechnology, University of the Punjab, Lahore, Pakistan
  - Alnoorians Group of Institutes, Shad Bagh, Lahore, Pakistan
- Guojun Liu
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Mikhail A. Bolkov
  - Institute of Chemical Engineering, Department of Immunochemistry, Ural Federal University, Yekaterinburg, Russia
  - Institute of Immunology and Physiology of the Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia
- Irina A. Tuzankina
  - Institute of Chemical Engineering, Department of Immunochemistry, Ural Federal University, Yekaterinburg, Russia
  - Institute of Immunology and Physiology of the Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia
- Valery A. Chereshnev
  - Institute of Chemical Engineering, Department of Immunochemistry, Ural Federal University, Yekaterinburg, Russia
  - Institute of Immunology and Physiology of the Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia
|
148
|
Shishir TA, Jannat T, Naser IB. An in-silico study of the mutation-associated effects on the spike protein of SARS-CoV-2, Omicron variant. PLoS One 2022; 17:e0266844. [PMID: 35446879 PMCID: PMC9022835 DOI: 10.1371/journal.pone.0266844] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/24/2022] [Accepted: 03/28/2022] [Indexed: 01/16/2023] Open
Abstract
The emergence of Omicron (B.1.1.529), a new Variant of Concern in the COVID-19 pandemic, alongside the ongoing Delta variant infection, has once again fueled fears of a new infection wave and global health concern. In the Omicron variant, the receptor-binding domain (RBD) of the spike glycoprotein, which is critical for the transmission rate of the virus through its interaction with hACE2, is heavily mutated. In this study, we used a combination of conventional and advanced neural network-based in silico approaches to predict how these mutations would affect the spike protein. The results demonstrated a decrease in the electrostatic potentials of residues corresponding to receptor recognition sites, an increase in the alkalinity of the protein, a change in hydrophobicity, variations in functional residues, and an increase in the percentage of alpha-helix structure. Moreover, several mutations were found to modulate the immunologic properties of the potential epitopes predicted from the spike protein. Our next step was to predict the structural changes of the spike and their effect on its interaction with hACE2. The results revealed that the RBD of the Omicron variant had a higher affinity than the reference. Moreover, all-atom molecular dynamics simulations indicated that the RBD of the Omicron variant exhibits a more dispersed interaction network, since the mutations resulted in an increased number of hydrophobic interactions and hydrogen bonds with hACE2.
Affiliation(s)
- Tushar Ahmed Shishir
  - Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh
  - Rangamati General Hospital, Chattogram, Bangladesh
- Taslimun Jannat
  - Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh
- Iftekhar Bin Naser
  - Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh
|
149
|
Erath J, Djuranovic S. Association of the receptor for activated C-kinase 1 with ribosomes in Plasmodium falciparum. J Biol Chem 2022; 298:101954. [PMID: 35452681 PMCID: PMC9120242 DOI: 10.1016/j.jbc.2022.101954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Revised: 03/31/2022] [Accepted: 04/13/2022] [Indexed: 11/18/2022] Open
Abstract
The receptor for activated C-kinase 1 (RACK1), a highly conserved eukaryotic protein, is known to have many varying biological roles and functions. Previous work has established RACK1 as a ribosomal protein, with defined regions important for ribosome binding in eukaryotic cells. In Plasmodium falciparum, RACK1 has been shown to be required for parasite growth; however, conflicting evidence has been presented about RACK1 ribosome binding and its role in mRNA translation. Given the importance of RACK1 as a regulatory component of mRNA translation and ribosome quality control, a case could be made that, in parasites, RACK1 either binds or does not bind the ribosome. Here, we used bioinformatics and transcription analyses to further characterize the P. falciparum RACK1 protein. Based on homology modeling and structural analyses, we generated a model of P. falciparum RACK1. We then explored the binding properties of mutant and chimeric human and P. falciparum RACK1 proteins to the human and P. falciparum ribosomes. We found that WT, chimeric, and mutant RACK1 exhibit distinct ribosome interactions, suggesting different binding characteristics for P. falciparum and human RACK1 proteins. The ribosomal binding of RACK1 variants in human and parasite cells shown here demonstrates that, although RACK1 proteins have highly conserved sequences and structures across species, ribosomal binding is affected by species-specific alterations to this protein. In conclusion, we show that in the case of P. falciparum, contrary to the structural data, RACK1 binds ribosomes and actively translating polysomes in parasite cells.
Affiliation(s)
- Jessey Erath
  - Department of Cell Biology and Physiology, Washington University School of Medicine, St Louis, Missouri, USA
- Sergej Djuranovic
  - Department of Cell Biology and Physiology, Washington University School of Medicine, St Louis, Missouri, USA
|
150
|
Cardoch S, Timneanu N, Caleman C, Scheicher RH. Distinguishing between Similar Miniproteins with Single-Molecule Nanopore Sensing: A Computational Study. ACS Nanosci Au 2022; 2:119-127. [PMID: 37101662 PMCID: PMC10125149 DOI: 10.1021/acsnanoscienceau.1c00022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
A nanopore is a tool in single-molecule sensing biotechnology that offers label-free identification with high throughput. Nanopores have been successfully applied to sequence DNA and show potential in the study of proteins. Nevertheless, the task remains challenging due to the large variability in the size, charge, and fold of proteins. Miniproteins have a small number of residues, limited secondary structure, and stable tertiary structure, which can offer a systematic way to reduce complexity. In this computational work, we theoretically evaluated the sensing of two miniproteins found in the human body using a silicon nitride nanopore. We employed molecular dynamics methods to compute occupied-pore ionic current magnitudes and electronic structure calculations to obtain interaction strengths between the pore wall and the miniprotein. From the interaction strength, we derived dwell times using a mix of combinatorics and numerical solutions. This latter approach circumvents the typical computational demands of simulating translocation events using molecular dynamics. We focused on two miniproteins that are potentially difficult to distinguish owing to their isotropic geometry, similar number of residues, and overall comparable structure. We found that the occupied-pore current magnitudes do not vary significantly, but the dwell times differ by one order of magnitude. Together, these results suggest a successful identification protocol for similar miniproteins.
Affiliation(s)
- Sebastian Cardoch
  - Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden
- Nicusor Timneanu
  - Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden
- Carl Caleman
  - Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden
  - Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany
- Ralph H. Scheicher
  - Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20 Uppsala, Sweden
|