1
Pražnikar J. Using graphlet degree vectors to predict atomic displacement parameters in protein structures. Acta Crystallogr D Struct Biol 2023; 79:1109-1119. [PMID: 37987168; PMCID: PMC10833351; DOI: 10.1107/s2059798323009142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 10/17/2023] [Indexed: 11/22/2023]
Abstract
In structural biology, atomic displacement parameters, commonly used in the form of B values, describe uncertainties in atomic positions. Their distribution over the structure can provide hints on local structural reliability and mobility. A spatial macromolecular model can be represented by a graph whose nodes are atoms and whose edges correspond to all interatomic contacts within a certain distance. Small connected subgraphs, called graphlets, provide information about the wiring of a particular atom. The multiple linear regression approach based on this information aims to predict a distribution of values of isotropic atomic displacement parameters (B values) within a protein structure, given the atomic coordinates and molecular packing. By modeling the dynamic component of atomic uncertainties, this method allows the B values obtained from experimental crystallographic or cryo-electron microscopy studies to be reproduced relatively well.
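The pipeline described in this abstract (contact graph, per-atom graphlet counts, multiple linear regression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `cutoff` value is left to the caller, only the four smallest graphlet orbits (2- and 3-node graphlets) are counted, and the function names are hypothetical.

```python
import numpy as np

def contact_graph(coords, cutoff):
    """Adjacency matrix: atoms i and j are linked if their distance is <= cutoff."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= cutoff).astype(int)
    np.fill_diagonal(adj, 0)  # no self-contacts
    return adj

def small_graphlet_orbits(adj):
    """Per-node counts for orbits 0-3 (graphlets on 2 and 3 nodes only).

    Columns: degree, end of an induced 3-node path,
    middle of an induced 3-node path, member of a triangle.
    """
    deg = adj.sum(axis=1)
    tri = np.diag(adj @ adj @ adj) // 2        # closed 3-walks count each triangle twice
    path_end = adj @ (deg - 1) - 2 * tri       # 2-step walks minus those closing a triangle
    path_mid = deg * (deg - 1) // 2 - tri      # neighbour pairs that are not linked
    return np.column_stack([deg, path_end, path_mid, tri])

def fit_linear_model(features, targets):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef
```

In use, the orbit counts for each atom would serve as the regression features and the experimental B values as the targets. Larger graphlets and packing information, which the abstract mentions, are beyond this sketch.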
Affiliation(s)
- Jure Pražnikar
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, Koper, Slovenia
  - Department of Biochemistry, Molecular and Structural Biology, Institute Jožef Stefan, Jamova 39, Ljubljana, Slovenia
2
Tang YJ, Yan K, Zhang X, Tian Y, Liu B. Protein intrinsically disordered region prediction by combining neural architecture search and multi-objective genetic algorithm. BMC Biol 2023; 21:188. [PMID: 37674132; PMCID: PMC10483879; DOI: 10.1186/s12915-023-01672-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 07/31/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND: Intrinsically disordered regions (IDRs) are widely distributed in proteins and are related to many important biological functions. Accurately identifying IDRs is of great significance for protein structure and function analysis. Because long disordered regions (LDRs) and short disordered regions (SDRs) have different characteristics, existing predictors fail to achieve stable performance on datasets with different ratios of LDRs to SDRs. There are two main reasons. First, existing predictors use network structures chosen from their designers' experience, such as a convolutional neural network (CNN) to extract features of neighboring residues and long short-term memory (LSTM) to extract long-distance dependencies between residues, but these networks cannot capture the hidden length-dependent features of residues. Second, many deep learning algorithms have been proposed, but the complementarity of the existing predictors has not been fully explored or used.
RESULTS: In this study, the neural architecture search (NAS) algorithm was employed to construct the network structures automatically so as to capture the hidden features in protein sequences. To predict both LDRs and SDRs stably, the model constructed by NAS was combined with length-dependent models that capture the unique features of SDRs or LDRs and with general models that capture the features common to LDRs and SDRs. The resulting predictor is called IDP-Fusion.
CONCLUSIONS: Experimental results showed that IDP-Fusion achieves more stable performance than other existing predictors on independent test sets with different ratios of SDRs to LDRs.
Affiliation(s)
- Yi-Jun Tang
  - School of Computer Science and Technology, Beijing Institute of Technology, Haidian District, No. 5, South Zhongguancun Street, Beijing, 100081, China
- Ke Yan
  - School of Computer Science and Technology, Beijing Institute of Technology, Haidian District, No. 5, South Zhongguancun Street, Beijing, 100081, China
- Xingyi Zhang
  - School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Ye Tian
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Haidian District, No. 5, South Zhongguancun Street, Beijing, 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China
3
Wang W, Su X, Liu D, Zhang H, Wang X, Zhou Y. Predicting DNA-binding protein and coronavirus protein flexibility using protein dihedral angle and sequence feature. Proteins 2023; 91:497-507. [PMID: 36321218; PMCID: PMC9877568; DOI: 10.1002/prot.26443] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/02/2022] [Revised: 09/07/2022] [Accepted: 10/20/2022] [Indexed: 11/07/2022]
Abstract
The flexibility of protein structure is related to various biological processes, such as molecular recognition, allosteric regulation, catalytic activity, and protein stability. At the molecular level, protein dynamics and flexibility are important factors in understanding protein function. DNA-binding proteins and coronavirus proteins are of great concern and are relatively unique. However, exploring the flexibility of DNA-binding proteins and coronavirus proteins through experiments or calculations is difficult. Because dihedral rotational motion can be used to predict protein structural changes, it provides key information about local protein conformation. This paper therefore introduces DihProFle, a method that improves the accuracy of flexibility prediction for DNA-binding proteins and coronavirus proteins by adding calculated dihedral angle information. Based on dihedral angle information, evolutionary information, and the physicochemical properties of amino acids, DihProFle predicts protein flexibility in two cases for DNA-binding proteins and coronavirus proteins, and assigns a flexibility class to each sequence position. Compared with prediction from sequence evolution information and physicochemical properties alone, adding dihedral angle information improved prediction accuracy by 2.2% and 3.1% under the nonstrict and strict conditions, respectively. DihProFle also achieves better performance than previous methods for protein flexibility analysis. In addition, we analyzed the correlation of amino acid properties and protein dihedral angles with residue flexibility. The results show that charged hydrophilic residues make up a higher proportion of the flexible regions, and that flexible and rigid residues tend to occupy different ranges of the dihedral angles (for example, residues whose ψ angle lies in the range 91°-120° are more often flexible than rigid). The results therefore indicate that hydrophilic residues and protein dihedral angle information play an important role in protein flexibility.
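Since the abstract relies on backbone dihedral angles and on angular ranges such as 91°-120° for ψ, a short sketch of how a dihedral angle is computed from four atom positions may help. This is the standard vector formula, not code from DihProFle. The function names are hypothetical, and the choice of atoms (for ψ: N, Cα, C of one residue and N of the next) follows the usual backbone definition.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points, in (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1          # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def in_angle_bin(angle_deg, low, high):
    """True if the angle falls inside the closed interval [low, high]."""
    return low <= angle_deg <= high
```

With ψ computed this way for each residue, `in_angle_bin(psi, 91, 120)` would flag the residues that fall in the range singled out in the abstract.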
Affiliation(s)
- Wei Wang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
  - Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, China
- Xili Su
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Dong Liu
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Hongjun Zhang
  - School of Computer Science and Technology, Anyang University, Anyang, China
- Xianfang Wang
  - College of Computer Science and Technology Engineering, Henan Institute of Technology, Xinxiang, China
- Yun Zhou
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
4
de Brevern AG. An agnostic analysis of the human AlphaFold2 proteome using local protein conformations. Biochimie 2023; 207:11-19. [PMID: 36417962; DOI: 10.1016/j.biochi.2022.11.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Revised: 10/14/2022] [Accepted: 11/17/2022] [Indexed: 11/21/2022]
Abstract
Knowledge of the 3D structure of proteins is a valuable asset for understanding their precise biological mechanisms. However, the cost of producing 3D structures and the experimental difficulties involved limit their availability. Proposing 3D structural models is consequently an appealing alternative. The release of the AlphaFold deep learning approach has revolutionized the field. The recent near-complete human proteome proposal makes it possible to analyse large amounts of data and to evaluate the results of the approach in greater depth. The 3D human proteome was thus analysed in light of the classic secondary structures and of many less-used local protein conformations (PolyProline II helices, types of γ-turns, β-turns and β-bulges, curvature of the helices, and a structural alphabet). Without questioning the global quality of the approach, this analysis highlights certain local conformations that may be poorly predicted and could therefore be better addressed.
Affiliation(s)
- Alexandre G de Brevern
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Bioinformatics team, F-75014, Paris, France
5
Rozano L, Mukuka YM, Hane JK, Mancera RL. Ab Initio Modelling of the Structure of ToxA-like and MAX Fungal Effector Proteins. Int J Mol Sci 2023; 24:6262. [PMID: 37047233; PMCID: PMC10094246; DOI: 10.3390/ijms24076262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 03/09/2023] [Accepted: 03/21/2023] [Indexed: 03/29/2023]
Abstract
Pathogenic fungal diseases in crops are mediated by the release of effector proteins that facilitate infection. Characterising the structure of these fungal effectors is vital to understanding their virulence mechanisms and interactions with their hosts, which is crucial in the breeding of plant cultivars for disease resistance. Several effectors have been identified and validated experimentally; however, their lack of sequence conservation often impedes the identification and prediction of their structure using sequence similarity approaches. Structural similarity has, nonetheless, been observed within fungal effector protein families, creating interest in validating the use of computational methods to predict their tertiary structure from their sequence. We used Rosetta ab initio modelling to predict the structures of members of the ToxA-like and MAX effector families for which experimental structures are known to validate this method. An optimised approach was then used to predict the structures of phenotypically validated effectors lacking known structures. Rosetta was found to successfully predict the structure of fungal effectors in the ToxA-like and MAX families, as well as phenotypically validated but structurally unconfirmed effector sequences. Interestingly, potential new effector structural families were identified on the basis of comparisons with structural homologues and the identification of associated protein domains.
6
Gu J, Xu Y, Nie Y. Role of distal sites in enzyme engineering. Biotechnol Adv 2023; 63:108094. [PMID: 36621725; DOI: 10.1016/j.biotechadv.2023.108094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 11/15/2022] [Accepted: 01/01/2023] [Indexed: 01/06/2023]
Abstract
The limitations associated with natural enzyme catalysis have triggered the rise of the field of protein engineering. Traditional rational design was based on the analysis of protein structural information and catalytic mechanisms to identify key active sites or ligand binding sites to reshape the substrate pocket. The role and significance of functional sites in the active center have been studied extensively. With a deeper understanding of the structure-catalysis relationship map, the entire protein molecule can be filled with residues that play a substantial role in its structure and function. However, the catalytic mechanism underlying distal mutations remains unclear. The aim of this review was to highlight the criticality of the distal site in enzyme engineering based on the following three questions: What effects can distal mutations exert on function, as seen from the mutability landscape? How do distal sites influence enzyme function? How can distal mutations be predicted and designed? This review provides insights into the catalytic mechanism of enzymes from the global interaction network, knowledge from sequence-structure-dynamics-function relationships, and strategies for distal mutation-based protein engineering.
Affiliation(s)
- Jie Gu
  - Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yan Xu
  - Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
  - State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yao Nie
  - Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
  - Suqian Industrial Technology Research Institute of Jiangnan University, Suqian 223814, China
7
Graf F, Zehentner B, Fellner L, Scherer S, Neuhaus K. Three Novel Antisense Overlapping Genes in E. coli O157:H7 EDL933. Microbiol Spectr 2023; 11:e0235122. [PMID: 36533921; PMCID: PMC9927249; DOI: 10.1128/spectrum.02351-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 12/03/2022] [Indexed: 12/23/2022]
Abstract
The abundance of long overlapping genes in prokaryotic genomes is likely to be significantly underestimated. To date, only a few examples of such genes are fully established. Using RNA sequencing and ribosome profiling, we found expression of novel overlapping open reading frames in Escherichia coli O157:H7 EDL933 (EHEC). Indeed, the overlapping candidate genes are equipped with typical structural elements required for transcription and translation, i.e., promoters, transcription start sites, as well as terminators, all of which were experimentally verified. Translationally arrested mutants, unable to produce the overlapping encoded protein, were found to have a growth disadvantage when grown competitively against the wild type. Thus, the phenotypes found imply biological functionality of the genes at the level of proteins produced. The addition of 3 more examples of prokaryotic overlapping genes to the currently limited, yet constantly growing pool of such genes emphasizes the underestimated coding capacity of bacterial genomes. IMPORTANCE The abundance of long overlapping genes in prokaryotic genomes is likely to be significantly underestimated, since such genes are not allowed in genome annotations. However, ribosome profiling catches mRNA in the moment of being template for protein production. Using this technique and subsequent experiments, we verified 3 novel overlapping genes encoded in antisense of known genes. This adds more examples of prokaryotic overlapping genes to the currently limited, yet constantly growing pool of such genes.
Affiliation(s)
- Franziska Graf
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Barbara Zehentner
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Lea Fellner
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Siegfried Scherer
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
- Klaus Neuhaus
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technische Universität München, Freising, Germany
  - Chair for Microbial Ecology, TUM School of Life Sciences, Technische Universität München, Freising, Germany
8
Gordeeva TL, Borshchevskaya LN, Sineoky SP. Biochemical characterisation of glycosylated and deglycosylated forms of phytase from Cronobacter turicensis expressed in Pichia pastoris. Enzyme Microb Technol 2023; 162:110136. [DOI: 10.1016/j.enzmictec.2022.110136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 09/07/2022] [Accepted: 09/24/2022] [Indexed: 11/25/2022]
9
An in silico reverse vaccinology study of Brachyspira pilosicoli, the causative organism of intestinal spirochaetosis, to identify putative vaccine candidates. Process Biochem 2022. [DOI: 10.1016/j.procbio.2022.08.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
10
Chen R, Li X, Yang Y, Song X, Wang C, Qiao D. Prediction of protein-protein interaction sites in intrinsically disordered proteins. Front Mol Biosci 2022; 9:985022. [PMID: 36250006; PMCID: PMC9567019; DOI: 10.3389/fmolb.2022.985022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/03/2022] [Accepted: 07/27/2022] [Indexed: 11/25/2022]
Abstract
Intrinsically disordered proteins (IDPs) participate in many biological processes by interacting with other proteins, including the regulation of transcription, translation, and the cell cycle. With the increasing amount of disorder sequence data available, it is thus crucial to identify the IDP binding sites for functional annotation of these proteins. Over the decades, many computational approaches have been developed to predict protein-protein binding sites of IDP (IDP-PPIS) based on protein sequence information. Moreover, there are new IDP-PPIS predictors developed every year with the rapid development of artificial intelligence. It is thus necessary to provide an up-to-date overview of these methods in this field. In this paper, we collected 30 representative predictors published recently and summarized the databases, features and algorithms. We described the procedure how the features were generated based on public data and used for the prediction of IDP-PPIS, along with the methods to generate the feature representations. All the predictors were divided into three categories: scoring functions, machine learning-based prediction, and consensus approaches. For each category, we described the details of algorithms and their performances. Hopefully, our manuscript will not only provide a full picture of the status quo of IDP binding prediction, but also a guide for selecting different methods. More importantly, it will shed light on the inspirations for future development trends and principles.
Affiliation(s)
- Ranran Chen
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Xinlu Li
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Yaqing Yang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Xixi Song
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Cheng Wang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
  - *Correspondence: Cheng Wang; Dongdong Qiao
- Dongdong Qiao
  - Shandong Mental Health Center, Shandong University, Jinan, China
  - *Correspondence: Cheng Wang; Dongdong Qiao
11
Prediction of B cell epitopes in proteins using a novel sequence similarity-based method. Sci Rep 2022; 12:13739. [PMID: 35962028; PMCID: PMC9374694; DOI: 10.1038/s41598-022-18021-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/10/2022] [Accepted: 08/03/2022] [Indexed: 11/29/2022]
Abstract
Prediction of B cell epitopes that can replace the antigen for antibody production and detection is of great interest for research and the biotech industry. Here, we developed a novel BLAST-based method to predict linear B cell epitopes. To that end, we generated a BLAST-formatted database upon a dataset of 62,730 known linear B cell epitope sequences and considered as a B cell epitope any peptide sequence producing ungapped BLAST hits to this database with identity ≥ 80% and length ≥ 8. We examined B cell epitope predictions by this method in tenfold cross-validations in which we considered various types of non-B cell epitopes, including 62,730 peptide sequences with verified negative B cell assays. As a result, we obtained values of accuracy, specificity and sensitivity of 72.54 ± 0.27%, 81.59 ± 0.37% and 63.49 ± 0.43%, respectively. In an independent dataset incorporating 503 B cell epitopes, this method reached accuracy, specificity and sensitivity of 74.85%, 99.20% and 50.50%, respectively, outperforming state-of-the-art methods to predict linear B cell epitopes. We implemented this BLAST-based approach to predict B cell epitopes at http://imath.med.ucm.es/bepiblast.
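The decision rule in this abstract (an ungapped hit with identity ≥ 80% and length ≥ 8) and the three reported metrics can be illustrated with a small sketch. It is a brute-force stand-in for the rule only and does not call or reproduce BLAST, which uses seeding and scoring heuristics. The function names and the toy confusion-matrix counts in the usage note are hypothetical.

```python
def has_ungapped_hit(query, subject, min_identity_pct=80, min_length=8):
    """True if some ungapped aligned segment of length >= min_length
    has percent identity >= min_identity_pct (exhaustive scan)."""
    for offset in range(-(len(subject) - 1), len(query)):
        # Each offset is one diagonal: query index = subject index + offset.
        q_start = max(0, offset)
        s_start = max(0, -offset)
        n = min(len(query) - q_start, len(subject) - s_start)
        matches = [query[q_start + k] == subject[s_start + k] for k in range(n)]
        for i in range(n - min_length + 1):
            for j in range(i + min_length, n + 1):
                # Integer comparison avoids floating-point rounding at the threshold.
                if 100 * sum(matches[i:j]) >= min_identity_pct * (j - i):
                    return True
    return False

def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, specificity, sensitivity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return accuracy, specificity, sensitivity

def balanced_mean(specificity, sensitivity):
    """With equally many positives and negatives, accuracy equals this mean."""
    return (specificity + sensitivity) / 2
```

For example, `binary_metrics(6, 9, 1, 4)` on toy counts gives an accuracy of 0.75, the mean of specificity 0.9 and sensitivity 0.6. The percentages reported in the abstract show the same relationship: (81.59 + 63.49) / 2 = 72.54 and (99.20 + 50.50) / 2 = 74.85, which is consistent with evaluation sets containing equal numbers of epitopes and non-epitopes.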
12
In-Silico Design of a Multi‑epitope Construct Against Influenza A Based on Nucleoprotein Gene. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10418-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
13
Chakrabarti P, Chakravarty D. Intrinsically disordered proteins/regions and insight into their biomolecular interactions. Biophys Chem 2022; 283:106769. [DOI: 10.1016/j.bpc.2022.106769] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/29/2021] [Revised: 01/26/2022] [Accepted: 01/26/2022] [Indexed: 12/20/2022]
14
Tang YJ, Pang YH, Liu B. DeepIDP-2L: protein intrinsically disordered region prediction by combining convolutional attention network and hierarchical attention network. Bioinformatics 2022; 38:1252-1260. [PMID: 34864847; DOI: 10.1093/bioinformatics/btab810] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/17/2021] [Revised: 11/02/2021] [Accepted: 11/26/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. IDRs are divided into long disordered regions (LDRs) and short disordered regions (SDRs) according to their lengths. Previous studies have shown that LDRs and SDRs have different properties. However, existing computational methods fail to extract different features for LDRs and SDRs separately. As a result, they achieve unstable performance on datasets with different ratios of LDRs and SDRs.
RESULTS: In this study, a two-layer predictor called DeepIDP-2L was proposed. In the first layer, two kinds of attention-based models are used to extract different features for LDRs and SDRs, respectively. The hierarchical attention network is used to capture the distribution pattern features of LDRs, and the convolutional attention network is used to capture the local correlation features of SDRs. The second layer of DeepIDP-2L maps the features extracted in the first layer into a new feature space. A convolutional network and bidirectional long short-term memory are used to capture the local and long-range information for predicting both SDRs and LDRs. Experimental results show that DeepIDP-2L can achieve more stable performance than other existing predictors on independent test sets with different ratios of SDRs and LDRs.
AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a user-friendly and publicly accessible web-server for the new predictor has been established at http://bliulab.net/DeepIDP-2L/. It is anticipated that DeepIDP-2L will become a very useful tool for identification of intrinsically disordered regions.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yi-Jun Tang
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yi-He Pang
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
15
Tamburrini KC, Pesce G, Nilsson J, Gondelaud F, Kajava AV, Berrin JG, Longhi S. Predicting Protein Conformational Disorder and Disordered Binding Sites. Methods Mol Biol 2022; 2449:95-147. [PMID: 35507260; DOI: 10.1007/978-1-0716-2095-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In the last two decades it has become increasingly evident that a large number of proteins adopt either a fully or a partially disordered conformation. Intrinsically disordered proteins are ubiquitous proteins that fulfill essential biological functions while lacking a stable 3D structure. Their conformational heterogeneity is encoded by the amino acid sequence, thereby allowing intrinsically disordered proteins or regions to be recognized based on their sequence properties. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to crystallization. This chapter focuses on the methods currently employed for predicting protein disorder and identifying intrinsically disordered binding sites.
Affiliation(s)
- Ketty C Tamburrini
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
  - INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Giulia Pesce
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Juliet Nilsson
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Frank Gondelaud
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Université Montpellier, Montpellier, France
- Jean-Guy Berrin
  - INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Sonia Longhi
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
16
Mansour H, Banaganapalli B, Nasser KK, Al-Aama JY, Shaik NA, Saadah OI, Elango R. Genome-Wide Association Study-Guided Exome Rare Variant Burden Analysis Identifies IL1R1 and CD3E as Potential Autoimmunity Risk Genes for Celiac Disease. Front Pediatr 2022; 10:837957. [PMID: 35237542; PMCID: PMC8882628; DOI: 10.3389/fped.2022.837957] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/17/2021] [Accepted: 01/04/2022] [Indexed: 12/14/2022]
Abstract
Celiac disease (CeD) is a multifactorial autoimmune enteropathy characterized by the overactivation of the immune system in response to dietary gluten. The molecular etiology of CeD is still not well-understood. Therefore, this study aims to identify potential candidate genes involved in CeD pathogenesis by applying multilayered system biology approaches. Initially, we identified rare coding variants shared between the affected siblings in two rare Arab CeD families by whole-exome sequencing (WES). Then we used the STRING database to construct a protein network of rare variants and genome-wide association study (GWAS) loci to explore their molecular interactions in CeD. Furthermore, the hub genes identified based on network topology parameters were subjected to a series of computational validation analyses like pathway enrichment, gene expression, knockout mouse model, and variant pathogenicity predictions. Our findings have shown the absence of rare variants showing classical Mendelian inheritance in both families. However, interactome analysis of rare WES variants and GWAS loci has identified a total of 11 hub genes. The multidimensional computational analysis of hub genes has prioritized IL1R1 for family A and CD3E for family B as potential genes. These genes were connected to CeD pathogenesis pathways of T-cell selection, cytokine signaling, and adaptive immune response. Future multi-omics studies may uncover the roles of IL1R1 and CD3E in gluten sensitivity. The present investigation lays forth a novel approach integrating next-generation sequencing (NGS) of familial cases, GWAS, and computational analysis for solving the complex genetic architecture of CeD.
Affiliation(s)
- Haifa Mansour
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia; Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Babajan Banaganapalli
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia; Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Khalidah Khalid Nasser
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia; Pediatric Gastroenterology Unit, Department of Pediatrics, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Jumana Yousuf Al-Aama
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia; Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Noor Ahmad Shaik
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia; Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Omar Ibrahim Saadah
- Pediatric Gastroenterology Unit, Department of Pediatrics, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia; Centre of Artificial Intelligence in Precision Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ramu Elango
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia; Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
17
|
Abstract
INTRODUCTION The intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences, and constructs and disseminates databases of these predictions. Over 40 years of research have resulted in the release of numerous resources. AREAS COVERED We identify and briefly summarize the most comprehensive collection to date of over 100 disorder predictors. We focus on their predictive models, availability, and predictive performance. We categorize and study them from a historical point of view to highlight informative trends. EXPERT OPINION We find a consistent trend of improvements in predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods shifted to meta-predictors in the early 2010s, followed by a recent transition to deep learning. The use of deep learners will continue in the foreseeable future given the recent and convincing success of these methods. Moreover, a broad range of resources that facilitate convenient collection of accurate disorder predictions is available to users. They include web servers and standalone programs for disorder prediction, servers that combine prediction of disorder and disorder functions, and large databases of pre-computed predictions. We also point to the need to address the shortage of accurate methods that predict disordered binding regions.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
| |
|
18
|
Ras-Carmona A, Pelaez-Prestel HF, Lafuente EM, Reche PA. BCEPS: A Web Server to Predict Linear B Cell Epitopes with Enhanced Immunogenicity and Cross-Reactivity. Cells 2021; 10:cells10102744. [PMID: 34685724 PMCID: PMC8534968 DOI: 10.3390/cells10102744] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/20/2021] [Revised: 10/11/2021] [Accepted: 10/12/2021] [Indexed: 02/06/2023] Open
Abstract
Prediction of linear B cell epitopes is of interest for the production of antigen-specific antibodies and the design of peptide-based vaccines. Here, we present BCEPS, a web server for predicting linear B cell epitopes tailored to select epitopes that are immunogenic and capable of inducing cross-reactive antibodies with native antigens. BCEPS implements various machine learning models trained on a dataset including 555 linearized conformational B cell epitopes that were mined from antibody–antigen protein structures. The best performing model, based on a support vector machine, reached an accuracy of 75.38% ± 5.02. In an independent dataset consisting of B cell epitopes retrieved from the Immune Epitope Database (IEDB), this model achieved an accuracy of 67.05%. In BCEPS, predicted epitopes can be ranked according to properties such as flexibility, accessibility and hydrophilicity, and with regard to immunogenicity, as judged by their predicted presentation by MHC II molecules. BCEPS also detects if predicted epitopes are located in ectodomains of membrane proteins and if they possess N-glycosylation sites hindering antibody recognition. Finally, we exemplified the use of BCEPS in the SARS-CoV-2 Spike protein, showing that it can identify B cell epitopes targeted by neutralizing antibodies.
|
19
|
O’Donoghue SI, Schafferhans A, Sikta N, Stolte C, Kaur S, Ho BK, Anderson S, Procter JB, Dallago C, Bordin N, Adcock M, Rost B. SARS-CoV-2 structural coverage map reveals viral protein assembly, mimicry, and hijacking mechanisms. Mol Syst Biol 2021; 17:e10079. [PMID: 34519429 PMCID: PMC8438690 DOI: 10.15252/msb.202010079] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/24/2020] [Revised: 08/05/2021] [Accepted: 08/06/2021] [Indexed: 01/18/2023] Open
Abstract
We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is, and is not, known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.
MESH Headings
- Amino Acid Transport Systems, Neutral/chemistry
- Amino Acid Transport Systems, Neutral/genetics
- Amino Acid Transport Systems, Neutral/metabolism
- Angiotensin-Converting Enzyme 2/chemistry
- Angiotensin-Converting Enzyme 2/genetics
- Angiotensin-Converting Enzyme 2/metabolism
- Binding Sites
- COVID-19/genetics
- COVID-19/metabolism
- COVID-19/virology
- Computational Biology/methods
- Coronavirus Envelope Proteins/chemistry
- Coronavirus Envelope Proteins/genetics
- Coronavirus Envelope Proteins/metabolism
- Coronavirus Nucleocapsid Proteins/chemistry
- Coronavirus Nucleocapsid Proteins/genetics
- Coronavirus Nucleocapsid Proteins/metabolism
- Host-Pathogen Interactions/genetics
- Humans
- Mitochondrial Membrane Transport Proteins/chemistry
- Mitochondrial Membrane Transport Proteins/genetics
- Mitochondrial Membrane Transport Proteins/metabolism
- Mitochondrial Precursor Protein Import Complex Proteins
- Models, Molecular
- Molecular Mimicry
- Neuropilin-1/chemistry
- Neuropilin-1/genetics
- Neuropilin-1/metabolism
- Phosphoproteins/chemistry
- Phosphoproteins/genetics
- Phosphoproteins/metabolism
- Protein Binding
- Protein Conformation, alpha-Helical
- Protein Conformation, beta-Strand
- Protein Interaction Domains and Motifs
- Protein Interaction Mapping/methods
- Protein Multimerization
- Protein Processing, Post-Translational
- SARS-CoV-2/chemistry
- SARS-CoV-2/genetics
- SARS-CoV-2/metabolism
- Spike Glycoprotein, Coronavirus/chemistry
- Spike Glycoprotein, Coronavirus/genetics
- Spike Glycoprotein, Coronavirus/metabolism
- Viral Matrix Proteins/chemistry
- Viral Matrix Proteins/genetics
- Viral Matrix Proteins/metabolism
- Viroporin Proteins/chemistry
- Viroporin Proteins/genetics
- Viroporin Proteins/metabolism
- Virus Replication
Affiliation(s)
- Seán I O’Donoghue
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- CSIRO Data61, Canberra, ACT, Australia
- School of Biotechnology and Biomolecular Sciences (UNSW), Kensington, NSW, Australia
| | - Andrea Schafferhans
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Department of Bioengineering Sciences, Weihenstephan‐Tr. University of Applied Sciences, Freising, Germany
- Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
| | - Neblina Sikta
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
| | | | - Sandeep Kaur
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences (UNSW), Kensington, NSW, Australia
| | - Bosco K Ho
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
| | | | | | - Christian Dallago
- Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
| | - Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, UK
| | | | - Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
| |
|
20
|
He H, Zhou Y, Chi Y, He J. Prediction of MoRFs based on sequence properties and convolutional neural networks. BioData Min 2021; 14:39. [PMID: 34391457 PMCID: PMC8364704 DOI: 10.1186/s13040-021-00275-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/13/2021] [Accepted: 08/08/2021] [Indexed: 12/02/2022] Open
Abstract
Background Intrinsically disordered proteins possess flexible 3-D structures, which allows them to play an important role in a variety of biological functions. Molecular recognition features (MoRFs) are an important type of functional region; they are located within longer intrinsically disordered regions and undergo disorder-to-order transitions upon binding their interaction partners. Results We develop a method, MoRFCNN, to predict MoRFs based on sequence properties and convolutional neural networks (CNNs). The sequence properties comprise structural and physicochemical properties that describe the differences between MoRFs and non-MoRFs. In particular, to highlight the correlation between the target residue and adjacent residues, three windows are selected to preprocess the selected properties. These calculated properties are then combined into a feature matrix used to predict MoRFs through the constructed CNN. Compared with other existing methods, MoRFCNN obtains better performance. Conclusions MoRFCNN is a new individual MoRF prediction method that uses only protein sequence properties, without evolutionary information. The simulation results show that MoRFCNN is effective and competitive.
Affiliation(s)
- Hao He
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
| | - Yatong Zhou
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China.
| | - Yue Chi
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
| | - Jingfei He
- School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
| |
|
21
|
Bernhofer M, Dallago C, Karl T, Satagopam V, Heinzinger M, Littmann M, Olenyi T, Qiu J, Schütze K, Yachdav G, Ashkenazy H, Ben-Tal N, Bromberg Y, Goldberg T, Kajan L, O’Donoghue S, Sander C, Schafferhans A, Schlessinger A, Vriend G, Mirdita M, Gawron P, Gu W, Jarosz Y, Trefois C, Steinegger M, Schneider R, Rost B. PredictProtein - Predicting Protein Structure and Function for 29 Years. Nucleic Acids Res 2021; 49:W535-W540. [PMID: 33999203 PMCID: PMC8265159 DOI: 10.1093/nar/gkab354] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 02/23/2021] [Revised: 04/06/2021] [Accepted: 05/10/2021] [Indexed: 12/12/2022] Open
Abstract
Since 1992, PredictProtein (https://predictprotein.org) has been a one-stop online resource for protein sequence analysis; its main site is hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and was queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA binding). PredictProtein's infrastructure has moved to the LCSB, increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering performance of prediction methods); user interface elements improved usability; and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.
Affiliation(s)
- Michael Bernhofer
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
| | - Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
| | - Tim Karl
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
| | - Venkata Satagopam
- Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Michael Heinzinger
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
| | - Maria Littmann
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
| | - Tobias Olenyi
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
| | - Jiajun Qiu
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Department of Otolaryngology Head & Neck Surgery, The Ninth People's Hospital & Ear Institute, School of Medicine & Shanghai Key Laboratory of Translational Medicine on Ear and Nose Diseases, Shanghai Jiao Tong University, Shanghai, China
| | - Konstantin Schütze
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
| | - Guy Yachdav
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
| | - Haim Ashkenazy
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
| | - Nir Ben-Tal
- Department of Biochemistry & Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
| | - Tatyana Goldberg
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
| | - Laszlo Kajan
- Roche Polska Sp. z o.o., Domaniewska 39B, 02–672 Warsaw, Poland
| | | | - Chris Sander
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA 02215, USA
- Broad Institute of MIT and Harvard, Boston, MA 02142, USA
| | - Andrea Schafferhans
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- HSWT (Hochschule Weihenstephan Triesdorf | University of Applied Sciences), Department of Bioengineering Sciences, Am Hofgarten 10, 85354 Freising, Germany
| | - Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | | | - Milot Mirdita
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Piotr Gawron
- Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Wei Gu
- Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Yohan Jarosz
- Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Christophe Trefois
- Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
| | - Reinhard Schneider
- Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Burkhard Rost
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
| |
|
22
|
Computational Study on Temperature Driven Structure-Function Relationship of Polysaccharide Producing Bacterial Glycosyl Transferase Enzyme. Polymers (Basel) 2021; 13:polym13111771. [PMID: 34071348 PMCID: PMC8198650 DOI: 10.3390/polym13111771] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/01/2021] [Revised: 05/24/2021] [Accepted: 05/26/2021] [Indexed: 12/12/2022] Open
Abstract
Glycosyltransferases (GTs) are a broad class of enzymes that transfer sugar moieties, playing a key role in the synthesis of bacterial exopolysaccharide (EPS) biopolymers. In recent years, increased demand for bacterial EPSs has been observed in the pharmaceutical, food, and other industries. The application of EPSs largely depends on their thermal stability, as any industrial application relies mainly on slow thermal degradation. In this context, EPS-producing GT enzymes from three bacterial sources that differ in growth temperature (mesophile, thermophile, and hyperthermophile) are considered for in silico analysis of the structure–function relationship. In the present study, it was observed that the structural integrity of GT increases significantly from mesophile to thermophile to hyperthermophile. In contrast, structural plasticity runs in the opposite direction, towards the mesophile. This temperature-dependent structural property directed the GT–UDP-glucose interactions in such a way that the thermophile demonstrated better binding affinity (−5.57 to −10.70) with an increased number of hydrogen bonds (355) and stabilizing amino acids (Phe, Ala, Glu, Tyr, and Ser). The results from this study may direct the use of thermophile-origin GT as the best choice for industrial-level bacterial polysaccharide production.
|
23
|
Wang YP, Wu EJ, Lurwanu Y, Ding JP, He DC, Waheed A, Nkurikiyimfura O, Liu ST, Li WY, Wang ZH, Yang L, Zhan J. Evidence for a synergistic effect of post-translational modifications and genomic composition of eEF-1α on the adaptation of Phytophthora infestans. Ecol Evol 2021; 11:5484-5496. [PMID: 34026022 PMCID: PMC8131795 DOI: 10.1002/ece3.7442] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/13/2021] [Revised: 02/19/2021] [Accepted: 02/21/2021] [Indexed: 12/18/2022] Open
Abstract
Genetic variation plays a fundamental role in pathogens' adaptation to environmental stresses. Pathogens with low genetic variation tend to survive and proliferate more poorly due to their lack of genotypic/phenotypic polymorphisms in responding to fluctuating environments. Evolutionary theory hypothesizes that the adaptive disadvantage of genes with low genomic variation can be compensated for by structural diversity of proteins through post-translational modification (PTM), but this theory is rarely tested experimentally and its implications for sustainable disease management are hardly discussed. In this study, we analyzed nucleotide characteristics of the eukaryotic translation elongation factor-1α (eEF-1α) gene from 165 Phytophthora infestans isolates and the physical and chemical properties of its derived proteins. We found low sequence variation in the eEF-1α protein, possibly attributable to purifying selection and a lack of intra-genic recombination rather than reduced mutation. Of the only two isoforms detected by the study, the major one accounted for >95% of the pathogen collection and displayed significantly higher fitness than the minor one. High lysine representation enhances the opportunity for the eEF-1α protein to be methylated, and the absence of disulfide bonds is consistent with the structural prediction showing that many disordered regions exist in the protein. Methylation, structural disordering, and possibly other PTMs ensure the ability of the protein to modify its functions during biological, cellular and biochemical processes, and compensate for its adaptive disadvantage caused by sequence conservation. Our results indicate that PTMs may function synergistically with nucleotide codes to regulate the adaptive landscape of eEF-1α, and possibly of other housekeeping genes, in P. infestans. Compensatory evolution between the pre- and post-translational phases in eEF-1α could enable pathogens to adapt quickly to disease management strategies while efficiently maintaining the critical roles the protein plays in biological, cellular, and biochemical activities. Implications of these results for sustainable plant disease management are discussed.
Affiliation(s)
- Yan-Ping Wang
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
| | - E-Jiao Wu
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Yahuza Lurwanu
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
- Department of Crop Protection, Bayero University Kano, Kano, Nigeria
| | - Ji-Peng Ding
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Dun-Chun He
- School of Economics and Trade, Fujian Jiangxia University, Fuzhou, China
| | - Abdul Waheed
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Oswald Nkurikiyimfura
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Shi-Ting Liu
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Wen-Yang Li
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Zong-Hua Wang
- Fujian University Key Laboratory for Plant-Microbe Interaction, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Institute of Oceanography, Minjiang University, Fuzhou, China
| | - Lina Yang
- Key lab for Bio pesticide and Chemical Biology, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
- Institute of Oceanography, Minjiang University, Fuzhou, China
| | - Jiasui Zhan
- Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Uppsala, Sweden
| |
|
24
|
Vander Meersche Y, Cretin G, de Brevern AG, Gelly JC, Galochkina T. MEDUSA: Prediction of Protein Flexibility from Sequence. J Mol Biol 2021; 433:166882. [PMID: 33972018 DOI: 10.1016/j.jmb.2021.166882] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 09/28/2020] [Revised: 02/12/2021] [Accepted: 02/13/2021] [Indexed: 12/11/2022]
Abstract
Information on the protein flexibility is essential to understand crucial molecular mechanisms such as protein stability, interactions with other molecules and protein functions in general. B-factor obtained in the X-ray crystallography experiments is the most common flexibility descriptor available for the majority of the resolved protein structures. Since the gap between the number of the resolved protein structures and available protein sequences is continuously growing, it is important to provide computational tools for protein flexibility prediction from amino acid sequence. In the current study, we report a Deep Learning based protein flexibility prediction tool MEDUSA (https://www.dsimb.inserm.fr/MEDUSA). MEDUSA uses evolutionary information extracted from protein homologous sequences and amino acid physico-chemical properties as input for a convolutional neural network to assign a flexibility class to each protein sequence position. Trained on a non-redundant dataset of X-ray structures, MEDUSA provides flexibility prediction in two, three and five classes. MEDUSA is freely available as a web-server providing a clear visualization of the prediction results as well as a standalone utility (https://github.com/DSIMB/medusa). Analysis of the MEDUSA output allows a user to identify the potentially highly deformable protein regions and general dynamic properties of the protein.
Affiliation(s)
- Yann Vander Meersche
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
| | - Gabriel Cretin
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
| | - Alexandre G de Brevern
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France
| | - Jean-Christophe Gelly
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France.
| | - Tatiana Galochkina
- Université de Paris, Inserm UMR_S 1134 - BIGR, INTS, 6 rue Alexandre Cabanel, 75015 Paris, France; Laboratoire d'Excellence GR-Ex, 75015 Paris, France.
| |
|
25
|
Tang YJ, Pang YH, Liu B. IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning. Bioinformatics 2021; 36:5177-5186. [PMID: 32702119 DOI: 10.1093/bioinformatics/btaa667] [Citation(s) in RCA: 83] [Impact Index Per Article: 27.7] [Received: 05/01/2020] [Revised: 06/21/2020] [Accepted: 07/17/2020] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION Related to many important biological functions, intrinsically disordered regions (IDRs) are widely distributed in proteins. Accurate prediction of IDRs is critical for protein structure and function analysis. However, the existing computational methods construct the predictive models solely in the sequence space, failing to convert the sequence space into the 'semantic space' to reflect the structural characteristics of proteins. Furthermore, although the length-dependent predictors showed promising results, new fusion strategies should be explored to improve their predictive performance and generalization. RESULTS In this study, we applied Sequence to Sequence Learning (Seq2Seq), derived from natural language processing (NLP), to map protein sequences to 'semantic space' to reflect the structural patterns with the help of predicted residue-residue contacts (CCMs) and other sequence-based features. Furthermore, the Attention mechanism was used to capture the global associations between all residue pairs in the proteins. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction and IDP-Seq2Seq-G for both long and short disordered region predictions. Finally, these three predictors were fused into one predictor called IDP-Seq2Seq to improve the discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratios of long and short disordered regions and outperforms other competing methods. AVAILABILITY AND IMPLEMENTATION For the convenience of most experimental scientists, a user-friendly and publicly accessible web-server for the new predictor has been established at http://bliulab.net/IDP-Seq2Seq/. It is anticipated that IDP-Seq2Seq will become a very useful tool for identification of IDRs. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yi-Jun Tang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Yi-He Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.,Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| |
|
26
|
Seoane B, Carbone A. The complexity of protein interactions unravelled from structural disorder. PLoS Comput Biol 2021; 17:e1008546. [PMID: 33417598 PMCID: PMC7846008 DOI: 10.1371/journal.pcbi.1008546] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/13/2020] [Revised: 01/29/2021] [Accepted: 11/18/2020] [Indexed: 11/19/2022] Open
Abstract
The importance of unstructured biology has quickly grown during the last decades accompanying the explosion of the number of experimentally resolved protein structures. The idea that structural disorder might be a novel mechanism of protein interaction is widespread in the literature, although the number of statistically significant structural studies supporting this idea is surprisingly low. At variance with previous works, our conclusions rely exclusively on a large-scale analysis of all the 134337 X-ray crystallographic structures of the Protein Data Bank averaged over clusters of almost identical protein sequences. In this work, we explore the complexity of the organisation of all the interaction interfaces observed when a protein lies in alternative complexes, showing that interfaces progressively add up in a hierarchical way, which is reflected in a logarithmic law for the size of the union of the interface regions on the number of distinct interfaces. We further investigate the connection of this complexity with different measures of structural disorder: the standard missing residues and a new definition, called "soft disorder", that covers all the flexible and structurally amorphous residues of a protein. We show evidence that both the interaction interfaces and the soft disordered regions tend to involve roughly the same amino acids of the protein, and preliminary results suggesting that soft disorder spots those surface regions where new interfaces are progressively accommodated by complex formation. In fact, our results suggest that structurally disordered regions not only carry crucial information about the location of alternative interfaces within complexes, but also about the order of the assembly. We verify these hypotheses in several examples, such as the DNA binding domains of P53 and P73, the C3 exoenzyme, and two known biological orders of assembly.
We finally compare our measures of structural disorder with several disorder bioinformatics predictors, showing that the latter are optimised to predict the residues that are missing in all the alternative structures of a protein and are not able to catch the progressive evolution of the disordered regions upon complex formation. Yet, the predicted residues, when not missing, tend to be characterised as soft disordered regions.
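The logarithmic law mentioned in this abstract can be illustrated with a small least-squares fit. What follows is only a minimal sketch, assuming the law has the form U(n) = a + b ln(n), where U is the size of the union of the interface regions and n is the number of distinct interfaces. The function name `fit_log_law` and the data points are hypothetical and are not taken from the paper.

```python
import math

def fit_log_law(n_interfaces, union_sizes):
    """Ordinary least-squares fit of U(n) = a + b * ln(n); returns (a, b)."""
    xs = [math.log(n) for n in n_interfaces]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(union_sizes) / len(union_sizes)
    # Slope and intercept of the line through (ln n, U) pairs.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, union_sizes))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic illustration data generated from a = 20, b = 15 (not real measurements).
n = [1, 2, 4, 8, 16]
u = [20 + 15 * math.log(k) for k in n]
a, b = fit_log_law(n, u)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Under this assumed form, each doubling of the number of interfaces adds a constant b ln(2) residues to the union, which is one way to read the "hierarchical" accumulation the abstract describes.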
Affiliation(s)
- Beatriz Seoane
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, Paris, France
- Sorbonne Université, Institut des Sciences du Calcul et des Données, Paris, France
- Departamento de Física Teórica, Universidad Complutense, Madrid, Spain
| | - Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, Paris, France
| |
|
27
|
Razmara E, Azimi H, Tavasoli AR, Fallahi E, Sheida SV, Eidi M, Bitaraf A, Farjami Z, Daneshmand MA, Garshasbi M. Novel neuroclinical findings of autosomal recessive primary microcephaly 15 in a consanguineous Iranian family. Eur J Med Genet 2020; 63:104096. [PMID: 33186761 DOI: 10.1016/j.ejmg.2020.104096] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/29/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/31/2022]
Abstract
Major facilitator superfamily domain-containing 2A (MFSD2A) is required for brain uptake of docosahexaenoic acid and lysophosphatidylcholine, both of which are essential for normal neural development and function. Mutations in MFSD2A dysregulate the activity of this transporter in brain endothelial cells and can lead to microcephaly. In this study, we describe an 11-year-old male who is affected by autosomal recessive primary microcephaly 15. This patient also shows severe intellectual disability, recurrent respiratory and renal infections, low birth weight, and developmental delay. After clinical and neuroimaging evaluations, no narrow clinical diagnosis was possible owing to the heterogeneity of neurogenetic disorders; we therefore used targeted-exome sequencing to identify any causative genetic factors. This revealed a homozygous in-frame deletion (NM_001136493.1: c.241_243del; p.(Val81del)) in the MFSD2A gene as the most likely disease-susceptibility variant, which was confirmed by Sanger sequencing. Neuroimaging revealed lateral ventricular asymmetry, corpus callosum hypoplasia, type B of cisterna magna, and widening of Sylvian fissures. All of these novel phenotypes are associated with autosomal recessive primary microcephaly-15 (MCPH15). According to the genotype-phenotype data, p.(Val81del) can be considered a likely pathogenic variant leading to non-lethal microcephaly. However, further cumulative data and molecular approaches are required to accurately identify genotype-phenotype correlations in MFSD2A.
Affiliation(s)
- Ehsan Razmara
- Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Homeyra Azimi
- Pediatrician-official Genetic Counselor, Dr. Azimi Genetic Counseling Center, Arak, Iran
| | - Ali Reza Tavasoli
- Myelin Disorders Clinic, Pediatric Neurology Division, Children's Medical Center, Pediatrics Center of Excellence, Tehran University of Medical Sciences, Tehran, Iran
| | - Elnaz Fallahi
- Department of Biology, North Tehran Branch, Islamic Azad University, Tehran, Iran
| | - Sadaf Valeh Sheida
- Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
| | - Milad Eidi
- Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Amirreza Bitaraf
- Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
| | - Zahra Farjami
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | | | - Masoud Garshasbi
- Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran.
| |
|
28
|
Qiu J, Nechaev D, Rost B. Protein-protein and protein-nucleic acid binding residues important for common and rare sequence variants in human. BMC Bioinformatics 2020; 21:452. [PMID: 33050876 PMCID: PMC7557062 DOI: 10.1186/s12859-020-03759-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/01/2020] [Accepted: 09/16/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Any two unrelated people differ by about 20,000 missense mutations (also referred to as SAVs: Single Amino acid Variants or missense SNVs). Many SAVs have been predicted to strongly affect molecular protein function. Common SAVs (> 5% of population) were predicted to have, on average, more effect on molecular protein function than rare SAVs (< 1% of population). We hypothesized that the prevalence of effect in common over rare SAVs might partially be caused by common SAVs more often occurring at interfaces of proteins with other proteins, DNA, or RNA, thereby creating subgroup-specific phenotypes. We analyzed SAVs from 60,706 people through the lens of two prediction methods, one (SNAP2) predicting the effects of SAVs on molecular protein function, the other (ProNA2020) predicting residues in DNA-, RNA- and protein-binding interfaces. RESULTS Three results stood out. Firstly, SAVs predicted to occur at binding interfaces were predicted to more likely affect molecular function than those predicted as not binding (p value < 2.2 × 10⁻¹⁶). Secondly, for SAVs predicted to occur at binding interfaces, common SAVs were more strongly predicted to affect protein function than rare SAVs (p value < 2.2 × 10⁻¹⁶). Restriction to SAVs with experimental annotations confirmed all results, although the resulting subsets were too small to establish statistical significance for any result. Thirdly, the fraction of SAVs predicted at binding interfaces differed significantly between tissues, e.g. urinary bladder tissue was found to be abundant in SAVs predicted at protein-binding interfaces, and reproductive tissues (ovary, testis, vagina, seminal vesicle and endometrium) in SAVs predicted at DNA-binding interfaces. CONCLUSIONS Overall, the results suggested that residues at protein-, DNA-, and RNA-binding interfaces contributed toward predicting that common SAVs more likely affect molecular function than rare SAVs.
Affiliation(s)
- Jiajun Qiu
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany. .,TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), 85748, Garching, Germany. .,Biobank of Ninth People's Hospital, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200125, China.
| | - Dmitrii Nechaev
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany.,TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), 85748, Garching, Germany
| | - Burkhard Rost
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany.,Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching, Munich, Germany.,Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354, Freising, Germany
| |
|
29
|
de Brevern AG. Impact of protein dynamics on secondary structure prediction. Biochimie 2020; 179:14-22. [PMID: 32946990 DOI: 10.1016/j.biochi.2020.09.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/21/2020] [Revised: 09/04/2020] [Accepted: 09/10/2020] [Indexed: 02/08/2023]
Abstract
Protein 3D structures support their biological functions. As the number of protein structures is negligible compared with the number of available protein sequences, prediction methodologies relying only on protein sequences are essential tools. In this field, protein secondary structure prediction (PSSP) is a mature area, and is considered to have reached a plateau. Nonetheless, proteins are highly dynamical macromolecules, a property that could impact the PSSP methods. Indeed, in a previous study, the stability of local protein conformations was evaluated, demonstrating that some regions easily changed to another type of secondary structure. The protein sequences of this dataset were submitted to PSSP methods and the results compared to molecular dynamics to investigate the potential impact of dynamics on the quality of the secondary structure prediction. Interestingly, a direct link is observed between the quality of the prediction and the stability of the assignment to the secondary structure state. The more stable a local protein conformation is, the better the prediction will be. Taking the secondary structure assignment not from the crystallized structures but from the conformations observed during the dynamics slightly increases the quality of the secondary structure prediction. These results show that evaluation of PSSPs can be done differently, but also that the notion of dynamics can be included in development of PSSPs and other approaches such as de novo approaches.
Affiliation(s)
- Alexandre G de Brevern
- Biologie Intégrée Du Globule Rouge UMR_S1134, Inserm, Université de Paris, Univ. de la Réunion, Univ. des Antilles, F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739, Paris, France; IBL, F-75015, Paris, France.
| |
|
30
|
Cervantes-Montelongo JA, Silva-Martínez GA, Pliego-Arreaga R, Guevara-Olvera L, Ruiz-Herrera J. The UMAG_00031 gene from Ustilago maydis encodes a putative membrane protein involved in pH control and morphogenesis. Arch Microbiol 2020; 202:2221-2232. [PMID: 32529509 DOI: 10.1007/s00203-020-01936-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/25/2019] [Revised: 03/18/2020] [Accepted: 06/04/2020] [Indexed: 12/11/2022]
Abstract
We report the characterization of the gene UMAG_00031 from Ustilago maydis, previously identified as upregulated at alkaline pH. This gene is located on chromosome 1 and contains an ORF of 1539 bp that encodes a putative protein of 512 amino acids with an MW of 54.8 kDa. The protein is predicted to contain seven transmembrane domains (TMDs) and a signal peptide, suggesting that it is located in the cell membrane. Null ΔUMAG_00031 mutants were constructed, and their phenotype was analyzed. The mutant displayed a pleiotropic phenotype suggesting its participation in processes of alkaline pH adaptation independent of the Pal/Rim pathway. Also, it was involved in the dimorphic process induced by fatty acids. These results indicate that the protein encoded by the UMAG_00031 gene possibly functions as a receptor of different signals in the cell membrane of the fungus.
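The ORF length, protein length and molecular weight quoted in this abstract are mutually consistent, which a short calculation can confirm. The sketch below assumes that the 1539 bp figure includes one terminal stop codon; the helper names `orf_to_protein_length` and `mean_residue_mass_da` are hypothetical and exist only for this illustration.

```python
def orf_to_protein_length(orf_bp: int) -> int:
    """Number of encoded residues, assuming the ORF length includes one stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # one codon is the stop codon, which encodes no residue

def mean_residue_mass_da(mw_kda: float, n_residues: int) -> float:
    """Average mass per residue in daltons."""
    return mw_kda * 1000 / n_residues

n_residues = orf_to_protein_length(1539)
print(n_residues)  # 512, matching the abstract
print(mean_residue_mass_da(54.8, n_residues))  # roughly 107 Da per residue
```

An average of roughly 107 Da per residue is close to the value of about 110 Da commonly used as a rule of thumb for proteins.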
Affiliation(s)
- Juan Antonio Cervantes-Montelongo
- Laboratorio de Biología Molecular, Departamento de Ingeniería Bioquímica, Tecnológico Nacional de México en Celaya, Ave. Tecnológico y Antonio García Cubas S/N, col. FOVISSSTE, 38010, Celaya, Gto, Mexico
| | | | - Raquel Pliego-Arreaga
- Escuela de Medicina de La Universidad de Celaya, Carretera Panamericana, Rancho Pinto km 269, 38080, Celaya, Gto, Mexico
| | - Lorenzo Guevara-Olvera
- Laboratorio de Biología Molecular, Departamento de Ingeniería Bioquímica, Tecnológico Nacional de México en Celaya, Ave. Tecnológico y Antonio García Cubas S/N, col. FOVISSSTE, 38010, Celaya, Gto, Mexico
| | - José Ruiz-Herrera
- Departamento de Ingeniería Genética, Unidad Irapuato, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Apartado Postal 629, 36500, Irapuato, Gto, Mexico.
| |
|
31
|
Akhila MV, Narwani TJ, Floch A, Maljković M, Bisoo S, Shinada NK, Kranjc A, Gelly JC, Srinivasan N, Mitić N, de Brevern AG. A structural entropy index to analyse local conformations in intrinsically disordered proteins. J Struct Biol 2020; 210:107464. [DOI: 10.1016/j.jsb.2020.107464] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/13/2019] [Revised: 01/06/2020] [Accepted: 01/15/2020] [Indexed: 10/25/2022]
|
32
|
Quinzo MJ, Lafuente EM, Zuluaga P, Flower DR, Reche PA. Computational assembly of a human Cytomegalovirus vaccine upon experimental epitope legacy. BMC Bioinformatics 2019; 20:476. [PMID: 31823715 PMCID: PMC6905002 DOI: 10.1186/s12859-019-3052-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/12/2019] [Accepted: 08/23/2019] [Indexed: 01/05/2023] Open
Abstract
Background Human Cytomegalovirus (HCMV) is a ubiquitous herpesvirus affecting approximately 90% of the world population. HCMV causes disease in immunologically naive and immunosuppressed patients. The prevention, diagnosis and therapy of HCMV infection are thus crucial to public health. The availability of effective prophylactic and therapeutic treatments remains a significant challenge and no vaccine is currently available. Here, we sought to define an epitope-based vaccine against HCMV, eliciting B and T cell responses, from experimentally defined HCMV-specific epitopes. Results We selected 398 and 790 experimentally validated HCMV-specific B and T cell epitopes, respectively, from available epitope resources and applied a knowledge-based approach in combination with immunoinformatic predictions to assemble a universal vaccine against HCMV. The T cell component consists of 6 CD8 and 6 CD4 T cell epitopes that are conserved among HCMV strains. All CD8 T cell epitopes were reported to induce cytotoxic activity, are derived from early expressed genes and are predicted to provide population protection coverage over 97%. The CD4 T cell epitopes are derived from HCMV structural proteins and provide a population protection coverage over 92%. The B cell component consists of just 3 B cell epitopes from the ectodomain of glycoproteins L and H that are highly flexible and exposed to the solvent. Conclusions We have defined a multiantigenic epitope vaccine ensemble against HCMV that should elicit T and B cell responses in the entire population. Importantly, although we arrived at this epitope ensemble with the help of computational predictions, the actual epitopes are not predicted but are known to be immunogenic.
Affiliation(s)
- Monica J Quinzo
- Faculty of Medicine, University Complutense of Madrid, Pza Ramon y Cajal, s/n, 28040, Madrid, Spain
| | - Esther M Lafuente
- Faculty of Medicine, University Complutense of Madrid, Pza Ramon y Cajal, s/n, 28040, Madrid, Spain
| | - Pilar Zuluaga
- Faculty of Medicine, University Complutense of Madrid, Pza Ramon y Cajal, s/n, 28040, Madrid, Spain
| | - Darren R Flower
- School of Life and Health Sciences, Aston University, Aston Triangle, Birmingham, B4 7ET, UK
| | - Pedro A Reche
- Faculty of Medicine, University Complutense of Madrid, Pza Ramon y Cajal, s/n, 28040, Madrid, Spain.
| |
|
33
|
Perry GML. 'Fat's chances': Loci for phenotypic dispersion in plasma leptin in mouse models of diabetes mellitus. PLoS One 2019; 14:e0222654. [PMID: 31661517 PMCID: PMC6818960 DOI: 10.1371/journal.pone.0222654] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/01/2019] [Accepted: 09/04/2019] [Indexed: 01/29/2023] Open
Abstract
Background Leptin, a critical mediator of feeding, metabolism and diabetes, is expressed on an incidental basis according to satiety. The genetic regulation of leptin should similarly be episodic. Methodology Data from three mouse cohorts hosted by the Jackson Laboratory – 402 (174F, 228M) F2 Dilute Brown non-Agouti (DBA/2)×DU6i intercrosses, 142 Non Obese Diabetic (NOD/ShiLtJ×(NOD/ShiLtJ×129S1/SvImJ.H2g7) N2 backcross females, and 204 male Nonobese Nondiabetic (NON)×New Zealand Obese (NZO/HlLtJ) reciprocal backcrosses – were used to test for loci associated with absolute residuals in plasma leptin and arcsin-transformed percent fat (‘phenotypic dispersion’; PDpLep and PDAFP). Individual data from 1,780 mice from 43 inbred strains were also used to estimate genetic variances and covariances for dispersion in each trait. Principal findings Several loci for PDpLep were detected, including possibly syntenic Chr 17 loci, but there was only a single position on Chr 6 for PDAFP. Coding SNPs in genes linked to the consensus Chr 17 PDpLep locus occurred in immunological and cancer genes, genes linked to diabetes and energy regulation, post-transcriptional processors and vomeronasal variants. There was evidence of intersexual differences in the genetic architecture of PDpLep. PDpLep had moderate heritability (hs2=0.29) and PDAFP low heritability (hs2=0.12); dispersion in these traits was highly genetically correlated (r = 0.8). Conclusions Greater genetic variance for dispersion in plasma leptin, a physiological trait, may reflect its more ephemeral nature compared to body fat, an accrued progressive character. Genetic effects on incidental phenotypes such as leptin might be effectively characterized with randomization-detection methodologies in addition to classical approaches, helping identify incipient or borderline cases or providing new therapeutic targets.
Affiliation(s)
- Guy M. L. Perry
- Department of Biology, University of Prince Edward Island, Charlottetown, PEI, Canada
| |
|
34
|
He H, Zhao J, Sun G. Computational prediction of MoRFs based on protein sequences and minimax probability machine. BMC Bioinformatics 2019; 20:529. [PMID: 31660849 PMCID: PMC6819637 DOI: 10.1186/s12859-019-3111-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/22/2018] [Accepted: 09/20/2019] [Indexed: 11/25/2022] Open
Abstract
Background Molecular recognition features (MoRFs) are one important type of disordered segments that can promote specific protein-protein interactions. They are located within longer intrinsically disordered regions (IDRs), and undergo disorder-to-order transitions upon binding to their interaction partners. The functional importance of MoRFs and the limitation of experimental identification make it necessary to predict MoRFs accurately with computational methods. Results In this study, a new sequence-based method, named MoRFMPM, is proposed for predicting MoRFs. MoRFMPM uses a minimax probability machine (MPM) to predict MoRFs based on 16 features and 3 different windows, neither relying on other predictors nor calculating the properties of the surrounding regions of MoRFs separately. Compared with ANCHOR, MoRFpred and MoRFCHiBi on the same test sets, MoRFMPM obtains not only a higher AUC but also a higher TPR at low FPR. Conclusions The features used in MoRFMPM can effectively predict MoRFs, especially after preprocessing. Besides, MoRFMPM uses a linear classification algorithm and does not rely on results of other predictors, which makes it accessible and repeatable.
Affiliation(s)
- Hao He
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin, China
| | - Jiaxiang Zhao
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin, China.
| | - Guiling Sun
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin, China
| |
|
35
|
Narwani TJ, Etchebest C, Craveur P, Léonard S, Rebehmed J, Srinivasan N, Bornot A, Gelly JC, de Brevern AG. In silico prediction of protein flexibility with local structure approach. Biochimie 2019; 165:150-155. [PMID: 31377194 DOI: 10.1016/j.biochi.2019.07.025] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/28/2018] [Accepted: 07/26/2019] [Indexed: 12/30/2022]
Abstract
Flexibility is an intrinsic essential feature of protein structures, directly linked to their functions. To this day, most prediction methods use crystallographic data (namely B-factors) as the only indicator of a protein's inner flexibility and make a two-class prediction (rigid or flexible). PredyFlexy differs from other approaches in that it relies on a definition of protein flexibility taken (i) not only from crystallographic data, but also (ii) from Root Mean Square Fluctuations (RMSFs) observed in Molecular Dynamics simulations. It also uses a specific representation of protein structures, named Long Structural Prototypes (LSPs). From a Position-Specific Scoring Matrix, the 120 LSPs are predicted with good accuracy and directly used to predict (i) the protein flexibility in three categories (flexible, intermediate and rigid), (ii) the normalized B-factors, (iii) the normalized RMSFs, and (iv) a confidence index. Prediction accuracy among these three classes is equivalent to that of the best two-class prediction methods, while the normalized B-factors and normalized RMSFs have a good correlation with experimental and in silico values. Thus, PredyFlexy is a unique approach, which is of major utility for the scientific community. It supports parallelization features and can be run on a local cluster using multiple cores.
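The abstract refers to normalized B-factors and to three flexibility classes without stating how either is defined. As an illustration only, the sketch below assumes a per-chain z-score normalization, which is a common convention, and uses hypothetical cutoffs of -0.5 and +0.5 to split residues into rigid, intermediate and flexible. Neither the normalization nor the cutoffs are taken from PredyFlexy, and the function names are invented for this example.

```python
from statistics import mean, pstdev

def normalize_b_factors(b_factors):
    """Z-score normalization of B-factors within one chain (assumed convention)."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        raise ValueError("all B-factors are identical; z-scores are undefined")
    return [(b - mu) / sigma for b in b_factors]

def flexibility_class(z, low=-0.5, high=0.5):
    """Three-way split using hypothetical cutoffs, not the published thresholds."""
    if z < low:
        return "rigid"
    if z > high:
        return "flexible"
    return "intermediate"

# Made-up B-factors for three residues, for illustration only.
z_scores = normalize_b_factors([10.0, 20.0, 30.0])
print([flexibility_class(z) for z in z_scores])
```

Normalizing within each chain removes differences in overall B-factor scale between structures, which is why normalized rather than raw values are usually compared across a dataset.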
Affiliation(s)
- Tarun J Narwani
- INSERM, U 1134, DSIMB, Univ Paris, Univ de La Réunion, Univ des Antilles, F-75739, Paris, France; Institut National de La Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France
| | - Catherine Etchebest
- INSERM, U 1134, DSIMB, Univ Paris, Univ de La Réunion, Univ des Antilles, F-75739, Paris, France; Institut National de La Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France
| | - Pierrick Craveur
- INSERM, U 1134, DSIMB, Univ Paris, Univ de La Réunion, Univ des Antilles, F-75739, Paris, France; Institut National de La Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France; Molecular Graphics Laboratory, Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Sylvain Léonard
- INSERM, U 1134, DSIMB, Univ Paris, Univ de La Réunion, Univ des Antilles, F-75739, Paris, France; Institut National de La Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France
| | - Joseph Rebehmed
- INSERM, U 1134, DSIMB, Univ Paris, Univ de La Réunion, Univ des Antilles, F-75739, Paris, France; Institut National de La Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France; Department of Computer Science and Mathematics, Lebanese American University, Byblos 1h401 2010, Lebanon
| | | | - Aurélie Bornot
- INSERM, U 1134, DSIMB, Univ Paris, Univ de La Réunion, Univ des Antilles, F-75739, Paris, France; Institut National de La Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France
| | - Jean-Christophe Gelly
- INSERM, U 1134, DSIMB, Univ Paris, Univ de La Réunion, Univ des Antilles, F-75739, Paris, France; Institut National de La Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France
| | - Alexandre G de Brevern
- INSERM, U 1134, DSIMB, Univ Paris, Univ de La Réunion, Univ des Antilles, F-75739, Paris, France; Institut National de La Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France; Molecular Graphics Laboratory, Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
| |
|
36
|
Liu Y, Wang X, Liu B. A comprehensive review and comparison of existing computational methods for intrinsically disordered protein and region prediction. Brief Bioinform 2019; 20:330-346. [PMID: 30657889 DOI: 10.1093/bib/bbx126] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Received: 07/19/2017] [Indexed: 01/06/2023] Open
Abstract
Intrinsically disordered proteins and regions are widely distributed in proteins, which are associated with many biological processes and diseases. Accurate prediction of intrinsically disordered proteins and regions is critical for both basic research (such as protein structure and function prediction) and practical applications (such as drug development). During the past decades, many computational approaches have been proposed, which have greatly facilitated the development of this important field. Therefore, a comprehensive and updated review is highly required. In this regard, we give a review on the computational methods for intrinsically disordered protein and region prediction, especially focusing on the recent development in this field. These computational approaches are divided into four categories based on their methodologies, including physicochemical-based method, machine-learning-based method, template-based method and meta method. Furthermore, their advantages and disadvantages are also discussed. The performance of 40 state-of-the-art predictors is directly compared on the target proteins in the task of disordered region prediction in the 10th Critical Assessment of protein Structure Prediction. A more comprehensive performance comparison of 45 different predictors is conducted based on seven widely used benchmark data sets. Finally, some open problems and perspectives are discussed.
Affiliation(s)
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
| | - Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, China
| |
|
37
|
He H, Zhao J, Sun G. Prediction of MoRFs in Protein Sequences with MLPs Based on Sequence Properties and Evolution Information. ENTROPY 2019; 21:e21070635. [PMID: 33267349 PMCID: PMC7515128 DOI: 10.3390/e21070635] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/19/2019] [Revised: 06/26/2019] [Accepted: 06/26/2019] [Indexed: 02/03/2023]
Abstract
Molecular recognition features (MoRFs) are one important type of functional region in intrinsically disordered proteins that can undergo a disorder-to-order transition through binding to their interaction partners. Prediction of MoRFs is crucial, as the functions of MoRFs are associated with many diseases and MoRFs can therefore become potential drug targets. In this paper, a method of predicting MoRFs is developed based on the sequence properties and evolutionary information. To this end, we design two distinct multi-layer perceptron (MLP) neural networks and present a procedure to train them. We develop a preprocessing process which exploits different sizes of sliding windows to capture various properties related to MoRFs. We then use the Bayes rule together with the outputs of two trained MLP neural networks to predict MoRFs. In comparison to several state-of-the-art methods, the simulation results show that our method is competitive.
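The abstract says that the outputs of two trained MLPs are combined with the Bayes rule but does not give the formula. One common way to do this, shown here purely as a sketch, is a naive Bayes fusion: assuming the two feature views are conditionally independent given the class, the fused posterior is proportional to p1 * p2 / prior. The function name `fuse_posteriors`, the independence assumption and the example numbers are illustrative and are not taken from the paper.

```python
def fuse_posteriors(p1: float, p2: float, prior: float) -> float:
    """Combine two per-residue posteriors P(MoRF | view) under a naive Bayes assumption.

    p1, p2 : posterior probabilities from two classifiers
    prior  : prior probability of the positive (MoRF) class, strictly between 0 and 1
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Unnormalized posteriors for the positive and negative classes.
    pos = p1 * p2 / prior
    neg = (1.0 - p1) * (1.0 - p2) / (1.0 - prior)
    if pos + neg == 0.0:
        raise ValueError("the two classifiers give contradictory certainties")
    return pos / (pos + neg)

# With a balanced prior, two agreeing scores of 0.8 reinforce each other.
print(fuse_posteriors(0.8, 0.8, 0.5))
```

When the prior is small, as it would be for a rare class such as MoRF residues, even moderate classifier scores count as evidence for the positive class, so the fused value is sensitive to the choice of prior.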
|
38
|
Identification of Intrinsically Disordered Proteins and Regions by Length-Dependent Predictors Based on Conditional Random Fields. MOLECULAR THERAPY-NUCLEIC ACIDS 2019; 17:396-404. [PMID: 31307006 PMCID: PMC6626971 DOI: 10.1016/j.omtn.2019.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/25/2019] [Revised: 06/06/2019] [Accepted: 06/07/2019] [Indexed: 01/24/2023]
Abstract
Accurate identification of intrinsically disordered proteins/regions (IDPs/IDRs) is critical for predicting protein structure and function. Previous studies have shown that IDRs of different lengths have different characteristics, and several classification-based predictors have been proposed for predicting different types of IDRs. Compared with these classification-based predictors, the previously proposed predictor IDP-CRF, a sequence labeling model based on conditional random fields (CRFs), exhibits state-of-the-art performance for predicting IDPs/IDRs. Motivated by these methods, we propose a predictor called IDP-FSP, which is an ensemble of three CRF-based predictors called IDP-FSP-L, IDP-FSP-S, and IDP-FSP-G. These three predictors are specially designed to predict long, short, and generic disordered regions, respectively, and they are constructed based on different features. To the best of our knowledge, IDP-FSP is the first predictor that combines a sequence labeling algorithm with IDRs of different lengths. Experimental results using two independent test datasets show that IDP-FSP achieves better or at least comparable predictive performance compared with 26 existing state-of-the-art methods in this field, demonstrating the effectiveness of IDP-FSP.
|
39
|
Salomon O, Barel O, Eyal E, Ganor RS, Kleinbaum Y, Shohat M. c.259A>C in the fibrinogen gene of alpha chain (FGA) is a fibrinogen with thrombotic phenotype. Application of Clinical Genetics 2019; 12:27-33. [PMID: 30881084 PMCID: PMC6400116 DOI: 10.2147/tacg.s190599] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/18/2022]
Abstract
Introduction: Dysfibrinogenemia is a rare inherited disease that results from a mutation in one of the three fibrinogen genes. Diagnosis can be misleading, since the disease may present as a bleeding tendency or as thrombosis and a specific coagulation test for diagnosis is not routinely available. Aim: To search for a new candidate gene for thrombophilia in a family with three generations of arterial and venous thrombosis. Methods: Whole exome sequencing followed by Sanger validation and segregation analysis was carried out. In addition, structural modeling was performed. Screening for thrombophilia along with blood counts, prothrombin time, activated partial thromboplastin, thrombin, reptilase time, and fibrinogen was done in each patient. Results and discussion: A missense variant, c.259A>C, p.K87Q (g.chr4: 155510050A-C) (rs764281241), in the FGA gene was found in all three siblings, none of whom carried any other known thrombophilia marker that could explain the thrombosis. It is predicted to be damaging by six out of seven prediction programs and is very rare in the entire population (ExAC = 0.000008). Conclusion: The occurrence of the c.259A>C mutation in FGA may well explain the thrombosis phenotype of the affected family and is suggested as a new marker for the thrombophilia phenotype.
Affiliation(s)
- Ophira Salomon
- Institute of Thrombosis and Hemostasis, Sheba Medical Center, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ortal Barel
- Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eran Eyal
- Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Reut Shnerb Ganor
- The Bert W. Strassburger Lipid Center, Sheba Medical Center, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Yeroham Kleinbaum
- Diagnostic Imaging, Department of Radiology, Sheba Medical Center, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Mordechai Shohat
- Cancer Research Center, Wohl Institute of Translational Medicine, Sheba Medical Center, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
40
|
Sun Z, Liu Q, Qu G, Feng Y, Reetz MT. Utility of B-Factors in Protein Science: Interpreting Rigidity, Flexibility, and Internal Motion and Engineering Thermostability. Chem Rev 2019; 119:1626-1665. [PMID: 30698416 DOI: 10.1021/acs.chemrev.8b00290] [Citation(s) in RCA: 278] [Impact Index Per Article: 55.6] [Indexed: 12/24/2022]
Affiliation(s)
- Zhoutong Sun
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Qian Liu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ge Qu
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Yan Feng
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Manfred T. Reetz
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, 32 West Seventh Avenue, Tianjin Airport Economic Area, Tianjin 300308, China
- Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany
- Chemistry Department, Philipps-University, Hans-Meerwein-Strasse 4, 35032 Marburg, Germany
|
41
|
Abstract
Intrinsically disordered proteins and regions are involved in a wide range of cellular functions, and they often facilitate protein-protein interactions. Molecular recognition features (MoRFs) are segments of intrinsically disordered regions that bind to partner proteins, where binding is concomitant with a transition to a structured conformation. MoRFs facilitate translation, transport, signaling, and regulatory processes and are found across all domains of life. A popular computational tool, MoRFpred, accurately predicts MoRFs in protein sequences. MoRFpred is implemented as a user-friendly web server that is freely available at http://biomine.cs.vcu.edu/servers/MoRFpred/ . We describe this predictor, explain how to run the web server, and show how to interpret the results it generates. We also demonstrate the utility of this web server based on two case studies, focusing on the relevance of evolutionary conservation of MoRF regions.
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Institute for Biological Instrumentation, Russian Academy of Sciences, Moscow Region, Russia.
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
|
42
|
The novel EHEC gene asa overlaps the TEGT transporter gene in antisense and is regulated by NaCl and growth phase. Sci Rep 2018; 8:17875. [PMID: 30552341 PMCID: PMC6294744 DOI: 10.1038/s41598-018-35756-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/20/2018] [Accepted: 11/08/2018] [Indexed: 12/02/2022] Open
Abstract
Only a few overlapping gene pairs are known in the best-analyzed bacterial model organism Escherichia coli. Automatic annotation programs usually annotate only one out of six reading frames at a locus, allowing only small overlaps between protein-coding sequences. However, both RNAseq and RIBOseq show signals corresponding to non-trivially overlapping reading frames in antisense to annotated genes, which may constitute protein-coding genes. The transcription and translation of the novel 264 nt gene asa, which overlaps in antisense to a putative TEGT (Testis-Enhanced Gene Transfer) transporter gene is detected in pathogenic E. coli, but not in two apathogenic E. coli strains. The gene in E. coli O157:H7 (EHEC) was further analyzed. An overexpression phenotype was identified in two stress conditions, i.e. excess in salt or arginine. For this, EHEC overexpressing asa was grown competitively against EHEC with a translationally arrested asa mutant gene. RT-qPCR revealed conditional expression dependent on growth phase, sodium chloride, and arginine. Two potential promoters were computationally identified and experimentally verified by reporter gene expression and determination of the transcription start site. The protein Asa was verified by Western blot. Close homologues of asa have not been found in protein databases, but bioinformatic analyses showed that it may be membrane associated, having a largely disordered structure.
|
43
|
The state-of-the-art strategies of protein engineering for enzyme stabilization. Biotechnol Adv 2018; 37:530-537. [PMID: 31138425 DOI: 10.1016/j.biotechadv.2018.10.011] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Received: 11/30/2017] [Revised: 10/12/2018] [Accepted: 10/25/2018] [Indexed: 12/11/2022]
Abstract
Enzymes generated by natural recruitment and protein engineering have contributed greatly to a variety of applications. However, their insufficient stability is a bottleneck that limits the rapid development of biocatalysis. Novel approaches based on precise and global structural dissection, advanced gene manipulation, and combination with multidisciplinary techniques open a new horizon for generating stable enzymes efficiently. Here, we comprehensively introduce emerging advances in protein engineering strategies for enzyme stabilization. We then highlight practical cases to show the importance of enzyme stabilization in pharmaceutical and industrial applications. Combining computational enzyme design with molecular evolution holds considerable promise in this field.
|
44
|
Bai Z, Kong X. Extension of the mutation spectrum of PAX6 from three Chinese congenital aniridia families and identification of male gonadal mosaicism. Mol Genet Genomic Med 2018; 6:1053-1067. [PMID: 30334364 PMCID: PMC6305634 DOI: 10.1002/mgg3.481] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/05/2018] [Revised: 08/19/2018] [Accepted: 08/21/2018] [Indexed: 12/29/2022] Open
Abstract
Background: Congenital aniridia is a severe autosomal dominant binocular developmental disorder, the primary feature of which is congenital absence or hypoplasia of the iris. PAX6 is the main disease‐causing gene of congenital aniridia; inheritance is autosomal dominant. However, the currently known mutations do not fully explain this disorder. Methods: We investigated the mutation profile of related genes in three Chinese families with congenital aniridia through targeted sequencing technology, and we validated the candidate variants by PCR‐based Sanger sequencing. Different degrees of impairment of islet function were observed in the patients with aniridia by carbohydrate tolerance and insulin release tests. Results: We identified four novel mutations of PAX6 from three Chinese families with congenital aniridia, which included a heterozygous double mutation, c.879_880delCA (p.S294Cfs*46) and c.1124C>G (p.P375R), in Family 1 with three patients, the heterozygous frameshift mutation c.308delG (p.P103Qfs*21) in Family 2 with one patient, and c.1192delT (p.S398Pfs*126) in Family 3 with two patients. The three frameshift mutations of PAX6 co‐segregate with aniridia in the families, but the novel missense mutation does not co‐segregate with the phenotype. The frameshift mutations in Family 1 and Family 2 truncate the protein, whereas the frameshift mutation in Family 3 prolongs it. We confirmed the phenomenon of male gonadal mosaicism of PAX6 by sequencing two linked novel mutations in Family 1. Clinical tests showed that most of the patients with isolated aniridia have different degrees of islet damage. Conclusion: We found different types of pathogenic frameshift mutation, with the effect of either truncating or prolonging the protein. The results of this study extend the pathogenic mutation spectrum of PAX6 for congenital aniridia and demonstrate male germline chimerism by molecular experiments.
Affiliation(s)
- Zhouxian Bai
- Genetic and Prenatal Diagnosis Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Xiangdong Kong
- Genetic and Prenatal Diagnosis Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
|
45
|
Vera R, Synsmir-Zizzamia M, Ojinnaka S, Snyder DA. Prediction of protein flexibility using a conformationally restrained contact map. Proteins 2018; 86:1111-1116. [DOI: 10.1002/prot.25591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/29/2017] [Revised: 06/29/2018] [Accepted: 08/05/2018] [Indexed: 12/30/2022]
Affiliation(s)
- Rebecca Vera
- Department of Biology and Physical Sciences, Passaic County Community College, Paterson, New Jersey
- Department of Biological Sciences, Rutgers University, Newark, New Jersey
- Melissa Synsmir-Zizzamia
- Department of Chemistry, Union County College, Cranford, New Jersey
- Department of Chemistry, The College of New Jersey, Ewing Township, New Jersey
- Sarah Ojinnaka
- Department of Chemistry, College of Science and Health, William Paterson University of New Jersey, Wayne, New Jersey
- David A. Snyder
- Department of Chemistry, College of Science and Health, William Paterson University of New Jersey, Wayne, New Jersey
|
46
|
Liu Y, Wang X, Liu B. IDP-CRF: Intrinsically Disordered Protein/Region Identification Based on Conditional Random Fields. Int J Mol Sci 2018; 19:E2483. [PMID: 30135358 PMCID: PMC6164615 DOI: 10.3390/ijms19092483] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/21/2018] [Revised: 08/14/2018] [Accepted: 08/18/2018] [Indexed: 12/16/2022] Open
Abstract
Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. Efficiently incorporating the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we propose a new computational predictor called IDP-CRF, which is trained on an updated benchmark dataset based on the MobiDB database and the DisProt database, and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), kmer, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP-CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP-CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP-CRF will facilitate the development of protein sequence analysis.
Affiliation(s)
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, Guangdong, China.
|
47
|
Cao Y, Liu D, Zhang WB. Supercharging SpyCatcher toward an intrinsically disordered protein with stimuli-responsive chemical reactivity. Chem Commun (Camb) 2018; 53:8830-8833. [PMID: 28692103 DOI: 10.1039/c7cc04507g] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 12/23/2022]
Abstract
We report a supercharged, intrinsically disordered protein, SpyCatcher(-), possessing stimuli-responsive reactivity toward SpyTag with tunable yields ranging from 4% to 98% depending on pH, temperature, ionic strength, etc. The CD and NMR studies reveal that the reaction occurs through a folded intermediate formed probably via a different mechanism from that of SpyCatcher.
Affiliation(s)
- Yang Cao
- Key Laboratory of Polymer Chemistry & Physics of Ministry of Education, Center for Soft Matter Science and Engineering, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, P. R. China.
|
48
|
Dai W, Usami Y, Wu Y, Göttlinger H. A Long Cytoplasmic Loop Governs the Sensitivity of the Anti-viral Host Protein SERINC5 to HIV-1 Nef. Cell Rep 2018; 22:869-875. [PMID: 29386131 PMCID: PMC5810964 DOI: 10.1016/j.celrep.2017.12.082] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 09/07/2017] [Revised: 12/05/2017] [Accepted: 12/22/2017] [Indexed: 12/14/2022] Open
Abstract
We recently identified the multipass transmembrane protein SERINC5 as an antiviral protein that can potently inhibit HIV-1 infectivity and is counteracted by HIV-1 Nef. We now report that the anti-HIV-1 activity, but not the sensitivity to Nef, is conserved among vertebrate SERINC5 proteins. However, a Nef-resistant SERINC5 became Nef sensitive when its intracellular loop 4 (ICL4) was replaced by that of Nef-sensitive human SERINC5. Conversely, human SERINC5 became resistant to Nef when its ICL4 was replaced by that of a Nef-resistant SERINC5. In general, ICL4 regions from SERINCs that exhibited resistance to a given Nef conferred resistance to the same Nef when transferred to a sensitive SERINC, and vice versa. Our results establish that human SERINC5 can be modified to restrict HIV-1 infectivity even in the presence of Nef.
Affiliation(s)
- Weiwei Dai
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Yoshiko Usami
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Yuanfei Wu
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Heinrich Göttlinger
- Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
|
49
|
Hanson J, Yang Y, Paliwal K, Zhou Y. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks. Bioinformatics 2017; 33:685-692. [PMID: 28011771 DOI: 10.1093/bioinformatics/btw678] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Received: 06/07/2016] [Accepted: 10/26/2016] [Indexed: 11/12/2022] Open
Abstract
Motivation: Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenge in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification by remembering useful past information in long sequential events. Here, we have applied deep bidirectional LSTM recurrent neural networks to the problem of protein intrinsic disorder prediction. Results: The new method, named SPOT-Disorder, consistently improves on a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested, without separate training on short and long disordered regions. Independent tests on four other datasets, including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness of combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications. Availability and Implementation: SPOT-Disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php . Contact: j.hanson@griffith.edu.au or yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane 4122, Australia
- Yuedong Yang
- Institute for Glycomics, Griffith University, Gold Coast 4215, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane 4122, Australia
- Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast 4215, Australia
|
50
|
Meng F, Uversky VN, Kurgan L. Comprehensive review of methods for prediction of intrinsic disorder and its molecular functions. Cell Mol Life Sci 2017; 74:3069-3090. [PMID: 28589442 PMCID: PMC11107660 DOI: 10.1007/s00018-017-2555-4] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Received: 05/18/2017] [Accepted: 06/01/2017] [Indexed: 12/19/2022]
Abstract
Computational prediction of intrinsic disorder in protein sequences dates back to the late 1970s and has flourished in the last two decades. We provide a brief historical overview, and we review over 30 recent predictors of disorder. We are the first to also cover predictors of molecular functions of disorder, including 13 methods that focus on disordered linkers and disordered protein-protein, protein-RNA, and protein-DNA binding regions. We give an overview of their predictive models, usability, and predictive performance. We highlight the newest methods and the predictors that offer strong predictive performance in recent comparative assessments. We conclude that modern predictors are relatively accurate, enjoy widespread use, and are in many cases fast. Their predictions are conveniently accessible to end users via web servers and databases that store pre-computed predictions for millions of proteins. However, developing methods that predict the many functions of intrinsic disorder not yet addressed remains an outstanding challenge.
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Vladimir N Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, USA.
|