1
Boeck L, Burbaud S, Skwark M, Pearson WH, Sangen J, Wuest AW, Marshall EKP, Weimann A, Everall I, Bryant JM, Malhotra S, Bannerman BP, Kierdorf K, Blundell TL, Dionne MS, Parkhill J, Andres Floto R. Mycobacterium abscessus pathogenesis identified by phenogenomic analyses. Nat Microbiol 2022; 7:1431-1441. [PMID: 36008617 PMCID: PMC9418003 DOI: 10.1038/s41564-022-01204-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/14/2021] [Accepted: 07/19/2022] [Indexed: 12/12/2022]
Abstract
The medical and scientific response to emerging and established pathogens is often severely hampered by ignorance of the genetic determinants of virulence, drug resistance and clinical outcomes that could be used to identify therapeutic drug targets and forecast patient trajectories. Taking the newly emergent multidrug-resistant bacteria Mycobacterium abscessus as an example, we show that combining high-dimensional phenotyping with whole-genome sequencing in a phenogenomic analysis can rapidly reveal actionable systems-level insights into bacterial pathobiology. Through phenotyping of 331 clinical isolates, we discovered three distinct clusters of isolates, each with different virulence traits and associated with a different clinical outcome. We combined genome-wide association studies with proteome-wide computational structural modelling to define likely causal variants, and employed direct coupling analysis to identify co-evolving, and therefore potentially epistatic, gene networks. We then used in vivo CRISPR-based silencing to validate our findings and discover clinically relevant M. abscessus virulence factors including a secretion system, thus illustrating how phenogenomics can reveal critical pathways within emerging pathogenic bacteria.
Affiliation(s)
- Lucas Boeck
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, MRC Laboratory of Molecular Biology, Cambridge, UK
  - Cambridge Centre for AI in Medicine, Cambridge, UK
  - Wellcome Sanger Institute, Hinxton, UK
  - Department of Biomedicine, University of Basel, Basel, Switzerland
- Sophie Burbaud
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, MRC Laboratory of Molecular Biology, Cambridge, UK
  - Cambridge Centre for AI in Medicine, Cambridge, UK
- Marcin Skwark
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Will H Pearson
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
  - Department of Life Sciences, Imperial College London, London, UK
- Jasper Sangen
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, MRC Laboratory of Molecular Biology, Cambridge, UK
  - Cambridge Centre for AI in Medicine, Cambridge, UK
- Andreas W Wuest
  - Department of Biomedicine, University of Basel, Basel, Switzerland
- Eleanor K P Marshall
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
  - Department of Life Sciences, Imperial College London, London, UK
- Aaron Weimann
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, MRC Laboratory of Molecular Biology, Cambridge, UK
  - Cambridge Centre for AI in Medicine, Cambridge, UK
- Josephine M Bryant
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, MRC Laboratory of Molecular Biology, Cambridge, UK
  - Cambridge Centre for AI in Medicine, Cambridge, UK
- Sony Malhotra
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
  - Scientific Computing Department, Science and Technology Facilities Council, Harwell, UK
- Bridget P Bannerman
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, MRC Laboratory of Molecular Biology, Cambridge, UK
  - Cambridge Centre for AI in Medicine, Cambridge, UK
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Katrin Kierdorf
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
  - Department of Life Sciences, Imperial College London, London, UK
  - Institute of Neuropathology, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Tom L Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Marc S Dionne
  - MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London, UK
  - Department of Life Sciences, Imperial College London, London, UK
- Julian Parkhill
  - Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- R Andres Floto
  - Molecular Immunity Unit, University of Cambridge Department of Medicine, MRC Laboratory of Molecular Biology, Cambridge, UK
  - Cambridge Centre for AI in Medicine, Cambridge, UK
  - Cambridge Centre for Lung Infection, Royal Papworth Hospital, Cambridge, UK
2
Katuwawala A, Ghadermarzi S, Hu G, Wu Z, Kurgan L. QUARTERplus: Accurate disorder predictions integrated with interpretable residue-level quality assessment scores. Comput Struct Biotechnol J 2021; 19:2597-2606. [PMID: 34025946 PMCID: PMC8122155 DOI: 10.1016/j.csbj.2021.04.066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/09/2021] [Revised: 04/24/2021] [Accepted: 04/24/2021] [Indexed: 12/13/2022]
Abstract
A recent advance in the disorder prediction field is the development of quality assessment (QA) scores. QA scores complement the propensities produced by the disorder predictors by identifying regions where these predictions are more likely to be correct. We develop, empirically test and release a new QA tool, QUARTERplus, that addresses several key drawbacks of the current QA method, QUARTER. QUARTERplus is the first solution that utilizes QA scores and the associated input disorder predictions to produce very accurate disorder predictions with the help of a modern deep learning meta-model. The deep neural network utilizes the QA scores to identify and fix the regions where the original/input disorder predictions are poor. More importantly, QUARTERplus's accurate predictions are accompanied by easy-to-interpret residue-level QA scores that reliably quantify their residue-level predictive quality. We provide these interpretable QA scores for QUARTERplus and 10 other popular disorder predictors. Empirical tests on a large and independent (low similarity) test dataset show that QUARTERplus predictions secure AUC = 0.93 and are statistically more accurate than the results of twelve state-of-the-art disorder predictors. We also demonstrate that the new QA scores produced by QUARTERplus are highly correlated with the actual predictive quality and that they can be effectively used to identify regions of correct disorder predictions. This feature empowers the users to easily identify which parts of the predictions generated by the modern disorder predictors are more trustworthy. QUARTERplus is available as a convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTERplus/.
Affiliation(s)
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Sina Ghadermarzi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Gang Hu
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
3
Studer G, Rempfer C, Waterhouse AM, Gumienny R, Haas J, Schwede T. QMEANDisCo-distance constraints applied on model quality estimation. Bioinformatics 2020; 36:1765-1771. [PMID: 31697312 PMCID: PMC7075525 DOI: 10.1093/bioinformatics/btz828] [Citation(s) in RCA: 447] [Impact Index Per Article: 111.8] [Received: 06/19/2019] [Revised: 10/24/2019] [Accepted: 11/06/2019] [Indexed: 01/13/2023]
Abstract
Motivation: Methods that estimate the quality of a 3D protein structure model in absence of an experimental reference structure are crucial to determine a model’s utility and potential applications. Single model methods assess individual models whereas consensus methods require an ensemble of models as input. In this work, we extend the single model composite score QMEAN that employs statistical potentials of mean force and agreement terms by introducing a consensus-based distance constraint (DisCo) score. Results: DisCo exploits distance distributions from experimentally determined protein structures that are homologous to the model being assessed. Feed-forward neural networks are trained to adaptively weigh contributions by the multi-template DisCo score and classical single model QMEAN parameters. The result is the composite score QMEANDisCo, which combines the accuracy of consensus methods with the broad applicability of single model approaches. We also demonstrate that, despite being the de-facto standard for structure prediction benchmarking, CASP models are not the ideal data source to train predictive methods for model quality estimation. For performance assessment, QMEANDisCo is continuously benchmarked within the CAMEO project and participated in CASP13. For both, it ranks among the top performers and excels with low response times. Availability and implementation: QMEANDisCo is available as web-server at https://swissmodel.expasy.org/qmean. The source code can be downloaded from https://git.scicore.unibas.ch/schwede/QMEAN. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriel Studer
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Christine Rempfer
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Andrew M Waterhouse
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Rafal Gumienny
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Juergen Haas
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel 4056, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
4
Hu G, Wu Z, Oldfield CJ, Wang C, Kurgan L. Quality assessment for the putative intrinsic disorder in proteins. Bioinformatics 2020; 35:1692-1700. [PMID: 30329008 DOI: 10.1093/bioinformatics/bty881] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 06/07/2018] [Revised: 09/19/2018] [Accepted: 10/15/2018] [Indexed: 11/15/2022]
Abstract
Motivation: While putative intrinsic disorder is widely used, none of the predictors provides quality assessment (QA) scores. QA scores estimate the likelihood that predictions are correct at a residue level and have been applied in other bioinformatics areas. We recently reported that QA scores derived from putative disorder propensities perform relatively poorly for native disordered residues. Here we design and validate a general approach to construct QA predictors for disorder predictions. Results: The QUARTER (QUality Assessment for pRotein inTrinsic disordEr pRedictions) toolbox of methods accommodates a diverse set of ten disorder predictors. It builds upon several innovative design elements including use and scaling of selected physicochemical properties of the input sequence, post-processing of disorder propensity scores, and a feature selection that optimizes the predictive models to a specific disorder predictor. We empirically establish that each one of these elements contributes to the overall predictive performance of our tool and that QUARTER's outputs significantly outperform QA scores derived from the outputs generated by the disorder predictors. The best performing QA scores for a single disorder predictor identify 13% of residues that are predicted with 98% precision. QA scores computed by combining results of the ten disorder predictors cover 40% of residues with 95% precision. Case studies are used to show how to interpret the QA scores. QA scores based on the high precision combined predictions are applied to analyze disorder in the human proteome. Availability and implementation: http://biomine.cs.vcu.edu/servers/QUARTER/. Supplementary information: Supplementary data are available at Bioinformatics online.
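The coverage figures quoted in this abstract (for example, 13% of residues at 98% precision) can be read as a coverage-at-precision measurement: rank residues by QA score and find the largest top-ranked fraction whose disorder predictions are still correct at the target rate. The sketch below illustrates that reading on toy data. The function name `coverage_at_precision`, its inputs and the numbers are hypothetical and are not part of QUARTER.

```python
from typing import Sequence


def coverage_at_precision(qa_scores: Sequence[float],
                          is_correct: Sequence[bool],
                          target_precision: float) -> float:
    """Return the largest fraction of residues, taken in order of
    decreasing QA score, whose predictions meet the target precision."""
    if len(qa_scores) != len(is_correct):
        raise ValueError("qa_scores and is_correct must have equal length")
    if not qa_scores:
        return 0.0
    # Rank residues from most to least trusted (ties keep input order).
    order = sorted(range(len(qa_scores)),
                   key=lambda i: qa_scores[i], reverse=True)
    best = 0
    correct = 0
    for rank, idx in enumerate(order, start=1):
        correct += bool(is_correct[idx])
        # Remember the deepest cut-off that still meets the target.
        if correct / rank >= target_precision:
            best = rank
    return best / len(qa_scores)


# Toy data (hypothetical): five residues, three predicted correctly.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
correct = [True, True, False, True, False]
print(coverage_at_precision(scores, correct, 0.75))  # prints 0.8
```

Raising the target precision shrinks the trusted fraction: with a target of 1.0 the same toy data yields 0.4.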
Affiliation(s)
- Gang Hu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People's Republic of China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People's Republic of China
- Chen Wang
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
5
Abstract
Intrinsically disordered regions (IDRs) are estimated to be highly abundant in nature. While only several thousand proteins are annotated with experimentally derived IDRs, computational methods can be used to predict IDRs for the millions of currently uncharacterized protein chains. Several dozen disorder predictors were developed over the last few decades. While some of these methods provide accurate predictions, unavoidably they also make some mistakes. Consequently, one of the challenges facing users of these methods is how to decide which predictions can be trusted and which are likely incorrect. This practical problem can be solved using quality assessment (QA) scores that predict correctness of the underlying (disorder) predictions at a residue level. We motivate and describe a first-of-its-kind toolbox of QA methods, QUARTER (QUality Assessment for pRotein inTrinsic disordEr pRedictions), which provides the scores for a diverse set of ten disorder predictors. QUARTER is available to the end users as a free and convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTER/ . We briefly describe the predictive architecture of QUARTER and provide detailed instructions on how to use the webserver. We also explain how to interpret results produced by QUARTER with the help of a case study.
6
Sala D, Cerofolini L, Fragai M, Giachetti A, Luchinat C, Rosato A. A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and NMR ambiguous contacts. Comput Struct Biotechnol J 2019; 18:114-124. [PMID: 31969972 PMCID: PMC6961069 DOI: 10.1016/j.csbj.2019.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/22/2019] [Revised: 11/20/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022]
Abstract
Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.
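The filtering step described in this abstract, dropping evolutionary couplings (ECs) already explained by the monomer and keeping those matched by an ambiguous experimental contact, can be pictured as set operations on residue pairs. The sketch below is only an illustration of that idea, not the authors' protocol: the 8 Å cutoff, the function name `candidate_interface_pairs`, the data layout and the treatment of pairs without a known distance are all assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]


def _norm(pair: Pair) -> Pair:
    """Order a residue pair so that (i, j) and (j, i) compare equal."""
    i, j = pair
    return (i, j) if i <= j else (j, i)


def candidate_interface_pairs(ecs: Iterable[Pair],
                              monomer_dist: Dict[Pair, float],
                              nmr_contacts: Iterable[Pair],
                              cutoff: float = 8.0) -> List[Pair]:
    """Keep ECs that are not satisfied within the monomer (distance above
    the assumed cutoff) and that also appear among the NMR contacts."""
    # Normalise distance keys too, so lookups do not depend on pair order.
    dist = {_norm(p): d for p, d in monomer_dist.items()}
    observed: Set[Pair] = {_norm(p) for p in nmr_contacts}
    kept: List[Pair] = []
    for pair in map(_norm, ecs):
        # Assumption: a pair with no known distance counts as unsatisfied.
        intra = dist.get(pair, float("inf"))
        if intra > cutoff and pair in observed and pair not in kept:
            kept.append(pair)
    return kept


# Toy data (hypothetical residue numbers and distances in angstroms).
ecs = [(10, 45), (12, 80), (80, 12), (3, 7)]
monomer_dist = {(10, 45): 5.2, (12, 80): 21.0, (3, 7): 30.0}
nmr = [(80, 12), (10, 45)]
print(candidate_interface_pairs(ecs, monomer_dist, nmr))  # [(12, 80)]
```

In this toy case (10, 45) is rejected because the monomer already satisfies it, and (3, 7) is rejected because no experimental contact supports it.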
Affiliation(s)
- Davide Sala
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Linda Cerofolini
  - Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Marco Fragai
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Andrea Giachetti
  - Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Claudio Luchinat
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
7
Skwark MJ, Torres PHM, Copoiu L, Bannerman B, Floto RA, Blundell TL. Mabellini: a genome-wide database for understanding the structural proteome and evaluating prospective antimicrobial targets of the emerging pathogen Mycobacterium abscessus. Database (Oxford) 2019; 2019:5611286. [PMID: 31681953 PMCID: PMC6853642 DOI: 10.1093/database/baz113] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/03/2019] [Revised: 07/31/2019] [Accepted: 08/28/2019] [Indexed: 02/02/2023]
Abstract
Mycobacterium abscessus, a rapidly growing, multidrug-resistant, nontuberculous mycobacterium, can cause a wide range of opportunistic infections, particularly in immunocompromised individuals. M. abscessus has emerged as a growing threat to patients with cystic fibrosis, where it causes accelerated inflammatory lung damage, is difficult and sometimes impossible to treat and can prevent safe transplantation. There is therefore an urgent unmet need to develop new therapeutic strategies. The elucidation of the M. abscessus genome in 2009 opened a wide range of research possibilities in the field of drug discovery that can be more effectively exploited upon the characterization of the structural proteome. Where there are no experimental structures, we have used the available amino acid sequences to create 3D models of the majority of the remaining proteins that constitute the M. abscessus proteome (3394 proteins and over 13 000 models) using a range of up-to-date computational tools, many developed by our own group. The models are freely available for download in an on-line database, together with quality data and functional annotation. Furthermore, we have developed an intuitive and user-friendly web interface (http://www.mabellinidb.science) that enables easy browsing, querying and retrieval of the proteins of interest. We believe that this resource will be of use in evaluating the prospective targets for design of antimicrobial agents and will serve as a cornerstone to support the development of new molecules to treat M. abscessus infections.
Affiliation(s)
- Marcin J Skwark
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Pedro H M Torres
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Liviu Copoiu
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Bridget Bannerman
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- R Andres Floto
  - Molecular Immunity Unit, Department of Medicine University of Cambridge, MRC-Laboratory of Molecular Biology, Cambridge CB2 0QH, UK
  - Cambridge Centre for Lung Infection, Royal Papworth Hospital, Cambridge CB23 3RE, UK
- Tom L Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK (corresponding author; Tel: +44 1223 333628; Fax: +44 1223 766002)
8
Sato R, Ishida T. Protein model accuracy estimation based on local structure quality assessment using 3D convolutional neural network. PLoS One 2019; 14:e0221347. [PMID: 31487288 PMCID: PMC6728020 DOI: 10.1371/journal.pone.0221347] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/12/2019] [Accepted: 08/05/2019] [Indexed: 11/23/2022]
Abstract
In protein tertiary structure prediction, model quality assessment programs (MQAPs) are often used to select the final structural models from a pool of candidate models generated by multiple templates and prediction methods. The 3-dimensional convolutional neural network (3DCNN) is an expansion of the 2DCNN and has been applied in several fields, including object recognition. The 3DCNN is also used for MQA tasks, but the performance is low due to several technical limitations related to protein tertiary structures, such as orientation alignment. We proposed a novel single-model MQA method based on local structure quality evaluation using a deep neural network containing 3DCNN layers. The proposed method first assesses the quality of local structures for each residue and then evaluates the quality of whole structures by integrating estimated local qualities. We analyzed the model using the CASP11, CASP12, and 3D-Robot datasets and compared the performance of the model with that of the previous 3DCNN method based on whole protein structures. The proposed method showed a significant improvement compared to the previous 3DCNN method for multiple evaluation measures. We also compared the proposed method to other state-of-the-art methods. Our method showed better performance than the previous 3DCNN-based method and accuracy comparable to that of the current best single-model methods; in particular, in CASP11 stage2, our method showed a Pearson coefficient of 0.486, which was better than those of the best single-model methods (0.366–0.405). A standalone version of the proposed method and data files are available at https://github.com/ishidalab-titech/3DCNN_MQA.
Affiliation(s)
- Rin Sato
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
- Takashi Ishida
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
9
Katuwawala A, Peng Z, Yang J, Kurgan L. Computational Prediction of MoRFs, Short Disorder-to-order Transitioning Protein Binding Regions. Comput Struct Biotechnol J 2019; 17:454-462. [PMID: 31007871 PMCID: PMC6453775 DOI: 10.1016/j.csbj.2019.03.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 02/06/2019] [Revised: 03/22/2019] [Accepted: 03/23/2019] [Indexed: 12/28/2022]
Abstract
Molecular recognition features (MoRFs) are short protein-binding regions that undergo disorder-to-order transitions (induced folding) upon binding protein partners. These regions are abundant in nature and can be predicted from protein sequences based on their distinctive sequence signatures. This first-of-its-kind survey covers 14 MoRF predictors and six related methods for the prediction of short protein-binding linear motifs, disordered protein-binding regions and semi-disordered regions. We show that the development of MoRF predictors has accelerated in recent years. These predictors depend on machine learning-derived models that were generated using training datasets where MoRFs are annotated using putative disorder. Our analysis reveals that they generate accurate predictions. We identified eight methods that offer area under the ROC curve (AUC) ≥ 0.7 on experimentally-validated test datasets. We show that modern MoRF predictors accurately find experimentally annotated MoRFs even though they were trained using the putative disorder annotations. They are relatively highly-cited, particularly the methods available as webservers that on average secure three times more citations than methods without this option. MoRF predictions contribute to the experimental discovery of protein-protein interactions, annotation of protein functions and computational analysis of a variety of proteomes, protein families, and pathways. We outline future development and application directions for these tools, stressing the importance of developing novel tools that would target interactions of disordered regions with other types of partners.
Affiliation(s)
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, USA
- Zhenling Peng
  - Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, USA
10
Uziela K, Menéndez Hurtado D, Shu N, Wallner B, Elofsson A. Improved protein model quality assessments by changing the target function. Proteins 2018. [PMID: 29524250 DOI: 10.1002/prot.25492] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022]
Abstract
Protein modeling quality is an important part of protein structure prediction. We have for more than a decade developed a set of methods for this problem. We have used various types of description of the protein and different machine learning methodologies. However, common to all these methods has been the target function used for training. The target function in ProQ describes the local quality of a residue in a protein model. In all versions of ProQ the target function has been the S-score. However, other quality estimation functions also exist, which can be divided into superposition- and contact-based methods. The superposition-based methods, such as S-score, are based on a rigid body superposition of a protein model and the native structure, while the contact-based methods compare the local environment of each residue. Here, we examine the effects of retraining our latest predictor, ProQ3D, using identical inputs but different target functions. We find that the contact-based methods are easier to predict and that predictors trained on these measures provide some advantages when it comes to identifying the best model. One possible reason for this is that contact-based methods are better at estimating the quality of multi-domain targets. However, training on the S-score gives the best correlation with the GDT_TS score, which is commonly used in CASP to score the global model quality. To take advantage of both of these features we provide an updated version of ProQ3D that predicts local and global model quality estimates based on different quality estimates.
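For orientation, the superposition-based S-score named in this abstract is commonly written as S_i = 1 / (1 + (d_i / d0)^2), where d_i is the deviation of residue i from its native position after superposition and d0 is a distance scale. The sketch below assumes that form. The value d0 = 3.0 Å and the function names are assumptions made for illustration; the constant actually used by a given ProQ version should be taken from its documentation.

```python
from typing import Sequence


def s_score(distance: float, d0: float = 3.0) -> float:
    """Local S-score for one residue: 1 at zero deviation, approaching 0
    as the deviation grows. d0 (in angstroms) is an assumed scale."""
    if distance < 0 or d0 <= 0:
        raise ValueError("distance must be non-negative and d0 positive")
    return 1.0 / (1.0 + (distance / d0) ** 2)


def global_s_score(distances: Sequence[float], d0: float = 3.0) -> float:
    """Average the local scores into a single global estimate."""
    if not distances:
        raise ValueError("at least one residue distance is required")
    return sum(s_score(d, d0) for d in distances) / len(distances)


print(s_score(0.0))                 # 1.0
print(s_score(3.0))                 # 0.5
print(global_s_score([0.0, 3.0]))   # 0.75
```

Because the score saturates for large deviations, a badly placed domain cannot drag the global average below zero, which is one reason bounded per-residue scores are convenient training targets.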
Affiliation(s)
- Karolis Uziela
  - Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- David Menéndez Hurtado
  - Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- Nanjiang Shu
  - Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
  - Bioinformatics Short-term Support and Infrastructure (BILS), Science for Life Laboratory, Solna, Sweden
- Björn Wallner
  - Department of Physics, Chemistry and Biology (IFM)/Bioinformatics, Linköping University, Linköping, Sweden
- Arne Elofsson
  - Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
11
Manavalan B, Lee J. SVMQA: support-vector-machine-based protein single-model quality assessment. Bioinformatics 2018; 33:2496-2503. [PMID: 28419290 DOI: 10.1093/bioinformatics/btx222] [Citation(s) in RCA: 130] [Impact Index Per Article: 21.7] [Received: 12/15/2016] [Accepted: 04/12/2017] [Indexed: 01/03/2023]
Abstract
Motivation: The accurate ranking of predicted structural models and selecting the best model from a given candidate pool remain open problems in the field of structural bioinformatics. The quality assessment (QA) methods used to address these problems can be grouped into two categories: consensus methods and single-model methods. Consensus methods in general perform better and attain higher correlation between predicted and true quality measures. However, these methods frequently fail to generate proper quality scores for native-like structures which are distinct from the rest of the pool. Conversely, single-model methods do not suffer from this drawback and are better suited for real-life applications where many models from various sources may not be readily available. Results: In this study, we developed a support-vector-machine-based single-model global quality assessment (SVMQA) method. For a given protein model, the SVMQA method predicts TM-score and GDT_TS score based on a feature vector containing statistical potential energy terms and consistency-based terms between the actual structural features (extracted from the three-dimensional coordinates) and predicted values (from primary sequence). We trained SVMQA using CASP8, CASP9 and CASP10 targets and determined the machine parameters by 10-fold cross-validation. We evaluated the performance of our SVMQA method on various benchmarking datasets. Results show that SVMQA outperformed the existing best single-model QA methods both in ranking provided protein models and in selecting the best model from the pool. According to the CASP12 assessment, SVMQA was the best method in selecting good-quality models from decoys in terms of GDTloss. Availability and implementation: The SVMQA method can be freely downloaded from http://lee.kias.re.kr/SVMQA/SVMQA_eval.tar.gz. Contact: jlee@kias.re.kr. Supplementary information: Supplementary data are available at Bioinformatics online.
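GDTloss, the selection metric mentioned at the end of this abstract, is usually understood as the gap between the GDT_TS of the best model in the pool and the GDT_TS of the model that a QA method ranks first. The sketch below assumes that reading. The function name `gdt_loss` and the toy numbers are hypothetical and do not come from SVMQA or the CASP12 assessment.

```python
from typing import Sequence


def gdt_loss(predicted_quality: Sequence[float],
             true_gdt_ts: Sequence[float]) -> float:
    """Difference between the best available GDT_TS in the pool and the
    GDT_TS of the model with the highest predicted quality."""
    if len(predicted_quality) != len(true_gdt_ts):
        raise ValueError("both sequences must describe the same models")
    if not predicted_quality:
        raise ValueError("the model pool must not be empty")
    # Index of the model the QA method would select.
    top = max(range(len(predicted_quality)),
              key=lambda i: predicted_quality[i])
    return max(true_gdt_ts) - true_gdt_ts[top]


# Toy pool (hypothetical): the QA method prefers the second model,
# although the first one is actually the closest to the native structure.
predicted = [0.62, 0.71, 0.55]
actual = [70.0, 64.5, 58.0]
print(gdt_loss(predicted, actual))  # prints 5.5
```

A loss of zero means the method picked the best model in the pool; the metric says nothing about how well the remaining models were ranked.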
Affiliation(s)
- Balachandran Manavalan
  - Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
- Jooyoung Lee
  - Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
12
Elofsson A, Joo K, Keasar C, Lee J, Maghrabi AHA, Manavalan B, McGuffin LJ, Ménendez Hurtado D, Mirabello C, Pilstål R, Sidi T, Uziela K, Wallner B. Methods for estimation of model accuracy in CASP12. Proteins 2017; 86 Suppl 1:361-373. [DOI: 10.1002/prot.25395] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 05/29/2017] [Revised: 09/25/2017] [Accepted: 10/03/2017] [Indexed: 12/28/2022]
Affiliation(s)
- Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory; Stockholm University, Box 1031; Solna 171 21 Sweden
| | - Keehyoung Joo
- Center for In Silico Protein Science and Center for Advanced Computation; Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Chen Keasar
- Department of Computer Science; Ben Gurion University of the Negev; Israel
| | - Jooyoung Lee
- Center for In Silico Protein Science and School of Computational Sciences; Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Ali H. A. Maghrabi
- School of Biological Sciences; University of Reading, Whiteknights, Reading; RG6 6AS United Kingdom
| | - Balachandran Manavalan
- Center for In Silico Protein Science and School of Computational Sciences; Korea Institute for Advanced Study; Seoul 130-722 Korea
| | - Liam J. McGuffin
- School of Biological Sciences; University of Reading, Whiteknights, Reading; RG6 6AS United Kingdom
| | - David Ménendez Hurtado
- Department of Biochemistry and Biophysics and Science for Life Laboratory; Stockholm University, Box 1031; Solna 171 21 Sweden
| | - Claudio Mirabello
- Department of Physics, Chemistry, and Biology, Bioinformatics Division; Linköping University; Linköping 581 83 Sweden
| | - Robert Pilstål
- Department of Physics, Chemistry, and Biology, Bioinformatics Division; Linköping University; Linköping 581 83 Sweden
| | - Tomer Sidi
- Department of Computer Science; Ben Gurion University of the Negev; Israel
| | - Karolis Uziela
- Department of Biochemistry and Biophysics and Science for Life Laboratory; Stockholm University, Box 1031; Solna 171 21 Sweden
| | - Björn Wallner
- Department of Physics, Chemistry, and Biology, Bioinformatics Division; Linköping University; Linköping 581 83 Sweden
| |
|
13
|
Lam SD, Das S, Sillitoe I, Orengo C. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences. Acta Crystallogr D Struct Biol 2017; 73:628-640. [PMID: 28777078 PMCID: PMC5571743 DOI: 10.1107/s2059798317008920] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 11/17/2016] [Accepted: 06/14/2017] [Indexed: 12/02/2022] Open
Abstract
Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited.
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- School of Biosciences and Biotechnology, Faculty of Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
| | - Sayoni Das
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Christine Orengo
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| |
|
14
|
Wu Z, Hu G, Wang K, Kurgan L. Exploratory Analysis of Quality Assessment of Putative Intrinsic Disorder in Proteins. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING 2017. [DOI: 10.1007/978-3-319-59063-9_65] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
|
15
|
Adamczak R, Meller J. UQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data. BMC Bioinformatics 2016; 17:546. [PMID: 28031034 PMCID: PMC5198500 DOI: 10.1186/s12859-016-1381-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/02/2016] [Accepted: 11/23/2016] [Indexed: 12/01/2022] Open
Abstract
Background: Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank, also implies large scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of structure comparison and clustering for large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and development of efficient software solutions. Results: uQlust is a versatile and easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. uQlust makes use of structural profiles of proteins and nucleic acids, while combining a linear-time algorithm for implicit comparison of all pairs of models with profile hashing to enable efficient clustering of large data sets with a low memory footprint. In addition to ranking and clustering of large sets of models of the same protein or RNA molecule, uQlust can also be used in conjunction with fragment-based profiles in order to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, thus opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust. Conclusion: uQlust represents a drastic reduction in the computational complexity and memory requirements with respect to existing clustering and model quality assessment methods for macromolecular structure analysis, while yielding results on par with traditional approaches for both proteins and RNAs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1381-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rafal Adamczak
- Department of Informatics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland.
| | - Jarek Meller
- Department of Informatics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100, Torun, Poland; Departments of Environmental Health and Electrical Engineering & Computing Systems, University of Cincinnati, Cincinnati, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA.
| |
|
16
|
Jing X, Wang K, Lu R, Dong Q. Sorting protein decoys by machine-learning-to-rank. Sci Rep 2016; 6:31571. [PMID: 27530967 PMCID: PMC4987638 DOI: 10.1038/srep31571] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/21/2016] [Accepted: 07/26/2016] [Indexed: 11/18/2022] Open
Abstract
Much progress has been made in protein structure prediction during the last few decades. As the predicted models can span a broad range of accuracy, the accuracy of quality estimation becomes one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods can be roughly divided into three categories: single-model methods, clustering-based methods and quasi single-model methods. In this study, we first develop a single-model method, MQAPRank, based on the learning-to-rank algorithm, and then implement a quasi single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms the other methods whose outputs are taken as its features, and that the quasi single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, the Quasi-MQAPRank method achieves considerable performance on the CASP11 Best150 dataset.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai 200433, People’s Republic of China
| | - Kai Wang
- College of Animal Science and Technology, Jilin Agricultural University, Changchun 130118, People’s Republic of China
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai 200433, People’s Republic of China
| | - Qiwen Dong
- Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, People’s Republic of China
| |
|
17
|
Dyrka W, Kurczyńska M, Konopka BM, Kotulska M. Fast assessment of structural models of ion channels based on their predicted current-voltage characteristics. Proteins 2015; 84:217-31. [PMID: 26650347 DOI: 10.1002/prot.24967] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/03/2015] [Revised: 11/19/2015] [Accepted: 11/29/2015] [Indexed: 11/11/2022]
Abstract
Computational prediction of protein structures is a difficult task, which involves fast and accurate evaluation of candidate model structures. We propose to enhance single-model quality assessment with a functionality evaluation phase for proteins whose quantitative functional characteristics are known. In particular, this idea can be applied to evaluation of structural models of ion channels, whose main function - conducting ions - can be quantitatively measured with the patch-clamp technique providing the current-voltage characteristics. The study was performed on a set of KcsA channel models obtained from complete and incomplete contact maps. A fast continuous electrodiffusion model was used for calculating the current-voltage characteristics of structural models. We found that the computed charge selectivity and total current were sensitive to the structural and electrostatic quality of models. In practical terms, we show that evaluating predicted conductance values is an appropriate method to eliminate models with an occluded pore or with multiple erroneously created pores. Moreover, filtering models on the basis of their predicted charge selectivity results in a substantial enrichment of the candidate set in highly accurate models. Tests on three other ion channels indicate that, in addition to being a proof of concept, our function-oriented single-model quality assessment method can be directly applied to evaluation of structural models of some classes of protein channels. Finally, our work raises the important question of whether a computational validation of functionality should be included in the evaluation process of structural models, whenever possible.
Affiliation(s)
- Witold Dyrka
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland
| | - Monika Kurczyńska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland
| | - Bogumił M Konopka
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland
| | - Małgorzata Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland
| |
|
18
|
Kryshtafovych A, Barbato A, Monastyrskyy B, Fidelis K, Schwede T, Tramontano A. Methods of model accuracy estimation can help selecting the best models from decoy sets: Assessment of model accuracy estimations in CASP11. Proteins 2015; 84 Suppl 1:349-69. [PMID: 26344049 DOI: 10.1002/prot.24919] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 05/07/2015] [Revised: 07/30/2015] [Accepted: 08/28/2015] [Indexed: 12/27/2022]
Abstract
The article presents an assessment of the model accuracy estimation methods participating in CASP11. The results of the assessment are expected to be useful to both developers of the methods and users, who all too often are presented with structural models without annotations of accuracy. The main emphasis is placed on the ability of techniques to identify the best models from among several available. Bivariate descriptive statistics and ROC analysis are used to additionally assess the overall correctness of the predicted model accuracy scores, the correlation between the predicted and observed accuracy of models, the effectiveness in distinguishing between good and bad models, the ability to discriminate between reliable and unreliable regions in models, and the accuracy of the coordinate error self-estimates. A rigid-body measure (GDT_TS) and three local-structure-based scores (LDDT, CADaa, and SphereGrinder) are used as reference measures for evaluating methods' performance. Consensus methods, taking advantage of the availability of several models for the same target protein, perform well on the majority of tasks. Methods that predict accuracy on the basis of a single model perform comparably to consensus methods in picking the best models and in estimating how accurate the local structure is. More groups than in previous experiments submitted reasonable error estimates of their own models, most likely in response to a recommendation from CASP and the increasing demand from users. Proteins 2016; 84(Suppl 1):349-369. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
| | - Alessandro Barbato
- Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | | | | | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Anna Tramontano
- Department of Physics, Sapienza University of Rome, Rome, Italy
| |
|
19
|
Simoncini D, Nakata H, Ogata K, Nakamura S, Zhang KY. Quality Assessment of Predicted Protein Models Using Energies Calculated by the Fragment Molecular Orbital Method. Mol Inform 2015; 34:97-104. [PMID: 27490032 DOI: 10.1002/minf.201400108] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 08/01/2014] [Accepted: 10/13/2014] [Indexed: 12/12/2022]
Abstract
Protein structure prediction directly from sequences is a very challenging problem in computational biology. One of the most successful approaches employs stochastic conformational sampling to search an empirically derived energy function landscape for the global energy minimum state. Due to the errors in the empirically derived energy function, the lowest energy conformation may not be the best model. We have evaluated the use of energy calculated by the fragment molecular orbital method (FMO energy) to assess the quality of predicted models and its ability to identify the best model among an ensemble of predicted models. The fragment molecular orbital method implemented in GAMESS was used to calculate the FMO energy of predicted models. When tested on eight protein targets, we found that the model ranking based on FMO energies is better than that based on empirically derived energies when there is sufficient diversity among these models. This model diversity can be estimated prior to the FMO energy calculations. Our result demonstrates that the FMO energy calculated by the fragment molecular orbital method is a practical and promising measure for the assessment of protein model quality and the selection of the best protein model among many generated.
Affiliation(s)
- David Simoncini
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa 230-0045, Japan; phone: +81(0)45-503-9560, fax: +81(0)45-503-9559. Present address: Mathématiques et Informatique Appliquées de Toulouse, Unité de Recherche 875, Institut National de la Recherche Agronomique, F-31320 Castanet-Tolosan, France
| | - Hiroya Nakata
- RIKEN Research Cluster for Innovation, 2-1 Hirosawa, Wako, Saitama 351-0198 Japan; phone/fax: +81(0)48-467-9477/+81(0)48-467-8503; Department of Biomolecular Engineering, Tokyo Institute of Technology, 4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa 226-8501, Japan; Japan Society for the Promotion of Science, Kojimachi Business Center Building, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Koji Ogata
- RIKEN Research Cluster for Innovation, 2-1 Hirosawa, Wako, Saitama 351-0198 Japan phone/fax: +81(0)48-467-9477/+81(0)48-467-8503
| | - Shinichiro Nakamura
- RIKEN Research Cluster for Innovation, 2-1 Hirosawa, Wako, Saitama 351-0198 Japan phone/fax: +81(0)48-467-9477/+81(0)48-467-8503.
| | - Kam Yj Zhang
- Structural Bioinformatics Team, Division of Structural and Synthetic Biology, Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Yokohama, Kanagawa 230-0045, Japan phone: +81(0)45-503-9560/fax: +81(0)45-503-9559.
| |
|
20
|
Studer G, Biasini M, Schwede T. Assessing the local structural quality of transmembrane protein models using statistical potentials (QMEANBrane). Bioinformatics 2015; 30:i505-11. [PMID: 25161240 PMCID: PMC4147910 DOI: 10.1093/bioinformatics/btu457] [Citation(s) in RCA: 105] [Impact Index Per Article: 11.7] [Indexed: 11/30/2022]
Abstract
Motivation: Membrane proteins are an important class of biological macromolecules involved in many cellular key processes including signalling and transport. They account for one third of genes in the human genome and >50% of current drug targets. Despite their importance, experimental structural data are sparse, resulting in high expectations for computational modelling tools to help fill this gap. However, as many empirical methods have been trained on experimental structural data, which is biased towards soluble globular proteins, their accuracy for transmembrane proteins is often limited. Results: We developed a local model quality estimation method for membrane proteins (‘QMEANBrane’) by combining statistical potentials trained on membrane protein structures with a per-residue weighting scheme. The increasing number of available experimental membrane protein structures allowed us to train membrane-specific statistical potentials that approach statistical saturation. We show that reliable local quality estimation of membrane protein models is possible, thereby extending local quality estimation to these biologically relevant molecules. Availability and implementation: Source code and datasets are available on request. Contact: torsten.schwede@unibas.ch Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriel Studer
- Biozentrum, University of Basel, Basel, 4056, Switzerland and SIB Swiss Institute of Bioinformatics, Basel, 4056, Switzerland
| | - Marco Biasini
- Biozentrum, University of Basel, Basel, 4056, Switzerland and SIB Swiss Institute of Bioinformatics, Basel, 4056, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, 4056, Switzerland and SIB Swiss Institute of Bioinformatics, Basel, 4056, Switzerland
| |
|
21
|
Ge C, Gómez-Llobregat J, Skwark MJ, Ruysschaert JM, Wieslander A, Lindén M. Membrane remodeling capacity of a vesicle-inducing glycosyltransferase. FEBS J 2014; 281:3667-84. [PMID: 24961908 DOI: 10.1111/febs.12889] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/04/2014] [Revised: 05/21/2014] [Accepted: 06/19/2014] [Indexed: 11/28/2022]
Abstract
Intracellular vesicles are abundant in eukaryotic cells but absent in the Gram-negative bacterium Escherichia coli. However, strong overexpression of a monotopic glycolipid-synthesizing enzyme, monoglucosyldiacylglycerol synthase from Acholeplasma laidlawii (alMGS), leads to massive formation of vesicles in the cytoplasm of E. coli. More importantly, alMGS provides a model system for the regulation of membrane properties by membrane-bound enzymes, which is critical for maintaining cellular integrity. Both phenomena depend on how alMGS binds to cell membranes, which is not well understood. Here, we carry out a comprehensive investigation of the membrane binding of alMGS by combining bioinformatics methods with extensive biochemical studies, structural modeling and molecular dynamics simulations. We find that alMGS binds to the membrane in a fairly upright manner, mainly by residues in the N-terminal domain, and in a way that induces local enrichment of anionic lipids and a local curvature deformation. Furthermore, several alMGS variants resulting from substitution of residues in the membrane anchoring segment are still able to generate vesicles, regardless of enzymatic activity. These results clarify earlier theories about the driving forces for vesicle formation, and shed new light on the membrane binding properties and enzymatic mechanism of alMGS and related monotopic GT-B fold glycosyltransferases.
Affiliation(s)
- Changrong Ge
- Department of Biochemistry and Biophysics, Center for Biomembrane Research, Stockholm University, Sweden; Laboratory for the Structure and Function of Biological Membranes, Center for Structural Biology and Bioinformatics, Université Libre de Bruxelles, Belgium; Medical Inflammation Research, Department of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm, Sweden
| |
|
22
|
Computational modeling of protein-RNA complex structures. Methods 2013; 65:310-9. [PMID: 24083976 DOI: 10.1016/j.ymeth.2013.09.014] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 07/16/2013] [Revised: 09/17/2013] [Accepted: 09/19/2013] [Indexed: 12/26/2022] Open
Abstract
Protein-RNA interactions play fundamental roles in many biological processes, such as regulation of gene expression, RNA splicing, and protein synthesis. The understanding of these processes improves as new structures of protein-RNA complexes are solved and the molecular details of interactions analyzed. However, experimental determination of protein-RNA complex structures by high-resolution methods is tedious and difficult. Therefore, studies on protein-RNA recognition and complex formation present major technical challenges for macromolecular structural biology. Alternatively, protein-RNA interactions can be predicted by computational methods. Although less accurate than experimental measurements, theoretical models of macromolecular structures can be sufficiently accurate to prompt functional hypotheses and guide e.g. identification of important amino acid or nucleotide residues. In this article we present an overview of strategies and methods for computational modeling of protein-RNA complexes, including software developed in our laboratory, and illustrate it with practical examples of structural predictions.
|