1
Liu J, Wu T, Guo Z, Hou J, Cheng J. Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14. Proteins 2021; 90:58-72. [PMID: 34291486 PMCID: PMC8671168 DOI: 10.1002/prot.26186] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/23/2021] [Revised: 06/21/2021] [Accepted: 07/12/2021] [Indexed: 12/15/2022]
Abstract
Substantial progress in protein structure prediction has been made by utilizing deep learning and residue‐residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template‐free modeling performs better than template‐based modeling not only on hard targets but also on targets that have homologous templates. The performance of template‐free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of the multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
Affiliation(s)
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Tianqi Wu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
2
Pang WC, Ramli ANM, Hamid AAA. Comparative modelling studies of fruit bromelain using molecular dynamics simulation. J Mol Model 2020; 26:142. [PMID: 32417971 DOI: 10.1007/s00894-020-04398-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/15/2019] [Accepted: 04/28/2020] [Indexed: 12/25/2022]
Abstract
Fruit bromelain is a cysteine protease that accumulates in pineapple fruits. This proteolytic enzyme is in high demand for industrial and therapeutic applications. In this study, fruit bromelain sequences QIM61759, QIM61760 and QIM61761 were retrieved from the National Center for Biotechnology Information (NCBI) GenBank database. The tertiary structures of fruit bromelain QIM61759, QIM61760 and QIM61761 were generated using MODELLER. The results revealed that the local stereochemical quality of the generated models was improved by using multiple templates during the modelling process. Moreover, by comparison with the available papain model, structural analysis provides insight into how the pro-peptide functions as a scaffold in fruit bromelain folding and contributes to inactivation of the mature protein. The structural analysis also disclosed the similarities and differences between these models. Lastly, the thermal stability of fruit bromelain was studied. Molecular dynamics simulation of fruit bromelain structures at several selected temperatures demonstrated how fruit bromelain responds to elevated temperature.
Affiliation(s)
- Wei Cheng Pang
  - Faculty of Industrial Science & Technology, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Kuantan, Pahang Darul Makmur, Malaysia
- Aizi Nor Mazila Ramli
  - Faculty of Industrial Science & Technology, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Kuantan, Pahang Darul Makmur, Malaysia
  - Bio Aromatic Research Centre of Excellence, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Kuantan, Pahang Darul Makmur, Malaysia
- Azzmer Azzar Abdul Hamid
  - Department of Biotechnology, Kulliyyah of Science, International Islamic University Malaysia (IIUM), Bandar Indera Mahkota, 25200, Kuantan, Pahang, Malaysia
  - Research Unit for Bioinformatics and Computational Biology (RUBIC), Kulliyyah of Science, International Islamic University Malaysia (IIUM), Bandar Indera Mahkota, 25200, Kuantan, Pahang, Malaysia
3
Si D, Moritz SA, Pfab J, Hou J, Cao R, Wang L, Wu T, Cheng J. Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps. Sci Rep 2020; 10:4282. [PMID: 32152330 PMCID: PMC7063051 DOI: 10.1038/s41598-020-60598-y] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/25/2019] [Accepted: 02/10/2020] [Indexed: 11/29/2022] Open
Abstract
Cryo-electron microscopy (cryo-EM) has become a leading technology for determining protein structures. Recent advances in this field have allowed for atomic resolution. However, predicting the backbone trace of a protein has remained a challenge on all but the most pristine density maps (<2.5 Å resolution). Here we introduce a deep learning model that uses a set of cascaded convolutional neural networks (CNNs) to predict Cα atoms along a protein's backbone structure. The cascaded-CNN (C-CNN) is a novel deep learning architecture comprised of multiple CNNs, each predicting a specific aspect of a protein's structure. This model predicts secondary structure elements (SSEs), backbone structure, and Cα atoms, combining the results of each to produce a complete prediction map. The cascaded-CNN is a semantic segmentation image classifier and was trained using thousands of simulated density maps. This method is largely automatic and only requires a recommended threshold value for each protein density map. A specialized tabu-search path walking algorithm was used to produce an initial backbone trace with Cα placements. A helix-refinement algorithm made further improvements to the α-helix SSEs of the backbone trace. Finally, a novel quality assessment-based combinatorial algorithm was used to effectively map protein sequences onto Cα traces to obtain full-atom protein structures. This method was tested on 50 experimental maps between 2.6 Å and 4.4 Å resolution. It outperformed several state-of-the-art prediction methods including Rosetta de-novo, MAINMAST, and a Phenix based method by producing the most complete predicted protein structures, as measured by percentage of found Cα atoms. This method accurately predicted 88.9% (mean) of the Cα atoms within 3 Å of a protein's backbone structure surpassing the 66.8% mark achieved by the leading alternate method (Phenix based fully automatic method) on the same set of density maps. 
The C-CNN also achieved an average root-mean-square deviation (RMSD) of 1.24 Å on a set of 50 experimental density maps that was also tested with the Phenix-based fully automatic method. The source code and demo of this research have been published at https://github.com/DrDongSi/Ca-Backbone-Prediction.
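The two evaluation figures quoted in this abstract (the percentage of Cα atoms found within 3 Å and the RMSD) can be illustrated with a minimal Python sketch. It assumes a simple nearest-neighbour matching between native and predicted Cα positions; the function name `ca_coverage_and_rmsd` and the matching rule are illustrative assumptions, not the evaluation procedure used by the authors.

```python
import math


def ca_coverage_and_rmsd(predicted, native, cutoff=3.0):
    """Return (coverage, rmsd) for two lists of (x, y, z) C-alpha positions.

    Assumption: a native atom counts as "found" when its nearest predicted
    atom lies within `cutoff` angstroms. RMSD is taken over found atoms only.
    """
    matched_distances = []
    for native_atom in native:
        # Distance from this native C-alpha to the closest predicted C-alpha.
        nearest = min(math.dist(native_atom, p) for p in predicted)
        if nearest <= cutoff:
            matched_distances.append(nearest)

    coverage = len(matched_distances) / len(native)
    if not matched_distances:
        return coverage, float("nan")
    mean_sq = sum(d * d for d in matched_distances) / len(matched_distances)
    return coverage, math.sqrt(mean_sq)


# Toy coordinates (hypothetical, in angstroms).
native_trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted_trace = [(0.0, 0.0, 1.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
coverage, rmsd = ca_coverage_and_rmsd(predicted_trace, native_trace)
print(f"coverage={coverage:.3f} rmsd={rmsd:.3f}")
```

A one-to-one assignment between predicted and native atoms would be stricter than the nearest-neighbour rule used here; the sketch only shows how the two numbers relate to the underlying coordinates.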
Affiliation(s)
- Dong Si
  - Division of Computing & Software Systems, University of Washington, Bothell, WA, 98011, USA
- Spencer A Moritz
  - Division of Computing & Software Systems, University of Washington, Bothell, WA, 98011, USA
- Jonas Pfab
  - Division of Computing & Software Systems, University of Washington, Bothell, WA, 98011, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, Saint Louis, MO, 63103, USA
  - Program in Bioinformatics & Computational Biology, Saint Louis University, Saint Louis, MO, 63103, USA
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, 98447, USA
- Liguo Wang
  - Department of Biological Structure, University of Washington, Seattle, WA, 98185, USA
- Tianqi Wu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
4
Conover M, Staples M, Si D, Sun M, Cao R. AngularQA: Protein Model Quality Assessment with LSTM Networks. Computational and Mathematical Biophysics 2019. [DOI: 10.1515/cmb-2019-0001] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 01/04/2023] Open
Abstract
Quality assessment (QA) plays an important role in protein structure prediction. Traditional multi-model QA methods usually depend on searching databases or comparing with other models to make predictions, and they often fail when poor-quality models dominate the model pool. We propose a novel protein single-model QA method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network is used to predict the quality by treating each amino acid as a time-step and considering the final value returned by the LSTM cells. To the best of our knowledge, this is the first attempt to use an LSTM model on the QA problem; furthermore, we use a new representation that has not been studied for QA. In addition to angles, we make use of sequence properties such as secondary structure parsed from the protein structure at each time-step without using any database, which differs from all existing QA methods. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points out new directions for the QA problem, and our method could be widely used for the protein structure prediction problem. The software is freely available at GitHub: https://github.com/caorenzhi/AngularQA
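The angular representation described in this abstract (each residue encoded by a dihedral angle and a bond length to the prior residue) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses only Cα coordinates, omits the side-chain and secondary-structure features, and the names `dihedral` and `ca_trace_features` are hypothetical rather than taken from the AngularQA code.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four consecutive points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    # atan2 form of the dihedral: 0 for a cis arrangement, 180 for trans.
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def ca_trace_features(ca_coords):
    """One (bond_length, dihedral) pair per residue, starting at the fourth.

    Assumption: only C-alpha atoms are used; each pair would correspond to
    one time-step of a sequence model such as an LSTM.
    """
    features = []
    for i in range(3, len(ca_coords)):
        bond = math.dist(ca_coords[i - 1], ca_coords[i])
        angle = dihedral(*ca_coords[i - 3:i + 1])
        features.append((bond, angle))
    return features


# Toy planar zig-zag trace (hypothetical coordinates).
trace = [(1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0), (-1, 2, 0)]
for bond, angle in ca_trace_features(trace):
    print(f"bond={bond:.2f} dihedral={angle:.1f}")
```

Because every feature is computed relative to preceding residues, the encoding does not change when the whole model is rotated or translated, which is one reason an internal-coordinate representation suits a sequence model.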
Affiliation(s)
- Matthew Conover
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
- Max Staples
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
- Dong Si
  - Division of Computing and Software Systems, University of Washington-Bothell, Bothell, WA 98011, USA
- Miao Sun
  - JingChi, Sunnyvale, CA 94089, USA
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
5
Masson MAC, Karpfenstein R, de Oliveira-Silva D, Teuler JM, Archirel P, Maître P, Correra TC. Evaluation of Ca2+ Binding Sites in Tacrolimus by Infrared Multiple Photon Dissociation Spectroscopy. J Phys Chem B 2018; 122:9860-9868. [DOI: 10.1021/acs.jpcb.8b06523] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022]
Affiliation(s)
- Maria Angélica C. Masson
  - Department of Fundamental Chemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes, 748, 05508-000 São Paulo, São Paulo, Brazil
- Renan Karpfenstein
  - Department of Chemistry, Institute of Environmental, Chemical and Pharmaceutical Sciences, Federal University of São Paulo, St. Prof. Arthur Riedel 275, 09972-270 Diadema, São Paulo, Brazil
- Diogo de Oliveira-Silva
  - Department of Chemistry, Institute of Environmental, Chemical and Pharmaceutical Sciences, Federal University of São Paulo, St. Prof. Arthur Riedel 275, 09972-270 Diadema, São Paulo, Brazil
- Jean-Marie Teuler
  - Laboratoire de Chimie Physique, URM8000, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay, France
- Pierre Archirel
  - Laboratoire de Chimie Physique, URM8000, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay, France
- Philippe Maître
  - Laboratoire de Chimie Physique, URM8000, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay, France
- Thiago C. Correra
  - Department of Fundamental Chemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes, 748, 05508-000 São Paulo, São Paulo, Brazil
6
Cao R, Bhattacharya D, Hou J, Cheng J. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics 2016; 17:495. [PMID: 27919220 PMCID: PMC5139030 DOI: 10.1186/s12859-016-1405-y] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Received: 08/11/2016] [Accepted: 12/01/2016] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND: Protein quality assessment (QA), which is useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. RESULTS: We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physicochemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network performs better than Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. CONCLUSION: DeepQA is a useful deep learning tool for protein single-model quality assessment and protein structure prediction. The source code, executable, documentation and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .
Affiliation(s)
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, 98447, USA
- Debswapna Bhattacharya
  - Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS, 67260, USA
- Jie Hou
  - Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
  - Informatics Institute, University of Missouri, Columbia, MO, 65211, USA