1
|
Perdiguero B, Marcos-Villar L, López-Bravo M, Sánchez-Cordón PJ, Zamora C, Valverde JR, Sorzano CÓS, Sin L, Álvarez E, Ramos M, Del Val M, Esteban M, Gómez CE. Immunogenicity and efficacy of a novel multi-patch SARS-CoV-2/COVID-19 vaccine candidate. Front Immunol 2023; 14:1160065. [PMID: 37404819 PMCID: PMC10316789 DOI: 10.3389/fimmu.2023.1160065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 05/30/2023] [Indexed: 07/06/2023] Open
Abstract
Introduction: While there has been considerable progress in the development of vaccines against SARS-CoV-2, largely based on the S (spike) protein of the virus, less progress has been made with vaccines delivering different viral antigens with cross-reactive potential. Methods: In an effort to develop an immunogen with the capacity to induce broad antigen presentation, we have designed a multi-patch synthetic candidate, termed CoV2-BMEP, containing dominant and persistent B cell epitopes from conserved regions of SARS-CoV-2 structural proteins associated with long-term immunity. Here we describe the characterization, immunogenicity and efficacy of CoV2-BMEP using two delivery platforms: nucleic acid DNA and attenuated modified vaccinia virus Ankara (MVA). Results: In cultured cells, both vectors produced a main protein of about 37 kDa as well as heterogeneous proteins ranging in size between 25 and 37 kDa. In C57BL/6 mice, both homologous and heterologous prime/boost combinations of vectors induced the activation of SARS-CoV-2-specific CD4 and CD8 T cell responses, with a more balanced CD8+ T cell response detected in lungs. The homologous MVA/MVA immunization regimen elicited the highest specific CD8+ T cell responses in spleen and detectable binding antibodies (bAbs) to S and N antigens of SARS-CoV-2. In SARS-CoV-2-susceptible K18-hACE2 Tg mice, two doses of MVA-CoV2-BMEP elicited S- and N-specific bAbs as well as cross-neutralizing antibodies against different variants of concern (VoC). After SARS-CoV-2 challenge, all animals in the unvaccinated control group succumbed to the infection, while vaccinated animals with high titers of neutralizing antibodies were fully protected against mortality, correlating with a reduction of virus infection in the lungs and inhibition of the cytokine storm.
Discussion: These findings revealed a novel immunogen with the capacity to control SARS-CoV-2 infection, using a broader antigen presentation mechanism than the approved vaccines based solely on the S antigen.
Affiliation(s)
- Beatriz Perdiguero
  - Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Infecciosas (CIBERINFEC), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
- Laura Marcos-Villar
  - Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- María López-Bravo
  - Department of Microbial Biotechnology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Pedro J. Sánchez-Cordón
  - Veterinary Pathology Department, Centro de Investigación en Sanidad Animal, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Carmen Zamora
  - Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- José Ramón Valverde
  - Scientific Computing, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Carlos Óscar S. Sorzano
  - Biocomputing Unit and Computational Genomics, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Laura Sin
  - Centro de Investigación Biomédica en Red de Enfermedades Infecciosas (CIBERINFEC), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
- Enrique Álvarez
  - Centro de Investigación Biomédica en Red de Enfermedades Infecciosas (CIBERINFEC), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
- Manuel Ramos
  - Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas and Universidad Autónoma de Madrid, Madrid, Spain
- Margarita Del Val
  - Centro de Biología Molecular Severo Ochoa, Consejo Superior de Investigaciones Científicas and Universidad Autónoma de Madrid, Madrid, Spain
- Mariano Esteban
  - Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Carmen Elena Gómez
  - Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Infecciosas (CIBERINFEC), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
|
2
|
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 60] [Impact Index Per Article: 60.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the scientific community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and on research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, many of which have provided preliminary evidence of its potential. To better understand AF2 and promote its applications, in this article we summarize the principle and system architecture of AF2 as well as the recipe for its success, and focus particularly on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 predictions are also discussed.
Affiliation(s)
- Zhenyu Yang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaoxi Zeng
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yi Zhao
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Runsheng Chen
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
|
3
|
Rachitskii P, Kruglov I, Finkelstein AV, Oganov AR. Protein structure prediction using the evolutionary algorithm USPEX. Proteins 2023. [PMID: 36780132 DOI: 10.1002/prot.26478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 11/08/2022] [Accepted: 02/06/2023] [Indexed: 02/14/2023]
Abstract
Protein structure prediction is one of the major problems of modern biophysics: current attempts to predict the tertiary protein structure from the amino acid sequence are successful mostly when the use of big data and machine learning allows one to reduce the "prediction problem" to the "problem of recognition". Compared with recent successes of deep learning, classical predictive methods lag behind in their accuracy for the prediction of stable conformations. Therefore, in this work we extended the evolutionary algorithm USPEX to predict protein structure based on global optimization starting with the amino acid sequence. Moreover, we compared frequently used force fields for the task of protein structure prediction. Protein structure relaxation and energy calculations were performed using the Tinker (with several different force fields) and Rosetta (with the REF2015 force field) codes. To create new protein structure models in the USPEX algorithm, we developed novel variation operators. A test of the new method on seven proteins having (for simplicity) no cis-proline (with ω ≈ 0°) residues and a length of up to 100 residues revealed that our algorithm predicts tertiary structures of proteins with high accuracy. A comparison of the final potential energies of the predicted protein structures obtained using USPEX and the Rosetta Abinitio approach showed that in most cases the developed algorithm found structures with close or even lower energy (Amber/Charmm/Oplsaal) and scoring function (REF2015). While USPEX has clearly demonstrated its ability to find very deep energy minima, our study showed that the existing force fields are not sufficiently accurate for blind prediction of protein structures without further experimental verification.
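The global-optimization idea in this abstract (a population of candidate conformations improved by variation operators and selection on energy) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions: it is not USPEX, `toy_energy` is a hypothetical stand-in for a real force field such as those evaluated with Tinker or Rosetta, and the mutation and crossover operators are generic ones rather than the variation operators developed in the paper.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical stand-in for a force-field energy.

    The minimum (0.0) is reached when every torsion angle equals -60 degrees.
    """
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def mutate(angles, rng, step=30.0):
    """Perturb one randomly chosen angle and wrap it into [-180, 180)."""
    child = list(angles)
    i = rng.randrange(len(child))
    child[i] = (child[i] + rng.uniform(-step, step) + 180.0) % 360.0 - 180.0
    return child


def crossover(parent_a, parent_b, rng):
    """One-point crossover between two angle vectors of equal length."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def evolve(n_angles=10, pop_size=20, generations=50, seed=0):
    """Elitist evolutionary search: keep the better half, refill with children."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
           for _ in range(pop_size)]
    history = []  # best energy seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=toy_energy)
        history.append(toy_energy(pop[0]))
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = survivors + children
    pop.sort(key=toy_energy)
    history.append(toy_energy(pop[0]))
    return pop[0], history


best, history = evolve()
```

Because the survivors are carried over unchanged, the best energy in `history` never increases from one generation to the next. A real application would replace `toy_energy` with a relaxation and energy evaluation by an external code.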
Affiliation(s)
- Ivan Kruglov
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  - Dukhov Research Institute of Automatics (VNIIA), Moscow, Russia
- Alexei V Finkelstein
  - Institute of Protein Research of the Russian Academy of Sciences, Moscow, Russia
  - Biology Department of the Lomonosov Moscow State University, Moscow, Russia
  - Biotechnology Department of the Lomonosov Moscow State University, Moscow, Russia
- Artem R Oganov
  - Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russia
|
4
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
5
|
Bongirwar V, Mokhade AS. Different methods, techniques and their limitations in protein structure prediction: A review. Prog Biophys Mol Biol 2022; 173:72-82. [PMID: 35588858 DOI: 10.1016/j.pbiomolbio.2022.05.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/23/2021] [Revised: 04/16/2022] [Accepted: 05/11/2022] [Indexed: 11/17/2022]
Abstract
As the variety of diseases affecting human populations increases, so does the demand for designing new drugs. Proteins and their structures play a very important role in drug design, so researchers from areas such as mathematics, medicine, and computer science are teaming up to find better solutions in this field. In this paper, we discuss different methods of secondary and tertiary protein structure prediction (PSP), along with the limitations of the different approaches. The types of datasets used in PSP are also discussed, as are the performance measures used to evaluate the prediction accuracy of PSP methods. Various software packages and servers that predict the structure of an input protein sequence are available for download. These tools also help in comparing the performance of any new algorithm with that of existing methods, and their details are given in this paper.
Affiliation(s)
- A S Mokhade
  - Visvesvaraya National Institute of Technology, Nagpur, India
|
6
|
Aderinwale T, Bharadwaj V, Christoffer C, Terashi G, Zhang Z, Jahandideh R, Kagaya Y, Kihara D. Real-time structure search and structure classification for AlphaFold protein models. Commun Biol 2022; 5:316. [PMID: 35383281 PMCID: PMC8983703 DOI: 10.1038/s42003-022-03261-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 11/09/2021] [Accepted: 03/11/2022] [Indexed: 11/17/2022] Open
Abstract
Last year saw a breakthrough in protein structure prediction, where the AlphaFold2 method showed a substantial improvement in modeling accuracy. Following the software release of AlphaFold2, structures predicted by AlphaFold2 for proteins in 21 species were made publicly available via the AlphaFold Database. Here, to facilitate structural analysis and application of AlphaFold2 models, we provide the infrastructure, 3D-AF-Surfer, which allows real-time structure-based search for the AlphaFold2 models. In 3D-AF-Surfer, structures are represented with 3D Zernike descriptors (3DZD), a rotationally invariant mathematical representation of 3D shapes. We developed a neural network that takes 3DZDs of proteins as input and retrieves proteins of the same fold more accurately than direct comparison of 3DZDs. Using 3D-AF-Surfer, we report structure classifications of AlphaFold2 models and discuss the correlation between confidence levels of AlphaFold2 models and intrinsically disordered regions.
Affiliation(s)
- Tunde Aderinwale
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Vijay Bharadwaj
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Charles Christoffer
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Zicong Zhang
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Yuki Kagaya
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
|
7
|
Geethu S, Vimina ER. Improved 3-D Protein Structure Predictions using Deep ResNet Model. Protein J 2021; 40:669-681. [PMID: 34510309 DOI: 10.1007/s10930-021-10016-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/09/2021] [Indexed: 10/20/2022]
Abstract
Protein Structure Prediction (PSP) is considered to be a complicated problem in computational biology. In spite of the remarkable progress made by co-evolution-based methods in PSP, it is still a challenging and unresolved problem. Recently, along with co-evolutionary relationships, deep learning approaches have been introduced in PSP, leading to significant progress. In this paper, a novel methodology using a deep ResNet architecture for predicting inter-residue distances and dihedral angles is proposed. It aims to generate, on average, 125 homologous sequences from a customized sequence database. These sequences are used to generate input features. As an outcome of the neural networks, a pool of structures is generated, from which the lowest-potential structure is chosen as the final predicted 3-D protein structure. The proposed method is trained using 6521 protein sequences extracted from the Protein Data Bank (PDB). For testing, 48 protein sequences shorter than 400 residues were chosen from the 13th Critical Assessment of protein Structure Prediction (CASP13) dataset. The model is compared with Alphafold, Zhang, and RaptorX. The template modeling (TM) score is used to evaluate the accuracy of the estimated structure. The proposed method performs best for 52% of the target sequences, compared with 10%, 22.9%, and 6% for Alphafold, Zhang, and RaptorX, respectively. Additionally, for 37.5% of the target sequences, the proposed method achieved an accuracy greater than or equal to 0.80. The TM scores obtained for the sequences under consideration were 0.69, 0.67, 0.65, and 0.58 for the proposed method, Alphafold, Zhang, and RaptorX, respectively.
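For readers unfamiliar with the TM score used as the accuracy measure above, the following is a minimal sketch of its standard definition, evaluated for one fixed superposition. The assumptions are labeled in the comments: the per-residue distances are taken as given, the function names are illustrative, and the full TM score additionally maximizes this sum over all superpositions, which this sketch does not do.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM score.

    Standard definition: d0 = 1.24 * (L - 15)**(1/3) - 1.8.
    Assumption: a floor of 0.5 is applied for very short chains.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_length):
    """TM-score sum for one given superposition.

    `distances` holds the distance (angstroms) between each aligned residue
    pair; residues without an aligned partner simply contribute nothing.
    The true TM score is the maximum of this value over superpositions.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A perfect match (all distances zero, every residue aligned) gives 1.0, and a model whose aligned residues all sit exactly `d0` away from their targets gives 0.5.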
Affiliation(s)
- S Geethu
  - Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, Ernakulam, India
- E R Vimina
  - Department of Computer Science and IT, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Kochi Campus, Ernakulam, India
|
8
|
Protein Structure Prediction: Conventional and Deep Learning Perspectives. Protein J 2021; 40:522-544. [PMID: 34050498 DOI: 10.1007/s10930-021-10003-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Accepted: 05/21/2021] [Indexed: 10/21/2022]
Abstract
Protein structure prediction is a way to bridge the sequence-structure gap, one of the main challenges in computational biology and chemistry. Predicting a protein's structure accurately is of paramount importance for the scientific community, as a protein's structure governs its function. Moreover, this is one of the most complicated optimization problems that computational biologists have ever faced. Experimental protein structure determination methods include X-ray crystallography, nuclear magnetic resonance spectroscopy and electron microscopy. All of these are tedious and time-consuming procedures that require expertise. To make the process less cumbersome, scientists use predictive tools as part of computational methods, using data consolidated in the protein repositories. In recent years, machine learning approaches have attracted the interest of the structure prediction community. Most machine learning approaches for protein structure prediction are centred on co-evolution-based methods. The accuracy of these approaches depends on the number of homologous protein sequences available in the databases. The prediction problem becomes challenging for many proteins, especially those without enough sequence homologs. Deep learning methods allow for the extraction of intricate features from protein sequence data without relying on prior intuition. Accurately predicted protein structures are employed for drug discovery, antibody design, and understanding protein-protein interactions and interactions with other molecules. This article provides a review of conventional and deep learning approaches in protein structure prediction. We conclude this review by outlining a few publicly available datasets and deep learning architectures currently employed for protein structure prediction tasks.
|
9
|
Roche R, Bhattacharya S, Bhattacharya D. Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins. PLoS Comput Biol 2021; 17:e1008753. [PMID: 33621244 PMCID: PMC7935296 DOI: 10.1371/journal.pcbi.1008753] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/04/2020] [Revised: 03/05/2021] [Accepted: 01/31/2021] [Indexed: 11/18/2022] Open
Abstract
Crystallography and NMR system (CNS) is currently a widely used method for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous, which may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales to a much higher degree of folding accuracy than CNS as contact thresholds increase, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach. These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS.
Predicting the folded and functional 3-dimensional structure of a protein molecule from its amino acid sequence is of central importance to structural biology. Recently, promising advances have been made in ab initio protein folding due to the reasonably accurate estimation of inter-residue interaction maps at increasingly higher resolutions that range from binary contacts to finer-grained distances. Despite the progress in predicting the interaction maps, approaches for turning the residue-residue interactions projected in these maps into their precise spatial positioning rely heavily on a decade-old experimental structure determination protocol that is not suitable for predictive modeling. This paper presents a new hierarchical structure modeling method, DConStruct, which can better exploit the information encoded in the interaction maps at multiple granularities, from binary contact maps to distance-based hybrid maps at tri-level thresholding, for improved ab initio folding. Multiple large-scale benchmarking experiments show that our proposed method can substantially improve the folding accuracy for both soluble and membrane proteins compared to state-of-the-art approaches. DConStruct is licensed under the GNU General Public License v3 and freely available at https://github.com/Bhattacharya-Lab/DConStruct.
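To make the notions of a binary contact map and a multi-level thresholded map concrete, here is a small sketch that derives both from residue coordinates. It is an illustration only, not DConStruct code: the 8 Å contact threshold and the 8/12/16 Å levels are assumed values chosen for the example (the abstract does not state the thresholds used), and the integer encoding of the levels is a hypothetical choice.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (x, y, z)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def binary_contact_map(dist, threshold=8.0):
    """1 where two residues are closer than `threshold` angstroms, else 0.

    Assumption: 8.0 is used here only as an example cutoff.
    """
    return [[1 if d < threshold else 0 for d in row] for row in dist]


def multilevel_map(dist, thresholds=(8.0, 12.0, 16.0)):
    """Encode each distance by how many thresholds it falls under.

    With three thresholds: 3 = closer than all of them, 0 = beyond all.
    The threshold values and this encoding are illustrative assumptions.
    """
    return [[sum(1 for t in thresholds if d < t) for d in row] for row in dist]


# Three residues placed on a line, 5 and 14 angstroms from the first one.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (14.0, 0.0, 0.0)]
dist = distance_matrix(coords)
contacts = binary_contact_map(dist)
levels = multilevel_map(dist)
```

The multi-level map keeps coarse distance information that the binary map discards: residues 2 and 3 (9 Å apart) are not in contact at 8 Å but are still distinguished from residues 1 and 3 (14 Å apart).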
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Sutanu Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
  - Department of Biological Sciences, Auburn University, Auburn, Alabama, United States of America
|
10
|
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem under the guidance of an energy function. However, it remains a challenge to design an accurate energy function that ensures low-energy conformations are close to native structures. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to strike a trade-off between exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.
|
11
|
Dhingra S, Sowdhamini R, Cadet F, Offmann B. A glance into the evolution of template-free protein structure prediction methodologies. Biochimie 2020; 175:85-92. [DOI: 10.1016/j.biochi.2020.04.026] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/14/2020] [Revised: 04/24/2020] [Accepted: 04/27/2020] [Indexed: 11/26/2022]
|
12
|
Hou J, Adhikari B, Tanner JJ, Cheng J. SAXSDom: Modeling multidomain protein structures using small-angle X-ray scattering data. Proteins 2020; 88:775-787. [PMID: 31860156 PMCID: PMC7230021 DOI: 10.1002/prot.25865] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/15/2019] [Revised: 11/18/2019] [Accepted: 12/14/2019] [Indexed: 12/27/2022]
Abstract
Many proteins are composed of several domains that pack together into a complex tertiary structure. Multidomain proteins can be challenging for protein structure modeling, particularly those for which templates can be found for individual domains but not for the entire sequence. In such cases, homology modeling can generate high-quality models of the domains but not of the orientations between domains. Small-angle X-ray scattering (SAXS) reports the structural properties of entire proteins and has the potential to guide homology modeling of multidomain proteins. In this article, we describe a novel multidomain protein assembly modeling method, SAXSDom, that integrates experimental knowledge from SAXS with a probabilistic Input-Output Hidden Markov model to assemble the structures of individual domains. Four SAXS-based scoring functions were developed and tested, and the method was evaluated on multidomain proteins from two public datasets. Incorporation of SAXS information improved the accuracy of domain assembly for 40 out of 46 Critical Assessment of protein Structure Prediction (CASP) multidomain protein targets and 45 out of 73 multidomain protein targets from the ab initio domain assembly dataset. The results demonstrate that SAXS data can provide useful information to improve the accuracy of domain-domain assembly. The source code and tool packages are available at https://github.com/jianlin-cheng/SAXSDom.
Affiliation(s)
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
- Badri Adhikari
  - Department of Computer Science, University of Missouri-St. Louis, Saint Louis, MO 63121, USA
- John J. Tanner
  - Departments of Biochemistry and Chemistry, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
|
13
|
Getting to Know Your Neighbor: Protein Structure Prediction Comes of Age with Contextual Machine Learning. J Comput Biol 2020; 27:796-814. [DOI: 10.1089/cmb.2019.0193] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022] Open
|
14
|
The MULTICOM Protein Structure Prediction Server Empowered by Deep Learning and Contact Distance Prediction. Methods Mol Biol 2020; 2165:13-26. [PMID: 32621217 DOI: 10.1007/978-1-0716-0708-4_2] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/28/2022]
Abstract
Prediction of the three-dimensional (3D) structure of a protein from its sequence is important for studying its biological function. With the advancement in deep learning contact distance prediction and residue-residue coevolutionary analysis, significant progress has been made in both template-based and template-free protein structure prediction in the last several years. Here, we provide a practical guide for our latest MULTICOM protein structure prediction system built on top of the latest advances, which was rigorously tested in the 2018 CASP13 experiment. Its specific functionalities include: (1) prediction of 1D structural features (secondary structure, solvent accessibility, disordered regions) and 2D interresidue contacts; (2) domain boundary prediction; (3) template-based (or homology) 3D structure modeling; (4) contact distance-driven ab initio 3D structure modeling; and (5) large-scale protein quality assessment enhanced by deep learning and predicted contacts. The MULTICOM web server (http://sysbio.rnet.missouri.edu/multicom_cluster/) presents all the 1D, 2D, and 3D prediction results and quality assessment to users via user-friendly web interfaces and e-mails. The source code of the MULTICOM package is also available at https://github.com/multicom-toolbox/multicom.
|
15
|
Yu Z, Yao Y, Deng H, Yi M. ANDIS: an atomic angle- and distance-dependent statistical potential for protein structure quality assessment. BMC Bioinformatics 2019; 20:299. [PMID: 31159742 PMCID: PMC6547486 DOI: 10.1186/s12859-019-2898-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2018] [Accepted: 05/13/2019] [Indexed: 01/05/2023] Open
Abstract
Background Knowledge-based statistical potentials have been widely used in protein structure modeling and model quality assessment. They are commonly evaluated on their ability to recognize native structures and to discriminate decoys. However, these two abilities are found to be mutually exclusive in many statistical potentials. Results We developed an atomic ANgle- and DIStance-dependent (ANDIS) statistical potential for protein structure quality assessment, with the distance cutoff as a tunable parameter. When the distance cutoff is ≤9.0 Å, an “effective atomic interaction” is employed to enhance native recognition. For a distance cutoff of ≥10 Å, the distance-dependent atom-pair potential with a random-walk reference state is combined to strengthen decoy discrimination. Benchmark tests on 632 structural decoy sets from diverse sources demonstrate that ANDIS outperforms other state-of-the-art potentials in both native recognition and decoy discrimination. Conclusions The distance cutoff is a crucial parameter for distance-dependent statistical potentials. A lower distance cutoff is better for native recognition, while a higher one is favorable for decoy discrimination. The ANDIS potential is freely available as a standalone application at http://qbp.hzau.edu.cn/ANDIS/.
Affiliation(s)
- Zhongwang Yu
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
- Yuangen Yao
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
- Haiyou Deng
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China; Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China
- Ming Yi
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China; Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China
|
16
|
Conover M, Staples M, Si D, Sun M, Cao R. AngularQA: Protein Model Quality Assessment with LSTM Networks. Computational and Mathematical Biophysics 2019. [DOI: 10.1515/cmb-2019-0001] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 01/04/2023] Open
Abstract
Quality Assessment (QA) plays an important role in protein structure prediction. Traditional multimodel QA methods usually rely on searching databases or comparing with other models to make predictions, and they usually fail when poor-quality models dominate the model pool. We propose a novel protein single-model QA method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network is used to predict the quality by treating each amino acid as a time-step and considering the final value returned by the LSTM cells. To the best of our knowledge, this is the first time anyone has attempted to use an LSTM model on the QA problem; furthermore, we use a new representation which has not been studied for QA. In addition to angles, we make use of sequence properties such as secondary structure parsed from the protein structure at each time-step without using any database, which differs from all existing QA methods. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points out new directions for the QA problem, and our method could be widely used for the protein structure prediction problem. The software is freely available at GitHub: https://github.com/caorenzhi/AngularQA
Affiliation(s)
- Matthew Conover
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
- Max Staples
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
- Dong Si
- Division of Computing and Software Systems, University of Washington-Bothell, Bothell, WA 98011, USA
- Miao Sun
- JingChi, Sunnyvale, CA 94089, USA
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
|
17
|
Hou J, Wu T, Cao R, Cheng J. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins 2019; 87:1165-1178. [PMID: 30985027 PMCID: PMC6800999 DOI: 10.1002/prot.25697] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 02/16/2019] [Revised: 04/04/2019] [Accepted: 04/12/2019] [Indexed: 12/28/2022]
Abstract
Predicting residue‐residue distance relationships (eg, contacts) has become the key direction for advancing protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance‐driven template‐free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template‐free and template‐based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue‐residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template‐based modeling targets. Deep learning also successfully integrated one‐dimensional structural features, two‐dimensional contact information, and three‐dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated for the first time to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distances when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.
Affiliation(s)
- Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, Washington
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
|
18
|
Flot M, Mishra A, Kuchi AS, Hoque MT. StackSSSPred: A Stacking-Based Prediction of Supersecondary Structure from Sequence. Methods Mol Biol 2019; 1958:101-122. [PMID: 30945215 DOI: 10.1007/978-1-4939-9161-7_5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Supersecondary structure (SSS) refers to specific geometric arrangements of several secondary structure (SS) elements that are connected by loops. The SSS can provide useful information about the spatial structure and function of a protein. As such, the SSS is a bridge between the secondary structure and tertiary structure. In this chapter, we propose a stacking-based machine learning method for the prediction of two types of SSSs, namely, β-hairpins and β-α-β, from the protein sequence based on comprehensive feature encoding. To encode protein residues, we utilize key features such as solvent accessibility, conservation profile, half surface exposure, torsion angle fluctuation, disorder probabilities, and more. The usefulness of the proposed approach is assessed using a widely used threefold cross-validation technique. The obtained empirical result shows that the proposed approach is useful and prediction can be improved further.
Affiliation(s)
- Michael Flot
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Avdesh Mishra
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Aditi Sharma Kuchi
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Md Tamjidul Hoque
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
|
19
|
Gadzała M, Dułak D, Kalinowska B, Baster Z, Bryliński M, Konieczny L, Banach M, Roterman I. The aqueous environment as an active participant in the protein folding process. J Mol Graph Model 2018; 87:227-239. [PMID: 30580160 DOI: 10.1016/j.jmgm.2018.12.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/12/2018] [Revised: 12/05/2018] [Accepted: 12/12/2018] [Indexed: 01/27/2023]
Abstract
Existing computational models applied in the protein structure prediction process do not sufficiently account for the presence of the aqueous solvent. The solvent is usually represented by a predetermined number of H2O molecules in the bounding box which contains the target chain. The fuzzy oil drop (FOD) model, presented in this paper, follows an alternative approach, with the solvent assuming the form of a continuous external hydrophobic force field, with a Gaussian distribution. The effect of this force field is to guide hydrophobic residues towards the center of the protein body, while promoting exposure of hydrophilic residues on its surface. This work focuses on the following sample proteins: Engrailed homeodomain (RCSB: 1enh), Chicken villin subdomain hp-35, n68h (RCSB: 1yrf), Chicken villin subdomain hp-35, k65(nle), n68h, k70(nle) (RCSB: 2f4k), Thermostable subdomain from chicken villin headpiece (RCSB: 1vii), de novo designed single chain three-helix bundle (a3d) (RCSB: 2a3d), albumin-binding domain (RCSB: 1prb) and lambda repressor-operator complex (RCSB: 1lmb).
Affiliation(s)
- Dawid Dułak
- ABB Business Services Sp. z o.o. ul. Żegańska 1, 04-713, Warszawa, Poland
- Barbara Kalinowska
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, 11 Łojasiewicza Street, Kraków, Poland; Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Łazarza 16, 31-530, Kraków, Poland
- Zbigniew Baster
- Department of Molecular and Interfacial Biophysics, Faculty of Physics, Astronomy, Applied Computer Science Jagiellonian University, 11 Łojasiewicza Street, Kraków, Poland; Markey Cancer Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, USA
- Michał Bryliński
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA; Center for Computation & Technology, Louisiana State University, Baton Rouge, LA, 70803, USA
- Leszek Konieczny
- Chair of Medical Biochemistry, Jagiellonian University - Medical College, Kopernika 7E, 31-034, Kraków, Poland
- Mateusz Banach
- Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Łazarza 16, 31-530, Kraków, Poland
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Łazarza 16, 31-530, Kraków, Poland
|
20
|
Basith S, Manavalan B, Shin TH, Lee G. iGHBP: Computational identification of growth hormone binding proteins from sequences using extremely randomised tree. Comput Struct Biotechnol J 2018; 16:412-420. [PMID: 30425802 PMCID: PMC6222285 DOI: 10.1016/j.csbj.2018.10.007] [Citation(s) in RCA: 87] [Impact Index Per Article: 14.5] [Received: 08/24/2018] [Revised: 10/04/2018] [Accepted: 10/12/2018] [Indexed: 11/27/2022] Open
Abstract
Growth hormone binding protein (GHBP) is a soluble carrier protein that can selectively and non-covalently interact with growth hormone, thereby acting as a modulator or inhibitor of growth hormone signalling. Accurate identification of GHBPs from a given protein sequence provides important clues for understanding cell growth and cellular mechanisms. In the postgenomic era, an abundance of protein sequence data has been garnered; hence it is crucial to develop an automated computational method that enables fast and accurate identification of putative GHBPs within a vast number of candidate proteins. In this study, we describe a novel machine-learning-based predictor called iGHBP for the identification of GHBPs. To predict GHBPs from a given protein sequence, we trained an extremely randomised tree with an optimal feature set that was obtained from a combination of dipeptide composition and amino acid index values by applying a two-step feature selection protocol. During cross-validation analysis, iGHBP achieved an accuracy of 84.9%, which was ~7% higher than the control extremely randomised tree predictor trained with all features, thus demonstrating the effectiveness of our feature selection protocol. Furthermore, when objectively evaluated on an independent data set, our proposed iGHBP method displayed superior performance compared to the existing method. Additionally, a user-friendly web server that implements the proposed iGHBP has been established and is available at http://thegleelab.org/iGHBP.
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
|
21
|
de Oliveira SHP, Law EC, Shi J, Deane CM. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction. Bioinformatics 2018; 34:1132-1140. [PMID: 29136098 PMCID: PMC6030820 DOI: 10.1093/bioinformatics/btx722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/17/2017] [Revised: 09/22/2017] [Accepted: 11/04/2017] [Indexed: 01/12/2023] Open
Abstract
Motivation Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. Results We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy. Availability and implementation Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. Contact saulo.deoliveira@dtc.ox.ac.uk. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eleanor C Law
- Department of Statistics, University of Oxford, Oxford, UK
- Jiye Shi
- Department of Informatics, UCB Pharma, Slough, UK
- Division of Physical Biology, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
|
22
|
Identify High-Quality Protein Structural Models by Enhanced K-Means. BioMed Research International 2017; 2017:7294519. [PMID: 28421198 PMCID: PMC5381204 DOI: 10.1155/2017/7294519] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/23/2016] [Revised: 02/09/2017] [Accepted: 02/19/2017] [Indexed: 01/01/2023]
Abstract
Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means.
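The squared-distance seeding mentioned in this abstract (K-means++) can be illustrated with a minimal sketch. It assumes plain Euclidean points rather than protein decoys and structural distances, and the function name `kmeanspp_seeds` is hypothetical; it is not the authors' implementation.

```python
import random


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial centroids with K-means++ style seeding.

    The first seed is chosen uniformly at random. Each later seed is drawn
    with probability proportional to its squared distance from the nearest
    seed chosen so far. Assumes at least k distinct points (illustrative).
    """
    rng = rng or random.Random(0)
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Squared Euclidean distance from every point to its nearest seed.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, s)) for s in seeds)
            for p in points
        ]
        threshold = rng.uniform(0, sum(d2))
        running = 0.0
        for p, w in zip(points, d2):
            running += w
            # Skip zero-weight points so an existing seed is never re-drawn.
            if w > 0 and running >= threshold:
                seeds.append(p)
                break
    return seeds
```

Points far from every existing seed carry more weight, so the initial centroids tend to spread across the data before the usual K-means iterations begin.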
|
23
|
Li H, Lyu Q, Cheng J. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning. J Proteomics Bioinform 2016; 9:306-313. [PMID: 29081613 PMCID: PMC5658031 DOI: 10.4172/jpb.1000419] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/04/2022]
Abstract
Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
Affiliation(s)
- Haiou Li
- Department of Computer Science and Technology, Soochow University, Suzhou, 215006, China; Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Qiang Lyu
- Department of Computer Science and Technology, Soochow University, Suzhou, 215006, China
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
|
24
|
Adhikari B, Nowotny J, Bhattacharya D, Hou J, Cheng J. ConEVA: a toolbox for comprehensive assessment of protein contacts. BMC Bioinformatics 2016; 17:517. [PMID: 27923350 PMCID: PMC5142288 DOI: 10.1186/s12859-016-1404-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 07/01/2016] [Accepted: 12/01/2016] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND In recent years, successful contact prediction methods and contact-guided ab initio protein structure prediction methods have highlighted the importance of incorporating contact information into protein structure prediction methods. It is also observed that for almost all globular proteins, the quality of contact prediction dictates the accuracy of structure prediction. Hence, like many existing evaluation measures for evaluating 3D protein models, various measures are currently used to evaluate predicted contacts, with the most popular ones being precision, coverage and distance distribution score (Xd). RESULTS We have built a web application and a downloadable tool, ConEVA, for comprehensive assessment and detailed comparison of predicted contacts. Besides implementing existing measures for contact evaluation we have implemented new and useful methods of contact visualization using chord diagrams and comparison using Jaccard similarity computations. For a set (or sets) of predicted contacts, the web application runs even when a native structure is not available, visualizing the contact coverage and similarity between predicted contacts. We applied the tool on various contact prediction data sets and present our findings and insights we obtained from the evaluation of effective contact assessments. ConEVA is publicly available at http://cactus.rnet.missouri.edu/coneva/ . CONCLUSION ConEVA is useful for a range of contact related analysis and evaluations including predicted contact comparison, investigation of individual protein folding using predicted contacts, and analysis of contacts in a structure of interest.
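The Jaccard comparison of predicted contact sets mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes contacts are given as pairs of residue indices; the function name `contact_jaccard` is hypothetical and is not part of ConEVA.

```python
def contact_jaccard(contacts_a, contacts_b):
    """Jaccard similarity between two sets of residue-residue contacts.

    Each contact is a pair of residue indices. Pairs are treated as
    unordered, so (5, 1) and (1, 5) count as the same contact.
    """
    a = {tuple(sorted(c)) for c in contacts_a}
    b = {tuple(sorted(c)) for c in contacts_b}
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


# Two hypothetical predictions sharing two of four distinct contacts.
pred_1 = [(1, 5), (2, 8), (3, 9)]
pred_2 = [(5, 1), (2, 8), (4, 10)]
print(contact_jaccard(pred_1, pred_2))  # 0.5
```

Because the score needs only the two predicted sets, it can be computed even when no native structure is available.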
Affiliation(s)
- Badri Adhikari
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Jackson Nowotny
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Jie Hou
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Informatics Institute, University of Missouri, Columbia, MO 65211 USA
- C. Bond Life Science Center, University of Missouri, Columbia, MO 65211 USA
|
25
|
Cao R, Bhattacharya D, Hou J, Cheng J. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics 2016; 17:495. [PMID: 27919220 PMCID: PMC5139030 DOI: 10.1186/s12859-016-1405-y] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Received: 08/11/2016] [Accepted: 12/01/2016] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Protein quality assessment (QA) useful for ranking and selecting protein models has long been viewed as one of the major challenges for protein tertiary structure prediction. Especially, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. RESULTS We introduce a novel single-model quality assessment method DeepQA based on deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physio-chemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that deep belief network has better performance compared to Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves the state-of-the-art performance on CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. CONCLUSION DeepQA is a useful deep learning tool for protein single model quality assessment and protein structure prediction. The source code, executable, document and training/test datasets of DeepQA for Linux is freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .
Affiliation(s)
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, 98447, USA
- Debswapna Bhattacharya
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS, 67260, USA
- Jie Hou
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA; Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
|