51
Bepler T, Berger B. Learning the protein language: Evolution, structure, and function. Cell Syst 2021; 12:654-669.e3. [PMID: 34139171 PMCID: PMC8238390 DOI: 10.1016/j.cels.2021.05.017] [Citation(s) in RCA: 128] [Impact Index Per Article: 42.7] [Received: 01/16/2021] [Revised: 05/20/2021] [Accepted: 05/20/2021] [Indexed: 02/06/2023]
Abstract
Language models have recently emerged as a powerful machine-learning approach for distilling information from massive protein sequence databases. From readily available sequence data alone, these models discover evolutionary, structural, and functional organization across protein space. Using language models, we can encode amino-acid sequences into distributed vector representations that capture their structural and functional properties, as well as evaluate the evolutionary fitness of sequence variants. We discuss recent advances in protein language modeling and their applications to downstream protein property prediction problems. We then consider how these models can be enriched with prior biological knowledge and introduce an approach for encoding protein structural knowledge into the learned representations. The knowledge distilled by these models allows us to improve downstream function prediction through transfer learning. Deep protein language models are revolutionizing protein biology. They suggest new ways to approach protein and therapeutic design. However, further developments are needed to encode strong biological priors into protein language models and to increase their accessibility to the broader community.
Affiliation(s)
- Tristan Bepler
- Simons Machine Learning Center, New York Structural Biology Center, New York, NY, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
52
DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning. Sci Rep 2021; 11:12295. [PMID: 34112907 PMCID: PMC8192766 DOI: 10.1038/s41598-021-91827-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/09/2021] [Accepted: 05/28/2021] [Indexed: 12/13/2022] Open
Abstract
Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. In recognizing that multiple sequence alignments of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using multiple sequence alignment (MSA) and other co-evolutionary features of a single monomer followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions of 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
53
Sharma C, Nigam A, Singh R. Computational-approach understanding the structure-function prophecy of Fibrinolytic Protease RFEA1 from Bacillus cereus RSA1. PeerJ 2021; 9:e11570. [PMID: 34141495 PMCID: PMC8183432 DOI: 10.7717/peerj.11570] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/24/2020] [Accepted: 05/17/2021] [Indexed: 12/02/2022] Open
Abstract
Microbial fibrinolytic proteases are therapeutic enzymes used to ameliorate thrombosis, a fatal cardiac disorder caused by excessive fibrin accumulation in blood vessels. Inadequacies such as low fibrin specificity, lethal after-effects and short life-span of available fibrinolytic enzymes stimulate an intensive hunt for novel, efficient and safe substitutes. Therefore, we propose a novel and potent fibrinolytic enzyme, RFEA1, from Bacillus cereus RSA1 (MK288105). Although in-vitro purification, characterization and evaluation of the thrombolytic potential of RFEA1 were successfully accomplished in our previous study, structure-function traits and mode of action are known to significantly aid the commercialization of an enzyme. Also, predicting the structural model of a protein from its amino acid sequence is challenging in computational biology owing to the intricacy of energy functions and the inspection of a vast conformational space. Our present study thus reports an in-silico structural-functional analysis of RFEA1. Sequence-based modelling approaches such as Iterative Threading ASSEmbly Refinement (I-TASSER), SWISS-MODEL, RaptorX and Protein Homology/analogY Recognition Engine V 2.0 (Phyre2) were employed to model the three-dimensional structure of RFEA1, and the modelled RFEA1 was validated by the structural analysis and verification server (SAVES v6.0). The modelled crystal structure revealed the presence of a high-affinity Ca1 binding site, associated with hydrogen bonds at the Asp147, Leu181, Ile185 and Val187 residues. RFEA1 is structurally analogous to Subtilisin E from Bacillus subtilis 168. Molecular docking analysis using the PATCH DOCK and FIRE DOCK servers was performed to understand the interaction of RFEA1 with its substrate fibrin. A strong RFEA1-fibrin interaction was observed with high binding affinity (-21.36 kcal/mol), indicating significant fibrinolytic activity and specificity of the enzyme RFEA1. Overall, the computational research suggests that RFEA1 is a subtilisin-like serine endopeptidase with proteolytic potential, involved in thrombus hydrolysis.
Affiliation(s)
- Chhavi Sharma
- Amity Institute of Microbial Technology, Amity University Uttar Pradesh, Noida, India
- Arti Nigam
- Department of Microbiology, Institute of Home Economics, Delhi University South Campus, Delhi, India
- Rajni Singh
- Amity Institute of Microbial Technology, Amity University Uttar Pradesh, Noida, India
54
Di Lena P, Baldi P. Fold recognition by scoring protein maps using the congruence coefficient. Bioinformatics 2021; 37:506-513. [PMID: 32976564 DOI: 10.1093/bioinformatics/btaa833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2020] [Revised: 09/07/2020] [Accepted: 09/10/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Protein fold recognition is a key step for template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by sequence homology search in sequence databases, fold recognition is notoriously more difficult when it involves the identification of distantly related homologs. Recent progress in residue-residue contact and distance prediction opens up the possibility of improving fold recognition by using structural information contained in predicted distance and contact maps. RESULTS Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several interesting mathematical properties which allow one to compute in polynomial time its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and assess the statistical significance of similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent Critical Assessment of Structure Prediction editions and over 27 000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare fold recognition performances of different alignment tools with their own similarity scores against those obtained using the congruence coefficient. We show that the congruence coefficient overall improves fold recognition over other methods, proving its effectiveness as a general similarity metric for protein map comparison. AVAILABILITY AND IMPLEMENTATION The congruence coefficient software CCpro is available as part of the SCRATCH suite at: http://scratch.proteomics.ics.uci.edu/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
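As a rough illustration of the similarity metric named in this abstract, the sketch below computes the standard (Tucker) congruence coefficient between two equally sized maps: the sum of element-wise products divided by the product of the two Frobenius norms. The function name `congruence_coefficient` and the toy contact maps are illustrative assumptions. The sketch scores two maps that are already aligned and does not reproduce CCpro or the paper's mean and variance computation over all alignments.

```python
import math


def congruence_coefficient(a, b):
    """Tucker's congruence coefficient between two same-shape matrices.

    Hypothetical helper for illustration; not the CCpro implementation.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    # Element-wise product summed over all entries.
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = math.sqrt(sum(x * x for ra in a for x in ra))
    norm_b = math.sqrt(sum(y * y for rb in b for y in rb))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("coefficient is undefined for an all-zero matrix")
    return dot / (norm_a * norm_b)


# Toy symmetric binary contact maps (assumed data, three residues each).
map_a = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
map_b = [[0, 1, 1],
         [1, 0, 0],
         [1, 0, 0]]

self_score = congruence_coefficient(map_a, map_a)
cross_score = congruence_coefficient(map_a, map_b)
```

For non-negative maps such as contact probabilities or distances, the coefficient lies between 0 and 1, with 1 meaning the two maps are proportional.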
Affiliation(s)
- Pietro Di Lena
- Department of Computer Science and Engineering, University of Bologna, Bologna 40126, Italy
- Pierre Baldi
- Department of Computer Science, University of California, Irvine, CA 92697, USA; Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA
55
Mulligan VK. Current directions in combining simulation-based macromolecular modeling approaches with deep learning. Expert Opin Drug Discov 2021; 16:1025-1044. [PMID: 33993816 DOI: 10.1080/17460441.2021.1918097] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Introduction: Structure-guided drug discovery relies on accurate computational methods for modeling macromolecules. Simulations provide means of predicting macromolecular folds, of discovering function from structure, and of designing macromolecules to serve as drugs. Success rates are limited for any of these tasks, however. Recently, deep neural network-based methods have greatly enhanced the accuracy of predictions of protein structure from sequence, generating excitement about the potential impact of deep learning. Areas covered: This review introduces biologists to deep neural network architecture, surveys recent successes of deep learning in structure prediction, and discusses emerging deep learning-based approaches for structure-function analysis and design. Particular focus is given to the interplay between simulation-based and neural network-based approaches. Expert opinion: As deep learning grows integral to macromolecular modeling, simulation- and neural network-based approaches must grow more tightly interconnected. Modular software architecture must emerge allowing both types of tools to be combined with maximal versatility. Open sharing of code under permissive licenses will be essential. Although experiments will remain the gold standard for reliable information to guide drug discovery, we may soon see successful drug development projects based on high-accuracy predictions from algorithms that combine simulation with deep learning - the ultimate validation of this combination's power.
56
Protein Structure Prediction: Conventional and Deep Learning Perspectives. Protein J 2021; 40:522-544. [PMID: 34050498 DOI: 10.1007/s10930-021-10003-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Accepted: 05/21/2021] [Indexed: 10/21/2022]
Abstract
Protein structure prediction is a way to bridge the sequence-structure gap, one of the main challenges in computational biology and chemistry. Predicting any protein's accurate structure is of paramount importance for the scientific community, as these structures govern their function. Moreover, this is one of the most complicated optimization problems that computational biologists have ever faced. Experimental protein structure determination methods include X-ray crystallography, Nuclear Magnetic Resonance Spectroscopy and Electron Microscopy. All of these are tedious and time-consuming procedures that require expertise. To make the process less cumbersome, scientists use predictive tools as part of computational methods, using data consolidated in the protein repositories. In recent years, machine learning approaches have raised the interest of the structure prediction community. Most of the machine learning approaches for protein structure prediction are centred on co-evolution based methods. The accuracy of these approaches depends on the number of homologous protein sequences available in the databases. The prediction problem becomes challenging for many proteins, especially those without enough sequence homologs. Deep learning methods allow for the extraction of intricate features from protein sequence data without relying on prior intuition. Accurately predicted protein structures are employed for drug discovery, antibody design, and understanding protein-protein interactions and interactions with other molecules. This article provides a review of conventional and deep learning approaches in protein structure prediction. We conclude this review by outlining a few publicly available datasets and deep learning architectures currently employed for protein structure prediction tasks.
57
Pakhrin SC, Shrestha B, Adhikari B, KC DB. Deep Learning-Based Advances in Protein Structure Prediction. Int J Mol Sci 2021; 22:5553. [PMID: 34074028 PMCID: PMC8197379 DOI: 10.3390/ijms22115553] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 04/02/2021] [Revised: 05/12/2021] [Accepted: 05/18/2021] [Indexed: 12/29/2022] Open
Abstract
Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed many advances due to Deep Learning (DL)-based approaches, as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, as there have been some recent DL-based advances in protein structure determination using cryo-electron microscopy (cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena.
Affiliation(s)
- Subash C. Pakhrin
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
- Bikash Shrestha
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Dukka B. KC
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
58
Xu J, Mcpartlon M, Li J. Improved protein structure prediction by deep learning irrespective of co-evolution information. Nat Mach Intell 2021; 3:601-609. [PMID: 34368623 PMCID: PMC8340610 DOI: 10.1038/s42256-021-00348-5] [Citation(s) in RCA: 104] [Impact Index Per Article: 34.7] [Indexed: 01/05/2023]
Abstract
Predicting the tertiary structure of a protein from its primary sequence has been greatly improved by integrating deep learning and co-evolutionary analysis, as shown in CASP13 and CASP14. We describe our latest study of this idea, analyzing the efficacy of network size and co-evolution data and its performance on both natural and designed proteins. We show that a large ResNet (convolutional residual neural networks) can predict structures of correct folds for 26 out of 32 CASP13 free-modeling (FM) targets and L/5 long-range contacts with precision over 80%. When co-evolution is not used ResNet still can predict structures of correct folds for 18 CASP13 FM targets, greatly exceeding previous methods that do not use co-evolution either. Even with only primary sequence ResNet can predict structures of correct folds for all tested human-designed proteins. In addition, ResNet may fare better for the designed proteins when trained without co-evolution than with co-evolution. These results suggest that ResNet does not simply denoise co-evolution signals, but instead may learn important protein sequence-structure relationship. This has important implications on protein design and engineering especially when co-evolutionary data is unavailable.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago
- Matthew Mcpartlon
- Department of Computer Science, University of Chicago; Toyota Technological Institute at Chicago
- Jin Li
- Department of Computer Science, University of Chicago; Toyota Technological Institute at Chicago
59
Alli-Balogun GO, Levine TP. Fungal Ice2p is in the same superfamily as SERINCs, restriction factors for HIV and other viruses. Proteins 2021; 89:1240-1250. [PMID: 33982326 DOI: 10.1002/prot.26145] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/23/2021] [Accepted: 05/10/2021] [Indexed: 12/13/2022]
Abstract
Ice2p is an integral endoplasmic reticulum (ER) membrane protein in budding yeast S. cerevisiae named ICE because it is required for Inheritance of Cortical ER. Ice2p has also been reported to be involved in an ER metabolic branch-point that regulates the flux of lipid either to be stored in lipid droplets or to be used as membrane components. Alternately, Ice2p has been proposed to act as a tether that physically bridges the ER at contact sites with both lipid droplets and the plasma membrane via a long loop on the protein's cytoplasmic face that contains multiple predicted amphipathic helices. Here we carried out a bioinformatic analysis to increase understanding of Ice2p. First, regarding topology, we found that diverse members of the fungal Ice2 family have 10 transmembrane helices (TMHs), which places the long loop on the exofacial face of Ice2p, where it cannot form inter-organelle bridges. Second, we identified Ice2p as a full-length homolog of SERINC (serine incorporator), a family of proteins with 10 TMHs found universally in eukaryotes. Since SERINCs are potent restriction factors for HIV and other viruses, study of Ice2p may reveal functions or mechanisms that shed light on viral restriction by SERINCs.
Affiliation(s)
- Tim P Levine
- UCL Institute of Ophthalmology, University College London, London, UK
60
Bhattacharya S, Roche R, Shuvo MH, Bhattacharya D. Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci 2021; 8:643752. [PMID: 34046429 PMCID: PMC8148041 DOI: 10.3389/fmolb.2021.643752] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/18/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite the success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profile. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Department of Biological Sciences, Auburn University, Auburn, AL, United States
61
Guo Z, Wu T, Liu J, Hou J, Cheng J. Improving deep learning-based protein distance prediction in CASP14. Bioinformatics 2021; 37:3190-3196. [PMID: 33961009 PMCID: PMC8504632 DOI: 10.1093/bioinformatics/btab355] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/19/2021] [Revised: 04/22/2021] [Accepted: 05/06/2021] [Indexed: 11/21/2022] Open
Abstract
Motivation Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. Results Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. 
However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps. Availability and implementation The software package, source code and data of DeepDist2 are freely available at https://github.com/multicom-toolbox/deepdist and https://zenodo.org/record/4712084#.YIIM13VKhQM. Supplementary information Supplementary data are available at Bioinformatics online.
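The ranking metric used in this abstract, precision of the top L/5 long-range contact predictions with an 8 Angstrom contact cutoff, can be sketched as follows. This is a minimal illustration and not the MULTICOM or CASP evaluation code: the function name, its parameters and the toy data are assumptions, and the default minimum sequence separation of 24 residues for "long-range" pairs is a commonly used convention that the abstract itself does not state.

```python
def top_l_over_k_precision(pred_prob, true_dist, k=5, min_sep=24, cutoff=8.0):
    """Precision of the top L/k predicted contacts.

    pred_prob[i][j]: predicted contact probability for residues i and j.
    true_dist[i][j]: native distance in Angstrom for residues i and j.
    min_sep=24 is an assumed definition of "long-range".
    """
    length = len(pred_prob)
    # Candidate pairs with sufficient sequence separation (upper triangle only).
    pairs = [(pred_prob[i][j], i, j)
             for i in range(length)
             for j in range(i + min_sep, length)]
    if not pairs:
        raise ValueError("no residue pairs satisfy the separation threshold")
    # Rank by predicted probability, highest first, and keep the top L/k.
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[:max(1, length // k)]
    hits = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return hits / len(top)


# Toy example (assumed data): 10 residues, so the top L/5 is 2 pairs.
# A smaller min_sep is used only because the toy chain is short.
L = 10
prob = [[0.1] * L for _ in range(L)]
dist = [[20.0] * L for _ in range(L)]
prob[0][5] = 0.9   # confident prediction, true contact (6.0 < 8.0)
dist[0][5] = 6.0
prob[1][8] = 0.8   # confident prediction, not a true contact
precision = top_l_over_k_precision(prob, dist, k=5, min_sep=3)
```

Setting `k=10` gives the Top-L/10 variant of the same metric.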
Affiliation(s)
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, MO 63103, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
62
Li J, Xu J. Study of Real-Valued Distance Prediction for Protein Structure Prediction with Deep Learning. Bioinformatics 2021; 37:3197-3203. [PMID: 33961022 PMCID: PMC8504618 DOI: 10.1093/bioinformatics/btab333] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/16/2021] [Revised: 03/07/2021] [Accepted: 04/28/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Inter-residue distance prediction by deep ResNet (convolutional residual neural network) has greatly advanced protein structure prediction. Currently the most successful structure prediction methods predict distance by discretizing it into dozens of bins. Here we study how well real-valued distance can be predicted and how useful it is for 3D structure modeling by comparing it with discrete-valued prediction based upon the same deep ResNet. RESULTS Different from the recent methods that predict only a single real value for the distance of an atom pair, we predict both the mean and standard deviation of a distance and then fold a protein by the predicted mean and deviation. Our findings include: 1) tested on the CASP13 FM (free-modeling) targets, our real-valued distance prediction obtains 81% precision on top L/5 long-range contact prediction, much better than the best CASP13 results (70%); 2) our real-valued prediction can predict correct folds for the same number of CASP13 FM targets as the best CASP13 group, despite generating only 20 decoys for each target; 3) our method greatly outperforms a very new real-valued prediction method DeepDist in both contact prediction and 3D structure modeling; and 4) when the same deep ResNet is used, our real-valued distance prediction has 1-6% higher contact and distance accuracy than our own discrete-valued prediction, but less accurate 3D structure models. AVAILABILITY AND IMPLEMENTATION https://github.com/j3xugit/RaptorX-3DModeling. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
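To make the idea of predicting both a mean and a standard deviation for each distance concrete, the sketch below scores a prediction with a Gaussian negative log-likelihood. This is an assumption for illustration only: the abstract does not say which loss or folding potential the authors use, and the function name `gaussian_nll` and the example values are hypothetical.

```python
import math


def gaussian_nll(true_dist, pred_mean, pred_std):
    """Negative log-likelihood of a distance under N(pred_mean, pred_std^2).

    Assumed scoring function for illustration; not taken from the paper.
    """
    if pred_std <= 0:
        raise ValueError("pred_std must be positive")
    var = pred_std ** 2
    return 0.5 * math.log(2 * math.pi * var) + (true_dist - pred_mean) ** 2 / (2 * var)


# An exact prediction versus one that is off by 2 Angstrom, both with std = 1.
exact = gaussian_nll(8.0, 8.0, 1.0)
off_by_two = gaussian_nll(10.0, 8.0, 1.0)

# The same 2 Angstrom error is penalized less when the model reports
# a larger standard deviation (std = 2).
off_by_two_uncertain = gaussian_nll(10.0, 8.0, 2.0)
```

Under this kind of score, a predicted standard deviation acts as a per-pair confidence: errors on pairs flagged as uncertain cost less than the same errors on pairs predicted with high confidence.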
Affiliation(s)
- Jin Li
- Toyota Technological Institute at Chicago, USA; Department of Computer Science, University of Chicago, USA
- Jinbo Xu
- Toyota Technological Institute at Chicago, USA
63
Nallasamy V, S M. Bingham deep neural and oppositional fish swarm optimized protein structure prediction. J Biomol Struct Dyn 2021; 40:8706-8724. [PMID: 33955323 DOI: 10.1080/07391102.2021.1915181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
It is well known that essential proteins take part in managing cellular activities in living organisms. Moreover, predicting protein structure from the amino acid sequence aids the understanding of cellular functions. Several essential protein prediction methods have been proposed previously. However, those existing prediction methods were not satisfactory because of low sensitivity to imbalance characteristics. To address this issue, this paper presents a novel secondary protein structure prediction method called Bingham Deep Convolutional-based Oppositional Artificial Fish Optimized (BDC-OAFO). First, a protein structure identification framework called Bingham Distributed Deep Convolutional (BDDC) is designed to identify the essential proteins by eliminating the imbalanced learning issue. Next, a secondary structure prediction framework called Oppositional Artificial Fish Swarm Optimization is proposed that obtains precise prediction results. Secondary protein structure is then predicted by emulating three biological behaviors of artificial fishes, namely foraging, following and swarming, a process in which proximal count, oppositional function and Gaussian function are utilized. To evaluate the performance of the BDC-OAFO method, we conduct experiments on a Protein Data Bank dataset. The experimental results show that BDC-OAFO achieves better performance in identifying essential proteins and more precise prediction in comparison with several other well-known prediction methods, which confirms the significance of BDC-OAFO. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Malarvizhi S
- Department of Computer Science, Thiruvalluvar Government Arts College, Namakkal, Tamil Nadu, India
64
Wu F, Xu J. Deep template-based protein structure prediction. PLoS Comput Biol 2021; 17:e1008954. [PMID: 33939695 PMCID: PMC8118551 DOI: 10.1371/journal.pcbi.1008954] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/04/2021] [Revised: 05/13/2021] [Accepted: 04/11/2021] [Indexed: 11/19/2022] Open
Abstract
MOTIVATION Protein structure prediction has been greatly improved by deep learning, but most efforts are devoted to template-free modeling, and very few deep learning methods have been developed for TBM (template-based modeling), a popular technique for protein structure prediction. TBM has been studied extensively in the past, but its accuracy is not satisfactory when highly similar templates are not available. RESULTS This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence coevolution information into a deep ResNet to predict inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results show that NDThreader greatly outperforms existing methods such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as a part of RaptorX server, which obtained the best average GDT score among all CASP14 servers on the 58 TBM targets.
Affiliation(s)
- Fandi Wu
  - Toyota Technological Institute at Chicago, Chicago, IL, United States of America
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Chicago, IL, United States of America
|
65
|
Jain A, Terashi G, Kagaya Y, Maddhuri Venkata Subramaniya SR, Christoffer C, Kihara D. Analyzing effect of quadruple multiple sequence alignments on deep learning based protein inter-residue distance prediction. Sci Rep 2021; 11:7574. [PMID: 33828153 PMCID: PMC8027171 DOI: 10.1038/s41598-021-87204-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/05/2021] [Accepted: 03/25/2021] [Indexed: 12/12/2022] Open
Abstract
Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses several MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA's features at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs of different E-value cutoffs improved the model's prediction performance compared to single E-value MSA features. A further improvement was observed when an attention layer was used, and even more when additional prediction tasks of bond angle predictions were added. The improvements in distance prediction were successfully transferred to achieve better protein tertiary structure modeling.
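The idea of weighting features from several MSAs per residue pair can be illustrated with a generic softmax attention step. This is only a minimal sketch: the abstract does not describe the actual AttentiveDist layer, so the function names, the plain-list feature vectors, and the way the scores are obtained are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn unnormalised scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_msa_features(features, scores):
    """Attention-weighted sum of per-MSA features for ONE residue pair.

    features[k] is the (hypothetical) feature vector derived from MSA k,
    scores[k] is an unnormalised attention score for that MSA; in a real
    network the scores would be produced by learned layers.
    """
    weights = softmax(scores)
    dim = len(features[0])
    combined = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, combined
```

With equal scores the four MSAs contribute equally, so the combined vector is simply the mean of the four feature vectors; unequal scores shift the weight toward the MSAs judged more informative for that residue pair.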
Affiliation(s)
- Aashish Jain
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Yuki Kagaya
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Charles Christoffer
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
|
66
|
Flower TG, Hurley JH. Crystallographic molecular replacement using an in silico-generated search model of SARS-CoV-2 ORF8. Protein Sci 2021; 30:728-734. [PMID: 33625752 PMCID: PMC7980513 DOI: 10.1002/pro.4050] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 01/12/2021] [Revised: 02/21/2021] [Accepted: 02/22/2021] [Indexed: 12/01/2022]
Abstract
The majority of crystal structures are determined by the method of molecular replacement (MR). The range of application of MR is limited mainly by the need for an accurate search model. In most cases, pre-existing experimentally determined structures are used as search models. In favorable cases, ab initio predicted structures have yielded search models adequate for MR. The ORF8 protein of SARS-CoV-2 represents a challenging case for MR using an ab initio prediction because ORF8 has an all β-sheet fold and few orthologs. We previously determined experimentally the structure of ORF8 using the single anomalous dispersion (SAD) phasing method, having been unable to find an MR solution to the crystallographic phase problem. Following a report of an accurate prediction of the ORF8 structure, we assessed whether the predicted model would have succeeded as an MR search model. A phase problem solution was found, and the resulting structure was refined, yielding structural parameters equivalent to the original experimental solution.
Affiliation(s)
- Thomas G. Flower
  - Department of Molecular and Cell Biology and California Institute for Quantitative Biosciences, University of California, Berkeley, California, USA
- James H. Hurley
  - Department of Molecular and Cell Biology and California Institute for Quantitative Biosciences, University of California, Berkeley, California, USA
|
67
|
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022] Open
Abstract
The exponential growth of biomedical data in recent years has prompted the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
68
|
Igashov I, Olechnovič L, Kadukova M, Venclovas Č, Grudinin S. VoroCNN: Deep convolutional neural network built on 3D Voronoi tessellation of protein structures. Bioinformatics 2021; 37:2332-2339. [PMID: 33620450 DOI: 10.1093/bioinformatics/btab118] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/04/2020] [Revised: 01/08/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Effective use of evolutionary information has recently led to tremendous progress in computational prediction of three-dimensional (3D) structures of proteins and their complexes. Despite the progress, the accuracy of predicted structures tends to vary considerably from case to case. Since the utility of computational models depends on their accuracy, reliable estimates of deviation between predicted and native structures are of utmost importance. RESULTS For the first time, we present a deep convolutional neural network (CNN) constructed on a Voronoi tessellation of 3D molecular structures. Despite the irregular data domain, our data representation allows us to efficiently introduce both convolution and pooling operations and train the network in an end-to-end fashion without precomputed descriptors. The resultant model, VoroCNN, predicts local qualities of 3D protein folds. The prediction results are competitive with the state of the art and superior to the previous 3D CNN architectures built for the same task. We also discuss practical applications of VoroCNN, for example, in recognition of protein binding interfaces. AVAILABILITY The model, data, and evaluation tests are available at https://team.inria.fr/nano-d/software/vorocnn/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ilia Igashov
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France; Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
- Liment Olechnovič
  - Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
- Maria Kadukova
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France; Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
- Česlovas Venclovas
  - Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
- Sergei Grudinin
  - Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
|
69
|
Birhanu BT, Lee EB, Lee SJ, Park SC. Targeting Salmonella Typhimurium Invasion and Intracellular Survival Using Pyrogallol. Front Microbiol 2021; 12:631426. [PMID: 33603727 PMCID: PMC7884331 DOI: 10.3389/fmicb.2021.631426] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/20/2020] [Accepted: 01/07/2021] [Indexed: 01/25/2023] Open
Abstract
Salmonella enterica serovar Typhimurium, an intracellular pathogen, evades the host immune response mechanisms to cause gastroenteritis in animals and humans. After invading the host cells, the bacteria proliferate in the Salmonella-containing vacuole (SCV) and escape antimicrobial therapy. Moreover, Salmonella Typhimurium develops resistance to various antimicrobials, including fluoroquinolones. Treating intracellular bacteria and combating drug resistance are essential to limit the infection rate. One way of overcoming these challenges is through combination therapy. In this study, pyrogallol (PG), a polyphenol, is combined with marbofloxacin (MAR) to investigate its effect on Salmonella Typhimurium invasion and intracellular survival. The minimum inhibitory concentration (MIC) and minimum bactericidal concentration (MBC) of PG against Salmonella Typhimurium were 128 and 256 μg/mL, respectively. The lowest fractional inhibitory concentration (FIC) index for a combination of PG and MAR was 0.5. The gentamycin protection assay revealed that PG (30 μg/mL) alone and in combination with sub-MIC of MAR inhibited 72.75 and 76.18% of the invading bacteria in Caco-2 cells, respectively. In addition, the intracellular survival of Salmonella Typhimurium was reduced by 7.69 and 74.36% in treatment with PG alone and combined with sub-MIC of MAR, respectively, as visualized by confocal microscopy. PG was also shown to increase the intracellular accumulation of fluoroquinolone by 15.2 and 34.9% at concentrations of 30 and 100 μg/mL, respectively. Quantitative real-time PCR demonstrated that PG suppressed the expression of hilA, invF, sipB, and acrA by 14.6, 15.4, 13.6, and 36%, respectively. However, the downregulation of hilA, invF, sipB, and acrA increased to 80, 74.6, 78, and 70.1%, respectively, in combination with sub-MIC of MAR. Similarly, PG combined with MAR inhibited the expression of the sdiA, srgE, and rck genes by 78.6, 62.8, and 61.8%, respectively. In conclusion, PG has shown antimicrobial activity against Salmonella Typhimurium alone and in combination with MAR. It also inhibited invasion and intracellular survival of the bacteria through downregulation of quorum sensing, invasion-associated virulence, and efflux pump genes. Hence, PG could be a potential antimicrobial candidate that limits the intracellular survival and replication of Salmonella Typhimurium.
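For readers unfamiliar with the FIC index reported above, the standard checkerboard definition is the sum of each drug's MIC in combination divided by its MIC alone. The sketch below assumes that definition; the abstract gives only the PG MIC (128 μg/mL) and the final index (0.5), so the combination concentrations and the MAR MIC used in the usage note are made-up placeholders, not values from the study. The interpretation cut-offs follow one common convention and vary between guidelines.

```python
def fic_index(mic_a_alone, mic_a_combo, mic_b_alone, mic_b_combo):
    """Fractional inhibitory concentration index for drugs A and B.

    FIC index = MIC_A(combination) / MIC_A(alone)
              + MIC_B(combination) / MIC_B(alone)
    All four values must use the same concentration unit per drug.
    """
    return mic_a_combo / mic_a_alone + mic_b_combo / mic_b_alone


def interpret_fic(index):
    """Label an FIC index using one common convention (an assumption here)."""
    if index <= 0.5:
        return "synergy"
    if index <= 4.0:
        return "no interaction"
    return "antagonism"
```

As a purely hypothetical illustration, `fic_index(128, 32, 1.0, 0.25)` gives 0.25 + 0.25 = 0.5, which `interpret_fic` labels as synergy under the convention assumed above.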
Affiliation(s)
- Biruk Tesfaye Birhanu
  - Laboratory of Veterinary Pharmacokinetics and Pharmacodynamics, College of Veterinary Medicine, Kyungpook National University, Daegu, South Korea
- Eon-Bee Lee
  - Laboratory of Veterinary Pharmacokinetics and Pharmacodynamics, College of Veterinary Medicine, Kyungpook National University, Daegu, South Korea
- Seung-Jin Lee
  - Development and Reproductive Toxicology Research Group, Korea Institute of Toxicology, Daejeon, South Korea
- Seung-Chun Park
  - Laboratory of Veterinary Pharmacokinetics and Pharmacodynamics, College of Veterinary Medicine, Kyungpook National University, Daegu, South Korea
|
70
|
Wu T, Guo Z, Hou J, Cheng J. DeepDist: real-value inter-residue distance prediction with deep residual convolutional network. BMC Bioinformatics 2021; 22:30. [PMID: 33494711 PMCID: PMC7831258 DOI: 10.1186/s12859-021-03960-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/19/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Driven by deep learning, inter-residue contact/distance prediction has been significantly improved and has substantially enhanced ab initio protein structure prediction. Currently, most distance prediction methods classify inter-residue distances into multiple distance intervals instead of directly predicting real-value distances. The output of the former has to be converted into real-value distances to be used in tertiary structure prediction. RESULTS To explore the potential of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. Tested on 43 CASP13 hard domains, DeepDist achieves comparable performance in real-value distance prediction and multi-class distance prediction. The average mean square error (MSE) of DeepDist's real-value distance prediction is 0.896 Å² when predicted distances ≥ 16 Å are filtered out, which is lower than the 1.003 Å² of DeepDist's multi-class distance prediction. When distance predictions are converted into contact predictions at the 8 Å threshold (the standard threshold in the field), the precision of the top L/5 and L/2 contact predictions of DeepDist's multi-class distance prediction is 79.3% and 66.1%, respectively, higher than the 78.6% and 64.5% of its real-value distance prediction and the best results in the CASP13 experiment. CONCLUSIONS DeepDist can predict inter-residue distances well and improve binary contact prediction over the existing state-of-the-art methods. Moreover, the predicted real-value distances can be directly used to reconstruct protein tertiary structures better than multi-class distance predictions due to the lower MSE. Finally, we demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone.
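The two evaluation quantities quoted in the abstract, top-L/k contact precision at an 8 Å threshold and MSE restricted to predicted distances below 16 Å, can be sketched as follows. This is an illustrative reading rather than the DeepDist evaluation code: ranking pairs by smallest predicted distance and the default minimum sequence separation of 24 (a common long-range convention) are assumptions, as are the function names.

```python
def top_contact_precision(pred_dist, true_dist, k=5, threshold=8.0, min_sep=24):
    """Precision of the top L/k predicted contacts.

    pred_dist and true_dist are square L x L distance matrices (in Å).
    Pairs closer than min_sep in sequence are ignored (assumed convention).
    """
    length = len(true_dist)
    pairs = [
        (pred_dist[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    pairs.sort()  # smallest predicted distance = most confident contact
    top = pairs[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if true_dist[i][j] < threshold)
    return hits / len(top)


def filtered_mse(pred_dist, true_dist, cutoff=16.0):
    """Mean squared error over pairs whose PREDICTED distance is below cutoff."""
    length = len(pred_dist)
    errors = [
        (pred_dist[i][j] - true_dist[i][j]) ** 2
        for i in range(length)
        for j in range(i + 1, length)
        if pred_dist[i][j] < cutoff
    ]
    return sum(errors) / len(errors) if errors else float("nan")
```

Filtering on the predicted rather than the true distance mirrors the wording of the abstract; an evaluation that filters on true distances would give different numbers.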
Affiliation(s)
- Tianqi Wu
  - Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
- Zhiye Guo
  - Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
- Jianlin Cheng
  - Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
|
71
|
Adhikari B, Shrestha B, Bernardini M, Hou J, Lea J. DISTEVAL: a web server for evaluating predicted protein distances. BMC Bioinformatics 2021; 22:8. [PMID: 33407077 PMCID: PMC7788990 DOI: 10.1186/s12859-020-03938-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/26/2020] [Accepted: 12/15/2020] [Indexed: 05/30/2023] Open
Abstract
Background Protein inter-residue contact and distance prediction are two key intermediate steps essential to accurate protein structure prediction. Distance prediction comes in two forms: real-valued distances and ‘binned’ distograms, which are a more finely grained variant of the binary contact prediction problem. The latter has been introduced as a new challenge in the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14) 2020 experiment. Despite the recent proliferation of methods for predicting distances, few methods exist for evaluating these predictions. Currently only numerical metrics, which evaluate the entire prediction at once, are used. These give no insight into the structural details of a prediction. For this reason, new methods and tools are needed. Results We have developed a web server for evaluating predicted inter-residue distances. Our server, DISTEVAL, accepts predicted contacts, distances, and a true structure as optional inputs to generate informative heatmaps, chord diagrams, and 3D models. All of these outputs facilitate visual and qualitative assessment. The server also evaluates predictions using other metrics such as mean absolute error, root mean squared error, and contact precision. Conclusions The visualizations generated by DISTEVAL complement each other and collectively serve as a powerful tool for both quantitative and qualitative assessments of predicted contacts and distances, even in the absence of a true 3D structure.
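The relationship between a binned distogram and the binary contact problem mentioned in the abstract can be made concrete: summing the probability mass in bins below 8 Å yields a contact probability, and a probability-weighted average of bin midpoints yields a real-valued distance estimate. The sketch below is a hedged illustration only; the bin layout and function names are hypothetical and are not taken from DISTEVAL or CASP14.

```python
def contact_probability(bin_edges, probs, threshold=8.0):
    """Probability that a residue pair is in contact.

    Bin k covers [bin_edges[k], bin_edges[k + 1]); probs[k] is its probability.
    Only bins lying entirely at or below the threshold are counted.
    """
    if len(bin_edges) != len(probs) + 1:
        raise ValueError("need exactly one more edge than probabilities")
    return sum(p for k, p in enumerate(probs) if bin_edges[k + 1] <= threshold)


def expected_distance(bin_edges, probs):
    """Probability-weighted mean of bin midpoints (a simple point estimate)."""
    if len(bin_edges) != len(probs) + 1:
        raise ValueError("need exactly one more edge than probabilities")
    total = sum(probs)
    weighted = sum(
        p * (bin_edges[k] + bin_edges[k + 1]) / 2 for k, p in enumerate(probs)
    )
    return weighted / total
```

For a made-up pair with edges `[4, 6, 8, 10, 12]` and probabilities `[0.1, 0.4, 0.3, 0.2]`, the contact probability is 0.5 and the expected distance is about 8.2 Å, which shows how a pair can sit right at the contact boundary while its distogram still carries finer-grained information.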
Affiliation(s)
- Badri Adhikari
  - Department of Computer Science, University of Missouri-St. Louis, 312 Express Scripts Hall, St. Louis, MO, USA
- Bikash Shrestha
  - Department of Computer Science, University of Missouri-St. Louis, 312 Express Scripts Hall, St. Louis, MO, USA
- Matthew Bernardini
  - Department of Computer Science, University of Missouri-St. Louis, 312 Express Scripts Hall, St. Louis, MO, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, 217 Ritter Hall, St. Louis, MO, USA
- Jamie Lea
  - Department of Computer Science, University of Missouri-St. Louis, 312 Express Scripts Hall, St. Louis, MO, USA
|
72
|
Flower TG, Hurley JH. Crystallographic molecular replacement using an in silico-generated search model of SARS-CoV-2 ORF8. bioRxiv 2021:2021.01.05.425441. [PMID: 33442695 PMCID: PMC7805452 DOI: 10.1101/2021.01.05.425441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
The majority of crystal structures are determined by the method of molecular replacement (MR). The range of application of MR is limited mainly by the need for an accurate search model. In most cases, pre-existing experimentally determined structures are used as search models. In favorable cases, ab initio predicted structures have yielded search models adequate for molecular replacement. The ORF8 protein of SARS-CoV-2 represents a challenging case for MR using an ab initio prediction because ORF8 has an all β-sheet fold and few orthologs. We previously determined experimentally the structure of ORF8 using the single anomalous dispersion (SAD) phasing method, having been unable to find an MR solution to the crystallographic phase problem. Following a report of an accurate prediction of the ORF8 structure, we assessed whether the predicted model would have succeeded as an MR search model. A phase problem solution was found, and the resulting structure was refined, yielding structural parameters equivalent to the original experimental solution.
Affiliation(s)
- Thomas G. Flower
  - Department of Molecular and Cell Biology and California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720
- James H. Hurley
  - Department of Molecular and Cell Biology and California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720
|
73
|
Rosário-Ferreira N, Marques-Pereira C, Gouveia RP, Mourão J, Moreira IS. Guardians of the Cell: State-of-the-Art of Membrane Proteins from a Computational Point-of-View. Methods Mol Biol 2021; 2315:3-28. [PMID: 34302667 DOI: 10.1007/978-1-0716-1468-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Membrane proteins (MPs) encompass a large family of proteins with distinct cellular functions, and although they represent over 50% of existing pharmaceutical drug targets, their structural and functional information is still very scarce. In recent years, in silico analysis and algorithm development have been essential to characterize MPs and overcome some limitations of experimental approaches. The optimization and improvement of these methods remain an ongoing process, with key advances in MP structure, folding, and interface prediction being continuously pursued. Herein, we discuss the latest trends in computational methods toward a deeper understanding of the atomistic and mechanistic details of MPs.
Affiliation(s)
- Nícia Rosário-Ferreira
  - Coimbra Chemistry Center, Department of Chemistry, University of Coimbra, Coimbra, Portugal; Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Catarina Marques-Pereira
  - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal; PhD Programme in Experimental Biology and Biomedicine, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, Coimbra, Portugal
- Raquel P Gouveia
  - Center for Neuroscience and Cell Biology, Center for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Joana Mourão
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Irina S Moreira
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
|
74
|
Zhang H, Shen Y. Template-based prediction of protein structure with deep learning. BMC Genomics 2020; 21:878. [PMID: 33372607 PMCID: PMC7771081 DOI: 10.1186/s12864-020-07249-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/27/2020] [Accepted: 11/18/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate prediction of protein structure is fundamentally important for understanding the biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for proteins with only distant homologs available. RESULTS We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision and naturally applies a deep residual neural network to the prediction. ThreaderAI first employs deep learning to predict a residue-residue alignment probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated our method both in generating accurate template-query alignments and in protein threading. Experimental results show that ThreaderAI outperforms the currently popular template-based modelling methods HHpred and CNFpred and the latest contact-assisted method CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs with fold-level similarity from SCOPe data. On CASP13's TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9 and 8% in terms of TM-score, respectively. CONCLUSIONS These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
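The step of building an alignment by dynamic programming over a residue-residue probability matrix can be sketched with a Needleman-Wunsch-style recursion. This is a minimal illustration, not ThreaderAI's algorithm: the abstract does not specify the recursion or gap handling, so the linear `gap_penalty`, the choice of rows for query residues and columns for template residues, and the function name are all assumptions.

```python
def align_from_probabilities(prob, gap_penalty=0.0):
    """Global alignment maximising the summed alignment probabilities.

    prob[i][j] is the predicted probability that query residue i aligns to
    template residue j. Returns (best score, list of aligned (i, j) pairs).
    """
    n, m = len(prob), len(prob[0])
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + prob[i - 1][j - 1],  # align i with j
                score[i - 1][j] - gap_penalty,             # skip query residue
                score[i][j - 1] - gap_penalty,             # skip template residue
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + prob[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

Because the recursion only allows monotone paths, the returned pairs always preserve residue order in both sequences, which is the constraint that distinguishes an alignment from an arbitrary matching of high-probability cells.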
Affiliation(s)
- Haicang Zhang
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA
  - Program in Mathematical Genomics, Columbia University, New York, NY, USA
|
75
|
Seffernick JT, Lindert S. Hybrid methods for combined experimental and computational determination of protein structure. J Chem Phys 2020; 153:240901. [PMID: 33380110 PMCID: PMC7773420 DOI: 10.1063/5.0026025] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 08/20/2020] [Accepted: 11/10/2020] [Indexed: 02/04/2023] Open
Abstract
Knowledge of protein structure is paramount to understanding biological function, developing new therapeutics, and making detailed mechanistic hypotheses. Therefore, methods to accurately elucidate three-dimensional structures of proteins are in high demand. A few experimental techniques, such as x-ray crystallography, nuclear magnetic resonance (NMR), and cryo-EM, can routinely provide high-resolution structures, but each has shortcomings and thus cannot be used in all cases. In addition, a large number of experimental techniques have been developed that provide some structural information, but not enough to assign atomic positions with high certainty. These methods offer sparse experimental data, which can also be noisy and inaccurate in some instances. In cases where it is not possible to determine the structure of a protein experimentally, computational structure prediction methods can be used as an alternative. Although computational methods can be performed without any experimental data, in a large number of studies the inclusion of sparse experimental data into these prediction methods has yielded significant improvement. In this Perspective, we cover many of the successes of integrative modeling (computational modeling with experimental data), specifically for protein folding, protein-protein docking, and molecular dynamics simulations. We describe methods that incorporate sparse data from cryo-EM, NMR, mass spectrometry, electron paramagnetic resonance, small-angle x-ray scattering, Förster resonance energy transfer, and genetic sequence covariation. Finally, we highlight some of the major challenges in the field as well as possible future directions.
Affiliation(s)
- Justin T. Seffernick
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA
- Steffen Lindert
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA
|
76
|
Lv Z, Wang P, Zou Q, Jiang Q. Identification of Sub-Golgi protein localization by use of deep representation learning features. Bioinformatics 2020; 36:5600-5609. [PMID: 33367627 PMCID: PMC8023683 DOI: 10.1093/bioinformatics/btaa1074] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/31/2020] [Revised: 12/10/2020] [Accepted: 12/14/2020] [Indexed: 12/11/2022] Open
Abstract
Motivation The Golgi apparatus has a key functional role in protein biosynthesis within the eukaryotic cell, with malfunction resulting in various neurodegenerative diseases. For a better understanding of the Golgi apparatus, it is essential to identify sub-Golgi protein localization. Although some machine learning methods have been used to identify sub-Golgi localization proteins by sequence representation fusion, more accurate sub-Golgi protein identification remains challenging with existing methodology. Results We developed a protein sub-Golgi localization identification protocol using deep representation learning features with 107 dimensions. With this protocol, we demonstrated that instead of the multi-type protein sequence feature representation fusion used in previous state-of-the-art sub-Golgi protein localization classifiers, it is sufficient to exploit only one type of feature representation for more accurate identification of sub-Golgi proteins. Compared with independent testing results for benchmark datasets, our protocol is able to perform generally, reliably and robustly for sub-Golgi protein localization prediction. Availability and implementation A user-friendly webserver is freely accessible at http://isGP-DRLF.aibiochem.net and the prediction code is accessible at https://github.com/zhibinlv/isGP-DRLF. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhibin Lv
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Pingping Wang
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Qinghua Jiang
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
|
77
|
Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. Patterns (N Y) 2020; 1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/13/2022]
Abstract
Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understanding and engineering biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is intended to help computational biologists gain familiarity with the deep learning methods applied in protein modeling, and computer scientists gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
Affiliation(s)
- Wenhao Gao
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
78
|
McGehee AJ, Bhattacharya S, Roche R, Bhattacharya D. PolyFold: An interactive visual simulator for distance-based protein folding. PLoS One 2020; 15:e0243331. [PMID: 33270805 PMCID: PMC7714222 DOI: 10.1371/journal.pone.0243331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Accepted: 11/18/2020] [Indexed: 11/18/2022] Open
Abstract
Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.
Affiliation(s)
- Andrew J. McGehee
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Department of Biological Sciences, Auburn University, Auburn, AL, United States of America
| |
|
79
|
Chen M, Chen X, Jin S, Lu W, Lin X, Wolynes PG. Protein Structure Refinement Guided by Atomic Packing Frustration Analysis. J Phys Chem B 2020; 124:10889-10898. [PMID: 32931278 DOI: 10.1021/acs.jpcb.0c06719] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Recent advances in machine learning, bioinformatics, and the understanding of the folding problem have enabled efficient predictions of protein structures with moderate accuracy, even for targets where there is little information from templates. All-atom molecular dynamics simulations provide a route to refine such predicted structures, but unguided atomistic simulations, even when lengthy in time, often fail to eliminate incorrect structural features that would prevent the structure from becoming more energetically favorable owing to the necessity of making large scale motions and to overcoming energy barriers for side chain repacking. In this study, we show that localizing packing frustration at atomic resolution by examining the statistics of the energetic changes that occur when the local environment of a site is changed allows one to identify the most likely locations of incorrect contacts. The global statistics of atomic resolution frustration in structures that have been predicted using various algorithms provide strong indicators of structural quality when tested over a database of 20 targets from previous CASP experiments. Residues that are more correctly located turn out to be more minimally frustrated than more poorly positioned sites. These observations provide a diagnosis of both global and local quality of predicted structures and thus can be used as guidance in all-atom refinement simulations of the 20 targets. Refinement simulations guided by atomic packing frustration turn out to be quite efficient and significantly improve the quality of the structures.
Affiliation(s)
- Mingchen Chen
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - Xun Chen
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
| | - Shikai Jin
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Biosciences, Rice University, Houston, Texas 77005, United States
| | - Wei Lu
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics and Astronomy, Rice University, Houston, Texas 77030, United States
| | - Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Peter G Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Department of Biosciences, Rice University, Houston, Texas 77005, United States
- Department of Physics and Astronomy, Rice University, Houston, Texas 77030, United States
| |
|
80
|
Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020; 18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022] Open
Abstract
Homology modeling is a method for building protein 3D structures using protein primary sequence and utilizing prior knowledge gained from structural similarities with other proteins. The homology modeling process is done in sequential steps where sequence/structure alignment is optimized, then a backbone is built and later, side-chains are added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. In the past three decades, a few collective and collaborative initiatives allowed for continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives where the community is sharing computational resources, whereas RosettaCommons is an example of an initiative where a community is sharing a codebase for the development of computational algorithms. Foldit is another initiative where participants compete with each other in a protein folding video game to predict 3D structure. In the past few years, contact maps deep machine learning was introduced to the 3D structure prediction process, adding more information and increasing the accuracy of models significantly. In this review, we will take the reader in a journey of exploration from the beginnings to the most recent turnabouts, which have revolutionized the field of homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
| | - Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
| | - Vojtech Adam
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
| | - Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
| |
|
81
|
Li B, Yang YT, Capra JA, Gerstein MB. Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks. PLoS Comput Biol 2020; 16:e1008291. [PMID: 33253214 PMCID: PMC7728386 DOI: 10.1371/journal.pcbi.1008291] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 03/02/2020] [Revised: 12/10/2020] [Accepted: 08/26/2020] [Indexed: 12/22/2022] Open
Abstract
Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
Affiliation(s)
- Bian Li
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Biological Sciences and Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Yucheng T. Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - John A. Capra
- Department of Biological Sciences and Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Mark B. Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
| |
|
82
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
| | - Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
83
|
Du Z, Pan S, Wu Q, Peng Z, Yang J. CATHER: a novel threading algorithm with predicted contacts. Bioinformatics 2020; 36:2119-2125. [PMID: 31790141 DOI: 10.1093/bioinformatics/btz876] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/30/2019] [Revised: 10/31/2019] [Accepted: 11/28/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy in protein contact map prediction opens a new avenue to improve the performance of threading algorithms. Several preliminary studies suggest that with predicted contacts, the performance of threading algorithms can be improved greatly. There is still much room to explore to make better use of predicted contacts. RESULTS We have developed a new contact-assisted threading algorithm named CATHER using both conventional sequential profiles and contact map predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and the CASP12 targets demonstrated that CATHER made significant improvement over other methods which only use either sequential profile or predicted contact map. Our method was ranked at the Top 10 among all 39 participated server groups on the 32 free modeling targets in the blind tests of the CASP13 experiment. These data suggest that it is promising to push forward the threading algorithms by using predicted contacts. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/CATHER/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zongyang Du
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Shuo Pan
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Qi Wu
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| |
|
84
|
Adhikari B. A fully open-source framework for deep learning protein real-valued distances. Sci Rep 2020; 10:13374. [PMID: 32770096 PMCID: PMC7414848 DOI: 10.1038/s41598-020-70181-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/01/2020] [Accepted: 07/23/2020] [Indexed: 11/12/2022] Open
Abstract
As deep learning algorithms drive the progress in protein structure prediction, a lot remains to be studied at this merging superhighway of deep learning and protein structure prediction. Recent findings show that inter-residue distance prediction, a more granular version of the well-known contact prediction problem, is a key to predicting accurate models. However, deep learning methods that predict these distances are still in the early stages of their development. To advance these methods and develop other novel methods, a need exists for a small and representative dataset packaged for faster development and testing. In this work, we introduce protein distance net (PDNET), a framework that consists of one such representative dataset along with the scripts for training and testing deep learning methods. The framework also includes all the scripts that were used to curate the dataset, and generate the input features and distance maps. Deep learning models can also be trained and tested in a web browser using free platforms such as Google Colab. We discuss how PDNET can be used to predict contacts, distance intervals, and real-valued distances.
Affiliation(s)
- Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO, 63132, USA.
| |
|
85
|
Abriata LA, Dal Peraro M. State-of-the-art web services for de novo protein structure prediction. Brief Bioinform 2020; 22:5870389. [PMID: 34020540 DOI: 10.1093/bib/bbaa139] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/25/2020] [Revised: 06/04/2020] [Accepted: 06/05/2020] [Indexed: 02/06/2023] Open
Abstract
Residue coevolution estimations coupled to machine learning methods are revolutionizing the ability of protein structure prediction approaches to model proteins that lack clear homologous templates in the Protein Data Bank (PDB). This has been patent in the last round of the Critical Assessment of Structure Prediction (CASP), which presented several very good models for the hardest targets. Unfortunately, literature reporting on these advances often lacks digests tailored to lay end users; moreover, some of the top-ranking predictors do not provide webservers that can be used by nonexperts. How can then end users benefit from these advances and correctly interpret the predicted models? Here we review the web resources that biologists can use today to take advantage of these state-of-the-art methods in their research, including not only the best de novo modeling servers but also datasets of models precomputed by experts for structurally uncharacterized protein families. We highlight their features, advantages and pitfalls for predicting structures of proteins without clear templates. We present a broad number of applications that span from driving forward biochemical investigations that lack experimental structures to actually assisting experimental structure determination in X-ray diffraction, cryo-EM and other forms of integrative modeling. We also discuss issues that must be considered by users yet still require further developments, such as global and residue-wise model quality estimates and sources of residue coevolution other than monomeric tertiary structure.
Affiliation(s)
- Luciano A Abriata
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| | - Matteo Dal Peraro
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
| |
|
86
|
Leman JK, Weitzner BD, Lewis SM, Adolf-Bryfogle J, Alam N, Alford RF, Aprahamian M, Baker D, Barlow KA, Barth P, Basanta B, Bender BJ, Blacklock K, Bonet J, Boyken SE, Bradley P, Bystroff C, Conway P, Cooper S, Correia BE, Coventry B, Das R, De Jong RM, DiMaio F, Dsilva L, Dunbrack R, Ford AS, Frenz B, Fu DY, Geniesse C, Goldschmidt L, Gowthaman R, Gray JJ, Gront D, Guffy S, Horowitz S, Huang PS, Huber T, Jacobs TM, Jeliazkov JR, Johnson DK, Kappel K, Karanicolas J, Khakzad H, Khar KR, Khare SD, Khatib F, Khramushin A, King IC, Kleffner R, Koepnick B, Kortemme T, Kuenze G, Kuhlman B, Kuroda D, Labonte JW, Lai JK, Lapidoth G, Leaver-Fay A, Lindert S, Linsky T, London N, Lubin JH, Lyskov S, Maguire J, Malmström L, Marcos E, Marcu O, Marze NA, Meiler J, Moretti R, Mulligan VK, Nerli S, Norn C, Ó'Conchúir S, Ollikainen N, Ovchinnikov S, Pacella MS, Pan X, Park H, Pavlovicz RE, Pethe M, Pierce BG, Pilla KB, Raveh B, Renfrew PD, Burman SSR, Rubenstein A, Sauer MF, Scheck A, Schief W, Schueler-Furman O, Sedan Y, Sevy AM, Sgourakis NG, Shi L, Siegel JB, Silva DA, Smith S, Song Y, Stein A, Szegedy M, Teets FD, Thyme SB, Wang RYR, Watkins A, Zimmerman L, Bonneau R. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat Methods 2020; 17:665-680. [PMID: 32483333 PMCID: PMC7603796 DOI: 10.1038/s41592-020-0848-2] [Citation(s) in RCA: 402] [Impact Index Per Article: 100.5] [Received: 04/29/2019] [Accepted: 04/22/2020] [Indexed: 12/12/2022]
Abstract
The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.
Affiliation(s)
- Julia Koehler Leman
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA.
- Department of Biology, New York University, New York, New York, USA.
| | - Brian D Weitzner
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Lyell Immunopharma Inc., Seattle, WA, USA
| | - Steven M Lewis
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Biochemistry, Duke University, Durham, NC, USA
- Cyrus Biotechnology, Seattle, WA, USA
| | - Jared Adolf-Bryfogle
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Nawsad Alam
- Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Rebecca F Alford
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Melanie Aprahamian
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH, USA
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Kyle A Barlow
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
| | - Patrick Barth
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Baylor College of Medicine, Department of Pharmacology, Houston, TX, USA
| | - Benjamin Basanta
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Biological Physics Structure and Design PhD Program, University of Washington, Seattle, WA, USA
| | - Brian J Bender
- Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Kristin Blacklock
- Institute of Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Jaume Bonet
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Scott E Boyken
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Lyell Immunopharma Inc., Seattle, WA, USA
| | - Phil Bradley
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Chris Bystroff
- Department of Biological Sciences, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Patrick Conway
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Seth Cooper
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Bruno E Correia
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Brian Coventry
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
| | | | - Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Lorna Dsilva
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Roland Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA, USA
| | - Alexander S Ford
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Brandon Frenz
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Cyrus Biotechnology, Seattle, WA, USA
| | - Darwin Y Fu
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
| | - Caleb Geniesse
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
| | | | - Ragul Gowthaman
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD, USA
| | - Dominik Gront
- Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
| | - Sharon Guffy
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Scott Horowitz
- Department of Chemistry & Biochemistry, University of Denver, Denver, CO, USA
- The Knoebel Institute for Healthy Aging, University of Denver, Denver, CO, USA
| | - Po-Ssu Huang
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Thomas Huber
- Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Tim M Jacobs
- Program in Bioinformatics and Computational Biology, Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | | | - David K Johnson
- Center for Computational Biology, University of Kansas, Lawrence, KS, USA
| | - Kalli Kappel
- Biophysics Program, Stanford University, Stanford, CA, USA
| | - John Karanicolas
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA, USA
| | - Hamed Khakzad
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute for Computational Science, University of Zurich, Zurich, Switzerland
- S3IT, University of Zurich, Zurich, Switzerland
| | - Karen R Khar
- Cyrus Biotechnology, Seattle, WA, USA
- Center for Computational Biology, University of Kansas, Lawrence, KS, USA
| | - Sagar D Khare
- Institute of Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Department of Chemistry and Chemical Biology, The State University of New Jersey, Piscataway, NJ, USA
- Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Computational Biology and Molecular Biophysics Program, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Firas Khatib
- Department of Computer and Information Science, University of Massachusetts Dartmouth, Dartmouth, MA, USA
| | - Alisa Khramushin
- Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Indigo C King
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Cyrus Biotechnology, Seattle, WA, USA
| | - Robert Kleffner
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Brian Koepnick
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Georg Kuenze
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
| | - Brian Kuhlman
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Daisuke Kuroda
- Medical Device Development and Regulation Research Center, School of Engineering, University of Tokyo, Tokyo, Japan
- Department of Bioengineering, School of Engineering, University of Tokyo, Tokyo, Japan
| | - Jason W Labonte
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Chemistry, Franklin & Marshall College, Lancaster, PA, USA
| | - Jason K Lai
- Baylor College of Medicine, Department of Pharmacology, Houston, TX, USA
| | - Gideon Lapidoth
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
| | - Andrew Leaver-Fay
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH, USA
| | - Thomas Linsky
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Nir London
- Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Joseph H Lubin
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Lyskov
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jack Maguire
- Program in Bioinformatics and Computational Biology, Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Lars Malmström
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute for Computational Science, University of Zurich, Zurich, Switzerland
- S3IT, University of Zurich, Zurich, Switzerland
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Enrique Marcos
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Research in Biomedicine Barcelona, The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Orly Marcu
- Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Nicholas A Marze
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Departments of Chemistry, Pharmacology and Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Institute for Chemical Biology, Vanderbilt University, Nashville, TN, USA
| | - Rocco Moretti
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
| | - Vikram Khipple Mulligan
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Santrupti Nerli
- Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Christoffer Norn
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
| | - Shane Ó'Conchúir
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Noah Ollikainen
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Michael S Pacella
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Xingjie Pan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Hahnbeom Park
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Ryan E Pavlovicz
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Cyrus Biotechnology, Seattle, WA, USA
| | - Manasi Pethe
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Brian G Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
| | - Kala Bharath Pilla
- Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Barak Raveh
- Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - P Douglas Renfrew
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
| | - Shourya S Roy Burman
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Aliza Rubenstein
- Institute of Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Computational Biology and Molecular Biophysics Program, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Marion F Sauer
- Chemical and Physical Biology Program, Vanderbilt Vaccine Center, Vanderbilt University, Nashville, TN, USA
| | - Andreas Scheck
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - William Schief
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
| | - Ora Schueler-Furman
- Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Yuval Sedan
- Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Alexander M Sevy
- Chemical and Physical Biology Program, Vanderbilt Vaccine Center, Vanderbilt University, Nashville, TN, USA
| | - Nikolaos G Sgourakis
- Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Lei Shi
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Justin B Siegel
- Department of Chemistry, University of California, Davis, Davis, CA, USA
- Department of Biochemistry and Molecular Medicine, University of California, Davis, Davis, CA, USA
- Genome Center, University of California, Davis, Davis, CA, USA
| | | | - Shannon Smith
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
| | - Yifan Song
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Cyrus Biotechnology, Seattle, WA, USA
| | - Amelie Stein
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Maria Szegedy
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Frank D Teets
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Summer B Thyme
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Ray Yu-Ruei Wang
- Department of Biochemistry, University of Washington, Seattle, WA, USA
| | - Andrew Watkins
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
| | - Lior Zimmerman
- Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
| | - Richard Bonneau
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Department of Computer Science, New York University, New York, NY, USA
- Center for Data Science, New York University, New York, NY, USA
| |
|
87
|
Gress A, Kalinina OV. SphereCon-a method for precise estimation of residue relative solvent accessible area from limited structural information. Bioinformatics 2020; 36:3372-3378. [PMID: 32154837 DOI: 10.1093/bioinformatics/btaa159] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/15/2019] [Revised: 02/28/2020] [Accepted: 03/04/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact and pathogenicity of mutations, and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved. RESULTS: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates. AVAILABILITY AND IMPLEMENTATION: https://github.com/kalininalab/spherecon. CONTACT: alexander.gress@helmholtz-hips.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alexander Gress
- Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken 66123, Germany
- Graduate School of Computer Science, Saarland University, Saarbrücken 66123, Germany
| | - Olga V Kalinina
- Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken 66123, Germany
- Medical Faculty, Saarland University, Homburg 66421, Germany
| |
|
88
|
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature 2020; 577:706-710. [PMID: 31942072 DOI: 10.1038/s41586-019-1923-7] [Citation(s) in RCA: 1362] [Impact Index Per Article: 340.5] [Received: 04/02/2019] [Accepted: 12/10/2019] [Indexed: 12/16/2022]
Abstract
Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence. This problem is of fundamental importance as the structure of a protein largely determines its function; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined.
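The abstract above describes optimizing a potential built from predicted inter-residue distances with plain gradient descent. The following Python sketch illustrates only that general idea on a toy example. Everything in it is an assumption made for illustration: the harmonic penalty, the use of Cartesian coordinates, the step size, and the names `distance_potential`, `gradient`, and `minimize` are hypothetical and are not AlphaFold's actual potential, parameterization, or code.

```python
import math
import random


def distance_potential(coords, target):
    """Toy potential: sum of squared deviations between the current and
    target pairwise distances (a harmonic stand-in, not AlphaFold's)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            total += (r - target[i][j]) ** 2
    return total


def gradient(coords, target):
    """Analytic gradient of distance_potential with respect to each coordinate."""
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if r == 0.0:
                continue  # direction undefined for coincident points; skip
            scale = 2.0 * (r - target[i][j]) / r
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
    return grad


def minimize(coords, target, step=0.01, iterations=500):
    """Plain gradient descent with a fixed step size (illustrative values)."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grad = gradient(coords, target)
        for point, g in zip(coords, grad):
            for k in range(3):
                point[k] -= step * g[k]
    return coords


# Hypothetical "predicted" distances, taken here from a known 4-point layout.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
target = [[math.dist(a, b) for b in reference] for a in reference]

rng = random.Random(0)  # fixed seed so the run is repeatable
start = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in reference]

initial_energy = distance_potential(start, target)
final_coords = minimize(start, target)
final_energy = distance_potential(final_coords, target)
```

Pairwise distances alone do not distinguish a structure from its mirror image, so a toy run like this can only recover the layout up to rotation, translation, and reflection.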
Affiliation(s)
| | - David T Jones
- The Francis Crick Institute, London, UK
- University College London, London, UK
| |
|
89
|
Park T, Woo H, Baek M, Yang J, Seok C. Structure prediction of biological assemblies using GALAXY in CAPRI rounds 38-45. Proteins 2019; 88:1009-1017. [PMID: 31774573 DOI: 10.1002/prot.25859] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/19/2019] [Revised: 11/11/2019] [Accepted: 11/23/2019] [Indexed: 12/12/2022]
Abstract
We participated in CAPRI rounds 38-45 both as a server predictor and a human predictor. These CAPRI rounds provided excellent opportunities for testing prediction methods for three classes of protein interactions, that is, protein-protein, protein-peptide, and protein-oligosaccharide interactions. Both template-based methods (GalaxyTBM for monomer protein, GalaxyHomomer for homo-oligomer protein, GalaxyPepDock for protein-peptide complex) and ab initio docking methods (GalaxyTongDock and GalaxyPPDock for protein oligomer, GalaxyPepDock-ab-initio for protein-peptide complex, GalaxyDock2 and Galaxy7TM for protein-oligosaccharide complex) have been tested. Template-based methods depend heavily on the availability of proper templates and template-target similarity, and template-target difference is responsible for inaccuracy of template-based models. Inaccurate template-based models could be improved by our structure refinement and loop modeling methods based on physics-based energy optimization (GalaxyRefineComplex and GalaxyLoop) for several CAPRI targets. Current ab initio docking methods require accurate protein structures as input. Small conformational changes from input structure could be accounted for by our docking methods, producing one of the best models for several CAPRI targets. However, predicting large conformational changes involving protein backbone is still challenging, and full exploration of physics-based methods for such problems is still to come.
Affiliation(s)
- Taeyong Park
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Hyeonuk Woo
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Minkyung Baek
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Jinsol Yang
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| |
|
90
|
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13). Proteins 2019; 87:1141-1148. [PMID: 31602685 PMCID: PMC7079254 DOI: 10.1002/prot.25834] [Citation(s) in RCA: 169] [Impact Index Per Article: 33.8] [Received: 05/03/2019] [Revised: 09/25/2019] [Accepted: 09/27/2019] [Indexed: 12/17/2022]
Abstract
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
Affiliation(s)
| | - David T. Jones
- The Francis Crick Institute, London, UK
- University College London, London, UK
| |
|
91
|
Heo L, Feig M. High-accuracy protein structures by combining machine-learning with physics-based refinement. Proteins 2019; 88:637-642. [PMID: 31693199 DOI: 10.1002/prot.25847] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 09/02/2019] [Revised: 10/05/2019] [Accepted: 11/03/2019] [Indexed: 12/16/2022]
Abstract
Protein structure prediction has long been available as an alternative to experimental structure determination, especially via homology modeling based on templates from related sequences. Recently, models based on distance restraints from coevolutionary analysis via machine learning have significantly expanded the ability to predict structures for sequences without templates. One such method, AlphaFold, also performs well on sequences where templates are available but without using such information directly. Here we show that combining machine-learning-based models from AlphaFold with state-of-the-art physics-based refinement via molecular dynamics simulations further improves predictions to outperform any other prediction method tested during the latest round of CASP. The resulting models have highly accurate global and local structures, including high accuracy at functionally important interface residues, and they are highly suitable as initial models for crystal structure determination via molecular replacement.
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
| | - Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
| |
|
92
|
Kandathil SM, Greener JG, Jones DT. Recent developments in deep learning applied to protein structure prediction. Proteins 2019; 87:1179-1189. [PMID: 31589782 PMCID: PMC6899861 DOI: 10.1002/prot.25824] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 05/15/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/29/2022]
Abstract
Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, London, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - Joe G Greener
- Department of Computer Science, University College London, London, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - David T Jones
- Department of Computer Science, University College London, London, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| |
|
93
|
Wu T, Hou J, Adhikari B, Cheng J. Analysis of several key factors influencing deep learning-based inter-residue contact prediction. Bioinformatics 2019; 36:1091-1098. [PMID: 31504181 PMCID: PMC7703788 DOI: 10.1093/bioinformatics/btz679] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/17/2019] [Revised: 08/02/2019] [Accepted: 08/29/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION: Deep learning has become the dominant technology for protein contact prediction. However, the factors that affect the performance of deep learning in contact prediction have not been systematically investigated. RESULTS: We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors [i.e. deep learning technique, multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration] that influenced the contact prediction accuracy. We compared our convolutional neural network (CNN)-based contact prediction methods with three coevolution-based methods on 75 CASP13 targets consisting of 108 domains. We demonstrated that the CNN-based multi-distance approach was able to leverage global coevolutionary coupling patterns composed of multiple correlated contacts for more accurate contact prediction than the local coevolution-based methods, leading to a substantial increase of precision by 19.2 percentage points. We also tested different alignment methods and domain-based contact prediction with the deep learning contact predictors. The comparison of the three methods showed that deeper sequence alignments and the integration of domain-based contact prediction with the full-length contact prediction improved the performance of contact prediction. Moreover, we demonstrated that the domain-based contact prediction based on a novel ab initio approach of parsing domains from MSAs alone without using known protein structures was a simple, fast approach to improve contact prediction. Finally, we showed that predicting the distribution of inter-residue distances in multiple distance intervals could capture more structural information and improve binary contact prediction. AVAILABILITY AND IMPLEMENTATION: https://github.com/multicom-toolbox/DNCON2/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
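The last result in the abstract above relates a predicted distribution over distance intervals to a binary contact call. A minimal Python sketch of one common way to make that conversion is given here. It is not taken from the paper: the bin edges, the probabilities, the 8 Å contact cutoff, the 0.5 decision threshold, and the function name `contact_probability` are all assumptions chosen for illustration.

```python
def contact_probability(upper_edges, bin_probs, cutoff=8.0):
    """Sum the probability mass of every distance bin whose upper edge
    lies at or below the contact cutoff (in angstroms)."""
    if len(upper_edges) != len(bin_probs):
        raise ValueError("need one probability per distance bin")
    return sum(p for edge, p in zip(upper_edges, bin_probs) if edge <= cutoff)


# Hypothetical predicted distribution for one residue pair.
# Bins: 0-4, 4-6, 6-8, 8-10, 10-12, and beyond 12 angstroms.
upper_edges = [4.0, 6.0, 8.0, 10.0, 12.0, float("inf")]
bin_probs = [0.10, 0.25, 0.30, 0.20, 0.10, 0.05]

p_contact = contact_probability(upper_edges, bin_probs)  # about 0.65
is_contact = p_contact >= 0.5  # assumed decision threshold
```

Keeping the full distribution and collapsing it only at the end preserves information that a network trained directly on the binary label would never see, such as whether a non-contact pair sits at 9 Å or at 20 Å.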
Affiliation(s)
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Badri Adhikari
- Department of Mathematics and Computer Science, University of Missouri, St. Louis, MO 63121, USA
| | | |
|