1
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024] Open
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 has facilitated the rise of structure prediction performance to new heights, regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction for researchers, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yihan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yifeng Shen
  - Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
- Yang Cao
  - College of Life Sciences, Sichuan University, Chengdu 610065, China
- Gang Hu
  - NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Wei Cui
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
2
Huang B, Kong L, Wang C, Ju F, Zhang Q, Zhu J, Gong T, Zhang H, Yu C, Zheng WM, Bu D. Protein Structure Prediction: Challenges, Advances, and the Shift of Research Paradigms. Genomics, Proteomics & Bioinformatics 2023; 21:913-925. [PMID: 37001856 PMCID: PMC10928435 DOI: 10.1016/j.gpb.2022.11.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/15/2022] [Revised: 11/23/2022] [Accepted: 11/30/2022] [Indexed: 03/31/2023]
Abstract
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start from assuming a probability distribution of protein structures given a target sequence and then find the most likely structure, while computer scientists formulate protein structure prediction as an optimization problem - finding the structural conformation with the lowest energy or minimizing the difference between predicted structure and native structure. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely, data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we present a survey of the efforts for protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, the algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories interpreting the neural networks and knowledge on protein folding are still highly desired.
Affiliation(s)
- Bin Huang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lupeng Kong
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Changping Laboratory, Beijing 102206, China
- Chao Wang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Fusong Ju
  - Microsoft Research AI4Science, Beijing 100080, China
- Qi Zhang
  - Huawei Noah's Ark Lab, Wuhan 430206, China
- Jianwei Zhu
  - Microsoft Research AI4Science, Beijing 100080, China
- Tiansu Gong
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Haicang Zhang
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China
- Chungong Yu
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China
- Wei-Mou Zheng
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
- Dongbo Bu
  - Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, China
3
Zhao K, Xia Y, Zhang F, Zhou X, Li SZ, Zhang G. Protein structure and folding pathway prediction based on remote homologs recognition using PAthreader. Commun Biol 2023; 6:243. [PMID: 36871126 PMCID: PMC9985440 DOI: 10.1038/s42003-023-04605-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/09/2022] [Accepted: 02/16/2023] [Indexed: 03/06/2023] Open
Abstract
Recognition of remote homologous structures is a necessary module in AlphaFold2 and is also essential for the exploration of protein folding pathways. Here, we propose a method, PAthreader, to recognize remote templates and explore folding pathways. Firstly, we design a three-track alignment between predicted distance profiles and structure profiles extracted from PDB and AlphaFold DB, to improve the recognition accuracy of remote templates. Secondly, we improve the performance of AlphaFold2 using the templates identified by PAthreader. Thirdly, we explore protein folding pathways based on our conjecture that dynamic folding information of a protein is implicitly contained in its remote homologs. The results show that the average accuracy of PAthreader templates is 11.6% higher than that of HHsearch. In terms of structure modelling, PAthreader outperforms AlphaFold2 and ranks first on the CAMEO blind test for the latest three months. Furthermore, we predict protein folding pathways for 37 proteins, in which the results of 7 proteins are almost consistent with those of biological experiments, and the other 30 human proteins have yet to be verified by biological experiments, revealing that folding information can be exploited from remote homologous structures.
Affiliation(s)
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Fujin Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Xiaogen Zhou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Stan Z Li
  - AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, 310030, Zhejiang, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
4
Skariyachan S, Praveen PKU, Uttarkar A, Niranjan V. Computational design of prospective molecular targets for Burkholderia cepacia complex by molecular docking and dynamic simulation studies. Proteins 2023; 91:724-738. [PMID: 36601892 DOI: 10.1002/prot.26462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 11/27/2022] [Accepted: 01/02/2023] [Indexed: 01/06/2023]
Abstract
The study aimed to screen prospective molecular targets of BCC and potential natural lead candidates as effective binders by computational modeling, molecular docking, and dynamic (MD) simulation studies. Based on the virulent functions, tRNA 5-methylaminomethyl-2-thiouridine biosynthesis bifunctional protein (mnmC) and pyrimidine/purine nucleoside phosphorylase (ppnP) were selected as the prospective molecular targets. In the absence of experimental data, the three-dimensional (3D) structures of these targets were computationally predicted. After a thorough literature survey and database search, the drug-likeness and pharmacokinetic properties of 70 natural molecules were computationally predicted and the effectual binding of the best lead molecules against both the targets was predicted by molecular docking. The stabilities of the best-docked complexes were validated by MD simulation and the binding energy calculations were carried out by MM-GBSA approaches. The present study revealed that the hypothetical models of mnmC and ppnP showed stereochemical accuracy. The study also showed that among 70 natural compounds subjected to computational screening, Honokiol (3',5-Di(prop-2-en-1-yl) [1,1'-biphenyl]-2,4'-diol) present in Magnolia showed ideal drug-likeness and pharmacokinetic features and showed effectual binding with mnmC and ppnP (binding energies -7.3 kcal/mol and -6.6 kcal/mol, respectively). The MD simulation and GBSA calculation studies showed that the ligand-protein complexes stabilized throughout the MD simulation. The present study suggests that Honokiol can be used as a potential lead molecule against mnmC and ppnP targets of BCC and this study provides insight into further experimental validation for alternative lead development against drug resistant BCC.
Affiliation(s)
- Sinosh Skariyachan
  - Department of Microbiology, St. Pius X College Rajapuram, Kasaragod, Kerala, India
- Akshay Uttarkar
  - Department of Biotechnology, RV College of Engineering, Bengaluru, Karnataka, India
- Vidya Niranjan
  - Department of Biotechnology, RV College of Engineering, Bengaluru, Karnataka, India
5
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The ability to successfully predict the three-dimensional structure of a protein from its amino acid sequence has made considerable progress in the recent past. The progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rising growth of protein sequence databases. Contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading even for the query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
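As an illustration of the contact maps mentioned in this abstract, the following minimal sketch derives a binary contact map from one representative 3D coordinate per residue. The 8 Å cutoff and the function name `contact_map` are assumptions chosen for illustration (8 Å between Cβ atoms is a commonly used convention); the chapter itself may define contacts differently.

```python
import math
from typing import List, Sequence


def contact_map(coords: Sequence[Sequence[float]],
                cutoff: float = 8.0,
                min_sep: int = 1) -> List[List[int]]:
    """Return a binary contact map from one 3D point per residue.

    coords  : one (x, y, z) tuple per residue (e.g. C-beta positions).
    cutoff  : distance threshold in angstroms (assumed value, not from the source).
    min_sep : minimum sequence separation |i - j| for a pair to count.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = 1
    return cmap


# Toy example: four residues on a straight line, 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

In the toy chain, residues up to two positions apart fall under the cutoff, while the two end residues (11.4 Å apart) do not.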
Affiliation(s)
- Sutanu Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
6
Newton MAH, Rahman J, Zaman R, Sattar A. Enhancing Protein Contact Map Prediction Accuracy via Ensembles of Inter-Residue Distance Predictors. Comput Biol Chem 2022; 99:107700. [DOI: 10.1016/j.compbiolchem.2022.107700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/03/2022]
7
Kong L, Ju F, Zheng WM, Zhu J, Sun S, Xu J, Bu D. ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs. J Comput Biol 2022; 29:92-105. [PMID: 35073170 PMCID: PMC8892980 DOI: 10.1089/cmb.2021.0430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023] Open
Abstract
Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence under prediction and the templates with solved structures. However, it is still very challenging to build the optimal sequence-template alignment, especially when only distantly related templates are available. Here we report a novel deep learning approach ProALIGN that can predict much more accurate sequence-template alignment. Like protein sequences consisting of sequence motifs, protein alignments are also composed of frequently occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific as their characteristic patterns are tightly related to sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces inaccuracies in the handcrafted scoring functions widely used by the current threading approaches. For a query protein and a template, we apply the neural network to directly infer likelihoods of all possible residue pairs in their entirety, which could effectively consider the correlations among multiple residues. We further construct the alignment with maximum likelihood, and finally build a structure model according to the alignment. Tested on three independent data sets with a total of 6688 protein alignment targets and 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than the existing methods, including HHpred, CNFpred, CEthreader, and DeepThreader. These results clearly demonstrate the effectiveness of exploiting the context-specific alignment motifs by deep learning for protein threading.
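The step from residue-pair likelihoods to an alignment can be sketched with a small dynamic program. This is only one simple reading of "the alignment with maximum likelihood": it assumes a precomputed likelihood matrix (here a hand-written toy, standing in for the network output), maximizes the sum of likelihoods over an order-preserving set of residue pairs, and uses no gap penalties. The function names `best_alignment` and `to_binary_matrix` are hypothetical, and the objective actually used by ProALIGN is not specified in the abstract.

```python
from typing import List, Tuple


def best_alignment(lik: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Find an order-preserving set of (query, template) pairs with maximal total likelihood."""
    n, m = len(lik), len(lik[0])
    # dp[i][j]: best total score using the first i query and first j template residues.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j],                         # skip query residue i-1
                           dp[i][j - 1],                         # skip template residue j-1
                           dp[i - 1][j - 1] + lik[i - 1][j - 1])  # align the pair
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + lik[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


def to_binary_matrix(pairs: List[Tuple[int, int]], n: int, m: int) -> List[List[int]]:
    """Encode an alignment as a binary matrix in which 1 denotes an aligned residue pair."""
    mat = [[0] * m for _ in range(n)]
    for i, j in pairs:
        mat[i][j] = 1
    return mat


# Toy likelihoods for a 3-residue query against a 3-residue template.
toy_lik = [[0.9, 0.1, 0.0],
           [0.2, 0.1, 0.8],
           [0.0, 0.1, 0.3]]
toy_score, toy_pairs = best_alignment(toy_lik)
toy_matrix = to_binary_matrix(toy_pairs, 3, 3)
```

On the toy matrix the program aligns query residue 0 with template residue 0 and query residue 1 with template residue 2, leaving the last query residue unaligned.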
Affiliation(s)
- Lupeng Kong
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Toyota Technological Institute, Chicago, Illinois, USA
- Fusong Ju
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Wei-mou Zheng
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Shiwei Sun
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
  - Toyota Technological Institute, Chicago, Illinois, USA. Address correspondence to: Prof. Jinbo Xu, Toyota Technological Institute, Chicago, IL 60637, USA
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China. Address correspondence to: Dr. Dongbo Bu, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
8
Bhattacharya S, Roche R, Moussad B, Bhattacharya D. DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins. Proteins 2022; 90:579-588. [PMID: 34599831 PMCID: PMC8738102 DOI: 10.1002/prot.26254] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/24/2021] [Revised: 09/22/2021] [Accepted: 09/28/2021] [Indexed: 02/03/2023]
Abstract
Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact-assisted or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
Affiliation(s)
- Sutanu Bhattacharya
  - Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
9
Ju F, Zhu J, Zhang Q, Wei G, Sun S, Zheng WM, Bu D. Seq-SetNet: directly exploiting multiple sequence alignment for protein secondary structure prediction. Bioinformatics 2022; 38:990-996. [PMID: 34849579 DOI: 10.1093/bioinformatics/btab777] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2021] [Revised: 10/22/2021] [Accepted: 11/04/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Accurate prediction of protein structure relies heavily on exploiting multiple sequence alignment (MSA) for residue mutations and correlations as this information specifies protein tertiary structure. The widely used prediction approaches usually transform MSA into intermediate models, say position-specific scoring matrix or profile hidden Markov model. These intermediate models, however, cannot fully represent residue mutations and correlations carried by MSA; hence, an effective way to directly exploit MSAs is highly desirable. RESULTS: Here, we report a novel sequence set network (called Seq-SetNet) to directly and effectively exploit MSA for protein structure prediction. Seq-SetNet uses an 'encoding and aggregation' strategy that consists of two key elements: (i) an encoding module that takes a component homologue in MSA as input, and encodes residue mutations and correlations into context-specific features for each residue; and (ii) an aggregation module to aggregate the features extracted from all component homologues, which are further transformed into structural properties for residues of the query protein. As Seq-SetNet encodes each homologue protein individually, it could consider both insertions and deletions, as well as long-distance correlations among residues, thus representing more information than the intermediate models. Moreover, the encoding module automatically learns effective features and thus avoids manual feature engineering. Using symmetric aggregation functions, Seq-SetNet processes the homologue proteins as a sequence set, making its prediction results invariant to the order of these proteins. On popular benchmark sets, we demonstrated the successful application of Seq-SetNet to predict secondary structure and torsion angles of residues with improved accuracy and efficiency. AVAILABILITY AND IMPLEMENTATION: The code and datasets are available through https://github.com/fusong-ju/Seq-SetNet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
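The 'encoding and aggregation' idea, and why symmetric aggregation makes the output independent of homologue order, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the hand-written two-feature encoder stands in for the learned encoding module, max and mean are chosen as example symmetric functions, and the names `encode_homologue`, `aggregate`, and `seq_set_features` are hypothetical. It is not the Seq-SetNet implementation.

```python
from typing import List


def encode_homologue(query: str, homologue: str) -> List[List[float]]:
    """Toy per-residue encoder (stand-in for a learned module).

    Feature 0: 1.0 if the homologue residue matches the query residue.
    Feature 1: 1.0 if the homologue has a gap ('-') at this column.
    """
    return [[float(q == h), float(h == "-")] for q, h in zip(query, homologue)]


def aggregate(encoded: List[List[List[float]]]) -> List[List[float]]:
    """Symmetric aggregation: per-residue element-wise max and mean over all homologues."""
    n = len(encoded)
    length = len(encoded[0])
    width = len(encoded[0][0])
    out = []
    for i in range(length):
        maxes = [max(e[i][k] for e in encoded) for k in range(width)]
        means = [sum(e[i][k] for e in encoded) / n for k in range(width)]
        out.append(maxes + means)
    return out


def seq_set_features(query: str, homologues: List[str]) -> List[List[float]]:
    """Encode each homologue individually, then aggregate them as an unordered set."""
    return aggregate([encode_homologue(query, h) for h in homologues])


# Toy alignment: the query plus two homologues, already aligned column by column.
toy_query = "ACDE"
toy_msa = ["ACDE", "A-DE", "GCDE"]
toy_features = seq_set_features(toy_query, toy_msa)
```

Because max and mean do not depend on the order of their inputs, shuffling `toy_msa` leaves `toy_features` unchanged.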
Affiliation(s)
- Fusong Ju
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jianwei Zhu
  - Microsoft Research Asia, Beijing 100080, China
- Qi Zhang
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Guozheng Wei
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Shiwei Sun
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, Henan, China
- Wei-Mou Zheng
  - University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Zhongke Big Data Academy, Zhengzhou 450046, Henan, China
10
Rahman J, Newton MAH, Islam MKB, Sattar A. Enhancing protein inter-residue real distance prediction by scrutinising deep learning models. Sci Rep 2022; 12:787. [PMID: 35039537 PMCID: PMC8764118 DOI: 10.1038/s41598-021-04441-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/25/2021] [Accepted: 12/17/2021] [Indexed: 12/29/2022] Open
Abstract
Protein structure prediction (PSP) has achieved significant progress lately via prediction of inter-residue distances using deep learning models and exploitation of the predictions during conformational search. In this context, prediction of large inter-residue distances and also prediction of distances between residues separated largely in the protein sequence remain challenging. To deal with these challenges, state-of-the-art inter-residue distance prediction algorithms have used large sets of coevolutionary and non-coevolutionary features. In this paper, we argue that the more the types of features used, the more the kinds of noises introduced and then the deep learning model has to overcome the noises to improve the accuracy of the predictions. Also, multiple features capturing similar underlying characteristics might not necessarily have significantly better cumulative effect. So we scrutinise the feature space to reduce the types of features to be used, but at the same time, we strive to improve the prediction accuracy. Consequently, for inter-residue real distance prediction, in this paper, we propose a deep learning model named scrutinised distance predictor (SDP), which uses only 2 coevolutionary and 3 non-coevolutionary features. On several sets of benchmark proteins, our proposed SDP method improves mean Local Distance Difference Test (LDDT) scores at least by 10% over existing state-of-the-art methods. The SDP program along with its data is available from the website https://gitlab.com/mahnewton/sdp.
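For readers unfamiliar with the LDDT score cited here, the sketch below computes a simplified, global, one-point-per-residue variant directly from two inter-residue distance matrices. The 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å tolerance thresholds follow the commonly described lDDT definition; restricting to one distance per residue pair, the strict `<` comparisons, and the function name `lddt_from_distances` are simplifying assumptions, so the values need not match those reported in the paper.

```python
from typing import List, Sequence


def lddt_from_distances(ref: List[List[float]],
                        pred: List[List[float]],
                        radius: float = 15.0,
                        thresholds: Sequence[float] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Fraction of reference distances (below `radius`) preserved in the prediction.

    Each residue pair is checked against every tolerance threshold, and the
    score is the overall fraction of (pair, threshold) checks that pass.
    """
    n = len(ref)
    preserved = 0
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            if ref[i][j] < radius:
                diff = abs(ref[i][j] - pred[i][j])
                total += len(thresholds)
                preserved += sum(1 for t in thresholds if diff < t)
    return preserved / total if total else 0.0


# Toy 3-residue example: the prediction is off by 3 angstroms for one pair.
toy_ref = [[0.0, 4.0, 8.0],
           [4.0, 0.0, 4.0],
           [8.0, 4.0, 0.0]]
toy_pred = [[0.0, 4.0, 11.0],
            [4.0, 0.0, 4.0],
            [11.0, 4.0, 0.0]]
toy_lddt = lddt_from_distances(toy_ref, toy_pred)
```

In the toy case, two pairs pass all four thresholds and the perturbed pair passes only the 4 Å threshold, giving 9 of 12 checks.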
Affiliation(s)
- Julia Rahman
  - School of Information and Communication Technology, Griffith University, Southport, Australia
- M A Hakim Newton
  - Institute of Integrated and Intelligent Systems, Griffith University, Southport, Australia
- Md Khaled Ben Islam
  - School of Information and Communication Technology, Griffith University, Southport, Australia
- Abdul Sattar
  - School of Information and Communication Technology, Griffith University, Southport, Australia
  - Institute of Integrated and Intelligent Systems, Griffith University, Southport, Australia
11
Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022; 23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022] Open
Abstract
In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward by deep learning within the past five years, which immediately unlocked new developments of drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach by deep learning shall accelerate the progress towards personalized biomedicine.
Affiliation(s)
- Jinbo Xu
  - Toyota Technological Institute at Chicago, USA
- Ming Li
  - University of Waterloo, Canada
12
Mabonga L, Masamba P, Kappo AP. Inhibitory potential of a benzoxazole derivative, 4FI against SNRPG∼RING finger domain protein complex as a lead compound in the discovery of anti-cancer drugs: A molecular dynamics simulation approach. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.100993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022] Open
13
Olo Ndela E, Enault F, Toussaint A. Transposable Prophages in Leptospira: An Ancient, Now Diverse, Group Predominant in Causative Agents of Weil's Disease. Int J Mol Sci 2021; 22:13434. [PMID: 34948244 PMCID: PMC8705779 DOI: 10.3390/ijms222413434] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/28/2021] [Revised: 12/06/2021] [Accepted: 12/11/2021] [Indexed: 12/24/2022] Open
Abstract
The virome associated with the corkscrew shaped bacterium Leptospira, responsible for Weil's disease, is scarcely known, and genetic tools available for these bacteria remain limited. To reduce these two issues, potential transposable prophages were searched in Leptospiraceae genomes. The 236 predicted transposable prophages were particularly abundant in the most pathogenic leptospiral clade, being potentially involved in the acquisition of virulent traits. According to genomic similarities and phylogenies, these prophages are distantly related to known transposable phages and are organized into six groups, one of them encompassing prophages with unusual TA-TA ends. Interestingly, structural and transposition proteins reconstruct different relationships between groups, suggesting ancestral recombinations. Based on the baseplate phylogeny, two large clades emerge, with specific gene-contents and high sequence divergence reflecting their ancient origin. Despite their high divergence, the size and overall genomic organization of all prophages are very conserved, a testimony to the highly constrained nature of their genomes. Finally, similarities between these prophages and the three known non-transposable phages infecting L. biflexa, suggest gene transfer between different Caudovirales inside their leptospiral host, and the possibility to use some of the transposable prophages in that model strain.
Affiliation(s)
- Eric Olo Ndela
  - Laboratoire Microorganismes: Genome Environment (LMGE), Université Clermont Auvergne, CNRS, F-63000 Clermont-Ferrand, France
- François Enault
  - Laboratoire Microorganismes: Genome Environment (LMGE), Université Clermont Auvergne, CNRS, F-63000 Clermont-Ferrand, France
- Ariane Toussaint
  - Microbiologie Cellulaire et Moléculaire, Université Libre de Bruxelles, IBMM-DBM, 12 Rue des Professeurs Jeneer et Brachet, B-6041 Gosselies, Belgium
14
|
Kong L, Ju F, Zhang H, Sun S, Bu D. FALCON2: a web server for high-quality prediction of protein tertiary structures. BMC Bioinformatics 2021; 22:439. [PMID: 34525939 PMCID: PMC8444573 DOI: 10.1186/s12859-021-04353-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Accepted: 09/01/2021] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND Accurate prediction of protein tertiary structures is highly desired as the knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction, including a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly speaking, ProALIGN aligns a target protein with templates through exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate inter-residue distances of target proteins and builds structures that satisfy these distance constraints. These two approaches emphasize different characteristics of target proteins: ProALIGN exploits structure information of homologous templates of target proteins while ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that the combination of template-based modeling and ab initio approaches is promising. RESULTS In the study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction result. We evaluated FALCON2 on widely-used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that when high-quality templates are available, ProALIGN is superior to ProFOLD and in other cases, ProFOLD shows better performance. By integrating these two approaches with different emphasis, FALCON2 server outperforms the two individual approaches and also achieves state-of-the-art performance compared with existing approaches. 
CONCLUSIONS By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use and high-quality protein structure prediction service for the community and we expect it to enable insights into a deep understanding of protein functions.
Affiliation(s)
- Lupeng Kong
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
- Fusong Ju
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
- Haicang Zhang
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
- Shiwei Sun
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190 Beijing, China
  - University of Chinese Academy of Sciences, 100049 Beijing, China

15
Shen T, Wu J, Lan H, Zheng L, Pei J, Wang S, Liu W, Huang J. When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14-(tFold for CASP14 contact prediction). Proteins 2021; 89:1901-1910. [PMID: 34473376 DOI: 10.1002/prot.26232] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/30/2021] [Revised: 08/16/2021] [Accepted: 08/20/2021] [Indexed: 12/29/2022]
Abstract
In this paper, we report our tFold framework's performance on the inter-residue contact prediction task in the 14th Critical Assessment of protein Structure Prediction (CASP14). Our tFold framework seamlessly combines both homologous sequences and structural decoys under an ultra-deep network architecture. Squeeze-excitation and axial attention mechanisms are employed to effectively capture inter-residue interactions. In CASP14, our best predictor achieves 41.78% in the averaged top-L precision for long-range contacts for all the 22 free-modeling (FM) targets, and ranked 1st among all the 60 participating teams. The tFold web server is now freely available at: https://drug.ai.tencent.com/console/en/tfold.
Affiliation(s)
- Wei Liu
  - Tencent AI Lab, Shenzhen, China

16
Pearce R, Zhang Y. Toward the solution of the protein structure prediction problem. J Biol Chem 2021; 297:100870. [PMID: 34119522 PMCID: PMC8254035 DOI: 10.1016/j.jbc.2021.100870] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 04/20/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021] [Indexed: 11/20/2022] Open
Abstract
Since Anfinsen demonstrated that the information encoded in a protein's amino acid sequence determines its structure in 1973, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
Affiliation(s)
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA

17
Zhang H, Bei Z, Xi W, Hao M, Ju Z, Saravanan KM, Zhang H, Guo N, Wei Y. Evaluation of residue-residue contact prediction methods: From retrospective to prospective. PLoS Comput Biol 2021; 17:e1009027. [PMID: 34029314 PMCID: PMC8177648 DOI: 10.1371/journal.pcbi.1009027] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/30/2020] [Revised: 06/04/2021] [Accepted: 04/28/2021] [Indexed: 12/31/2022] Open
Abstract
Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has made tremendous progress in residue contact prediction, so a comprehensive assessment of current methods on a large-scale benchmark data set is much needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets from a wide range of perspectives. The results show that different methods have different application scenarios: (1) DL methods based on multiple categories of inputs and large training sets are the best choices for low-contact-density proteins, such as intrinsically disordered ones, and for proteins with shallow multiple sequence alignments (MSAs). (2) With at least 5L (L is sequence length) effective sequences in the MSA, all the methods show their best performance, and methods that rely only on the MSA as input can reach performance comparable to methods that adopt multi-source inputs. (3) For top L/5 and L/2 predictions, DL methods can predict more hydrophobic interactions, while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods can detect more secondary structure interactions, while DL methods can accurately uncover more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches with the best overall performance. Despite the great success of current DL methods, there is still much room for improvement: (1) With shallow MSAs, performance is greatly affected. (2) Current methods show lower precision for inter-domain than for intra-domain contact predictions, as well as very high imbalances in precision between intra-domains. (3) Strong prediction similarities between DL methods indicate that more feature types and diversified models need to be developed. (4) The runtime of most methods can be further optimized.
The amino acid sequence of a protein ultimately determines its tertiary structure, and the tertiary structure determines its function(s) and plays a key role in understanding biological processes and disease pathogenesis. Protein tertiary structure can be determined using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, which are very expensive and time-consuming. As an alternative, researchers are trying to use in silico methods to predict the 3D structures. Residue contact-assisted protein folding paves an avenue for sequence-based protein structure prediction and has therefore become one of the most challenging and promising problems in structural bioinformatics. Over the past years, contact prediction techniques have evolved continuously. Through a retrospective analysis of traditional machine learning, evolutionary coupling analysis, and consensus machine learning methods, and a multi-perspective study of recently developed deep learning methods, we explore the most advanced contact predictors, identify application scenarios for different methods, and seek prospective directions for further improvement. We anticipate that our study will serve as a practical and useful guide for the development of future approaches to contact prediction.
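The abstract above measures MSA depth in "effective sequences" but does not say how they are counted. A minimal sketch, assuming the common convention of down-weighting each sequence by the number of sequences at or above 80% identity to it (the function name, the threshold and the treatment of gaps as ordinary characters are assumptions, not details taken from the paper):

```python
def effective_sequences(msa, identity=0.8):
    """Estimate the number of effective sequences (Neff) in an alignment.

    msa: list of equal-length aligned sequences (strings).
    identity: assumed clustering threshold; 0.8 is a common choice.
    """
    def frac_identical(a, b):
        # Fraction of alignment columns where the two sequences agree.
        return sum(x == y for x, y in zip(a, b)) / len(a)

    neff = 0.0
    for s in msa:
        # Each sequence counts itself, so neighbors is at least 1.
        neighbors = sum(1 for t in msa if frac_identical(s, t) >= identity)
        neff += 1.0 / neighbors
    return neff
```

Under this convention, three near-identical sequences plus one unrelated sequence contribute about two effective sequences, which is why a large but redundant MSA can still count as shallow.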
Affiliation(s)
- Huiling Zhang
  - University of Chinese Academy of Sciences, Beijing, China
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Zhendong Bei
  - Cloud Computing Department, Alibaba Group, Hangzhou, China
- Wenhui Xi
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Min Hao
  - College of Electronic and Information Engineering, Southwest University, Chongqing, China
- Zhen Ju
  - University of Chinese Academy of Sciences, Beijing, China
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Konda Mani Saravanan
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Haiping Zhang
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ning Guo
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
  - University of Chinese Academy of Sciences, Beijing, China
  - Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

18
Xu J, Mcpartlon M, Li J. Improved protein structure prediction by deep learning irrespective of co-evolution information. Nat Mach Intell 2021; 3:601-609. [PMID: 34368623 PMCID: PMC8340610 DOI: 10.1038/s42256-021-00348-5] [Citation(s) in RCA: 104] [Impact Index Per Article: 34.7] [Indexed: 01/05/2023]
Abstract
Predicting the tertiary structure of a protein from its primary sequence has been greatly improved by integrating deep learning and co-evolutionary analysis, as shown in CASP13 and CASP14. We describe our latest study of this idea, analyzing the efficacy of network size and co-evolution data and its performance on both natural and designed proteins. We show that a large ResNet (convolutional residual neural networks) can predict structures of correct folds for 26 out of 32 CASP13 free-modeling (FM) targets and L/5 long-range contacts with precision over 80%. When co-evolution is not used ResNet still can predict structures of correct folds for 18 CASP13 FM targets, greatly exceeding previous methods that do not use co-evolution either. Even with only primary sequence ResNet can predict structures of correct folds for all tested human-designed proteins. In addition, ResNet may fare better for the designed proteins when trained without co-evolution than with co-evolution. These results suggest that ResNet does not simply denoise co-evolution signals, but instead may learn important protein sequence-structure relationship. This has important implications on protein design and engineering especially when co-evolutionary data is unavailable.
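The "L/5 long-range contacts with precision over 80%" figure quoted above can be made concrete with a short sketch. It assumes the usual CASP-style conventions, which the abstract does not spell out: a contact is a residue pair closer than 8 Å, and "long-range" means a sequence separation of at least 24. The function and parameter names are illustrative.

```python
def top_contact_precision(prob, dist, k=5, min_sep=24, cutoff=8.0):
    """Precision of the top L/k predicted long-range contacts.

    prob: L x L nested list of predicted contact probabilities.
    dist: L x L nested list of true inter-residue distances (angstroms).
    min_sep and cutoff are assumed conventions, not values from the paper.
    """
    n = len(prob)
    # Candidate pairs: upper triangle with sequence separation >= min_sep.
    pairs = [(i, j) for i in range(n) for j in range(i + min_sep, n)]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    # Rank candidates by predicted probability, highest first.
    pairs.sort(key=lambda p: prob[p[0]][p[1]], reverse=True)
    top = pairs[:max(1, n // k)]
    hits = sum(1 for i, j in top if dist[i][j] < cutoff)
    return hits / len(top)
```

Setting `k=1` gives the top-L precision used in other entries of this list; only the number of ranked pairs changes.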
Affiliation(s)
- Jinbo Xu
  - Toyota Technological Institute at Chicago
- Matthew Mcpartlon
  - Department of Computer Science, University of Chicago
  - Toyota Technological Institute at Chicago
- Jin Li
  - Department of Computer Science, University of Chicago
  - Toyota Technological Institute at Chicago

19
Bhattacharya S, Roche R, Shuvo MH, Bhattacharya D. Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci 2021; 8:643752. [PMID: 34046429 PMCID: PMC8148041 DOI: 10.3389/fmolb.2021.643752] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/18/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite the success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profile. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
Affiliation(s)
- Sutanu Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Rahmatullah Roche
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Md Hossain Shuvo
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
  - Department of Biological Sciences, Auburn University, Auburn, AL, United States

20
Li J, Xu J. Study of Real-Valued Distance Prediction for Protein Structure Prediction with Deep Learning. Bioinformatics 2021; 37:3197-3203. [PMID: 33961022 PMCID: PMC8504618 DOI: 10.1093/bioinformatics/btab333] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/16/2021] [Revised: 03/07/2021] [Accepted: 04/28/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Inter-residue distance prediction by deep ResNet (convolutional residual neural network) has greatly advanced protein structure prediction. Currently the most successful structure prediction methods predict distance by discretizing it into dozens of bins. Here we study how well real-valued distance can be predicted and how useful it is for 3D structure modeling by comparing it with discrete-valued prediction based upon the same deep ResNet. RESULTS Different from the recent methods that predict only a single real value for the distance of an atom pair, we predict both the mean and standard deviation of a distance and then fold a protein by the predicted mean and deviation. Our findings include: 1) tested on the CASP13 FM (free-modeling) targets, our real-valued distance prediction obtains 81% precision on top L/5 long-range contact prediction, much better than the best CASP13 results (70%); 2) our real-valued prediction can predict correct folds for the same number of CASP13 FM targets as the best CASP13 group, despite generating only 20 decoys for each target; 3) our method greatly outperforms a very new real-valued prediction method DeepDist in both contact prediction and 3D structure modeling; and 4) when the same deep ResNet is used, our real-valued distance prediction has 1-6% higher contact and distance accuracy than our own discrete-valued prediction, but less accurate 3D structure models. AVAILABILITY AND IMPLEMENTATION https://github.com/j3xugit/RaptorX-3DModeling. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
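To illustrate why predicting both a mean and a standard deviation is useful, the sketch below treats each predicted distance as a Gaussian. This is an assumption made for illustration only: the abstract does not state the distribution, the loss, or the folding potential the authors use, and both function names are hypothetical. Under that assumption, a contact probability and a simple restraint energy follow directly.

```python
import math

def contact_probability(mean, std, cutoff=8.0):
    """P(distance < cutoff) for a Gaussian with the predicted mean and std."""
    if std <= 0:
        raise ValueError("std must be positive")
    z = (cutoff - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def gaussian_restraint(distance, mean, std):
    """Gaussian negative log-likelihood, up to an additive constant.

    A confident prediction (small std) penalizes deviations from the
    predicted mean more steeply than an uncertain one.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    return 0.5 * ((distance - mean) / std) ** 2 + math.log(std)
```

A single real value per pair could supply neither quantity, since it carries no information about how uncertain the prediction is.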
Affiliation(s)
- Jin Li
  - Toyota Technological Institute at Chicago, USA
  - Department of Computer Science, University of Chicago, USA
- Jinbo Xu
  - Toyota Technological Institute at Chicago, USA

21
Ju F, Zhu J, Shao B, Kong L, Liu TY, Zheng WM, Bu D. CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction. Nat Commun 2021; 12:2535. [PMID: 33953201 PMCID: PMC8100175 DOI: 10.1038/s41467-021-22869-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 10/15/2020] [Accepted: 03/28/2021] [Indexed: 11/29/2022] Open
Abstract
Residue co-evolution has become the primary principle for estimating inter-residue distances of a protein, which are crucially important for predicting protein structure. Most existing approaches adopt an indirect strategy, i.e., inferring residue co-evolution based on some hand-crafted features, say, a covariance matrix, calculated from multiple sequence alignment (MSA) of target protein. This indirect strategy, however, cannot fully exploit the information carried by MSA. Here, we report an end-to-end deep neural network, CopulaNet, to estimate residue co-evolution directly from MSA. The key elements of CopulaNet include: (i) an encoder to model context-specific mutation for each residue; (ii) an aggregator to model residue co-evolution, and thereafter estimate inter-residue distances. Using CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrate that CopulaNet can predict protein structure with improved accuracy and efficiency. This study represents a step toward improved end-to-end prediction of inter-residue distances and protein tertiary structures.
Affiliation(s)
- Fusong Ju
  - Key Lab of Intelligent Information Processing, State Key Lab of Computer Architecture, Big-data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Bin Shao
  - Microsoft Research Asia, Beijing, China
- Lupeng Kong
  - Key Lab of Intelligent Information Processing, State Key Lab of Computer Architecture, Big-data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Wei-Mou Zheng
  - University of Chinese Academy of Sciences, Beijing, China
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, State Key Lab of Computer Architecture, Big-data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China

22
Wu F, Xu J. Deep template-based protein structure prediction. PLoS Comput Biol 2021; 17:e1008954. [PMID: 33939695 PMCID: PMC8118551 DOI: 10.1371/journal.pcbi.1008954] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/04/2021] [Revised: 05/13/2021] [Accepted: 04/11/2021] [Indexed: 11/19/2022] Open
Abstract
MOTIVATION Protein structure prediction has been greatly improved by deep learning, but most efforts are devoted to template-free modeling, and very few deep learning methods have been developed for TBM (template-based modeling), a popular technique for protein structure prediction. TBM has been studied extensively in the past, but its accuracy is not satisfactory when highly similar templates are not available. RESULTS This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence coevolution information into a deep ResNet to predict inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results show that NDThreader greatly outperforms existing methods such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as a part of the RaptorX server, which obtained the best average GDT score among all CASP14 servers on the 58 TBM targets.
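For readers unfamiliar with the GDT score mentioned above, here is a minimal sketch of the GDT_TS arithmetic: the average, over cutoffs of 1, 2, 4 and 8 Å, of the fraction of residues whose Cα atom lies within that cutoff of its native position. The sketch assumes the per-residue deviations come from one fixed superposition; the full measure searches over superpositions to maximize each fraction, so this simplified version can only underestimate it. The function name is illustrative.

```python
def gdt_ts(deviations, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS (0 to 100) from per-residue deviations in angstroms.

    deviations: distances between model and native C-alpha atoms after
    a single, already chosen superposition (a simplifying assumption).
    """
    if not deviations:
        raise ValueError("need at least one residue")
    n = len(deviations)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(1 for d in deviations if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)
```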
Affiliation(s)
- Fandi Wu
  - Toyota Technological Institute at Chicago, Chicago, IL, United States of America
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Chicago, IL, United States of America

23
The Protective A673T Mutation of Amyloid Precursor Protein (APP) in Alzheimer's Disease. Mol Neurobiol 2021; 58:4038-4050. [PMID: 33914267 DOI: 10.1007/s12035-021-02385-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/24/2020] [Accepted: 04/05/2021] [Indexed: 10/21/2022]
Abstract
Alzheimer's disease is a progressive neurodegenerative disorder characterized by extracellular amyloid beta peptides and neurofibrillary tangles consisting of intracellular hyperphosphorylated Tau in the hippocampus and cerebral cortex. Most mutations in the key genes that code for amyloid precursor protein can lead to significant accumulation of these peptides in the brain and cause Alzheimer's disease. Moreover, some point mutations in amyloid precursor protein can cause familial Alzheimer's disease, such as the Swedish mutation (KM670/671NL) and the A673V mutation. However, recent studies have found that the A673T mutation in the amyloid precursor protein gene can protect against Alzheimer's disease, even though it is located next to the Swedish mutation (KM670/671NL) and at the same site as the A673V mutation, both of which are pathogenic. This makes the protective A673T mutation of particular interest. Here, we summarize the most recent insights into the A673T mutation, focus on its role in protective mechanisms against Alzheimer's disease, and discuss its involvement in future treatment.

24
Zhang H, Shen Y. Template-based prediction of protein structure with deep learning. BMC Genomics 2020; 21:878. [PMID: 33372607 PMCID: PMC7771081 DOI: 10.1186/s12864-020-07249-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/27/2020] [Accepted: 11/18/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate prediction of protein structure is fundamentally important to understand biological function of proteins. Template-based modeling, including protein threading and homology modeling, is a popular method for protein tertiary structure prediction. However, accurate template-query alignment and template selection are still very challenging, especially for the proteins with only distant homologs available. RESULTS We propose a new template-based modelling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning query sequence with template as the classical pixel classification problem in computer vision and naturally applies deep residual neural network in prediction. ThreaderAI first employs deep learning to predict residue-residue aligning probability matrix by integrating sequence profile, predicted sequential structural features, and predicted residue-residue contacts, and then builds template-query alignment by applying a dynamic programming algorithm on the probability matrix. We evaluated our methods both in generating accurate template-query alignment and protein threading. Experimental results show that ThreaderAI outperforms currently popular template-based modelling methods HHpred, CNFpred, and the latest contact-assisted method CEthreader, especially on the proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 56, 13, and 11%, respectively, on template-query pairs at the similarity of fold level from SCOPe data. And on CASP13's TBM-hard data, ThreaderAI outperforms HHpred, CNFpred, and CEthreader by 16, 9 and 8% in terms of TM-score, respectively. 
CONCLUSIONS These results demonstrate that with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
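The step described above, building a template-query alignment by dynamic programming over a residue-residue aligning probability matrix, can be sketched as a global alignment that maximizes the summed match probabilities. The linear gap penalty and the function name are assumptions for illustration; the abstract does not describe ThreaderAI's actual scoring or gap model.

```python
def align_from_probabilities(P, gap=0.1):
    """Global alignment maximizing summed probabilities minus gap costs.

    P[i][j]: probability that query residue i aligns to template residue j.
    gap: assumed linear penalty per unaligned residue.
    Returns a list of aligned (query_index, template_index) pairs.
    """
    n, m = len(P), len(P[0])
    S = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score so far
    back = [[None] * (m + 1) for _ in range(n + 1)]  # move that produced it
    for i in range(1, n + 1):
        S[i][0], back[i][0] = -gap * i, "up"
    for j in range(1, m + 1):
        S[0][j], back[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (S[i - 1][j - 1] + P[i - 1][j - 1], "diag"),  # align i with j
                (S[i - 1][j] - gap, "up"),                    # skip query residue
                (S[i][j - 1] - gap, "left"),                  # skip template residue
            ]
            S[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Because the matrix is filled cell by cell, the cost is proportional to the product of the query and template lengths, so the alignment step stays cheap once the probabilities have been predicted.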
Affiliation(s)
- Haicang Zhang
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA
  - Program in Mathematical Genomics, Columbia University, New York, NY, USA

25
Jing X, Xu J. Improved Protein Model Quality Assessment By Integrating Sequential And Pairwise Features Using Deep Learning. Bioinformatics 2020; 36:5361-5367. [PMID: 33325480 PMCID: PMC8016469 DOI: 10.1093/bioinformatics/btaa1037] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/21/2020] [Revised: 11/27/2020] [Accepted: 12/06/2020] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION Accurately estimating protein model quality in the absence of experimental structure is not only important for model evaluation and selection, but also useful for model refinement. Progress has been steadily made by introducing new features and algorithms (especially deep neural networks), but the accuracy of quality assessment (QA) is still not very satisfactory, especially local QA on hard protein targets. RESULTS We propose a new single-model-based QA method ResNetQA for both local and global quality assessment. Our method predicts model quality by integrating sequential and pairwise features using a deep neural network composed of both 1D and 2D convolutional residual neural networks (ResNet). The 2D ResNet module extracts useful information from pairwise features such as model-derived distance maps, co-evolution information, and predicted distance potential from sequences. The 1D ResNet is used to predict local (global) model quality from sequential features and pooled pairwise information generated by 2D ResNet. Tested on the CASP12 and CASP13 datasets, our experimental results show that our method greatly outperforms existing state-of-the-art methods. Our ablation studies indicate that the 2D ResNet module and pairwise features play an important role in improving model quality assessment. AVAILABILITY https://github.com/AndersJing/ResNetQA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaoyang Jing
  - Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA

26
Du Z, Pan S, Wu Q, Peng Z, Yang J. CATHER: a novel threading algorithm with predicted contacts. Bioinformatics 2020; 36:2119-2125. [PMID: 31790141 DOI: 10.1093/bioinformatics/btz876] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/30/2019] [Revised: 10/31/2019] [Accepted: 11/28/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy of protein contact map prediction has opened a new avenue to improve the performance of threading algorithms. Several preliminary studies suggest that with predicted contacts, the performance of threading algorithms can be improved greatly. There is still much room to explore to make better use of predicted contacts. RESULTS We have developed a new contact-assisted threading algorithm named CATHER using both conventional sequential profiles and contact maps predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and the CASP12 targets demonstrated that CATHER made significant improvement over other methods which use only either sequential profile or predicted contact map. Our method was ranked in the top 10 among all 39 participating server groups on the 32 free modeling targets in the blind tests of the CASP13 experiment. These data suggest that it is promising to push forward the threading algorithms by using predicted contacts. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/CATHER/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zongyang Du
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Shuo Pan
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Qi Wu
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
|
27
|
Fontenot CR, Tasnim H, Valdes KA, Popescu CV, Ding H. Ferric uptake regulator (Fur) reversibly binds a [2Fe-2S] cluster to sense intracellular iron homeostasis in Escherichia coli. J Biol Chem 2020; 295:15454-15463. [PMID: 32928958 DOI: 10.1074/jbc.ra120.014814] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/12/2020] [Revised: 09/04/2020] [Indexed: 12/19/2022] Open
Abstract
The ferric uptake regulator (Fur) is a global transcription factor that regulates intracellular iron homeostasis in bacteria. The current hypothesis states that when the intracellular "free" iron concentration is elevated, Fur binds ferrous iron, and the iron-bound Fur represses the genes encoding iron uptake systems and stimulates the genes encoding iron storage proteins. However, the "iron-bound" Fur has never been isolated from any bacteria. Here we report that the Escherichia coli Fur has a bright red color when expressed in E. coli mutant cells containing an elevated intracellular free iron content because of deletion of the iron-sulfur cluster assembly proteins IscA and SufA. The acid-labile iron and sulfide content analyses in conjunction with the EPR and Mössbauer spectroscopy measurements and the site-directed mutagenesis studies show that the red Fur protein binds a [2Fe-2S] cluster via conserved cysteine residues. The occupancy of the [2Fe-2S] cluster in Fur protein is ∼31% in the E. coli iscA/sufA mutant cells and is decreased to ∼4% in WT E. coli cells. Depletion of the intracellular free iron content using the membrane-permeable iron chelator 2,2'-dipyridyl effectively removes the [2Fe-2S] cluster from Fur in E. coli cells, suggesting that Fur senses the intracellular free iron content via reversible binding of a [2Fe-2S] cluster. The binding of the [2Fe-2S] cluster in Fur appears to be highly conserved, because the Fur homolog from Haemophilus influenzae expressed in E. coli cells also reversibly binds a [2Fe-2S] cluster to sense intracellular iron homeostasis.
Affiliation(s)
- Chelsey R Fontenot
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
- Homyra Tasnim
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
- Kathryn A Valdes
- Department of Chemistry, University of St. Thomas, St. Paul, Minnesota, USA
- Codrina V Popescu
- Department of Chemistry, University of St. Thomas, St. Paul, Minnesota, USA
- Huangen Ding
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA.
|
28
|
Runthala A, Chowdhury S. Refined template selection and combination algorithm significantly improves template-based modeling accuracy. J Bioinform Comput Biol 2020; 17:1950006. [PMID: 31057073 DOI: 10.1142/s0219720019500069] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
In contrast to ab-initio protein modeling methodologies, comparative modeling is considered the most popular and reliable approach to model protein structure. However, the selection of the best set of templates is still a major challenge. An effective template-ranking algorithm is developed to efficiently select only the reliable hits for predicting the protein structures. The algorithm employs the pairwise as well as multiple sequence alignments of template hits to rank and select the best possible set of templates. It captures several key pieces of sequence and structural information of template hits and converts them into scores to effectively rank them. This selected set of templates is used to model a target. Modeling accuracy of the algorithm is tested and evaluated on TBM-HA domain containing CASP8, CASP9 and CASP10 targets. On average, this template ranking and selection algorithm improves GDT-TS, GDT-HA and TM_Score by 3.531, 4.814 and 0.022, respectively. Further, it has been shown that the inclusion of structurally similar templates with ample conformational diversity is crucial for the modeling algorithm to maximally as well as reliably span the target sequence and construct its near-native model. The optimal model sampling also holds the key to predicting the best possible target structure.
Affiliation(s)
- Ashish Runthala
- 1 Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
- Shibasish Chowdhury
- 1 Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
|
29
|
AlQuraishi M. AlphaFold at CASP13. Bioinformatics 2020; 35:4862-4865. [PMID: 31116374 DOI: 10.1093/bioinformatics/btz422] [Citation(s) in RCA: 154] [Impact Index Per Article: 38.5] [Received: 02/15/2019] [Revised: 03/26/2019] [Accepted: 05/15/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assesses methods on predicting proteins whose folds are related to ones already in the Protein Data Bank). DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA.,Lab of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
|
30
|
Liu ZL, Hu JH, Jiang F, Wu YD. CRiSP: accurate structure prediction of disulfide-rich peptides with cystine-specific sequence alignment and machine learning. Bioinformatics 2020; 36:3385-3392. [PMID: 32215567 DOI: 10.1093/bioinformatics/btaa193] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/16/2019] [Revised: 02/06/2020] [Accepted: 03/22/2020] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION High-throughput sequencing discovers many naturally occurring disulfide-rich peptides or cystine-rich peptides (CRPs) with diversified bioactivities. However, their structure information, which is very important to peptide drug discovery, is still very limited. RESULTS We have developed a CRP-specific structure prediction method called Cystine-Rich peptide Structure Prediction (CRiSP), based on a customized template database with cystine-specific sequence alignment and three machine-learning predictors. The modeling accuracy is significantly better than several popular general-purpose structure modeling methods, and our CRiSP can provide useful model quality estimations. AVAILABILITY AND IMPLEMENTATION The CRiSP server is freely available on the website at http://wulab.com.cn/CRISP. CONTACT wuyd@pkusz.edu.cn or jiangfan@pku.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zi-Lin Liu
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Jing-Hao Hu
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China.,College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Fan Jiang
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China.,NanoAI Biotech Co., Ltd, Shenzhen 518118, China
- Yun-Dong Wu
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China.,College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China.,Shenzhen Bay Laboratory, Shenzhen 518055, China
|
31
|
Zheng W, Zhang C, Wuyun Q, Pearce R, Li Y, Zhang Y. LOMETS2: improved meta-threading server for fold-recognition and structure-based function annotation for distant-homology proteins. Nucleic Acids Res 2020; 47:W429-W436. [PMID: 31081035 PMCID: PMC6602514 DOI: 10.1093/nar/gkz384] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 02/27/2019] [Revised: 04/19/2019] [Accepted: 04/30/2019] [Indexed: 12/13/2022] Open
Abstract
The LOMETS2 server (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is an online meta-threading server system for template-based protein structure prediction. Although the server has been widely used by the community over the last decade, the previous LOMETS server no longer represents the state-of-the-art due to aging of the algorithms and unsatisfactory performance on distant-homology template identification. An extension of the server built on cutting-edge methods, especially techniques developed since the recent CASP experiments, is urgently needed. In this work, we report the recent advancements of the LOMETS2 server, which comprise a number of major new developments, including (i) new state-of-the-art threading programs, including contact-map-based threading approaches, (ii) deep sequence search-based sequence profile construction and (iii) a new web interface design that incorporates structure-based function annotations. Large-scale benchmark tests demonstrated that the integration of the deep profiles and new threading approaches into LOMETS2 significantly improves its structure modeling quality and template detection: LOMETS2 detected 176% more templates with TM-scores >0.5 than the previous LOMETS server for Hard targets that lacked homologous templates. Meanwhile, the newly incorporated structure-based function prediction helps extend the usefulness of the online server to the broader biological community.
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Qiqige Wuyun
- Computer Science and Engineering Department, Michigan State University, East Lansing, MI 48824, USA
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
32
|
Getting to Know Your Neighbor: Protein Structure Prediction Comes of Age with Contextual Machine Learning. J Comput Biol 2020; 27:796-814. [DOI: 10.1089/cmb.2019.0193] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022] Open
|
33
|
Dong D, Zhu Y, Aili Z, Chen Z, Ding J. Bioinformatics analysis of HPV-68 E6 and E7 oncoproteins for designing a therapeutic epitope vaccine against HPV infection. INFECTION GENETICS AND EVOLUTION 2020; 81:104266. [PMID: 32114254 DOI: 10.1016/j.meegid.2020.104266] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/25/2019] [Revised: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 10/24/2022]
Abstract
The incidence and mortality of cervical cancer, which mainly results from infection with human papillomavirus (HPV), are significantly increasing in Xinjiang. According to previous research, the incidence of HPV-68 in cervical cancer patients in Xinjiang is significantly higher than in other parts of China. HPV E6 and E7 oncoproteins play a crucial role in cervical cancer, and can be used as ideal targets for therapeutic vaccines. Therefore, we analyzed and identified the possible T-cell and B-cell dominant epitopes and various aspects of HPV-68 E6 and E7 oncoproteins, including the physicochemical properties and the secondary and tertiary structures, using a bioinformatic approach, which provided a basis for designing an effective therapeutic vaccine against HPV infection. The results showed that the E6 oncoprotein was an unstable and hydrophilic protein, and that the E7 oncoprotein was likewise an unstable and hydrophilic protein. The secondary structure of the E6 oncoprotein consisted of 45.57% alpha helices, 14.56% extended strands, 4.43% beta turns and 35.44% random coils. The secondary structure of the E7 oncoprotein consisted of 35.45% alpha helices, 17.27% extended strands, 0.91% beta turns and 46.36% random coils. Moreover, our results identified 5 dominant T-cell epitopes and 6 dominant B-cell epitopes in the E6 oncoprotein structure and 5 dominant T-cell epitopes and 3 dominant B-cell epitopes in the E7 oncoprotein. In conclusion, this study provides comprehensive biological information about the HPV-68 E6 and E7 oncoproteins, which will lay a theoretical foundation for multi-epitope vaccines against HPV infection.
Affiliation(s)
- Di Dong
- Department of Gynecology, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang 830054, China
- Yuejie Zhu
- Center of Reproductive Medicine, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang 830054, China
- Zufeiya Aili
- Department of Gynecology, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang 830054, China
- Zhifang Chen
- Department of Gynecology, the First Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang 830054, China.
- Jianbing Ding
- Department of Immunology, Xinjiang Medical University, Urumqi, Xinjiang 830011, China.
|
34
|
Bhattacharya S, Bhattacharya D. Evaluating the significance of contact maps in low-homology protein modeling using contact-assisted threading. Sci Rep 2020; 10:2908. [PMID: 32076047 PMCID: PMC7031282 DOI: 10.1038/s41598-020-59834-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/08/2019] [Accepted: 02/04/2020] [Indexed: 12/02/2022] Open
Abstract
The development of improved threading algorithms for remote homology modeling is a critical step forward in template-based protein structure prediction. We have recently demonstrated the utility of contact information to boost protein threading by developing a new contact-assisted threading method. However, the nature and extent to which the quality of a predicted contact map impacts the performance of contact-assisted threading remains elusive. Here, we systematically analyze and explore this interdependence by employing our newly-developed contact-assisted threading method over a large-scale benchmark dataset using predicted contact maps from four complementary methods including direct coupling analysis (mfDCA), sparse inverse covariance estimation (PSICOV), a classical neural network-based meta approach (MetaPSICOV), and a state-of-the-art ultra-deep learning model (RaptorX). Experimental results demonstrate that contact-assisted threading using high-quality contacts having a Matthews Correlation Coefficient (MCC) ≥ 0.5 improves threading performance in nearly 30% of cases, while low-quality contacts with MCC <0.35 degrade the performance in 50% of cases. This holds true even in the CASP13 dataset, where threading using high-quality contacts (MCC ≥ 0.5) significantly improves the performance of 22 instances out of 29. Collectively, our study uncovers the mutual association between the quality of predicted contacts and its possible utility in boosting threading performance for improving low-homology protein modeling.
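As a rough illustration of the quality measure named in this abstract, the sketch below computes the Matthews Correlation Coefficient between a predicted and a native binary contact map and bins the result with the two thresholds quoted above (MCC ≥ 0.5 and MCC < 0.35). It is not the authors' code: the function names, the `min_separation` parameter, and the "intermediate" label for values between the two thresholds are assumptions made for this example.

```python
import math


def contact_map_mcc(predicted, native, min_separation=1):
    """Matthews Correlation Coefficient between two binary contact maps.

    predicted, native: square lists of lists holding 0/1 (or False/True).
    min_separation: hypothetical parameter; only residue pairs (i, j) with
    j - i >= min_separation are counted.
    """
    n = len(native)
    if len(predicted) != n or any(len(row) != n for row in predicted) \
            or any(len(row) != n for row in native):
        raise ValueError("contact maps must be square and of equal size")

    tp = tn = fp = fn = 0
    # Contact maps are symmetric, so only the upper triangle is visited.
    for i in range(n):
        for j in range(i + min_separation, n):
            p, t = bool(predicted[i][j]), bool(native[i][j])
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention chosen for this sketch: MCC is reported as 0 when undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


def contact_quality_bin(mcc):
    """Bin an MCC value using the thresholds quoted in the abstract."""
    if mcc >= 0.5:
        return "high"
    if mcc < 0.35:
        return "low"
    return "intermediate"  # label assumed here; the abstract does not name this range


if __name__ == "__main__":
    native = [[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]]
    score = contact_map_mcc(native, native)
    print(score, contact_quality_bin(score))
```

In practice a minimum sequence separation larger than 1 is often used so that trivially close residue pairs do not dominate the score; the default here is kept small only to make the toy example work.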
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA.
- Department of Biological Sciences, Auburn University, Auburn, AL, 36849, USA.
|
35
|
Skariyachan S, Gopal D, Kadam SP, Muddebihalkar AG, Uttarkar A, Niranjan V. Carbon fullerene acts as potential lead molecule against prospective molecular targets of biofilm-producing multidrug-resistant Acinetobacter baumanni and Pseudomonas aerugenosa: computational modeling and MD simulation studies. J Biomol Struct Dyn 2020; 39:1121-1137. [PMID: 32036742 DOI: 10.1080/07391102.2020.1726821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
This study aimed to screen putative drug targets associated with biofilm formation of multidrug-resistant Acinetobacter baumannii and Pseudomonas aeruginosa and to prioritize carbon nano-fullerene as a potential lead molecule by structure-based virtual screening. Based on their functional role, 36 and 83 genes that are involved in biofilm formation of A. baumannii and P. aeruginosa, respectively, were selected and a metabolic network was computationally constructed. Three-dimensional structures were predicted and validated for the gene products that lacked them. Carbon nano-fullerene was selected as the lead molecule and its drug-likeness and pharmacokinetic properties were computationally predicted. The binding potential of carbon nano-fullerene toward selected drug targets was modeled and compared with the binding of the conventional drugs doripenem and polymyxin-B with their usual targets. The stabilities of the four best-docked complexes were confirmed by molecular dynamics (MD) simulation. This study suggested that the selected genes demonstrated relevant interactions in the constructed metabolic pathways. Carbon fullerene exhibited significant binding abilities to most of the prioritized targets in comparison with the binding of last-resort antibiotics and their usual targets. The four best ligand-receptor interactions predicted by molecular docking remained stable throughout MD simulation. Notably, carbon fullerene exhibited profound binding with outer membrane protein (OmpA) and ribonuclease-HII (rnhB) of A. baumannii and 2-heptyl-4(1H)-quinolone synthase (pqsBC) and chemotaxis protein (wspA) of P. aeruginosa. Thus, the current study suggested that carbon fullerene could probably be used as a potential lead molecule toward selected targets of A. baumannii and P. aeruginosa and that the applied aspects could probably be scaled up to design promising lead molecules toward these pathogens. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Sinosh Skariyachan
- Department of Microbiology, St. Pius X College, Rajapuram, Kasaragod, India
- Dharshini Gopal
- Department of Biotechnology, Dayananda Sagar College of Engineering, Bengaluru, India
- Sanjana Pratab Kadam
- Department of Biotechnology, Dayananda Sagar College of Engineering, Bengaluru, India
- Aditi G Muddebihalkar
- Department of Biotechnology, Dayananda Sagar College of Engineering, Bengaluru, India.,Department of Biotechnology, RV College of Engineering, Bengaluru, India
- Akshay Uttarkar
- Department of Biotechnology, RV College of Engineering, Bengaluru, India
- Vidya Niranjan
- Department of Biotechnology, RV College of Engineering, Bengaluru, India
|
36
|
Karczyńska AS, Ziȩba K, Uciechowska U, Mozolewska MA, Krupa P, Lubecka EA, Lipska AG, Sikorska C, Samsonov SA, Sieradzan AK, Giełdoń A, Liwo A, Ślusarz R, Ślusarz M, Lee J, Joo K, Czaplewski C. Improved Consensus-Fragment Selection in Template-Assisted Prediction of Protein Structures with the UNRES Force Field in CASP13. J Chem Inf Model 2020; 60:1844-1864. [PMID: 31999919 PMCID: PMC7588044 DOI: 10.1021/acs.jcim.9b00864] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Abstract
The method for protein-structure prediction, which combines the physics-based coarse-grained UNRES force field with knowledge-based modeling, has been developed further and tested in the 13th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13). The method implements restraints from the consensus fragments common to server models. In this work, the server models to derive fragments have been chosen on the basis of quality assessment; a fully automatic fragment-selection procedure has been introduced, and Dynamic Fragment Assembly pseudopotentials have been fully implemented. The Global Distance Test Score (GDT_TS), averaged over our “Model 1” predictions, increased by over 10 units with respect to CASP12 for the free-modeling category to reach 40.82. Our “Model 1” predictions ranked 20 and 14 for all and free-modeling targets, respectively (upper 20.2% and 14.3% of all models submitted to CASP13 in these categories, respectively), compared to 27 (upper 21.1%) and 24 (upper 18.9%) in CASP12, respectively. For oligomeric targets, the Interface Patch Similarity (IPS) and Interface Contact Similarity (ICS) averaged over our best oligomer models increased from 0.28 to 0.36 and from 12.4 to 17.8, respectively, from CASP12 to CASP13, and top-ranking models of 2 targets (H0968 and T0997o) were obtained (none in CASP12). The improvement of our method in CASP13 over CASP12 was ascribed to the combined effect of the overall enhancement of server-model quality, our success in selecting server models and fragments to derive restraints, and improvements of the restraint and potential-energy functions.
Affiliation(s)
- Karolina Ziȩba
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Urszula Uciechowska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Magdalena A Mozolewska
- Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw PL-02668, Poland
- Paweł Krupa
- Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, Warsaw PL-02668, Poland
- Emilia A Lubecka
- Institute of Informatics, Faculty of Mathematics, Physics, and Informatics, University of Gdańsk, Wita Stwosza 57, Gdańsk 80-308, Poland
- Agnieszka G Lipska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Celina Sikorska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Sergey A Samsonov
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Adam K Sieradzan
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland.,School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Artur Giełdoń
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland.,School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Rafał Ślusarz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Magdalena Ślusarz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
- Jooyoung Lee
- School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
|
37
|
Shi Q, Chen W, Huang S, Wang Y, Xue Z. Deep learning for mining protein data. Brief Bioinform 2019; 22:194-218. [PMID: 31867611 DOI: 10.1093/bib/bbz156] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/16/2019] [Revised: 10/21/2019] [Accepted: 11/07/2019] [Indexed: 01/16/2023] Open
Abstract
The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief networks, two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex neural networks and are described from five perspectives: residue-level prediction, sequence-level prediction, three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some practical issues and their future directions are discussed, such as robust deep learning for protein noisy data, architecture optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives on general deep learning techniques for protein data analysis.
Affiliation(s)
- Qiang Shi
- School of Software Engineering, Huazhong University of Science and Technology. His main interests cover machine learning especially deep learning, protein data analysis, and big data mining
- Weiya Chen
- School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, virtual reality, and data visualization
- Siqi Huang
- Software Engineering at Huazhong University of science and technology, focusing on Machine learning and data mining
- Yan Wang
- School of life, University of Science & Technology; her main interests cover protein structure and function prediction and big data mining
- Zhidong Xue
- School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, machine learning, and image processing
|
38
|
Baquero Forero A, Cvrčková F. SH3Ps-Evolution and Diversity of a Family of Proteins Engaged in Plant Cytokinesis. Int J Mol Sci 2019; 20:ijms20225623. [PMID: 31717902 PMCID: PMC6888108 DOI: 10.3390/ijms20225623] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/24/2019] [Revised: 11/04/2019] [Accepted: 11/06/2019] [Indexed: 01/02/2023] Open
Abstract
SH3P2 (At4g34660), an Arabidopsis thaliana SH3 and Bin/amphiphysin/Rvs (BAR) domain-containing protein, was reported to have a specific role in cell plate assembly, unlike its paralogs SH3P1 (At1g31440) and SH3P3 (At4g18060). SH3P family members were also predicted to interact with formins—evolutionarily conserved actin nucleators that participate in microtubule organization and in membrane–cytoskeleton interactions. To trace the origin of functional specialization of plant SH3Ps, we performed phylogenetic analysis of SH3P sequences from selected plant lineages. SH3Ps are present in charophytes, liverworts, mosses, lycophytes, gymnosperms, and angiosperms, but not in volvocal algae, suggesting association of these proteins with phragmoplast-, but not phycoplast-based cell division. Separation of three SH3P clades, represented by SH3P1, SH3P2, and SH3P3 of A. thaliana, appears to be a seed plant synapomorphy. In the yeast two hybrid system, Arabidopsis SH3P3, but not SH3P2, binds the FH1 and FH2 domains of the formin FH5 (At5g54650), known to participate in cytokinesis, while an opposite binding specificity was found for the dynamin homolog DRP1A (At5g42080), confirming earlier findings. This suggests that the cytokinetic role of SH3P2 is not due to its interaction with FH5. Possible determinants of interaction specificity of SH3P2 and SH3P3 were identified bioinformatically.
|
39
|
Gopal D, Muddebihalkar AG, Skariyachan S, C AU, Kaveramma P, Praveen U, Shankar RR, Venkatesan T, Niranjan V. Mitogen activated protein kinase-1 and cell division control protein-42 are putative targets for the binding of novel natural lead molecules: a therapeutic intervention against Candida albicans. J Biomol Struct Dyn 2019; 38:4584-4599. [PMID: 31625462 DOI: 10.1080/07391102.2019.1682053] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/08/2023]
Abstract
Candida albicans, a fungal yeast, causes several lethal infections in immune-suppressed patients and has recently emerged as a drug-resistant pathogen worldwide. The present study aimed to screen putative drug targets of Candida albicans and to study the binding potential of novel natural lead compounds towards these targets by computational virtual screening and molecular dynamics (MD) simulation. Through extensive analysis of mitogen-activated protein kinase (MAPK) signalling pathways, mitogen-activated protein kinase-1 (HOG1) and cell division control protein-42 (CDC42) genes were prioritized as putative targets based on their virulent functions. The three-dimensional structures of these targets, not available in their native forms, were computationally modeled and validated. Seventy-six lead molecules from various natural sources were screened and their drug-likeness and pharmacokinetic features were predicted. Among these ligands, two lead molecules that demonstrated ideal drug-likeness and pharmacokinetic features were docked against HOG1 and CDC42, and their binding potential was compared with the binding of the conventional drug fluconazole to its usual target. The prediction was computationally validated by MD simulation. The current study revealed that Cudraxanthone-S, present in Cudrania cochinchinensis, and Scutifoliamide-B, present in Piper scutifolium, exhibited ideal drug-likeness, pharmacokinetics and binding potential to the prioritized targets in comparison with the binding of fluconazole to its usual target. MD simulation showed that the CDC42-Cudraxanthone-S and HOG1-Scutifoliamide-B complexes remained stable throughout the simulation. Thus, the study provides significant insight into employing HOG1 and CDC42 of MAPK as putative drug targets of C. albicans and Cudraxanthone-S and Scutifoliamide-B as potential inhibitors for drug discovery. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Dharshini Gopal
- Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Bengaluru, India
| | - Aditi G Muddebihalkar
- Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Bengaluru, India.,Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
| | - Sinosh Skariyachan
- Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Bengaluru, India
| | - Akshay Uttarkar C
- Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
| | - Prinith Kaveramma
- Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Bengaluru, India
| | - Ulluvangada Praveen
- Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Bengaluru, India
| | - Roshini Ravi Shankar
- Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Bengaluru, India
| | - Tejaswini Venkatesan
- Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Bengaluru, India
| | - Vidya Niranjan
- Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
|
40
|
Haas J, Gumienny R, Barbato A, Ackermann F, Tauriello G, Bertoni M, Studer G, Smolinski A, Schwede T. Introducing "best single template" models as reference baseline for the Continuous Automated Model Evaluation (CAMEO). Proteins 2019; 87:1378-1387. [PMID: 31571280 DOI: 10.1002/prot.25815] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/07/2019] [Revised: 09/10/2019] [Accepted: 09/13/2019] [Indexed: 12/17/2022]
Abstract
Critical blind assessment of structure prediction techniques is crucial for the scientific community to establish the state of the art, identify bottlenecks, and guide future developments. In Critical Assessment of Techniques in Structure Prediction (CASP), human experts assess the performance of participating methods in relation to the difficulty of the prediction task in a biennial experiment on approximately 100 targets. Yet, the development of automated computational modeling methods requires more frequent evaluation cycles and larger sets of data. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements CASP by conducting fully automated blind prediction evaluations based on the weekly pre-release of sequences of those structures that are going to be published in the next release of the Protein Data Bank (PDB). Each week, CAMEO publishes benchmarking results for predictions corresponding to a set of about 20 targets collected during a 4-day prediction window. CAMEO benchmarking data are generated consistently for all methods at the same point in time, enabling developers to cross-validate their method's performance and to refer to their results in publications. Many successful participants of CASP have used CAMEO, either by directly benchmarking their methods within the system or by comparing their own performance to CAMEO reference data. CAMEO offers a variety of scores reflecting different aspects of structure modeling, for example, binding site accuracy, homo-oligomer interface quality, or accuracy of local model confidence estimates. By introducing the "bestSingleTemplate" method based on structure superpositions as a reference for the accuracy of 3D modeling predictions, CAMEO facilitates objective comparison of techniques and fosters the development of advanced methods.
Affiliation(s)
- Juergen Haas
- Computational Structural Biology, University of Basel, Switzerland
| | - Rafal Gumienny
- Computational Structural Biology, Swiss Institute of Bioinformatics, Switzerland
| | - Alessandro Barbato
- Computational Structural Biology, Universitat Basel Department Biozentrum, Switzerland
| | - Flavio Ackermann
- Computational Structural Biology, University of Basel, Switzerland
| | | | - Martino Bertoni
- Computational Structural Biology, Universitat Basel Department Biozentrum, Switzerland
| | - Gabriel Studer
- Computational Structural Biology, University of Basel, Switzerland
| | - Anna Smolinski
- Computational Structural Biology, University of Basel, Switzerland
| | - Torsten Schwede
- Computational Structural Biology, University of Basel, Switzerland
|
41
|
Kandathil SM, Greener JG, Jones DT. Recent developments in deep learning applied to protein structure prediction. Proteins 2019; 87:1179-1189. [PMID: 31589782 PMCID: PMC6899861 DOI: 10.1002/prot.25824] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 05/15/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/29/2022]
Abstract
Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - Joe G Greener
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - David T Jones
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
|
42
|
Xu J, Wang S. Analysis of distance-based protein structure prediction by deep learning in CASP13. Proteins 2019; 87:1069-1081. [PMID: 31471916 DOI: 10.1002/prot.25810] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 04/29/2019] [Revised: 07/24/2019] [Accepted: 08/27/2019] [Indexed: 12/30/2022]
Abstract
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TM-score > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper discusses the progress we have made since CASP12, the strengths and weaknesses of our methods, and why deep learning performed much better in CASP13.
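As an illustration of the "top L/5, L/2, and L long-range contact precision" figures quoted in this abstract, the following is a minimal sketch of how such a metric is commonly computed. It is not the authors' evaluation code. The function name, the dictionary input format, and the sequence-separation cutoff of 24 residues for "long-range" are assumptions made for this example.

```python
def top_k_long_range_precision(pred_probs, native_contacts, seq_len,
                               k_divisor, min_sep=24):
    """Precision of the top L/k predicted long-range contacts.

    pred_probs      -- dict mapping (i, j) with i < j to a predicted
                       contact probability (hypothetical input format)
    native_contacts -- set of (i, j) pairs, i < j, in contact in the
                       experimental structure
    seq_len         -- sequence length L
    k_divisor       -- 5 for top L/5, 2 for top L/2, 1 for top L
    min_sep         -- minimum sequence separation counted as long-range
                       (24 is an assumption of this sketch)
    """
    # Keep only residue pairs far enough apart in sequence.
    long_range = [(prob, pair) for pair, prob in pred_probs.items()
                  if pair[1] - pair[0] >= min_sep]
    # Rank by predicted probability, highest first.
    long_range.sort(key=lambda item: item[0], reverse=True)
    top_n = max(1, seq_len // k_divisor)
    selected = long_range[:top_n]
    if not selected:
        return 0.0
    hits = sum(1 for _, pair in selected if pair in native_contacts)
    # Precision over the pairs actually selected.
    return hits / len(selected)


if __name__ == "__main__":
    # Toy data: four predicted pairs for a 50-residue protein.
    predictions = {(0, 30): 0.9, (1, 40): 0.8, (3, 10): 0.95, (2, 45): 0.7}
    native = {(0, 30), (3, 10), (2, 45)}
    print(top_k_long_range_precision(predictions, native, 50, 25))
```

In the toy data, the pair (3, 10) has the highest probability but is ignored because its sequence separation is below the cutoff, which is why the long-range filter is applied before ranking.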
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois
| | - Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois
|
43
|
Hu B, Zheng X, Wang Y, Wang J, Zhang F. Computational Approaches for Elucidating Protein-Protein Interactions in Cation Channel Signaling. Curr Drug Targets 2019; 21:179-192. [PMID: 31490747 DOI: 10.2174/1389450120666190906154412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2019] [Revised: 08/10/2019] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND The lipid bilayer of the plasma membrane is impermeable to ions, yet changes in the flux of ions across the cell membrane are critical regulatory events in cells. Because of their regulatory roles in a range of physiological processes, such as electrical signaling in muscles and neurons, ion channel proteins are among the most important drug targets. OBJECTIVE This review focuses on computational approaches for elucidating protein-protein interactions in cation channel signaling. DISCUSSION Owing to continuous advances in computing facilities and technologies, the physical contacts between macromolecules in channel structures can now be visualized virtually. Techniques such as protein-protein docking, homology modeling, and molecular dynamics simulation are valuable tools for predicting protein complexes and refining channels with unreleased structures. These approaches are expected to greatly expand cation channel signaling research, thereby speeding up structure-based drug design and discovery. CONCLUSION We introduced a series of valuable computational tools for elucidating protein-protein interactions in cation channel signaling, including molecular graphics, protein-protein docking, homology modeling, and molecular dynamics simulation.
Affiliation(s)
- Baichun Hu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, China.,School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang, China
| | - Xiaoming Zheng
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, China.,School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang, China
| | - Ying Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, China.,School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang, China.,Wuya College of Innovation, Shenyang Pharmaceutical University, Shenyang, Liaoning, 110016, China.,School of Traditional Chinese Materia Medica, Shenyang Pharmaceutical University, Shenyang, 110016, China
| | - Jian Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, China.,School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang, China
| | - Fengjiao Zhang
- Wuya College of Innovation, Shenyang Pharmaceutical University, Shenyang, Liaoning, 110016, China
|
44
|
Zhao X, Zhang F, Li Z, Wang H, An M, Li Y, Pang N, Ding J. Bioinformatics analysis of EgA31 and EgG1Y162 proteins for designing a multi-epitope vaccine against Echinococcus granulosus. Infect Genet Evol 2019; 73:98-108. [PMID: 31022474 DOI: 10.1016/j.meegid.2019.04.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/13/2018] [Revised: 04/17/2019] [Accepted: 04/19/2019] [Indexed: 11/15/2022]
Affiliation(s)
- Xiao Zhao
- State Key Laboratory of Pathogenesis, Prevention, Treatment of Central Asian High Incidence Diseases, the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China; College of Basic Medicine of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China
| | - Fengbo Zhang
- State Key Laboratory of Pathogenesis, Prevention, Treatment of Central Asian High Incidence Diseases, the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China; Department of Clinical Laboratory, the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China
| | - Zhiwei Li
- College of Basic Medicine of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China
| | - Hongying Wang
- College of Basic Medicine of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China
| | - Mengting An
- College of Basic Medicine of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China
| | - Yujiao Li
- Department of Clinical Laboratory, the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China
| | - Nannan Pang
- Department of Clinical Laboratory, the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China
| | - Jianbing Ding
- State Key Laboratory of Pathogenesis, Prevention, Treatment of Central Asian High Incidence Diseases, the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China; College of Basic Medicine of Xinjiang Medical University, Urumqi 830011, Xinjiang, PR China.
|
45
|
Abstract
Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we may construct 3D models without involving extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets with a median family size of 58 effective sequence homologs within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
|
46
|
Kandathil SM, Greener JG, Jones DT. Prediction of interresidue contacts with DeepMetaPSICOV in CASP13. Proteins 2019; 87:1092-1099. [PMID: 31298436 PMCID: PMC6899903 DOI: 10.1002/prot.25779] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 04/29/2019] [Revised: 06/25/2019] [Accepted: 07/06/2019] [Indexed: 12/28/2022]
Abstract
In this article, we describe our efforts in contact prediction in the CASP13 experiment. We employed a new deep learning‐based contact prediction tool, DeepMetaPSICOV (or DMP for short), together with new methods and data sources for alignment generation. DMP evolved from MetaPSICOV and DeepCov and combines the input feature sets used by these methods as input to a deep, fully convolutional residual neural network. We also improved our method for multiple sequence alignment generation and included metagenomic sequences in the search. We discuss successes and failures of our approach and identify areas where further improvements may be possible. DMP is freely available at: https://github.com/psipred/DeepMetaPSICOV.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - Joe G Greener
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - David T Jones
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
|
47
|
Wang C, Wei Y, Zhang H, Kong L, Sun S, Zheng WM, Bu D. Constructing effective energy functions for protein structure prediction through broadening attraction-basin and reverse Monte Carlo sampling. BMC Bioinformatics 2019; 20:135. [PMID: 30925867 PMCID: PMC6439974 DOI: 10.1186/s12859-019-2652-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The ab initio approaches to protein structure prediction usually employ the Monte Carlo technique to search for the structural conformation that has the lowest energy. However, the widely used energy functions are usually ineffective for conformation search. How to construct an effective energy function remains a challenging task. RESULTS Here, we present a framework to construct effective energy functions for protein structure prediction. Unlike existing energy functions, which only require the native structure to have the lowest energy, we attempt to maximize the attraction-basin where the native structure lies in the energy landscape. The underlying rationale is that each energy function determines a specific energy landscape together with a native attraction-basin, and the larger the attraction-basin is, the more likely the Monte Carlo search procedure is to find the native structure. Following this rationale, we constructed effective energy functions as follows: i) To explore the native attraction-basin determined by a certain energy function, we performed reverse Monte Carlo sampling starting from the native structure, identifying the structural conformations on the edge of the attraction-basin. ii) To broaden the native attraction-basin, we smoothed the edge points of the attraction-basin by tuning the weights of energy terms, thus acquiring an improved energy function. Our framework alternates the broadening attraction-basin and reverse sampling steps (thus called BARS) until the native attraction-basin is sufficiently large. We present extensive experimental results to show that, using the BARS framework, the constructed energy functions could greatly facilitate protein structure prediction by improving the quality of predicted structures and speeding up conformation search. CONCLUSION Using the BARS framework, we constructed effective energy functions for protein structure prediction, which could improve the quality of predicted structures and speed up conformation search as well.
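The Monte Carlo conformation search that this abstract takes as its starting point can be illustrated with a minimal Metropolis sketch on a toy one-dimensional landscape. This is not the BARS framework and involves no protein representation. The function names, the toy energy function, and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged landscape with its global minimum at x = 0.

    The quadratic term forms one broad basin around the "native" state;
    the squared-sine term adds local ripples that can trap a search.
    """
    return x * x + 2.0 * math.sin(5.0 * x) ** 2


def metropolis_search(energy, start, n_steps=5000, step_size=0.5,
                      temperature=1.0, seed=0):
    """Metropolis Monte Carlo search for a low-energy state.

    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(n_steps):
        # Propose a small random move around the current state.
        candidate = current + rng.uniform(-step_size, step_size)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / T) so the walk can leave local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


if __name__ == "__main__":
    state, value = metropolis_search(toy_energy, start=4.0)
    print(f"lowest energy found: {value:.4f} at x = {state:.4f}")
```

In this toy setting, increasing the ripple amplitude relative to the quadratic term narrows the region from which the search reliably reaches x = 0, which loosely mirrors the abstract's argument that a wider native attraction-basin makes the search easier.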
Affiliation(s)
- Chao Wang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Yi Wei
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Haicang Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Lupeng Kong
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
| | - Wei-Mou Zheng
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
- Institute of Theoretical Physics, Chinese Academy of Sciences, 55, Zhongguancun East Road, Beijing, 100190 China
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190 China
- University of Chinese Academy of Sciences, 19-1, Yuquan Road, Shijingshan, Beijing, 100049 China
|
48
|
Bhattacharya S, Bhattacharya D. Does inclusion of residue-residue contact information boost protein threading? Proteins 2019; 87:596-606. [PMID: 30882932 DOI: 10.1002/prot.25684] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/22/2018] [Revised: 02/20/2019] [Accepted: 03/13/2019] [Indexed: 12/26/2022]
Abstract
Template-based modeling is considered one of the most successful approaches for protein structure prediction. However, reliably and accurately selecting optimal template proteins from a library of known protein structures having similar folds as the target protein and making correct alignments between the target sequence and the template structures, a template-based modeling technique known as threading, remains challenging, particularly for non- or distantly-homologous protein targets. With the recent advancement in protein residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of inclusion of residue-residue contact information in improving the accuracy and reliability of protein threading. We develop a new threading algorithm by incorporating various sequential and structural features, and subsequently integrate residue-residue contact information as an additional scoring term for threading template selection. We show that the inclusion of contact information attains statistically significantly better threading performance compared to a baseline threading algorithm that does not utilize contact information when everything else remains the same. Experimental results demonstrate that our contact-based threading approach outperforms the popular threading method MUSTER, the contact-assisted ab initio folding method CONFOLD2, and the recent state-of-the-art contact-assisted protein threading methods EigenTHREADER and map_align on several benchmarks. Our study illustrates that the inclusion of contact maps is a promising avenue in protein threading to ultimately help improve the accuracy of protein structure prediction.
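The idea of adding contact information "as an additional scoring term for threading template selection" can be sketched as follows. This is a hypothetical illustration, not the authors' algorithm: the overlap definition, the linear combination, the weight, and all function and parameter names are assumptions introduced for this example.

```python
def contact_overlap(alignment, query_contacts, template_contacts):
    """Fraction of predicted query contacts supported by the template.

    alignment         -- dict mapping query residue index to the aligned
                         template residue index (unaligned residues absent)
    query_contacts    -- set of predicted (i, j) contacts in the query
    template_contacts -- set of (a, b) contacts, a < b, observed in the
                         template structure
    """
    if not query_contacts:
        return 0.0
    matched = 0
    for i, j in query_contacts:
        ti, tj = alignment.get(i), alignment.get(j)
        if ti is None or tj is None:
            continue  # one residue falls in an alignment gap
        if (min(ti, tj), max(ti, tj)) in template_contacts:
            matched += 1
    return matched / len(query_contacts)


def combined_threading_score(base_score, alignment, query_contacts,
                             template_contacts, weight=1.0):
    """Baseline threading score plus a weighted contact-overlap term."""
    overlap = contact_overlap(alignment, query_contacts, template_contacts)
    return base_score + weight * overlap


if __name__ == "__main__":
    # Toy alignment: query residue 3 is unaligned.
    alignment = {0: 0, 1: 1, 2: 2, 5: 6}
    predicted = {(0, 5), (1, 2), (0, 3)}
    template = {(0, 6), (1, 3)}
    print(combined_threading_score(2.0, alignment, predicted, template, 3.0))
```

Templates would then be ranked by the combined score rather than by the baseline score alone; how the weight is chosen is left open in this sketch.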
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
|