1
Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
Estimation of model accuracy plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advancements made by AlphaFold2 in monomer structure, the problem of single-domain protein structure prediction has been widely solved. Correspondingly, the importance of assessing the quality of single-domain protein models decreased, and the research focus has shifted to estimation of model accuracy of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, as well as representative methods, and the current challenges within four distinct facets (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score) in the field of complex EMA.
Affiliation(s)
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

2
Han Y, Lu Y, Yan X, Cui H, Cheng S, Zheng J, Zhou Y, Wang S, Li Z. Atom-ProteinQA: Atom-level protein model quality assessment through fine-grained joint learning. Computer Methods and Programs in Biomedicine 2024; 249:108078. [PMID: 38537495 DOI: 10.1016/j.cmpb.2024.108078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/26/2023] [Accepted: 02/10/2024] [Indexed: 04/21/2024]
Abstract
MOTIVATION: Protein model quality assessment (ProteinQA) is a fundamental task that is essential for biologically relevant applications, e.g., protein structure refinement and protein design. Previous works aimed to conduct ProteinQA only on the global structure or per-residue level, ignoring potentially usable and precise cues from a fine-grained per-atom perspective. In this study, we propose an atom-level ProteinQA model, named Atom-ProteinQA, in which two innovative modules are designed to extract geometric and topological atom-level relationships respectively. Specifically, on the one hand, a geometric perception module exploits 3D sparse convolution to capture the geometric features of the input protein, generating fine-grained atom-level predictions. On the other hand, natural chemical bonds are utilized to construct an atom-level graph, then message passing from a topological perception module is applied to output residue-level predictions in parallel. Eventually, through a cross-model aggregation module, features from different modules mutually interact, enhancing performance on both the atom and residue levels. RESULTS: Extensive experiments show that our proposed Atom-ProteinQA outperforms previous methods by a large margin, regardless of residue-level or atom-level assessment. Concretely, we achieved state-of-the-art performance on CATH-2084, Decoy-8000, the public benchmarks CASP13 and CASP14, and CAMEO. AVAILABILITY: The repository of this project is released at https://github.com/luyfcandy/Atom_ProteinQA.
Affiliation(s)
- Yatong Han
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Yingfeng Lu
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Xu Yan
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Hannah Cui
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Jiayou Zheng
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Yuzhe Zhou
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., Shanghai, 200030, China
- Zhen Li
  - Future Network of Intelligence Institute, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China; School of Science and Engineering, the Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, China

3
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
- Nolan Park
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA

4
Wang H, Liu D, Zhao K, Wang Y, Zhang G. SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition. Brief Bioinform 2024; 25:bbae146. [PMID: 38600663 PMCID: PMC11006797 DOI: 10.1093/bib/bbae146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 03/02/2024] [Accepted: 03/15/2024] [Indexed: 04/12/2024] Open
Abstract
Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in the protein structure prediction, we explored whether the representation of structural sequence profile can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. Combined with structural pre-trained knowledge and geometric features, they are further fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms the state-of-the-art methods, such as ProteinMPNN, Pifold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on CATH 4.2 benchmark, respectively. Encouraging results also have been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted by the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can well reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.
Affiliation(s)
- Yajun Wang
  - Corresponding authors. Guijun Zhang, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; Yajun Wang, College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China
- Guijun Zhang
  - Corresponding authors. Guijun Zhang, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; Yajun Wang, College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou 310014, China

5
Parra-Perez AM, Gallego-Martinez A, Lopez-Escamez JA. An overload of missense variants in the OTOG gene may drive a higher prevalence of familial Meniere disease in the European population. Hum Genet 2024; 143:423-435. [PMID: 38519595 PMCID: PMC11043142 DOI: 10.1007/s00439-024-02643-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 01/14/2024] [Indexed: 03/25/2024]
Abstract
Meniere disease (MD) is a complex inner ear disorder with significant familial aggregation. A differential prevalence of familial MD (FMD) has been reported, being 9-10% in Europeans compared to 6% in East Asians. A broad genetic heterogeneity in FMD has been described, OTOG being the most commonly mutated gene, with a compound heterozygous recessive inheritance. We hypothesize that an OTOG-related founder effect may explain the higher prevalence of FMD in the European population. Therefore, the present study aimed to compare the allele frequency (AF) and distribution of OTOG rare variants across different populations. For this purpose, the coding regions with high constraint (low density of rare variants) were retrieved in the OTOG coding sequence in Non-Finnish European (NFE). Missense variants (AF < 0.01) were selected from a cohort of 100 FMD patients, and their population AF was annotated using gnomAD v2.1. A linkage analysis was performed, and odds ratios were calculated to compare AF between NFE and other populations. Thirteen rare missense variants were observed in 13 FMD patients, with 2 variants (rs61978648 and rs61736002) shared by 5 individuals and another variant (rs117315845) shared by two individuals. The results confirm the observed enrichment of OTOG rare missense variants in FMD. Furthermore, eight variants were enriched in the NFE population, and six of them were in constrained regions. Structural modeling predicts five missense variants that could alter the otogelin stability. We conclude that several variants reported in FMD are in constrained regions, and they may have a founder effect and explain the burden of FMD in the European population.
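The abstract reports odds ratios comparing allele frequencies between NFE and other populations but does not state the exact computation. A minimal sketch, assuming the usual 2x2 table of alternate versus reference allele counts in two populations; the function name and the counts in the usage note are hypothetical and are not data from the study.

```python
def allele_odds_ratio(ac_a, an_a, ac_b, an_b):
    """Odds ratio of observing the variant allele in population A versus B.

    ac_* is the alternate allele count and an_* the total number of
    genotyped alleles (allele number) in each population.
    """
    ref_a = an_a - ac_a  # reference allele count in population A
    ref_b = an_b - ac_b  # reference allele count in population B
    if min(ac_a, ref_a, ac_b, ref_b) <= 0:
        # A zero cell makes the plain odds ratio undefined or infinite.
        raise ValueError("all four cells of the 2x2 table must be positive")
    return (ac_a / ref_a) / (ac_b / ref_b)
```

With hypothetical counts, `allele_odds_ratio(20, 1020, 10, 1010)` compares odds of 20/1000 against 10/1000 and gives a ratio of about 2, meaning the variant allele is roughly twice as common (in odds terms) in population A.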
Affiliation(s)
- Alberto M Parra-Perez
  - Division of Otolaryngology, Department of Surgery, Instituto de Investigación Biosanitaria, Ibs.GRANADA, Universidad de Granada, Granada, Spain
  - Sensorineural Pathology Programme, Centro de Investigación Biomédica en Red en Enfermedades Raras (CIBERER), Madrid, Spain
  - Faculty of Medicine and Health, School of Medical Sciences, Meniere's Disease Neuroscience Research Program, The Kolling Institute, The University of Sydney, 10 Westbourne St, Sydney, NSW, Australia
- Alvaro Gallego-Martinez
  - Division of Otolaryngology, Department of Surgery, Instituto de Investigación Biosanitaria, Ibs.GRANADA, Universidad de Granada, Granada, Spain
  - Sensorineural Pathology Programme, Centro de Investigación Biomédica en Red en Enfermedades Raras (CIBERER), Madrid, Spain
  - Faculty of Medicine and Health, School of Medical Sciences, Meniere's Disease Neuroscience Research Program, The Kolling Institute, The University of Sydney, 10 Westbourne St, Sydney, NSW, Australia
- Jose A Lopez-Escamez
  - Division of Otolaryngology, Department of Surgery, Instituto de Investigación Biosanitaria, Ibs.GRANADA, Universidad de Granada, Granada, Spain
  - Sensorineural Pathology Programme, Centro de Investigación Biomédica en Red en Enfermedades Raras (CIBERER), Madrid, Spain
  - Faculty of Medicine and Health, School of Medical Sciences, Meniere's Disease Neuroscience Research Program, The Kolling Institute, The University of Sydney, 10 Westbourne St, Sydney, NSW, Australia

6
Zhang Z, Cai Y, Zhang B, Zheng W, Freddolino L, Zhang G, Zhou X. DEMO-EM2: assembling protein complex structures from cryo-EM maps through intertwined chain and domain fitting. Brief Bioinform 2024; 25:bbae113. [PMID: 38517699 PMCID: PMC10959074 DOI: 10.1093/bib/bbae113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 02/10/2024] [Accepted: 02/25/2024] [Indexed: 03/24/2024] Open
Abstract
The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models. The method was carefully evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle EM maps, where DEMO-EM2 models achieved an average TM-score of 0.92, outperforming those of state-of-the-art methods. The results demonstrate an efficient method that enables the rapid and reliable solution of challenging cryo-EM structure modeling problems.
Affiliation(s)
- Ziying Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yaxian Cai
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Biao Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

7
Peng Z, Wang W, Wei H, Li X, Yang J. Improved protein structure prediction with trRosettaX2, AlphaFold2, and optimized MSAs in CASP15. Proteins 2023; 91:1704-1711. [PMID: 37565699 DOI: 10.1002/prot.26570] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/03/2023] [Revised: 07/17/2023] [Accepted: 07/31/2023] [Indexed: 08/12/2023]
Abstract
We present the monomer and multimer structure prediction results of our methods in CASP15. We first designed an elaborate pipeline that leverages complementary sequence databases and advanced database searching algorithms to generate high-quality multiple sequence alignments (MSAs). Top MSAs were then selected for the subsequent step of structure prediction. We utilized trRosettaX2 and AlphaFold2 for monomer structure prediction (group name Yang-Server), and AlphaFold-Multimer for multimer structure prediction (group name Yang-Multimer). Yang-Server and Yang-Multimer are ranked at the top and the fourth, respectively, for monomer and multimer structure prediction. For 94 monomers, the average TM-score of the predicted structure models by Yang-Server is 0.876, compared to 0.798 by the default AlphaFold2 (i.e., the group NBIS-AF2-standard). For 42 multimers, the average DockQ score of the predicted structure models by Yang-Multimer is 0.464, compared to 0.389 by the default AlphaFold-Multimer (i.e., the group NBIS-AF2-multimer). Detailed analysis of the results shows that several factors contribute to the improvement, including improved MSAs, iterated modeling for large targets, interplay between monomer and multimer structure prediction for intertwined structures, etc. However, the structure predictions for orphan proteins and multimers remain challenging, and breakthroughs in this area are anticipated in the future.
Affiliation(s)
- Zhenling Peng
  - MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Wenkai Wang
  - School of Mathematical Sciences, Nankai University, Tianjin, China
- Hong Wei
  - School of Mathematical Sciences, Nankai University, Tianjin, China
- Xiaoge Li
  - MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China
- Jianyi Yang
  - MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, China

8
Liu J, Liu D, He G, Zhang G. Estimating protein complex model accuracy based on ultrafast shape recognition and deep learning in CASP15. Proteins 2023; 91:1861-1870. [PMID: 37553848 DOI: 10.1002/prot.26564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 07/05/2023] [Accepted: 07/11/2023] [Indexed: 08/10/2023]
Abstract
This article reports and analyzes the results of protein complex model accuracy estimation by our methods (DeepUMQA3 and GraphGPSM) in the 15th Critical Assessment of techniques for protein Structure Prediction (CASP15). The new deep learning-based multimeric complex model accuracy estimation methods are proposed based on the ensemble of three-level features coupling with deep residual/graph neural networks. For the input multimeric complex model, we describe it from three levels: overall complex features, intra-monomer features, and inter-monomer features. We designed an overall ultrafast shape recognition (USR) to characterize the relationship between local residues and the overall complex topology, and an inter-monomer USR to characterize the relationship between the residues of one monomer and the topology of other monomers. DeepUMQA3 (Group name: GuijunLab-RocketX) ranked first in the interface residue accuracy estimation of CASP15. The Pearson correlation between the interface residue Local Distance Difference Test (lDDT) predicted by DeepUMQA3 and the real lDDT is 0.570, the only method that exceeds 0.5. Among the top 5 methods, DeepUMQA3 achieved the highest Pearson correlation of lDDT on 25 out of 39 targets. GraphGPSM (Group name: GuijunLab-PAthreader) has TM-score Pearson correlations greater than 0.9 on 14 targets, showing a good ability to estimate the overall fold accuracy. The DeepUMQA3 server is available at http://zhanglab-bioinf.com/DeepUMQA/ and the GraphGPSM server is available at http://zhanglab-bioinf.com/GraphGPSM/.
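As a point of reference for the ultrafast shape recognition (USR) features mentioned in the abstract, the classic USR descriptor can be sketched as below. This is the generic 12-number formulation over a set of 3D points, under one common choice of moments (mean, standard deviation, signed cube root of the third central moment). It is not the overall or inter-monomer USR variant used by DeepUMQA3 and GraphGPSM, whose details the abstract does not give. The function names are illustrative assumptions.

```python
import math


def _moments(values):
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # copysign keeps the sign of the skew while taking a real cube root.
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptor(coords):
    """Classic 12-dimensional USR descriptor for a list of (x, y, z) points."""
    if not coords:
        raise ValueError("coords must contain at least one point")
    n = len(coords)
    # Reference point 1: the centroid of all points.
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, centroid) for c in coords]
    # Reference points 2 and 3: the points closest to and farthest from the centroid.
    closest = coords[d_ctd.index(min(d_ctd))]
    farthest = coords[d_ctd.index(max(d_ctd))]
    # Reference point 4: the point farthest from reference point 3.
    d_fct = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[d_fct.index(max(d_fct))]

    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor
```

Because the descriptor is built only from distances to internally defined reference points, it does not change under translation or rotation of the input, which is what makes it usable for fast shape comparison without superposition.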
Affiliation(s)
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Guangxing He
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China

9
Edmunds NS, Alharbi SMA, Genc AG, Adiyaman R, McGuffin LJ. Estimation of model accuracy in CASP15 using the ModFOLDdock server. Proteins 2023; 91:1871-1878. [PMID: 37314190 PMCID: PMC10952711 DOI: 10.1002/prot.26532] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/31/2023] [Revised: 05/12/2023] [Accepted: 05/18/2023] [Indexed: 06/15/2023]
Abstract
In CASP15, there was a greater emphasis on multimeric modeling than in previous experiments, with assembly structures nearly doubling in number (41 up from 22) since the previous round. CASP15 also included a new estimation of model accuracy (EMA) category in recognition of the importance of objective quality assessment (QA) for quaternary structure models. ModFOLDdock is a multimeric model QA server developed by the McGuffin group at the University of Reading, which brings together a range of single-model, clustering, and deep learning methods to form a consensus of approaches. For CASP15, three variants of ModFOLDdock were developed to optimize for the different facets of the quality estimation problem. The standard ModFOLDdock variant produced predicted scores optimized for positive linear correlations with the observed scores. The ModFOLDdockR variant produced predicted scores optimized for ranking, that is, the top-ranked models have the highest accuracy. In addition, the ModFOLDdockS variant used a quasi-single model approach to score each model on an individual basis. The scores from all three variants achieved strongly positive Pearson correlation coefficients with the CASP observed scores (oligo-lDDT) in excess of 0.70, which were maintained across both homomeric and heteromeric model populations. In addition, at least one of the ModFOLDdock variants was consistently ranked in the top two methods across all three EMA categories. Specifically, for overall global fold prediction accuracy, ModFOLDdock placed second and ModFOLDdockR placed third; for overall interface quality prediction accuracy, ModFOLDdockR, ModFOLDdock, and ModFOLDdockS were placed above all other predictor methods, and ModFOLDdockR and ModFOLDdockS were placed second and third respectively for individual residue confidence scores. The ModFOLDdock server is available at: https://www.reading.ac.uk/bioinf/ModFOLDdock/. 
ModFOLDdock is also available as part of the MultiFOLD docker package: https://hub.docker.com/r/mcguffin/multifold.
Affiliation(s)
- Ahmet G. Genc
  - School of Biological Sciences, University of Reading, Reading, UK
- Recep Adiyaman
  - School of Biological Sciences, University of Reading, Reading, UK

10
Roy S, Ben-Hur A. Protein quality assessment with a loss function designed for high-quality decoys. Frontiers in Bioinformatics 2023; 3:1198218. [PMID: 37915563 PMCID: PMC10616882 DOI: 10.3389/fbinf.2023.1198218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 09/29/2023] [Indexed: 11/03/2023] Open
Abstract
Motivation: The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
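For reference, the standard ϵ-insensitive loss from SVM regression, which the abstract names as the starting point, can be sketched as follows. This is the textbook form only: the modified loss actually used by Qϵ is not specified in the abstract. The function name and the default ϵ of 0.1 are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean textbook epsilon-insensitive loss: max(0, |y - y_hat| - epsilon).

    Prediction errors smaller than epsilon contribute nothing; larger
    errors are penalized linearly by the amount they exceed epsilon.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)
```

For a quality score such as lDDT on a 0 to 1 scale, a prediction of 0.55 against a target of 0.50 falls inside the ϵ = 0.1 tube and incurs no loss, whereas a prediction of 0.50 against a target of 0.80 is penalized by about 0.2.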
Affiliation(s)
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, CO, United States

11
Liu J, Liu D, Zhang GJ. DeepUMQA3: a web server for accurate assessment of interface residue accuracy in protein complexes. Bioinformatics 2023; 39:btad591. [PMID: 37740296 PMCID: PMC10560100 DOI: 10.1093/bioinformatics/btad591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/21/2023] [Accepted: 09/21/2023] [Indexed: 09/24/2023] Open
Abstract
MOTIVATION: Model quality assessment is a crucial part of protein structure prediction and a gateway to proper usage of models in biomedical applications. Many methods have been proposed for assessing the quality of structural models of protein monomers, but few for evaluating protein complex models. As protein complex structure prediction becomes a new challenge, there is an urgent need for model quality assessment methods that can accurately assess the accuracy of interface residues of complex structures. RESULTS: Here, we present DeepUMQA3, a web server for evaluating the accuracy of interface residues of protein complex structures using deep neural networks. For an input complex structure, features are extracted at three levels (overall complex, intra-monomer, and inter-monomer), and an improved deep residual neural network is used to predict per-residue lDDT and interface residue accuracy. DeepUMQA3 ranks first in the blind test of interface residue accuracy estimation in CASP15, with Pearson, Spearman, and AUC of 0.564, 0.535, and 0.755 under the lDDT measurement, which are 17.6%, 23.6%, and 10.9% higher than the second best method, respectively. DeepUMQA3 can also assess the accuracy of all residues in the entire complex and distinguish high- and low-precision residues. AVAILABILITY AND IMPLEMENTATION: The DeepUMQA3 web server is freely available at http://zhanglab-bioinf.com/DeepUMQA_server/.
Affiliation(s)
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

12
Jiang Z, Shen YY, Liu R. Structure-based prediction of nucleic acid binding residues by merging deep learning- and template-based approaches. PLoS Comput Biol 2023; 19:e1011428. [PMID: 37672551 PMCID: PMC10482303 DOI: 10.1371/journal.pcbi.1011428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 08/11/2023] [Indexed: 09/08/2023] Open
Abstract
Accurate prediction of nucleic acid binding residues is essential for the understanding of transcription and translation processes. Integration of feature- and template-based strategies could improve the prediction of these key residues in proteins. Nevertheless, traditional hybrid algorithms have been surpassed by recently developed deep learning-based methods, and the possibility of integrating deep learning- and template-based approaches to improve performance remains to be explored. To address these issues, we developed a novel structure-based integrative algorithm called NABind that can accurately predict DNA- and RNA-binding residues. A deep learning module was built based on the diversified sequence and structural descriptors and edge aggregated graph attention networks, while a template module was constructed by transforming the alignments between the query and its multiple templates into features for supervised learning. Furthermore, the stacking strategy was adopted to integrate the above two modules for improving prediction performance. Finally, a post-processing module dependent on the random walk algorithm was proposed to further correct the integrative predictions. Extensive evaluations indicated that our approach could not only achieve excellent performance on both native and predicted structures but also outperformed existing hybrid algorithms and recent deep learning methods. The NABind server is available at http://liulab.hzau.edu.cn/NABind/.
Affiliation(s)
- Zheng Jiang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Yue-Yue Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Rong Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China

13
Boral S, Sen S, Kushwaha T, Inampudi KK, De S. Extein residues regulate the catalytic function of Spl DnaX intein enzyme by restricting the near-attack conformations of the active-site residues. Protein Sci 2023; 32:e4699. [PMID: 37313648 PMCID: PMC10288555 DOI: 10.1002/pro.4699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 05/30/2023] [Accepted: 06/06/2023] [Indexed: 06/15/2023]
Abstract
Intein enzymes catalyze the splicing of their flanking polypeptide chains and have found tremendous biotechnological applications. Their terminal residues form the catalytic core and participate in the splicing reaction. Hence, the neighboring N- and C-terminal extein residues influence the catalytic rate. As these extein residues vary depending on the substrate identity, we tested the influence of 20 amino acids at these sites in the Spl DnaX intein and observed significant variation in the formation of spliced product as well as N- and C-terminal cleavage products. We investigated the dependence of these reactions on the extein residues by molecular dynamics (MD) simulations on eight extein variants, and found that the conformational sampling of the active-site residues of the intein enzyme differed among these extein variants. We found that the extein variants that sample a higher population of near-attack conformers (NACs) of the active-site residues show higher product formation in our activity assays. Ground-state conformers that closely resemble the transition state are referred to as NACs. A very good correlation was observed between the NAC populations from the MD simulations of eight extein variants and the corresponding product formation from our activity assays. Furthermore, this molecular detail enabled us to elucidate the mechanistic roles of several conserved active-site residues in the splicing reaction. Overall, this study shows that the catalytic power of the Spl DnaX intein enzyme, and most likely other inteins, depends on the efficiency of formation of NACs in the ground state, which is further modulated by the extein residues.
Affiliation(s)
- Soumendu Boral
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, India
- Srijon Sen
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, India
- Tushar Kushwaha
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Krishna K. Inampudi
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Soumya De
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, India
|
14
|
Wang F, Yang H, Wu Y, Peng L, Li X. SAELGMDA: Identifying human microbe-disease associations based on sparse autoencoder and LightGBM. Front Microbiol 2023; 14:1207209. [PMID: 37415823 PMCID: PMC10320730 DOI: 10.3389/fmicb.2023.1207209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/18/2023] [Indexed: 07/08/2023] Open
Abstract
Introduction: Identification of complex associations between diseases and microbes is important for understanding the pathogenesis of diseases and designing therapeutic strategies. Biomedical experiment-based Microbe-Disease Association (MDA) detection methods are expensive, time-consuming, and laborious. Methods: Here, we developed a computational method called SAELGMDA for potential MDA prediction. First, microbe similarity and disease similarity are computed by integrating their functional similarity and Gaussian interaction profile kernel similarity. Second, each microbe-disease pair is represented as a feature vector by combining the microbe and disease similarity matrices. Next, the obtained feature vectors are mapped to a low-dimensional space by a Sparse AutoEncoder. Finally, unknown microbe-disease pairs are classified using a Light Gradient Boosting Machine (LightGBM). Results: The proposed SAELGMDA method was compared with four state-of-the-art MDA methods (MNNMDA, GATMDA, NTSHMDA, and LRLSHMDA) under five-fold cross validations on diseases, microbes, and microbe-disease pairs on the HMDAD and Disbiome databases. The results show that SAELGMDA achieved the best accuracy, Matthews correlation coefficient, AUC, and AUPR under the majority of conditions, outperforming the other four MDA prediction models. In particular, SAELGMDA obtained the best AUCs of 0.8358 and 0.9301 under cross validation on diseases, 0.9838 and 0.9293 under cross validation on microbes, and 0.9857 and 0.9358 under cross validation on microbe-disease pairs on the HMDAD and Disbiome databases. Colorectal cancer, inflammatory bowel disease, and lung cancer are diseases that severely threaten human health. We used the proposed SAELGMDA method to find possible microbes for the three diseases. The results suggest potential associations between Clostridium coccoides and colorectal cancer and between Sphingomonadaceae and inflammatory bowel disease. In addition, Veillonella may be associated with autism. The inferred MDAs need further validation. Conclusion: We anticipate that the proposed SAELGMDA method will contribute to the identification of new MDAs.
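The Gaussian interaction profile (GIP) kernel similarity mentioned in the Methods can be illustrated with a short sketch. It assumes the commonly used definition, in which each row of a binary association matrix is an entity's interaction profile and the bandwidth is normalised by the mean squared profile norm. The function name `gip_kernel` and the default `gamma_prime=1.0` are assumptions for this example and are not taken from the SAELGMDA paper.

```python
import numpy as np

def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel similarity between the rows of a binary association matrix.

    Assumes at least one non-zero entry, so the bandwidth is well defined.
    """
    assoc = np.asarray(assoc, dtype=float)
    # Bandwidth normalised by the mean squared norm of the interaction profiles.
    mean_sq_norm = np.mean(np.sum(assoc ** 2, axis=1))
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distance between every pair of profiles.
    diff = assoc[:, None, :] - assoc[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-gamma * sq_dist)
```

Passing a microbe-by-disease matrix gives a microbe similarity matrix, and passing its transpose gives the disease similarity matrix. Identical profiles receive a similarity of 1.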
Affiliation(s)
- Feixiang Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Huandong Yang
  - Department of Gastrointestinal Surgery, Yidu Central Hospital of Weifang, Weifang, China
- Yan Wu
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xiaoling Li
  - The Second Department of Oncology, Beidahuang Industry Group General Hospital, Harbin, China
  - The Second Department of Oncology, Heilongjiang Second Cancer Hospital, Harbin, China
|
15
|
Zhou L, Wang Y, Peng L, Li Z, Luo X. Identifying potential drug-target interactions based on ensemble deep learning. Front Aging Neurosci 2023; 15:1176400. [PMID: 37396659 PMCID: PMC10309650 DOI: 10.3389/fnagi.2023.1176400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Accepted: 05/10/2023] [Indexed: 07/04/2023] Open
Abstract
Introduction: Drug-target interaction (DTI) prediction is an important step in drug research and development. Experimental methods are time-consuming and laborious. Methods: In this study, we developed a novel DTI prediction method called EnGDD by combining initial feature acquisition, dimensional reduction, and DTI classification based on a gradient boosting neural network, a deep neural network, and Deep Forest. Results: EnGDD was compared with seven state-of-the-art DTI prediction methods (BLM-NII, NRLMF, WNNGIP, NEDTP, DTi2Vec, RoFDT, and MolTrans) on the nuclear receptor, GPCR, ion channel, and enzyme datasets under cross validations on drugs, targets, and drug-target pairs, respectively. EnGDD achieved the best recall, accuracy, F1-score, AUC, and AUPR under the majority of conditions, demonstrating its powerful DTI identification performance. EnGDD predicted that D00182 and hsa2099, D07871 and hsa1813, DB00599 and hsa2562, and D00002 and hsa10935 have higher interaction probabilities among unknown drug-target pairs and may be potential DTIs on the four datasets, respectively. In particular, D00002 (Nadide) was identified to interact with hsa10935 (mitochondrial peroxiredoxin 3), whose up-regulation might be used to treat neurodegenerative diseases. Finally, after confirming its DTI identification performance, EnGDD was used to find possible drug targets for Parkinson's disease and Alzheimer's disease. The results show that D01277, D04641, and D08969 may be applied to the treatment of Parkinson's disease by targeting hsa1813 (dopamine receptor D2), and that D02173, D02558, and D03822 may provide clues for treating patients with Alzheimer's disease by targeting hsa5743 (prostaglandin-endoperoxide synthase 2). The above prediction results need further biomedical validation. Discussion: We anticipate that our proposed EnGDD model can help discover potential therapeutic clues for various diseases, including neurodegenerative diseases.
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yuzhuang Wang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Xueming Luo
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
|
16
|
He G, Liu J, Liu D, Zhang G. GraphGPSM: a global scoring model for protein structure using graph neural networks. Brief Bioinform 2023:bbad219. [PMID: 37317619 DOI: 10.1093/bib/bbad219] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/17/2023] [Revised: 04/14/2023] [Accepted: 05/22/2023] [Indexed: 06/16/2023] Open
Abstract
The scoring models used for protein structure modeling and ranking are mainly divided into unified field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, the modeling accuracy still cannot meet the requirements to a certain extent. In particular, accurate modeling of multi-domain and orphan proteins remains a challenge. Therefore, an accurate and efficient deep learning-based protein scoring model is urgently needed to guide protein structure folding and ranking. In this work, we propose a protein structure global scoring model based on an equivariant graph neural network (EGNN), named GraphGPSM, to guide protein structure modeling and ranking. We construct an EGNN architecture and design a message passing mechanism to update and transmit information between the nodes and edges of the graph. Finally, the global score of the protein model is output through a multilayer perceptron. Residue-level ultrafast shape recognition is used to describe the relationship between residues and the overall structure topology, and distance and direction encoded by Gaussian radial basis functions are designed to represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles, and inter-residue distances and orientations to represent the protein model, and are embedded into the nodes and edges of the graph neural network. The experimental results on the CASP13, CASP14 and CAMEO test sets show that the scores of GraphGPSM correlate strongly with the TM-score of the models, and that this correlation is significantly better than that of the unified field score function REF2015 and of state-of-the-art local lDDT-based scoring models such as ModFOLD8, ProQ3D and DeepAccNet. The modeling results on 484 test proteins demonstrate that GraphGPSM can greatly improve the modeling accuracy. GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The results show that the average TM-score of the models predicted by GraphGPSM is 13.2 and 7.1% higher than that of the models predicted by AlphaFold2. GraphGPSM also participated in CASP15 and achieved competitive performance in global accuracy estimation.
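The abstract mentions distances encoded by Gaussian radial basis functions. A minimal sketch of that general technique follows, expanding a scalar distance into a vector of Gaussian responses centred on evenly spaced values. The function name `rbf_encode`, the 0 to 20 Å range, the 16 centres, and the width rule are illustrative assumptions and not the settings used in GraphGPSM.

```python
import numpy as np

def rbf_encode(distances, d_min=0.0, d_max=20.0, n_centers=16):
    """Expand distances into Gaussian radial basis features (hypothetical sketch)."""
    distances = np.asarray(distances, dtype=float)
    # Evenly spaced centres over the assumed distance range.
    centers = np.linspace(d_min, d_max, n_centers)
    # Width tied to the centre spacing so neighbouring basis functions overlap.
    sigma = (d_max - d_min) / n_centers
    # The trailing axis indexes the basis functions.
    return np.exp(-((distances[..., None] - centers) / sigma) ** 2)
```

Applied to an L x L inter-residue distance matrix, this returns an L x L x 16 array that could serve as edge features. Each distance activates mostly the basis functions whose centres lie nearby.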
Affiliation(s)
- Guangxing He
  - College of Information Engineering, Zhejiang University of Technology
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology
|
17
|
Liu R, Chen X, Zhao F, Jiang Y, Lu Z, Ji H, Feng Y, Li J, Zhang H, Zheng J, Zhang J, Zhao Y. The COMPASS Complex Regulates Fungal Development and Virulence through Histone Crosstalk in the Fungal Pathogen Cryptococcus neoformans. J Fungi (Basel) 2023; 9:672. [PMID: 37367608 DOI: 10.3390/jof9060672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 06/07/2023] [Accepted: 06/10/2023] [Indexed: 06/28/2023] Open
Abstract
The Complex of Proteins Associated with Set1 (COMPASS) methylates lysine K4 on histone H3 (H3K4) and is conserved from yeast to humans. Its subunits and regulatory roles in the meningitis-causing fungal pathogen Cryptococcus neoformans remain unknown. Here we identified the core subunits of the COMPASS complex in C. neoformans and C. deneoformans and confirmed their conserved roles in H3K4 methylation. Through AlphaFold modeling, we found that Set1, Bre2, Swd1, and Swd3 form the catalytic core of the COMPASS complex and regulate the cryptococcal yeast-to-hypha transition, thermal tolerance, and virulence. The COMPASS complex-mediated histone H3K4 methylation requires H2B mono-ubiquitination by Rad6/Bre1 and the Paf1 complex in order to activate the expression of genes specific for the yeast-to-hypha transition in C. deneoformans. Taken together, our findings demonstrate that putative COMPASS subunits function as a unified complex, contributing to cryptococcal development and virulence.
Affiliation(s)
- Ruoyan Liu
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Xiaoyu Chen
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Fujie Zhao
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Yixuan Jiang
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Zhenguo Lu
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Huining Ji
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yuanyuan Feng
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Junqiang Li
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Heng Zhang
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Jianting Zheng
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Joint International Research Laboratory of Metabolic and Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Jing Zhang
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Youbao Zhao
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
|
18
|
Zhao K, Xia Y, Zhang F, Zhou X, Li SZ, Zhang G. Protein structure and folding pathway prediction based on remote homologs recognition using PAthreader. Commun Biol 2023; 6:243. [PMID: 36871126 PMCID: PMC9985440 DOI: 10.1038/s42003-023-04605-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/09/2022] [Accepted: 02/16/2023] [Indexed: 03/06/2023] Open
Abstract
Recognition of remote homologous structures is a necessary module in AlphaFold2 and is also essential for the exploration of protein folding pathways. Here, we propose a method, PAthreader, to recognize remote templates and explore folding pathways. Firstly, we design a three-track alignment between predicted distance profiles and structure profiles extracted from PDB and AlphaFold DB, to improve the recognition accuracy of remote templates. Secondly, we improve the performance of AlphaFold2 using the templates identified by PAthreader. Thirdly, we explore protein folding pathways based on our conjecture that dynamic folding information of a protein is implicitly contained in its remote homologs. The results show that the average accuracy of PAthreader templates is 11.6% higher than that of HHsearch. In terms of structure modelling, PAthreader outperforms AlphaFold2 and ranks first on the CAMEO blind test for the latest three months. Furthermore, we predict protein folding pathways for 37 proteins, of which the results for 7 proteins are almost consistent with those of biological experiments, while the other 30 human proteins have yet to be verified by biological experiments, revealing that folding information can be exploited from remote homologous structures.
Affiliation(s)
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Fujin Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Xiaogen Zhou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Stan Z Li
  - AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, 310030, Zhejiang, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
|
19
|
Liu J, Zhao K, Zhang G. Improved model quality assessment using sequence and structural information by enhanced deep neural networks. Brief Bioinform 2023; 24:6865134. [PMID: 36460624 DOI: 10.1093/bib/bbac507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 10/02/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022] Open
Abstract
Protein model quality assessment plays an important role in protein structure prediction, protein design and drug discovery. In this work, DeepUMQA2, a substantially improved version of DeepUMQA for protein model quality assessment, is proposed. First, sequence features containing protein co-evolution information and structural features reflecting family information are extracted to complement model-dependent features. Second, a novel backbone network based on triangular multiplication update and axial attention mechanism is designed to enhance information exchange between inter-residue pairs. On the CASP13 and CASP14 datasets, the performance of DeepUMQA2 increases by 20.5 and 20.4% compared with DeepUMQA, respectively (measured by top 1 loss). Moreover, on the three-month CAMEO dataset (11 March to 04 June 2022), DeepUMQA2 outperforms DeepUMQA by 15.5% (measured by local AUC_{0,0.2}) and ranks first among all competing server methods in the CAMEO blind test. Experimental results show that DeepUMQA2 outperforms state-of-the-art model quality assessment methods, such as ProQ3D-LDDT, ModFOLD8 and DeepAccNet, and that DeepUMQA2 can select better best models than the self-assessments provided by state-of-the-art protein structure prediction methods such as AlphaFold2, RoseTTAFold and I-TASSER.
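For readers unfamiliar with the top 1 loss used above, the sketch below shows the usual definition for a single target: the true quality of the best available model minus the true quality of the model that the estimator ranks first. The function name `top1_loss` and the example numbers are illustrative assumptions. The paper's exact quality measure and its averaging over targets are not given in the abstract.

```python
def top1_loss(predicted, true):
    """Top 1 loss for one target: best true quality minus true quality of the top-ranked model."""
    if len(predicted) != len(true) or len(predicted) == 0:
        raise ValueError("predicted and true must be non-empty and of equal length")
    # Index of the model the quality estimator ranks first.
    selected = max(range(len(predicted)), key=lambda i: predicted[i])
    # Zero when the estimator picks the truly best model; larger values are worse.
    return max(true) - true[selected]

# Hypothetical scores for three candidate models of one target.
predicted_scores = [0.70, 0.90, 0.40]
true_scores = [0.85, 0.80, 0.50]
loss = top1_loss(predicted_scores, true_scores)
```

Here the estimator selects the second model (true score 0.80) although the first is best (0.85), so the loss is about 0.05. A benchmark would typically average this value over all targets.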
Affiliation(s)
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology
|
20
|
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 104] [Impact Index Per Article: 52.0] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
|