1
|
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024:10.1007/s12539-024-00626-x. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complex, disordered regions in complex, antibody-antigen complex, and RNA-related complex, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.
Collapse
Affiliation(s)
- Nan Zhao
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
| | - Tong Wu
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
| | - Wenda Wang
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
| | - Lunchuan Zhang
- School of Mathematics, Renmin University of China, Beijing, 100872, China.
| | - Xinqi Gong
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China.
- School of Mathematics, Renmin University of China, Beijing, 100872, China.
- Beijing Academy of Artificial Intelligence, Beijing, 100084, China.
| |
Collapse
|
2
|
Si Y, Yan C. Protein language model-embedded geometric graphs power inter-protein contact prediction. eLife 2024; 12:RP92184. [PMID: 38564241 PMCID: PMC10987090 DOI: 10.7554/elife.92184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/04/2024] Open
Abstract
Accurate prediction of contacting residue pairs between interacting proteins is very useful for structural characterization of protein-protein interactions. Although significant improvement has been made in inter-protein contact prediction recently, there is still a large room for improving the prediction accuracy. Here we present a new deep learning method referred to as PLMGraph-Inter for inter-protein contact prediction. Specifically, we employ rotationally and translationally invariant geometric graphs obtained from structures of interacting proteins to integrate multiple protein language models, which are successively transformed by graph encoders formed by geometric vector perceptrons and residual networks formed by dimensional hybrid residual blocks to predict inter-protein contacts. Extensive evaluation on multiple test sets illustrates that PLMGraph-Inter outperforms five top inter-protein contact prediction methods, including DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter, by large margins. In addition, we also show that the prediction of PLMGraph-Inter can complement the result of AlphaFold-Multimer. Finally, we show leveraging the contacts predicted by PLMGraph-Inter as constraints for protein-protein docking can dramatically improve its performance for protein complex structure prediction.
Collapse
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and TechnologyWuhanChina
| | - Chengfei Yan
- School of Physics, Huazhong University of Science and TechnologyWuhanChina
| |
Collapse
|
3
|
Lin P, Li H, Huang SY. Deep learning in modeling protein complex structures: From contact prediction to end-to-end approaches. Curr Opin Struct Biol 2024; 85:102789. [PMID: 38402744 DOI: 10.1016/j.sbi.2024.102789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 01/16/2024] [Accepted: 02/06/2024] [Indexed: 02/27/2024]
Abstract
Protein-protein interactions play crucial roles in many biological processes. Traditionally, protein complex structures are normally built by protein-protein docking. With the rapid development of artificial intelligence and its great success in monomer protein structure prediction, deep learning has widely been applied to modeling protein-protein complex structures through inter-protein contact prediction and end-to-end approaches in the past few years. This article reviews the recent advances of deep-learning-based approaches in modeling protein-protein complex structures as well as their advantages and limitations. Challenges and possible future directions are also briefly discussed in applying deep learning for the prediction of protein complex structures.
Collapse
Affiliation(s)
- Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
| | - Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China.
| |
Collapse
|
4
|
Chu L, Ruffolo JA, Harmalkar A, Gray JJ. Flexible protein-protein docking with a multitrack iterative transformer. Protein Sci 2024; 33:e4862. [PMID: 38148272 PMCID: PMC10804679 DOI: 10.1002/pro.4862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2023] [Revised: 11/17/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and reranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, for example, structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem to assume no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications when binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multitrack iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments, GeoDock inputs just the sequences and structures of the docking partners, which suits the tasks when the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. On the Database of Interacting Protein Structures (DIPS) test set, GeoDock achieves a 43% top-1 success rate, outperforming all other tested methods. However, in the standard DIPS train/test splits, we discovered contamination of close homologs in the training set. After decontaminating the training set, the success rate is 31%. On the DB5.5 test set and a benchmark dataset of antibody-antigen complexes, GeoDock outperforms the deep learning models trained using the same dataset but falls behind most of the conventional methods and AlphaFold-Multimer. GeoDock attains an average inference speed of under 1 s on a single GPU, enabling its application in large-scale structure screening. Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.
Collapse
Affiliation(s)
- Lee‐Shin Chu
- Department of Chemical and Biomolecular EngineeringJohns Hopkins UniversityBaltimoreMarylandUSA
| | - Jeffrey A. Ruffolo
- Program in Molecular BiophysicsJohns Hopkins UniversityBaltimoreMarylandUSA
| | - Ameya Harmalkar
- Department of Chemical and Biomolecular EngineeringJohns Hopkins UniversityBaltimoreMarylandUSA
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular EngineeringJohns Hopkins UniversityBaltimoreMarylandUSA
- Program in Molecular BiophysicsJohns Hopkins UniversityBaltimoreMarylandUSA
| |
Collapse
|
5
|
Peng CX, Liang F, Xia YH, Zhao KL, Hou MH, Zhang GJ. Recent Advances and Challenges in Protein Structure Prediction. J Chem Inf Model 2024; 64:76-95. [PMID: 38109487 DOI: 10.1021/acs.jcim.3c01324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2023]
Abstract
Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in the field of biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, there are still some challenges and limitations. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be conducted.
Collapse
Affiliation(s)
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Fang Liang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Ming-Hua Hou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
Collapse
|
6
|
Roy RS, Liu J, Giri N, Guo Z, Cheng J. Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15. Proteins 2023; 91:1889-1902. [PMID: 37357816 PMCID: PMC10749984 DOI: 10.1002/prot.26542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/27/2023]
Abstract
Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models is proven useful for estimating the quality of protein tertiary structural models, but it has been rarely applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and performed very well in estimating the global structure accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA.
Collapse
Affiliation(s)
- Raj S. Roy
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Nabin Giri
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|
7
|
Liu J, Guo Z, Wu T, Roy RS, Quadir F, Chen C, Cheng J. Enhancing alphafold-multimer-based protein complex structure prediction with MULTICOM in CASP15. Commun Biol 2023; 6:1140. [PMID: 37949999 PMCID: PMC10638423 DOI: 10.1038/s42003-023-05525-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Accepted: 10/30/2023] [Indexed: 11/12/2023] Open
Abstract
To enhance the AlphaFold-Multimer-based protein complex structure prediction, we developed a quaternary structure prediction system (MULTICOM) to improve the input fed to AlphaFold-Multimer and evaluate and refine its outputs. MULTICOM samples diverse multiple sequence alignments (MSAs) and templates for AlphaFold-Multimer to generate structural predictions by using both traditional sequence alignments and Foldseek-based structure alignments, ranks structural predictions through multiple complementary metrics, and refines the structural predictions via a Foldseek structure alignment-based refinement method. The MULTICOM system with different implementations was blindly tested in the assembly structure prediction in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 as both server and human predictors. MULTICOM_qa ranked 3rd among 26 CASP15 server predictors and MULTICOM_human ranked 7th among 87 CASP15 server and human predictors. The average TM-score of the first predictions submitted by MULTICOM_qa for CASP15 assembly targets is ~0.76, 5.3% higher than ~0.72 of the standard AlphaFold-Multimer. The average TM-score of the best of top 5 predictions submitted by MULTICOM_qa is ~0.80, about 8% higher than ~0.74 of the standard AlphaFold-Multimer. Moreover, the Foldseek Structure Alignment-based Multimer structure Generation (FSAMG) method outperforms the widely used sequence alignment-based multimer structure generation.
Collapse
Affiliation(s)
- Jian Liu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA
| | - Tianqi Wu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA
| | - Raj S Roy
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA
| | - Farhan Quadir
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA
| | - Chen Chen
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA.
| |
Collapse
|
8
|
Liu Z, Zhu YH, Shen LC, Xiao X, Qiu WR, Yu DJ. Integrating unsupervised language model with multi-view multiple sequence alignments for high-accuracy inter-chain contact prediction. Comput Biol Med 2023; 166:107529. [PMID: 37748220 DOI: 10.1016/j.compbiomed.2023.107529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2023] [Revised: 08/30/2023] [Accepted: 09/19/2023] [Indexed: 09/27/2023]
Abstract
Accurate identification of inter-chain contacts in the protein complex is critical to determine the corresponding 3D structures and understand the biological functions. We proposed a new deep learning method, ICCPred, to deduce the inter-chain contacts from the amino acid sequences of the protein complex. This pipeline was built on the designed deep residual network architecture, integrating the pre-trained language model with three multiple sequence alignments (MSAs) from different biological views. Experimental results on 709 non-redundant benchmarking protein complexes showed that the proposed ICCPred significantly increased inter-chain contact prediction accuracy compared to the state-of-the-art approaches. Detailed data analyses showed that the significant advantage of ICCPred lies in the utilization of pre-trained transformer language models which can effectively extract the complementary co-evolution diversity from three MSAs. Meanwhile, the designed deep residual network enhances the correlation between the co-evolution diversity and the patterns of inter-chain contacts. These results demonstrated a new avenue for high-accuracy deep-learning inter-chain contact prediction that is applicable to large-scale protein-protein interaction annotations from sequence alone.
Collapse
Affiliation(s)
- Zi Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China; Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 , China
| | - Yi-Heng Zhu
- College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, 210095 , China
| | - Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China
| | - Xuan Xiao
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 , China
| | - Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 , China.
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China.
| |
Collapse
|
9
|
Kilian M, Bischofs IB. Co-evolution at protein-protein interfaces guides inference of stoichiometry of oligomeric protein complexes by de novo structure prediction. Mol Microbiol 2023; 120:763-782. [PMID: 37777474 DOI: 10.1111/mmi.15169] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Revised: 09/10/2023] [Accepted: 09/11/2023] [Indexed: 10/02/2023]
Abstract
The quaternary structure with specific stoichiometry is pivotal to the specific function of protein complexes. However, determining the structure of many protein complexes experimentally remains a major bottleneck. Structural bioinformatics approaches, such as the deep learning algorithm Alphafold2-multimer (AF2-multimer), leverage the co-evolution of amino acids and sequence-structure relationships for accurate de novo structure and contact prediction. Pseudo-likelihood maximization direct coupling analysis (plmDCA) has been used to detect co-evolving residue pairs by statistical modeling. Here, we provide evidence that combining both methods can be used for de novo prediction of the quaternary structure and stoichiometry of a protein complex. We achieve this by augmenting the existing AF2-multimer confidence metrics with an interpretable score to identify the complex with an optimal fraction of native contacts of co-evolving residue pairs at intermolecular interfaces. We use this strategy to predict the quaternary structure and non-trivial stoichiometries of Bacillus subtilis spore germination protein complexes with unknown structures. Co-evolution at intermolecular interfaces may therefore synergize with AI-based de novo quaternary structure prediction of structurally uncharacterized bacterial protein complexes.
Collapse
Affiliation(s)
- Max Kilian
- Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
- Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
| | - Ilka B Bischofs
- Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
- Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
| |
Collapse
|
10
|
Guo L, Qiu T, Wang J. ViTScore: A Novel Three-Dimensional Vision Transformer Method for Accurate Prediction of Protein-Ligand Docking Poses. IEEE Trans Nanobioscience 2023; 22:734-743. [PMID: 37159314 DOI: 10.1109/tnb.2023.3274640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery, and due to the complexity and high cost of experimental methods, there is a great demand for computational approaches, such as protein-ligand docking, to decipher PLI patterns. One of the most challenging aspects of protein-ligand docking is to identify near-native conformations from a set of poses, but traditional scoring functions still have limited accuracy. Therefore, new scoring methods are urgently needed for methodological and/or practical implications. We present a novel deep learning-based scoring function for ranking protein-ligand docking poses based on Vision Transformer (ViT), named ViTScore. To recognize near-native poses from a set of poses, ViTScore voxelizes the protein-ligand interactional pocket into a 3D grid labeled by the occupancy contribution of atoms in different physicochemical classes. This allows ViTScore to capture the subtle differences between spatially and energetically favorable near-native poses and unfavorable non-native poses without needing extra information. After that, ViTScore will output the prediction of the root mean square deviation (rmsd) of a docking pose with reference to the native binding pose. ViTScore is extensively evaluated on diverse test sets including PDBbind2019 and CASF2016, and obtains significant improvements over existing methods in terms of RMSE, R and docking power. Moreover, the results demonstrate that ViTScore is a promising scoring function for protein-ligand docking, and it can be used to accurately identify near-native poses from a set of poses. Furthermore, the results suggest that ViTScore is a powerful tool for protein-ligand docking, and it can be used to accurately identify near-native poses from a set of poses. Additionally, ViTScore can be used to identify potential drug targets and to design new drugs with improved efficacy and safety.
Collapse
|
11
|
Liu J, Guo Z, Wu T, Roy RS, Chen C, Cheng J. Improving AlphaFold2-based protein tertiary structure prediction with MULTICOM in CASP15. Commun Chem 2023; 6:188. [PMID: 37679431 PMCID: PMC10484931 DOI: 10.1038/s42004-023-00991-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2023] [Accepted: 08/21/2023] [Indexed: 09/09/2023] Open
Abstract
Since the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14), AlphaFold2 has become the standard method for protein tertiary structure prediction. One remaining challenge is to further improve its prediction. We developed a new version of the MULTICOM system to sample diverse multiple sequence alignments (MSAs) and structural templates to improve the input for AlphaFold2 to generate structural models. The models are then ranked by both the pairwise model similarity and AlphaFold2 self-reported model quality score. The top ranked models are refined by a novel structure alignment-based refinement method powered by Foldseek. Moreover, for a monomer target that is a subunit of a protein assembly (complex), MULTICOM integrates tertiary and quaternary structure predictions to account for tertiary structural changes induced by protein-protein interaction. The system participated in the tertiary structure prediction in 2022 CASP15 experiment. Our server predictor MULTICOM_refine ranked 3rd among 47 CASP15 server predictors and our human predictor MULTICOM ranked 7th among all 132 human and server predictors. The average GDT-TS score and TM-score of the first structural models that MULTICOM_refine predicted for 94 CASP15 domains are ~0.80 and ~0.92, 9.6% and 8.2% higher than ~0.73 and 0.85 of the standard AlphaFold2 predictor respectively.
Collapse
Affiliation(s)
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| | - Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| | - Raj S Roy
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| | - Chen Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
| |
Collapse
|
12
|
Lin P, Yan Y, Tao H, Huang SY. Deep transfer learning for inter-chain contact predictions of transmembrane protein complexes. Nat Commun 2023; 14:4935. [PMID: 37582780 PMCID: PMC10427616 DOI: 10.1038/s41467-023-40426-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Accepted: 07/21/2023] [Indexed: 08/17/2023] Open
Abstract
Membrane proteins are encoded by approximately a quarter of human genes. Inter-chain residue-residue contact information is important for structure prediction of membrane protein complexes and valuable for understanding their molecular mechanism. Although many deep learning methods have been proposed to predict the intra-protein contacts or helix-helix interactions in membrane proteins, it is still challenging to accurately predict their inter-chain contacts due to the limited number of transmembrane proteins. Addressing the challenge, here we develop a deep transfer learning method for predicting inter-chain contacts of transmembrane protein complexes, named DeepTMP, by taking advantage of the knowledge pre-trained from a large data set of non-transmembrane proteins. DeepTMP utilizes a geometric triangle-aware module to capture the correct inter-chain interaction from the coevolution information generated by protein language models. DeepTMP is extensively evaluated on a test set of 52 self-associated transmembrane protein complexes, and compared with state-of-the-art methods including DeepHomo2.0, CDPred, GLINTER, DeepHomo, and DNCON2_Inter. It is shown that DeepTMP considerably improves the precision of inter-chain contact prediction and outperforms the existing approaches in both accuracy and robustness.
Collapse
Affiliation(s)
- Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
| | - Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
| | - Huanyu Tao
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China.
| |
Collapse
|
13
|
Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023; 28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023] Open
Abstract
Deep learning, a potent branch of artificial intelligence, is steadily leaving its transformative imprint across multiple disciplines. Within computational biology, it is expediting progress in the understanding of Protein-Protein Interactions (PPIs), key components governing a wide array of biological functionalities. Hence, an in-depth exploration of PPIs is crucial for decoding the intricate biological system dynamics and unveiling potential avenues for therapeutic interventions. As the deployment of deep learning techniques in PPI analysis proliferates at an accelerated pace, there exists an immediate demand for an exhaustive review that encapsulates and critically assesses these novel developments. Addressing this requirement, this review offers a detailed analysis of the literature from 2021 to 2023, highlighting the cutting-edge deep learning methodologies harnessed for PPI analysis. Thus, this review stands as a crucial reference for researchers in the discipline, presenting an overview of the recent studies in the field. This consolidation helps elucidate the dynamic paradigm of PPI analysis, the evolution of deep learning techniques, and their interdependent dynamics. This scrutiny is expected to serve as a vital aid for researchers, both well-established and newcomers, assisting them in maneuvering the rapidly shifting terrain of deep learning applications in PPI analysis.
Collapse
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
| |
Collapse
|
14
|
Chu LS, Ruffolo JA, Harmalkar A, Gray JJ. Flexible Protein-Protein Docking with a Multi-Track Iterative Transformer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.29.547134. [PMID: 37425754 PMCID: PMC10327054 DOI: 10.1101/2023.06.29.547134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/11/2023]
Abstract
Conventional protein-protein docking algorithms usually rely on heavy candidate sampling and re-ranking, but these steps are time-consuming and hinder applications that require high-throughput complex structure prediction, e.g., structure-based virtual screening. Existing deep learning methods for protein-protein docking, despite being much faster, suffer from low docking success rates. In addition, they simplify the problem to assume no conformational changes within any protein upon binding (rigid docking). This assumption precludes applications when binding-induced conformational changes play a role, such as allosteric inhibition or docking from uncertain unbound model structures. To address these limitations, we present GeoDock, a multi-track iterative transformer network to predict a docked structure from separate docking partners. Unlike deep learning models for protein structure prediction that input multiple sequence alignments (MSAs), GeoDock inputs just the sequences and structures of the docking partners, which suits the tasks when the individual structures are given. GeoDock is flexible at the protein residue level, allowing the prediction of conformational changes upon binding. For a benchmark set of rigid targets, GeoDock obtains a 41% success rate, outperforming all the other tested methods. For a more challenging benchmark set of flexible targets, GeoDock achieves a similar number of top-model successes as the traditional method ClusPro [1], but fewer than ReplicaDock2 [2]. GeoDock attains an average inference speed of under one second on a single GPU, enabling its application in large-scale structure screening. Although binding-induced conformational changes are still a challenge owing to limited training and evaluation data, our architecture sets up the foundation to capture this backbone flexibility. Code and a demonstration Jupyter notebook are available at https://github.com/Graylab/GeoDock.
Collapse
Affiliation(s)
- Lee-Shin Chu
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey A Ruffolo
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Ameya Harmalkar
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
| |
Collapse
|
15
|
Chen X, Morehead A, Liu J, Cheng J. A gated graph transformer for protein complex structure quality assessment and its performance in CASP15. Bioinformatics 2023; 39:i308-i317. [PMID: 37387159 PMCID: PMC10311325 DOI: 10.1093/bioinformatics/btad203] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. RESULTS In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. AVAILABILITY AND IMPLEMENTATION The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
Collapse
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
| | - Alex Morehead
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
| |
Collapse
|
16
|
Liu J, Guo Z, Wu T, Roy RS, Quadir F, Chen C, Cheng J. Enhancing AlphaFold-Multimer-based Protein Complex Structure Prediction with MULTICOM in CASP15. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.16.541055. [PMID: 37293073 PMCID: PMC10245707 DOI: 10.1101/2023.05.16.541055] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
AlphaFold-Multimer has emerged as the state-of-the-art tool for predicting the quaternary structure of protein complexes (assemblies or multimers) since its release in 2021. To further enhance the AlphaFold-Multimer-based complex structure prediction, we developed a new quaternary structure prediction system (MULTICOM) to improve the input fed to AlphaFold-Multimer and evaluate and refine the outputs generated by AlphaFold2-Multimer. Specifically, MULTICOM samples diverse multiple sequence alignments (MSAs) and templates for AlphaFold-Multimer to generate structural models by using both traditional sequence alignments and new Foldseek-based structure alignments, ranks structural models through multiple complementary metrics, and refines the structural models via a Foldseek structure alignment-based refinement method. The MULTICOM system with different implementations was blindly tested in the assembly structure prediction in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 as both server and human predictors. Our server (MULTICOM_qa) ranked 3rd among 26 CASP15 server predictors and our human predictor (MULTICOM_human) ranked 7th among 87 CASP15 server and human predictors. The average TM-score of the first models predicted by MULTICOM_qa for CASP15 assembly targets is ~0.76, 5.3% higher than ~0.72 of the standard AlphaFold-Multimer. The average TM-score of the best of top 5 models predicted by MULTICOM_qa is ~0.80, about 8% higher than ~0.74 of the standard AlphaFold-Multimer. Moreover, the novel Foldseek Structure Alignment-based Model Generation (FSAMG) method based on AlphaFold-Multimer outperforms the widely used sequence alignment-based model generation. The source code of MULTICOM is available at: https://github.com/BioinfoMachineLearning/MULTICOM3.
Collapse
Affiliation(s)
- Jian Liu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Tianqi Wu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Raj S. Roy
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Farhan Quadir
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Chen Chen
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|
17
|
Si Y, Yan C. Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models. Brief Bioinform 2023; 24:7033302. [PMID: 36759333 DOI: 10.1093/bib/bbad039] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 02/11/2023] Open
Abstract
The knowledge of contacting residue pairs between interacting proteins is very useful for the structural characterization of protein-protein interactions (PPIs). However, accurately identifying the tens of contacting ones from hundreds of thousands of inter-protein residue pairs is extremely challenging, and performances of the state-of-the-art inter-protein contact prediction methods are still quite limited. In this study, we developed a deep learning method for inter-protein contact prediction, which is referred to as DRN-1D2D_Inter. Specifically, we employed pretrained protein language models to generate structural information-enriched input features to residual networks formed by dimensional hybrid residual blocks to perform inter-protein contact prediction. Extensively bechmarking DRN-1D2D_Inter on multiple datasets, including both heteromeric PPIs and homomeric PPIs, we show DRN-1D2D_Inter consistently and significantly outperformed two state-of-the-art inter-protein contact prediction methods, including GLINTER and DeepHomo, although both the latter two methods leveraged the native structures of interacting proteins in the prediction, and DRN-1D2D_Inter made the prediction purely from sequences. We further show that applying the predicted contacts as constraints for protein-protein docking can significantly improve its performance for protein complex structure prediction.
Collapse
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, China
| | - Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, China
| |
Collapse
|
18
|
Roy RS, Liu J, Giri N, Guo Z, Cheng J. Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.08.531814. [PMID: 36945536 PMCID: PMC10028888 DOI: 10.1101/2023.03.08.531814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models is proven useful for estimating the quality of protein tertiary structural models, but it has been rarely applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and ranked first out of 24 predictors in estimating the global accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analayzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA. The source code of MULTICOM_qa is available at https://github.com/BioinfoMachineLearning/MULTICOM_qa .
Collapse
Affiliation(s)
- Raj S. Roy
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Nabin Giri
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|
19
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged however by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Collapse
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | - Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | | |
Collapse
|
20
|
Lin P, Yan Y, Huang SY. DeepHomo2.0: improved protein-protein contact prediction of homodimers by transformer-enhanced deep learning. Brief Bioinform 2023; 24:6849483. [PMID: 36440949 DOI: 10.1093/bib/bbac499] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 10/21/2022] [Indexed: 11/30/2022] Open
Abstract
Protein-protein interactions play an important role in many biological processes. However, although structure prediction for monomer proteins has achieved great progress with the advent of advanced deep learning algorithms like AlphaFold, the structure prediction for protein-protein complexes remains an open question. Taking advantage of the Transformer model of ESM-MSA, we have developed a deep learning-based model, named DeepHomo2.0, to predict protein-protein interactions of homodimeric complexes by leveraging the direct-coupling analysis (DCA) and Transformer features of sequences and the structure features of monomers. DeepHomo2.0 was extensively evaluated on diverse test sets and compared with eight state-of-the-art methods including protein language model-based, DCA-based and machine learning-based methods. It was shown that DeepHomo2.0 achieved a high precision of >70% with experimental monomer structures and >60% with predicted monomer structures for the top 10 predicted contacts on the test sets and outperformed the other eight methods. Moreover, even the version without using structure information, named DeepHomoSeq, still achieved a good precision of >55% for the top 10 predicted contacts. Integrating the predicted contacts into protein docking significantly improved the structure prediction of realistic Critical Assessment of Protein Structure Prediction homodimeric complexes. DeepHomo2.0 and DeepHomoSeq are available at http://huanglab.phys.hust.edu.cn/DeepHomo2/.
Collapse
Affiliation(s)
- Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| |
Collapse
|
21
|
Yue ZX, Yan TC, Xu HQ, Liu YH, Hong YF, Chen GX, Xie T, Tao L. A systematic review on the state-of-the-art strategies for protein representation. Comput Biol Med 2023; 152:106440. [PMID: 36543002 DOI: 10.1016/j.compbiomed.2022.106440] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The study of drug-target protein interaction is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, due to their automated nature, predictive power, and expected efficiency. Protein representation is a key step in the study of drug-target protein interaction by machine learning, which plays a fundamental role in the ultimate accomplishment of accurate research. With the progress of machine learning, protein representation methods have gradually attracted attention and have consequently developed rapidly. Therefore, in this review, we systematically classify current protein representation methods, comprehensively review them, and discuss the latest advances of interest. According to the information extraction methods and information sources, these representation methods are generally divided into structure and sequence-based representation methods. Each primary class can be further divided into specific subcategories. As for the particular representation methods involve both traditional and the latest approaches. This review contains a comprehensive assessment of the various methods which researchers can use as a reference for their specific protein-related research requirements, including drug research.
Collapse
Affiliation(s)
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
| | - Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
| | - Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
| |
Collapse
|
22
|
Guo Z, Liu J, Skolnick J, Cheng J. Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nat Commun 2022; 13:6963. [PMID: 36379943 PMCID: PMC9666547 DOI: 10.1038/s41467-022-34600-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2022] [Accepted: 10/24/2022] [Indexed: 11/16/2022] Open
Abstract
Residue-residue distance information is useful for predicting tertiary structures of protein monomers or quaternary structures of protein complexes. Many deep learning methods have been developed to predict intra-chain residue-residue distances of monomers accurately, but few methods can accurately predict inter-chain residue-residue distances of complexes. We develop a deep learning method CDPred (i.e., Complex Distance Prediction) based on the 2D attention-powered residual network to address the gap. Tested on two homodimer datasets, CDPred achieves the precision of 60.94% and 42.93% for top L/5 inter-chain contact predictions (L: length of the monomer in homodimer), respectively, substantially higher than DeepHomo's 37.40% and 23.08% and GLINTER's 48.09% and 36.74%. Tested on the two heterodimer datasets, the top Ls/5 inter-chain contact prediction precision (Ls: length of the shorter monomer in heterodimer) of CDPred is 47.59% and 22.87% respectively, surpassing GLINTER's 23.24% and 13.49%. Moreover, the prediction of CDPred is complementary with that of AlphaFold2-multimer.
Collapse
Affiliation(s)
- Zhiye Guo
- grid.134936.a0000 0001 2162 3504Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
| | - Jian Liu
- grid.134936.a0000 0001 2162 3504Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
| | - Jeffrey Skolnick
- grid.213917.f0000 0001 2097 4943School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332-200 USA
| | - Jianlin Cheng
- grid.134936.a0000 0001 2162 3504Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
| |
Collapse
|
23
|
The Possible Mechanism of Amyloid Transformation Based on the Geometrical Parameters of Early-Stage Intermediate in Silico Model for Protein Folding. Int J Mol Sci 2022; 23:ijms23169502. [PMID: 36012765 PMCID: PMC9409474 DOI: 10.3390/ijms23169502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2022] [Revised: 08/09/2022] [Accepted: 08/19/2022] [Indexed: 12/03/2022] Open
Abstract
The specificity of the available experimentally determined structures of amyloid forms is expressed primarily by the two- and not three-dimensional forms of a single polypeptide chain. Such a flat structure is possible due to the β structure, which occurs predominantly. The stabilization of the fibril in this structure is achieved due to the presence of the numerous hydrogen bonds between the adjacent chains. Together with the different forms of twists created by the single R- or L-handed α-helices, they form the hydrogen bond network. The specificity of the arrangement of these hydrogen bonds lies in their joint orientation in a system perpendicular to the plane formed by the chain and parallel to the fibril axis. The present work proposes the possible mechanism for obtaining such a structure based on the geometric characterization of the polypeptide chain constituting the basis of our early intermediate model for protein folding introduced formerly. This model, being the conformational subspace of Ramachandran plot (the ellipse path), was developed on the basis of the backbone conformation, with the side-chain interactions excluded. Our proposal is also based on the results from molecular dynamics available in the literature leading to the unfolding of α-helical sections, resulting in the β-structural forms. Both techniques used provide a similar suggestion in a search for a mechanism of conformational changes leading to a formation of the amyloid form. The potential mechanism of amyloid transformation is presented here using the fragment of the transthyretin as well as amyloid Aβ.
Collapse
|
24
|
Fu Y, Wang J, Wang Y, Sun H. Investigating the Effect of Tyrosine Kinase Inhibitors on the Interaction between Human Serum Albumin by Atomic Force Microscopy. Biomolecules 2022; 12:biom12060819. [PMID: 35740944 PMCID: PMC9221072 DOI: 10.3390/biom12060819] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Revised: 06/02/2022] [Accepted: 06/09/2022] [Indexed: 02/01/2023] Open
Abstract
It is important for elucidating the regulation mechanism of life activities, as well as for the prevention, diagnosis, and drug design of diseases, to study protein–protein interactions (PPIs). Here, we investigated the interactions of human serum albumin (HSA) in the presence of tyrosine kinase inhibitors (TKIs: imatinib, nilotinib, dasatinib, bosutinib, and ponatinib) using atomic force microscopy (AFM). The distribution of rupture events including the specific interaction force Fi and the non-specific interaction force F0 between HSA pairs was analyzed. Based on the force measurements, Fi and F0 between HSA pairs in the control experiment were calculated to be 47 ± 1.5 and 116.1 ± 1.3 pN. However, Fi was significantly decreased in TKIs, while F0 was slightly decreased. By measuring the rupture forces at various loading rates and according to the Bell equation, the kinetic parameters of the complexes were investigated in greater detail. Molecular docking was used as a complementary means by which to explore the force of this effect. The whole measurements indicated that TKIs influenced PPIs in a variety of ways, among which hydrogen bonding and hydrophobic interactions were the most important. In conclusion, these outcomes give us a better insight into the mechanisms of PPIs when there are exogenous compounds present as well as in different liquid environments.
Collapse
Affiliation(s)
- Yuna Fu
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China; (Y.F.); (H.S.)
| | - Jianhua Wang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China; (Y.F.); (H.S.)
- Correspondence:
| | - Yan Wang
- Key Laboratory of Drug Design, College of Chemistry and Chemical Engineering, Huangshan University, Huangshan 245041, China;
| | - Heng Sun
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China; (Y.F.); (H.S.)
| |
Collapse
|
25
|
Zhang W, Meng Q, Wang J, Guo F. HDIContact: a novel predictor of residue-residue contacts on hetero-dimer interfaces via sequential information and transfer learning strategy. Brief Bioinform 2022; 23:6599074. [PMID: 35653713 DOI: 10.1093/bib/bbac169] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 03/07/2022] [Accepted: 04/16/2022] [Indexed: 11/12/2022] Open
Abstract
Proteins maintain the functional order of cell in life by interacting with other proteins. Determination of protein complex structural information gives biological insights for the research of diseases and drugs. Recently, a breakthrough has been made in protein monomer structure prediction. However, due to the limited number of the known protein structure and homologous sequences of complexes, the prediction of residue-residue contacts on hetero-dimer interfaces is still a challenge. In this study, we have developed a deep learning framework for inferring inter-protein residue contacts from sequential information, called HDIContact. We utilized transfer learning strategy to produce Multiple Sequence Alignment (MSA) two-dimensional (2D) embedding based on patterns of concatenated MSA, which could reduce the influence of noise on MSA caused by mismatched sequences or less homology. For MSA 2D embedding, HDIContact took advantage of Bi-directional Long Short-Term Memory (BiLSTM) with two-channel to capture 2D context of residue pairs. Our comprehensive assessment on the Escherichia coli (E. coli) test dataset showed that HDIContact outperformed other state-of-the-art methods, with top precision of 65.96%, the Area Under the Receiver Operating Characteristic curve (AUROC) of 83.08% and the Area Under the Precision Recall curve (AUPR) of 25.02%. In addition, we analyzed the potential of HDIContact for human-virus protein-protein complexes, by achieving top five precision of 80% on O75475-P04584 related to Human Immunodeficiency Virus. All experiments indicated that our method was a valuable technical tool for predicting inter-protein residue contacts, which would be helpful for understanding protein-protein interaction mechanisms.
Collapse
Affiliation(s)
- Wei Zhang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Qiaozhen Meng
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
Collapse
|
26
|
BIPSPI+: Mining Type-Specific Datasets of Protein Complexes to Improve Protein Binding Site Prediction. J Mol Biol 2022; 434:167556. [DOI: 10.1016/j.jmb.2022.167556] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2021] [Revised: 03/12/2022] [Accepted: 03/16/2022] [Indexed: 11/20/2022]
|
27
|
Guo L, He J, Lin P, Huang SY, Wang J. TRScore: a three-dimensional RepVGG-based scoring method for ranking protein docking models. Bioinformatics 2022; 38:2444-2451. [PMID: 35199137 DOI: 10.1093/bioinformatics/btac120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2021] [Revised: 01/19/2022] [Accepted: 02/21/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Protein-protein interactions (PPI) play important roles in cellular activities. Due to the technical difficulty and high cost of experimental methods, there are considerable interests towards the development of computational approaches, such as protein docking, to decipher PPI patterns. One of the important and difficult aspects in protein docking is recognizing near-native conformations from a set of decoys, but unfortunately traditional scoring functions still suffer from limited accuracy. Therefore, new scoring methods are pressingly needed in methodological and/or practical implications. RESULTS We present a new deep learning-based scoring method for ranking protein-protein docking models based on a three-dimensional (3D) RepVGG network, named TRScore. To recognize near-native conformations from a set of decoys, TRScore voxelizes the protein-protein interface into a 3D grid labeled by the number of atoms in different physicochemical classes. Benefiting from the deep convolutional RepVGG architecture, TRScore can effectively capture the subtle differences between energetically favorable near-native models and unfavorable non-native decoys without needing extra information. TRScore was extensively evaluated on diverse test sets including protein-protein docking benchmark 5.0 update set, DockGround decoy set, as well as realistic CAPRI decoy set, and overall obtained a significant improvement over existing methods in cross validation and independent evaluations. AVAILABILITY Codes available at: https://github.com/BioinformaticsCSU/TRScore.
Collapse
Affiliation(s)
- Linyuan Guo
- School of Computer Science, Central South University, Changsha, Hunan 410083, China
| | - Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Jianxin Wang
- School of Computer Science, Central South University, Changsha, Hunan 410083, China
| |
Collapse
|
28
|
Scognamiglio PL, Vicidomini C, Fontanella F, De Stefano C, Palumbo R, Roviello GN. Protein Binding of Benzofuran Derivatives: A CD Spectroscopic and In Silico Comparative Study of the Effects of 4-Nitrophenyl Functionalized Benzofurans and Benzodifurans on BSA Protein Structure. Biomolecules 2022; 12:biom12020262. [PMID: 35204762 PMCID: PMC8961527 DOI: 10.3390/biom12020262] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2021] [Revised: 01/31/2022] [Accepted: 02/01/2022] [Indexed: 02/06/2023] Open
Abstract
Benzofuran derivatives are synthetic compounds that are finding an increasing interest in the scientific community not only as building blocks for the realization of new materials, but also as potential drugs thanks to their ability to interact with nucleic acids, interfere with the amyloid peptide aggregation and cancer cell cycle. However, their ability to interact with proteins is a theme still in need of investigation for the therapeutic importance that benzofurans could have in the modulation of protein-driven processes and for the possibility of making use of serum albumins as benzofurans delivery systems. To this scope, we investigated the protein binding ability of two 4-nitrophenyl-functionalized benzofurans previously synthesized in our laboratory and herein indicated as BF1 and BDF1, which differed for the number of furan rings (a single moiety in BF1, two in BDF1), using bovine serum albumin (BSA) as a model protein. By circular dichroism (CD) spectroscopy we demonstrated the ability of the two heteroaromatic compounds to alter the secondary structure of the serum albumin leading to different consequences in terms of BSA thermal stability with respect to the unbound protein (ΔTm > 3 °C for BF1, −0.8 °C for BDF1 with respect to unbound BSA, in PBS buffer, pH 7.5) as revealed in our CD melting studies. Moreover, a molecular docking study allowed us to compare the possible ligand binding modes of the mono and difuranic derivatives showing that while BF1 is preferentially housed in the interior of protein structure, BDF1 is predicted to bind the albumin surface with a lower affinity than BF1. Interestingly, the different affinity for the protein target predicted computationally was confirmed also experimentally by fluorescence spectroscopy (kD = 142.4 ± 64.6 nM for BDF1 vs. 28.4 ± 10.1 nM for BF1). Overall, the above findings suggest the ability of benzofurans to bind serum albumins that could act as their carriers in drug delivery applications.
Collapse
Affiliation(s)
| | - Caterina Vicidomini
- Istituto di Biostrutture e Bioimmagini IBB-CNR, via Tommaso De Amicis 95, I-80145 Naples, Italy; (C.V.); (R.P.)
| | - Francesco Fontanella
- Department of Electrical and Information Engineering (DIEI), University of Cassino and Southern Lazio, 03043 Cassino (FR), Italy; (F.F.); (C.D.S.)
| | - Claudio De Stefano
- Department of Electrical and Information Engineering (DIEI), University of Cassino and Southern Lazio, 03043 Cassino (FR), Italy; (F.F.); (C.D.S.)
| | - Rosanna Palumbo
- Istituto di Biostrutture e Bioimmagini IBB-CNR, via Tommaso De Amicis 95, I-80145 Naples, Italy; (C.V.); (R.P.)
| | - Giovanni N. Roviello
- Istituto di Biostrutture e Bioimmagini IBB-CNR, via Tommaso De Amicis 95, I-80145 Naples, Italy; (C.V.); (R.P.)
- Correspondence: ; Tel.: +39-3491928417
| |
Collapse
|
29
|
Elhabashy H, Merino F, Alva V, Kohlbacher O, Lupas AN. Exploring protein-protein interactions at the proteome level. Structure 2022; 30:462-475. [DOI: 10.1016/j.str.2022.02.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2021] [Revised: 10/26/2021] [Accepted: 02/02/2022] [Indexed: 02/08/2023]
|
30
|
Dey S, Prilusky J, Levy ED. QSalignWeb: A Server to Predict and Analyze Protein Quaternary Structure. Front Mol Biosci 2022; 8:787510. [PMID: 35071324 PMCID: PMC8769216 DOI: 10.3389/fmolb.2021.787510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2021] [Accepted: 12/02/2021] [Indexed: 11/16/2022] Open
Abstract
The identification of physiologically relevant quaternary structures (QSs) in crystal lattices is challenging. To predict the physiological relevance of a particular QS, QSalign searches for homologous structures in which subunits interact in the same geometry. This approach proved accurate but was limited to structures already present in the Protein Data Bank (PDB). Here, we introduce a webserver (www.QSalign.org) allowing users to submit homo-oligomeric structures of their choice to the QSalign pipeline. Given a user-uploaded structure, the sequence is extracted and used to search homologs based on sequence similarity and PFAM domain architecture. If structural conservation is detected between a homolog and the user-uploaded QS, physiological relevance is inferred. The web server also generates alternative QSs with PISA and processes them the same way as the query submitted to widen the predictions. The result page also shows representative QSs in the protein family of the query, which is informative if no QS conservation was detected or if the protein appears monomeric. These representative QSs can also serve as a starting point for homology modeling.
Collapse
Affiliation(s)
- Sucharita Dey
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Jaime Prilusky
- Department of Life Sciences and Core Facilities, Weizmann Institute of Science, Rehovot, Israel
| | - Emmanuel D. Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| |
Collapse
|
31
|
Roy RS, Quadir F, Soltanikazemi E, Cheng J. OUP accepted manuscript. Bioinformatics 2022; 38:1904-1910. [PMID: 35134816 PMCID: PMC8963319 DOI: 10.1093/bioinformatics/btac063] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2021] [Revised: 01/17/2022] [Accepted: 01/31/2022] [Indexed: 11/23/2022] Open
Abstract
Motivation Deep learning has revolutionized protein tertiary structure prediction recently. The cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low due to lack of advanced deep learning methods in the field. Because interchain residue–residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue–residue contacts in homodimers from residue–residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue–residue contacts of monomers extracted from true/predicted tertiary structures or predicted by deep learning, and other sequence and structural features. Results Tested on three homodimer test datasets (Homo_std dataset, DeepHomo dataset and CASP-CAPRI dataset), the precision of DRCon for top L/5 interchain contact predictions (L: length of monomer in a homodimer) is 43.46%, 47.10% and 33.50% respectively at 6 Å contact threshold, which is substantially better than DeepHomo and DNCON2_inter and similar to Glinter. Moreover, our experiments demonstrate that using predicted tertiary structure or intrachain contacts of monomers in the unbound state as input, DRCon still performs well, even though its accuracy is lower than using true tertiary structures in the bound state are used as input. Finally, our case study shows that good interchain contact predictions can be used to build high-accuracy quaternary structure models of homodimers. Availability and implementation The source code of DRCon is available at https://github.com/jianlin-cheng/DRCon. The datasets are available at https://zenodo.org/record/5998532#.YgF70vXMKsB. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Raj S Roy
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Farhan Quadir
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Elham Soltanikazemi
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | | |
Collapse
|
32
|
Xie Z, Xu J. Deep graph learning of inter-protein contacts. Bioinformatics 2021; 38:947-953. [PMID: 34755837 PMCID: PMC8796373 DOI: 10.1093/bioinformatics/btab761] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2021] [Revised: 10/06/2021] [Accepted: 11/04/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Inter-protein (interfacial) contact prediction is very useful for in silico structural characterization of protein-protein interactions. Although deep learning has been applied to this problem, its accuracy is not as good as intra-protein contact prediction. RESULTS We propose a new deep learning method GLINTER (Graph Learning of INTER-protein contacts) for interfacial contact prediction of dimers, leveraging a rotational invariant representation of protein tertiary structures and a pretrained language model of multiple sequence alignments. Tested on the 13th and 14th CASP-CAPRI datasets, the average top L/10 precision achieved by GLINTER is 54% on the homodimers and 52% on all the dimers, much higher than 30% obtained by the latest deep learning method DeepHomo on the homodimers and 15% obtained by BIPSPI on all the dimers. Our experiments show that GLINTER-predicted contacts help improve selection of docking decoys. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/zw2x/glinter. The datasets are available at https://github.com/zw2x/glinter/data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ziwei Xie
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Jinbo Xu
- To whom correspondence should be addressed.
| |
Collapse
|
33
|
Soltanikazemi E, Quadir F, Roy RS, Guo Z, Cheng J. Distance-based reconstruction of protein quaternary structures from inter-chain contacts. Proteins 2021; 90:720-731. [PMID: 34716620 PMCID: PMC8816881 DOI: 10.1002/prot.26269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Revised: 09/25/2021] [Accepted: 10/12/2021] [Indexed: 12/21/2022]
Abstract
Predicting the quaternary structure of protein complex is an important problem. Inter‐chain residue‐residue contact prediction can provide useful information to guide the ab initio reconstruction of quaternary structures. However, few methods have been developed to build quaternary structures from predicted inter‐chain contacts. Here, we develop the first method based on gradient descent optimization (GD) to build quaternary structures of protein dimers utilizing inter‐chain contacts as distance restraints. We evaluate GD on several datasets of homodimers and heterodimers using true/predicted contacts and monomer structures as input. GD consistently performs better than both simulated annealing and Markov Chain Monte Carlo simulation. Starting from an arbitrarily quaternary structure randomly initialized from the tertiary structures of protein chains and using true inter‐chain contacts as input, GD can reconstruct high‐quality structural models for homodimers and heterodimers with average TM‐score ranging from 0.92 to 0.99 and average interface root mean square distance from 0.72 Å to 1.64 Å. On a dataset of 115 homodimers, using predicted inter‐chain contacts as restraints, the average TM‐score of the structural models built by GD is 0.76. For 46% of the homodimers, high‐quality structural models with TM‐score ≥ 0.9 are reconstructed from predicted contacts. There is a strong correlation between the quality of the reconstructed models and the precision and recall of predicted contacts. Only a moderate precision or recall of inter‐chain contact prediction is needed to build good structural models for most homodimers. Moreover, GD improves the quality of quaternary structures predicted by AlphaFold2 on a Critical Assessment of Techniques for Protein Structure Prediction–Critical Assessments of Predictions of Interactions dataset.
Collapse
Affiliation(s)
- Elham Soltanikazemi
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Farhan Quadir
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Raj S Roy
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| |
Collapse
|
34
|
Laine E, Eismann S, Elofsson A, Grudinin S. Protein sequence-to-structure learning: Is this the end(-to-end revolution)? Proteins 2021; 89:1770-1786. [PMID: 34519095 DOI: 10.1002/prot.26235] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Revised: 08/16/2021] [Accepted: 09/03/2021] [Indexed: 01/08/2023]
Abstract
The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions. Novel emerging approaches include (i) geometric learning, that is, learning on representations such as graphs, three-dimensional (3D) Voronoi tessellations, and point clouds; (ii) pretrained protein language models leveraging attention; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; and (vi) finally truly end-to-end architectures, that is, differentiable models starting from a sequence and returning a 3D structure. Here, we provide an overview and our opinion of the novel deep learning approaches developed in the last 2 years and widely used in CASP14.
Collapse
Affiliation(s)
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
| | - Stephan Eismann
- Department of Computer Science and Applied Physics, Stanford University, Stanford, California, USA
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
| | - Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
| |
Collapse
|
35
|
Quadir F, Roy RS, Soltanikazemi E, Cheng J. DeepComplex: A Web Server of Predicting Protein Complex Structures by Deep Learning Inter-chain Contact Prediction and Distance-Based Modelling. Front Mol Biosci 2021; 8:716973. [PMID: 34497831 PMCID: PMC8419425 DOI: 10.3389/fmolb.2021.716973] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2021] [Accepted: 08/12/2021] [Indexed: 11/13/2022] Open
Abstract
Proteins interact to form complexes. Predicting the quaternary structure of protein complexes is useful for protein function analysis, protein engineering, and drug design. However, few user-friendly tools leveraging the latest deep learning technology for inter-chain contact prediction and the distance-based modelling to predict protein quaternary structures are available. To address this gap, we develop DeepComplex, a web server for predicting structures of dimeric protein complexes. It uses deep learning to predict inter-chain contacts in a homodimer or heterodimer. The predicted contacts are then used to construct a quaternary structure of the dimer by the distance-based modelling, which can be interactively viewed and analysed. The web server is freely accessible and requires no registration. It can be easily used by providing a job name and an email address along with the tertiary structure for one chain of a homodimer or two chains of a heterodimer. The output webpage provides the multiple sequence alignment, predicted inter-chain residue-residue contact map, and predicted quaternary structure of the dimer. DeepComplex web server is freely available at http://tulip.rnet.missouri.edu/deepcomplex/web_index.html.
Collapse
Affiliation(s)
| | | | | | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
| |
Collapse
|
36
|
Hong Z, Liu J, Chen Y. An interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction. Biophys Chem 2021; 278:106666. [PMID: 34418678 DOI: 10.1016/j.bpc.2021.106666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2021] [Revised: 08/09/2021] [Accepted: 08/09/2021] [Indexed: 12/29/2022]
Abstract
Protein-protein interaction plays an important role in life activities. A more fine-grained analysis, such as residues and atoms level, will better benefit us to understand the mechanism for inter-protein interaction and drug design. The development of efficient computational methods to reduce trials and errors, as well as assisting experimental researchers to determine the complex structure are some of the ongoing studies in the field. The research of trimer protein interface, especially homotrimer, has been rarely studied. In this paper, we proposed an interpretable machine learning method for homo-trimeric protein interface residue pairs prediction. The structure, sequence, and physicochemical information are intergraded as feature input fed to model for training. Graph model is utilized to present spatial information for intra-protein. Matrix factorization captures the different features' interactions. Kernel function is designed to auto-acquire the adjacent information of our target residue pairs. The accuracy rate achieves 54.5% in an independent test set. Sequence and structure alignment exhibit the ability of model self-study. Our model indicates the biological significance between sequence and structure, and could be auxiliary for reducing trials and errors in the fields of protein complex determination and protein-protein docking, etc. SIGNIFICANCE: Protein complex structures are significant for understanding protein function and promising functional protein design. With data increasing, some computational tools have been developed for protein complex residue contact prediction, which is one of the most significant steps for complex structure prediction. But for homo-trimeric protein, the sequence-based deep learning predictors are infeasible for homologous sequences, and the algorithm black box prevents us from understanding of each step operation. In this way, we propose an interpreting machine learning method for homo-trimeric protein interface residue-residue interaction prediction, and the predictor shows a good performance. Our work provides a computational auxiliary way for determining the homo-trimeric proteins interface residue pairs which will be further verified by wet experiments, and and gives a hand for the downstream works, such as protein-protein docking, protein complex structure prediction and drug design.
Collapse
Affiliation(s)
- Zhonghua Hong
- Jiaxing Hospital of Traditional Chinese Medicine, Jiaxing University, Jiaxing 314001, PR China.
| | - Jiale Liu
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, PR China
| | - Yinggao Chen
- Shantou Central Hospital, Shantou 515041, PR China.
| |
Collapse
|
37
|
Ozden B, Kryshtafovych A, Karaca E. Assessment of the CASP14 assembly predictions. Proteins 2021; 89:1787-1799. [PMID: 34337786 DOI: 10.1002/prot.26199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2021] [Revised: 07/08/2021] [Accepted: 07/22/2021] [Indexed: 12/11/2022]
Abstract
In CASP14, 39 research groups submitted more than 2500 3D models on 22 protein complexes. In general, the community performed well in predicting the fold of the assemblies (for 80% of the targets), although it faced significant challenges in reproducing the native contacts. This is especially the case for the complexes without whole-assembly templates. The leading predictor, BAKER-experimental, used a methodology combining classical techniques (template-based modeling, protein docking) with deep learning-based contact predictions and a fold-and-dock approach. The Venclovas team achieved the runner-up position with template-based modeling and docking. By analyzing the target interfaces, we showed that the complexes with depleted charged contacts or dominating hydrophobic interactions were the most challenging ones to predict. We also demonstrated that if AlphaFold2 predictions were at hand, the interface prediction challenge could be alleviated for most of the targets. All in all, it is evident that new approaches are needed for the accurate prediction of assemblies, which undoubtedly will expand on the significant improvements in the tertiary structure prediction field.
Collapse
Affiliation(s)
- Burcu Ozden
- Izmir Biomedicine and Genome Center, Izmir, Turkey.,Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
| | - Andriy Kryshtafovych
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
| | - Ezgi Karaca
- Izmir Biomedicine and Genome Center, Izmir, Turkey.,Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
| |
Collapse
|
38
|
DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning. Sci Rep 2021; 11:12295. [PMID: 34112907 PMCID: PMC8192766 DOI: 10.1038/s41598-021-91827-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 05/28/2021] [Indexed: 12/13/2022] Open
Abstract
Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. In recognizing that multiple sequence alignments of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using multiple sequence alignment (MSA) and other co-evolutionary features of a single monomer followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions of 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
Collapse
|