1
Zhao A, Xian L, Franco Ortega S, Yu G, Macho AP. A bacterial effector manipulates plant metabolism, cell death, and immune responses via independent mechanisms. THE NEW PHYTOLOGIST 2024; 243:1137-1153. [PMID: 38877712 DOI: 10.1111/nph.19899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 05/19/2024] [Indexed: 06/16/2024]
Abstract
Bacterial pathogens inject effector proteins inside plant cells to manipulate cellular functions and achieve a successful infection. The soil-borne pathogen Ralstonia solanacearum (Smith), the causal agent of bacterial wilt disease, secretes >70 different effectors inside plant cells, although only a handful of them have been thoroughly characterized. One of these effectors, named RipI, is required for full R. solanacearum pathogenicity. RipI associates with plant glutamate decarboxylases (GADs) to promote the accumulation of gamma-aminobutyric acid (GABA), which serves as a bacterial nutrient. In this work, we found that RipI can also suppress plant immune responses to bacterial elicitors, an activity that seems to be unrelated to the ability of RipI to induce GABA accumulation and plant cell death. A detailed characterization of the RipI features that contribute to its virulence activities identified two residues in the C-terminal domain that mediate RipI interaction with plant GADs and the subsequent promotion of GABA accumulation. These residues are also required for the appropriate homeostasis of RipI in plant cells and for the induction of cell death, although they are partially dispensable for the suppression of plant immune responses. Altogether, we decipher and uncouple the virulence activities of an important bacterial effector at the biochemical level.
Affiliation(s)
- Achen Zhao
- Shanghai Center for Plant Stress Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 201602, China
- University of the Chinese Academy of Sciences, Beijing, 100049, China
- Liu Xian
- Shanghai Center for Plant Stress Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 201602, China
- University of the Chinese Academy of Sciences, Beijing, 100049, China
- Sara Franco Ortega
- Department of Biology, Centre for Novel Agricultural Products (CNAP), University of York, York, YO10 5DD, UK
- Gang Yu
- Shanghai Center for Plant Stress Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 201602, China
- Alberto P Macho
- Shanghai Center for Plant Stress Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 201602, China
2
Hasan ME, Samir A, Khalil MM, Shafaa MW. Bioinformatics approach for prediction and analysis of the Non-Structural Protein 4B (NSP4B) of the Zika virus. J Genet Eng Biotechnol 2024; 22:100336. [PMID: 38494248 PMCID: PMC10860876 DOI: 10.1016/j.jgeb.2023.100336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
BACKGROUND The non-structural protein 4B (NSP4B) of Zika virus (251 amino acids; ZIKV/Human/POLG_ZIKVF, accession number A0A024B7W1) induces the production of endoplasmic reticulum (ER)-derived membrane vesicles, which are the sites of viral replication. Ab initio and comparative modeling are crucial tools for understanding the physical basis of how proteins fold in nature and for addressing the challenge of protein structure prediction. RESULTS The systematic in silico technique ThreaDom predicted only one domain (residues 4-190) of NSP4B. I-TASSER and AlphaFold were ranked as the best servers for full-length 3D structure prediction of NSP4B, and the predicted models were evaluated quantitatively using benchmarked metrics, including C-score (-3.43), TM-score (0.77949), RMSD (2.73), and Z-score (1.561). Functional and protein-binding motifs were identified using motif databases and secondary-structure and surface-accessibility predictions, combined with prediction of post-translational modification (PTM) sites. Two highly conserved protein-binding motifs (Flavi NS4B and Bacillus papR protein), together with three PTM sites (casein kinase II, myristoylation, and N-glycosylation), were predicted using the Motif Scan and ScanProsite servers. These patterns and PTMs were associated with NSP4B's role in triggering the formation of the viral replication complex and its participation in localizing NS3 and NS5 to the membrane. Only one hit from the Structural Classification of Proteins (SCOP) matched the protein sequence, at positions 10 to 397, and it was assigned to the six-hairpin glycosidases superfamily according to CATH (Class, Architecture, Topology, and Homology). Integrating this NSP4B information with the templates' SCOP and CATH annotations makes it easier to attribute structure-function and evolutionary relationships to both previously known and newly discovered protein structures.
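The TM-score and RMSD values quoted above summarize how closely a model matches a reference structure. The following is a minimal sketch. It assumes the two structures are already superimposed and that the distances between aligned residue pairs are given. The real TM-score program also searches for the superposition that maximizes the score, and this sketch omits that step. The helper names `tm_d0`, `tm_score`, and `rmsd` are illustrative and are not part of any tool named in the abstract.

```python
import math


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 of the TM-score, floored at 0.5 Å."""
    if target_length <= 15:
        # 1.24 * (L - 15)^(1/3) - 1.8 is undefined or negative here; use the floor.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances: list[float], target_length: int) -> float:
    """TM-score for a FIXED superposition: sum over aligned pairs, normalized by target length."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances) / target_length


def rmsd(aligned_distances: list[float]) -> float:
    """Root-mean-square deviation over the aligned pairs (same fixed superposition)."""
    return math.sqrt(sum(d * d for d in aligned_distances) / len(aligned_distances))


# Hypothetical example: 100-residue target, every residue aligned and displaced by 1 Å.
example_tm = tm_score([1.0] * 100, 100)
example_rmsd = rmsd([1.0] * 100)
```

TM-score weights each residue by 1/(1+(d/d0)^2). As a result, a few badly placed residues lower the TM-score far less than they inflate the RMSD, which is one reason both metrics are often reported together.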
Affiliation(s)
- Mohamed E Hasan
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, University of Sadat City, Sadat City 32897, Egypt
- Aya Samir
- Physics Department, Medical Biophysics Division, Faculty of Science, Helwan University, Cairo, Egypt
- Magdy M Khalil
- Physics Department, Medical Biophysics Division, Faculty of Science, Helwan University, Cairo, Egypt
- School of Biotechnology, Badr University in Cairo, Egypt
- Medhat W Shafaa
- Physics Department, Medical Biophysics Division, Faculty of Science, Helwan University, Cairo, Egypt
3
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024]
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially accelerated advances in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 raised structure prediction performance to new heights, producing models that were often competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding of the field and guide future research, this review describes various methodologies, assessments, and databases in protein structure prediction. These include traditional methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes. Our aim is to help researchers understand the limitations and contexts of protein structure prediction methods and select among them effectively in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yihan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yifeng Shen
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
- Yang Cao
- College of Life Sciences, Sichuan University, Chengdu 610065, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Wei Cui
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
4
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network-based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; and (iv) an optimized I-TASSER-based folding simulation system for full-length model creation, guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER had an average TM-score 19% higher than that of the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and an Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses of complexes highlighted the critical role played by MSA generation, ranking, and pairing in protein complex structure prediction. We also discuss remaining room for improvement in viral protein modeling and complex model ranking.
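The abstract credits traditional Monte Carlo folding simulations, coupled with deep learning restraints, for the accuracy gain. The sketch below shows only the generic Metropolis acceptance rule that such simulations build on. It applies that rule to a hypothetical one-dimensional toy energy. It is not D-I-TASSER's move set or force field, and the function names are illustrative.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def monte_carlo(
    energy: Callable[[S], float],
    propose: Callable[[S, random.Random], S],
    start: S,
    n_steps: int,
    temperature: float,
    seed: int = 0,
) -> tuple[S, float]:
    """Run a fixed-temperature Metropolis walk and return the lowest-energy state visited."""
    rng = random.Random(seed)
    state, e = start, energy(start)
    best, best_e = state, e
    for _ in range(n_steps):
        candidate = propose(state, rng)
        cand_e = energy(candidate)
        if metropolis_accept(cand_e - e, temperature, rng):
            state, e = candidate, cand_e
            if e < best_e:
                best, best_e = state, e
    return best, best_e


# Hypothetical toy problem: minimize (x - 3)^2 with small random steps.
best_x, best_energy = monte_carlo(
    energy=lambda x: (x - 3.0) ** 2,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    start=0.0,
    n_steps=2000,
    temperature=0.1,
)
```

Production pipelines such as I-TASSER run replica-exchange Monte Carlo with many replicas and much richer conformational moves. In this sketch, the temperature only controls how often uphill moves are tolerated.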
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peter L Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
5
Yu ZZ, Peng CX, Liu J, Zhang B, Zhou XG, Zhang GJ. DomBpred: Protein Domain Boundary Prediction Based on Domain-Residue Clustering Using Inter-Residue Distance. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:912-922. [PMID: 35594218 DOI: 10.1109/tcbb.2022.3175905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Domain boundary prediction is one of the most important problems in the study of protein structure and function, especially for large proteins. At present, most domain boundary prediction methods have low accuracy and a limited ability to handle multi-domain proteins. In this study, we develop a sequence-based protein domain boundary prediction method, named DomBpred. In DomBpred, the input sequence is first classified as either a single-domain or a multi-domain protein using an effective sequence metric based on a constructed single-domain sequence library. For multi-domain proteins, a domain-residue clustering algorithm inspired by the Ising model is proposed to cluster spatially close residues according to inter-residue distance. The unclassified residues and the residues at the edges of clusters are then tuned by the secondary structure to form potential cut points. Finally, a domain boundary scoring function is proposed to recursively evaluate the potential cut points and generate the domain boundaries. DomBpred is tested on the large-scale FUpred test set comprising 2549 proteins. Experimental results show that DomBpred performs better than state-of-the-art methods in classifying whether protein sequences are composed of single or multiple domains, with a Matthews correlation coefficient of 0.882. Moreover, on 849 multi-domain proteins, the domain boundary distance and normalised domain overlap scores of DomBpred are 0.523 and 0.824, respectively, which are 5.0% and 4.2% higher than those of the best comparison method. Comparison with other methods on the given test set shows that DomBpred outperforms most state-of-the-art sequence-based methods and even achieves better results than the top-level template-based method. The executable program is freely available at https://github.com/iobio-zjut/DomBpred and the online server at http://zhanglab-bioinf.com/DomBpred/.
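DomBpred, and Res-Dom listed below, both report single-domain versus multi-domain classification as a Matthews correlation coefficient (MCC). The following is a minimal sketch of that standard metric. It treats "multi-domain" as the positive class, which is an assumption on our part. It also returns 0 when the coefficient is undefined, a common but not universal convention.

```python
import math
from typing import Sequence


def matthews_cc(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Matthews correlation coefficient for a binary labelling (here: True = multi-domain)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero and MCC is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical labels for four proteins: two multi-domain, one of which is missed.
truth = [True, True, False, False]
predicted = [True, False, False, False]
example_mcc = matthews_cc(truth, predicted)  # (1*2 - 0*1) / sqrt(1*2*2*3) ≈ 0.577
```

Unlike plain accuracy, MCC stays near zero for a predictor that assigns every protein to the majority class. This matters when single-domain proteins dominate a test set.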
6
Zhu K, Su H, Peng Z, Yang J. A unified approach to protein domain parsing with inter-residue distance matrix. Bioinformatics 2023; 39:7025502. [PMID: 36734597 PMCID: PMC9919455 DOI: 10.1093/bioinformatics/btad070] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/15/2022] [Revised: 01/02/2023] [Accepted: 02/01/2023] [Indexed: 02/04/2023]
Abstract
MOTIVATION Cutting multi-domain proteins into individual domains is fundamental to precise domain-based structural and functional studies. In the past, sequence-based and structure-based domain parsing were carried out independently with different methodologies. The recent progress in deep learning-based protein structure prediction provides the opportunity to unify sequence-based and structure-based domain parsing. RESULTS Based on the inter-residue distance matrix, which can be either derived from the input structure or predicted by trRosettaX, we can decode the domain boundaries under a unified framework. We name the proposed method UniDoc. UniDoc is based on the well-accepted physical principle of maximizing intra-domain interaction while minimizing inter-domain interaction. Comprehensive tests on five benchmark datasets indicate that UniDoc outperforms other state-of-the-art methods in both accuracy and speed, for both sequence-based and structure-based domain parsing. The major contribution of UniDoc is a unified framework for structure-based and sequence-based domain parsing. We hope that UniDoc will be a convenient tool for protein domain analysis. AVAILABILITY AND IMPLEMENTATION https://yanglab.nankai.edu.cn/UniDoc/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
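UniDoc's stated principle is to maximize intra-domain interaction while minimizing inter-domain interaction over an inter-residue distance matrix. A deliberately simplified scorer can illustrate it. This sketch is not UniDoc's algorithm. It assumes a single cut into two continuous domains and an 8 Å contact cutoff. It also uses a hypothetical score: inter-domain contacts divided by the geometric mean of the two intra-domain contact counts, where lower is better.

```python
import math
from typing import Optional, Sequence


def contact_map(dist: Sequence[Sequence[float]], cutoff: float = 8.0) -> list[list[bool]]:
    """Residues i != j are 'in contact' when their distance is below the cutoff (Å)."""
    n = len(dist)
    return [[i != j and dist[i][j] < cutoff for j in range(n)] for i in range(n)]


def best_single_cut(
    dist: Sequence[Sequence[float]], cutoff: float = 8.0, min_len: int = 3
) -> tuple[Optional[int], float]:
    """Return (k, score) for the split [0, k) | [k, n) with the lowest inter/intra contact score."""
    n = len(dist)
    c = contact_map(dist, cutoff)
    best_k, best_score = None, math.inf
    for k in range(min_len, n - min_len + 1):
        intra_a = sum(c[i][j] for i in range(k) for j in range(i + 1, k))
        intra_b = sum(c[i][j] for i in range(k, n) for j in range(i + 1, n))
        inter = sum(c[i][j] for i in range(k) for j in range(k, n))
        if intra_a == 0 or intra_b == 0:
            continue  # a "domain" with no internal contacts is not compact
        score = inter / math.sqrt(intra_a * intra_b)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Hypothetical 1D toy: residues 0-4 near x = 0-4 Å, residues 5-9 near x = 20-24 Å.
xs = [0, 1, 2, 3, 4, 20, 21, 22, 23, 24]
toy_dist = [[abs(a - b) for b in xs] for a in xs]
cut, cut_score = best_single_cut(toy_dist)
```

Real domain parsers must also handle discontinuous domains and more than two domains, for example by applying a cut recursively. The abstract does not describe how UniDoc handles these cases.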
Affiliation(s)
- Kun Zhu
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Hong Su
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zhenling Peng
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Jianyi Yang
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
7
Wang L, Zhong H, Xue Z, Wang Y. Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM. BIOINFORMATICS ADVANCES 2022; 2:vbac060. [PMID: 36699417 PMCID: PMC9710680 DOI: 10.1093/bioadv/vbac060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/22/2022] [Revised: 07/01/2022] [Accepted: 08/30/2022] [Indexed: 01/28/2023]
Abstract
Motivation Protein domains are the basic units of proteins that can fold, function and evolve independently. Protein domain boundary partition plays an important role in protein structure prediction, in understanding biological functions, in annotating evolutionary mechanisms and in protein design. Although many methods have been developed over the past two decades to predict domain boundaries from protein sequence, there is still much room for improvement. Results In this article, a novel domain boundary prediction tool called Res-Dom was developed, which is based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. We used deep residual neural networks to extract higher-order residue-related information. In addition, we used a pre-trained protein language model called ESM to extract sequence embedding features, which summarize sequence context information more richly. To improve the global representation of these deep residual networks, a Bi-LSTM network was also designed to consider long-range interactions between residues. Res-Dom was then tested on an independent test set of 342 proteins and classified proteins as single-domain or multi-domain with a Matthews correlation coefficient of 0.668, which was 17.6% higher than that of the second-best compared method. For domain boundaries, the normalized domain overlap score of Res-Dom was 0.849, which was 5% higher than that of the second-best compared method. Furthermore, Res-Dom required significantly less time than most recently developed state-of-the-art domain prediction methods. Availability and implementation All source code, datasets and model are available at http://isyslab.info/Res-Dom/.
Affiliation(s)
- Lei Wang
- Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Haolin Zhong
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
- Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yan Wang
- Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
8
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 128] [Impact Index Per Article: 64.0] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
9
Mahmud S, Guo Z, Quadir F, Liu J, Cheng J. Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps. BMC Bioinformatics 2022; 23:283. [PMID: 35854211 PMCID: PMC9295499 DOI: 10.1186/s12859-022-04829-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 07/08/2022] [Indexed: 01/25/2023]
Abstract
Information about the domain architecture of proteins is useful for studying protein structure and function. However, accurate prediction of protein domain boundaries (i.e., sequence regions separating two domains) from sequence remains a significant challenge. In this work, we develop a deep learning method based on multi-head U-Nets (called DistDom) to predict protein domain boundaries using 1D sequence features and a predicted 2D inter-residue distance map as input. The 1D features contain the evolutionary and physicochemical information of protein sequences, whereas the 2D distance map includes structural information that was rarely used in domain boundary prediction before. The 1D and 2D features are processed by 1D and 2D U-Nets, respectively, to generate hidden features. The hidden features are then used by multi-head attention to predict the probability of each residue of a protein being in a domain boundary, leveraging both local and global information in the features. The residue-level domain boundary predictions can be used to classify proteins as single-domain or multi-domain proteins. DistDom classifies the CASP14 single-domain and multi-domain targets with an accuracy of 75.9%, 13.28% more accurate than the state-of-the-art method. On the CASP14 multi-domain protein targets with expert-annotated domain boundaries, the average per-target F1 score of DistDom's domain boundary prediction is 0.263, 29.56% higher than that of the state-of-the-art method.
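DistDom outputs, for each residue, a probability of lying in a domain boundary. As the abstract notes, these per-residue probabilities can also classify a protein as single-domain or multi-domain. The following is a minimal post-processing sketch built on several assumptions. Residues are called at a 0.5 probability threshold, and each boundary is placed at the midpoint of an above-threshold run. The F1 score counts a predicted boundary as correct within a hypothetical ±20-residue tolerance. DistDom's own post-processing and F1 definition may differ.

```python
from typing import Sequence


def boundary_regions(probs: Sequence[float], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Group consecutive residues with probability >= threshold into inclusive (start, end) runs."""
    regions: list[tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(probs) - 1))
    return regions


def predicted_boundaries(probs: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Report each run by its midpoint residue index (an assumed convention)."""
    return [(s + e) // 2 for s, e in boundary_regions(probs, threshold)]


def is_multi_domain(probs: Sequence[float], threshold: float = 0.5) -> bool:
    """A protein with at least one predicted boundary is called multi-domain."""
    return len(predicted_boundaries(probs, threshold)) > 0


def boundary_f1(pred: Sequence[int], true: Sequence[int], tolerance: int = 20) -> float:
    """F1 with greedy one-to-one matching: a prediction counts if within `tolerance` residues."""
    matched: set[int] = set()
    tp = 0
    for p in pred:
        for idx, t in enumerate(true):
            if idx not in matched and abs(p - t) <= tolerance:
                matched.add(idx)
                tp += 1
                break
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-residue probabilities for a 23-residue chain with one boundary region.
toy_probs = [0.1] * 10 + [0.9] * 3 + [0.1] * 10
toy_boundaries = predicted_boundaries(toy_probs)  # [11]
```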
Affiliation(s)
- Sajid Mahmud
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
- Farhan Quadir
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO USA
10
Zheng W, Wuyun Q, Zhou X, Li Y, Freddolino PL, Zhang Y. LOMETS3: integrating deep learning and profile alignment for advanced protein template recognition and function annotation. Nucleic Acids Res 2022; 50:W454-W464. [PMID: 35420129 PMCID: PMC9252734 DOI: 10.1093/nar/gkac248] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 02/06/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 11/25/2022]
Abstract
Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is a new-generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, we have extended LOMETS3 to handle multi-domain proteins and to construct full-length models with gradient-based optimization. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps: domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)–(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and in the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment. There, its overall template recognition and function prediction accuracy was significantly beyond that of its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. With these improvements, LOMETS3 should significantly advance the broader biomedical community's capability for template-based protein structure and function modelling.
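LOMETS3 builds full-length models by L-BFGS optimization. To stay dependency-free, the sketch below swaps in plain gradient descent, a simpler and slower optimizer than L-BFGS. It applies gradient descent to a hypothetical harmonic distance-restraint energy, E = sum of (d_ij - d0_ij)^2, to show the basic idea of moving 3D coordinates down an energy gradient. The restraint list and function names are illustrative and do not reflect LOMETS3's actual energy terms.

```python
import math
import random

Restraint = tuple[int, int, float]  # (residue i, residue j, target distance in Å)


def restraint_energy(coords: list[list[float]], restraints: list[Restraint]) -> float:
    """Harmonic restraint energy E = sum over restraints of (d_ij - d0)^2."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in restraints)


def restraint_gradient(coords: list[list[float]], restraints: list[Restraint]) -> list[list[float]]:
    """Analytic gradient: dE/dx_i = 2 (d - d0) (x_i - x_j) / d, and the negative for x_j."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, d0 in restraints:
        diff = [a - b for a, b in zip(coords[i], coords[j])]
        d = math.sqrt(sum(x * x for x in diff)) or 1e-9  # guard against coincident atoms
        coef = 2.0 * (d - d0) / d
        for k in range(3):
            grad[i][k] += coef * diff[k]
            grad[j][k] -= coef * diff[k]
    return grad


def fit_coordinates(
    n: int, restraints: list[Restraint], steps: int = 2000, lr: float = 0.01, seed: int = 0
) -> list[list[float]]:
    """Plain gradient descent from random starting coordinates (stand-in for L-BFGS)."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grad = restraint_gradient(coords, restraints)
        for atom, g in zip(coords, grad):
            for k in range(3):
                atom[k] -= lr * g[k]
    return coords


# Hypothetical restraints: 5 pseudo-atoms, consecutive pairs held 3.8 Å apart.
chain_restraints = [(i, i + 1, 3.8) for i in range(4)]
fitted = fit_coordinates(5, chain_restraints)
final_energy = restraint_energy(fitted, chain_restraints)
```

In practice, one could flatten the coordinates into a single vector and pass the same energy and gradient to a quasi-Newton routine such as `scipy.optimize.minimize(..., method="L-BFGS-B", jac=True)`. Such a routine typically converges in far fewer iterations than fixed-step descent.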
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Peter L Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
11
Zhou X, Li Y, Zhang C, Zheng W, Zhang G, Zhang Y. Progressive assembly of multi-domain protein structures from cryo-EM density maps. NATURE COMPUTATIONAL SCIENCE 2022; 2:265-275. [PMID: 35844960 PMCID: PMC9281201 DOI: 10.1038/s43588-022-00232-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 04/19/2021] [Accepted: 03/21/2022] [Indexed: 05/20/2023]
Abstract
Progress in cryo-electron microscopy has provided the potential for large-size protein structure determination. However, the success rate for solving multi-domain proteins remains low because of the difficulty in modelling inter-domain orientations. Here we developed domain enhanced modeling using cryo-electron microscopy (DEMO-EM), an automatic method to assemble multi-domain structures from cryo-electron microscopy maps through a progressive structural refinement procedure combining rigid-body domain fitting and flexible assembly simulations with deep-neural-network inter-domain distance profiles. The method was tested on a large-scale benchmark set of proteins containing up to 12 continuous and discontinuous domains with medium- to low-resolution density maps, where DEMO-EM produced models with correct inter-domain orientations (template modeling score (TM-score) >0.5) for 97% of cases and outperformed state-of-the-art methods. DEMO-EM was applied to the severe acute respiratory syndrome coronavirus 2 genome and generated models with average TM-score and root-mean-square deviation of 0.97 and 1.3 Å, respectively, with respect to the deposited structures. These results demonstrate an efficient pipeline that enables automated and reliable large-scale multi-domain protein structure modelling from cryo-electron microscopy maps.
Affiliation(s)
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
12
Zheng W, Li Y, Zhang C, Zhou X, Pearce R, Bell EW, Huang X, Zhang Y. Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins 2021; 89:1734-1751. [PMID: 34331351 PMCID: PMC8616857 DOI: 10.1002/prot.26193] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 04/28/2021] [Revised: 07/06/2021] [Accepted: 07/22/2021] [Indexed: 11/10/2022]
Abstract
In this article, we report 3D structure prediction results from two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints from co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those of the models constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges remain in the current pipelines. These include difficulties in modeling multi-domain proteins, due to low accuracy in inter-domain distance prediction, and in modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish between inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may help address these issues.
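DeepPotential predicts spatial restraints that are converted into energy terms to guide folding. A common way to turn a predicted distance histogram into an energy is the negative log-ratio E(bin) = -ln(P(bin) / P_ref(bin)), taken against a reference distribution. The sketch below assumes that form, a uniform reference, and hypothetical bin edges. The abstract does not specify the exact potential used by D-I-TASSER or D-QUARK.

```python
import math
from typing import Optional, Sequence


def histogram_to_potential(
    probs: Sequence[float], ref_probs: Sequence[float], eps: float = 1e-6
) -> list[float]:
    """Per-bin energy E = -ln(P / P_ref); eps avoids log(0) for empty bins."""
    return [-math.log((p + eps) / (r + eps)) for p, r in zip(probs, ref_probs)]


def bin_index(d: float, bin_edges: Sequence[float]) -> Optional[int]:
    """Index k such that bin_edges[k] <= d < bin_edges[k + 1], or None if out of range."""
    for k in range(len(bin_edges) - 1):
        if bin_edges[k] <= d < bin_edges[k + 1]:
            return k
    return None


def distance_energy(d: float, potential: Sequence[float], bin_edges: Sequence[float]) -> float:
    """Look up the energy of a model distance; out-of-range distances contribute 0 here."""
    k = bin_index(d, bin_edges)
    return 0.0 if k is None else potential[k]


# Hypothetical 3-bin prediction for one residue pair (bins 2-6, 6-10, 10-14 Å).
edges = [2.0, 6.0, 10.0, 14.0]
pair_probs = [0.7, 0.2, 0.1]
uniform_ref = [1.0 / 3.0] * 3
potential = histogram_to_potential(pair_probs, uniform_ref)
```

Bins where the network is more confident than the reference receive negative (favourable) energies, so a folding simulation is pulled toward the predicted distance range. In this sketch, distances outside the histogram contribute nothing.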
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
13
The male germline-specific protein MAPS is indispensable for pachynema progression and fertility. Proc Natl Acad Sci U S A 2021; 118:2025421118. [PMID: 33602822 PMCID: PMC7923350 DOI: 10.1073/pnas.2025421118] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/04/2023]
Abstract
Meiosis is a specialized cell division that creates haploid germ cells from diploid progenitors. Through differential RNA expression analyses, we previously identified a number of mouse genes whose expression was dramatically elevated in spermatocytes relative to their very low expression in spermatogonia and somatic organs. Here, we investigated one of these genes, 1700102P08Rik, in detail and independently conclude that it encodes a male germline-specific protein, in agreement with a recent report. We demonstrated that it is essential for pachynema progression in spermatocytes and named it male pachynema-specific (MAPS) protein. Mice lacking Maps (Maps-/-) suffered from pachytene arrest and spermatocyte death, leading to male infertility, whereas female fertility was not affected. Interestingly, pubertal Maps-/- spermatocytes were arrested at the early pachytene stage, accompanied by defects in DNA double-strand break (DSB) repair, crossover formation, and XY body formation. In contrast, adult Maps-/- spermatocytes exhibited only partially defective crossover formation but were nonetheless delayed in, or failed at, progression from the early to the mid- and late pachytene stages, resulting in cell death. Furthermore, we report significant transcriptional dysregulation of autosomes and XY chromosomes in both pubertal and adult Maps-/- pachytene spermatocytes, including failed meiotic sex chromosome inactivation (MSCI). Further experiments revealed that MAPS overexpression in vitro dramatically decreased the ubiquitination levels of cellular proteins. Conversely, in Maps-/- pachytene cells, protein ubiquitination was dramatically increased, likely contributing to the large-scale disruption of gene expression in pachytene cells. Thus, MAPS is a protein essential for pachynema progression in male mice and possibly in mammals in general.
14
Zheng W, Zhang C, Li Y, Pearce R, Bell EW, Zhang Y. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. CELL REPORTS METHODS 2021; 1:100014. [PMID: 34355210 PMCID: PMC8336924 DOI: 10.1016/j.crmeth.2021.100014] [Citation(s) in RCA: 251] [Impact Index Per Article: 83.7] [Received: 01/25/2021] [Revised: 04/22/2021] [Accepted: 05/03/2021] [Indexed: 12/23/2022]
Abstract
Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the rapid mutation rate of the virus and its sparse sequence profiles. The results demonstrate the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
15
Mulnaes D, Golchin P, Koenig F, Gohlke H. TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning. J Chem Theory Comput 2021; 17:4599-4613. [PMID: 34161735 DOI: 10.1021/acs.jctc.1c00129] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Protein domains are independent, functional, and stable structural units of proteins. Accurate protein domain boundary prediction plays an important role in understanding protein structure and evolution, as well as for protein structure prediction. Current domain boundary prediction methods differ in terms of boundary definition, methodology, and training databases resulting in disparate performance for different proteins. We developed TopDomain, an exhaustive metapredictor, that uses deep neural networks to combine multisource information from sequence- and homology-based features of over 50 primary predictors. For this purpose, we developed a new domain boundary data set termed the TopDomain data set, in which the true annotations are informed by SCOPe annotations, structural domain parsers, human inspection, and deep learning. We benchmark TopDomain against 2484 targets with 3354 boundaries from the TopDomain test set and achieve F1 scores of 78.4% and 73.8% for multidomain boundary prediction within ±20 residues and ±10 residues of the true boundary, respectively. When examined on targets from CASP11-13 competitions, TopDomain achieves F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly outperforms 15 widely used, state-of-the-art ab initio and homology-based domain boundary predictors. Finally, we implemented TopDomainTMC, which accurately predicts whether domain parsing is necessary for the target protein.
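The F1 scores "within ±20 residues" and "within ±10 residues" count a predicted boundary as correct when it falls close enough to a true boundary. The abstract does not state the exact matching rule, so the sketch below assumes one: greedy one-to-one matching, closest pairs first, for a single protein. The function name and the matching strategy are illustrative assumptions, not TopDomain's code.

```python
def boundary_f1(predicted, true, tolerance):
    """F1 for domain-boundary prediction with a +/- residue tolerance.

    predicted, true: lists of boundary residue positions.
    Assumed rule: each true boundary can be matched by at most one
    prediction, and the closest (prediction, true) pairs are matched first.
    """
    # All candidate pairs that lie within the tolerance, closest first
    pairs = sorted(
        (abs(p - t), i, j)
        for i, p in enumerate(predicted)
        for j, t in enumerate(true)
        if abs(p - t) <= tolerance
    )
    used_pred, used_true = set(), set()
    tp = 0
    for _, i, j in pairs:
        if i not in used_pred and j not in used_true:
            used_pred.add(i)
            used_true.add(j)
            tp += 1
    if tp == 0:
        return 0.0  # also covers empty prediction or empty truth lists
    precision = tp / len(predicted)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)
```

With predictions at residues 100 and 250 and true boundaries at 105 and 300, only the first prediction matches at a ±20 tolerance, so precision and recall are both 0.5 and F1 is 0.5; tightening the tolerance can only keep or lower that score.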
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Pegah Golchin
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Filip Koenig
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
16
Wang Y, Zhang H, Zhong H, Xue Z. Protein domain identification methods and online resources. Comput Struct Biotechnol J 2021; 19:1145-1153. [PMID: 33680357 PMCID: PMC7895673 DOI: 10.1016/j.csbj.2021.01.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/08/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 01/03/2023]
Abstract
Protein domains are the basic units of proteins that can fold, function, and evolve independently. Knowledge of protein domains is critical for protein classification, for understanding protein biological functions and evolutionary mechanisms, and for protein design. Thus, over the past two decades, a number of protein domain identification approaches have been developed, and a variety of protein domain databases have been constructed. This review divides protein domain prediction methods into two categories: sequence-based and structure-based. These methods are introduced in detail, and their advantages and limitations are compared. Furthermore, this review provides a comprehensive overview of popular online protein domain sequence and structure databases. Finally, we discuss potential improvements to these prediction methods.
Affiliation(s)
- Yan Wang
- Institute of Medical Artificial Intelligence, Binzhou Medical College, Yantai, Shandong 264003, China
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Hang Zhang
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Haolin Zhong
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
17
Zheng W, Zhou X, Wuyun Q, Pearce R, Li Y, Zhang Y. FUpred: detecting protein domains through deep-learning-based contact map prediction. Bioinformatics 2020; 36:3749-3757. [PMID: 32227201 DOI: 10.1093/bioinformatics/btaa217] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 11/06/2019] [Revised: 02/27/2020] [Accepted: 03/25/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. RESULTS We developed a new algorithm, FUpred, which predicts protein domain boundaries using contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to locate domain boundaries by maximizing the number of intra-domain contacts while minimizing the number of inter-domain contacts in the contact maps. FUpred was tested on a large-scale dataset of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient (MCC) of 0.799, which was 19.1% (or 5.3%) higher than that of the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than those of the best control method. The results demonstrate a new avenue for accurately detecting domain composition from sequence alone, especially for discontinuous, multi-domain proteins. AVAILABILITY AND IMPLEMENTATION https://zhanglab.ccmb.med.umich.edu/FUpred. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
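The stated core idea (many contacts inside domains, few across them) can be illustrated for the simplest case: one continuous cut that splits a sequence into two domains. The sketch below swaps in a normalized-cut style objective as a stand-in; this score, the `min_domain_len` guard against trivially small domains, and the function name are assumptions for illustration, not FUpred's actual scoring function, and the sketch does not handle discontinuous domains.

```python
import numpy as np


def best_single_split(contact_map, min_domain_len=40):
    """Choose one cut that favors intra- over inter-domain contacts.

    contact_map: symmetric (L, L) array of contact probabilities.
    Returns (cut, ncut): residues [0, cut) form domain 1 and [cut, L)
    form domain 2; lower ncut is better. Returns (None, inf) when the
    sequence is shorter than 2 * min_domain_len.
    """
    L = contact_map.shape[0]
    best_cut, best_ncut = None, np.inf
    for cut in range(min_domain_len, L - min_domain_len + 1):
        intra1 = contact_map[:cut, :cut].sum()  # contacts inside domain 1
        intra2 = contact_map[cut:, cut:].sum()  # contacts inside domain 2
        inter = contact_map[:cut, cut:].sum()   # contacts across the cut
        # Normalized cut: inter-domain contacts relative to each side's total
        ncut = inter / (intra1 + inter + 1e-9) + inter / (intra2 + inter + 1e-9)
        if ncut < best_ncut:
            best_cut, best_ncut = cut, ncut
    return best_cut, best_ncut
```

On a toy block-diagonal map with two 50-residue blocks and no contacts between them, the only cut with zero inter-domain contacts is at residue 50, which the function returns. Normalizing by each side's contact mass keeps the objective from simply preferring a tiny second domain.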
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Qiqige Wuyun
- Computer Science and Engineering Department, Michigan State University, East Lansing, MI 48824, USA
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
18
Ullah A, Masood R. The Sequence and Three-Dimensional Structure Characterization of Snake Venom Phospholipases B. Front Mol Biosci 2020; 7:175. [PMID: 32850964 PMCID: PMC7419708 DOI: 10.3389/fmolb.2020.00175] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/06/2020] [Accepted: 07/06/2020] [Indexed: 11/23/2022]
Abstract
Snake venom phospholipases B (SVPLBs) are among the least studied snake venom enzymes. They constitute about 1% of Bothrops crude venoms and less than 1% of other snake venoms. These enzymes are considered the most potent hemolytic agents in the venom. Currently, no structural information is available for these enzymes from snake venom. To better understand their three-dimensional structure and mechanisms of envenomation, the current work describes the first model-based structure of this enzyme from Bothrops moojeni venom, named B. moojeni phospholipase B (PLB_Bm). The structure model of PLB_Bm was generated using model-building software such as I-TASSER, MODELLER 9v19, and Swiss-Model. The built PLB_Bm model was validated using validation tools (PROCHECK, ERRAT, and Verify3D). Analysis of the modeled PLB_Bm structure indicates that it contains 491 amino acid residues that form a well-defined four-layer αββα sandwich core and has the typical fold of the N-terminal nucleophile aminohydrolases (Ntn-hydrolases). The overall structure of PLB_Bm contains 18 β-strands and 17 α-helices with many connecting loops. The structure divides into two chains (A and B) after maturation. The A chain is smaller and contains 207 amino acid residues, whereas the B chain is larger and contains 266 amino acid residues. Sequence and structural comparison among homologous snake venom, bacterial, and mammalian PLBs indicates that differences in length and sequence composition may confer variable substrate specificity on these enzymes. Moreover, the surface charge distribution, average volume, and depth of the active site cavity also vary among these enzymes. The present work provides more information about the structure-function relationship and mechanism of action of these enzymes in snakebite envenomation.
Affiliation(s)
- Anwar Ullah
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Rehana Masood
- Department of Biochemistry, Shaheed Benazir Bhutto Women University Peshawar, Peshawar, Pakistan
19
Giglio ML, Ituarte S, Milesi V, Dreon MS, Brola TR, Caramelo J, Ip JCH, Maté S, Qiu JW, Otero LH, Heras H. Exaptation of two ancient immune proteins into a new dimeric pore-forming toxin in snails. J Struct Biol 2020; 211:107531. [PMID: 32446810 DOI: 10.1016/j.jsb.2020.107531] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/06/2020] [Revised: 05/01/2020] [Accepted: 05/14/2020] [Indexed: 11/24/2022]
Abstract
The Membrane Attack Complex-Perforin (MACPF) family is found ubiquitously across all kingdoms of life. Its members have diverse cellular roles; however, MACPFs with pore-forming toxic functions in venoms and poisons are very rare in animals. Here we present the structure of PmPV2, a MACPF toxin from poisonous apple snail eggs that can affect the digestive and nervous systems of potential predators. We report the three-dimensional structure of PmPV2 at 17.2 Å resolution, determined by negative-stain electron microscopy, and its solution structure, determined by small-angle X-ray scattering (SAXS). We found that PV2s differ from nearly all MACPFs in two respects: they are dimers in solution, and their protomers combine two immune proteins into an AB toxin. The MACPF chain is linked by a single disulfide bond to a tachylectin chain, and two heterodimers are arranged head-to-tail by non-covalent forces in the native protein. The MACPF domain is fused to a putative new C-terminal (Ct) accessory domain exclusive to invertebrates. The tachylectin is a six-bladed β-propeller, similar to animal tectonins. We experimentally validated the predicted functions of both subunits and demonstrated for the first time that PV2s are true pore-forming toxins. The tachylectin "B" delivery subunit would bind to target membranes, and the MACPF "A" toxic subunit would then disrupt lipid bilayers, forming large pores that alter plasma membrane conductance. These results indicate that PV2 toxicity evolved by linking two immune proteins whose combined preexisting functions gave rise to a new toxic entity with a novel role in defense against predation. This structure is an unparalleled example of protein exaptation.
Affiliation(s)
- M L Giglio
- Instituto de Investigaciones Bioquímicas de La Plata "Prof. Dr. Rodolfo R. Brenner", INIBIOLP, CONICET CCT La Plata - Universidad Nacional de La Plata (UNLP), Facultad de Ciencias Médicas, 1900 La Plata, Argentina
- S Ituarte
- Instituto de Investigaciones Bioquímicas de La Plata "Prof. Dr. Rodolfo R. Brenner", INIBIOLP, CONICET CCT La Plata - Universidad Nacional de La Plata (UNLP), Facultad de Ciencias Médicas, 1900 La Plata, Argentina
- V Milesi
- Instituto de Estudios Inmunológicos y Fisiopatológicos, IIFP CONICET CCT La Plata - UNLP, Facultad de Ciencias Exactas, 1900 La Plata, Argentina
- M S Dreon
- Instituto de Investigaciones Bioquímicas de La Plata "Prof. Dr. Rodolfo R. Brenner", INIBIOLP, CONICET CCT La Plata - Universidad Nacional de La Plata (UNLP), Facultad de Ciencias Médicas, 1900 La Plata, Argentina
- T R Brola
- Instituto de Investigaciones Bioquímicas de La Plata "Prof. Dr. Rodolfo R. Brenner", INIBIOLP, CONICET CCT La Plata - Universidad Nacional de La Plata (UNLP), Facultad de Ciencias Médicas, 1900 La Plata, Argentina
- J Caramelo
- Instituto de Investigaciones Bioquímicas de Buenos Aires, IIBBA, CONICET - Fundación Instituto Leloir, Av Patricias Argentinas 435, C1405BWE Buenos Aires, Argentina
- J C H Ip
- Department of Biology, Hong Kong Baptist University, 224 Waterloo Road, Hong Kong, China
- S Maté
- Instituto de Investigaciones Bioquímicas de La Plata "Prof. Dr. Rodolfo R. Brenner", INIBIOLP, CONICET CCT La Plata - Universidad Nacional de La Plata (UNLP), Facultad de Ciencias Médicas, 1900 La Plata, Argentina
- J W Qiu
- Department of Biology, Hong Kong Baptist University, 224 Waterloo Road, Hong Kong, China
- L H Otero
- Instituto de Investigaciones Bioquímicas de Buenos Aires, IIBBA, CONICET - Fundación Instituto Leloir, Av Patricias Argentinas 435, C1405BWE Buenos Aires, Argentina
- Plataforma Argentina de Biología Estructural y Metabolómica PLABEM, Av. Patricias Argentinas 435, C1405BWE Buenos Aires, Argentina
- H Heras
- Instituto de Investigaciones Bioquímicas de La Plata "Prof. Dr. Rodolfo R. Brenner", INIBIOLP, CONICET CCT La Plata - Universidad Nacional de La Plata (UNLP), Facultad de Ciencias Médicas, 1900 La Plata, Argentina
- Cátedra de Química Biológica, Facultad de Ciencias Naturales y Museo, Universidad Nacional de La Plata (UNLP), 1900 La Plata, Argentina
20
Shi Q, Chen W, Huang S, Jin F, Dong Y, Wang Y, Xue Z. DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network. Bioinformatics 2020; 35:5128-5136. [PMID: 31197306 DOI: 10.1093/bioinformatics/btz464] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/18/2019] [Revised: 05/07/2019] [Accepted: 06/05/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been shown to improve prediction performance. However, how to model local and global interactions simultaneously to further improve domain boundary prediction remains a challenging problem. RESULTS This article employs a hybrid deep learning method that combines convolutional neural network (CNN) and gated recurrent unit (GRU) models for domain boundary prediction. It not only captures local and non-local interactions but also fuses these features for prediction. Additionally, we adopt a balanced Random Forest for classification to deal with the high imbalance of samples and the high dimensionality of deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction. AVAILABILITY AND IMPLEMENTATION The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
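A balanced Random Forest counters class imbalance (here, far fewer boundary residues than non-boundary residues) by growing each tree on a class-balanced sample. In the usual formulation, every tree gets a bootstrap sample of the minority class plus an equally sized sample, drawn with replacement, from each other class. The NumPy sketch below shows only that per-tree sampling step; the function name is hypothetical, and DNN-Dom's own implementation may differ.

```python
import numpy as np


def balanced_bootstrap(labels, rng):
    """Row indices for training ONE tree of a balanced random forest.

    labels: 1-D array of class labels (e.g. 1 = boundary, 0 = not).
    rng: a numpy.random.Generator.
    Every class contributes as many rows as the minority class has,
    each drawn with replacement (bootstrap).
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()  # size of the minority class
    per_class = [
        rng.choice(np.flatnonzero(labels == c), size=n, replace=True)
        for c in classes
    ]
    return np.concatenate(per_class)
```

With 95 negative and 5 positive labels, each call returns 10 indices, 5 from each class; drawing a fresh sample per tree keeps the ensemble diverse while every tree sees a balanced training set.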
Affiliation(s)
- Qiang Shi
- School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Weiya Chen
- School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Siqi Huang
- School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Fanglin Jin
- School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yinghao Dong
- School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Yan Wang
- School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Zhidong Xue
- School of Software Engineering and College of Life Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China
21
Hong SH, Joo K, Lee J. ConDo: protein domain boundary prediction using coevolutionary information. Bioinformatics 2020; 35:2411-2417. [PMID: 30500873 DOI: 10.1093/bioinformatics/bty973] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/16/2018] [Revised: 11/15/2018] [Accepted: 11/29/2018] [Indexed: 11/13/2022]
Abstract
MOTIVATION Domain boundary prediction is one of the most important problems in the study of protein structure and function. Most sequence-based domain boundary prediction methods are either template-based or machine learning (ML)-based. ML-based methods often perform poorly because they use only local (i.e. short-range) features. These conventional features, such as sequence profiles, secondary structures, and solvent accessibilities, are typically restricted to within 20 residues of the domain boundary candidate. RESULTS To improve the performance of ML-based methods, we developed a new protein domain boundary prediction method (ConDo) that uses novel long-range features, such as coevolutionary information, in addition to the aforementioned local window features as inputs for ML. To this end, two types of coevolutionary information were extracted from multiple sequence alignments using direct coupling analysis: (i) partially aligned sequences, and (ii) correlated mutation information. Both the partially aligned sequence information and the modularity of residue-residue couplings carry long-range correlation information. AVAILABILITY AND IMPLEMENTATION https://github.com/gicsaw/ConDo.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Korea
- Jooyoung Lee
- School of Computational Sciences
- Center for Advanced Computation, Korea Institute for Advanced Study, Korea
22
Integrating Non-NMR Distance Restraints to Augment NMR Depiction of Protein Structure and Dynamics. J Mol Biol 2020; 432:2913-2929. [DOI: 10.1016/j.jmb.2020.01.023] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/09/2019] [Revised: 01/17/2020] [Accepted: 01/17/2020] [Indexed: 11/24/2022]
23
Ullah A. Structure-Function Studies and Mechanism of Action of Snake Venom L-Amino Acid Oxidases. Front Pharmacol 2020; 11:110. [PMID: 32158389 PMCID: PMC7052187 DOI: 10.3389/fphar.2020.00110] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 10/16/2019] [Accepted: 01/28/2020] [Indexed: 12/30/2022]
Abstract
Snake venom L-amino acid oxidases (SV-LAAOs) are among the least studied venom enzymes. These enzymes catalyze the stereospecific oxidation of L-amino acids to the corresponding α-keto acids, with the liberation of hydrogen peroxide (H2O2) and ammonia (NH3). They display various pathological and physiological activities, including induction of apoptosis, edema, platelet aggregation/inhibition, and hemorrhagic and anticoagulant activities. They also show antibacterial, antiviral, and leishmanicidal activity and have been used as therapeutic agents in conditions such as cancer and as anti-HIV drugs. Although the crystal structures of six SV-LAAOs are available in the Protein Data Bank (PDB), no single article describes all of them. To better understand their structural properties and correlate them with their function, the current work describes the structural characterization, structure-based mechanism of catalysis, inhibition, and substrate specificity of SV-LAAOs. Sequence analysis indicates high sequence identity (>84%) among SV-LAAOs, comparatively lower sequence identity with pig kidney D-amino acid oxidase (<50%), and very low sequence identity (<24%) with bacterial LAAOs, fungal L-lysine oxidase, and Zea mays polyamine oxidase (PAAO). The three-dimensional structure of these enzymes is composed of three domains: a FAD-binding domain, a substrate-binding domain, and a helical domain. Sequence and structural analysis indicates that the loops vary in length and amino acid composition, so the surface charge distribution also varies, which may impart variable substrate specificity to these enzymes. The active site cavity volume and its average depth also vary among these enzymes. The inhibition of these enzymes by synthetic inhibitors will lead to the production of more potent antivenoms against snakebite envenomation.
Affiliation(s)
- Anwar Ullah
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
24
Zheng W, Li Y, Zhang C, Pearce R, Mortuza SM, Zhang Y. Deep-learning contact-map guided protein structure prediction in CASP13. Proteins 2019; 87:1149-1164. [PMID: 31365149 PMCID: PMC6851476 DOI: 10.1002/prot.25792] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Received: 04/30/2019] [Revised: 07/14/2019] [Accepted: 07/27/2019] [Indexed: 12/28/2022]
Abstract
We report the results of two fully automated structure prediction pipelines, "Zhang-Server" and "QUARK", in CASP13. The pipelines were built upon the C-I-TASSER and C-QUARK programs, which in turn are based on I-TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence-profiles for contact prediction; (b) an improved meta-method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact-maps by coupling precision-matrices with deep residual convolutional neural-networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM-scores of the first models produced by C-I-TASSER and C-QUARK were 28% and 56% higher than those constructed by I-TASSER and QUARK, respectively. For the first time, contact-map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM-scores of C-I-TASSER models were significantly higher than those of I-TASSER models with a P-value <.05. Detailed data analyses showed that the success of C-I-TASSER and C-QUARK was mainly due to the increased accuracy of deep-learning-based contact-maps, as well as the careful balance between sequence-based contact restraints, threading templates, and generic knowledge-based potentials. Nevertheless, challenges still remain for predicting quaternary structure of multi-domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact-based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.
Collapse
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
25
Li Y, Zhang C, Bell EW, Yu DJ, Zhang Y. Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13. Proteins 2019; 87:1082-1091. [PMID: 31407406 PMCID: PMC6851483 DOI: 10.1002/prot.25798] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 04/17/2019] [Revised: 07/20/2019] [Accepted: 08/08/2019] [Indexed: 12/26/2022]
Abstract
We report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction. Two ensembling strategies have been proposed to integrate the matrix features through end-to-end training and stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that do not have homologous templates in the PDB, TripletRes and ResTriplet generated comparable results with an average accuracy of 0.640 and 0.646, respectively, for the top L/5 long-range predictions, where 71% and 74% of the cases have an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline is due to the sensitive MSA construction and the advanced strategies for coevolutionary feature ensembling. Domain splitting was also found to help enhance the contact prediction performance. Nevertheless, contact models for tail regions, which often involve a high number of alignment gaps, and for targets with few homologous sequences are still suboptimal. Development of new approaches where the model is specifically trained on these regions and targets might help address these problems.
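The "top L/5 long-range" accuracy quoted above can be computed roughly as shown below. The sketch assumes the usual CASP conventions: a long-range pair has a sequence separation of at least 24 residues, and a native contact is a Cβ–Cβ distance below 8 Å. The predicted probability matrix and the native contact map are inputs. The function name is hypothetical and is not part of TripletRes or ResTriplet.

```python
import numpy as np


def top_l5_long_range_precision(pred_prob, native_contacts, min_separation=24):
    """Fraction of the top-L/5 predicted long-range pairs that are native contacts.

    pred_prob: (L, L) matrix of predicted contact probabilities.
    native_contacts: (L, L) boolean matrix, True where the native Cbeta-Cbeta
    distance is below 8 angstroms.
    """
    prob = np.asarray(pred_prob, dtype=float)
    native = np.asarray(native_contacts, dtype=bool)
    length = prob.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_separation.
    i, j = np.triu_indices(length, k=min_separation)
    if i.size == 0:
        raise ValueError("sequence too short to have long-range pairs")
    # Rank candidate pairs by predicted probability, highest first.
    order = np.argsort(-prob[i, j], kind="stable")
    top = order[: max(length // 5, 1)]
    return float(native[i[top], j[top]].mean())
```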
Affiliation(s)
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, China, 210094
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, China, 210094
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
26
Ullah A, Ullah K, Ali H, Betzel C, Ur Rehman S. The Sequence and a Three-Dimensional Structural Analysis Reveal Substrate Specificity Among Snake Venom Phosphodiesterases. Toxins (Basel) 2019; 11:E625. [PMID: 31661911 PMCID: PMC6891707 DOI: 10.3390/toxins11110625] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/25/2019] [Revised: 08/21/2019] [Accepted: 09/03/2019] [Indexed: 02/06/2023]
Abstract
(1) Background. Snake venom phosphodiesterases (SVPDEs) are among the least studied venom enzymes. In envenomation, they display various pathological effects, including induction of hypotension, inhibition of platelet aggregation, edema, and paralysis. Until now, there have been no 3D structural studies of these enzymes, which has prevented structure-function analysis. To enable such investigations, the present work describes the model-based structural and functional characterization of a phosphodiesterase from Crotalus adamanteus venom, named PDE_Ca. (2) Methods. The PDE_Ca structure model was produced and validated using various software tools (model building: I-TASSER, MODELLER 9v19, and Swiss-Model; validation: PROCHECK, ERRAT, molecular dynamics simulation, and Verify3D). (3) Results. The proposed model indicates that the 3D structure of PDE_Ca comprises four domains: a somatomedin B domain, a somatomedin B-like domain, an ectonucleotide pyrophosphatase domain, and a DNA/RNA non-specific domain. Sequence and structural analyses suggest that differences in length and composition among homologous snake venom sequences may account for their differences in substrate specificity. Other properties that may influence substrate specificity are the average volume and depth of the active site cavity. (4) Conclusion. Sequence comparisons indicate that SVPDEs share high sequence identity with one another but comparatively low identity with mammalian and bacterial PDEs.
Affiliation(s)
- Anwar Ullah
- Department of Biosciences, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad 45550, Pakistan.
- Kifayat Ullah
- Department of Biosciences, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad 45550, Pakistan.
- Hamid Ali
- Department of Biosciences, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad 45550, Pakistan.
- Christian Betzel
- Institute of Biochemistry and Molecular Biology, University of Hamburg, Laboratory for Structural Biology of Infection and Inflammation, c/o DESY. Build. 22a, Notkestrasse 85, 22607 Hamburg, Germany.
- Shafiq Ur Rehman
- Department of Botany, University of Okara, Okara, Punjab 56300, Pakistan.
27
Zheng W, Zhang C, Bell EW, Zhang Y. I-TASSER gateway: A protein structure and function prediction server powered by XSEDE. FUTURE GENERATIONS COMPUTER SYSTEMS : FGCS 2019; 99:73-85. [PMID: 31427836 PMCID: PMC6699767 DOI: 10.1016/j.future.2019.04.011] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Indexed: 05/12/2023]
Abstract
There is an increasing gap between the number of known protein sequences and the number of proteins with experimentally characterized structure and function. To alleviate this issue, we have developed the I-TASSER gateway, an online server for automated and reliable protein structure and function prediction. For a given sequence, I-TASSER starts with template recognition from a known structure library, followed by full-length atomic model construction by iterative assembly simulations of the continuous structural fragments excised from the template alignments. Functional insights are then derived from comparative matching of the predicted model with a library of proteins with known function. The I-TASSER pipeline has been recently integrated with the XSEDE Gateway system to accommodate pressing demand from the user community and increasing computing costs. This report summarizes the configuration of the I-TASSER Gateway with the XSEDE-Comet supercomputer cluster, together with an overview of the I-TASSER method and milestones of its development.
28
Wang Y, Wang J, Li R, Shi Q, Xue Z, Zhang Y. ThreaDomEx: a unified platform for predicting continuous and discontinuous protein domains by multiple-threading and segment assembly. Nucleic Acids Res 2017; 45:W400-W407. [PMID: 28498994 PMCID: PMC5793814 DOI: 10.1093/nar/gkx410] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/03/2017] [Accepted: 04/28/2017] [Indexed: 12/21/2022]
Abstract
We develop a hierarchical pipeline, ThreaDomEx, for both continuous domain (CD) and discontinuous domain (DCD) structure prediction. Starting from a query sequence, ThreaDomEx first threads it through the PDB to identify multiple structure templates, from which a profile of domain conservation scores (DC-scores) is derived for domain-segment assignment. To further detect DCDs, which consist of segments separated along the sequence, a boundary-clustering algorithm is used to refine the DCD-linker locations. If the templates do not contain DCDs, a domain-segment assembly process, guided by symmetry comparison, is applied for further DCD detection. ThreaDomEx was tested on a set of 1111 proteins and achieved a normalized domain overlap score of 89.3% compared to experimental data, which is significantly higher than that of other state-of-the-art methods. It also recalls 26.7% of DCDs with 72.7% precision on the proteins for which threading failed to detect any DCDs. The server provides facilities for users to interactively refine the domain models by adjusting the DC-score threshold, deleting and adding domain linkers, and assembling domain segments. These facilities are particularly helpful for hard targets, where current methods have low accuracy but human expert knowledge and experimental insights can be used to refine models. The ThreaDomEx server is available at http://zhanglab.ccmb.med.umich.edu/ThreaDomEx.
Affiliation(s)
- Yan Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jian Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Ruiming Li
- School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Qiang Shi
- School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- School of Software, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
29
El Hefnawi MM, Hasan ME, Mahmoud A, Khidr YA, El Behaidy WH, El-Absawy ESA, Hemeida AA. Prediction and Analysis of Three-Dimensional Structure of the p7- Transactivated Protein1 of Hepatitis C Virus. Infect Disord Drug Targets 2019; 19:55-66. [PMID: 29243584 DOI: 10.2174/1871526518666171215123214] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/04/2017] [Revised: 06/07/2017] [Accepted: 06/11/2017] [Indexed: 01/06/2023]
Abstract
BACKGROUND The p7-transactivated protein1 of Hepatitis C virus is a small integral membrane protein of 127 amino acids that is crucial for the assembly and release of infectious virions. Ab initio or comparative modelling is an essential tool for solving the protein structure prediction problem and for understanding the physicochemical fundamentals of how proteins fold in nature. RESULTS ThreaDom, a systematic in silico approach, predicted only one domain (residues 1-127) in p7-transactivated protein1. I-TASSER was ranked as the best server for full-length 3D structure prediction of p7-transactivated protein1, with benchmarked scores such as C-score, TM-score, RMSD and Z-score used to obtain quantitative assessments of the I-TASSER models. Scanning protein motif databases, together with secondary structure and surface accessibility predictions integrated with post-translational modification (PTM) site prediction, revealed functional and protein-binding motifs. Three protein-binding motifs (two Asp/Glutamnse, CTNNB1-bd_N) with high sequence conservation and two predicted PTMs, Camp_phospho_site and Myristyl site, were identified using BLOCKS and PROSITE scans. These motifs and PTMs were related to the function of p7-transactivated protein1 in inducing an ion channel/pore and in the release of infectious virions. Using SCOP, only one hit, matching residues 71-120, was found; it was classified as small proteins, FYVE/PHD zinc finger superfamily. CONCLUSION Integrating this information about p7-transactivated protein1 with SCOP and CATH annotations of the templates facilitates the assignment of structure-function/evolution relationships to the known and the newly determined protein structures.
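RMSD, one of the scores listed above, is normally reported after optimal rigid-body superposition of matched atoms. The sketch below uses the Kabsch algorithm on matched Cα coordinates. It assumes the two coordinate sets are already in residue-to-residue correspondence. The function name is hypothetical and is not part of I-TASSER.

```python
import numpy as np


def kabsch_rmsd(coords_a, coords_b):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # Remove translation by centering both sets on their centroids.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    diff = a @ rotation.T - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

Because only proper rotations are allowed, a mirror image of a non-planar structure keeps a non-zero RMSD, even though a reflection could overlay it exactly.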
Affiliation(s)
- Mahmoud M El Hefnawi
- Informatics and Systems Department, Division of Engineering Research Sciences, the National Research Centre, Giza, Egypt
- Mohamed E Hasan
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
- Amal Mahmoud
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
- Department of Biology, College of Science, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
- Yehia A Khidr
- Plant Biotechnology Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
- El-Sayed A El-Absawy
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
- Alaa A Hemeida
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, Sadat City University, Sadat, Egypt
30
Kamal H, Minhas FUAA, Farooq M, Tripathi D, Hamza M, Mustafa R, Khan MZ, Mansoor S, Pappu HR, Amin I. In silico Prediction and Validations of Domains Involved in Gossypium hirsutum SnRK1 Protein Interaction With Cotton Leaf Curl Multan Betasatellite Encoded βC1. FRONTIERS IN PLANT SCIENCE 2019; 10:656. [PMID: 31191577 PMCID: PMC6546731 DOI: 10.3389/fpls.2019.00656] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/27/2018] [Accepted: 05/01/2019] [Indexed: 05/19/2023]
Abstract
Cotton leaf curl disease (CLCuD), caused by viruses of the genus Begomovirus, is a major constraint to cotton (Gossypium hirsutum) production in many cotton-growing regions of the world. Symptoms of the disease are caused by Cotton leaf curl Multan betasatellite (CLCuMB), which encodes a pathogenicity determinant protein, βC1. Here, we report the identification of interacting regions in the βC1 protein using computational approaches, including sequence recognition and binding site and interface prediction methods. We show the domain-level interactions based on structural analysis of the G. hirsutum SnRK1 protein and its domains with CLCuMB-βC1. To verify and validate the in silico predictions, three different experimental approaches were used: yeast two-hybrid, bimolecular fluorescence complementation, and pull-down assays. Our results showed that the ubiquitin-associated (UBA) domain and the autoinhibitory sequence (AIS) domain of G. hirsutum-encoded SnRK1 are involved in the CLCuMB-βC1 interaction. This is the first comprehensive investigation that combined in silico interaction prediction with experimental validation of the interaction between CLCuMB-βC1 and a host protein. We demonstrated that data from computational biology could provide binding site information between CLCuD-associated viruses/satellites and new hosts that lack known binding site information for protein-protein interaction studies. Implications of these findings are discussed.
Affiliation(s)
- Hira Kamal
- National Institute for Biotechnology and Genetic Engineering, Faisalabad, Pakistan
- Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan
- Department of Plant Pathology, Washington State University, Pullman, WA, United States
- Muhammad Farooq
- National Institute for Biotechnology and Genetic Engineering, Faisalabad, Pakistan
- Diwaker Tripathi
- Department of Biology, University of Washington, Seattle, WA, United States
- Muhammad Hamza
- National Institute for Biotechnology and Genetic Engineering, Faisalabad, Pakistan
- Roma Mustafa
- National Institute for Biotechnology and Genetic Engineering, Faisalabad, Pakistan
- Muhammad Zuhaib Khan
- National Institute for Biotechnology and Genetic Engineering, Faisalabad, Pakistan
- Shahid Mansoor
- National Institute for Biotechnology and Genetic Engineering, Faisalabad, Pakistan
- Hanu R. Pappu
- Department of Plant Pathology, Washington State University, Pullman, WA, United States
- Imran Amin
- National Institute for Biotechnology and Genetic Engineering, Faisalabad, Pakistan
31
Rajesh Y, Banerjee A, Pal I, Biswas A, Das S, Dey KK, Kapoor N, Ghosh AK, Mitra P, Mandal M. Delineation of crosstalk between HSP27 and MMP-2/MMP-9: A synergistic therapeutic avenue for glioblastoma management. Biochim Biophys Acta Gen Subj 2019; 1863:1196-1209. [PMID: 31028823 DOI: 10.1016/j.bbagen.2019.04.015] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/28/2019] [Revised: 04/21/2019] [Accepted: 04/22/2019] [Indexed: 12/29/2022]
Abstract
BACKGROUND Epithelial-to-mesenchymal transition (EMT) and extracellular matrix (ECM) remodeling are two elemental processes promoting glioblastoma (GBM). In the present work, we propose a mechanistic model of GBM and, in the process, establish a hypothesis elucidating a critical crosstalk between heat shock proteins (HSPs) and matrix metalloproteinases (MMPs), with synergistic upregulation of an EMT-like process and ECM remodeling. METHODS The interaction and the precise binding site between the HSP and MMP proteins were assayed computationally, in vitro, and in GBM clinical samples. RESULTS A positive crosstalk of HSP27 with MMP-2 and MMP-9 was established in both GBM patient tissues and cell lines. This association was found to be of prime significance for ECM remodeling and the promotion of EMT-like characteristics. In silico predictions revealed three plausible sites at which HSP27 interacts with MMP-2 and MMP-9. Site-directed mutagenesis followed by an in vitro immunoprecipitation (IP) assay with three mutated recombinant HSP27 proteins confirmed an interface stretch containing residues 29-40 of HSP27 as a common interaction site for both MMP-2 and MMP-9. This was further validated by in vitro IP of truncated recombinant HSP27 (lacking residues 29-40) with MMP-2 and MMP-9. CONCLUSION The association of HSP27 with MMP-2 and MMP-9, along with the identified interacting stretch, has the potential to contribute to drug development aimed at inhibiting GBM infiltration and migration. GENERAL SIGNIFICANCE The current findings provide a novel therapeutic target for GBM, opening a new horizon in the field of GBM management.
Affiliation(s)
- Y Rajesh
- School of Medical Science and Technology, Indian Institute of Technology, Kharagpur, India
- Anupam Banerjee
- Advanced Technology Development Centre, Indian Institute of Technology, Kharagpur, India
- Ipsita Pal
- School of Medical Science and Technology, Indian Institute of Technology, Kharagpur, India
- Angana Biswas
- School of Medical Science and Technology, Indian Institute of Technology, Kharagpur, India
- Subhayan Das
- School of Medical Science and Technology, Indian Institute of Technology, Kharagpur, India
- Kaushik Kumar Dey
- School of Medical Science and Technology, Indian Institute of Technology, Kharagpur, India
- Structural Biology & Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, USA
- Neelkamal Kapoor
- Department of Pathology and Lab Medicine, All India Institute of Medical Sciences, Bhopal, India
- Ananta Kumar Ghosh
- Department of Biotechnology, Indian Institute of Technology, Kharagpur, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
- Mahitosh Mandal
- School of Medical Science and Technology, Indian Institute of Technology, Kharagpur, India
32
Olaya C, Adhikari B, Raikhy G, Cheng J, Pappu HR. Identification and localization of Tospovirus genus-wide conserved residues in 3D models of the nucleocapsid and the silencing suppressor proteins. Virol J 2019; 16:7. [PMID: 30634979 PMCID: PMC6330412 DOI: 10.1186/s12985-018-1106-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/28/2018] [Accepted: 10/16/2018] [Indexed: 12/22/2022]
Abstract
BACKGROUND Tospoviruses (genus Tospovirus, family Peribunyaviridae, order Bunyavirales) cause significant losses to a wide range of agronomic and horticultural crops worldwide. Identification and characterization of specific sequences and motifs that are critical for virus infection and pathogenicity could provide useful insights and targets for engineering virus resistance that is potentially both broad-spectrum and durable. Tomato spotted wilt virus (TSWV), the most prolific member of the group, was used to better understand the structure-function relationships of the nucleocapsid gene (N) and the silencing suppressor gene (NSs), both coded by the TSWV small RNA. METHODS Using a global collection of orthotospoviral sequences, we determined several amino acids that were conserved across the genus and the potential location of these conserved amino acid motifs in the proteins. We used state-of-the-art 3D modeling algorithms, MULTICOM-CLUSTER, MULTICOM-CONSTRUCT, MULTICOM-NOVEL, I-TASSER, ROSETTA and CONFOLD, to predict the secondary and tertiary structures of the N and NSs proteins. RESULTS We identified nine amino acid residues in the N protein among 31 known tospoviral species, and ten amino acid residues in the NSs protein among 27 tospoviral species, that were conserved across the genus. For the N protein, all three algorithms gave nearly identical tertiary models. While the conserved residues were distributed throughout the protein on a linear scale, at the tertiary level three residues were consistently located in the coil in all the models. For the NSs protein models, there was no agreement among the three algorithms. However, with respect to the localization of the conserved motifs, G18 was consistently located in the coil, while H115 was localized in the coil in three models. CONCLUSIONS This is the first report predicting the 3D structure of any tospoviral NSs protein, and it revealed a consistent location for two of the ten conserved residues. The modelers used gave accurate predictions for the N protein, allowing the localization of the conserved residues. These results form the basis for further work on the structure-function relationships of tospoviral proteins and could be useful in developing novel virus control strategies targeting the conserved residues.
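Finding residues that are conserved across every sequence of a genus-wide alignment, as was done above for N and NSs, reduces to scanning the alignment columns. The sketch below assumes a precomputed multiple sequence alignment with '-' for gaps. It reports conserved residues numbered on the first (reference) sequence, in the same style as G18 or H115. The function name and the toy alignment are hypothetical.

```python
def conserved_residues(alignment, gap="-"):
    """Residues identical (and ungapped) in every aligned sequence.

    Returns labels such as "G18", numbered on the first sequence of the alignment.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    labels = []
    ref_position = 0  # residue number in the reference (first) sequence
    for col in range(width):
        if alignment[0][col] != gap:
            ref_position += 1
        column = {seq[col] for seq in alignment}
        if len(column) == 1 and gap not in column:
            labels.append(f"{alignment[0][col]}{ref_position}")
    return labels


print(conserved_residues(["MG-KH", "MGAKH", "LGAKH"]))  # ['G2', 'K3', 'H4']
```

Numbering on a reference sequence, rather than on alignment columns, keeps labels comparable with residue positions in a 3D model of that reference protein.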
Affiliation(s)
- Cristian Olaya
- Department of Plant Pathology, Washington State University, Pullman, WA, 99164, USA
- Badri Adhikari
- Department of Mathematics and Computer Science, University of Missouri, St. Louis, MO, 63121, USA
- Gaurav Raikhy
- Department of Microbiology and Immunology, Louisiana State University, Shreveport, LA, 71101, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Hanu R Pappu
- Department of Plant Pathology, Washington State University, Pullman, WA, 99164, USA
33
Jiang Y, Wang D, Xu D. DeepDom: Predicting protein domain boundary from sequence alone using stacked bidirectional LSTM. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2019; 24:66-75. [PMID: 30864311 PMCID: PMC6417825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Protein domain boundary prediction is usually an early step in understanding protein function and structure. Most current computational domain boundary prediction methods suffer from low accuracy and limitations in handling multi-domain types, or cannot even be applied to certain targets, such as proteins with discontinuous domains. We developed an ab initio protein domain predictor using a stacked bidirectional LSTM model in deep learning. Our model is trained on a large number of protein sequences without feature engineering such as sequence profiles. Hence, predictions using our method are much faster than those of other methods, and the trained model can be applied to any type of target protein without constraint. We evaluated DeepDom by 10-fold cross-validation and also by applying it to targets in different categories from CASP8 and CASP9. The comparison with other methods showed that DeepDom outperforms most current ab initio methods and even achieves better results than the top-level template-based method in certain cases. The code of DeepDom and the test data we used from CASP8 and CASP9 can be accessed through GitHub at https://github.com/yuexujiang/DeepDom.
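To make the "stacked bidirectional LSTM" concrete, the NumPy sketch below runs an untrained two-layer bidirectional LSTM over a one-hot encoded sequence and emits one score per residue. It only illustrates the data flow. The 20-letter one-hot input, hidden size, gate ordering, per-residue sigmoid output head and random weights are all assumptions for illustration, not DeepDom's published configuration. A real model would be trained in a deep-learning framework. All names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """(L, 20) one-hot encoding; letters outside the 20 standard residues stay all-zero."""
    x = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, residue in enumerate(sequence):
        k = AMINO_ACIDS.find(residue)
        if k >= 0:
            x[pos, k] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_direction(x, w_in, w_rec, bias):
    """Run one LSTM over the rows of x (L, d_in) and return hidden states (L, h)."""
    hidden = w_rec.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x_t in x:
        # Gate order (input, forget, candidate, output) is a convention chosen here.
        i, f, g, o = np.split(x_t @ w_in + h @ w_rec + bias, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.array(states)


def random_weights(rng, d_in, hidden):
    """Untrained weights for one direction of one layer (placeholders for learned values)."""
    return (rng.normal(scale=0.1, size=(d_in, 4 * hidden)),
            rng.normal(scale=0.1, size=(hidden, 4 * hidden)),
            np.zeros(4 * hidden))


def stacked_bilstm_scores(sequence, n_layers=2, hidden=16, seed=0):
    """Per-residue scores in (0, 1) from an untrained stacked bidirectional LSTM."""
    rng = np.random.default_rng(seed)
    x = one_hot(sequence)
    for _ in range(n_layers):
        forward = lstm_direction(x, *random_weights(rng, x.shape[1], hidden))
        # Backward pass: reverse the sequence, run, then restore residue order.
        backward = lstm_direction(x[::-1], *random_weights(rng, x.shape[1], hidden))[::-1]
        x = np.concatenate([forward, backward], axis=1)  # (L, 2 * hidden)
    w_out = rng.normal(scale=0.1, size=x.shape[1])
    return sigmoid(x @ w_out)
```

Each layer feeds the concatenated forward and backward states to the next layer, so every residue's score can depend on context from both ends of the chain. With random weights the scores hover near 0.5 and carry no meaning.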
Affiliation(s)
- Yuexu Jiang
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA
34
Zeeshan N, Naz S, Naz S, Afroz A, Zahur M, Zia S. Heterologous expression and enhanced production of β-1,4-glucanase of Bacillus halodurans C-125 in Escherichia coli. ELECTRON J BIOTECHN 2018. [DOI: 10.1016/j.ejbt.2018.05.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/16/2022]
35
Gong Z, Liu Z, Dong X, Ding YH, Dong MQ, Tang C. Protocol for analyzing protein ensemble structures from chemical cross-links using DynaXL. BIOPHYSICS REPORTS 2017; 3:100-108. [PMID: 29238747 PMCID: PMC5719800 DOI: 10.1007/s41048-017-0044-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/19/2017] [Accepted: 09/18/2017] [Indexed: 12/16/2022]
Abstract
Chemical cross-linking coupled with mass spectrometry (CXMS) is a powerful technique for investigating protein structures. CXMS has mostly been used to characterize the predominant structure of a protein, whereas cross-links incompatible with a unique structure of a protein or protein complex are often discarded. We have recently shown that these so-called over-length cross-links actually contain information on protein dynamics. We have thus established a method called DynaXL, which allows us to extract this information from the over-length cross-links and to visualize protein ensemble structures. In this protocol, we present the detailed procedure for using DynaXL, which comprises five steps. They are identification of high-confidence cross-links, delineation of protein domains/subunits, ensemble rigid-body refinement, and final validation/assessment. The DynaXL method is generally applicable to analyzing the ensemble structures of multi-domain proteins and protein–protein complexes, and is freely available at www.tanglab.org/resources.
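The distinction between cross-links that a single structure can satisfy and "over-length" ones can be illustrated with a simple distance check. The sketch below assumes Cα coordinates for one structural model and a hypothetical maximum Cα–Cα span of 24 Å for the cross-linker. The real threshold depends on the cross-linker chemistry, and DynaXL goes further by refining an ensemble against the over-length set. The function name is hypothetical and is not part of DynaXL.

```python
import numpy as np


def split_crosslinks(ca_coords, crosslinks, max_span=24.0):
    """Partition cross-linked residue pairs by whether one structure satisfies them.

    ca_coords: (N, 3) Calpha coordinates in angstroms, indexed by residue.
    crosslinks: iterable of (i, j) residue index pairs.
    max_span: assumed maximum Calpha-Calpha distance the cross-linker can bridge.
    """
    coords = np.asarray(ca_coords, dtype=float)
    compatible, over_length = [], []
    for i, j in crosslinks:
        distance = float(np.linalg.norm(coords[i] - coords[j]))
        if distance <= max_span:
            compatible.append((i, j, distance))
        else:
            # Not satisfiable by this single structure; candidates for other conformers.
            over_length.append((i, j, distance))
    return compatible, over_length
```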
Affiliation(s)
- Zhou Gong
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, and National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, 430071 China
- National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, 430071 China
- Zhu Liu
- Department of Pharmacology, Institute of Neuroscience, Key Laboratory of Medical Neurobiology of the Ministry of Health of China, Zhejiang University School of Medicine, Hangzhou, 310057 China
- Xu Dong
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, and National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, 430071 China
- National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, 430071 China
- Yue-He Ding
- RNA Therapeutics Institute, University of Massachusetts Medical School, 368 Plantation Street, Worcester, MA 01605 USA
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, 102206 China
- Chun Tang
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, and National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, 430071 China
- National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, 430071 China
36
Zhang C, Mortuza SM, He B, Wang Y, Zhang Y. Template-based and free modeling of I-TASSER and QUARK pipelines using predicted contact maps in CASP12. Proteins 2017; 86 Suppl 1:136-151. [PMID: 29082551 DOI: 10.1002/prot.25414] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 07/10/2017] [Revised: 10/09/2017] [Accepted: 10/27/2017] [Indexed: 12/26/2022]
Abstract
We develop two complementary pipelines, "Zhang-Server" and "QUARK", based on the I-TASSER and QUARK pipelines for template-based modeling (TBM) and free modeling (FM), and test them in the CASP12 experiment. The combination of I-TASSER and QUARK successfully folds three medium-size FM targets that have more than 150 residues, although the interplay between the two pipelines still awaits further optimization. Newly developed sequence-based contact prediction by NeBcon plays a critical role in enhancing the quality of the models built by the new pipelines, particularly for FM targets. The inclusion of NeBcon-predicted contacts as restraints in the QUARK simulations results in an average TM-score of 0.41 for the best of the top five predicted models, which is 37% higher than that of the QUARK simulations without contacts. In particular, seven targets are converted from non-foldable to foldable (TM-score >0.5) due to the use of contact restraints in the simulations. Another new feature of the current pipelines is local structure quality prediction by ResQ, which provides a robust residue-level estimation of modeling error. Despite this success, significant challenges remain in ab initio modeling of multi-domain proteins and in folding β-proteins with complicated topologies bound by long-range strand-strand interactions. Improvements in domain boundary and long-range contact prediction, as well as optimal use of the predicted contacts and multiple threading alignments, are critical to address these issues seen in the CASP12 experiment.
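The two summary statistics above can be tallied as sketched below: the average TM-score of the best of the top five models, and the targets crossing the TM-score > 0.5 "foldable" line. The target IDs and scores in the example are made up for illustration and are not CASP12 data. The function name is hypothetical.

```python
def summarize_top_five(tm_scores_by_target, foldable_cutoff=0.5):
    """Mean best-of-top-five TM-score and the targets counted as foldable.

    tm_scores_by_target: dict mapping a target ID to the TM-scores of its
    ranked models; only the first five models are considered.
    """
    best = {target: max(scores[:5]) for target, scores in tm_scores_by_target.items()}
    mean_best = sum(best.values()) / len(best)
    foldable = sorted(t for t, score in best.items() if score > foldable_cutoff)
    return mean_best, foldable


with_contacts = {"T1": [0.30, 0.55, 0.20, 0.10, 0.40], "T2": [0.45, 0.30, 0.20, 0.10, 0.20]}
mean_best, foldable = summarize_top_five(with_contacts)
print(round(mean_best, 2), foldable)  # 0.5 ['T1']
```

Running the same summary on runs with and without contact restraints, then comparing the two foldable lists, gives the "non-foldable to foldable" count.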
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Yanting Wang
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
37
Rimmer MA, Nadeau OW, Yang J, Artigues A, Zhang Y, Carlson GM. The structure of the large regulatory α subunit of phosphorylase kinase examined by modeling and hydrogen-deuterium exchange. Protein Sci 2017; 27:472-484. [PMID: 29098725 DOI: 10.1002/pro.3339] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/24/2017] [Revised: 10/19/2017] [Accepted: 10/19/2017] [Indexed: 01/31/2023]
Abstract
Phosphorylase kinase (PhK), a 1.3 MDa regulatory enzyme complex in the glycogenolysis cascade, has four copies each of four subunits, (αβγδ)₄, and 325 kDa of unique sequence (the mass of an αβγδ protomer). The α, β and δ subunits are regulatory and contain allosteric activation sites that stimulate the activity of the catalytic γ subunit in response to diverse signaling molecules. Due to its size and complexity, no high-resolution structures have been solved for the intact complex or its regulatory α and β subunits. Of PhK's four subunits, the least is known about the structure and function of its largest subunit, α. Here, we have modeled the full-length α subunit, compared that structure against previously predicted domains within this subunit, and performed hydrogen-deuterium exchange on the intact subunit within the PhK complex. Our modeling results show α to comprise two major domains: an N-terminal glycoside hydrolase domain and a large C-terminal importin α/β-like domain. This structure is similar to our previously published model for the homologous β subunit, although clear structural differences are present. The overall highly helical structure with several intervening hinge regions is consistent with our hydrogen-deuterium exchange results obtained for this subunit as part of the (αβγδ)₄ PhK complex. Several low-exchanging regions predicted to lack ordered secondary structure are consistent with inter-subunit contact sites for α in the quaternary structure of PhK; of particular interest is a low-exchanging region in the C-terminus of α that is known to bind the regulatory domain of the catalytic γ subunit.
Affiliation(s)
- Mary Ashley Rimmer
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, KS, 66160
- Owen W Nadeau
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, KS, 66160
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, MI, 48109
- Antonio Artigues
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, KS, 66160
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, MI, 48109
- Gerald M Carlson
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, KS, 66160
38
Meng F, Kurgan L. DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences. Bioinformatics 2017; 32:i341-i350. [PMID: 27307636 PMCID: PMC4908364 DOI: 10.1093/bioinformatics/btw280] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 11/22/2022]
Abstract
Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They differ from flexible linkers/residues in that they are disordered and longer. The availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, no computational methods directly predict DFLs; they can be found only indirectly, by filtering predicted flexible residues with predictions of disorder. Results: We conceptualized, developed and empirically assessed a first-of-its-kind sequence-based predictor of DFLs, DFLpred. This method outputs the propensity to form DFLs for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, which are processed by a fast linear model. Our high-throughput predictor can be used at the whole-proteome scale; it needs <1 h to predict an entire proteome on a single CPU. When assessed on an independent test dataset of low sequence-identity proteins, it achieves an area under the receiver operating characteristic curve of 0.715 and outperforms existing alternatives, including methods for the prediction of flexible linkers, flexible residues, intrinsically disordered residues and various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a large content (over 30%) of DFL residues. We also estimate that about 6000 DFL regions are long, with ≥30 consecutive residues. Availability and implementation: http://biomine.ece.ualberta.ca/DFLpred/. Contact: lkurgan@vcu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
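The reported AUC of 0.715 can be read as the probability that a randomly chosen DFL residue receives a higher predicted propensity than a randomly chosen non-DFL residue. A minimal sketch of that pairwise (Mann-Whitney) computation follows; the function name and inputs are illustrative assumptions and are not part of DFLpred.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: per-residue propensities; labels: 1 for a DFL residue, 0 otherwise.
    Tied scores count as half a correct ranking, as in the Mann-Whitney U statistic.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ranked correctly -> 0.75
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

This double loop is quadratic in the number of residues; at proteome scale one would instead sort the scores once and use the rank-sum form of the same statistic.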
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada
- Lukasz Kurgan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada; Department of Computer Science, Virginia Commonwealth University, Richmond, 23284, U.S.A
39
Ding YH, Gong Z, Dong X, Liu K, Liu Z, Liu C, He SM, Dong MQ, Tang C. Modeling Protein Excited-state Structures from "Over-length" Chemical Cross-links. J Biol Chem 2017; 292:1187-1196. [PMID: 27994050 PMCID: PMC5270465 DOI: 10.1074/jbc.m116.761841] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 10/04/2016] [Revised: 11/25/2016] [Indexed: 11/06/2022]
Abstract
Chemical cross-linking coupled with mass spectrometry (CXMS) provides proximity information for the cross-linked residues and is used increasingly for modeling protein structures. However, experimentally identified cross-links are sometimes incompatible with the known structure of a protein, as the distance calculated between the cross-linked residues far exceeds the maximum length of the cross-linker. The discrepancies may persist even after eliminating potentially false cross-links and excluding intermolecular ones. Thus, the "over-length" cross-links may arise from alternative excited-state conformations of the protein. Here we present a method and associated software, DynaXL, for visualizing the ensemble structures of multidomain proteins based on intramolecular cross-links identified by mass spectrometry with high confidence. Representing the cross-linkers and cross-linking reactions explicitly, we show that the protein excited-state structure can be modeled with as few as two over-length cross-links. We demonstrate the generality of our method with three systems, calmodulin, enzyme I, and glutamine-binding protein, and we show that these proteins alternate between different conformations for interacting with other proteins and ligands. Taken together, the over-length chemical cross-links contain valuable information about protein dynamics, and our findings illustrate the relationship between dynamic domain movement and protein function.
Affiliation(s)
- Yue-He Ding
- National Institute of Biological Sciences, Beijing 102206
- Zhou Gong
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, Hubei Province 430071
- Xu Dong
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, Hubei Province 430071
- Kan Liu
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, Hubei Province 430071
- Zhu Liu
- Department of Pharmacology, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058
- Chao Liu
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, CAS, Beijing 100190, China
- Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences, Institute of Computing Technology, CAS, Beijing 100190, China
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing 102206
- Chun Tang
- CAS Key Laboratory of Magnetic Resonance in Biological Systems, State Key Laboratory of Magnetic Resonance and Atomic Molecular Physics, National Center for Magnetic Resonance at Wuhan, Wuhan Institute of Physics and Mathematics of the Chinese Academy of Sciences, Wuhan, Hubei Province 430071
- Department of Pharmacology, Zhejiang University School of Medicine, Hangzhou, Zhejiang Province 310058
40
Richa T, Ide S, Suzuki R, Ebina T, Kuroda Y. Fast H-DROP: A thirty times accelerated version of H-DROP for interactive SVM-based prediction of helical domain linkers. J Comput Aided Mol Des 2016; 31:237-244. [PMID: 28028736 DOI: 10.1007/s10822-016-9999-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2016] [Accepted: 12/10/2016] [Indexed: 10/20/2022]
Abstract
Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty-times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Like H-DROP, Fast H-DROP uses optimal features selected from a set of 3000 candidates by combining a random forest and a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP, on a Linux server with eight Xeon processors, by using SWISS-PROT instead of the GenBank non-redundant (nr) database for generating the PSSMs. The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7% and 36.2%, merely ~2% lower than those of H-DROP. The reduced computational time of Fast H-DROP, without substantially affecting prediction performance, makes it more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
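As a quick check of the "thirty times" figure, converting the per-sequence run times quoted above to seconds gives the following ratio (simple arithmetic on the reported numbers, not an additional benchmark):

```latex
\text{speed-up} = \frac{8.5\ \text{min} \times 60\ \text{s\,min}^{-1}}{14\ \text{s}}
               = \frac{510\ \text{s}}{14\ \text{s}} \approx 36
```

The title's "thirty times" is therefore a conservative rounding of roughly a 36-fold reduction per sequence.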
Affiliation(s)
- Tambi Richa
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
- Soichiro Ide
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
- Ryosuke Suzuki
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
- Teppei Ebina
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan; Department of Physiology, Graduate school of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Yutaka Kuroda
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 12-24-16 Nakamachi, Koganei-shi, Tokyo, 184-8588, Japan
41
Vo JN, Campbell PR, Mahfuz NN, Ramli R, Pagendam D, Barnard R, Geering ADW. Characterization of the banana streak virus capsid protein and mapping of the immunodominant continuous B-cell epitopes to the surface-exposed N terminus. J Gen Virol 2016; 97:3446-3457. [DOI: 10.1099/jgv.0.000643] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]
Affiliation(s)
- Jenny N. Vo
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, GPO Box 267, Brisbane, Queensland 4001, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Queensland 4072, Australia
- Plant Biosecurity Cooperative Research Centre, LPO Box 5012, Bruce, Australian Capital Territory 2617, Australia
- Paul R. Campbell
- Plant Biosecurity Cooperative Research Centre, LPO Box 5012, Bruce, Australian Capital Territory 2617, Australia
- Queensland Department of Agriculture, Fisheries and Forestry, GPO Box 267, Brisbane, Queensland 4001, Australia
- Nur N. Mahfuz
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Queensland 4072, Australia
- Ras Ramli
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Queensland 4072, Australia
- Daniel Pagendam
- CSIRO Mathematics, Informatics and Statistics, Ecosciences Precinct, 41 Boggo Road, Dutton Park, Queensland 4102, Australia
- Ross Barnard
- School of Chemistry and Molecular Biosciences, The University of Queensland, St. Lucia, Queensland 4072, Australia
- Andrew D. W. Geering
- Plant Biosecurity Cooperative Research Centre, LPO Box 5012, Bruce, Australian Capital Territory 2617, Australia
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, GPO Box 267, Brisbane, Queensland 4001, Australia
42
Faheem M, Martins-de-Sa D, Vidal JFD, Álvares ACM, Brandão-Neto J, Bird LE, Tully MD, von Delft F, Souto BM, Quirino BF, Freitas SM, Barbosa JARG. Functional and structural characterization of a novel putative cysteine protease cell wall-modifying multi-domain enzyme selected from a microbial metagenome. Sci Rep 2016; 6:38031. [PMID: 27934875 PMCID: PMC5146660 DOI: 10.1038/srep38031] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/14/2016] [Accepted: 11/04/2016] [Indexed: 12/13/2022]
Abstract
A current focus of metagenomics is to interpret and transform collected genomic data into biological information. By combining structural, functional and genomic data, we have assessed a novel bacterial protein selected from a carbohydrate-related activity screen of a microbial metagenomic library from the gut of Capra hircus (domestic goat). This uncharacterized protein was predicted to be a bacterial cell wall-modifying enzyme (CWME) and shown to contain four domains: an N-terminal domain, a cysteine protease domain, a peptidoglycan-binding domain and a bacterial SH3 domain. We successfully cloned, expressed and purified this putative cysteine protease (PCP), which displayed autoproteolytic activity and was inhibited by protease inhibitors. We observed cell wall hydrolytic activity and ampicillin binding capacity, a characteristic of most bacterial CWMEs. Fluorimetric binding analysis yielded a Kb of 1.8 × 10⁵ M⁻¹ for ampicillin. Small-angle X-ray scattering (SAXS) showed a maximum particle dimension of 95 Å with a real-space Rg of 28.35 Å. The elongated molecular envelope corroborates the size estimated by dynamic light scattering (DLS). Furthermore, homology modeling and SAXS allowed the construction of a model that explains the stability and secondary structural changes observed by circular dichroism (CD). In short, we report a novel cell wall-modifying autoproteolytic PCP and provide insight into its biochemical, biophysical and structural features.
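For readers more used to dissociation constants, the ampicillin binding constant converts directly, assuming a simple 1:1 binding model (an assumption; the abstract does not state the model). The free-energy line additionally assumes a temperature of 298 K, which the abstract does not give.

```latex
K_d = \frac{1}{K_b} = \frac{1}{1.8 \times 10^{5}\ \mathrm{M^{-1}}} \approx 5.6 \times 10^{-6}\ \mathrm{M} = 5.6\ \mu\mathrm{M}

\Delta G^\circ = -RT \ln K_b \approx -(8.314\ \mathrm{J\,mol^{-1}\,K^{-1}})(298\ \mathrm{K}) \ln\!\left(1.8 \times 10^{5}\right) \approx -30\ \mathrm{kJ\,mol^{-1}}
```

A low-micromolar K_d is a useful reference point when comparing this binding against other penicillin-binding or cell wall-modifying enzymes.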
Affiliation(s)
- Muhammad Faheem
- Laboratório de Biofísica Molecular, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Programa de Pós Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil
- Diogo Martins-de-Sa
- Laboratório de Biofísica Molecular, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Julia F. D. Vidal
- Laboratório de Biofísica Molecular, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Alice C. M. Álvares
- Laboratório de Biofísica Molecular, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- José Brandão-Neto
- Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot, OX11 0QX, England
- Louise E. Bird
- OPPF-UK, Research Complex at Harwell, Rutherford Appleton Laboratory, Oxford, OX11 0FA, United Kingdom
- Mark D. Tully
- Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot, OX11 0QX, England
- Frank von Delft
- Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot, OX11 0QX, England
- Structural Genomics Consortium, Nuffield Department of Medicine, University of Oxford, Roosevelt Drive, Oxford, OX3 7DQ, UK
- Department of Biochemistry, University of Johannesburg, Auckland Park, 2006, South Africa
- Betulia M. Souto
- Embrapa Agroenergia, Parque Estação Biológica - PqEB s/n°, Brasília, DF, 70770-901, Brazil
- Betania F. Quirino
- Programa de Pós Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil
- Embrapa Agroenergia, Parque Estação Biológica - PqEB s/n°, Brasília, DF, 70770-901, Brazil
- Sonia M. Freitas
- Laboratório de Biofísica Molecular, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- João Alexandre R. G. Barbosa
- Laboratório de Biofísica Molecular, Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Programa de Pós Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil
43
Stojanoski V, Sankaran B, Prasad BVV, Poirel L, Nordmann P, Palzkill T. Structure of the catalytic domain of the colistin resistance enzyme MCR-1. BMC Biol 2016; 14:81. [PMID: 27655155 PMCID: PMC5031297 DOI: 10.1186/s12915-016-0303-0] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 06/01/2016] [Accepted: 08/31/2016] [Indexed: 11/18/2022]
Abstract
Background: Due to the paucity of novel antibiotics, colistin has become a last-resort antibiotic for treating multidrug-resistant bacteria. Colistin acts by binding the lipid A component of lipopolysaccharides and subsequently disrupting the bacterial membrane. The recently identified plasmid-encoded MCR-1 enzyme is the first transmissible colistin resistance determinant and is a cause for concern for the spread of this resistance trait. MCR-1 is a phosphoethanolamine transferase that catalyzes the addition of phosphoethanolamine to lipid A to decrease colistin affinity. Results: The 1.32 Å structure of the catalytic domain of MCR-1 (cMCR-1) reveals that the active site is similar to those of related phosphoethanolamine transferases. Conclusions: The putative nucleophile for catalysis, threonine 285, is phosphorylated in cMCR-1, and a zinc is present at a conserved site in addition to three zincs located more peripherally in the active site. As noted for the catalytic domains of other phosphoethanolamine transferases, binding sites for the lipid A and phosphatidylethanolamine substrates are not apparent in the cMCR-1 structure, suggesting that they are present in the membrane domain. Electronic supplementary material: The online version of this article (doi:10.1186/s12915-016-0303-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Vlatko Stojanoski
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA
- Banumathi Sankaran
- Berkeley Center for Structural Biology, Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
- B V Venkataram Prasad
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, 77030, USA
- Laurent Poirel
- Department of Medicine, Medical and Molecular Microbiology "Emerging Antibiotic Resistance" Unit and European INSERM Laboratory, IAME, University of Fribourg, Fribourg, Switzerland
- Patrice Nordmann
- Department of Medicine, Medical and Molecular Microbiology "Emerging Antibiotic Resistance" Unit and European INSERM Laboratory, IAME, University of Fribourg, Fribourg, Switzerland; University of Lausanne, University Hospital Center, Lausanne, Switzerland
- Timothy Palzkill
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Pharmacology, Baylor College of Medicine, Houston, TX, 77030, USA
44
Chatterjee P, Basu S, Zubek J, Kundu M, Nasipuri M, Plewczynski D. PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach. J Mol Model 2016; 22:72. [PMID: 26969678 PMCID: PMC4788683 DOI: 10.1007/s00894-016-2933-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/30/2015] [Accepted: 02/17/2016] [Indexed: 01/04/2023]
Abstract
The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers (decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron) were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.
Affiliation(s)
- Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Garia, Kolkata, 700152, India
- Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Julian Zubek
- Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland; Center of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
- Mahantapas Kundu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Dariusz Plewczynski
- Center of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland; Faculty of Pharmacy, Medical University of Warsaw, Warsaw, Poland
45
Belsom A, Schneider M, Fischer L, Brock O, Rappsilber J. Serum Albumin Domain Structures in Human Blood Serum by Mass Spectrometry and Computational Biology. Mol Cell Proteomics 2016; 15:1105-16. [PMID: 26385339 PMCID: PMC4813692 DOI: 10.1074/mcp.m115.048504] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Received: 01/29/2015] [Revised: 09/16/2015] [Indexed: 01/12/2023]
Abstract
Chemical cross-linking combined with mass spectrometry has proven useful for studying protein-protein interactions and protein structure; however, the low density of cross-link data has so far precluded its use in determining structures de novo. Cross-linking density has typically been limited by the chemical selectivity of the standard reagents commonly used for protein cross-linking. We have implemented the use of a heterobifunctional cross-linking reagent, sulfosuccinimidyl 4,4'-azipentanoate (sulfo-SDA), which combines a traditional sulfo-N-hydroxysuccinimide (sulfo-NHS) ester and a UV-photoactivatable diazirine group. This diazirine yields a highly reactive and promiscuous carbene species, the net result being a greatly increased number of cross-links compared with homobifunctional, NHS-based cross-linkers. We present a novel methodology that combines this high-density photo-cross-linking data with conformational space search to investigate the structure of human serum albumin domains, both from purified samples and in its native environment, human blood serum. Our approach is able to determine human serum albumin domain structures with good accuracy: root-mean-square deviations to the crystal structure are 2.8/5.6/2.9 Å (purified samples) and 4.5/5.9/4.8 Å (serum samples) for domains A/B/C for the first selected structure, and 2.5/4.9/2.9 Å (purified samples) and 3.5/5.2/3.8 Å (serum samples) for the best of the top five selected structures. Our proof-of-concept study on human serum albumin demonstrates the initial potential of our approach for determining the structures of more proteins in the complex biological contexts in which they function and which they may require for correct folding. Data are available via ProteomeXchange with identifier PXD001692.
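The accuracies above are RMSDs to the crystal structure, and such comparisons are conventionally made after optimal rigid-body superposition. As an illustration only, the sketch below performs that superposition with the Kabsch algorithm in NumPy, assuming the two arrays hold corresponding atoms (for example, Cα atoms of the same residues) in the same order; the function name `kabsch_rmsd` is hypothetical and not part of the authors' software.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate P onto Q (row vectors, hence R.T) and average the squared deviations.
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

The determinant check guards against the SVD yielding an improper rotation, which would otherwise let a mirror-image model appear closer to the reference than it really is.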
Affiliation(s)
- Adam Belsom
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Michael Schneider
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587 Berlin, Germany
- Lutz Fischer
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Oliver Brock
- Robotics and Biology Laboratory, Technische Universität Berlin, 10587 Berlin, Germany
- Juri Rappsilber
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom; Department of Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
46
Yang J, Zhang Y. Protein Structure and Function Prediction Using I-TASSER. CURRENT PROTOCOLS IN BIOINFORMATICS 2015; 52:5.8.1-5.8.15. [PMID: 26678386 PMCID: PMC4871818 DOI: 10.1002/0471250953.bi0508s52] [Citation(s) in RCA: 304] [Impact Index Per Article: 33.8] [Indexed: 11/06/2022]
Abstract
I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function predictions and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distantly homologous and multi-domain protein targets.
Affiliation(s)
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- School of Mathematical Sciences, Nankai University, Tianjin, People's Republic of China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
47
Xue Z, Jang R, Govindarajoo B, Huang Y, Wang Y. Extending Protein Domain Boundary Predictors to Detect Discontinuous Domains. PLoS One 2015; 10:e0141541. [PMID: 26502173 PMCID: PMC4621036 DOI: 10.1371/journal.pone.0141541] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/03/2015] [Accepted: 10/10/2015] [Indexed: 11/18/2022]
Abstract
A variety of protein domain predictors have been developed in recent years to predict protein domain boundaries, but most of them cannot predict discontinuous domains. Because nearly 40% of multidomain proteins contain one or more discontinuous domains, we have developed DomEx to enable domain boundary predictors to detect discontinuous domains by assembling the continuous domain segments. Discontinuous domains are predicted by matching the sequence profile of concatenated continuous domain segments with the profiles from a single-domain library derived from SCOP, CATH, and Pfam. The matches are then filtered by similarity to library templates, a symmetric index score and a profile-profile alignment score. DomEx recalled 32.3% of discontinuous domains with 86.5% precision when tested on 97 non-homologous protein chains containing 58 continuous and 99 discontinuous domains, with predicted domain segments counted as correct when they lie within ±20 residues of the boundary definitions in CATH 3.5. Compared with our recently developed predictor ThreaDom, the state-of-the-art tool for detecting discontinuous domains, DomEx recalled 26.7% of discontinuous domains with 72.7% precision in a benchmark of 29 discontinuous-domain chains, on which ThreaDom failed to predict any discontinuous domains. Furthermore, when combined with ThreaDom, the method ranked first among 10 predictors. The source code and datasets are available at https://github.com/xuezhidong/DomEx.
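The ±20-residue criterion used in this evaluation can be made concrete with a small matching routine. This is a simplified sketch under our own assumptions: a predicted discontinuous domain is counted as correct only if it has the same number of segments as a still-unmatched reference domain and every segment start and end lies within the tolerance. DomEx's actual scoring may differ in its details, and all names here are hypothetical.

```python
Segment = tuple[int, int]  # (start, end) residue indices, inclusive


def domains_match(pred: list[Segment], ref: list[Segment], tol: int = 20) -> bool:
    """True if both domains have the same number of segments and every
    segment start/end agrees within +/- tol residues (segments compared in order)."""
    if len(pred) != len(ref):
        return False
    return all(
        abs(ps - rs) <= tol and abs(pe - rend) <= tol
        for (ps, pe), (rs, rend) in zip(sorted(pred), sorted(ref))
    )


def precision_recall(predicted: list[list[Segment]],
                     reference: list[list[Segment]],
                     tol: int = 20) -> tuple[float, float]:
    """Precision and recall over discontinuous domains; each reference
    domain can be matched by at most one prediction."""
    unmatched = list(reference)
    hits = 0
    for dom in predicted:
        for i, ref in enumerate(unmatched):
            if domains_match(dom, ref, tol):
                hits += 1
                del unmatched[i]  # safe: we break out of the loop immediately
                break
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

Matching each reference domain at most once keeps duplicate predictions from inflating precision, which matters when a predictor reports overlapping candidate assemblies.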
Affiliation(s)
- Zhidong Xue
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Richard Jang
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, United States of America
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, United States of America
- Yichu Huang
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Yan Wang
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
48
Zhang W, Yang J, He B, Walker SE, Zhang H, Govindarajoo B, Virtanen J, Xue Z, Shen HB, Zhang Y. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11. Proteins 2015; 84 Suppl 1:76-86. [PMID: 26370505 DOI: 10.1002/prot.24930] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 05/28/2015] [Revised: 08/26/2015] [Accepted: 09/10/2015] [Indexed: 11/12/2022]
Abstract
We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. For five free-modeling (FM) targets, QUARK successfully constructed models with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score of 0.736 and an RMSD of 2.9 Å to the native structure. Detailed analysis showed that this success is partly attributable to high-resolution contact-map prediction derived from fragment-based distance profiles; these contacts are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by their structural similarity to the ab initio folding models and are then reassembled by I-TASSER-based structure assembly simulations; compared with the QUARK pipeline, the I-TASSER pipeline successfully modeled 60% more domains (with lengths up to 204 residues) with a TM-score above 0.4. The robustness of the I-TASSER pipeline may stem from the composite fragment-assembly simulations, which combine structures from both ab initio folding and threading template refinement. Despite these promising cases, challenges remain in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the last of these was found to affect nearly all aspects of FM structure prediction, from fragment identification and target classification to structure assembly and final model selection. Significant efforts are needed to solve these problems before real progress on FM can be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc.
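The abstract uses a TM-score above 0.4 as its success threshold. For reference, the standard TM-score is TM = (1/L_N) Σ_i 1/(1 + (d_i/d_0)^2), with d_0 = 1.24 (L_N - 15)^(1/3) - 1.8 Å, where L_N is the native length and d_i the distance between the i-th aligned residue pair. The sketch below evaluates that formula for one fixed superposition only; the actual TM-score program also searches for the superposition that maximizes the score, which is omitted here. The 0.5 Å floor on d_0 and the function names `d0` and `tm_score_fixed` are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def distance(p: Coord, q: Coord) -> float:
    """Euclidean distance between two points in Å."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def d0(length: int) -> float:
    """Length-dependent distance scale; floored at 0.5 Å (assumed convention)."""
    if length <= 15:  # avoid a negative base under the cube root
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(model: Sequence[Coord], native: Sequence[Coord], native_length: int) -> float:
    """TM-score for ONE given superposition; no search over rotations.

    model[i] and native[i] are paired (aligned) C-alpha coordinates. The sum is
    normalized by the full native length, so unaligned residues lower the score.
    """
    if native_length <= 0:
        raise ValueError("native_length must be positive")
    scale = d0(native_length)
    total = sum(1.0 / (1.0 + (distance(p, q) / scale) ** 2) for p, q in zip(model, native))
    return total / native_length
```

A perfect model scores 1.0, and a residue pair separated by exactly d_0 contributes half as much as a perfectly placed one; this length-dependent scaling is what lets a single cutoff such as 0.4 be applied across targets of different sizes.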
Affiliation(s)
- Wenxuan Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Sara Elizabeth Walker
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
49
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015] [Indexed: 11/08/2022]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and to guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) studying protein structure or evolution (27% of articles), B) training and/or benchmarking algorithms (28% of articles), C) augmenting non-SCOP datasets with SCOP classification (21% of articles), and D) examining the classification of one protein or a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Affiliation(s)
- Naomi K Fox
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Steven E Brenner
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
- John-Marc Chandonia
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
50
Yang J, Zhang W, He B, Walker SE, Zhang H, Govindarajoo B, Virtanen J, Xue Z, Shen HB, Zhang Y. Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade. Proteins 2015; 84 Suppl 1:233-46. [PMID: 26343917 DOI: 10.1002/prot.24918] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 05/27/2015] [Revised: 08/13/2015] [Accepted: 08/31/2015] [Indexed: 01/26/2023]
Abstract
We report the structure prediction results of a new composite pipeline for template-based modeling (TBM) in the 11th CASP experiment. Starting from multiple structure templates identified by the LOMETS-based meta-threading programs, the QUARK ab initio folding program is extended to generate initial full-length models under strong constraints from template alignments. The final atomic models are then constructed by I-TASSER-based fragment reassembly simulations, followed by fragment-guided molecular dynamics simulation and MQAP-based model selection. The inclusion of QUARK-TBM simulations as an intermediate modeling step was found to help improve the quality of the I-TASSER models for both Easy and Hard TBM targets. Overall, the average TM-score of the first I-TASSER model is 12% higher than that of the best LOMETS templates, with the RMSD in the same threading-aligned regions reduced from 5.8 to 4.7 Å. Nevertheless, for nearly 18% of TBM domains the structure assembly pipeline deteriorated the templates, which may be attributed to errors in secondary structure and domain orientation predictions that propagate through and degrade the template identification and final model selection procedures. To examine the record of progress, we present a retrospective report on the I-TASSER pipeline in the last five CASP experiments (CASP7-11). The data show no clear progress of the LOMETS threading programs over PSI-BLAST, but obvious progress in structural improvement relative to threading templates was observed in recent CASP experiments, which is probably attributable to the integration of extended ab initio folding simulation with the threading assembly pipeline and the introduction of atomic-level structure refinement following the reduced modeling simulations. Proteins 2016; 84(Suppl 1):233-246. © 2015 Wiley Periodicals, Inc.
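The abstract compares RMSD values computed only over threading-aligned regions. A minimal sketch of such an aligned-region RMSD follows, assuming the model and native coordinates are already optimally superposed (a full evaluation would first superpose them, for example with the Kabsch algorithm, which is omitted here). The alignment format (a list of model/native index pairs) and the names `rmsd` and `aligned_rmsd` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation of paired, pre-superposed coordinates in Å."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq_sum = sum(sum((x - y) ** 2 for x, y in zip(p, q)) for p, q in zip(a, b))
    return math.sqrt(sq_sum / len(a))


def aligned_rmsd(model: Sequence[Coord], native: Sequence[Coord],
                 alignment: List[Tuple[int, int]]) -> float:
    """RMSD restricted to aligned residues; alignment holds (model_idx, native_idx)."""
    return rmsd([model[i] for i, _ in alignment], [native[j] for _, j in alignment])
```

Because only aligned residues enter the sum, two models can be compared over "the same threading-aligned regions" as the abstract does. On the reported numbers, the drop from 5.8 to 4.7 Å corresponds to a relative reduction of about 19% ((5.8 - 4.7) / 5.8 ≈ 0.19).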
Affiliation(s)
- Jianyi Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Wenxuan Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Baoji He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Sara Elizabeth Walker
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Brandon Govindarajoo
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, 48109