1
|
Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
Estimation of model accuracy plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advancements made by AlphaFold2 in monomer structure, the problem of single-domain protein structure prediction has been widely solved. Correspondingly, the importance of assessing the quality of single-domain protein models decreased, and the research focus has shifted to estimation of model accuracy of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, as well as representative methods, and the current challenges within four distinct facets (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score) in the field of complex EMA.
Collapse
Affiliation(s)
| | | | - Lei Xie
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Xuanfeng Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Dong Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
Collapse
|
2
|
Power KM, Nguyen KC, Silva A, Singh S, Hall DH, Rongo C, Barr MM. NEKL-4 regulates microtubule stability and mitochondrial health in ciliated neurons. J Cell Biol 2024; 223:e202402006. [PMID: 38767515 PMCID: PMC11104396 DOI: 10.1083/jcb.202402006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2024] [Revised: 04/10/2024] [Accepted: 05/06/2024] [Indexed: 05/22/2024] Open
Abstract
Ciliopathies are often caused by defects in the ciliary microtubule core. Glutamylation is abundant in cilia, and its dysregulation may contribute to ciliopathies and neurodegeneration. Mutation of the deglutamylase CCP1 causes infantile-onset neurodegeneration. In C. elegans, ccpp-1 loss causes age-related ciliary degradation that is suppressed by a mutation in the conserved NEK10 homolog nekl-4. NEKL-4 is absent from cilia, yet it negatively regulates ciliary stability via an unknown, glutamylation-independent mechanism. We show that NEKL-4 was mitochondria-associated. Additionally, nekl-4 mutants had longer mitochondria, a higher baseline mitochondrial oxidation state, and suppressed ccpp-1∆ mutant lifespan extension in response to oxidative stress. A kinase-dead nekl-4(KD) mutant ectopically localized to ccpp-1∆ cilia and rescued degenerating microtubule doublet B-tubules. A nondegradable nekl-4(PEST∆) mutant resembled the ccpp-1∆ mutant with dye-filling defects and B-tubule breaks. The nekl-4(PEST∆) Dyf phenotype was suppressed by mutation in the depolymerizing kinesin-8 KLP-13/KIF19A. We conclude that NEKL-4 influences ciliary stability by activating ciliary kinesins and promoting mitochondrial homeostasis.
Collapse
Affiliation(s)
- Kaiden M. Power
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
| | - Ken C. Nguyen
- Center for C. elegans Anatomy, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Andriele Silva
- Department of Biology, Brooklyn College of the City University of New York, Brooklyn, NY, USA
| | - Shaneen Singh
- Department of Biology, Brooklyn College of the City University of New York, Brooklyn, NY, USA
| | - David H. Hall
- Center for C. elegans Anatomy, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Christopher Rongo
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, USA
| | - Maureen M. Barr
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
| |
Collapse
|
3
|
Morehead A, Liu J, Cheng J. Protein structure accuracy estimation using geometry-complete perceptron networks. Protein Sci 2024; 33:e4932. [PMID: 38380738 PMCID: PMC10880424 DOI: 10.1002/pro.4932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2023] [Revised: 01/05/2024] [Accepted: 02/01/2024] [Indexed: 02/22/2024]
Abstract
Estimating the accuracy of protein structural models is a critical task in protein bioinformatics. The need for robust methods in the estimation of protein model accuracy (EMA) is prevalent in the field of protein structure prediction, where computationally-predicted structures need to be screened rapidly for the reliability of the positions predicted for each of their amino acid residues and their overall quality. Current methods proposed for EMA are either coupled tightly to existing protein structure prediction methods or evaluate protein structures without sufficiently leveraging the rich, geometric information available in such structures to guide accuracy estimation. In this work, we propose a geometric message passing neural network referred to as the geometry-complete perceptron network for protein structure EMA (GCPNet-EMA), where we demonstrate through rigorous computational benchmarks that GCPNet-EMA's accuracy estimations are 47% faster and more than 10% (6%) more correlated with ground-truth measures of per-residue (per-target) structural accuracy compared to baseline state-of-the-art methods for tertiary (multimer) structure EMA including AlphaFold 2. The source code and data for GCPNet-EMA are available on GitHub, and a public web server implementation is freely available.
Collapse
Affiliation(s)
- Alex Morehead
- Department of Electrical Engineering and Computer ScienceUniversity of MissouriColumbiaMissouriUSA
| | - Jian Liu
- Department of Electrical Engineering and Computer ScienceUniversity of MissouriColumbiaMissouriUSA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer ScienceUniversity of MissouriColumbiaMissouriUSA
| |
Collapse
|
4
|
Power KM, Nguyen KC, Silva A, Singh S, Hall DH, Rongo C, Barr MM. NEKL-4 regulates microtubule stability and mitochondrial health in C. elegans ciliated neurons. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.14.580304. [PMID: 38405845 PMCID: PMC10888866 DOI: 10.1101/2024.02.14.580304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/27/2024]
Abstract
Ciliopathies are often caused by defects in the ciliary microtubule core. Glutamylation is abundant in cilia, and its dysregulation may contribute to ciliopathies and neurodegeneration. Mutation of the deglutamylase CCP1 causes infantile-onset neurodegeneration. In C. elegans, ccpp-1 loss causes age-related ciliary degradation that is suppressed by mutation in the conserved NEK10 homolog nekl-4. NEKL-4 is absent from cilia, yet negatively regulates ciliary stability via an unknown, glutamylation-independent mechanism. We show that NEKL-4 was mitochondria-associated. nekl-4 mutants had longer mitochondria, a higher baseline mitochondrial oxidation state, and suppressed ccpp-1 mutant lifespan extension in response to oxidative stress. A kinase-dead nekl-4(KD) mutant ectopically localized to ccpp-1 cilia and rescued degenerating microtubule doublet B-tubules. A nondegradable nekl-4(PESTΔ) mutant resembled the ccpp-1 mutant with dye filling defects and B-tubule breaks. The nekl-4(PESTΔ) Dyf phenotype was suppressed by mutation in the depolymerizing kinesin-8 KLP-13/KIF19A. We conclude that NEKL-4 influences ciliary stability by activating ciliary kinesins and promoting mitochondrial homeostasis.
Collapse
Affiliation(s)
- Kaiden M Power
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, United States of America
| | - Ken C Nguyen
- Center for C. elegans Anatomy, Albert Einstein College of Medicine, Bronx, NY, United States of America
| | - Andriele Silva
- Department of Biology, Brooklyn College of the City University of New York, Brooklyn, NY, United States of America
| | - Shaneen Singh
- Department of Biology, Brooklyn College of the City University of New York, Brooklyn, NY, United States of America
| | - David H Hall
- Center for C. elegans Anatomy, Albert Einstein College of Medicine, Bronx, NY, United States of America
| | - Christopher Rongo
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, United States of America
| | - Maureen M Barr
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, United States of America
| |
Collapse
|
5
|
Saeed A, Alharazi T, Alshaghdali K, Rezgui R, Elnaem I, Alreshidi BAT, Tasleem M, Saeed M. Targeting GluR3 in Depression and Alzheimer's Disease: Novel Compounds and Therapeutic Prospects. J Alzheimers Dis 2024; 97:1299-1312. [PMID: 38277291 DOI: 10.3233/jad-230821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2024]
Abstract
BACKGROUND The present study investigates the interrelated pathophysiology of depression and Alzheimer's disease (AD), with the objective of elucidating common underlying mechanisms. OBJECTIVE Our objective is to identify previously undiscovered biogenic compounds from the NuBBE database that specifically interact with GluR3. This study examines the bidirectional association between depression and AD, specifically focusing on the role of depression as a risk factor in the onset and progression of the disease. METHODS In this study, we utilize pharmacokinetics, homology modeling, and molecular docking-based virtual screening techniques to examine the GluR3 AMPA receptor subunit. RESULTS The compounds, namely ZINC000002558953, ZINC000001228056, ZINC000000187911, ZINC000003954487, and ZINC000002040988, exhibited favorable pharmacokinetic profiles and drug-like characteristics, displaying high binding affinities to the GluR3 binding pocket. CONCLUSIONS These findings suggest that targeting GluR3 could hold promise for the development of therapies for depression and AD. Further validation through in vitro, in vivo, and clinical studies is necessary to explore the potential of these compounds as lead candidates for potent and selective GluR3 inhibitors. The shared molecular mechanisms between depression and AD provide an opportunity for novel treatment approaches that address both conditions simultaneously.
Collapse
Affiliation(s)
- Amir Saeed
- Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
- Department of Medical Microbiology, Faculty of Medical Laboratory Sciences, University of Medical Sciences & Technology, Khartoum, Sudan
| | - Talal Alharazi
- Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
| | - Khalid Alshaghdali
- Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
| | - Raja Rezgui
- Department of Medical Laboratory Sciences, College of Applied Medical Sciences, University of Hail, Hail, Saudi Arabia
| | - Ibtihag Elnaem
- Department of oral and maxillofacial surgery and diagnostic science College of Dentistry, University of Hail, Hail, Saudi Arabia
| | | | - Munazzah Tasleem
- School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| | - Mohd Saeed
- Department of Biology, College of Science, University of Hail, Hail, Saudi Arabia
| |
Collapse
|
6
|
Liu J, Liu D, He G, Zhang G. Estimating protein complex model accuracy based on ultrafast shape recognition and deep learning in CASP15. Proteins 2023; 91:1861-1870. [PMID: 37553848 DOI: 10.1002/prot.26564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2023] [Revised: 07/05/2023] [Accepted: 07/11/2023] [Indexed: 08/10/2023]
Abstract
This article reports and analyzes the results of protein complex model accuracy estimation by our methods (DeepUMQA3 and GraphGPSM) in the 15th Critical Assessment of techniques for protein Structure Prediction (CASP15). The new deep learning-based multimeric complex model accuracy estimation methods are proposed based on the ensemble of three-level features coupling with deep residual/graph neural networks. For the input multimeric complex model, we describe it from three levels: overall complex features, intra-monomer features, and inter-monomer features. We designed an overall ultrafast shape recognition (USR) to characterize the relationship between local residues and the overall complex topology, and an inter-monomer USR to characterize the relationship between the residues of one monomer and the topology of other monomers. DeepUMQA3 (Group name: GuijunLab-RocketX) ranked first in the interface residue accuracy estimation of CASP15. The Pearson correlation between the interface residue Local Distance Difference Test (lDDT) predicted by DeepUMQA3 and the real lDDT is 0.570, the only method that exceeds 0.5. Among the top 5 methods, DeepUMQA3 achieved the highest Pearson correlation of lDDT on 25 out of 39 targets. GraphGPSM (Group name: GuijunLab-PAthreader) has TM-score Pearson correlations greater than 0.9 on 14 targets, showing a good ability to estimate the overall fold accuracy. The DeepUMQA3 server is available at http://zhanglab-bioinf.com/DeepUMQA/ and the GraphGPSM server is available at http://zhanglab-bioinf.com/GraphGPSM/.
Collapse
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
| | - Dong Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
| | - Guangxing He
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
| |
Collapse
|
7
|
Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023; 24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning, since we expect a superhuman intelligence that explores beyond our knowledge to interpret the genome from deep learning. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Collapse
Affiliation(s)
- Tianwei Yue
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (Y.W.); (L.Z.); (W.W.)
| | - Yuanxin Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (Y.W.); (L.Z.); (W.W.)
| | - Longxiang Zhang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (Y.W.); (L.Z.); (W.W.)
| | - Chunming Gu
- Department of Biomedical Engineering, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA;
| | - Haoru Xue
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA;
| | - Wenping Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (Y.W.); (L.Z.); (W.W.)
| | - Qi Lyu
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA;
| | - Yujie Dun
- School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China;
| |
Collapse
|
8
|
Roy S, Ben-Hur A. Protein quality assessment with a loss function designed for high-quality decoys. FRONTIERS IN BIOINFORMATICS 2023; 3:1198218. [PMID: 37915563 PMCID: PMC10616882 DOI: 10.3389/fbinf.2023.1198218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Accepted: 09/29/2023] [Indexed: 11/03/2023] Open
Abstract
Motivation: The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
Collapse
Affiliation(s)
| | - Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO, United States
| |
Collapse
|
9
|
Tasleem M, El-Sayed AAAA, Hussein WM, Alrehaily A. Pseudomonas putida Metallothionein: Structural Analysis and Implications of Sustainable Heavy Metal Detoxification in Madinah. TOXICS 2023; 11:864. [PMID: 37888714 PMCID: PMC10611128 DOI: 10.3390/toxics11100864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Revised: 10/09/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Heavy metals, specifically cadmium (Cd) and lead (Pb), contaminating water bodies of Madinah (Saudi Arabia), is a significant environmental concern that necessitates prompt action. Madinah is exposed to toxic metals from multiple sources, such as tobacco, fresh and canned foods, and industrial activities. This influx of toxic metals presents potential hazards to both human health and the surrounding environment. The aim of this study is to explore the viability of utilizing metallothionein from Pseudomonas putida (P. putida) as a method of bioremediation to mitigate the deleterious effects of pollution attributable to Pb and Cd. The use of various computational approaches, such as physicochemical assessments, structural modeling, molecular docking, and protein-protein interaction investigations, has enabled us to successfully identify the exceptional metal-binding properties that metallothionein displays in P. putida. The identification of specific amino acid residues, namely GLU30 and GLN21, is crucial in understanding their pivotal role in facilitating the coordination of lead and cadmium. In addition, post-translational modifications present opportunities for augmenting the capacity to bind metals, thereby creating possibilities for focused engineering. The intricate web of interactions among proteins serves to emphasize the protein's participation in essential cellular mechanisms, thereby emphasizing its potential contributions to detoxification pathways. The present study establishes a strong basis for forthcoming experimental inquiries, offering potential novel approaches in bioremediation to tackle the issue of heavy metal contamination. Metallothionein from P. putida presents a highly encouraging potential as a viable remedy for environmental remediation, as it is capable of proficiently alleviating the detrimental consequences related to heavy metal pollution.
Collapse
Affiliation(s)
- Munazzah Tasleem
- School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
| | | | - Wesam M. Hussein
- Chemistry Department, Faculty of Science, Islamic University of Madinah, Madinah 42351, Saudi Arabia
| | - Abdulwahed Alrehaily
- Biology Department, Faculty of Science, Islamic University of Madinah, Madinah 42351, Saudi Arabia
| |
Collapse
|
10
|
Tasleem M, Hussein WM, El-Sayed AAAA, Alrehaily A. An In Silico Bioremediation Study to Identify Essential Residues of Metallothionein Enhancing the Bioaccumulation of Heavy Metals in Pseudomonas aeruginosa. Microorganisms 2023; 11:2262. [PMID: 37764106 PMCID: PMC10537150 DOI: 10.3390/microorganisms11092262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 09/29/2023] Open
Abstract
Microorganisms are ubiquitously present in the environment and exert significant influence on numerous natural phenomena. The soil and groundwater systems, precipitation, and effluent outfalls from factories, refineries, and waste treatment facilities are all sources of heavy metal contamination. For example, Madinah, Saudi Arabia, has alarmingly high levels of lead and cadmium. The non-essential minerals cadmium (Cd) and lead (Pb) have been linked to damage to vital organs. Bioremediation is an essential component in the process of cleaning up polluted soil and water where biological agents such as bacteria are used to remove the contaminants. It is demonstrated that Pseudomonas aeruginosa (P. aeruginosa) isolated from activated sludge was able to remove Cd and Pb from water. The protein sequence of metallothionein from P. aeruginosa was retrieved to explore it for physicoparameters, orthologs, domain, family, motifs, and conserved residues. The homology structure was generated, and models were validated. Docking of the best model with the heavy metals was carried out to inspect the intramolecular interactions. The target protein was found to belong to the "metallothionein_pro" family, containing six motifs, and showed a close orthologous relationship with other heavy metal-resistant bacteria. The best model was generated by Phyre2. In this study, three key residues of metallothionein were identified that participate in heavy metal (Pb and Cd) binding, viz., Ala33, Ser34, and Glu59. In addition, the study provides an essential basis to explore protein engineering for the optimum use of metallothionein protein to reduce/remove heavy metals from the environment.
Collapse
Affiliation(s)
- Munazzah Tasleem
- School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China;
| | - Wesam M. Hussein
- Chemistry Department, Faculty of Science, Islamic University of Madinah, Madinah 42351, Saudi Arabia;
| | | | - Abdulwahed Alrehaily
- Biology Department, Faculty of Science, Islamic University of Madinah, Madinah 42351, Saudi Arabia;
| |
Collapse
|
11
|
Chen X, Morehead A, Liu J, Cheng J. A gated graph transformer for protein complex structure quality assessment and its performance in CASP15. Bioinformatics 2023; 39:i308-i317. [PMID: 37387159 PMCID: PMC10311325 DOI: 10.1093/bioinformatics/btad203] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. RESULTS In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. AVAILABILITY AND IMPLEMENTATION The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
Collapse
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
| | - Alex Morehead
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
| |
Collapse
|
12
|
He G, Liu J, Liu D, Zhang G. GraphGPSM: a global scoring model for protein structure using graph neural networks. Brief Bioinform 2023:bbad219. [PMID: 37317619 DOI: 10.1093/bib/bbad219] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2023] [Revised: 04/14/2023] [Accepted: 05/22/2023] [Indexed: 06/16/2023] Open
Abstract
The scoring models used for protein structure modeling and ranking are mainly divided into unified field and protein-specific scoring functions. Although protein structure prediction has made tremendous progress since CASP14, the modeling accuracy still cannot meet the requirements to a certain extent. Especially, accurate modeling of multi-domain and orphan proteins remains a challenge. Therefore, an accurate and efficient protein scoring model should be developed urgently to guide the protein structure folding or ranking through deep learning. In this work, we propose a protein structure global scoring model based on equivariant graph neural network (EGNN), named GraphGPSM, to guide protein structure modeling and ranking. We construct an EGNN architecture, and a message passing mechanism is designed to update and transmit information between nodes and edges of the graph. Finally, the global score of the protein model is output through a multilayer perceptron. Residue-level ultrafast shape recognition is used to describe the relationship between residues and the overall structure topology, and distance and direction encoded by Gaussian radial basis functions are designed to represent the overall topology of the protein backbone. These two features are combined with Rosetta energy terms, backbone dihedral angles and inter-residue distance and orientations to represent the protein model and embedded into the nodes and edges of the graph neural network. The experimental results on the CASP13, CASP14 and CAMEO test sets show that the scores of our developed GraphGPSM have a strong correlation with the TM-score of the models, which are significantly better than those of the unified field score function REF2015 and the state-of-the-art local lDDT-based scoring models ModFOLD8, ProQ3D and DeepAccNet, etc. The modeling experimental results on 484 test proteins demonstrate that GraphGPSM can greatly improve the modeling accuracy. GraphGPSM is further used to model 35 orphan proteins and 57 multi-domain proteins. The results show that the average TM-score of the models predicted by GraphGPSM is 13.2 and 7.1% higher than that of the models predicted by AlphaFold2. GraphGPSM also participates in CASP15 and achieves competitive performance in global accuracy estimation.
Collapse
Affiliation(s)
- Guangxing He
- College of Information Engineering, Zhejiang University of Technology
| | - Jun Liu
- College of Information Engineering, Zhejiang University of Technology
| | - Dong Liu
- College of Information Engineering, Zhejiang University of Technology
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology
| |
Collapse
|
13
|
Zhang P, Xia C, Shen HB. High-accuracy protein model quality assessment using attention graph neural networks. Brief Bioinform 2023; 24:7025462. [PMID: 36736352 DOI: 10.1093/bib/bbac614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Revised: 11/23/2022] [Accepted: 12/12/2022] [Indexed: 02/05/2023] Open
Abstract
Great improvement has been brought to protein tertiary structure prediction through deep learning. It is important but very challenging to accurately rank and score decoy structures predicted by different models. CASP14 results show that existing quality assessment (QA) approaches lag behind the development of protein structure prediction methods, where almost all existing QA models degrade in accuracy when the target is a decoy of high quality. How to give an accurate assessment to high-accuracy decoys is particularly useful with the available of accurate structure prediction methods. Here we propose a fast and effective single-model QA method, QATEN, which can evaluate decoys only by their topological characteristics and atomic types. Our model uses graph neural networks and attention mechanisms to evaluate global and amino acid level scores, and uses specific loss functions to constrain the network to focus more on high-precision decoys and protein domains. On the CASP14 evaluation decoys, QATEN performs better than other QA models under all correlation coefficients when targeting average LDDT. QATEN shows promising performance when considering only high-accuracy decoys. Compared to the embedded evaluation modules of predicted ${C}_{\alpha^{-}} RMSD$ (pRMSD) in RosettaFold and predicted LDDT (pLDDT) in AlphaFold2, QATEN is complementary and capable of achieving better evaluation on some decoy structures generated by AlphaFold2 and RosettaFold. These results suggest that the new QATEN approach can be used as a reliable independent assessment algorithm for high-accuracy protein structure decoys.
Collapse
Affiliation(s)
- Peidong Zhang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
| | - Chunqiu Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
| |
Collapse
|
14
|
Liu J, Zhao K, Zhang G. Improved model quality assessment using sequence and structural information by enhanced deep neural networks. Brief Bioinform 2023; 24:6865134. [PMID: 36460624 DOI: 10.1093/bib/bbac507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2022] [Revised: 10/02/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022] Open
Abstract
Protein model quality assessment plays an important role in protein structure prediction, protein design and drug discovery. In this work, DeepUMQA2, a substantially improved version of DeepUMQA for protein model quality assessment, is proposed. First, sequence features containing protein co-evolution information and structural features reflecting family information are extracted to complement model-dependent features. Second, a novel backbone network based on triangular multiplication update and axial attention mechanism is designed to enhance information exchange between inter-residue pairs. On CASP13 and CASP14 datasets, the performance of DeepUMQA2 increases by 20.5 and 20.4% compared with DeepUMQA, respectively (measured by top 1 loss). Moreover, on the three-month CAMEO dataset (11 March to 04 June 2022), DeepUMQA2 outperforms DeepUMQA by 15.5% (measured by local AUC0,0.2) and ranks first among all competing server methods in CAMEO blind test. Experimental results show that DeepUMQA2 outperforms state-of-the-art model quality assessment methods, such as ProQ3D-LDDT, ModFOLD8, and DeepAccNet and DeepUMQA2 can select more suitable best models than state-of-the-art protein structure methods, such as AlphaFold2, RoseTTAFold and I-TASSER, provided themselves.
Collapse
Affiliation(s)
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology
| | - Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology
| | - Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology
| |
Collapse
|
15
|
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/25/2023]
Abstract
The ability to successfully predict the three-dimensional structure of a protein from its amino acid sequence has made considerable progress in the recent past. The progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rising growth of protein sequence databases. Contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading even for the query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
Collapse
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
| | | | - Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | - Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | | |
Collapse
|
16
|
Murph M, Singh S, Schvarzstein M. A combined in silico and in vivo approach to the structure-function annotation of SPD-2 provides mechanistic insight into its functional diversity. Cell Cycle 2022; 21:1958-1979. [PMID: 35678569 PMCID: PMC9415446 DOI: 10.1080/15384101.2022.2078458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2021] [Revised: 04/10/2022] [Accepted: 05/04/2022] [Indexed: 11/03/2022] Open
Abstract
Centrosomes are organelles that function as hubs of microtubule nucleation and organization, with key roles in organelle positioning, asymmetric cell division, ciliogenesis, and signaling. Aberrant centrosome number, structure or function is linked to neurodegenerative diseases, developmental abnormalities, ciliopathies, and tumor development. A major regulator of centrosome biogenesis and function in C. elegans is the conserved Spindle-defective protein 2 (SPD-2), a homolog of the human CEP-192 protein. CeSPD-2 is required for centrosome maturation, centriole duplication, spindle assembly and possibly cell polarity establishment. Despite its importance, the specific molecular mechanism of CeSPD-2 regulation and function is poorly understood. Here, we combined computational analysis with cell biology approaches to uncover possible structure-function relationships of CeSPD-2 that may shed mechanistic light on its function. Domain prediction analysis corroborated and refined previously identified coiled-coils and ASH (Aspm-SPD-2 Hydin) domains and identified new domains: a GEF domain, an Ig-like domain, and a PDZ-like domain. In addition to these predicted structural features, CeSPD-2 is also predicted to be intrinsically disordered. Surface electrostatic maps identified a large basic region unique to the ASH domain of CeSPD-2. This basic region overlaps with most of the residues predicted to be involved in protein-protein interactions. In vivo, ASH::GFP localized to centrosomes and centrosome-associated microtubules. Our analysis groups ASH domains, PapD, Usher chaperone domains, and Major Sperm Protein (MSP) domains into a single superfold within the larger Immunoglobulin superfamily. This study lays the groundwork for designing rational hypothesis-based experiments to uncover the mechanisms of CeSPD-2 function in vivo.Abbreviations: AIR, Aurora kinase; ASH, Aspm-SPD-2 Hydin; ASP, Abnormal Spindle Protein; ASPM, Abnormal Spindle-like Microcephaly-associated Protein; CC, coiled-coil; CDK, Cyclin-dependent Kinase; Ce, Caenorhabditis elegans; CEP, Centrosomal Protein; CPAP, centrosomal P4.1-associated protein; D, Drosophila; GAP, GTPase activating protein; GEF, GTPase guanine nucleotide exchange factor; Hs, Homo sapiens/Human; Ig, Immunoglobulin; MAP, Microtubule associated Protein; MSP, Major Sperm Protein; MDP, Major Sperm Domain-Containing Protein; OCRL-1, Golgi endocytic trafficking protein Inositol polyphosphate 5-phosphatase; PAR, abnormal embryonic PARtitioning of the cytosol; PCM, Pericentriolar material; PCMD, pericentriolar matrix deficient; PDZ, PSD95/Dlg-1/zo-1; PLK, Polo like kinase; RMSD, Root Mean Square Deviation; SAS, Spindle assembly abnormal proteins; SPD, Spindle-defective protein; TRAPP, TRAnsport Protein Particle; Xe, Xenopus; ZYG, zygote defective protein.
Collapse
Affiliation(s)
- Mikaela Murph
- Department of Biology, City University of New York, Brooklyn College, New York, NY, USA
| | - Shaneen Singh
- Department of Biology, City University of New York, Brooklyn College, New York, NY, USA
- Department of Biology, The Graduate Center at City University of New York, New York, NY, USA
- Department Biochemistry, The Graduate Center at City University of New York, New York, NY, USA
| | - Mara Schvarzstein
- Department of Biology, City University of New York, Brooklyn College, New York, NY, USA
- Department of Biology, The Graduate Center at City University of New York, New York, NY, USA
- Department Biochemistry, The Graduate Center at City University of New York, New York, NY, USA
| |
Collapse
|
17
|
Kurniawan J, Ishida T. Protein Model Quality Estimation Using Molecular Dynamics Simulation. ACS OMEGA 2022; 7:24274-24281. [PMID: 35874260 PMCID: PMC9301944 DOI: 10.1021/acsomega.2c01475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
The estimation of protein model quality remains a challenging task and is important for protein structural model utilization. In the last decade, existing methods that rely on machine learning to deep learning have been developed and shown progressive improvement. Despite utilizing more sophisticated techniques and introducing new features, none of these methods employ explicit protein structure stability information. Hypothetically, protein model quality might be indicated by its structural stability in an in silico system disclosed by the structural difference from its initial structure. One of the possible methods to exploit such information is by implementing molecular dynamics simulations that have shown successful applications in many research fields. We present a novel approach by introducing explicit protein structure stability information using molecular dynamics simulation. Despite using only simple features, small data with no training process required, and a short molecular dynamics simulation time, our method shows comparable performance to the state-of-the-art deep learning-based method.
Collapse
|
18
|
Monroe L, Kihara D. Using steered molecular dynamic tension for assessing quality of computational protein structure models. J Comput Chem 2022; 43:1140-1150. [PMID: 35475517 DOI: 10.1002/jcc.26876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2021] [Revised: 02/16/2022] [Accepted: 04/15/2022] [Indexed: 11/12/2022]
Abstract
The native structures of proteins, except for notable exceptions of intrinsically disordered proteins, in general take their most stable conformation in the physiological condition to maintain their structural framework so that their biological function can be properly carried out. Experimentally, the stability of a protein can be measured by several means, among which the pulling experiment using the atomic force microscope (AFM) stands as a unique method. AFM directly measures the resistance from unfolding, which can be quantified from the observed force-extension profile. It has been shown that key features observed in an AFM pulling experiment can be well reproduced by computational molecular dynamics simulations. Here, we applied computational pulling for estimating the accuracy of computational protein structure models under the hypothesis that the structural stability would positively correlated with the accuracy, i.e. the closeness to the native, of a model. We used in total 4929 structure models for 24 target proteins from the Critical Assessment of Techniques of Structure Prediction (CASP) and investigated if the magnitude of the break force, that is, the force required to rearrange the model's structure, from the force profile was sufficient information for selecting near-native models. We found that near-native models can be successfully selected by examining their break forces suggesting that high break force indeed indicates high stability of models. On the other hand, there were also near-native models that had relatively low peak forces. The mechanisms of the stability exhibited by the break forces were explored and discussed.
Collapse
Affiliation(s)
- Lyman Monroe
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA.,Department of Computer Science, Purdue University, West Lafayette, Indiana, USA.,Purdue Center for Cancer Research, Purdue University, West Lafayette, Indiana, USA
| |
Collapse
|
19
|
Chen X, Cheng J. DISTEMA: distance map-based estimation of single protein model accuracy with attentive 2D convolutional neural network. BMC Bioinformatics 2022; 23:141. [PMID: 35439931 PMCID: PMC9019949 DOI: 10.1186/s12859-022-04683-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Estimation of the accuracy (quality) of protein structural models is important for both prediction and use of protein structural models. Deep learning methods have been used to integrate protein structure features to predict the quality of protein models. Inter-residue distances are key information for predicting protein's tertiary structures and therefore have good potentials to predict the quality of protein structural models. However, few methods have been developed to fully take advantage of predicted inter-residue distance maps to estimate the accuracy of a single protein structural model. RESULT We developed an attentive 2D convolutional neural network (CNN) with channel-wise attention to take only a raw difference map between the inter-residue distance map calculated from a single protein model and the distance map predicted from the protein sequence as input to predict the quality of the model. The network comprises multiple convolutional layers, batch normalization layers, dense layers, and Squeeze-and-Excitation blocks with attention to automatically extract features relevant to protein model quality from the raw input without using any expert-curated features. We evaluated DISTEMA's capability of selecting the best models for CASP13 targets in terms of ranking loss of GDT-TS score. The ranking loss of DISTEMA is 0.079, lower than several state-of-the-art single-model quality assessment methods. CONCLUSION This work demonstrates that using raw inter-residue distance information with deep learning can predict the quality of protein structural models reasonably well. DISTEMA is freely at https://github.com/jianlin-cheng/DISTEMA.
Collapse
Affiliation(s)
- Xiao Chen
- grid.134936.a0000 0001 2162 3504Department of Electrical Engineering and Computer Science, University of Missouri Columbia, Columbia, MO 65211 USA
| | - Jianlin Cheng
- grid.134936.a0000 0001 2162 3504Department of Electrical Engineering and Computer Science, University of Missouri Columbia, Columbia, MO 65211 USA
| |
Collapse
|
20
|
Guo SS, Liu J, Zhou XG, Zhang GJ. DeepUMQA: ultrafast shape recognition-based protein model quality assessment using deep learning. Bioinformatics 2022; 38:1895-1903. [PMID: 35134108 DOI: 10.1093/bioinformatics/btac056] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2021] [Revised: 12/26/2021] [Accepted: 01/27/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Protein model quality assessment is a key component of protein structure prediction. In recent research, the voxelization feature was used to characterize the local structural information of residues, but it may be insufficient for describing residue-level topological information. Design features that can further reflect residue-level topology when combined with deep learning methods are therefore crucial to improve the performance of model quality assessment. RESULTS We developed a deep-learning method, DeepUMQA, based on Ultrafast Shape Recognition (USR) for the residue-level single-model quality assessment. In the framework of the deep residual neural network, the residue-level USR feature was introduced to describe the topological relationship between the residue and overall structure by calculating the first moment of a set of residue distance sets and then combined with 1D, 2D and voxelization features to assess the quality of the model. Experimental results on the CASP13, CASP14 test datasets and CAMEO blind test show that USR could supplement the voxelization features to comprehensively characterize residue structure information and significantly improve model assessment accuracy. The performance of DeepUMQA ranks among the top during the state-of-the-art single-model quality assessment methods, including ProQ2, ProQ3, ProQ3D, Ornate, VoroMQA, ProteinGCN, ResNetQA, QDeep, GraphQA, ModFOLD6, ModFOLD7, ModFOLD8, QMEAN3, QMEANDisCo3 and DeepAccNet. AVAILABILITY AND IMPLEMENTATION The DeepUMQA server is freely available at http://zhanglab-bioinf.com/DeepUMQA/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Sai-Sai Guo
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Xiao-Gen Zhou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| |
Collapse
|
21
|
Philip J, Örd M, Silva A, Singh S, Diffley JFX, Remus D, Loog M, Ikui AE. Cdc6 is sequentially regulated by PP2A-Cdc55, Cdc14, and Sic1 for origin licensing in S. cerevisiae. eLife 2022; 11:e74437. [PMID: 35142288 PMCID: PMC8830886 DOI: 10.7554/elife.74437] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2021] [Accepted: 12/15/2021] [Indexed: 01/31/2023] Open
Abstract
Cdc6, a subunit of the pre-replicative complex (pre-RC), contains multiple regulatory cyclin-dependent kinase (Cdk1) consensus sites, SP or TP motifs. In Saccharomyces cerevisiae, Cdk1 phosphorylates Cdc6-T7 to recruit Cks1, the Cdk1 phospho-adaptor in S phase, for subsequent multisite phosphorylation and protein degradation. Cdc6 accumulates in mitosis and is tightly bound by Clb2 through N-terminal phosphorylation in order to prevent premature origin licensing and degradation. It has been extensively studied how Cdc6 phosphorylation is regulated by the cyclin-Cdk1 complex. However, a detailed mechanism on how Cdc6 phosphorylation is reversed by phosphatases has not been elucidated. Here, we show that PP2ACdc55 dephosphorylates Cdc6 N-terminal sites to release Clb2. Cdc14 dephosphorylates the C-terminal phospho-degron, leading to Cdc6 stabilization in mitosis. In addition, Cdk1 inhibitor Sic1 releases Clb2·Cdk1·Cks1 from Cdc6 to load Mcm2-7 on the chromatin upon mitotic exit. Thus, pre-RC assembly and origin licensing are promoted by phosphatases through the attenuation of distinct Cdk1-dependent Cdc6 inhibitory mechanisms.
Collapse
Affiliation(s)
- Jasmin Philip
- The PhD Program in Biochemistry, The Graduate Center, CUNYBrooklynUnited States
- Brooklyn CollegeBrooklynUnited States
| | | | - Andriele Silva
- The PhD Program in Biochemistry, The Graduate Center, CUNYBrooklynUnited States
- Brooklyn CollegeBrooklynUnited States
| | - Shaneen Singh
- The PhD Program in Biochemistry, The Graduate Center, CUNYBrooklynUnited States
- Brooklyn CollegeBrooklynUnited States
| | | | - Dirk Remus
- Memorial Sloan-Kettering Cancer CenterNew YorkUnited States
| | | | - Amy E Ikui
- The PhD Program in Biochemistry, The Graduate Center, CUNYBrooklynUnited States
- Brooklyn CollegeBrooklynUnited States
| |
Collapse
|
22
|
Hippe K, Lilley C, William Berkenpas J, Chandana Pocha C, Kishaba K, Ding H, Hou J, Si D, Cao R. ZoomQA: residue-level protein model accuracy estimation with machine learning on sequential and 3D structural features. Brief Bioinform 2022; 23:bbab384. [PMID: 34553747 PMCID: PMC8499977 DOI: 10.1093/bib/bbab384] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2021] [Revised: 08/02/2021] [Accepted: 08/28/2021] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION The Estimation of Model Accuracy problem is a cornerstone problem in the field of Bioinformatics. As of CASP14, there are 79 global QA methods, and a minority of 39 residue-level QA methods with very few of them working on protein complexes. Here, we introduce ZoomQA, a novel, single-model method for assessing the accuracy of a tertiary protein structure/complex prediction at residue level, which have many applications such as drug discovery. ZoomQA differs from others by considering the change in chemical and physical features of a fragment structure (a portion of a protein within a radius $r$ of the target amino acid) as the radius of contact increases. Fourteen physical and chemical properties of amino acids are used to build a comprehensive representation of every residue within a protein and grade their placement within the protein as a whole. Moreover, we have shown the potential of ZoomQA to identify problematic regions of the SARS-CoV-2 protein complex. RESULTS We benchmark ZoomQA on CASP14, and it outperforms other state-of-the-art local QA methods and rivals state of the art QA methods in global prediction metrics. Our experiment shows the efficacy of these new features and shows that our method is able to match the performance of other state-of-the-art methods without the use of homology searching against databases or PSSM matrices. AVAILABILITY http://zoomQA.renzhitech.com.
Collapse
Affiliation(s)
- Kyle Hippe
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
| | - Cade Lilley
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
| | | | | | - Kiyomi Kishaba
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
| | - Hui Ding
- Center for Informational Biology at University of Electronic Science and Technology of China
| | | | - Dong Si
- University of Washington Bothell, USA
| | - Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
| |
Collapse
|
23
|
Jiang Z, Wang C, Wu Z, Chen K, Yang W, Deng H, Song H, Zhou X. Enzymatic deamination of the epigenetic nucleoside N6-methyladenosine regulates gene expression. Nucleic Acids Res 2021; 49:12048-12068. [PMID: 34850126 PMCID: PMC8643624 DOI: 10.1093/nar/gkab1124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Revised: 10/20/2021] [Accepted: 11/16/2021] [Indexed: 12/26/2022] Open
Abstract
N6-methyladenosine (m6A) modification is the most extensively studied epigenetic modification due to its crucial role in regulating an array of biological processes. Herein, Bsu06560, formerly annotated as an adenine deaminase derived from Bacillus subtilis 168, was recognized as the first enzyme capable of metabolizing the epigenetic nucleoside N6-methyladenosine. A model of Bsu06560 was constructed, and several critical residues were putatively identified via mutational screening. Two mutants, F91L and Q150W, provided a superiorly enhanced conversion ratio of adenosine and N6-methyladenosine. The CRISPR-Cas9 system generated Bsu06560-knockout, F91L, and Q150W mutations from the B. subtilis 168 genome. Transcriptional profiling revealed a higher global gene expression level in BS-F91L and BS-Q150W strains with enhanced N6-methyladenosine deaminase activity. The differentially expressed genes were categorized using GO, COG, KEGG and verified through RT-qPCR. This study assessed the crucial roles of Bsu06560 in regulating adenosine and N6-methyladenosine metabolism, which influence a myriad of biological processes. This is the first systematic research to identify and functionally annotate an enzyme capable of metabolizing N6-methyladenosine and highlight its significant roles in regulation of bacterial metabolism. Besides, this study provides a novel method for controlling gene expression through the mutations of critical residues.
Collapse
Affiliation(s)
- Zhuoran Jiang
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
| | - Chao Wang
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
| | - Zixin Wu
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
| | - Kun Chen
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
| | - Wei Yang
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
| | - Hexiang Deng
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
| | - Heng Song
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
| | - Xiang Zhou
- The Institute of Advanced Studies, and Key Laboratory of Biomedical Polymers-Ministry of Education, College of Chemistry and Molecular Sciences, Wuhan University, 40072 Wuhan, P.R. China
| |
Collapse
|
24
|
Wang W, Wang J, Li Z, Xu D, Shang Y. MUfoldQA_G: High-accuracy protein model QA via retraining and transformation. Comput Struct Biotechnol J 2021; 19:6282-6290. [PMID: 34900138 PMCID: PMC8636996 DOI: 10.1016/j.csbj.2021.11.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2021] [Revised: 11/10/2021] [Accepted: 11/14/2021] [Indexed: 11/21/2022] Open
Abstract
Protein tertiary structure prediction is an active research area and has attracted significant attention recently due to the success of AlphaFold from DeepMind. Methods capable of accurately evaluating the quality of predicted models are of great importance. In the past, although many model quality assessment (QA) methods have been developed, their accuracies are not consistently high across different QA performance metrics for diverse target proteins. In this paper, we propose MUfoldQA_G, a new multi-model QA method that aims at simultaneously optimizing Pearson correlation and average GDT-TS difference, two commonly used QA performance metrics. This method is based on two new algorithms MUfoldQA_Gp and MUfoldQA_Gr. MUfoldQA_Gp uses a new technique to combine information from protein templates and reference protein models to maximize the Pearson correlation QA metric. MUfoldQA_Gr employs a new machine learning technique that resamples training data and retrains adaptively to learn a consensus model that is better than naïve consensus while minimizing average GDT-TS difference. MUfoldQA_G uses a new method to combine the results of MUfoldQA_Gr and MUfoldQA_Gp so that the final QA prediction results achieve low average GDT-TS difference that is close to the results from MUfoldQA_Gr, while maintaining high Pearson correlation that is the same as the results from MUfoldQA_Gp. In CASP14 QA categories, MUfoldQA_G ranked No. 1 in Pearson correlation and No. 2 in average GDT-TS difference.
Collapse
Affiliation(s)
- Wenbo Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Junlin Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Zhaoyu Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Yi Shang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|
25
|
Ye L, Wu P, Peng Z, Gao J, Liu J, Yang J. Improved estimation of model quality using predicted inter-residue distance. Bioinformatics 2021; 37:3752-3759. [PMID: 34473228 DOI: 10.1093/bioinformatics/btab632] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2021] [Revised: 08/27/2021] [Accepted: 08/31/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Protein model quality assessment (QA) is an essential component in protein structure prediction, which aims to estimate the quality of a structure model and/or select the most accurate model out from a pool of structure models, without knowing the native structure. QA remains a challenging task in protein structure prediction. RESULTS Based on the inter-residue distance predicted by the recent deep learning-based structure prediction algorithm trRosetta, we developed QDistance, a new approach to the estimation of both global and local qualities. QDistance works for both single-model and multi-models inputs. We designed several distance-based features to assess the agreement between the predicted and model-derived inter-residue distances. Together with a few widely used features, they are fed into a simple yet powerful linear regression model to infer the global QA scores. The local QA scores for each structure model are predicted based on a comparative analysis with a set of selected reference models. For multi-models input, the reference models are selected from the input based on the predicted global QA scores. For single-model input, the reference models are predicted by trRosetta. With the informative distance-based features, QDistance can predict the global quality with satisfactory accuracy. Benchmark tests on the CASP13 and the CAMEO structure models suggested that QDistance was competitive other methods. Blind tests in the CASP14 experiments showed that QDistance was robust and ranked among the top predictors. Especially, QDistance was the top 3 local QA method and made the most accurate local QA prediction for unreliable local region. Analysis showed that this superior performance can be attributed to the inclusion of the predicted inter-residue distance. AVAILABILITY AND IMPLEMENTATION http://yanglab.nankai.edu.cn/QDistance. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Lisha Ye
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| | - Peikun Wu
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| | - Zhenling Peng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
| | - Jianzhao Gao
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| | - Jian Liu
- College of Computer Science, Nankai University, Tianjin, 300071, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
| |
Collapse
|
26
|
Liu J, Wu T, Guo Z, Hou J, Cheng J. Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14. Proteins 2021; 90:58-72. [PMID: 34291486 PMCID: PMC8671168 DOI: 10.1002/prot.26186] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Revised: 06/21/2021] [Accepted: 07/12/2021] [Indexed: 12/15/2022]
Abstract
Substantial progresses in protein structure prediction have been made by utilizing deep‐learning and residue‐residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that the template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template‐free modeling performs better than the template‐based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template‐free modeling largely depends on the accuracy of distance prediction closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
Collapse
Affiliation(s)
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| |
Collapse
|
27
|
Igashov I, Pavlichenko N, Grudinin S. Spherical convolutions on molecular graphs for protein model quality assessment. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2021. [DOI: 10.1088/2632-2153/abf856] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
Abstract
Processing information on three-dimensional (3D) objects requires methods stable to rigid-body transformations, in particular rotations, of the input data. In image processing tasks, convolutional neural networks achieve this property using rotation-equivariant operations. However, contrary to images, graphs generally have irregular topology. This makes it challenging to define a rotation-equivariant convolution operation on these structures. In this work, we propose spherical graph convolutional network that processes 3D models of proteins represented as molecular graphs. In a protein molecule, individual amino acids have common topological elements. This allows us to unambiguously associate each amino acid with a local coordinate system and construct rotation-equivariant spherical filters that operate on angular information between graph nodes. Within the framework of the protein model quality assessment problem, we demonstrate that the proposed spherical convolution method significantly improves the quality of model assessment compared to the standard message-passing approach. It is also comparable to state-of-the-art methods, as we demonstrate on critical assessment of structure prediction benchmarks. The proposed technique operates only on geometric features of protein 3D models. This makes it universal and applicable to any other geometric-learning task where the graph structure allows constructing local coordinate systems. The method is available at https://team.inria.fr/nano-d/software/s-gcn/.
Collapse
|
28
|
Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. Sci Rep 2021; 11:10943. [PMID: 34035363 PMCID: PMC8149836 DOI: 10.1038/s41598-021-90303-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 05/10/2021] [Indexed: 11/28/2022] Open
Abstract
The inter-residue contact prediction and deep learning showed the promise to improve the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage the improved inter-residue distance predictions to enhance EMA, during the 2020 CASP14 experiment, we integrated several new inter-residue distance features with the existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. According to the evaluation of performance in selecting the best model from the models of CASP14 targets, our three multi-model predictors of estimating model accuracy (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve the averaged loss of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods are ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model predictor of estimating model accuracy (EMA), is ranked within top 10 among all the single-model EMA methods according to GDT-TS score loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potentials.
Collapse
|
29
|
Bhattacharya S, Roche R, Shuvo MH, Bhattacharya D. Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci 2021; 8:643752. [PMID: 34046429 PMCID: PMC8148041 DOI: 10.3389/fmolb.2021.643752] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite the success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profile. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
Collapse
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Department of Biological Sciences, Auburn University, Auburn, AL, United States
| |
Collapse
|
30
|
Jiang V, Khare SD, Banta S. Computational structure prediction provides a plausible mechanism for electron transfer by the outer membrane protein Cyc2 from Acidithiobacillus ferrooxidans. Protein Sci 2021; 30:1640-1652. [PMID: 33969560 DOI: 10.1002/pro.4106] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2021] [Revised: 04/30/2021] [Accepted: 05/03/2021] [Indexed: 12/14/2022]
Abstract
Cyc2 is the key protein in the outer membrane of Acidithiobacillus ferrooxidans that mediates electron transfer between extracellular inorganic iron and the intracellular central metabolism. This cytochrome c is specific for iron and interacts with periplasmic proteins to complete a reversible electron transport chain. A structure of Cyc2 has not yet been characterized experimentally. Here we describe a structural model of Cyc2, and associated proteins, to highlight a plausible mechanism for the ferrous iron electron transfer chain. A comparative modeling protocol specific for trans membrane beta barrel (TMBB) proteins in acidophilic conditions (pH ~ 2) was applied to the primary sequence of Cyc2. The proposed structure has three main regimes: Extracellular loops exposed to low-pH conditions, a TMBB, and an N-terminal cytochrome-like region within the periplasmic space. The Cyc2 model was further refined by identifying likely iron and heme docking sites. This represents the first computational model of Cyc2 that accounts for the membrane microenvironment and the acidity in the extracellular matrix. This approach can be used to model other TMBBs which can be critical for chemolithotrophic microbial growth.
Collapse
Affiliation(s)
- Virginia Jiang
- Department of Chemical Engineering, Columbia University in the City of New York, New York, New York, USA
| | - Sagar D Khare
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
| | - Scott Banta
- Department of Chemical Engineering, Columbia University in the City of New York, New York, New York, USA
| |
Collapse
|
31
|
Baldassarre F, Menéndez Hurtado D, Elofsson A, Azizpour H. GraphQA: protein model quality assessment using graph convolutional networks. Bioinformatics 2021; 37:360-366. [PMID: 32780838 PMCID: PMC8058777 DOI: 10.1093/bioinformatics/btaa714] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2020] [Revised: 07/03/2020] [Accepted: 08/05/2020] [Indexed: 11/25/2022] Open
Abstract
Motivation Proteins are ubiquitous molecules whose function in biological processes is determined by their 3D structure. Experimental identification of a protein’s structure can be time-consuming, prohibitively expensive and not always possible. Alternatively, protein folding can be modeled using computational methods, which however are not guaranteed to always produce optimal results. GraphQA is a graph-based method to estimate the quality of protein models, that possesses favorable properties such as representation learning, explicit modeling of both sequential and 3D structure, geometric invariance and computational efficiency. Results GraphQA performs similarly to state-of-the-art methods despite using a relatively low number of input features. In addition, the graph network structure provides an improvement over the architecture used in ProQ4 operating on the same input features. Finally, the individual contributions of GraphQA components are carefully evaluated. Availability and implementation PyTorch implementation, datasets, experiments and link to an evaluation server are available through this GitHub repository: github.com/baldassarreFe/graphqa. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Federico Baldassarre
- Division of Robotics, Perception and Learning (RPL), KTH – Royal Institute of Technology, 10044 Stockholm, Sweden
| | - David Menéndez Hurtado
- Department of Intelligent Systems, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
- Department of Biochemistry and Biophysics, school of Electrical Engineering and Computer Science (EECS), Stockholm University, 10691 Stockholm, Sweden
| | - Arne Elofsson
- Department of Intelligent Systems, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
- Department of Biochemistry and Biophysics, school of Electrical Engineering and Computer Science (EECS), Stockholm University, 10691 Stockholm, Sweden
| | - Hossein Azizpour
- Division of Robotics, Perception and Learning (RPL), KTH – Royal Institute of Technology, 10044 Stockholm, Sweden
- To whom correspondence should be addressed.
| |
Collapse
|
32
|
Akaberi D, Båhlström A, Chinthakindi PK, Nyman T, Sandström A, Järhult JD, Palanisamy N, Lundkvist Å, Lennerstrand J. Targeting the NS2B-NS3 protease of tick-borne encephalitis virus with pan-flaviviral protease inhibitors. Antiviral Res 2021; 190:105074. [PMID: 33872674 DOI: 10.1016/j.antiviral.2021.105074] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Revised: 03/07/2021] [Accepted: 03/30/2021] [Indexed: 12/20/2022]
Abstract
Tick-borne encephalitis (TBE) is a severe neurological disorder caused by tick-borne encephalitis virus (TBEV), a member of the Flavivirus genus. Currently, two vaccines are available in Europe against TBEV. However, TBE cases have been rising in Sweden for the past twenty years, and thousands of cases are reported in Europe, emphasizing the need for antiviral treatments against this virus. The NS2B-NS3 protease is essential for flaviviral life cycle and has been studied as a target for the design of inhibitors against several well-known flaviviruses, but not TBEV. In the present study, Compound 86, a known tripeptidic inhibitor of dengue (DENV), West Nile (WNV) and Zika (ZIKV) proteases, was predicted to be active against TBEV protease using a combination of in silico techniques. Further, Compound 86 was found to inhibit recombinant TBEV protease with an IC50 = 0.92 μM in the in vitro enzymatic assay. Additionally, two more peptidic analogues were synthetized and they displayed inhibitory activities against both TBEV and ZIKV proteases. In particular, Compound 104 inhibited ZIKV protease with an IC50 = 0.25 μM. These compounds represent the first reported inhibitors of TBEV protease to date and provides valuable information for the further development of TBEV as well as pan-flavivirus protease inhibitors.
Collapse
Affiliation(s)
- Dario Akaberi
- Department of Medical Biochemistry and Microbiology, Zoonosis Science Center, Uppsala University, Uppsala, Sweden
| | - Amanda Båhlström
- The Beijer Laboratory, Department of Medicinal Chemistry, Drug Design and Discovery, Uppsala University, Uppsala, Sweden
| | - Praveen K Chinthakindi
- The Beijer Laboratory, Department of Medicinal Chemistry, Drug Design and Discovery, Uppsala University, Uppsala, Sweden
| | - Tomas Nyman
- Protein Science Facility, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
| | - Anja Sandström
- The Beijer Laboratory, Department of Medicinal Chemistry, Drug Design and Discovery, Uppsala University, Uppsala, Sweden
| | - Josef D Järhult
- Department of Medical Sciences, Zoonosis Science Center, Uppsala University, Uppsala, Sweden
| | | | - Åke Lundkvist
- Department of Medical Biochemistry and Microbiology, Zoonosis Science Center, Uppsala University, Uppsala, Sweden
| | - Johan Lennerstrand
- Department of Medical Sciences, Clinical Microbiology, Uppsala University, Uppsala, Sweden.
| |
Collapse
|
33
|
Takei Y, Ishida T. P3CMQA: Single-Model Quality Assessment Using 3DCNN with Profile-Based Features. Bioengineering (Basel) 2021; 8:bioengineering8030040. [PMID: 33808604 PMCID: PMC8003382 DOI: 10.3390/bioengineering8030040] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 11/16/2022] Open
Abstract
Model quality assessment (MQA), which selects near-native structures from structure models, is an important process in protein tertiary structure prediction. The three-dimensional convolution neural network (3DCNN) was applied to the task, but the performance was comparable to existing methods because it used only atom-type features as the input. Thus, we added sequence profile-based features, which are also used in other methods, to improve the performance. We developed a single-model MQA method for protein structures based on 3DCNN using sequence profile-based features, namely, P3CMQA. Performance evaluation using a CASP13 dataset showed that profile-based features improved the assessment performance, and the proposed method was better than currently available single-model MQA methods, including the previous 3DCNN-based method. We also implemented a web-interface of the method to make it more user-friendly.
Collapse
Affiliation(s)
- Yuma Takei
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo 152-8550, Japan;
- Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo 152-8550, Japan;
- Correspondence:
| |
Collapse
|
34
|
Shuvo MH, Bhattacharya S, Bhattacharya D. QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks. Bioinformatics 2021; 36:i285-i291. [PMID: 32657397 PMCID: PMC7355297 DOI: 10.1093/bioinformatics/btaa455] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION Protein model quality estimation, in many ways, informs protein structure prediction. Despite their tight coupling, existing model quality estimation methods do not leverage inter-residue distance information or the latest technological breakthrough in deep learning that has recently revolutionized protein structure prediction. RESULTS We present a new distance-based single-model quality estimation method called QDeep by harnessing the power of stacked deep residual neural networks (ResNets). Our method first employs stacked deep ResNets to perform residue-level ensemble error classifications at multiple predefined error thresholds, and then combines the predictions from the individual error classifiers for estimating the quality of a protein structural model. Experimental results show that our method consistently outperforms existing state-of-the-art methods including ProQ2, ProQ3, ProQ3D, ProQ4, 3DCNN, MESHI, and VoroMQA in multiple independent test datasets across a wide-range of accuracy measures; and that predicted distance information significantly contributes to the improved performance of QDeep. AVAILABILITY AND IMPLEMENTATION https://github.com/Bhattacharya-Lab/QDeep. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA.,Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| |
Collapse
|
35
|
Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, Baker D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat Commun 2021; 12:1340. [PMID: 33637700 PMCID: PMC7910447 DOI: 10.1038/s41467-021-21511-x] [Citation(s) in RCA: 112] [Impact Index Per Article: 37.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2020] [Accepted: 01/18/2021] [Indexed: 11/22/2022] Open
Abstract
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.
Collapse
Affiliation(s)
- Naozumi Hiranuma
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Washington, WA, USA
| | - Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Minkyung Baek
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - Justas Dauparas
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA
| | - David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Washington, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Washington, WA, USA.
| |
Collapse
|
36
|
Igashov I, Olechnovič L, Kadukova M, Venclovas Č, Grudinin S. VoroCNN: Deep convolutional neural network built on 3D Voronoi tessellation of protein structures. Bioinformatics 2021; 37:2332-2339. [PMID: 33620450 DOI: 10.1093/bioinformatics/btab118] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2020] [Revised: 01/08/2021] [Accepted: 02/22/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Effective use of evolutionary information has recently led to tremendous progress in computational prediction of three-dimensional (3D) structures of proteins and their complexes. Despite the progress, the accuracy of predicted structures tends to vary considerably from case to case. Since the utility of computational models depends on their accuracy, reliable estimates of deviation between predicted and native structures are of utmost importance. RESULTS For the first time, we present a deep convolutional neural network (CNN) constructed on a Voronoi tessellation of 3D molecular structures. Despite the irregular data domain, our data representation allows us to efficiently introduce both convolution and pooling operations and train the network in an end-to-end fashion without precomputed descriptors. The resultant model, VoroCNN, predicts local qualities of 3D protein folds. The prediction results are competitive to state of the art and superior to the previous 3D CNN architectures built for the same task. We also discuss practical applications of VoroCNN, for example, in recognition of protein binding interfaces. AVAILABILITY The model, data, and evaluation tests are available at https://team.inria.fr/nano-d/software/vorocnn/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ilia Igashov
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.,Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
| | - Liment Olechnovič
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
| | - Maria Kadukova
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.,Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
| | - Česlovas Venclovas
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, LT 10257, Lithuania
| | - Sergei Grudinin
- Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
| |
Collapse
|
37
|
Abstract
Biologists are increasingly aware of the importance of protein structure in revealing function. The computational tools now exist which allow researchers to model unknown proteins simply on the basis of their primary sequence. However, for the non-specialist bioinformatician, there is a dazzling array of terminology, acronyms, and competing computer software available for this process. This review is intended to highlight the key stages of computational protein structure prediction, as well as explain the reasons behind some of the procedures and list some established workarounds for common pitfalls. Thereafter follows a review of five one-stop servers for start-to-finish structure prediction.
Collapse
|
38
|
Ouyang J, Huang N, Jiang Y. A single-model quality assessment method for poor quality protein structure. BMC Bioinformatics 2020; 21:157. [PMID: 32334508 PMCID: PMC7183596 DOI: 10.1186/s12859-020-3499-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2020] [Accepted: 04/15/2020] [Indexed: 11/13/2022] Open
Abstract
Background Quality assessment of protein tertiary structure prediction models, in which structures of the best quality are selected from decoys, is a major challenge in protein structure prediction, and is crucial to determine a model’s utility and potential applications. Estimating the quality of a single model predicts the model’s quality based on the single model itself. In general, the Pearson correlation value of the quality assessment method increases in tandem with an increase in the quality of the model pool. However, there is no consensus regarding the best method to select a few good models from the poor quality model pool. Results We introduce a novel single-model quality assessment method for poor quality models that uses simple linear combinations of six features. We perform weighted search and linear regression on a large dataset of models from the 12th Critical Assessment of Protein Structure Prediction (CASP12) and benchmark the results on CASP13 models. We demonstrate that our method achieves outstanding performance on poor quality models. Conclusions According to results of poor protein structure assessment based on six features, contact prediction and relying on fewer prediction features can improve selection accuracy.
Collapse
|
39
|
Ugarte-Alvarez O, Muñoz-López P, Moreno-Vargas LM, Prada-Gracia D, Mateos-Chávez AA, Becerra-Báez EI, Luria-Pérez R. Cell-Permeable Bak BH3 Peptide Induces Chemosensitization of Hematologic Malignant Cells. JOURNAL OF ONCOLOGY 2020; 2020:2679046. [PMID: 33312200 PMCID: PMC7721494 DOI: 10.1155/2020/2679046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Revised: 07/04/2020] [Accepted: 07/13/2020] [Indexed: 12/24/2022]
Abstract
Hematologic malignancies such as leukemias and lymphomas are among the leading causes of pediatric cancer death worldwide, and although survival rates have improved with conventional treatments, the development of drug-resistant cancer cells may lead to patient relapse and limited possibilities of a cure. Drug-resistant cancer cells in these hematologic neoplasms are induced by overexpression of the antiapoptotic B-cell lymphoma 2 (Bcl-2) protein families, such as Bcl-XL, Bcl-2, and Mcl-1. We have previously shown that peptides from the BH3 domain of the proapoptotic Bax protein that also belongs to the Bcl-2 family may antagonize the antiapoptotic activity of the Bcl-2 family proteins, restore apoptosis, and induce chemosensitization of tumor cells. Furthermore, cell-permeable Bax BH3 peptides also elicit antitumor activity and extend survival in a murine xenograft model of human B non-Hodgkin's lymphoma. However, the activity of the BH3 peptides of the proapoptotic Bak protein of the Bcl-2 family against these hematologic malignant cells requires further characterization. In this study, we report the ability of the cell-permeable Bak BH3 peptide to restore apoptosis and induce chemosensitization of acute lymphoblastic leukemia and non-Hodgkin's lymphoma cell lines, and this event is enhanced with the coadministration of cell-permeable Bax BH3 peptide and represents an attractive approach to improve the patient outcomes with relapsed or refractory hematological malignant cells.
Collapse
Affiliation(s)
- Omar Ugarte-Alvarez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| | - Paola Muñoz-López
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
- Posgrado en Biomedicina y Biotecnología Molecular, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City 11340, Mexico
| | - Liliana Marisol Moreno-Vargas
- Research Unit on Computational Biology and Drug Design, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| | - Diego Prada-Gracia
- Research Unit on Computational Biology and Drug Design, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| | - Armando Alfredo Mateos-Chávez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| | - Elayne Irene Becerra-Báez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
- Posgrado en Biomedicina y Biotecnología Molecular, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City 11340, Mexico
| | - Rosendo Luria-Pérez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City 06720, Mexico
| |
Collapse
|
40
|
Guo Z, Hou J, Cheng J. DNSS2: Improved ab initio protein secondary structure prediction using advanced deep learning architectures. Proteins 2020; 89:207-217. [PMID: 32893403 DOI: 10.1002/prot.26007] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Revised: 07/07/2020] [Accepted: 09/02/2020] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein secondary structure (alpha-helix, beta-strand and coil) is a crucial step for protein inter-residue contact prediction and ab initio tertiary structure prediction. In a previous study, we developed a deep belief network-based protein secondary structure method (DNSS1) and successfully advanced the prediction accuracy beyond 80%. In this work, we developed multiple advanced deep learning architectures (DNSS2) to further improve secondary structure prediction. The major improvements over the DNSS1 method include (a) designing and integrating six advanced one-dimensional deep convolutional/recurrent/residual/memory/fractal/inception networks to predict 3-state and 8-state secondary structure, and (b) using more sensitive profile features inferred from Hidden Markov model (HMM) and multiple sequence alignment (MSA). Most of the deep learning architectures are novel for protein secondary structure prediction. DNSS2 was systematically benchmarked on independent test data sets with eight state-of-art tools and consistently ranked as one of the best methods. Particularly, DNSS2 was tested on the protein targets of 2018 CASP13 experiment and achieved the Q3 score of 81.62%, SOV score of 72.19%, and Q8 score of 73.28%. DNSS2 is freely available at: https://github.com/multicom-toolbox/DNSS2.
Collapse
Affiliation(s)
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| |
Collapse
|
41
|
Grigas AT, Mei Z, Treado JD, Levine ZA, Regan L, O'Hern CS. Using physical features of protein core packing to distinguish real proteins from decoys. Protein Sci 2020; 29:1931-1944. [PMID: 32710566 PMCID: PMC7454528 DOI: 10.1002/pro.3914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2020] [Revised: 07/10/2020] [Accepted: 07/20/2020] [Indexed: 01/06/2023]
Abstract
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.
Collapse
Affiliation(s)
- Alex T. Grigas
- Graduate Program in Computational Biology and BioinformaticsYale UniversityNew HavenConnecticutUSA
- Integrated Graduate Program in Physical and Engineering BiologyYale UniversityNew HavenConnecticutUSA
| | - Zhe Mei
- Integrated Graduate Program in Physical and Engineering BiologyYale UniversityNew HavenConnecticutUSA
- Department of ChemistryYale UniversityNew HavenConnecticutUSA
| | - John D. Treado
- Integrated Graduate Program in Physical and Engineering BiologyYale UniversityNew HavenConnecticutUSA
- Department of Mechanical Engineering and Materials ScienceYale UniversityNew HavenConnecticutUSA
| | - Zachary A. Levine
- Department of PathologyYale UniversityNew HavenConnecticutUSA
- Department of Molecular Biophysics and BiochemistryYale UniversityNew HavenConnecticutUSA
| | - Lynne Regan
- Institute of Quantitative Biology, Biochemistry and Biotechnology, Centre for Synthetic and Systems Biology, School of Biological SciencesUniversity of EdinburghEdinburghUK
| | - Corey S. O'Hern
- Graduate Program in Computational Biology and BioinformaticsYale UniversityNew HavenConnecticutUSA
- Integrated Graduate Program in Physical and Engineering BiologyYale UniversityNew HavenConnecticutUSA
- Department of Mechanical Engineering and Materials ScienceYale UniversityNew HavenConnecticutUSA
- Department of PhysicsYale UniversityNew HavenConnecticutUSA
- Department of Applied PhysicsYale UniversityNew HavenConnecticutUSA
| |
Collapse
|
42
|
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, as one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. From the comparison results between energy terms obtained from our encoding and Amber, we find energy difference within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same work flow to combine the RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy. Through inclusion of best decoy to decoy comparisons in building our RF models, we were able to generate models that outperformed the score functions tested herein both on accuracy and best decoy detection, again showing the performance and flexibility of our RF models to tackle this problem. Finally, the importance of the RF algorithm and force field parameters were also tested and the comparison results suggest that both the RF algorithm and force field potentials are important with the ML scoring function achieving its best performance only by combining them together. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
Collapse
Affiliation(s)
- Jun Pei
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Lin Frank Song
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Kenneth M Merz
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| |
Collapse
|
43
|
Liu T, Wang Z. MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials. BMC Bioinformatics 2020; 21:246. [PMID: 32631256 PMCID: PMC7336608 DOI: 10.1186/s12859-020-3383-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Accepted: 01/22/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models are not available, methods that only need a single model as input are indispensable. RESULTS We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved comparable performances with the leading QA methods in CASP12 and CASP13. CONCLUSIONS MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/ .
Collapse
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA.
| |
Collapse
|
44
|
Olechnovič K, Venclovas Č. VoroMQA web server for assessing three-dimensional structures of proteins and protein complexes. Nucleic Acids Res 2020; 47:W437-W442. [PMID: 31073605 PMCID: PMC6602437 DOI: 10.1093/nar/gkz367] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2019] [Revised: 04/19/2019] [Accepted: 05/05/2019] [Indexed: 01/12/2023] Open
Abstract
The VoroMQA (Voronoi tessellation-based Model Quality Assessment) web server is dedicated to the estimation of protein structure quality, a common step in selecting realistic and most accurate computational models and in validating experimental structures. As an input, the VoroMQA web server accepts one or more protein structures in PDB format. Input structures may be either monomeric proteins or multimeric protein complexes. For every input structure, the server provides both global and local (per-residue) scores. Visualization of the local scores along the protein chain is enhanced by providing secondary structure assignment and information on solvent accessibility. A unique feature of the VoroMQA server is the ability to directly assess protein-protein interaction interfaces. If this type of assessment is requested, the web server provides interface quality scores, interface energy estimates, and local scores for residues involved in inter-chain interfaces. VoroMQA, the underlying method of the web server, was extensively tested in recent community-wide CASP and CAPRI experiments. During these experiments VoroMQA showed outstanding performance both in model selection and in estimation of accuracy of local structural regions. The VoroMQA web server is available at http://bioinformatics.ibt.lt/wtsam/voromqa.
Collapse
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
| |
Collapse
|
45
|
Chen J, Siu SWI. Machine Learning Approaches for Quality Assessment of Protein Structures. Biomolecules 2020; 10:biom10040626. [PMID: 32316682 PMCID: PMC7226485 DOI: 10.3390/biom10040626] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2020] [Revised: 04/07/2020] [Accepted: 04/09/2020] [Indexed: 11/16/2022] Open
Abstract
Protein structures play a very important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determinations of protein structure are prohibitively costly and time-consuming, and computational predictions of protein structures have not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, EMA by machine learning (ML) have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by their employed ML approach-support vector machine, artificial neural networks, ensemble learning, or Bayesian learning-and their significances are discussed from a methodology viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML/DL techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and directions for future research in this area.
Collapse
|
46
|
Mateos-Chávez AA, Muñoz-López P, Becerra-Báez EI, Flores-Martínez LF, Prada-Gracia D, Moreno-Vargas LM, Baay-Guzmán GJ, Juárez-Hernández U, Chávez-Munguía B, Cabrera-Muñóz L, Luria-Pérez R. Live Attenuated Salmonella enterica Expressing and Releasing Cell-Permeable Bax BH3 Peptide Through the MisL Autotransporter System Elicits Antitumor Activity in a Murine Xenograft Model of Human B Non-hodgkin's Lymphoma. Front Immunol 2019; 10:2562. [PMID: 31798573 PMCID: PMC6874163 DOI: 10.3389/fimmu.2019.02562] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2019] [Accepted: 10/16/2019] [Indexed: 01/01/2023] Open
Abstract
The survival of patients with non-Hodgkin's lymphoma (NHL) has substantially improved with current treatments. Nevertheless, the appearance of drug-resistant cancer cells leads to patient relapse. It is therefore necessary to find new antitumor therapies that can completely eradicate transformed cells. Chemotherapy-resistant cancer cells are characterized by the overexpression of members of the anti-apoptotic B-cell lymphoma 2 (Bcl-2) protein family, such as Bcl-XL, Bcl-2, and Mcl-1. We have recently shown that peptides derived from the BH3 domain of the pro-apoptotic Bax protein may antagonize the anti-apoptotic activity of the Bcl-2 family proteins, restore apoptosis, and induce chemosensitization of tumor cells. In this study, we investigated the feasibility of releasing this peptide into the tumor microenvironment using live attenuated Salmonella enterica, which has proven to be an ally in cancer therapy due to its high affinity for tumor tissue, its ability to activate the innate and adaptive antitumor immune responses, and its potential use as a delivery system of heterologous molecules. Thus, we expressed and released the cell-permeable Bax BH3 peptide from the surface of Salmonella enterica serovar Typhimurium SL3261 through the MisL autotransporter system. We demonstrated that this recombinant bacterium significantly decreased the viability and increased the apoptosis of Ramos cells, a human B NHL cell line. Indeed, the intravenous administration of this recombinant Salmonella enterica elicited antitumor activity and extended survival in a xenograft NHL murine model. This antitumor activity was mediated by apoptosis and an inflammatory response. Our approach may represent an eventual alternative to treat relapsing or refractory NHL.
Collapse
Affiliation(s)
- Armando Alfredo Mateos-Chávez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico
| | - Paola Muñoz-López
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico.,Posgrado en Biomedicina y Biotecnología Molecular, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico
| | - Elayne Irene Becerra-Báez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico.,Posgrado en Biomedicina y Biotecnología Molecular, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico
| | - Luis Fernando Flores-Martínez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico
| | - Diego Prada-Gracia
- Research Unit on Computational Biology and Drug Design, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico
| | - Liliana Marisol Moreno-Vargas
- Research Unit on Computational Biology and Drug Design, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico
| | | | - Uriel Juárez-Hernández
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico.,Department of Molecular Biomedicine, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City, Mexico
| | - Bibiana Chávez-Munguía
- Department of Infectomics and Molecular Pathogenesis, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City, Mexico
| | - Lourdes Cabrera-Muñóz
- Department of Clinical and Experimental Pathology, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico
| | - Rosendo Luria-Pérez
- Unit of Investigative Research on Oncological Diseases, Children's Hospital of Mexico Federico Gomez, Mexico City, Mexico
| |
Collapse
|
47
|
Sato R, Ishida T. Protein model accuracy estimation based on local structure quality assessment using 3D convolutional neural network. PLoS One 2019; 14:e0221347. [PMID: 31487288 PMCID: PMC6728020 DOI: 10.1371/journal.pone.0221347] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2019] [Accepted: 08/05/2019] [Indexed: 11/23/2022] Open
Abstract
In protein tertiary structure prediction, model quality assessment programs (MQAPs) are often used to select the final structural models from a pool of candidate models generated by multiple templates and prediction methods. The 3-dimensional convolutional neural network (3DCNN) is an expansion of the 2DCNN and has been applied in several fields, including object recognition. The 3DCNN is also used for MQA tasks, but the performance is low due to several technical limitations related to protein tertiary structures, such as orientation alignment. We proposed a novel single-model MQA method based on local structure quality evaluation using a deep neural network containing 3DCNN layers. The proposed method first assesses the quality of local structures for each residue and then evaluates the quality of whole structures by integrating estimated local qualities. We analyzed the model using the CASP11, CASP12, and 3D-Robot datasets and compared the performance of the model with that of the previous 3DCNN method based on whole protein structures. The proposed method showed a significant improvement compared to the previous 3DCNN method for multiple evaluation measures. We also compared the proposed method to other state-of-the-art methods. Our method showed better performance than the previous 3DCNN-based method and comparable accuracy as the current best single-model methods; particularly, in CASP11 stage2, our method showed a Pearson coefficient of 0.486, which was better than those of the best single-model methods (0.366–0.405). A standalone version of the proposed method and data files are available at https://github.com/ishidalab-titech/3DCNN_MQA.
Collapse
Affiliation(s)
- Rin Sato
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
| | - Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
- * E-mail:
| |
Collapse
|
48
|
Won J, Baek M, Monastyrskyy B, Kryshtafovych A, Seok C. Assessment of protein model structure accuracy estimation in CASP13: Challenges in the era of deep learning. Proteins 2019; 87:1351-1360. [PMID: 31436360 DOI: 10.1002/prot.25804] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2019] [Revised: 08/08/2019] [Accepted: 08/19/2019] [Indexed: 12/20/2022]
Abstract
Scoring model structure is an essential component of protein structure prediction that can affect the prediction accuracy tremendously. Users of protein structure prediction results also need to score models to select the best models for their application studies. In Critical Assessment of techniques for protein Structure Prediction (CASP), model accuracy estimation methods have been tested in a blind fashion by providing models submitted by the tertiary structure prediction servers for scoring. In CASP13, model accuracy estimation results were evaluated in terms of both global and local structure accuracy. Global structure accuracy estimation was evaluated by the quality of the models selected by the global structure scores and by the absolute estimates of the global scores. Residue-wise, local structure accuracy estimations were evaluated by three different measures. A new measure introduced in CASP13 evaluates the ability to predict inaccurately modeled regions that may be improved by refinement. An intensive comparative analysis on CASP13 and the previous CASPs revealed that the tertiary structure models generated by the CASP13 servers show very distinct features. Higher consensus toward models of higher global accuracy appeared even for free modeling targets, and many models of high global accuracy were not well optimized at the atomic level. This is related to the new technology in CASP13, deep learning for tertiary contact prediction. The tertiary model structures generated by deep learning pose a new challenge for EMA (estimation of model accuracy) method developers. Model accuracy estimation itself is also an area where deep learning can potentially have an impact, although current EMA methods have not fully explored that direction.
Collapse
Affiliation(s)
- Jonghun Won
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Minkyung Baek
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | | | | | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| |
Collapse
|
49
|
Cheng J, Choe MH, Elofsson A, Han KS, Hou J, Maghrabi AHA, McGuffin LJ, Menéndez-Hurtado D, Olechnovič K, Schwede T, Studer G, Uziela K, Venclovas Č, Wallner B. Estimation of model accuracy in CASP13. Proteins 2019; 87:1361-1377. [PMID: 31265154 DOI: 10.1002/prot.25767] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2019] [Revised: 06/04/2019] [Accepted: 06/15/2019] [Indexed: 12/28/2022]
Abstract
Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.
Collapse
Affiliation(s)
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Myong-Ho Choe
- Department of Life Science, University of Science, Pyongyang, DPR Korea
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Kun-Sop Han
- Department of Life Science, University of Science, Pyongyang, DPR Korea
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Ali H A Maghrabi
- School of Biological Sciences, University of Reading, Reading, UK
| | - Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK
| | - David Menéndez-Hurtado
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
| | - Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland.,SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
| | - Karolis Uziela
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Björn Wallner
- Department of Physics, Chemistry, and Biology, Bioinformatics Division, Linköping University, Linköping, Sweden
| |
Collapse
|
50
|
Conover M, Staples M, Si D, Sun M, Cao R. AngularQA: Protein Model Quality Assessment with LSTM Networks. COMPUTATIONAL AND MATHEMATICAL BIOPHYSICS 2019. [DOI: 10.1515/cmb-2019-0001] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
Abstract
Abstract
Quality Assessment (QA) plays an important role in protein structure prediction. Traditional multimodel QA method usually suffer from searching databases or comparing with other models for making predictions, which usually fail when the poor quality models dominate the model pool. We propose a novel protein single-model QA method which is built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network is used to predict the quality by treating each amino acid as a time-step and consider the final value returned by the LSTM cells. To the best of our knowledge, this is the first time anyone has attempted to use an LSTM model on the QA problem; furthermore, we use a new representation which has not been studied for QA. In addition to angles, we make use of sequence properties like secondary structure parsed from protein structure at each time-step without using any database, which is different than all existed QA methods. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points out new directions for QA problem and our method could be widely used for protein structure prediction problem. The software is freely available at GitHub: https://github.com/caorenzhi/AngularQA
Collapse
Affiliation(s)
- Matthew Conover
- Department of Computer Science , Pacific Lutheran University , Tacoma , WA 98447 , USA
| | - Max Staples
- Department of Computer Science , Pacific Lutheran University , Tacoma , WA 98447 , USA
| | - Dong Si
- Division of Computing and Software Systems , University of Washington-Bothell , Bothell , WA 98011 , USA
| | - Miao Sun
- JingChi, Sunnyvale , CA 94089 , USA
| | - Renzhi Cao
- Department of Computer Science , Pacific Lutheran University , Tacoma , WA 98447 , USA
| |
Collapse
|