1
Tarafder S, Bhattacharya D. lociPARSE: A Locality-aware Invariant Point Attention Model for Scoring RNA 3D Structures. J Chem Inf Model 2024; 64:8655-8664. [PMID: 39523843 PMCID: PMC11600500 DOI: 10.1021/acs.jcim.4c01621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Revised: 10/17/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024]
Abstract
A scoring function that can reliably assess the accuracy of a 3D RNA structural model in the absence of experimental structure is not only important for model evaluation and selection but also useful for scoring-guided conformational sampling. However, high-fidelity RNA scoring has proven to be difficult using conventional knowledge-based statistical potentials and currently available machine learning-based approaches. Here, we present lociPARSE, a locality-aware invariant point attention architecture for scoring RNA 3D structures. Unlike existing machine learning methods that estimate superposition-based root-mean-square deviation (RMSD), lociPARSE estimates Local Distance Difference Test (lDDT) scores capturing the accuracy of each nucleotide and its surrounding local atomic environment in a superposition-free manner, before aggregating information to predict global structural accuracy. Tested on multiple datasets including CASP15, lociPARSE significantly outperforms existing statistical potentials (rsRNASP, cgRNASP, DFIRE-RNA, and RASP) and machine learning methods (ARES and RNA3DCNN) across complementary assessment metrics. lociPARSE is freely available at https://github.com/Bhattacharya-Lab/lociPARSE.
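The superposition-free, per-nucleotide scoring idea described in this abstract can be illustrated with a simplified lDDT calculation. The sketch below is not the lociPARSE model, which predicts such scores with a neural network when no experimental structure is available; it only shows how a reference-based lDDT-style score might be computed. It assumes one representative point per nucleotide, the commonly used 15 Å inclusion radius and 0.5/1/2/4 Å tolerance thresholds, and a plain mean as the global aggregate. The names `per_residue_lddt` and `global_lddt` are hypothetical.

```python
import math

# Commonly used lDDT tolerance thresholds, in angstroms.
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)


def per_residue_lddt(model, reference, radius=15.0):
    """Return one lDDT-style score per residue.

    `model` and `reference` are equal-length lists of (x, y, z) points,
    one representative point per nucleotide (a simplifying assumption).
    """
    n = len(reference)
    scores = []
    for i in range(n):
        preserved = 0
        total = 0
        for j in range(n):
            if i == j:
                continue
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only the local environment of residue i counts
            d_mod = math.dist(model[i], model[j])
            diff = abs(d_ref - d_mod)
            total += len(THRESHOLDS)
            preserved += sum(diff < t for t in THRESHOLDS)
        scores.append(preserved / total if total else 0.0)
    return scores


def global_lddt(model, reference, radius=15.0):
    """Aggregate per-residue scores with a plain mean (an assumption)."""
    scores = per_residue_lddt(model, reference, radius)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    distorted = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (13.0, 0.0, 0.0)]
    print(per_residue_lddt(distorted, reference))
    print(global_lddt(distorted, reference))
```

Because only internal distances are compared, rotating or translating the model leaves every score unchanged, which is what makes the measure superposition-free.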
Affiliation(s)
- Sumit Tarafder
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Debswapna Bhattacharya
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
2
Tarafder S, Bhattacharya D. lociPARSE: a locality-aware invariant point attention model for scoring RNA 3D structures. bioRxiv 2024:2023.11.04.565599. [PMID: 37961488 PMCID: PMC10635153 DOI: 10.1101/2023.11.04.565599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
A scoring function that can reliably assess the accuracy of a 3D RNA structural model in the absence of experimental structure is not only important for model evaluation and selection but also useful for scoring-guided conformational sampling. However, high-fidelity RNA scoring has proven to be difficult using conventional knowledge-based statistical potentials and currently-available machine learning-based approaches. Here we present lociPARSE, a locality-aware invariant point attention architecture for scoring RNA 3D structures. Unlike existing machine learning methods that estimate superposition-based root mean square deviation (RMSD), lociPARSE estimates Local Distance Difference Test (lDDT) scores capturing the accuracy of each nucleotide and its surrounding local atomic environment in a superposition-free manner, before aggregating information to predict global structural accuracy. Tested on multiple datasets including CASP15, lociPARSE significantly outperforms existing statistical potentials (rsRNASP, cgRNASP, DFIRE-RNA, and RASP) and machine learning methods (ARES and RNA3DCNN) across complementary assessment metrics. lociPARSE is freely available at https://github.com/Bhattacharya-Lab/lociPARSE.
Affiliation(s)
- Sumit Tarafder
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, 24061, USA
3
Bernard C, Postic G, Ghannay S, Tahi F. RNAdvisor: a comprehensive benchmarking tool for the measure and prediction of RNA structural model quality. Brief Bioinform 2024; 25:bbae064. [PMID: 38436560 PMCID: PMC10939302 DOI: 10.1093/bib/bbae064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/30/2024] [Accepted: 02/02/2024] [Indexed: 03/05/2024] Open
Abstract
RNA is a complex macromolecule that plays central roles in the cell. While it is well known that its structure is directly related to its functions, understanding and predicting RNA structures is challenging. Assessing the real or predictive quality of a structure is also at stake with the complex 3D possible conformations of RNAs. Metrics have been developed to measure model quality while scoring functions aim at assigning quality to guide the discrimination of structures without a known and solved reference. Throughout the years, many metrics and scoring functions have been developed, and no unique assessment is used nowadays. Each developed assessment method has its specificity and might be complementary to understanding structure quality. Therefore, to evaluate RNA 3D structure predictions, it would be important to calculate different metrics and/or scoring functions. For this purpose, we developed RNAdvisor, a comprehensive automated software that integrates and enhances the accessibility of existing metrics and scoring functions. In this paper, we present our RNAdvisor tool, as well as state-of-the-art existing metrics, scoring functions and a set of benchmarks we conducted for evaluating them. Source code is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr.
Affiliation(s)
- Clement Bernard
  - Université Paris Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
- Guillaume Postic
  - Université Paris Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
- Sahar Ghannay
  - LISN - CNRS/Université Paris-Saclay, France, 91400 Orsay, France
- Fariza Tahi
  - Université Paris Saclay, Univ Evry, IBISC, 91020 Evry-Courcouronnes, France
4
Lian X, Fan K, Qin X, Liu Y. Amalgamated Pharmacoinformatics Study to Investigate the Mechanism of Xiao Jianzhong Tang against Chronic Atrophic Gastritis. Curr Comput Aided Drug Des 2024; 20:598-615. [PMID: 37475552 DOI: 10.2174/1573409919666230720141115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 05/24/2023] [Accepted: 06/14/2023] [Indexed: 07/22/2023]
Abstract
BACKGROUND Traditional Chinese medicine (TCM) Xiao Jianzhong Tang (XJZ) has a favorable efficacy in the treatment of chronic atrophic gastritis (CAG). However, its pharmacological mechanism has not been fully explained. OBJECTIVE The purpose of this study was to find the potential mechanism of XJZ in the treatment of CAG using pharmacoinformatics approaches. METHODS Network pharmacology was used to screen out the key compounds and key targets, MODELLER and GNNRefine were used to repair and refine proteins, AutoDock Vina was employed to perform molecular docking, ΔLin_F9XGB was used to score the docking results, and GROMACS was used to perform molecular dynamics (MD) simulations. RESULTS Kaempferol, licochalcone A, and naringenin were obtained as key compounds, while AKT1, MAPK1, MAPK14, RELA, STAT1, and STAT3 were acquired as key targets. Among docking results, 12 complexes scored greater than five. They were run for 50 ns of MD. The binding free energy of AKT1-licochalcone A and MAPK1-licochalcone A was less than -15 kcal/mol, and that of AKT1-naringenin and STAT3-licochalcone A was less than -9 kcal/mol. These complexes were crucial in XJZ treating CAG. CONCLUSION Our findings suggest that licochalcone A could act on AKT1, MAPK1, and STAT3, and naringenin could act on AKT1 to play the potential therapeutic effect on CAG. The work also provides a powerful approach to interpreting the complex mechanism of TCM through the amalgamation of network pharmacology, deep learning-based protein refinement, molecular docking, machine learning-based binding affinity estimation, MD simulations, and MM-PBSA-based estimation of binding free energy.
Affiliation(s)
- Xu Lian
  - Modern Research Center for Traditional Chinese Medicine, The Key Laboratory of Chemical Biology and Molecular Engineering of Ministry of Education, Shanxi University, No. 92, Wucheng Road, Taiyuan, 030006, Shanxi, P.R. China
  - Key Laboratory of Effective Substances Research and Utilization in TCM of Shanxi Province, No. 92, Wucheng Road, Taiyuan, 030006, Shanxi, P.R. China
- Kaidi Fan
  - Modern Research Center for Traditional Chinese Medicine, The Key Laboratory of Chemical Biology and Molecular Engineering of Ministry of Education, Shanxi University, No. 92, Wucheng Road, Taiyuan, 030006, Shanxi, P.R. China
  - Key Laboratory of Effective Substances Research and Utilization in TCM of Shanxi Province, No. 92, Wucheng Road, Taiyuan, 030006, Shanxi, P.R. China
- Xuemei Qin
  - Modern Research Center for Traditional Chinese Medicine, The Key Laboratory of Chemical Biology and Molecular Engineering of Ministry of Education, Shanxi University, No. 92, Wucheng Road, Taiyuan, 030006, Shanxi, P.R. China
  - Key Laboratory of Effective Substances Research and Utilization in TCM of Shanxi Province, No. 92, Wucheng Road, Taiyuan, 030006, Shanxi, P.R. China
- Yuetao Liu
  - Modern Research Center for Traditional Chinese Medicine, The Key Laboratory of Chemical Biology and Molecular Engineering of Ministry of Education, Shanxi University, No. 92, Wucheng Road, Taiyuan, 030006, Shanxi, P.R. China
  - Key Laboratory of Effective Substances Research and Utilization in TCM of Shanxi Province, No. 92, Wucheng Road, Taiyuan, 030006, Shanxi, P.R. China
5
Wu F, Wu L, Radev D, Xu J, Li SZ. Integration of pre-trained protein language models into geometric deep learning networks. Commun Biol 2023; 6:876. [PMID: 37626165 PMCID: PMC10457366 DOI: 10.1038/s42003-023-05133-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/23/2023] [Accepted: 07/11/2023] [Indexed: 08/27/2023] Open
Abstract
Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several preceding studies consider combining these different protein modalities to promote the representation power of geometric neural networks but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.
Affiliation(s)
- Fang Wu
  - AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
- Lirong Wu
  - AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
- Dragomir Radev
  - Department of Computer Science, Yale University, New Haven, CT, 06511, USA
- Jinbo Xu
  - Institute of AI Industry Research, Tsinghua University, Haidian Street, 100084, Beijing, China
  - Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
- Stan Z Li
  - AI Research and Innovation Laboratory, Westlake University, 310030, Hangzhou, China
6
Leal CS, Carvalho CAM. In Silico Physicochemical Characterization of Fusion Proteins from Emerging Amazonian Arboviruses. Life (Basel) 2023; 13:1687. [PMID: 37629544 PMCID: PMC10455688 DOI: 10.3390/life13081687] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/26/2023] [Revised: 07/31/2023] [Accepted: 08/02/2023] [Indexed: 08/27/2023] Open
Abstract
Mayaro (MAYV), Saint Louis encephalitis (SLEV), and Oropouche (OROV) viruses are neglected members of the three main families of arboviruses with medical relevance that circulate in the Amazon region as etiological agents of outbreaks of febrile illnesses in humans. As enveloped viruses, MAYV, SLEV, and OROV largely depend on their class II fusion proteins (E1, E, and Gc, respectively) for entry into the host cell. Since many aspects of the structural biology of such proteins remain unclear, the present study aimed at physicochemically characterizing them by an in silico approach. The complete amino acid sequences of MAYV E1, SLEV E, and OROV Gc proteins derived by conceptual translation from annotated coding regions in the reference sequence genome of the respective viruses were obtained from the NCBI Protein database in the FASTA format and then submitted to the ClustalO, Protcalc, Pepstats, Predator, Proscan, PCprof, Phyre2, and 3Drefine web servers for the determination of sequence identities, the estimation of residual properties, the prediction of secondary structures, the identification of potential post-translational modifications, the recognition of antigenic propensities, and the modeling/refinement of three-dimensional structures. Sequence identities were 20.44%, 18.82%, and 13.70% between MAYV/SLEV, SLEV/OROV, and MAYV/OROV fusion proteins, respectively. As for the residual properties, MAYV E1 and SLEV E proteins showed a predominance of the non-polar profile (56% and 55% of the residues, respectively), whereas the OROV Gc protein showed a predominance of the polar profile (52% of the residues). Regarding predicted secondary structures, MAYV E1 and SLEV E proteins showed fewer alpha-helices (16.51% and 15.17%, respectively) than beta-sheets (21.79% and 25.15%, respectively), while the opposite was observed in the OROV Gc protein (20.39% alpha-helices and 12.14% beta-sheets). 
Regarding post-translational modifications, MAYV E1, SLEV E, and OROV Gc proteins showed greater relative potential for protein kinase C phosphorylation, N-myristoylation, and casein kinase II phosphorylation, respectively. Finally, antigenic propensities were higher in the N-terminus half than in the C-terminus half of these three proteins, whose three-dimensional structures revealed three distinctive domains. In conclusion, MAYV E1 and SLEV E proteins were found to share more physicochemical characteristics with each other than the OROV Gc protein, although they are all grouped under the same class of viral fusion proteins.
Affiliation(s)
- Carlos Alberto M. Carvalho
  - Graduate Program in Parasite Biology in the Amazon, Center for Biological and Health Sciences, University of Pará State, Belém 66095-662, PA, Brazil
7
Wang X, Yu S, Lou E, Tan YL, Tan ZJ. RNA 3D Structure Prediction: Progress and Perspective. Molecules 2023; 28:5532. [PMID: 37513407 PMCID: PMC10386116 DOI: 10.3390/molecules28145532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 07/05/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
Ribonucleic acid (RNA) molecules play vital roles in numerous important biological functions such as catalysis and gene regulation. The functions of RNAs are strongly coupled to their structures or proper structure changes, and RNA structure prediction has been paid much attention in the last two decades. Some computational models have been developed to predict RNA three-dimensional (3D) structures in silico, and these models are generally composed of predicting RNA 3D structure ensemble, evaluating near-native RNAs from the structure ensemble, and refining the identified RNAs. In this review, we will make a comprehensive overview of the recent advances in RNA 3D structure modeling, including structure ensemble prediction, evaluation, and refinement. Finally, we will emphasize some insights and perspectives in modeling RNA 3D structures.
Affiliation(s)
- Xunxun Wang
  - Department of Physics, Key Laboratory of Artificial Micro & Nano-Structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Shixiong Yu
  - Department of Physics, Key Laboratory of Artificial Micro & Nano-Structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- En Lou
  - Department of Physics, Key Laboratory of Artificial Micro & Nano-Structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Ya-Lan Tan
  - School of Bioengineering and Health, Wuhan Textile University, Wuhan 430200, China
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, China
- Zhi-Jie Tan
  - Department of Physics, Key Laboratory of Artificial Micro & Nano-Structures of Ministry of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
8
Guerler A, Baker D, van den Beek M, Gruening B, Bouvier D, Coraor N, Shank SD, Zehr JD, Schatz MC, Nekrutenko A. Fast and accurate genome-wide predictions and structural modeling of protein-protein interactions using Galaxy. BMC Bioinformatics 2023; 24:263. [PMID: 37353753 PMCID: PMC10288729 DOI: 10.1186/s12859-023-05389-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 06/15/2023] [Indexed: 06/25/2023] Open
Abstract
BACKGROUND Protein-protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins reveals insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein-protein interactions and produce high-quality multimeric structural models. RESULTS Application of our method to the Human and Yeast genomes yields protein-protein interaction networks similar in quality to common experimental methods. We identified and modeled Human proteins likely to interact with the papain-like protease of SARS-CoV2's non-structural protein 3. We also produced models of SARS-CoV2's spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4. CONCLUSIONS The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.
Affiliation(s)
- Aysam Guerler
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Dannon Baker
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Marius van den Beek
  - Department of Biochemistry and Molecular Biology, Penn State University, College Park, PA, USA
- Bjoern Gruening
  - Department of Bioinformatics, Freiburg University, Freiburg, Germany
- Dave Bouvier
  - Department of Biochemistry and Molecular Biology, Penn State University, College Park, PA, USA
- Nate Coraor
  - Department of Biochemistry and Molecular Biology, Penn State University, College Park, PA, USA
- Stephen D Shank
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Jordan D Zehr
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Anton Nekrutenko
  - Department of Biochemistry and Molecular Biology, Penn State University, College Park, PA, USA
9
Tan YL, Wang X, Yu S, Zhang B, Tan ZJ. cgRNASP: coarse-grained statistical potentials with residue separation for RNA structure evaluation. NAR Genom Bioinform 2023; 5:lqad016. [PMID: 36879898 PMCID: PMC9985339 DOI: 10.1093/nargab/lqad016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 01/21/2023] [Accepted: 02/03/2023] [Indexed: 03/07/2023] Open
Abstract
Knowledge-based statistical potentials are very important for RNA 3-dimensional (3D) structure prediction and evaluation. In recent years, various coarse-grained (CG) and all-atom models have been developed for predicting RNA 3D structures, while there is still a lack of reliable CG statistical potentials not only for CG structure evaluation but also for all-atom structure evaluation at high efficiency. In this work, we have developed a series of residue-separation-based CG statistical potentials at different CG levels for RNA 3D structure evaluation, namely cgRNASP, which is composed of long-ranged and short-ranged interactions by residue separation. Compared with the newly developed all-atom rsRNASP, the short-ranged interaction in cgRNASP was involved more subtly and completely. Our examinations show that the performance of cgRNASP varies with CG levels and, compared with rsRNASP, cgRNASP has similarly good performance for extensive types of test datasets and can have slightly better performance for the realistic RNA-Puzzles dataset. Furthermore, cgRNASP is strikingly more efficient than all-atom statistical potentials/scoring functions, and can be apparently superior to other all-atom statistical potentials and scoring functions trained from neural networks for the RNA-Puzzles dataset. cgRNASP is available at https://github.com/Tan-group/cgRNASP.
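Knowledge-based statistical potentials of the kind discussed in this abstract are commonly derived with inverse Boltzmann statistics, where an energy is assigned to each distance bin as E = -kT ln(P_obs / P_ref). The sketch below illustrates only that general idea; it does not reproduce cgRNASP's functional form, reference state, or residue-separation scheme. The function names, the toy counts, and the choice to assign infinite energy to unobserved bins are all assumptions made for illustration.

```python
import math


def statistical_potential(observed_counts, reference_counts, kT=1.0):
    """Convert distance-bin counts into energies via E = -kT * ln(P_obs / P_ref).

    `observed_counts` come from known structures; `reference_counts` come
    from a reference state. Both are lists with one entry per distance bin.
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        if ref <= 0:
            raise ValueError("reference counts must be positive in every bin")
        p_obs = obs / obs_total
        p_ref = ref / ref_total
        # Bins never seen in known structures get infinite energy (an assumption).
        energies.append(math.inf if p_obs == 0 else -kT * math.log(p_obs / p_ref))
    return energies


def score_structure(bin_indices, energies):
    """Sum the bin energies over all scored pairs; lower means more native-like."""
    return sum(energies[b] for b in bin_indices)


if __name__ == "__main__":
    energies = statistical_potential([10, 30], [20, 20])
    print(energies)
    print(score_structure([0, 1, 1], energies))
```

Bins that are over-represented in known structures relative to the reference state receive negative energies, so a candidate whose pairs fall mostly in those bins gets a lower total score.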
Affiliation(s)
- Ya-Lan Tan
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430073, China
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Xunxun Wang
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Shixiong Yu
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430073, China
- Zhi-Jie Tan
  - Department of Physics and Key Laboratory of Artificial Micro & Nano-structures of Education, School of Physics and Technology, Wuhan University, Wuhan 430072, China
10
Bodie NM, Hashimoto R, Connolly D, Chu J, Takayama K, Uhal BD. Design of a chimeric ACE-2/Fc-silent fusion protein with ultrahigh affinity and neutralizing capacity for SARS-CoV-2 variants. Antib Ther 2023; 6:59-74. [PMID: 36741194 PMCID: PMC9889962 DOI: 10.1093/abt/tbad001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 10/14/2022] [Accepted: 01/03/2023] [Indexed: 01/22/2023] Open
Abstract
Background As SARS-CoV-2 continues to mutate into Variants of Concern (VOC), there is a growing and urgent need to develop effective antivirals to combat COVID-19. Monoclonal antibodies developed earlier are no longer capable of effectively neutralizing currently active VOCs. This report describes the design of variant-agnostic chimeric molecules consisting of an Angiotensin-Converting Enzyme 2 (ACE-2) domain mutated to retain ultrahigh affinity binding to a wide variety of SARS-CoV-2 variants, coupled to an Fc-silent immunoglobulin domain that eliminates antibody-dependent enhancement and extends biological half-life. Methods Molecular modeling, Surrogate Viral Neutralization tests (sVNTs) and infection studies of human airway organoid cultures were performed with synthetic chimeras, SARS-CoV-2 spike protein mimics and SARS-CoV-2 Omicron variants B.1.1.214, BA.1, BA.2 and BA.5. Results ACE-2 mutations L27, V34 and E90 resulted in ultrahigh affinity binding of the LVE-ACE-2 domain to the widest variety of VOCs, with KDs of 93 pM and 73 pM for binding to the Alpha B.1.1.7 and Omicron B.1.1.529 variants, and notably, 78 fM, 133 fM and 1.81 pM affinities to the Omicron BA.2, BA.2.75 and BQ.1.1 subvariants, respectively. sVNT assays revealed titers of ≥4.9 ng/ml for neutralization of recombinant viral proteins corresponding to the Alpha, Delta and Omicron variants. The values above were obtained with LVE-ACE-2/mAB chimeras containing the FcRn-binding Y-T-E sequence which extends biological half-life 3-4-fold. Conclusions The ACE-2-mutant/Fc silent fusion proteins described have ultrahigh affinity to a wide variety of SARS-CoV-2 variants including Omicron. It is proposed that these chimeric ACE-2/mABs will constitute variant-agnostic and cost-effective prophylactics against SARS-CoV-2, particularly when administered nasally.
Affiliation(s)
- Neil M Bodie
  - Paradigm Immunotherapeutics Inc., Monrovia, CA 91016, USA
- Rina Hashimoto
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto 6068507, Japan
- David Connolly
  - College of Osteopathic Medicine, Department of Medicine, Michigan State University, East Lansing, MI 48824, USA
- Jennifer Chu
  - Innovation Lab, ACROBiosystems, 1 Innovation Way, Newark, DE 19711, USA
- Kazuo Takayama
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto 6068507, Japan (to whom correspondence should be addressed)
- Bruce D Uhal
  - Department of Physiology, Michigan State University, 3197 Biomedical and Physical Sciences Building, 567 Wilson Road, East Lansing, MI 48824, USA (to whom correspondence should be addressed)
11
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it used to provide models of acceptable quality only when templates with appreciable sequence identity with a target could be found. The threshold value was long assumed to be around 20-30%. Below this level, the obtained sequence identity came dangerously close to values that can arise by chance when aligning any random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide assessments of modeling methods have brought some surprising outcomes, proving that many more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling, pushing the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
Affiliation(s)
- Damian Bartuzi
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- Agnieszka A Kaczor
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
  - University of Eastern Finland, School of Pharmacy, Kuopio, Finland
- Dariusz Matosiuk
  - Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
12
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
13
Bitton M, Keasar C. Estimation of model accuracy by a unique set of features and tree-based regressor. Sci Rep 2022; 12:14074. [PMID: 35982086 PMCID: PMC9388490 DOI: 10.1038/s41598-022-17097-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 07/20/2022] [Indexed: 11/26/2022] Open
Abstract
Computationally generated models of protein structures bridge the gap between the practically negligible price tag of sequencing and the high cost of experimental structure determination. By providing a low-cost (and often free) partial alternative to experimentally determined structures, these models help biologists design and interpret their experiments. Obviously, the more accurate the models the more useful they are. However, methods for protein structure prediction generate many structural models of various qualities, necessitating means for the estimation of their accuracy. In this work we present MESHI_consensus, a new method for the estimation of model accuracy. The method uses a tree-based regressor and a set of structural, target-based, and consensus-based features. The new method achieved high performance in the EMA (Estimation of Model Accuracy) track of the recent CASP14 community-wide experiment (https://predictioncenter.org/casp14/index.cgi). The tertiary structure prediction track of that experiment revealed an unprecedented leap in prediction performance by a single prediction group/method, namely AlphaFold2. This achievement would inevitably have a profound impact on the field of protein structure prediction, including the accuracy estimation sub-task. We conclude this manuscript with some speculations regarding the future role of accuracy estimation in a new era of accurate protein structure prediction.
Affiliation(s)
- Mor Bitton
  - Department of Computer Science, Ben Gurion University, Be'er Sheva, Israel
- Chen Keasar
  - Department of Computer Science, Ben Gurion University, Be'er Sheva, Israel
14
Kurniawan J, Ishida T. Protein Model Quality Estimation Using Molecular Dynamics Simulation. ACS Omega 2022; 7:24274-24281. [PMID: 35874260 PMCID: PMC9301944 DOI: 10.1021/acsomega.2c01475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The estimation of protein model quality remains a challenging task and is important for the use of protein structural models. In the last decade, methods ranging from machine learning to deep learning have been developed and have shown progressive improvement. Despite utilizing more sophisticated techniques and introducing new features, none of these methods employ explicit protein structure stability information. Hypothetically, the quality of a protein model might be indicated by its structural stability in an in silico system, as revealed by the structural difference from its initial structure. One possible way to exploit such information is to run molecular dynamics simulations, which have been applied successfully in many research fields. We present a novel approach that introduces explicit protein structure stability information using molecular dynamics simulation. Despite using only simple features, a small dataset with no training process required, and a short molecular dynamics simulation time, our method shows performance comparable to that of the state-of-the-art deep learning-based method.
|
15
|
Structure Prediction, Evaluation, and Validation of GPR18 Lipid Receptor Using Free Programs. Int J Mol Sci 2022; 23:ijms23147917. [PMID: 35887268 PMCID: PMC9319093 DOI: 10.3390/ijms23147917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 07/04/2022] [Accepted: 07/08/2022] [Indexed: 11/30/2022] Open
Abstract
The GPR18 receptor, often referred to as the N-arachidonylglycine receptor, officially still has the status of a class A GPCR orphan, although it has been assigned (along with GPR55 and GPR119) to the new class A GPCR subfamily of lipid receptors. While its signaling pathways and biological significance have not yet been fully elucidated, increasing evidence points to the therapeutic potential of GPR18 in relation to immune, neurodegenerative, and cancer processes, to name a few. Therefore, it is necessary to understand the interactions of potential ligands with the receptor and the influence of particular structural elements on their activity. Given the lack of an experimentally solved structure, the goal of the present study was to obtain a homology model of the GPR18 receptor in the inactive state, meeting all requirements in terms of protein structure quality and recognition of active ligands. To increase the reliability and precision of the predictions, different contemporary protein structure prediction methods and software were used and compared herein. To test the usability of the resulting models, we optimized and compared the selected structures and then assessed their ability to recognize known active ligands. The stability of the predicted poses was then evaluated by means of molecular dynamics simulations. However, most of the best-ranking contemporary computer-aided drug design (CADD) software and platforms require rather expensive licenses for full usability. To overcome this practical obstacle, the overarching goal of these studies was to test whether it is possible to perform thorough CADD experiments with high scientific confidence while using only license-free/academic software and online platforms. The obtained results indicate that a wide range of freely available software and/or academic licenses allows us to carry out meaningful molecular modelling/docking studies.
|
16
|
Akhter N, Kabir KL, Chennupati G, Vangara R, Alexandrov BS, Djidjev H, Shehu A. Improved Protein Decoy Selection via Non-Negative Matrix Factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1670-1682. [PMID: 33400654 DOI: 10.1109/tcbb.2020.3049088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
A central challenge in protein modeling research and protein structure prediction in particular is known as decoy selection. The problem refers to selecting biologically-active/native tertiary structures among a multitude of physically-realistic structures generated by template-free protein structure prediction methods. Research on decoy selection is active. Clustering-based methods are popular, but they fail to identify good/near-native decoys on datasets where near-native decoys are severely under-sampled by a protein structure prediction method. Reasonable progress is reported by methods that additionally take into account the internal energy of a structure and employ it to identify basins in the energy landscape organizing the multitude of decoys. These methods, however, incur significant time costs for extracting basins from the landscape. In this paper, we propose a novel decoy selection method based on non-negative matrix factorization. We demonstrate that our method outperforms energy landscape-based methods. In particular, the proposed method addresses both the time cost issue and the challenge of identifying good decoys in a sparse dataset, successfully recognizing near-native decoys for both easy and hard protein targets.
|
17
|
Statistical potentials from the Gaussian scaling behaviour of chain fragments buried within protein globules. PLoS One 2022; 17:e0254969. [PMID: 35085247 PMCID: PMC8794220 DOI: 10.1371/journal.pone.0254969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Accepted: 10/28/2021] [Indexed: 11/19/2022] Open
Abstract
Knowledge-based approaches use the statistics collected from protein data-bank structures to estimate effective interaction potentials between amino acid pairs. Empirical relations are typically employed that are based on the crucial choice of a reference state associated with the null interaction case. Despite their significant effectiveness, the physical interpretation of knowledge-based potentials has been repeatedly questioned, with no consensus on the choice of the reference state. Here we use the fact that the Flory theorem, originally derived for chains in a dense polymer melt, also holds for chain fragments within the core of globular proteins, if the average over buried fragments collected from different non-redundant native structures is considered. After verifying that the ensuing Gaussian statistics, a hallmark of effectively non-interacting polymer chains, holds for a wide range of fragment lengths, although with significant deviations at short spatial scales, we use it to define a ‘bona fide’ reference state. Notably, although the latter does depend on fragment length, deviations from it do not. This allows us to estimate an effective interaction potential that is not biased by the presence of correlations due to the connectivity of the protein chain. We show how different sequence-independent effective statistical potentials can be derived using this approach by coarse-graining the protein representation at varying levels. The possibility of defining sequence-dependent potentials is explored.
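For orientation, the role of the reference state in a knowledge-based potential can be written in the generic inverse-Boltzmann form, alongside the textbook Gaussian distribution of an ideal chain. This is a sketch of the standard relations only, not the specific derivation of the paper; the symbols (pair types $a,b$, distance $r$, fragment length $n$ in bonds, effective bond length $b_0$) are notation assumed here for illustration.

```latex
% Generic inverse-Boltzmann form of a distance-dependent statistical potential:
% observed pair statistics are compared with a reference ("null interaction") state.
u_{ab}(r) = -k_B T \,\ln \frac{P^{\mathrm{obs}}_{ab}(r)}{P^{\mathrm{ref}}(r)}

% Gaussian (ideal-chain) statistics for the end-to-end vector of a fragment of n bonds,
% the kind of fragment-length-dependent distribution that can serve as a reference state:
\langle R^2 \rangle = n\, b_0^{2},
\qquad
P_n(\mathbf{R}) = \left( \frac{3}{2\pi n b_0^{2}} \right)^{3/2}
\exp\!\left( -\frac{3 R^{2}}{2 n b_0^{2}} \right)
```

In this form, a pair distance that occurs more often than the reference state predicts gives a negative (favourable) $u_{ab}(r)$, which is why the choice of $P^{\mathrm{ref}}$ matters so much.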
|
18
|
rsRNASP: A residue-separation-based statistical potential for RNA 3D structure evaluation. Biophys J 2022; 121:142-156. [PMID: 34798137 PMCID: PMC8758408 DOI: 10.1016/j.bpj.2021.11.016] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/20/2021] [Revised: 10/23/2021] [Accepted: 11/10/2021] [Indexed: 01/07/2023] Open
Abstract
Knowledge-based statistical potentials have been shown to be rather effective in protein 3-dimensional (3D) structure evaluation and prediction. Recently, several statistical potentials have been developed for RNA 3D structure evaluation, while their performances are either still at a low level for the test datasets from structure prediction models or dependent on the "black-box" process through neural networks. In this work, we have developed an all-atom distance-dependent statistical potential based on residue separation for RNA 3D structure evaluation, namely rsRNASP, which is composed of short- and long-ranged potentials distinguished by residue separation. The extensive examinations against available RNA test datasets show that rsRNASP has apparently higher performance than the existing statistical potentials for the realistic test datasets with large RNAs from structure prediction models, including the newly released RNA-Puzzles dataset, and is comparable to the existing top statistical potentials for the test datasets with small RNAs or near-native decoys. In addition, rsRNASP is superior to RNA3DCNN, a recently developed scoring function through 3D convolutional neural networks. rsRNASP and the relevant databases are available to the public.
|
19
|
Wang D, Wang Y, Chang J, Zhang L, Wang H, E W. Efficient sampling of high-dimensional free energy landscapes using adaptive reinforced dynamics. Nature Computational Science 2022; 2:20-29. [PMID: 38177702 DOI: 10.1038/s43588-021-00173-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/04/2021] [Accepted: 11/15/2021] [Indexed: 01/06/2024]
Abstract
Enhanced sampling methods such as metadynamics and umbrella sampling have become essential tools for exploring the configuration space of molecules and materials. At the same time, they have long faced a number of issues such as the inefficiency when dealing with a large number of collective variables (CVs) or systems with high free energy barriers. Here we show that, with clustering and adaptive tuning techniques, the reinforced dynamics (RiD) scheme can be used to efficiently explore the configuration space and free energy landscapes with a large number of CVs or systems with high free energy barriers. We illustrate this by studying various representative and challenging examples. First we demonstrate the efficiency of adaptive RiD compared with other methods and construct the nine-dimensional (9D) free energy landscape of a peptoid trimer, which has energy barriers of more than 8 kcal mol⁻¹. We then study the folding of the protein chignolin using 18 CVs. In this case, both the folding and unfolding rates are observed to be 4.30 μs⁻¹. Finally, we propose a protein structure refinement protocol based on RiD. This protocol allows us to efficiently employ more than 100 CVs for exploring the landscape of protein structures and it gives rise to an overall improvement of 14.6 units over the initial global distance test-high accuracy (GDT-HA) score.
Affiliation(s)
- Dongdong Wang
  - Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
  - DP Technology, Beijing, People's Republic of China
- Yanze Wang
  - DP Technology, Beijing, People's Republic of China
  - College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
- Junhan Chang
  - DP Technology, Beijing, People's Republic of China
  - College of Chemistry and Molecular Engineering, Peking University, Beijing, People's Republic of China
- Linfeng Zhang
  - Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
  - DP Technology, Beijing, People's Republic of China
- Han Wang
  - Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, People's Republic of China
- Weinan E
  - School of Mathematical Sciences, Peking University, Beijing, People's Republic of China
  - Department of Mathematics and Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
  - Beijing Institute of Big Data Research, Beijing, People's Republic of China
|
20
|
Zheng W, Li Y, Zhang C, Zhou X, Pearce R, Bell EW, Huang X, Zhang Y. Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins 2021; 89:1734-1751. [PMID: 34331351 PMCID: PMC8616857 DOI: 10.1002/prot.26193] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 04/28/2021] [Revised: 07/06/2021] [Accepted: 07/22/2021] [Indexed: 11/10/2022]
Abstract
In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins due to low accuracy in inter-domain distance prediction and modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Eric W. Bell
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaoqiang Huang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
|
21
|
Heo L, Janson G, Feig M. Physics-based protein structure refinement in the era of artificial intelligence. Proteins 2021; 89:1870-1887. [PMID: 34156124 PMCID: PMC8616793 DOI: 10.1002/prot.26161] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/07/2021] [Revised: 05/31/2021] [Accepted: 06/08/2021] [Indexed: 12/21/2022]
Abstract
Protein structure refinement is the last step in protein structure prediction pipelines. Physics-based refinement via molecular dynamics (MD) simulations has made significant progress during recent years. During CASP14, we tested a new refinement protocol based on an improved sampling strategy via MD simulations. MD simulations were carried out at an elevated temperature (360 K). An optimized use of biasing restraints and the use of multiple starting models led to enhanced sampling. The new protocol generally improved the model quality. In comparison with our previous protocols, the CASP14 protocol showed clear improvements. Our approach was successful with most initial models, many based on deep learning methods. However, we found that our approach was not able to refine machine-learning models from the AlphaFold2 group, often decreasing already high initial qualities. To better understand the role of refinement given new types of models based on machine-learning, a detailed analysis via MD simulations and Markov state modeling is presented here. We continue to find that MD-based refinement has the potential to improve AI predictions. We also identified several practical issues that make it difficult to realize that potential. Increasingly important is the consideration of inter-domain and oligomeric contacts in simulations; the presence of large kinetic barriers in refinement pathways also continues to present challenges. Finally, we provide a perspective on how physics-based refinement could continue to play a role in the future for improving initial predictions based on machine learning-based methods.
Affiliation(s)
- Lim Heo
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
|
22
|
Wang W, Wang J, Li Z, Xu D, Shang Y. MUfoldQA_G: High-accuracy protein model QA via retraining and transformation. Comput Struct Biotechnol J 2021; 19:6282-6290. [PMID: 34900138 PMCID: PMC8636996 DOI: 10.1016/j.csbj.2021.11.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/07/2021] [Revised: 11/10/2021] [Accepted: 11/14/2021] [Indexed: 11/21/2022] Open
Abstract
Protein tertiary structure prediction is an active research area and has attracted significant attention recently due to the success of AlphaFold from DeepMind. Methods capable of accurately evaluating the quality of predicted models are of great importance. In the past, although many model quality assessment (QA) methods have been developed, their accuracies are not consistently high across different QA performance metrics for diverse target proteins. In this paper, we propose MUfoldQA_G, a new multi-model QA method that aims at simultaneously optimizing Pearson correlation and average GDT-TS difference, two commonly used QA performance metrics. This method is based on two new algorithms MUfoldQA_Gp and MUfoldQA_Gr. MUfoldQA_Gp uses a new technique to combine information from protein templates and reference protein models to maximize the Pearson correlation QA metric. MUfoldQA_Gr employs a new machine learning technique that resamples training data and retrains adaptively to learn a consensus model that is better than naïve consensus while minimizing average GDT-TS difference. MUfoldQA_G uses a new method to combine the results of MUfoldQA_Gr and MUfoldQA_Gp so that the final QA prediction results achieve low average GDT-TS difference that is close to the results from MUfoldQA_Gr, while maintaining high Pearson correlation that is the same as the results from MUfoldQA_Gp. In CASP14 QA categories, MUfoldQA_G ranked No. 1 in Pearson correlation and No. 2 in average GDT-TS difference.
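The two QA performance metrics named in this abstract can be illustrated with a minimal Python sketch. It assumes that "average GDT-TS difference" means the mean absolute difference between predicted and observed GDT-TS scores over a set of models; the function names and the toy scores are hypothetical and are not taken from MUfoldQA_G.

```python
import math


def pearson_correlation(predicted, observed):
    """Pearson correlation between predicted and observed quality scores."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_o)


def average_score_difference(predicted, observed):
    """Mean absolute difference between predicted and observed scores
    (assumed here to be what 'average GDT-TS difference' refers to)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equally long, non-empty lists")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Toy example: every prediction is shifted upward by the same offset.
predicted_gdt = [0.80, 0.60, 0.40, 0.20]
observed_gdt = [0.70, 0.50, 0.30, 0.10]

print(pearson_correlation(predicted_gdt, observed_gdt))
print(average_score_difference(predicted_gdt, observed_gdt))
```

In the toy data the ranking is perfect while every score is off by a constant offset, so a method can look ideal on one metric and imperfect on the other; that is one reason a QA method might target both at once.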
Affiliation(s)
- Wenbo Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Junlin Wang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Zhaoyu Li
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Yi Shang
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
|
23
|
Ye L, Wu P, Peng Z, Gao J, Liu J, Yang J. Improved estimation of model quality using predicted inter-residue distance. Bioinformatics 2021; 37:3752-3759. [PMID: 34473228 DOI: 10.1093/bioinformatics/btab632] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/27/2021] [Revised: 08/27/2021] [Accepted: 08/31/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Protein model quality assessment (QA) is an essential component in protein structure prediction, which aims to estimate the quality of a structure model and/or select the most accurate model from a pool of structure models, without knowing the native structure. QA remains a challenging task in protein structure prediction. RESULTS: Based on the inter-residue distance predicted by the recent deep learning-based structure prediction algorithm trRosetta, we developed QDistance, a new approach to the estimation of both global and local qualities. QDistance works for both single-model and multi-model inputs. We designed several distance-based features to assess the agreement between the predicted and model-derived inter-residue distances. Together with a few widely used features, they are fed into a simple yet powerful linear regression model to infer the global QA scores. The local QA scores for each structure model are predicted based on a comparative analysis with a set of selected reference models. For multi-model input, the reference models are selected from the input based on the predicted global QA scores. For single-model input, the reference models are predicted by trRosetta. With the informative distance-based features, QDistance can predict the global quality with satisfactory accuracy. Benchmark tests on the CASP13 and the CAMEO structure models suggested that QDistance was competitive with other methods. Blind tests in the CASP14 experiments showed that QDistance was robust and ranked among the top predictors. In particular, QDistance was a top-3 local QA method and made the most accurate local QA predictions for unreliable local regions. Analysis showed that this superior performance can be attributed to the inclusion of the predicted inter-residue distance. AVAILABILITY AND IMPLEMENTATION: http://yanglab.nankai.edu.cn/QDistance. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
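The idea of scoring agreement between predicted and model-derived inter-residue distances can be sketched as follows. This is an illustrative feature only: the abstract does not specify QDistance's actual features, so the function names, the use of one representative point per residue, and the mean-absolute-difference definition are all assumptions made for this example.

```python
import math


def model_distance_matrix(coords):
    """Pairwise Euclidean distances between per-residue 3D coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_agreement(predicted, coords, min_separation=1):
    """Mean absolute difference between a predicted distance matrix and the
    distances measured in a structural model (lower means better agreement).

    predicted      -- square matrix of predicted inter-residue distances
    coords         -- one (x, y, z) point per residue taken from the model
    min_separation -- ignore residue pairs closer than this in sequence
    """
    n = len(coords)
    if len(predicted) != n or any(len(row) != n for row in predicted):
        raise ValueError("predicted matrix must be n x n for n residues")
    model = model_distance_matrix(coords)
    diffs = [
        abs(predicted[i][j] - model[i][j])
        for i in range(n)
        for j in range(i + min_separation, n)
    ]
    if not diffs:
        raise ValueError("no residue pairs satisfy min_separation")
    return sum(diffs) / len(diffs)


# Toy model with three residues; model distances are 3, 5 and 4.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
# Hypothetical predicted distances: only the (0, 2) pair disagrees (6 vs 5).
toy_predicted = [
    [0.0, 3.0, 6.0],
    [3.0, 0.0, 4.0],
    [6.0, 4.0, 0.0],
]

print(distance_agreement(toy_predicted, toy_coords))
```

A feature of this kind could then be combined with other features in a linear regression to produce a global score, as the abstract describes in general terms.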
Affiliation(s)
- Lisha Ye
  - School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Peikun Wu
  - School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Zhenling Peng
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
- Jianzhao Gao
  - School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
- Jian Liu
  - College of Computer Science, Nankai University, Tianjin, 300071, China
- Jianyi Yang
  - School of Mathematical Sciences, Nankai University, Tianjin, 300071, China
|
24
|
Liu J, Wu T, Guo Z, Hou J, Cheng J. Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14. Proteins 2021; 90:58-72. [PMID: 34291486 PMCID: PMC8671168 DOI: 10.1002/prot.26186] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/23/2021] [Revised: 06/21/2021] [Accepted: 07/12/2021] [Indexed: 12/15/2022]
Abstract
Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter-domain structure prediction. The results demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template-free modeling performs better than template-based modeling not only on hard targets but also on targets that have homologous templates. The performance of template-free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
Affiliation(s)
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Tianqi Wu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
|
25
|
Igashov I, Pavlichenko N, Grudinin S. Spherical convolutions on molecular graphs for protein model quality assessment. Machine Learning: Science and Technology 2021. [DOI: 10.1088/2632-2153/abf856] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/20/2022] Open
Abstract
Processing information on three-dimensional (3D) objects requires methods stable to rigid-body transformations, in particular rotations, of the input data. In image processing tasks, convolutional neural networks achieve this property using rotation-equivariant operations. However, contrary to images, graphs generally have irregular topology. This makes it challenging to define a rotation-equivariant convolution operation on these structures. In this work, we propose a spherical graph convolutional network that processes 3D models of proteins represented as molecular graphs. In a protein molecule, individual amino acids have common topological elements. This allows us to unambiguously associate each amino acid with a local coordinate system and construct rotation-equivariant spherical filters that operate on angular information between graph nodes. Within the framework of the protein model quality assessment problem, we demonstrate that the proposed spherical convolution method significantly improves the quality of model assessment compared to the standard message-passing approach. It is also comparable to state-of-the-art methods, as we demonstrate on Critical Assessment of Structure Prediction (CASP) benchmarks. The proposed technique operates only on geometric features of protein 3D models. This makes it universal and applicable to any other geometric-learning task where the graph structure allows constructing local coordinate systems. The method is available at https://team.inria.fr/nano-d/software/s-gcn/.
|
26
|
Shuvo MH, Gulfam M, Bhattacharya D. DeepRefiner: high-accuracy protein structure refinement by deep network calibration. Nucleic Acids Res 2021; 49:W147-W152. [PMID: 33999209 PMCID: PMC8262753 DOI: 10.1093/nar/gkab361] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/08/2021] [Revised: 04/18/2021] [Accepted: 04/23/2021] [Indexed: 12/20/2022] Open
Abstract
The DeepRefiner webserver, freely available at http://watson.cse.eng.auburn.edu/DeepRefiner/, is an interactive and fully configurable online system for high-accuracy protein structure refinement. Fuelled by deep learning, DeepRefiner offers the ability to leverage cutting-edge deep neural network architectures which can be calibrated for on-demand selection of adventurous or conservative refinement modes targeted at degree or consistency of refinement. The method has been extensively tested in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments under the group name 'Bhattacharya-Server' and was officially ranked as the No. 2 refinement server in CASP13 (second only to 'Seok-server' and outperforming all other refinement servers) and No. 2 refinement server in CASP14 (second only to 'FEIG-S' and outperforming all other refinement servers including 'Seok-server'). The DeepRefiner web interface offers a number of convenient features, including (i) fully customizable refinement job submission and validation; (ii) automated job status update, tracking, and notifications; (iii) interactive and interpretable web-based results retrieval with quantitative and visual analysis; and (iv) extensive help information on job submission and results interpretation via web-based tutorial and help tooltips.
Affiliation(s)
- Md Hossain Shuvo
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Muhammad Gulfam
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
  - Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
|
27
|
Jing X, Xu J. Fast and effective protein model refinement using deep graph neural networks. Nature Computational Science 2021; 1:462-469. [PMID: 35321360 DOI: 10.1038/s43588-021-00098-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 11/09/2022]
Abstract
Protein model refinement is the last step applied to improve the quality of a predicted protein model. Currently the most successful refinement methods rely on extensive conformational sampling and thus take hours or days to refine even a single protein model. Here we propose a fast and effective model refinement method that applies graph neural networks (GNNs) to predict a refined inter-atom distance probability distribution from an initial model and then rebuilds 3D models from the predicted distance distribution. Tested on the CASP (Critical Assessment of Structure Prediction) refinement targets, our method has accuracy comparable to that of two leading human groups, Feig and Baker, but runs substantially faster. Our method may refine one protein model within ~11 minutes on 1 CPU, while Baker needs ~30 hours on 60 CPUs and Feig needs ~16 hours on 1 GPU. Finally, our study shows that GNNs outperform ResNet (convolutional residual neural networks) for model refinement when very limited conformational sampling is allowed.
Affiliation(s)
- Xiaoyang Jing
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| |
|
28
|
Mori T, Terashi G, Matsuoka D, Kihara D, Sugita Y. Efficient Flexible Fitting Refinement with Automatic Error Fixing for De Novo Structure Modeling from Cryo-EM Density Maps. J Chem Inf Model 2021; 61:3516-3528. [PMID: 34142833 PMCID: PMC9282639 DOI: 10.1021/acs.jcim.1c00230] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Structural modeling of proteins from cryo-electron microscopy (cryo-EM) density maps is one of the challenging issues in structural biology. De novo modeling combined with flexible fitting refinement (FFR) has been widely used to build a structure of new proteins. In de novo prediction, artificial conformations containing local structural errors such as chirality errors, cis peptide bonds, and ring penetrations are frequently generated and cannot be easily removed in the subsequent FFR. Moreover, refinement can be significantly suppressed due to the low mobility of atoms inside the protein. To overcome these problems, we propose an efficient scheme for FFR, in which the local structural errors are fixed first, followed by FFR using an iterative simulated annealing (SA) molecular dynamics protocol with the united atom (UA) model in an implicit solvent model; we call this scheme "SAUA-FFR". The best model is selected from multiple flexible fitting runs with various biasing force constants to reduce overfitting. We apply our scheme to the decoys obtained from MAINMAST and demonstrate an improvement of the best model of eight selected proteins in terms of the root-mean-square deviation, MolProbity score, and RWplus score compared to the original scheme of MAINMAST. Fixing the local structural errors can enhance the formation of secondary structures, and the UA model enables progressive refinement compared to the all-atom model owing to its high mobility in the implicit solvent. The SAUA-FFR scheme realizes efficient and accurate protein structure modeling from medium-resolution maps with less overfitting.
Affiliation(s)
- Takaharu Mori
- RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, United States
| | - Daisuke Matsuoka
- RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, United States.,Department of Computer Science, Purdue University, West Lafayette, Indiana 47907, United States
| | - Yuji Sugita
- RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan.,RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan.,RIKEN Center for Biosystems Dynamics Research, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
| |
|
29
|
Kadukova M, Machado KDS, Chacón P, Grudinin S. KORP-PL: a coarse-grained knowledge-based scoring function for protein-ligand interactions. Bioinformatics 2021; 37:943-950. [PMID: 32840574 DOI: 10.1093/bioinformatics/btaa748] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/23/2020] [Revised: 07/27/2020] [Accepted: 08/18/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Despite the progress made in studying protein-ligand interactions and the widespread application of docking and affinity prediction tools, improving their precision and efficiency still remains a challenge. Computational approaches based on the scoring of docking conformations with statistical potentials constitute a popular alternative to more accurate but costly physics-based thermodynamic sampling methods. In this context, a minimalist and fast sidechain-free knowledge-based potential with a high docking and screening power can be very useful when screening a big number of putative docking conformations. RESULTS: Here, we present a novel coarse-grained potential defined by a 3D joint probability distribution function that only depends on the pairwise orientation and position between protein backbone and ligand atoms. Despite its extreme simplicity, our approach yields very competitive results with the state-of-the-art scoring functions, especially in docking and screening tasks. For example, we observed a twofold improvement in the median 5% enrichment factor on the DUD-E benchmark compared to Autodock Vina results. Moreover, our results prove that a coarse sidechain-free potential is sufficient for a very successful docking pose prediction. AVAILABILITY AND IMPLEMENTATION: The standalone version of KORP-PL with the corresponding tests and benchmarks are available at https://team.inria.fr/nano-d/korp-pl/ and https://chaconlab.org/modeling/korp-pl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
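The 5% enrichment factor mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition (the fraction of actives among the top-ranked 5% of compounds divided by the fraction of actives in the whole library); the abstract does not spell out the exact protocol, and the function name, arguments, and the lower-is-better score convention are illustrative assumptions, not part of KORP-PL.

```python
def enrichment_factor(scores, is_active, fraction=0.05, lower_is_better=True):
    """Enrichment factor at a given fraction of a ranked compound list.

    scores    -- one docking score per compound (hypothetical input)
    is_active -- one boolean per compound, True for known actives
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Size of the top slice, never smaller than one compound.
    n_top = max(1, int(round(fraction * n_total)))

    # Rank compounds from best to worst score.
    order = sorted(range(n_total), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    hits_top = sum(1 for i in order[:n_top] if is_active[i])

    # Ratio of the hit rate in the top slice to the hit rate overall.
    return (hits_top / n_top) / (n_actives / n_total)
```

A value above 1 means that actives are concentrated near the top of the ranking; a value of 1 corresponds to random ordering.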
Affiliation(s)
- Maria Kadukova
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France.,Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
| | - Karina Dos Santos Machado
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France.,Computational Biology Laboratory, Centro de Ciências Computacionais, Universidade Federal do Rio Grande - FURG, Rio Grande, RS 96201-090, Brazil
| | - Pablo Chacón
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid 28006, Spain
| | - Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
| |
|
30
|
Pearce R, Zhang Y. Deep learning techniques have significantly impacted protein structure prediction and protein design. Curr Opin Struct Biol 2021; 68:194-207. [PMID: 33639355 PMCID: PMC8222070 DOI: 10.1016/j.sbi.2021.01.007] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 10/05/2020] [Revised: 01/09/2021] [Accepted: 01/18/2021] [Indexed: 12/26/2022]
Abstract
Protein structure prediction and design can be regarded as two inverse processes governed by the same folding principle. Although progress remained stagnant over the past two decades, the recent application of deep neural networks to spatial constraint prediction and end-to-end model training has significantly improved the accuracy of protein structure prediction, largely solving the problem at the fold level for single-domain proteins. The field of protein design has also witnessed dramatic improvement, where noticeable examples have shown that information stored in neural-network models can be used to advance functional protein design. Thus, incorporation of deep learning techniques into different steps of protein folding and design approaches represents an exciting future direction and should continue to have a transformative impact on both fields.
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA.
| |
|
31
|
Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. Sci Rep 2021; 11:10943. [PMID: 34035363 PMCID: PMC8149836 DOI: 10.1038/s41598-021-90303-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/09/2021] [Accepted: 05/10/2021] [Indexed: 11/28/2022]
Abstract
Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage improved inter-residue distance predictions to enhance EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with the existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. According to the evaluation of performance in selecting the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods are ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model EMA predictor, is ranked within the top 10 among all the single-model EMA methods according to GDT-TS score loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.
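The GDT-TS loss reported in this abstract can be sketched as follows. The sketch assumes the usual EMA evaluation convention: for each target, the loss is the true GDT-TS of the best model in the pool minus the true GDT-TS of the model the predictor ranks first, averaged over targets. The function names and the dictionary-based inputs are illustrative assumptions, not taken from the MULTICOM software.

```python
def selection_loss(predicted_quality, true_gdt_ts):
    """GDT-TS loss for one target.

    predicted_quality -- dict: model name -> predicted global quality score
    true_gdt_ts       -- dict: model name -> true GDT-TS (0 to 1 scale assumed)
    """
    # Model the predictor would select (highest predicted quality).
    selected = max(predicted_quality, key=predicted_quality.get)
    # Best achievable GDT-TS in the model pool.
    best_true = max(true_gdt_ts.values())
    return best_true - true_gdt_ts[selected]


def mean_selection_loss(targets):
    """Average loss over an iterable of (predicted_quality, true_gdt_ts) pairs."""
    losses = [selection_loss(pred, true) for pred, true in targets]
    if not losses:
        raise ValueError("at least one target is required")
    return sum(losses) / len(losses)
```

A loss of 0 for a target means the predictor picked the best available model; smaller averages indicate better model selection.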
|
32
|
Postic G, Janel N, Moroy G. Representations of protein structure for exploring the conformational space: A speed-accuracy trade-off. Comput Struct Biotechnol J 2021; 19:2618-2625. [PMID: 34025948 PMCID: PMC8120936 DOI: 10.1016/j.csbj.2021.04.049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/03/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021] [Indexed: 11/25/2022]
Abstract
We compare ten structural representations, either atomistic or coarse-grained. Thus, ten distance-dependent statistical potentials of mean force (PMF) were built. The Cβ-only and Cα + Cβ representations provide the best speed–accuracy trade-off. Including glycines through Cα, in a Cβ-only representation, yields a higher accuracy. We generalize the conclusions to the total information gain (TIG) scoring function.
The recent breakthrough in the field of protein structure prediction shows the relevance of using knowledge-based scoring functions in combination with a low-resolution 3D representation of protein macromolecules. The choice of not using all atoms is barely supported by any data in the literature, and is mostly motivated by empirical and practical reasons, such as the computational cost of assessing the numerous folds of the protein conformational space. Here, we present a comprehensive study, carried out on a large and balanced benchmark of predicted protein structures, to see how different types of structural representations rank in either accuracy or calculation speed, and which ones offer the best compromise between these two criteria. We tested ten representations, including low-resolution, high-resolution, and coarse-grained approaches. We also investigated the generalization of the findings to formalisms other than the widely used "potential of mean force" (PMF) method. Thus, we observed that representing protein structures by their β carbons, combined or not with Cα, provides the best speed-accuracy trade-off when using a "total information gain" scoring function. For statistical PMFs, using MARTINI backbone and side-chain beads is the best option. Finally, we also demonstrated the necessity of training the reference state on all atom types, and of including the Cα atoms of glycine residues, in a Cβ-based representation.
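As a rough illustration of what a distance-dependent statistical PMF computes, the sketch below uses the generic inverse-Boltzmann form E(r) = -kT ln(p_obs(r) / p_ref(r)) on binned distance counts. This is the textbook formulation, not necessarily the exact reference state or binning used in the study; the function name, the `kT` default, and the pseudocount are assumptions made for the example.

```python
import math


def pmf_from_counts(observed, reference, kT=1.0, pseudocount=1e-6):
    """Per-bin statistical potential from distance histograms.

    observed  -- counts of a given atom-pair type per distance bin
    reference -- counts of the reference state per distance bin
    Returns one energy per bin (in units of kT when kT=1.0).
    """
    if len(observed) != len(reference) or not observed:
        raise ValueError("histograms must be non-empty and equal in length")
    obs_total = float(sum(observed))
    ref_total = float(sum(reference))
    if obs_total == 0 or ref_total == 0:
        raise ValueError("histograms must contain at least one count")

    energies = []
    for o, r in zip(observed, reference):
        # Normalise counts to probabilities; the pseudocount avoids log(0).
        p_obs = o / obs_total + pseudocount
        p_ref = r / ref_total + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Bins that are over-represented relative to the reference state receive negative (favourable) energies, and under-represented bins receive positive ones.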
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Corresponding author.
| | - Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
| | - Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
| |
|
33
|
Xenakis MN, Kapetis D, Yang Y, Gerrits MM, Heijman J, Waxman SG, Lauria G, Faber CG, Westra RL, Lindsey PJ, Smeets HJ. Hydropathicity-based prediction of pain-causing NaV1.7 variants. BMC Bioinformatics 2021; 22:212. [PMID: 33892629 PMCID: PMC8063372 DOI: 10.1186/s12859-021-04119-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/18/2020] [Accepted: 04/01/2021] [Indexed: 11/10/2022]
Abstract
Background Mutation-induced variations in the functional architecture of the NaV1.7 channel protein are causally related to a broad spectrum of human pain disorders. Predicting in silico the phenotype of NaV1.7 variant is of major clinical importance; it can aid in reducing costs of in vitro pathophysiological characterization of NaV1.7 variants, as well as, in the design of drug agents for counteracting pain-disease symptoms. Results In this work, we utilize spatial complexity of hydropathic effects toward predicting which NaV1.7 variants cause pain (and which are neutral) based on the location of corresponding mutation sites within the NaV1.7 structure. For that, we analyze topological and scaling hydropathic characteristics of the atomic environment around NaV1.7’s pore and probe their spatial correlation with mutation sites. We show that pain-related mutation sites occupy structural locations in proximity to a hydrophobic patch lining the pore while clustering at a critical hydropathic-interactions distance from the selectivity filter (SF). Taken together, these observations can differentiate pain-related NaV1.7 variants from neutral ones, i.e., NaV1.7 variants not causing pain disease, with 80.5% sensitivity and 93.7% specificity [area under the receiver operating characteristics curve = 0.872]. Conclusions Our findings suggest that maintaining hydrophobic NaV1.7 interior intact, as well as, a finely-tuned (dictated by hydropathic interactions) distance from the SF might be necessary molecular conditions for physiological NaV1.7 functioning. The main advantage for using the presented predictive scheme is its negligible computational cost, as well as, hydropathicity-based biophysical rationalization. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04119-2.
Affiliation(s)
- Makros N Xenakis
- Department of Toxicogenomics, Section Clinical Genomics, Maastricht University, PO Box 616, 6200 MD, Maastricht, The Netherlands.,Research School for Mental Health and Neuroscience (MHeNS), Maastricht University, PO Box 616, 6200 MD, Maastricht, The Netherlands
| | - Dimos Kapetis
- Neuroalgology Unit, Fondazione IRCCS Istituto Neurologico "Carlo Besta", Via Celoria 11, 20133, Milan, Italy
| | - Yang Yang
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University College of Pharmacy, West Lafayette, IN, 47907, USA.,Purdue Institute for Integrative Neuroscience, West Lafayette, IN, 47907, USA
| | - Monique M Gerrits
- Department of Clinical Genetics, Maastricht University Medical Center, PO box 5800, 6202 AZ, Maastricht, The Netherlands
| | - Jordi Heijman
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Maastricht University, PO Box 616, 6200 MD, Maastricht, The Netherlands
| | - Stephen G Waxman
- Department of Neurology and Center for Neuroscience and Regeneration Research, Yale University School of Medicine, New Haven, CT, 06510, USA.,Rehabilitation Research Center, Veterans Affairs Connecticut Healthcare System, West Haven, CT, 06516, USA
| | - Giuseppe Lauria
- Neuroalgology Unit, Fondazione IRCCS Istituto Neurologico "Carlo Besta", Via Celoria 11, 20133, Milan, Italy.,Department of Biomedical and Clinical Sciences "Luigi Sacco", University of Milan, Via G.B. Grassi 74, 20157, Milan, Italy
| | - Catharina G Faber
- Department of Neurology, Maastricht University Medical Center, PO Box 5800, 6202 AZ, Maastricht, The Netherlands
| | - Ronald L Westra
- Department of Data Science and Knowledge Engineering, Maastricht University, PO Box 616, 6200 MD, Maastricht, The Netherlands
| | - Patrick J Lindsey
- Department of Toxicogenomics, Section Clinical Genomics, Maastricht University, PO Box 616, 6200 MD, Maastricht, The Netherlands.,Research School for Oncology and Developmental Biology (GROW), Maastricht University, PO Box 616, 6200 MD, Maastricht, The Netherlands
| | - Hubert J Smeets
- Department of Toxicogenomics, Section Clinical Genomics, Maastricht University, PO Box 616, 6200 MD, Maastricht, The Netherlands.,Research School for Mental Health and Neuroscience (MHeNS), Maastricht University, PO Box 616, 6200 MD, Maastricht, The Netherlands
| |
|
34
|
Protein Structure Refinement Using Multi-Objective Particle Swarm Optimization with Decomposition Strategy. Int J Mol Sci 2021; 22:ijms22094408. [PMID: 33922489 PMCID: PMC8122964 DOI: 10.3390/ijms22094408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/02/2022]
Abstract
Protein structure refinement is a crucial step for more accurate protein structure predictions. Most existing approaches treat it as an energy minimization problem to intuitively improve the quality of initial models by searching for structures with lower energy. Considering that a single energy function could not reflect the accurate energy landscape of all the proteins, our previous AIR 1.0 pipeline uses multiple energy functions to realize a multi-objectives particle swarm optimization-based model refinement. It is expected to provide a general balanced conformation search protocol guided from different energy evaluations. However, AIR 1.0 solves the multi-objective optimization problem as a whole, which could not result in good solution diversity and convergence on some targets. In this study, we report a decomposition-based method AIR 2.0, which is an updated version of AIR, for protein structure refinement. AIR 2.0 decomposes a multi-objective optimization problem into a number of subproblems and optimizes them simultaneously using particle swarm optimization algorithm. The solutions yielded by AIR 2.0 show better convergence and diversity compared to its previous version, which increases the possibilities of digging out better structure conformations. The experimental results on CASP13 refinement benchmark targets and blind tests in CASP 14 demonstrate the efficacy of AIR 2.0.
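The idea of decomposing a multi-objective problem into scalar subproblems can be sketched generically. The abstract does not state which scalarization AIR 2.0 uses, so the example below assumes a Tchebycheff scalarization with evenly spread weight vectors, as in MOEA/D-style methods, for two objectives; all names are hypothetical, and the particle swarm search itself is omitted (candidates are simply given as precomputed objective pairs).

```python
def weight_vectors(n_sub):
    """Evenly spread weight vectors for two objectives (n_sub >= 2)."""
    if n_sub < 2:
        raise ValueError("need at least two subproblems")
    return [(i / (n_sub - 1), 1.0 - i / (n_sub - 1)) for i in range(n_sub)]


def tchebycheff(objectives, weights, ideal):
    """Scalar value of one subproblem: weighted max distance to the ideal point."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


def best_per_subproblem(candidates, n_sub):
    """Pick, for each weight vector, the candidate minimising its subproblem.

    candidates -- list of (energy_1, energy_2) pairs, lower is better
    """
    # Ideal point: best value seen for each objective separately.
    ideal = tuple(min(c[k] for c in candidates) for k in range(2))
    return [min(candidates, key=lambda c: tchebycheff(c, w, ideal))
            for w in weight_vectors(n_sub)]
```

Because each weight vector favours a different trade-off between the two energies, the selected candidates spread along the Pareto front instead of collapsing onto a single compromise solution.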
|
35
|
Cao X, Tian P. Molecular free energy optimization on a computational graph. RSC Adv 2021; 11:12929-12937. [PMID: 35423805 PMCID: PMC8697515 DOI: 10.1039/d1ra01455b] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/23/2021] [Accepted: 03/26/2021] [Indexed: 11/21/2022]
Abstract
Free energy is arguably the most important property of molecular systems. Despite great progress in both its efficient estimation by scoring functions/potentials and more rigorous computation based on extensive sampling, we remain far from accurately predicting and manipulating biomolecular structures and their interactions. There are fundamental limitations, including accuracy of interaction description and difficulty of sampling in high dimensional space, to be tackled. Computational graph underlies major artificial intelligence platforms and is proven to facilitate training, optimization and learning. Combining autodifferentiation, coordinates transformation and generalized solvation free energy theory, we construct a computational graph infrastructure to realize seamless integration of fully trainable local free energy landscape with end to end differentiable iterative free energy optimization. This new framework drastically improves efficiency by replacing local sampling with differentiation. Its specific implementation in protein structure refinement achieves superb efficiency and competitive accuracy when compared with state of the art all-atom mainstream methods.
Affiliation(s)
- Xiaoyong Cao
- School of Life Sciences, Jilin University, Changchun 130012, China; +86 431 85155287
| | - Pu Tian
- School of Life Sciences, Jilin University, Changchun 130012, China; +86 431 85155287
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
| |
|
36
|
Gong W, Guerler A, Zhang C, Warner E, Li C, Zhang Y. Integrating Multimeric Threading With High-throughput Experiments for Structural Interactome of Escherichia coli. J Mol Biol 2021; 433:166944. [PMID: 33741411 DOI: 10.1016/j.jmb.2021.166944] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/20/2020] [Revised: 03/06/2021] [Accepted: 03/09/2021] [Indexed: 10/21/2022]
Abstract
Genome-wide protein-protein interaction (PPI) determination remains a significant unsolved problem in structural biology. The difficulty is twofold since high-throughput experiments (HTEs) have often a relatively high false-positive rate in assigning PPIs, and PPI quaternary structures are more difficult to solve than tertiary structures using traditional structural biology techniques. We proposed a uniform pipeline, Threpp, to address both problems. Starting from a pair of monomer sequences, Threpp first threads both sequences through a complex structure library, where the alignment score is combined with HTE data using a naïve Bayesian classifier model to predict the likelihood of two chains to interact with each other. Next, quaternary complex structures of the identified PPIs are constructed by reassembling monomeric alignments with dimeric threading frameworks through interface-specific structural alignments. The pipeline was applied to the Escherichia coli genome and created 35,125 confident PPIs which is 4.5-fold higher than HTE alone. Graphic analyses of the PPI networks show a scale-free cluster size distribution, consistent with previous studies, which was found critical to the robustness of genome evolution and the centrality of functionally important proteins that are essential to E. coli survival. Furthermore, complex structure models were constructed for all predicted E. coli PPIs based on the quaternary threading alignments, where 6771 of them were found to have a high confidence score that corresponds to the correct fold of the complexes with a TM-score >0.5, and 39 showed a close consistency with the later released experimental structures with an average TM-score = 0.73. These results demonstrated the significant usefulness of threading-based homologous modeling in both genome-wide PPI network detection and complex structural construction.
Affiliation(s)
- Weikang Gong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
| | - Aysam Guerler
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Elisa Warner
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Chunhua Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China.
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA.
| |
|
37
|
Heo L, Arbour CF, Janson G, Feig M. Improved Sampling Strategies for Protein Model Refinement Based on Molecular Dynamics Simulation. J Chem Theory Comput 2021; 17:1931-1943. [PMID: 33562962 DOI: 10.1021/acs.jctc.0c01238] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/30/2022]
Abstract
Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. These methods utilize protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve model accuracy in the predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in the model quality become possible. Molecular dynamics simulations have been especially successful for improving model qualities but although consistent refinement can be achieved, the improvements in model qualities are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore the conformational space more broadly. Based on the insights of this analysis, we are proposing a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| | - Collin F Arbour
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| | - Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| | - Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
38
|
Hernandez R, Facelli JC. Understanding protein structural changes for oncogenic missense variants. Heliyon 2021; 7:e06013. [PMID: 33553733 PMCID: PMC7846930 DOI: 10.1016/j.heliyon.2021.e06013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/14/2020] [Revised: 08/20/2020] [Accepted: 01/15/2021] [Indexed: 12/31/2022]
Abstract
Understanding and predicting the changes of protein structure and function upon mutation and their relationship to human health is a critical element to translate the genomic revolution into actionable interventions. Therefore, it is pertinent to explore how mutations result in structural changes leading to pathogenic proteins, but due to the protein structural knowledge gap, experimental approaches are lacking. Protein structure prediction methods, such as I-TASSER, have made it possible to predict the structure of a given amino acid sequence, thus opening a new way to explore protein structure changes upon mutations when experimental information is not available. Using known mutations from the Catalogue of Somatic Mutation in Cancer (COSMIC) and ClinVar databases, we compare predicted structure-derived properties from wild type (WT) and mutated proteins and find differences between the local and global 3D protein structures of the WT and the mutants. The studies in this relatively small sample reveal that the structural changes are quite diverse.
Affiliation(s)
- Rolando Hernandez
- Department of Biomedical Informatics and Center for Clinical and Translational Science, The University of Utah, Salt Lake City, Utah, USA
| | - Julio C. Facelli
- Department of Biomedical Informatics and Center for Clinical and Translational Science, The University of Utah, Salt Lake City, Utah, USA
| |
|
39
|
Chen X, Song S, Ji J, Tang Z, Todo Y. Incorporating a multiobjective knowledge-based energy function into differential evolution for protein structure prediction. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.06.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/17/2023]
|
40
|
Grigas AT, Mei Z, Treado JD, Levine ZA, Regan L, O'Hern CS. Using physical features of protein core packing to distinguish real proteins from decoys. Protein Sci 2020; 29:1931-1944. [PMID: 32710566 PMCID: PMC7454528 DOI: 10.1002/pro.3914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/05/2020] [Revised: 07/10/2020] [Accepted: 07/20/2020] [Indexed: 01/06/2023]
Abstract
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.
Affiliation(s)
- Alex T. Grigas
  - Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Zhe Mei
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Department of Chemistry, Yale University, New Haven, Connecticut, USA
- John D. Treado
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
- Zachary A. Levine
  - Department of Pathology, Yale University, New Haven, Connecticut, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA
- Lynne Regan
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, Centre for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Corey S. O'Hern
  - Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
  - Department of Physics, Yale University, New Haven, Connecticut, USA
  - Department of Applied Physics, Yale University, New Haven, Connecticut, USA
|
41
|
Postic G, Janel N, Tufféry P, Moroy G. An information gain-based approach for evaluating protein structure models. Comput Struct Biotechnol J 2020; 18:2228-2236. [PMID: 32837711 PMCID: PMC7431362 DOI: 10.1016/j.csbj.2020.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/01/2020] [Revised: 08/06/2020] [Accepted: 08/07/2020] [Indexed: 12/23/2022]
Abstract
For three decades now, knowledge-based scoring functions that operate through the "potential of mean force" (PMF) approach have continuously proven useful for studying protein structures. Although these statistical potentials are not to be confused with their physics-based counterparts of the same name (i.e., PMFs obtained by molecular dynamics simulations), their particular success in assessing the native-like character of protein structure predictions has led authors to consider the computed scores as approximations of the free energy. However, this physical justification has been a matter of controversy since the beginning. Alternative interpretations based on Bayes' theorem have been proposed, but the misleading formalism that invokes the inverse Boltzmann law remains recurrent in the literature. In this article, we present a conceptually new method for ranking protein structure models by quality, which is (i) independent of any physics-based explanation and (ii) relevant to statistics and to a general definition of information gain. The theoretical development described in this study provides new insights into how statistical PMFs work, in comparison with our approach. To prove the concept, we have built interatomic distance-dependent scoring functions, based on the former and new equations, and compared their performance on an independent benchmark of 60,000 protein structures. The results demonstrate that our new formalism outperforms statistical PMFs in evaluating the quality of protein structural decoys. Therefore, this original type of score offers a possibility to improve the success of statistical PMFs in the various fields of structural biology where they are applied. The open-source code is available for download at https://gitlab.rpbs.univ-paris-diderot.fr/src/ig-score.
Affiliation(s)
- Guillaume Postic
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
  - Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
  - Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France
  - Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Nathalie Janel
  - Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Pierre Tufféry
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
  - Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Gautier Moroy
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
|
42
|
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, as one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (a single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. From the comparison between energy terms obtained from our encoding and from Amber, we find energy differences within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same workflow to combine the RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets. We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy. Through inclusion of best decoy to decoy comparisons in building our RF models, we were able to generate models that outperformed the score functions tested herein both on accuracy and best decoy detection, again showing the performance and flexibility of our RF models to tackle this problem. Finally, the importance of the RF algorithm and force field parameters was also tested, and the comparison results suggest that both the RF algorithm and force field potentials are important, with the ML scoring function achieving its best performance only when the two are combined. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
Affiliation(s)
- Jun Pei
  - Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Lin Frank Song
  - Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz
  - Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
|
43
|
Rigby MJ, Ding Y, Farrugia MA, Feig M, Cortese GP, Mitchell H, Burger C, Puglielli L. The endoplasmic reticulum acetyltransferases ATase1/NAT8B and ATase2/NAT8 are differentially regulated to adjust engagement of the secretory pathway. J Neurochem 2020; 154:404-423. [PMID: 31945187 PMCID: PMC7363514 DOI: 10.1111/jnc.14958] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/17/2019] [Revised: 12/20/2019] [Accepted: 01/09/2020] [Indexed: 01/13/2023]
Abstract
Nε-lysine acetylation of nascent glycoproteins within the endoplasmic reticulum (ER) lumen regulates the efficiency of the secretory pathway. The ER acetylation machinery consists of the membrane transporter, acetyl-CoA transporter 1 (AT-1/SLC33A1), and two acetyltransferases, ATase1/NAT8B and ATase2/NAT8. Dysfunctional ER acetylation is associated with severe neurological diseases with duplication of AT-1/SLC33A1 being associated with autism spectrum disorder, intellectual disability, and dysmorphism. Neuron-specific AT-1 over-expression in the mouse alters neuron morphology and function, causing an autism-like phenotype, indicating that ER acetylation plays a key role in neurophysiology. As such, characterizing the molecular mechanisms that regulate the acetylation machinery could reveal critical information about its biology. By using structure-biochemistry approaches, we discovered that ATase1 and ATase2 share enzymatic properties but differ in that ATase1 is post-translationally regulated via acetylation. Furthermore, gene expression studies revealed that the promoters of AT-1, ATase1, and ATase2 contain functional binding sites for the neuron-related transcription factors cAMP response element-binding protein and the immediate-early genes c-FOS and c-JUN, and that ATase1 and ATase2 exhibit additional modes of transcriptional regulation relevant to aging and Alzheimer's disease. In vivo rodent gene expression experiments revealed that Atase2 is specifically induced following activity-dependent events. Finally, over-expression of either ATase1 or ATase2 was sufficient to increase the engagement of the secretory pathway in PC12 cells. Our results indicate important regulatory roles for ATase1 and ATase2 in neuron function with induction of ATase2 expression potentially serving as a critical event that adjusts the efficiency of the secretory pathway for activity-dependent neuronal functions.
Affiliation(s)
- Michael J. Rigby
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705
  - Neuroscience Training Program, University of Wisconsin-Madison, Madison, WI 53705
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705
- Yun Ding
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705
  - Neuroscience Training Program, University of Wisconsin-Madison, Madison, WI 53705
- Mark A. Farrugia
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824
- Corinna Burger
  - Neuroscience Training Program, University of Wisconsin-Madison, Madison, WI 53705
  - Department of Neurology, University of Wisconsin-Madison, Madison, WI 53705
- Luigi Puglielli
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705
  - Neuroscience Training Program, University of Wisconsin-Madison, Madison, WI 53705
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705
  - Geriatric Research Education Clinical Center, Veterans Affairs Medical Center, Madison, WI 53705
|
44
|
Liu T, Wang Z. MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials. BMC Bioinformatics 2020; 21:246. [PMID: 32631256 PMCID: PMC7336608 DOI: 10.1186/s12859-020-3383-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/21/2020] [Accepted: 01/22/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models is not available, methods that only need a single model as input are indispensable. RESULTS We developed MASS, a QA method to predict the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions or statistical potentials that can capture the structural characteristics of a protein model, which can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE and Rosetta when the scores they generated are used as machine learning features. MASS outperforms almost all of the four CASP11 top-performing single-model methods for global quality assessment in terms of all of the four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved comparable performances with the leading QA methods in CASP12 and CASP13. CONCLUSIONS MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/.
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
- Zheng Wang
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
|
45
|
Wang W, Wang J, Xu D, Shang Y. Two New Heuristic Methods for Protein Model Quality Assessment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1430-1439. [PMID: 30418914 PMCID: PMC8988942 DOI: 10.1109/tcbb.2018.2880202] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Protein tertiary structure prediction is an important open challenge in bioinformatics and requires effective methods to accurately evaluate the quality of protein 3-D models generated computationally. Many quality assessment (QA) methods have been proposed over the past three decades. However, their accuracy or robustness is unsatisfactory for practical applications. In this paper, two new heuristic QA methods are proposed: MUfoldQA_S and MUfoldQA_C. MUfoldQA_S is a quasi-single-model QA method that assesses the model quality based on the known protein structures with similar sequences. This algorithm can be directly applied to protein fragments without the necessity of building a full structural model. A BLOSUM-based heuristic is also introduced to help differentiate accurate templates from poor ones. In MUfoldQA_C, the ideas from MUfoldQA_S were combined with the consensus approach to create a multi-model QA method that can also utilize information from existing reference models and has demonstrated improved performance. Extensive experimental results of these two methods have shown significant improvement over existing methods. In addition, both methods have been blindly tested in the CASP12 world-wide competition in the protein structure prediction field and ranked as top performers in their respective categories.
|
46
|
Xu G, Wang Q, Ma J. OPUS-Fold: An Open-Source Protein Folding Framework Based on Torsion-Angle Sampling. J Chem Theory Comput 2020; 16:3970-3976. [DOI: 10.1021/acs.jctc.0c00186] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/15/2023]
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Qinghua Wang
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
  - Department of Bioengineering, Rice University, Houston, Texas 77005, United States
|
47
|
Chen J, Siu SWI. Machine Learning Approaches for Quality Assessment of Protein Structures. Biomolecules 2020; 10:626. [PMID: 32316682 PMCID: PMC7226485 DOI: 10.3390/biom10040626] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/03/2020] [Revised: 04/07/2020] [Accepted: 04/09/2020] [Indexed: 11/16/2022]
Abstract
Protein structures play a very important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determinations of protein structure are prohibitively costly and time-consuming, and computational predictions of protein structures have not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, EMA methods based on machine learning (ML) have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by their employed ML approach (support vector machine, artificial neural networks, ensemble learning, or Bayesian learning), and their significance is discussed from a methodology viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML/DL techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and directions for future research in this area.
|
48
|
Heo L, Feig M. Modeling of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Proteins by Machine Learning and Physics-Based Refinement. bioRxiv 2020:2020.03.25.008904. [PMID: 32511334 PMCID: PMC7239069 DOI: 10.1101/2020.03.25.008904] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Indexed: 04/13/2023]
Abstract
Protein structures are crucial for understanding their biological activities. Since the outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), there has been an urgent need to understand the biological behavior of the virus and provide a basis for developing effective therapies. Since the proteome of the virus was determined, some of the protein structures could be determined experimentally, and others were predicted via template-based modeling approaches. However, tertiary structures for several proteins are still not available from experiment, nor could they be accurately predicted by template-based modeling because of the lack of close homolog structures. Previous efforts to predict structures for these proteins include efforts by DeepMind and the Zhang group via machine learning-based structure prediction methods, i.e. AlphaFold and C-I-TASSER. However, the predicted models vary greatly and have not yet been subjected to refinement. Here, we report new predictions from our in-house structure prediction pipeline. The pipeline takes advantage of inter-residue contact predictions from trRosetta, a machine learning-based method. The predicted models were further improved by applying molecular dynamics simulation-based refinement. We also took the AlphaFold models and refined them by applying the same refinement method. Models based on our structure prediction pipeline and the refined AlphaFold models were analyzed and compared with the C-I-TASSER models. All of our models are available at https://github.com/feiglab/sars-cov-2-proteins.
Affiliation(s)
- Lim Heo
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
|
49
|
Liu S, Xiang X, Gao X, Liu H. Neighborhood Preference of Amino Acids in Protein Structures and its Applications in Protein Structure Assessment. Sci Rep 2020; 10:4371. [PMID: 32152349 PMCID: PMC7062742 DOI: 10.1038/s41598-020-61205-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/19/2019] [Accepted: 02/24/2020] [Indexed: 12/02/2022]
Abstract
Amino acids form protein 3D structures in unique manners such that the folded structure is stable and functional under physiological conditions. Non-specific and non-covalent interactions between amino acids exhibit neighborhood preferences. Based on structural information from the Protein Data Bank, a statistical energy function was derived to quantify amino acid neighborhood preferences. The neighborhood of one amino acid is defined by its contacting residues, and the energy function is determined by the neighboring residue types and relative positions. The neighborhood preference of amino acids was exploited to facilitate structural quality assessment, which was implemented in the neighborhood preference program NEPRE. The source code is available via https://github.com/LiuLab-CSRC/NePre.
Affiliation(s)
- Siyuan Liu
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Xilun Xiang
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Xiang Gao
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Haiguang Liu
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
  - Physics Department, Beijing Normal University, Haidian, Beijing, 100875, China
|
50
|
Xu G, Wang Q, Ma J. OPUS-Refine: A Fast Sampling-Based Framework for Refining Protein Backbone Torsion Angles and Global Conformation. J Chem Theory Comput 2020; 16:1359-1366. [DOI: 10.1021/acs.jctc.9b01054] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Qinghua Wang
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
  - Department of Bioengineering, Rice University, Houston, Texas 77005, United States
|