1
Drogalin A, Monteiro LS, Alves MJ, Castro TG. Golgi α-mannosidase: opposing structures of Drosophila melanogaster and novel human model using molecular dynamics simulations and docking at different pHs. J Biomol Struct Dyn 2024; 42:2714-2725. [PMID: 37158092 DOI: 10.1080/07391102.2023.2209184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 04/19/2023] [Indexed: 05/10/2023]
Abstract
The search for Golgi α-mannosidase II (GMII) potent and specific inhibitors has been a focus of many studies for the past three decades since this enzyme is a key target for cancer treatment. α-Mannosidases, such as those from Drosophila melanogaster or Jack bean, have been used as functional models of the human Golgi α-mannosidase II (hGMII) because mammalian mannosidases are difficult to purify and characterize experimentally. Meanwhile, computational studies have been seen as privileged tools able to explore assertive solutions to specific enzymes, providing molecular details of these macromolecules, their protonation states and their interactions. Thus, modelling techniques can successfully predict hGMII 3D structure with high confidence, speeding up the development of new hits. In this study, Drosophila melanogaster Golgi mannosidase II (dGMII) and a novel human model, developed in silico and equilibrated via molecular dynamics simulations, were both opposed for docking. Our findings highlight that the design of novel inhibitors should be carried out considering the human model's characteristics and the enzyme operating pH. A reliable model is evidenced, showing a good correlation between Ki/IC50 experimental data and theoretical ΔGbinding estimations in GMII, opening the possibility of optimizing the rational drug design of new derivatives. Communicated by Ramaswamy H. Sarma.
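The correlation this abstract mentions, between experimental Ki values and computed binding free energies, is usually drawn through the standard thermodynamic relation ΔG = RT ln(Ki). The sketch below illustrates only that conversion and is not taken from the paper: the function name `ki_to_delta_g`, the default temperature of 298.15 K, and the 20 nM example value are all assumptions made for illustration. IC50 values are not interchangeable with Ki in general and would need an assay-specific conversion first.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL = 1.98720425864083e-3


def ki_to_delta_g(ki_molar: float, temperature_k: float = 298.15) -> float:
    """Convert an inhibition constant (mol/L) to a binding free energy (kcal/mol).

    Uses dG = R * T * ln(Ki), with Ki taken relative to a 1 M standard state.
    Hypothetical helper for illustration; not the authors' code.
    """
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    return R_KCAL * temperature_k * math.log(ki_molar)


if __name__ == "__main__":
    # Hypothetical inhibitor with Ki = 20 nM (not a value from the paper).
    print(f"{ki_to_delta_g(20e-9):.2f} kcal/mol")
```

A tighter binder (smaller Ki) gives a more negative ΔG, which is why a ranking by docking score can be compared against a ranking by measured Ki.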
Affiliation(s)
- Artem Drogalin
  - Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
- Luís S Monteiro
  - Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
- Maria José Alves
  - Chemistry Centre, School of Sciences, University of Minho, Braga, Portugal
- Tarsila G Castro
  - CEB - Centre of Biological Engineering, University of Minho, Braga, Portugal
  - LABBELS - Associate Laboratory, Braga/Guimarães, Portugal
2
Lu W, Zhang J, Huang W, Zhang Z, Jia X, Wang Z, Shi L, Li C, Wolynes PG, Zheng S. DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model. Nat Commun 2024; 15:1071. [PMID: 38316797 PMCID: PMC10844226 DOI: 10.1038/s41467-024-45461-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 01/24/2024] [Indexed: 02/07/2024] Open
Abstract
While significant advances have been made in predicting static protein structures, the inherent dynamics of proteins, modulated by ligands, are crucial for understanding protein function and facilitating drug discovery. Traditional docking methods, frequently used in studying protein-ligand interactions, typically treat proteins as rigid. While molecular dynamics simulations can propose appropriate protein conformations, they're computationally demanding due to rare transitions between biologically relevant equilibrium states. In this study, we present DynamicBind, a deep learning method that employs equivariant geometric diffusion networks to construct a smooth energy landscape, promoting efficient transitions between different equilibrium states. DynamicBind accurately recovers ligand-specific conformations from unbound protein structures without the need for holo-structures or extensive sampling. Remarkably, it demonstrates state-of-the-art performance in docking and virtual screening benchmarks. Our experiments reveal that DynamicBind can accommodate a wide range of large protein conformational changes and identify cryptic pockets in unseen protein targets. As a result, DynamicBind shows potential in accelerating the development of small molecules for previously undruggable targets and expanding the horizons of computational drug discovery.
Affiliation(s)
- Wei Lu
  - Galixir Technologies, 200100, Shanghai, China
- Weifeng Huang
  - School of Pharmaceutical Science, Sun Yat-sen University, 510006, Guangzhou, China
- Xiangyu Jia
  - Galixir Technologies, 200100, Shanghai, China
- Zhenyu Wang
  - Galixir Technologies, 200100, Shanghai, China
- Leilei Shi
  - Galixir Technologies, 200100, Shanghai, China
- Chengtao Li
  - Galixir Technologies, 200100, Shanghai, China
- Peter G Wolynes
  - Center for Theoretical Biological Physics and Department of Chemistry, Rice University, Houston, TX, 77005, USA
- Shuangjia Zheng
  - Global Institute of Future Technology, Shanghai Jiao Tong University, 200240, Shanghai, China
3
Shan MA, Khan MU, Ishtiaq W, Rehman R, Khan S, Javed MA, Ali Q. In silico analysis of the Val66Met mutation in BDNF protein: implications for psychological stress. AMB Express 2024; 14:11. [PMID: 38252222 PMCID: PMC10803716 DOI: 10.1186/s13568-024-01664-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 01/08/2024] [Indexed: 01/23/2024] Open
Abstract
The brain-derived neurotrophic factor (BDNF) involves stress regulation and psychiatric disorders. The Val66Met polymorphism in the BDNF gene has been linked to altered protein function and susceptibility to stress-related conditions. This in silico analysis aimed to predict and analyze the consequences of the Val66Met mutation in the BDNF gene of stressed individuals. Computational techniques, including ab initio, comparative, and I-TASSER modeling, were used to evaluate the functional and stability effects of the Val66Met mutation in BDNF. The accuracy and reliability of the models were validated. Sequence alignment and secondary structure analysis compared amino acid residues and structural components. The phylogenetic analysis assessed the conservation of the mutation site. Functional and stability prediction analyses provided mixed results, suggesting potential effects on protein function and stability. Structural models revealed the importance of BDNF in key biological processes. Sequence alignment analysis showed the conservation of amino acid residues across species. Secondary structure analysis indicated minor differences between the wild-type and mutant forms. Phylogenetic analysis supported the evolutionary conservation of the mutation site. This computational study suggests that the Val66Met mutation in BDNF may have implications for protein stability, structural conformation, and function. Further experimental validation is needed to confirm these findings and elucidate the precise effects of this mutation on stress-related disorders.
Affiliation(s)
- Muhammad Adnan Shan
  - Center for Applied Molecular Biology, University of the Punjab, Lahore, Pakistan
- Muhammad Umer Khan
  - Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Warda Ishtiaq
  - Center for Applied Molecular Biology, University of the Punjab, Lahore, Pakistan
- Raima Rehman
  - Center for Applied Molecular Biology, University of the Punjab, Lahore, Pakistan
- Samiullah Khan
  - Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Muhammad Arshad Javed
  - Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab Lahore, Lahore, Pakistan
- Qurban Ali
  - Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab Lahore, Lahore, Pakistan
4
Lee JW, Won JH, Jeon S, Choo Y, Yeon Y, Oh JS, Kim M, Kim S, Joung I, Jang C, Lee SJ, Kim TH, Jin KH, Song G, Kim ES, Yoo J, Paek E, Noh YK, Joo K. DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function. Bioinformatics 2023; 39:btad712. [PMID: 37995286 PMCID: PMC10699847 DOI: 10.1093/bioinformatics/btad712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023] Open
Abstract
MOTIVATION: Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures. RESULTS: Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields. We also performed re-optimization by conformational space annealing using a molecular mechanics energy function which integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups with improvements in the details of the structure in terms of backbone, side-chain, and Molprobity. In terms of protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64 compared with 85.88 of AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and Molprobity scores among the top-performing groups. AVAILABILITY AND IMPLEMENTATION: DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.
Affiliation(s)
- Jae-Won Lee
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jong-Hyun Won
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Seonggwang Jeon
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Yujin Choo
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
  - Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Yubin Yeon
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jin-Seon Oh
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
  - Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Minsoo Kim
  - Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- SeonHwa Kim
  - School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Cheongjae Jang
  - Artificial Intelligence Institute, Hanyang University, Seoul 04763, Korea
- Sung Jong Lee
  - Basic Science Research Institute, Changwon National University, Changwon 51140, Korea
- Tae Hyun Kim
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Kyong Hwan Jin
  - School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Giltae Song
  - School of Computer Science and Engineering, Pusan National University, Busan 46241, Korea
- Eun-Sol Kim
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Jejoong Yoo
  - Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Yung-Kyun Noh
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
- Keehyoung Joo
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
5
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; (iv) an optimized I-TASSER-based folding simulation system for full-length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER showed average TM-score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
6
Cho HJ, Gurbuz F, Stamou M, Kotan LD, Farmer SM, Can S, Tompkins MF, Mammadova J, Altincik SA, Gokce C, Catli G, Bugrul F, Bartlett K, Turan I, Balasubramanian R, Yuksel B, Seminara SB, Wray S, Topaloglu AK. POU6F2 mutation in humans with pubertal failure alters GnRH transcript expression. Front Endocrinol (Lausanne) 2023; 14:1203542. [PMID: 37600690 PMCID: PMC10436210 DOI: 10.3389/fendo.2023.1203542] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/10/2023] [Accepted: 06/23/2023] [Indexed: 08/22/2023] Open
Abstract
Idiopathic hypogonadotropic hypogonadism (IHH) is characterized by the absence of pubertal development and subsequent impaired fertility often due to gonadotropin-releasing hormone (GnRH) deficits. Exome sequencing of two independent cohorts of IHH patients identified 12 rare missense variants in POU6F2 in 15 patients. POU6F2 encodes two distinct isoforms. In the adult mouse, expression of both isoform1 and isoform2 was detected in the brain, pituitary, and gonads. However, only isoform1 was detected in mouse primary GnRH cells and three immortalized GnRH cell lines, two mouse and one human. To date, the function of isoform2 has been verified as a transcription factor, while the function of isoform1 has been unknown. In the present report, bioinformatics and cell assays on a human-derived GnRH cell line reveal a novel function for isoform1, demonstrating it can act as a transcriptional regulator, decreasing GNRH1 expression. In addition, the impact of the two most prevalent POU6F2 variants, identified in five IHH patients, that were located at/or close to the DNA-binding domain was examined. Notably, one of these mutations prevented the repression of GnRH transcripts by isoform1. Normally, GnRH transcription increases as GnRH cells mature as they near migrate into the brain. Augmentation earlier during development can disrupt normal GnRH cell migration, consistent with some POU6F2 variants contributing to the IHH pathogenesis.
Affiliation(s)
- Hyun-Ju Cho
  - Cellular and Developmental Neurobiology Section, National Institute of Neurologic Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
- Fatih Gurbuz
  - Division of Pediatric Endocrinology, Faculty of Medicine, Cukurova University, Adana, Türkiye
- Maria Stamou
  - Harvard Reproductive Sciences Center, The Reproductive Endocrine Unit and The Endocrine Unit of the Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
- Leman Damla Kotan
  - Division of Pediatric Endocrinology, Faculty of Medicine, Cukurova University, Adana, Türkiye
- Stephen Matthew Farmer
  - Cellular and Developmental Neurobiology Section, National Institute of Neurologic Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
- Sule Can
  - Division of Pediatric Endocrinology, İzmir Tepecik Training and Research Hospital, Health Sciences University, İzmir, Türkiye
- Miranda Faith Tompkins
  - Cellular and Developmental Neurobiology Section, National Institute of Neurologic Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
- Jamala Mammadova
  - Division of Pediatric Endocrinology, Faculty of Medicine, Ondokuz Mayis University, Samsun, Türkiye
- S. Ayca Altincik
  - Division of Pediatric Endocrinology, Faculty of Medicine, Pamukkale University, Denizli, Türkiye
- Cumali Gokce
  - Division of Endocrinology, Faculty of Medicine, Mustafa Kemal University, Hatay, Türkiye
- Gonul Catli
  - Division of Pediatric Endocrinology, İzmir Tepecik Training and Research Hospital, Health Sciences University, İzmir, Türkiye
- Fuat Bugrul
  - Division of Pediatric Endocrinology, Faculty of Medicine, Selcuk University, Konya, Türkiye
- Keenan Bartlett
  - Cellular and Developmental Neurobiology Section, National Institute of Neurologic Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
- Ihsan Turan
  - Division of Pediatric Endocrinology, Faculty of Medicine, Cukurova University, Adana, Türkiye
- Ravikumar Balasubramanian
  - Harvard Reproductive Sciences Center, The Reproductive Endocrine Unit and The Endocrine Unit of the Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
- Bilgin Yuksel
  - Division of Pediatric Endocrinology, Faculty of Medicine, Cukurova University, Adana, Türkiye
- Stephanie B. Seminara
  - Harvard Reproductive Sciences Center, The Reproductive Endocrine Unit and The Endocrine Unit of the Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
- Susan Wray
  - Cellular and Developmental Neurobiology Section, National Institute of Neurologic Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
- A. Kemal Topaloglu
  - Department of Pediatrics, Division of Pediatric Endocrinology, University of Mississippi Medical Center, Jackson, MS, United States
  - Division of Pediatric Endocrinology, Massachusetts General Hospital for Children and Harvard Medical School, Boston, MA, United States
7
Kapoor L, Udhaya Kumar S, De S, Vijayakumar S, Kapoor N, Ashok Kumar SK, Priya Doss C G, Ramamoorthy S. Multispectroscopic, virtual and in vivo insights into the photoaging defense mediated by the natural food colorant bixin. Food Funct 2023; 14:319-334. [PMID: 36503930 DOI: 10.1039/d2fo02338e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
An upsurge in early onset of photoaging due to repeated skin exposure to environmental stressors such as UV radiation is a challenge for pharmaceutical and cosmeceutical divisions. Current reports indicate severe side effects because of chemical or synthetic inhibitors of matrix metalloproteases (MMPs) in anti-skin aging cosmeceuticals. We evaluated the adequacy of bixin, a well-known FDA certified food additive, as a scavenger of free radicals and its inhibitory mechanism of action on MMP1, collagenase, elastase, and hyaluronidase. The anti-skin aging potential of bixin was evaluated by several biotechnological tools in silico, in vitro and in vivo. Molecular docking and simulation dynamics studies gave a virtual insight into the robust binding interaction between bixin and skin aging-related enzymes. Absorbance and fluorescence studies, enzyme inhibition assays, enzyme kinetics and in vitro bioassays of human dermal fibroblast (HDF) cells highlighted bixin's role as a potent antioxidant and inhibitor of skin aging-related enzymes. Furthermore, in vivo protocols were carried out to study the impact of bixin administration on UVA induced photoaging in C57BL/6 mice skin. Here, we uncover the UVA shielding effect of bixin and its efficacy as a novel anti-photoaging agent. Furthermore, the findings of this study provide a strong foundation to explore the pharmaceutical applications of bixin in several other biochemical pathways linked to MMP1, collagenase, elastase, and hyaluronidase.
Affiliation(s)
- Leepica Kapoor
  - School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- S Udhaya Kumar
  - School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Sourav De
  - Department of Chemical Engineering, National Chung Cheng University, Chia-Yi, 62102, Taiwan
- Sujithra Vijayakumar
  - School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Nitin Kapoor
  - Department of Endocrinology, Diabetes and Metabolism, Christian Medical College, Vellore 632004, Tamil Nadu, India
  - Non Communicable Disease Unit and Implementation Science Lab, The Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
- S K Ashok Kumar
  - School of Advanced Sciences, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- George Priya Doss C
  - School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Siva Ramamoorthy
  - School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
8
Khan AS, Hichami A, Murtaza B, Louillat-Habermeyer ML, Ramseyer C, Azadi M, Yesylevskyy S, Mangin F, Lirussi F, Leemput J, Merlin JF, Schmitt A, Suliman M, Bayardon J, Semnanian S, Jugé S, Khan NA. Novel Fat Taste Receptor Agonists Curtail Progressive Weight Gain in Obese Male Mice. Cell Mol Gastroenterol Hepatol 2023; 15:633-663. [PMID: 36410709 PMCID: PMC9871744 DOI: 10.1016/j.jcmgh.2022.11.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 11/10/2022] [Accepted: 11/10/2022] [Indexed: 11/20/2022]
Abstract
BACKGROUND & AIMS: The spontaneous preference for dietary lipids is principally regulated by 2 lingual fat taste receptors, CD36 and GPR120. Obese animals and most of human subjects exhibit low orosensory perception of dietary fat because of malfunctioning of these taste receptors. Our aim was to target the 2 fat taste receptors by newly synthesized high affinity fatty acid agonists to decrease fat-rich food intake and obesity. METHODS: We synthesized 2 fat taste receptor agonists (FTA), NKS-3 (CD36 agonist) and NKS-5 (CD36 and GPR120 agonist). We determined their molecular dynamic interactions with fat taste receptors and the effect on Ca2+ signaling in mouse and human taste bud cells (TBC). In C57Bl/6 male mice, we assessed their gustatory perception and effects of their lingual application on activation of tongue-gut loop. We elucidated their effects on obesity and its related parameters in male mice fed a high-fat diet. RESULTS: The two FTA, NKS-3 and NKS-5, triggered higher Ca2+ signaling than a dietary long-chain fatty acid in human and mouse TBC. Mice exhibited a gustatory attraction for these compounds. In conscious mice, the application of FTA onto the tongue papillae induced activation of tongue-gut loop, marked by the release of pancreato-bile juice into collecting duct and cholecystokinin and peptide YY into blood stream. Daily intake of NKS-3 or NKS-5 via feeding bottles decreased food intake and progressive weight gain in obese mice but not in control mice. CONCLUSIONS: Our results show that targeting fat sensors in the tongue by novel chemical fat taste agonists might represent a new strategy to reduce obesity.
Affiliation(s)
- Amira Sayed Khan
  - NUTox, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, LABEX-LipStick, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Aziz Hichami
  - NUTox, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, LABEX-LipStick, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Babar Murtaza
  - NUTox, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, LABEX-LipStick, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Christophe Ramseyer
  - Laboratoire ChronoEnvironnement, UMR CNRS6249, Université de Bourgogne Franche-Comté (UBFC), Besançon, France
- Maryam Azadi
  - Department of Physiology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Semen Yesylevskyy
  - Laboratoire ChronoEnvironnement, UMR CNRS6249, Université de Bourgogne Franche-Comté (UBFC), Besançon, France
  - Department of Physics of Biological Systems, Institute of Physics of the National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Floriane Mangin
  - ICMUB-OCS, UMR CNRS 6302, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Frederic Lirussi
  - HSP-pathies, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Julia Leemput
  - NUTox, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, LABEX-LipStick, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Jean-Francois Merlin
  - NUTox, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, LABEX-LipStick, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Antonin Schmitt
  - HSP-pathies, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Muhtadi Suliman
  - NUTox, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, LABEX-LipStick, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Jérôme Bayardon
  - ICMUB-OCS, UMR CNRS 6302, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Saeed Semnanian
  - Department of Physiology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Sylvain Jugé
  - ICMUB-OCS, UMR CNRS 6302, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
- Naim Akhtar Khan
  - NUTox, UMR UB/AgroSup/INSERM U1231, Lipides, Nutrition & Cancer, LABEX-LipStick, Université de Bourgogne-Franche Comté (UBFC), Dijon, France
9
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding, including allosteric effects, protein–protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
10
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 96] [Impact Index Per Article: 48.0] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
|
11
|
Liu Z, Yu DJ. cpxDeepMSA: A Deep Cascade Algorithm for Constructing Multiple Sequence Alignments of Protein–Protein Interactions. Int J Mol Sci 2022; 23:ijms23158459. [PMID: 35955594 PMCID: PMC9369210 DOI: 10.3390/ijms23158459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Revised: 07/18/2022] [Accepted: 07/28/2022] [Indexed: 12/10/2022]
Abstract
Protein–protein interactions (PPIs) are fundamental to many biological processes. The coevolution-based prediction of interacting residues has made great strides in protein complexes that are known to interact. A multiple sequence alignment (MSA) is the basis of coevolution analysis. MSAs have recently made significant progress in the protein monomer sequence analysis. However, no standard or efficient pipelines are available for the sensitive protein complex MSA (cpxMSA) collection. How to generate cpxMSA is one of the most challenging problems of sequence coevolution analysis. Although several methods have been developed to address this problem, no standalone program exists. Furthermore, the number of built-in properties is limited; hence, it is often difficult for users to analyze sequence coevolution according to their desired cpxMSA. In this article, we developed a novel cpxMSA approach (cpxDeepMSA). We used different protein monomer databases and incorporated three strategies (genomic distance, phylogeny information, and STRING interaction network) to join the monomer MSA results of protein complexes, which prevents the cpxMSA construction from failing when a single method fails to join the two monomer MSAs. We anticipate that the cpxDeepMSA algorithm will become a useful high-throughput tool in protein complex structure predictions, inter-protein residue-residue contacts, and the biological sequence coevolution analysis.
|
12
|
Paiva VDA, Gomes IDS, Monteiro CR, Mendonça MV, Martins PM, Santana CA, Gonçalves-Almeida V, Izidoro SC, Melo-Minardi RCD, Silveira SDA. Protein structural bioinformatics: An overview. Comput Biol Med 2022; 147:105695. [DOI: 10.1016/j.compbiomed.2022.105695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2021] [Revised: 06/01/2022] [Accepted: 06/02/2022] [Indexed: 11/27/2022]
|
13
|
Kostenko DO, Korotkov EV. Application of the MAHDS Method for Multiple Alignment of Highly Diverged Amino Acid Sequences. Int J Mol Sci 2022; 23:ijms23073764. [PMID: 35409125 PMCID: PMC8998981 DOI: 10.3390/ijms23073764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/10/2022]
Abstract
The aim of this work was to compare the multiple alignment methods MAHDS, T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK in their ability to align highly divergent amino acid sequences. To accomplish this, we created test amino acid sequences with an average number of substitutions per amino acid (x) from 0.6 to 5.6, a total of 81 sets. Comparison of the performance of sequence alignments constructed by MAHDS and previously developed algorithms using the CS and Z score criteria and the benchmark alignment database (BAliBASE) indicated that, although the quality of the alignments built with MAHDS was somewhat lower than that of the other algorithms, it was compensated by greater statistical significance. MAHDS could construct statistically significant alignments of artificial sequences with x ≤ 4.8, whereas the other algorithms (T-Coffee, MUSCLE, Clustal Omega, Kalign, MAFFT, and PRANK) could not perform that at x > 2.4. The application of MAHDS to align 21 families of highly diverged proteins (identity < 20%) from Pfam and HOMSTRAD databases showed that it could calculate statistically significant alignments in cases when the other methods failed. Thus, MAHDS could be used to construct statistically significant multiple alignments of highly divergent protein sequences, which accumulated multiple mutations during evolution.
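The "identity < 20%" threshold in the abstract above can be made concrete with a small sketch. This is only an illustration, not the MAHDS procedure: the function name `percent_identity` is hypothetical, and the choice to ignore columns where either sequence has a gap is an assumption (other definitions of identity use a different denominator).

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Assumption: columns where either sequence has a gap are skipped,
    so the denominator is the number of gap-free aligned columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

Under this definition, a pair of aligned sequences would count as "highly diverged" in the abstract's sense when the returned value is below 20.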
|
14
|
Goh NY, Mohamad Razif MF, Yap YHY, Ng CL, Fung SY. In silico analysis and characterization of medicinal mushroom cystathionine beta-synthase as an angiotensin converting enzyme (ACE) inhibitory protein. Comput Biol Chem 2021; 96:107620. [PMID: 34971900 DOI: 10.1016/j.compbiolchem.2021.107620] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/16/2021] [Revised: 12/20/2021] [Accepted: 12/20/2021] [Indexed: 12/17/2022]
Abstract
Angiotensin-converting enzyme (ACE) regulates blood pressure and has been implicated in several conditions including lung injury, fibrosis and Alzheimer's disease. Medicinal mushroom Ganoderma lucidum (Reishi) cystathionine beta-synthase (GlCBS) was previously reported to possess ACE inhibitory activities. However, the inhibitory mechanism of CBS protein remains unreported. Therefore, this study integrates in silico sequencing, structural and functional based-analysis, protein modelling, molecular docking and binding affinity calculation to elucidate the inhibitory mechanism of GlCBS and Lignosus rhinocerus (Tiger milk mushroom) CBS protein (LrCBS) towards ACE. In silico analysis indicates that CBSs from both mushrooms share high similarities in terms of physical properties, structural properties and domain distribution. Protein-protein docking analysis revealed that both GlCBS and LrCBS potentially modulate the C-terminal domain of ACE (C-ACE) activity via regulation of chloride activation and/or prevention of substrate entry. GlCBS and LrCBS were also shown to interact with ACE at the same region that presumably inhibits the function of ACE.
Affiliation(s)
- Neng-Yao Goh
  - Medicinal Mushroom Research Group (MMRG), Department of Molecular Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Muhammad Fazril Mohamad Razif
  - Medicinal Mushroom Research Group (MMRG), Department of Molecular Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
- Yeannie Hui-Yeng Yap
  - Department of Oral Biology and Biomedical Sciences, MAHSA University, Selangor, Malaysia
- Chyan Leong Ng
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Shin-Yee Fung
  - Medicinal Mushroom Research Group (MMRG), Department of Molecular Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
|
15
|
Mary C, Fouillen A, Moffatt P, Guadarrama Bello D, Wazen RM, Grenier D, Nanci A. Effect of human secretory calcium-binding phosphoprotein proline-glutamine rich 1 protein on Porphyromonas gingivalis and identification of its active portions. Sci Rep 2021; 11:23724. [PMID: 34887426 PMCID: PMC8660882 DOI: 10.1038/s41598-021-02661-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/17/2021] [Accepted: 11/10/2021] [Indexed: 12/19/2022]
Abstract
The mouth environment comprises the second most significant microbiome in the body, and its equilibrium is critical in oral health. Secretory calcium-binding phosphoprotein proline-glutamine rich 1 (SCPPPQ1), a protein normally produced by the gingival epithelium to mediate its attachment to teeth, was suggested to be bactericidal. Our aim was to further explore the antibacterial potential of human SCPPPQ1 by characterizing its mode of action and identifying its active portions. In silico analysis showed that it has molecular parallels with antimicrobial peptides. Incubation of Porphyromonas gingivalis, a major periodontopathogen, with the full-length protein resulted in a decrease in bacterial number, formation of aggregates and membrane disruptions. Analysis of SCPPPQ1-derived peptides indicated that these effects are sustained by specific regions of the molecule. Altogether, these data suggest that human SCPPPQ1 exhibits antibacterial capacity and provide new insight into its mechanism of action.
Affiliation(s)
- Charline Mary
  - Laboratory for the Study of Calcified Tissues and Biomaterials, Faculty of Dental Medicine, Université de Montréal, Montréal, Québec, H3T 1J4, Canada; Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Québec, H3T 1J4, Canada
- Aurélien Fouillen
  - Laboratory for the Study of Calcified Tissues and Biomaterials, Faculty of Dental Medicine, Université de Montréal, Montréal, Québec, H3T 1J4, Canada; Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Québec, H3T 1J4, Canada
- Pierre Moffatt
  - Department of Human Genetics, McGill University, Montreal, Québec, H3A 0G4, Canada; Shriners Hospitals for Children-Canada, Montreal, Québec, H4A 0A9, Canada
- Dainelys Guadarrama Bello
  - Laboratory for the Study of Calcified Tissues and Biomaterials, Faculty of Dental Medicine, Université de Montréal, Montréal, Québec, H3T 1J4, Canada
- Rima M Wazen
  - Laboratory for the Study of Calcified Tissues and Biomaterials, Faculty of Dental Medicine, Université de Montréal, Montréal, Québec, H3T 1J4, Canada
- Daniel Grenier
  - Oral Ecology Research Group, Faculty of Dental Medicine, Université Laval, Québec, Québec, G1V 0A6, Canada
- Antonio Nanci
  - Laboratory for the Study of Calcified Tissues and Biomaterials, Faculty of Dental Medicine, Université de Montréal, Montréal, Québec, H3T 1J4, Canada; Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, Québec, H3T 1J4, Canada
|
16
|
Overhoff B, Falls Z, Mangione W, Samudrala R. A Deep-Learning Proteomic-Scale Approach for Drug Design. Pharmaceuticals (Basel) 2021; 14:1277. [PMID: 34959678 PMCID: PMC8709297 DOI: 10.3390/ph14121277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/24/2021] [Revised: 11/27/2021] [Accepted: 11/29/2021] [Indexed: 12/26/2022]
Abstract
Computational approaches have accelerated novel therapeutic discovery in recent decades. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget therapeutic discovery, repurposing, and design aims to improve their efficacy and safety by employing a holistic approach that computes interaction signatures between every drug/compound and a large library of non-redundant protein structures corresponding to the human proteome fold space. These signatures are compared and analyzed to determine if a given drug/compound is efficacious and safe for a given indication/disease. In this study, we used a deep learning-based autoencoder to first reduce the dimensionality of CANDO-computed drug-proteome interaction signatures. We then employed a reduced conditional variational autoencoder to generate novel drug-like compounds when given a target encoded "objective" signature. Using this approach, we designed compounds to recreate the interaction signatures for twenty approved and experimental drugs and showed that 16/20 designed compounds were predicted to be significantly (p-value ≤ 0.05) more behaviorally similar relative to all corresponding controls, and 20/20 were predicted to be more behaviorally similar relative to a random control. We further observed that redesigns of objectives developed via rational drug design performed significantly better than those derived from natural sources (p-value ≤ 0.05), suggesting that the model learned an abstraction of rational drug design. We also show that the designed compounds are structurally diverse and synthetically feasible when compared to their respective objective drugs despite consistently high predicted behavioral similarity. Finally, we generated new designs that enhanced thirteen drugs/compounds associated with non-small cell lung cancer and anti-aging properties using their predicted proteomic interaction signatures. 
This study represents a significant step forward in automating holistic therapeutic design with machine learning, enabling the rapid generation of novel, effective, and safe drug leads for any indication.
Affiliation(s)
- Ram Samudrala
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, USA; (B.O.); (Z.F.); (W.M.)
|
17
|
Zheng W, Zhang C, Li Y, Pearce R, Bell EW, Zhang Y. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Reports Methods 2021; 1:100014. [PMID: 34355210 PMCID: PMC8336924 DOI: 10.1016/j.crmeth.2021.100014] [Citation(s) in RCA: 227] [Impact Index Per Article: 75.7] [Received: 01/25/2021] [Revised: 04/22/2021] [Accepted: 05/03/2021] [Indexed: 12/23/2022]
Abstract
Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice the number of non-homologous proteins than the I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the quick mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Eric W. Bell
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
18
|
Pearce R, Zhang Y. Deep learning techniques have significantly impacted protein structure prediction and protein design. Curr Opin Struct Biol 2021; 68:194-207. [PMID: 33639355 PMCID: PMC8222070 DOI: 10.1016/j.sbi.2021.01.007] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 10/05/2020] [Revised: 01/09/2021] [Accepted: 01/18/2021] [Indexed: 12/26/2022]
Abstract
Protein structure prediction and design can be regarded as two inverse processes governed by the same folding principle. Although progress remained stagnant over the past two decades, the recent application of deep neural networks to spatial constraint prediction and end-to-end model training has significantly improved the accuracy of protein structure prediction, largely solving the problem at the fold level for single-domain proteins. The field of protein design has also witnessed dramatic improvement, where noticeable examples have shown that information stored in neural-network models can be used to advance functional protein design. Thus, incorporation of deep learning techniques into different steps of protein folding and design approaches represents an exciting future direction and should continue to have a transformative impact on both fields.
Affiliation(s)
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
19
|
Mulligan VK. Current directions in combining simulation-based macromolecular modeling approaches with deep learning. Expert Opin Drug Discov 2021; 16:1025-1044. [PMID: 33993816 DOI: 10.1080/17460441.2021.1918097] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
Introduction: Structure-guided drug discovery relies on accurate computational methods for modeling macromolecules. Simulations provide means of predicting macromolecular folds, of discovering function from structure, and of designing macromolecules to serve as drugs. Success rates are limited for any of these tasks, however. Recently, deep neural network-based methods have greatly enhanced the accuracy of predictions of protein structure from sequence, generating excitement about the potential impact of deep learning. Areas covered: This review introduces biologists to deep neural network architecture, surveys recent successes of deep learning in structure prediction, and discusses emerging deep learning-based approaches for structure-function analysis and design. Particular focus is given to the interplay between simulation-based and neural network-based approaches. Expert opinion: As deep learning grows integral to macromolecular modeling, simulation- and neural network-based approaches must grow more tightly interconnected. Modular software architecture must emerge allowing both types of tools to be combined with maximal versatility. Open sharing of code under permissive licenses will be essential. Although experiments will remain the gold standard for reliable information to guide drug discovery, we may soon see successful drug development projects based on high-accuracy predictions from algorithms that combine simulation with deep learning - the ultimate validation of this combination's power.
|
20
|
Xu L, Tong J, Wu Y, Zhao S, Lin BL. A computational evaluation of targeted oxidation strategy (TOS) for potential inhibition of SARS-CoV-2 by disulfiram and analogues. Biophys Chem 2021; 276:106610. [PMID: 34089978 PMCID: PMC8161800 DOI: 10.1016/j.bpc.2021.106610] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/01/2021] [Revised: 04/29/2021] [Accepted: 05/01/2021] [Indexed: 12/29/2022]
Abstract
In the new millennium, outbreaks of new coronaviruses have happened three times: SARS-CoV, MERS-CoV, and SARS-CoV-2. Unfortunately, we still have no pharmaceutical weapons against the diseases caused by these viruses. The SARS-CoV-2 pandemic reminds us of the urgency of searching for new drugs with totally different mechanisms that may target the weaknesses specific to coronaviruses. Herein, we disclose a computational evaluation of targeted oxidation strategy (TOS) for potential inhibition of SARS-CoV-2 by disulfiram, a 70-year-old anti-alcoholism drug, and predict a multiple-target mechanism. A preliminary list of promising TOS drug candidates targeting the two thiol proteases of SARS-CoV-2 is proposed upon virtual screening of 32,143 disulfides.
Affiliation(s)
- Luyan Xu
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China; Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jiahui Tong
  - iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China; Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yiran Wu
  - iHuman Institute, ShanghaiTech University, Shanghai 201210, China
- Suwen Zhao
  - iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Bo-Lin Lin
  - School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China; Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China; Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai 200032, China; University of Chinese Academy of Sciences, Beijing 100049, China
|
21
|
Abstract
Protein aggregation is a widespread phenomenon with important implications in many scientific areas. Although amyloid formation is typically considered as detrimental, functional amyloids that perform physiological roles have been identified in all kingdoms of life. Despite their functional and pathological relevance, the structural details of the majority of molecular species involved in the amyloidogenic process remain elusive. Here, we explore the application of AlphaFold, a highly accurate protein structure predictor, in the field of protein aggregation. While we envision a straightforward application of AlphaFold in assisting the design of globular proteins with improved solubility for biomedical and industrial purposes, the use of this algorithm for predicting the structure of aggregated species seems far from trivial. First, in amyloid diseases, the presence of multiple amyloid polymorphs and the heterogeneity of aggregation intermediates challenge the "one sequence, one structure" paradigm, inherent to sequence-based predictions. Second, aberrant aggregation is not the subject of positive selective pressure, precluding the use of evolutionary-based approaches, which are the core of the AlphaFold pipeline. Instead, amyloid polymorphism seems to be constrained by the need for a defined structure-activity relationship in functional amyloids. They may thus provide a starting point for the application of AlphaFold in the amyloid landscape.
|
22
|
Suárez H, Andreu Z, Mazzeo C, Toribio V, Pérez‐Rivera AE, López‐Martín S, García‐Silva S, Hurtado B, Morato E, Peláez L, Arribas EA, Tolentino‐Cortez T, Barreda‐Gómez G, Marina AI, Peinado H, Yáñez‐Mó M. CD9 inhibition reveals a functional connection of extracellular vesicle secretion with mitophagy in melanoma cells. J Extracell Vesicles 2021; 10:e12082. [PMID: 34012515 PMCID: PMC8114031 DOI: 10.1002/jev2.12082] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/26/2020] [Revised: 03/16/2021] [Accepted: 03/16/2021] [Indexed: 12/19/2022]
Abstract
Tetraspanins are often used as Extracellular Vesicle (EV) detection markers because of their abundance on these secreted vesicles. However, data on their function in EV biogenesis are controversial and compensatory mechanisms often occur upon gene deletion. To overcome this handicap, we have compared the effects of tetraspanin CD9 gene deletion with those elicited by cytopermeable peptides with blocking properties against tetraspanin CD9. Both CD9 peptide and gene deletion reduced the number of early endosomes. CD9 peptide induced an increase in lysosome numbers, while CD9 deletion augmented the number of MVB and EV secretion, probably because of compensatory CD63 expression upregulation. In vivo, CD9 peptide delayed primary tumour cell growth and reduced metastasis size. These effects on cell proliferation were shown to be concomitant with an impairment in mitochondrial quality control. CD9 KO cells were able to compensate for the mitochondrial malfunction by increasing total mitochondrial mass and reducing mitophagy. Our data thus provide the first evidence for a functional connection of tetraspanin CD9 with mitophagy in melanoma cells.
Affiliation(s)
- Henar Suárez
  - Departamento de Biología Molecular, Universidad Autónoma de Madrid (UAM), Madrid, Spain
  - Centro de Biología Molecular Severo Ochoa, Instituto de Investigación Sanitaria La Princesa (IIS‐IP), Madrid, Spain
- Zoraida Andreu
  - Departamento de Biología Molecular, Universidad Autónoma de Madrid (UAM), Madrid, Spain
  - Centro de Biología Molecular Severo Ochoa, Instituto de Investigación Sanitaria La Princesa (IIS‐IP), Madrid, Spain
- Carla Mazzeo
  - Departamento de Biología Molecular, Universidad Autónoma de Madrid (UAM), Madrid, Spain
  - Centro de Biología Molecular Severo Ochoa, Instituto de Investigación Sanitaria La Princesa (IIS‐IP), Madrid, Spain
- Víctor Toribio
  - Departamento de Biología Molecular, Universidad Autónoma de Madrid (UAM), Madrid, Spain
  - Centro de Biología Molecular Severo Ochoa, Instituto de Investigación Sanitaria La Princesa (IIS‐IP), Madrid, Spain
- Soraya López‐Martín
  - Departamento de Biología Molecular, Universidad Autónoma de Madrid (UAM), Madrid, Spain
  - Centro de Biología Molecular Severo Ochoa, Instituto de Investigación Sanitaria La Princesa (IIS‐IP), Madrid, Spain
- Begoña Hurtado
  - Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Héctor Peinado
  - Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- María Yáñez‐Mó
  - Departamento de Biología Molecular, Universidad Autónoma de Madrid (UAM), Madrid, Spain
  - Centro de Biología Molecular Severo Ochoa, Instituto de Investigación Sanitaria La Princesa (IIS‐IP), Madrid, Spain
|
23
|
Membrane Environment Modulates Ligand-Binding Propensity of P2Y12 Receptor. Pharmaceutics 2021; 13:pharmaceutics13040524. [PMID: 33918934 PMCID: PMC8069422 DOI: 10.3390/pharmaceutics13040524] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/09/2021] [Revised: 04/02/2021] [Accepted: 04/06/2021] [Indexed: 01/17/2023]
Abstract
The binding of natural ligands and synthetic drugs to the P2Y12 receptor is of great interest because of its crucial role in platelet activation and the therapy of arterial thrombosis. Up to now, all computational studies of P2Y12 concentrated on the available crystal structures, while the role of intrinsic protein dynamics and the membrane environment in the functioning of P2Y12 was not clear. In this work, we performed all-atom molecular dynamics simulations of the full-length P2Y12 receptor in three different membrane environments and in two possible conformations derived from available crystal structures. The binding of ticagrelor, its two major metabolites, adenosine diphosphate (ADP) and 2-Methylthioadenosine diphosphate (2MeS-ADP) as agonists, and ethyl 6-[4-(benzylsulfonylcarbamoyl)piperidin-1-yl]-5-cyano-2-methylpyridine-3-carboxylate (AZD1283) as antagonist was assessed systematically by means of ensemble docking. It is shown that the binding of all ligands becomes systematically stronger with the increase of the membrane rigidity. Binding of all ligands to the agonist-bound-like conformations is systematically stronger in comparison to antagonist-bound-like ones. This is in stark contrast to the results obtained for static crystal structures. Our results show that accounting for internal protein dynamics, strongly modulated by its lipid environment, is crucial for correct assessment of the ligand binding to P2Y12.
|
24
|
Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol 2021; 17:e1008865. [PMID: 33770072 PMCID: PMC8026059 DOI: 10.1371/journal.pcbi.1008865] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 09/21/2020] [Revised: 04/07/2021] [Accepted: 03/10/2021] [Indexed: 12/24/2022]
Abstract
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library. Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress on ab initio structure prediction, which was mainly powered by the boosting of contact-map prediction as the latter can be used as constraints to guide ab initio folding simulations. 
In this work, we proposed a new open-source deep-learning architecture, TripletRes, built on the residual convolutional neural networks for high-accuracy contact prediction. The large-scale benchmark and blind test results demonstrate competitive performance of the proposed methods to other top approaches in predicting medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in the unique protocol to fuse multiple evolutionary feature matrices which are directly extracted from whole-genome and metagenome databases and therefore minimize the information loss during the contact model training.
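As a reading aid for the top-L and top-L/5 long-range precision figures quoted above, the following minimal sketch shows how such a metric is commonly computed. It assumes the usual convention that a residue pair is long-range when the two residues are at least 24 positions apart in sequence; the function name `top_l_precision` and its arguments are illustrative and are not taken from TripletRes.

```python
def top_l_precision(pred, native, k=1, min_sep=24):
    """Precision of the top L/k predicted long-range contacts.

    pred   -- L x L nested list of predicted contact probabilities
    native -- L x L nested list of booleans (True = residues are in contact)
    k      -- 1 for top-L, 5 for top-L/5, and so on
    """
    length = len(pred)
    # Collect each residue pair once (i < j) with the required sequence separation.
    pairs = [
        (pred[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Rank pairs by predicted probability, highest first.
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[: max(1, length // k)]
    if not top:
        return 0.0  # chain too short to contain any long-range pair
    hits = sum(1 for _, i, j in top if native[i][j])
    return hits / len(top)
```

Calling `top_l_precision(pred, native, k=5)` gives the top-L/5 variant; short-range pairs are ignored however confident the prediction is.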
|
25
|
Zhang GJ, Xie TY, Zhou XG, Wang LJ, Hu J. Protein Structure Prediction Using Population-Based Algorithm Guided by Information Entropy. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:697-707. [PMID: 31180869 DOI: 10.1109/tcbb.2019.2921958] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Ab initio protein structure prediction is one of the most challenging problems in computational biology. Multistage algorithms are widely used in ab initio protein structure prediction, and the computational cost of a multistage algorithm varies from protein to protein and needs to be taken into account. In this study, a population-based algorithm guided by information entropy (PAIE), which includes exploration and exploitation stages, is proposed for protein structure prediction. In PAIE, an entropy-based stage switch strategy is designed to switch from the exploration stage to the exploitation stage. Torsion angle statistical information is also deduced from the first stage and employed to enhance the exploitation in the second stage. Results indicate an improvement in the performance of protein structure prediction on a benchmark of 30 proteins and 17 other free modeling targets in CASP.
|
26
|
Fouillen A, Mary C, Ponce KJ, Moffatt P, Nanci A. A proline rich protein from the gingival seal around teeth exhibits antimicrobial properties against Porphyromonas gingivalis. Sci Rep 2021; 11:2353. [PMID: 33504866 PMCID: PMC7840901 DOI: 10.1038/s41598-021-81791-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2020] [Accepted: 01/08/2021] [Indexed: 12/22/2022] Open
Abstract
The gingival seal around teeth prevents bacteria from destroying the tooth-supporting tissues and disseminating throughout the body. Porphyromonas gingivalis, a major periodontopathogen, degrades components of the specialized extracellular matrix that mediates attachment of the gingiva to the tooth. Of these, secretory calcium-binding phosphoprotein proline-glutamine rich 1 (SCPPPQ1) protein has a distinctive resistance to degradation, suggesting that it may offer resistance to bacterial attack. In silico analysis of its amino acid sequence was used to explore its molecular characteristics and to predict its two- and three-dimensional structure. SCPPPQ1 exhibits similarities with both proline-rich and cationic antimicrobial proteins, suggesting a putative antimicrobial potential. A combination of imaging approaches showed that incubation with 20 μM of purified SCPPPQ1 decreased the bacterial number (p < 0.01). Fluorescence intensity decreased by 70% following a 2 h incubation of Porphyromonas gingivalis with the protein. Electron microscopy analyses revealed that SCPPPQ1 induced bacterial membrane disruption and breaches. While SCPPPQ1 has no effect on mammalian cells, our results suggest that it is bactericidal to Porphyromonas gingivalis, and that this protein, normally present in the gingival seal, may be exploited to maintain a healthy seal and prevent systemic dissemination of bacteria.
Affiliation(s)
- Aurélien Fouillen
  - Laboratory for the Study of Calcified Tissues and Biomaterials, Faculty of Dental Medicine, Université de Montréal, Montreal, QC, Canada; Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montreal, QC, Canada
- Charline Mary
  - Laboratory for the Study of Calcified Tissues and Biomaterials, Faculty of Dental Medicine, Université de Montréal, Montreal, QC, Canada; Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montreal, QC, Canada
- Katia Julissa Ponce
  - Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montreal, QC, Canada
- Pierre Moffatt
  - Department of Human Genetics, McGill University, Montreal, QC, Canada; Shriners Hospitals for Children - Canada, Montreal, QC, Canada
- Antonio Nanci
  - Laboratory for the Study of Calcified Tissues and Biomaterials, Faculty of Dental Medicine, Université de Montréal, Montreal, QC, Canada; Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montreal, QC, Canada
|
27
|
Seffernick JT, Lindert S. Hybrid methods for combined experimental and computational determination of protein structure. J Chem Phys 2020; 153:240901. [PMID: 33380110 PMCID: PMC7773420 DOI: 10.1063/5.0026025] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2020] [Accepted: 11/10/2020] [Indexed: 02/04/2023] Open
Abstract
Knowledge of protein structure is paramount to understanding biological function, developing new therapeutics, and making detailed mechanistic hypotheses. Therefore, methods to accurately elucidate three-dimensional structures of proteins are in high demand. While a few experimental techniques, such as x-ray crystallography, nuclear magnetic resonance (NMR), and cryo-EM, can routinely provide high-resolution structures, each has shortcomings and thus cannot be used in all cases. In addition, a large number of experimental techniques have been developed that provide some structural information, but not enough to assign atomic positions with high certainty. These methods offer sparse experimental data, which can also be noisy and inaccurate in some instances. In cases where it is not possible to determine the structure of a protein experimentally, computational structure prediction methods can be used as an alternative. Although computational methods can be performed without any experimental data, in a large number of studies the inclusion of sparse experimental data in these prediction methods has yielded significant improvement. In this Perspective, we cover many of the successes of integrative modeling (computational modeling with experimental data), specifically for protein folding, protein-protein docking, and molecular dynamics simulations. We describe methods that incorporate sparse data from cryo-EM, NMR, mass spectrometry, electron paramagnetic resonance, small-angle x-ray scattering, Förster resonance energy transfer, and genetic sequence covariation. Finally, we highlight some of the major challenges in the field as well as possible future directions.
Affiliation(s)
- Justin T. Seffernick
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA
- Steffen Lindert
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA
|
28
|
Costa LSM, Pires ÁS, Damaceno NB, Rigueiras PO, Maximiano MR, Franco OL, Porto WF. In silico characterization of class II plant defensins from Arabidopsis thaliana. PHYTOCHEMISTRY 2020; 179:112511. [PMID: 32931963 DOI: 10.1016/j.phytochem.2020.112511] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/05/2020] [Revised: 08/31/2020] [Accepted: 09/01/2020] [Indexed: 06/11/2023]
Abstract
Defensins comprise a polyphyletic group of multifunctional defense peptides. Cis-defensins, also known as cysteine stabilized αβ (CSαβ) defensins, are one of the most ancient defense peptide families. In plants, these peptides have been divided into two classes, according to their precursor organization. Class I defensins are composed of the signal peptide and the mature sequence, while class II defensins have an additional C-terminal prodomain, which is proteolytically cleaved. Class II defensins have been described in Solanaceae and Poaceae species, indicating this class could be spread among all flowering plants. Here, a search by regular expression (RegEx) was applied to the Arabidopsis thaliana proteome, a model plant with more than 300 predicted defensin genes. Two sequences were identified, A7REG2 and A7REG4, which have a typical plant defensin structure and an additional C-terminal prodomain. The TraVA database indicated that they are expressed in flowers, ovules and seeds; because they are duplicated genes, this suggests they could be the result of a subfunctionalization process. The presence of class II defensin sequences in Brassicaceae and Solanaceae and the evolutionary distance between them suggest class II defensins may be present in other eudicots. Discovery of class II defensins in other plants could shed some light on flower, ovule and seed physiology, as this class is expressed in these locations.
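The regular-expression search over a proteome mentioned above can be pictured with the short sketch below. The abstract does not give the expression that was used, so the cysteine-spacing pattern here is purely hypothetical, as are the helper names `parse_fasta` and `find_candidates` and the toy sequences; only the general approach (scan every sequence in a FASTA file with a compiled pattern) is illustrated.

```python
import re

# Hypothetical cysteine-spacing motif, for illustration only.
# It is NOT the expression used in the study.
PATTERN = re.compile(r"C.{2,10}C.{3,5}C.{3}C")

def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def find_candidates(fasta_text, pattern=PATTERN):
    """Return the headers of all sequences that contain the motif."""
    return [name for name, seq in parse_fasta(fasta_text) if pattern.search(seq)]

# Toy input: the first sequence contains the motif, the second does not.
toy = """>seq1 toy hit
MKACAAAACAAAC
AAACKK
>seq2 toy miss
MKAAAAGGGG
"""
candidates = find_candidates(toy)
```

In practice, hits from such a scan would still need manual or structural checks, since a spacing pattern alone can match unrelated cysteine-rich proteins.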
Affiliation(s)
- Laura S M Costa
  - Centro de Análises Proteômicas e Bioquímicas. Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil; Departamento de Biologia, Programa de Pós-Graduação em Genética e Biotecnologia, Universidade Federal de Juiz de Fora, Campus Universitário, Juiz de Fora, MG, Brazil
- Állan S Pires
  - Centro de Análises Proteômicas e Bioquímicas. Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil
- Neila B Damaceno
  - Centro de Análises Proteômicas e Bioquímicas. Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil
- Pietra O Rigueiras
  - Centro de Análises Proteômicas e Bioquímicas. Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil
- Mariana R Maximiano
  - Centro de Análises Proteômicas e Bioquímicas. Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil
- Octavio L Franco
  - Centro de Análises Proteômicas e Bioquímicas. Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, DF, Brazil; Departamento de Biologia, Programa de Pós-Graduação em Genética e Biotecnologia, Universidade Federal de Juiz de Fora, Campus Universitário, Juiz de Fora, MG, Brazil; S-Inova Biotech, Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, MS, Brazil
- William F Porto
  - S-Inova Biotech, Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, MS, Brazil; Porto Reports, Brasília, DF, Brazil
|
29
|
Ding W, Gong H. Predicting the Real-Valued Inter-Residue Distances for Proteins. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2020; 7:2001314. [PMID: 33042750 PMCID: PMC7539185 DOI: 10.1002/advs.202001314] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/08/2020] [Revised: 06/06/2020] [Indexed: 05/04/2023]
Abstract
Predicting protein structure from the amino acid sequence has been a challenge with theoretical and practical significance in biophysics. Despite the recent progress elicited by improved inter-residue contact prediction, contact-based structure prediction has gradually reached its performance ceiling. New methods have been proposed to predict the inter-residue distance, but all of them do so by simplifying the real-valued distance prediction into a multiclass classification problem. Here, a lightweight regression-based distance prediction method is presented, which adopts a generative adversarial network to capture the delicate geometric relationship between residue pairs and thus can predict the continuous, real-valued inter-residue distance rapidly and satisfactorily. The predicted residue distance map allows quick structure modeling by the CNS suite, and the constructed models approach the same level of quality as the other state-of-the-art protein structure prediction methods when tested on CASP13 targets. Moreover, this method can be used directly for the structure prediction of membrane proteins without transfer learning.
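To make the contrast between multiclass classification and real-valued regression concrete, the sketch below discretizes a distance into bins and then maps the bin back to a representative value. The bin range (4 to 20 Å) and width (2 Å) are assumptions chosen for illustration and are not taken from this or any cited method; the function names are likewise hypothetical.

```python
def distance_to_bin(d, lo=4.0, hi=20.0, width=2.0):
    """Map a distance in angstroms to a class index (the classification target).

    Index 0 collects everything below `lo`; the last index collects
    everything at or above `hi`.
    """
    n_bins = int((hi - lo) / width)
    if d < lo:
        return 0
    if d >= hi:
        return n_bins + 1
    return 1 + int((d - lo) // width)

def bin_to_distance(idx, lo=4.0, hi=20.0, width=2.0):
    """Map a class index back to a representative distance (the bin midpoint)."""
    n_bins = int((hi - lo) / width)
    if idx == 0:
        return lo
    if idx == n_bins + 1:
        return hi
    return lo + (idx - 1) * width + width / 2.0

true_distance = 7.3
recovered = bin_to_distance(distance_to_bin(true_distance))
# Information lost by discretization for this one residue pair.
quantization_error = abs(true_distance - recovered)
```

A regression model targets `true_distance` directly, so it is not limited by the bin width; a classifier can recover the distance only up to the resolution of its bins.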
Affiliation(s)
- Wenze Ding
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Haipeng Gong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
  - Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
|
30
|
Zhang C, Zheng W, Mortuza SM, Li Y, Zhang Y. DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics 2020; 36:2105-2112. [PMID: 31738385 DOI: 10.1093/bioinformatics/btz863] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2019] [Revised: 10/17/2019] [Accepted: 11/15/2019] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information to the modeling of structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved. RESULTS We developed DeepMSA, a new open-source method for sensitive MSA construction, in which homologous sequences and alignments are created from multiple sources of whole-genome and metagenome databases through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was utilized to generate MSAs for residue-level contact prediction by six coevolution and deep learning-based programs, which resulted in an accuracy increase in long-range contacts by up to 24.4% compared to the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the use of the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in the Q3 accuracy. It is noted that all these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library. AVAILABILITY AND IMPLEMENTATION https://zhanglab.ccmb.med.umich.edu/DeepMSA/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- S M Mortuza
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
31
|
Sunita, Singhvi N, Singh Y, Shukla P. Computational approaches in epitope design using DNA binding proteins as vaccine candidate in Mycobacterium tuberculosis. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2020; 83:104357. [PMID: 32438080 DOI: 10.1016/j.meegid.2020.104357] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/06/2020] [Revised: 05/04/2020] [Accepted: 05/07/2020] [Indexed: 12/28/2022]
Abstract
Mycobacterium tuberculosis (Mtb) is a successful pathogen in the history of mankind. A high rate of mortality and morbidity raises the need for vaccine development. The mechanism of pathogenesis, survival strategy and virulence determinants of this pathogen need to be explored further. The involvement of DNA binding proteins in the regulation of virulence genes, transcription, DNA replication and repair makes them particularly significant. In the present work, we identified 1453 DNA binding proteins (DBPs) among the 4173 genes of Mtb using the DNABIND tool and subjected them to further screening with different bioinformatics tools. Eighteen DBPs were selected for B-cell epitope prediction using the ABCpred server. Moreover, the B-cell epitopes with antigenic and non-allergenic properties were selected for T-cell epitope prediction using the ProPredI and ProPred servers. Finally, DGIGSAVSV (Rv1088), IRALPSSRH (Rv3923c), LTISPIANS (Rv3235), VQPSGKGGL (Rv2871), VPRPGPRPG (Rv2731) and VGQKINPHG (Rv0707) were identified as T-cell epitopes. Structural modelling of these epitopes and DBPs was performed to confirm the localization of the epitopes on their respective proteins. Interaction studies of these epitopes with human HLA supported their use as potential vaccine candidates. Collectively, these results revealed that the DBPs Rv2731, Rv3235, Rv1088, Rv0707, Rv3923c and Rv2871 are the most appropriate vaccine candidates. To our knowledge, this is the first report of using the DBPs of Mtb for epitope prediction. Significantly, this study also provides evidence useful for designing a peptide-based vaccine against tuberculosis.
Affiliation(s)
- Sunita
  - Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak 124001, Haryana, India; Bacterial Pathogenesis Laboratory, Department of Zoology, University of Delhi, Delhi 110007, India
- Nirjara Singhvi
  - Bacterial Pathogenesis Laboratory, Department of Zoology, University of Delhi, Delhi 110007, India
- Yogendra Singh
  - Bacterial Pathogenesis Laboratory, Department of Zoology, University of Delhi, Delhi 110007, India
- Pratyoosh Shukla
  - Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak 124001, Haryana, India
|
32
|
Mohammad S, Bouchama A, Mohammad Alharbi B, Rashid M, Saleem Khatlani T, Gaber NS, Malik SS. SARS-CoV-2 ORF8 and SARS-CoV ORF8ab: Genomic Divergence and Functional Convergence. Pathogens 2020; 9:E677. [PMID: 32825438 PMCID: PMC7558349 DOI: 10.3390/pathogens9090677] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Revised: 08/17/2020] [Accepted: 08/19/2020] [Indexed: 01/18/2023] Open
Abstract
The COVID-19 pandemic, in the first seven months, has led to more than 15 million confirmed infected cases and 600,000 deaths. SARS-CoV-2, the causative agent for COVID-19, has proved to be a great challenge for its ability to spread in asymptomatic stages and the diverse disease spectrum it has generated. This has created a challenge of unimaginable magnitude, not only affecting human health and life but also potentially generating a long-lasting socioeconomic impact. Both medical sciences and biomedical research have also been challenged, consequently leading to a large number of clinical trials and vaccine initiatives. While known proteins of pathobiological importance are targets for these therapeutic approaches, it is imperative to explore other factors of viral significance. Accessory proteins are one such trait that have diverse roles in coronavirus pathobiology. Here, we analyze certain genomic characteristics of SARS-CoV-2 accessory protein ORF8 and predict its protein features. We have further reviewed current available literature regarding its function and comparatively evaluated these and other features of ORF8 and ORF8ab, its homolog from SARS-CoV. Because coronaviruses have been infecting humans repeatedly and might continue to do so, we therefore expect this study to aid in the development of holistic understanding of these proteins. Despite low nucleotide and protein identity and differentiating genome level characteristics, there appears to be significant structural integrity and functional proximity between these proteins pointing towards their high significance. There is further need for comprehensive genomics and structural-functional studies to lead towards definitive conclusions regarding their criticality and that can eventually define their relevance to therapeutics development.
Affiliation(s)
- Sameer Mohammad
  - Experimental Medicine Department, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, MNGHA, Riyadh 11426, Saudi Arabia
- Abderrezak Bouchama
  - Experimental Medicine Department, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, MNGHA, Riyadh 11426, Saudi Arabia
- Bothina Mohammad Alharbi
  - Experimental Medicine Department, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, MNGHA, Riyadh 11426, Saudi Arabia
- Mamoon Rashid
  - Bioinformatics and Biostatistics Department, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, MNGHA, Riyadh 11426, Saudi Arabia
- Tanveer Saleem Khatlani
  - Stem Cells Unit, Department of Cellular Therapy, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, MNGHA, Riyadh 11426, Saudi Arabia
- Nusaibah S. Gaber
  - Experimental Medicine Department, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, MNGHA, Riyadh 11426, Saudi Arabia
- Shuja Shafi Malik
  - Experimental Medicine Department, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, MNGHA, Riyadh 11426, Saudi Arabia
|
33
|
Zhang B, Zhang X, Pearce R, Shen HB, Zhang Y. A New Protocol for Atomic-Level Protein Structure Modeling and Refinement Using Low-to-Medium Resolution Cryo-EM Density Maps. J Mol Biol 2020; 432:5365-5377. [PMID: 32771523 DOI: 10.1016/j.jmb.2020.07.027] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2020] [Revised: 07/14/2020] [Accepted: 07/31/2020] [Indexed: 12/19/2022]
Abstract
The rapid progress of cryo-electron microscopy (cryo-EM) in structural biology has raised an urgent need for robust methods to create and refine atomic-level structural models using low-resolution EM density maps. We propose a new protocol to create initial models using I-TASSER protein structure prediction, followed by EM density map-based rigid-body structure fitting, flexible fragment adjustment and atomic-level structure refinement simulations. The protocol was tested on a large set of 285 non-homologous proteins and generated structural models with correct folds for 260 proteins, where 28% had RMSDs below 2 Å. Compared to other state-of-the-art methods, the major advantage of the proposed pipeline lies in the uniform structure prediction and refinement protocol, as well as the extensive structural re-assembly simulations, which allow for low-to-medium resolution EM density map-guided structure modeling starting from amino acid sequences. Interestingly, the quality of both the image fitting and subsequent structure refinement was found to be strongly correlated with the correctness of the initial I-TASSER models; this is mainly due to the different correlation patterns observed between force field and structural quality for the models with template modeling score (or TM-score, a metric quantifying the similarity of models to the native) above and below a threshold of 0.5. Overall, the results demonstrate a new avenue that is ready to use for large-scale cryo-EM-based structure modeling and atomic-level density map-guided structure refinement.
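The TM-score threshold of 0.5 mentioned above refers to the standard template modeling score, which averages a distance-dependent term over the residues of the native structure. The sketch below evaluates that sum for one fixed superposition; the full TM-score additionally maximizes over superpositions, which is not done here. The function name, its arguments, and the 0.5 Å floor applied to the scale `d0` for short chains are assumptions made for this illustration.

```python
def tm_score_fixed(distances, target_length):
    """TM-score-style sum for an already superposed model.

    distances     -- distance in angstroms between each aligned residue pair
    target_length -- number of residues in the native (target) structure
    """
    # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    # The floor of 0.5 is an assumption here; it also avoids taking the
    # cube root of a negative number for very short chains.
    d0 = 0.5
    if target_length > 15:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Because the sum is divided by the native length rather than the number of aligned pairs, a model that covers only half of the target perfectly scores 0.5, not 1.0.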
Affiliation(s)
- Biao Zhang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Xi Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
34
|
Lin M, Wang F, Zhu Y. Modeled structure-based computational redesign of a glycosyltransferase for the synthesis of rebaudioside D from rebaudioside A. Biochem Eng J 2020. [DOI: 10.1016/j.bej.2020.107626] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
|
35
|
Fahmi M, Kubota Y, Ito M. Nonstructural proteins NS7b and NS8 are likely to be phylogenetically associated with evolution of 2019-nCoV. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2020; 81:104272. [PMID: 32142938 PMCID: PMC7106073 DOI: 10.1016/j.meegid.2020.104272] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/18/2020] [Revised: 03/01/2020] [Accepted: 03/03/2020] [Indexed: 12/12/2022]
Abstract
The seventh novel human-infecting Betacoronavirus that causes pneumonia (2019 novel coronavirus, 2019-nCoV) originated in Wuhan, China. In evolutionary terms, 2019-nCoV is not closely related to the other human respiratory illness-causing coronaviruses. We sought to characterize the relationship of the translated proteins of 2019-nCoV with other species of Orthocoronavirinae. A phylogenetic tree was constructed from the genome sequences. A cluster tree was developed from the profiles retrieved from the presence and absence of homologs of ten 2019-nCoV proteins. The combined data were used to characterize the relationship of the translated proteins of 2019-nCoV to other species of Orthocoronavirinae. Our analysis reliably suggests that 2019-nCoV is most closely related to BatCoV RaTG13 and belongs to subgenus Sarbecovirus of Betacoronavirus, together with SARS coronavirus and Bat-SARS-like coronavirus. The phylogenetic profiling cluster of homolog proteins of one annotated 2019-nCoV protein against other genome sequences revealed two clades of ten 2019-nCoV proteins. Clade 1 consisted of a group of conserved proteins in Orthocoronavirinae comprising Orf1ab polyprotein, Nucleocapsid protein, Spike glycoprotein, and Membrane protein. Clade 2 comprised six proteins exclusive to Sarbecovirus and Hibecovirus. Two of the six Clade 2 nonstructural proteins, NS7b and NS8, were exclusively conserved among 2019-nCoV, BetaCoV_RaTG, and BatSARS-like Cov. NS7b and NS8 have previously been shown to affect immune response signaling in the SARS-CoV experimental model. Thus, we speculated that knowledge of the functional changes in the NS7b and NS8 proteins during evolution may provide important information to explore the human infective property of 2019-nCoV.
Affiliation(s)
- Muhamad Fahmi
  - Advanced Life Sciences Program, Graduate School of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
- Yukihiko Kubota
  - Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
- Masahiro Ito
  - Advanced Life Sciences Program, Graduate School of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan; Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
|
36
|
Li Y, Hu J, Zhang C, Yu DJ, Zhang Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics 2020; 35:4647-4655. [PMID: 31070716 DOI: 10.1093/bioinformatics/btz291] [Citation(s) in RCA: 118] [Impact Index Per Article: 29.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2018] [Revised: 03/18/2019] [Accepted: 04/17/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Contact-map of a protein sequence dictates the global topology of structural fold. Accurate prediction of the contact-map is thus essential to protein 3D structure prediction, which is particularly useful for the protein sequences that do not have close homology templates in the Protein Data Bank. RESULTS We developed a new method, ResPRE, to predict residue-level protein contacts using inverse covariance matrix (or precision matrix) of multiple sequence alignments (MSAs) through deep residual convolutional neural network training. The approach was tested on a set of 158 non-homologous proteins collected from the CASP experiments and achieved an average accuracy of 50.6% in the top-L long-range contact prediction with L being the sequence length, which is 11.7% higher than the best of other state-of-the-art approaches ranging from coevolution coupling analysis to deep neural network training. Detailed data analyses show that the major advantage of ResPRE lies at the utilization of precision matrix that helps rule out transitional noises of contact-maps compared with the previously used covariance matrix. Meanwhile, the residual network with parallel shortcut layer connections increases the learning ability of deep neural network training. It was also found that appropriate collection of MSAs can further improve the accuracy of final contact-map predictions. The standalone package and online server of ResPRE are made freely available, which should bring important impact on protein structure and function modeling studies in particular for the distant- and non-homology protein targets. AVAILABILITY AND IMPLEMENTATION https://zhanglab.ccmb.med.umich.edu/ResPRE and https://github.com/leeyang/ResPRE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Jun Hu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
|
37
|
AlQuraishi M. AlphaFold at CASP13. Bioinformatics 2020; 35:4862-4865. [PMID: 31116374 DOI: 10.1093/bioinformatics/btz422] [Citation(s) in RCA: 154] [Impact Index Per Article: 38.5] [Received: 02/15/2019] [Revised: 03/26/2019] [Accepted: 05/15/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY: Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assesses methods on predicting proteins whose folds are related to ones already in the Protein Data Bank). DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
Affiliation(s)
- Mohammed AlQuraishi
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
  - Lab of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
|
38
|
Jin S, Chen M, Chen X, Bueno C, Lu W, Schafer NP, Lin X, Onuchic JN, Wolynes PG. Protein Structure Prediction in CASP13 Using AWSEM-Suite. J Chem Theory Comput 2020; 16:3977-3988. [PMID: 32396727 DOI: 10.1021/acs.jctc.0c00188] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/21/2022]
Abstract
Recently several techniques have emerged that significantly enhance the quality of predictions of protein tertiary structures. In this study, we describe the performance of AWSEM-Suite, an algorithm that incorporates template-based modeling and coevolutionary restraints with a realistic coarse-grained force field, AWSEM. With its roots in neural networks, AWSEM contains both physical and bioinformatical energies that have been optimized using energy landscape theory. AWSEM-Suite participated in CASP13 as a server predictor and generated reliable predictions for most targets. AWSEM-Suite ranked eighth in both the free-modeling category and the hard-to-model category and in one case provided the best submitted prediction. Here we critically discuss the prediction performance of AWSEM-Suite using several examples from different categories in CASP13. Structure prediction tests on these selected targets, two of them being hard-to-model targets, show that AWSEM-Suite can achieve high-resolution structure prediction after incorporating both template guidances and coevolutionary restraints even when homology is weak. For targets with reliable templates (template-easy category), introducing coevolutionary restraints sometimes damages the overall quality of the predictions. Free energy profile analyses demonstrate, however, that the incorporations of both of these evolutionarily informed terms effectively increase the funneling of the landscape toward native-like structures while still allowing sufficient flexibility to correct for discrepancies between the correct target structure and the provided guidance. In contrast to other predictors that are exclusively oriented toward structure prediction, the connection of AWSEM-Suite to a statistical mechanical basis and affiliated molecular dynamics and importance sampling simulations makes it suitable for functional explorations.
Affiliation(s)
- Xun Chen
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Wei Lu
  - Department of Physics, Rice University, Houston, Texas 77005, United States
- Xingcheng Lin
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- José N Onuchic
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Rice University, Houston, Texas 77005, United States
- Peter G Wolynes
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Physics, Rice University, Houston, Texas 77005, United States
|
39
|
Zaman AB, Kamranfar P, Domeniconi C, Shehu A. Reducing Ensembles of Protein Tertiary Structures Generated De Novo via Clustering. Molecules 2020; 25:E2228. [PMID: 32397410 PMCID: PMC7248879 DOI: 10.3390/molecules25092228] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/14/2020] [Revised: 04/21/2020] [Accepted: 04/28/2020] [Indexed: 11/16/2022] Open
Abstract
Controlling the quality of tertiary structures computed for a protein molecule remains a central challenge in de-novo protein structure prediction. The rule of thumb is to generate as many structures as can be afforded, effectively acknowledging that having more structures increases the likelihood that some will reside near the sought biologically-active structure. A major drawback with this approach is that computing a large number of structures imposes time and space costs. In this paper, we propose a novel clustering-based approach which we demonstrate to significantly reduce an ensemble of generated structures without sacrificing quality. Evaluations are related on both benchmark and CASP target proteins. Structure ensembles subjected to the proposed approach and the source code of the proposed approach are publicly-available at the links provided in Section 1.
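The idea of shrinking a structure ensemble by clustering can be sketched as follows. The abstract does not say which clustering algorithm or structure representation the authors use, so this sketch substitutes a simple greedy "leader" clustering over per-structure feature vectors. The function name `reduce_ensemble`, the `radius` parameter, and the use of Euclidean distance between feature vectors are all assumptions made for illustration.

```python
import numpy as np


def reduce_ensemble(features, radius):
    """Greedy leader clustering: return indices of one representative per cluster.

    `features` is an (N, D) array with one feature vector per structure
    (hypothetical representation). A structure becomes a new representative
    only if it is farther than `radius` from every representative kept so far.
    """
    features = np.asarray(features, dtype=float)
    representatives = []
    for i, vec in enumerate(features):
        if all(np.linalg.norm(vec - features[j]) > radius for j in representatives):
            representatives.append(i)
    return representatives


# Four structures forming two tight groups collapse to two representatives.
ensemble = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
kept = reduce_ensemble(ensemble, radius=1.0)
```

A smaller `radius` keeps more structures and a larger one keeps fewer, which is the time-and-space versus coverage trade-off the abstract describes.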
Affiliation(s)
- Ahmed Bin Zaman
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Parastoo Kamranfar
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Carlotta Domeniconi
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Amarda Shehu
  - Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
  - Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA
  - Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA
  - School of Systems Biology, George Mason University, Fairfax, VA 22030, USA
|
40
|
Linden LDS, Bustamante-Filho IC, Souza APB, Lopes TN, Silva AFT, Tomé LM, Timmers LFMS, Santos SI, Neves AP. Structural modelling of the equine protein disulphide isomerase A1 and its quantification in the epididymis and seminal plasma. Andrologia 2020; 52:e13530. [DOI: 10.1111/and.13530] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/20/2019] [Revised: 12/18/2019] [Accepted: 01/05/2020] [Indexed: 01/02/2023] Open
Affiliation(s)
- Liana de Salles Linden
  - Programa de Pós‐graduação em Medicina Animal: Equinos, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Tayná Nauê Lopes
  - Laboratório de Biotecnologia, Universidade do Vale do Taquari – Univates, Lajeado, Brazil
- Luise Marcon Tomé
  - Laboratório de Biotecnologia, Universidade do Vale do Taquari – Univates, Lajeado, Brazil
- Adriana Pires Neves
  - Programa de Pós‐graduação em Medicina Animal: Equinos, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Universidade Federal do Pampa (UNIPAMPA), Dom Pedrito, Brazil
|
41
|
Abstract
The purpose of this quick guide is to help new modelers who have little or no background in comparative modeling yet are keen to produce high-resolution protein 3D structures for their study by following systematic good modeling practices, using affordable personal computers or online computational resources. Through the available experimental 3D-structure repositories, the modeler should be able to access and use the atomic coordinates for building homology models. We also aim to provide the modeler with a rationale for turning a simple list of atomic coordinates into a model suitable for computational analysis that abides by the principles of physics (e.g., molecular mechanics). Keeping that objective in mind, these quick tips cover the process of homology modeling and some postmodeling computations such as molecular docking and molecular dynamics (MD). A brief section is devoted to modeling nonprotein molecules, and a short case study of homology modeling is discussed.
Affiliation(s)
- Yazan Haddad
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
- Vojtech Adam
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
- Zbynek Heger
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
|
42
|
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
  - School of Computer Science, University College Dublin, Ireland
- Quan Le
  - Centre for Applied Data Analytics Research, University College Dublin, Ireland
|
43
|
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature 2020; 577:706-710. [PMID: 31942072 DOI: 10.1038/s41586-019-1923-7] [Citation(s) in RCA: 1349] [Impact Index Per Article: 337.3] [Received: 04/02/2019] [Accepted: 12/10/2019] [Indexed: 12/16/2022]
Abstract
Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence. This problem is of fundamental importance as the structure of a protein largely determines its function; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined.
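The general idea of generating a structure by gradient descent on a distance-derived potential can be sketched in a few lines. This is a deliberately simplified stand-in, not AlphaFold's potential: it assumes a single target distance per residue pair, a harmonic penalty, and free Cartesian coordinates. The names `distance_potential` and `fold_by_gradient_descent`, the step size, and the step count are illustrative assumptions suited only to very small toy systems.

```python
import numpy as np


def distance_potential(coords, target):
    """Harmonic potential: sum over pairs i<j of (d_ij - target_ij)^2."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return 0.5 * np.sum((dist - target) ** 2)  # 0.5 undoes double counting


def fold_by_gradient_descent(target, steps=2000, lr=0.01, seed=0):
    """Minimise the potential from a random start with plain gradient descent."""
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    coords = rng.normal(size=(n, 3))
    off_diagonal = ~np.eye(n, dtype=bool)
    for _ in range(steps):
        diff = coords[:, None, :] - coords[None, :, :]      # shape (n, n, 3)
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, 1.0)                         # avoid division by zero
        err = np.where(off_diagonal, dist - target, 0.0)
        # Gradient of the potential with respect to each point's coordinates.
        grad = 2.0 * np.sum((err / dist)[:, :, None] * diff, axis=1)
        coords -= lr * grad
    return coords
```

Because only distances enter the potential, a recovered structure is defined up to rotation, translation, and mirror image.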
Affiliation(s)
- David T Jones
  - The Francis Crick Institute, London, UK
  - University College London, London, UK
|
44
|
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13). Proteins 2019; 87:1141-1148. [PMID: 31602685 PMCID: PMC7079254 DOI: 10.1002/prot.25834] [Citation(s) in RCA: 169] [Impact Index Per Article: 33.8] [Received: 05/03/2019] [Revised: 09/25/2019] [Accepted: 09/27/2019] [Indexed: 12/17/2022]
Abstract
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
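GDT_TS, the score quoted above, is commonly defined as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose model position lies within that cutoff of the reference position. The sketch below assumes the two structures are already superimposed and residue-matched. The full measure also searches over superpositions for each cutoff, which is omitted here, so this simplified version can only underestimate the real score. The function name `gdt_ts` is an illustrative choice.

```python
import numpy as np


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for coordinates that are ALREADY superimposed.

    `model` and `reference` are (L, 3) arrays of matched residue positions
    (for example C-alpha atoms), in angstroms.
    """
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    deviation = np.linalg.norm(model - reference, axis=1)
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [np.mean(deviation <= c) for c in cutoffs]
    return 100.0 * float(np.mean(fractions))


# Four residues displaced by 0.5, 1.5, 3 and 10 angstroms from the reference.
reference = np.zeros((4, 3))
model = np.array([[0.5, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
score = gdt_ts(model, reference)
```

In this toy case the per-cutoff fractions are 1/4, 2/4, 3/4 and 3/4, so the score is 56.25.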
Affiliation(s)
- David T. Jones
  - The Francis Crick Institute, London, UK
  - University College London, London, UK
|
45
|
Li Y, Zhang C, Bell EW, Yu DJ, Zhang Y. Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13. Proteins 2019; 87:1082-1091. [PMID: 31407406 PMCID: PMC6851483 DOI: 10.1002/prot.25798] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 04/17/2019] [Revised: 07/20/2019] [Accepted: 08/08/2019] [Indexed: 12/26/2022]
Abstract
We report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction. Two ensembling strategies have been proposed to integrate the matrix features through end-to-end training and stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that do not have homologous templates in the PDB, TripletRes and ResTriplet generated comparable results with an average accuracy of 0.640 and 0.646, respectively, for the top L/5 long-range predictions, where 71% and 74% of the cases have an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline is due to the sensitive MSA construction and the advanced strategies for coevolutionary feature ensembling. Domain splitting was also found to help enhance the contact prediction performance. Nevertheless, contact models for tail regions, which often involve a high number of alignment gaps, and for targets with few homologous sequences are still suboptimal. Development of new approaches where the model is specifically trained on these regions and targets might help address these problems.
Affiliation(s)
- Yang Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, China, 210094
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Eric W. Bell
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, China, 210094
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
|
46
|
Kuhlman B, Bradley P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol 2019; 20:681-697. [PMID: 31417196 PMCID: PMC7032036 DOI: 10.1038/s41580-019-0163-x] [Citation(s) in RCA: 365] [Impact Index Per Article: 73.0] [Accepted: 07/19/2019] [Indexed: 12/18/2022]
Abstract
The prediction of protein three-dimensional structure from amino acid sequence has been a grand challenge problem in computational biophysics for decades, owing to its intrinsic scientific interest and also to the many potential applications for robust protein structure prediction algorithms, from genome interpretation to protein function prediction. More recently, the inverse problem - designing an amino acid sequence that will fold into a specified three-dimensional structure - has attracted growing attention as a potential route to the rational engineering of proteins with functions useful in biotechnology and medicine. Methods for the prediction and design of protein structures have advanced dramatically in the past decade. Increases in computing power and the rapid growth in protein sequence and structure databases have fuelled the development of new data-intensive and computationally demanding approaches for structure prediction. New algorithms for designing protein folds and protein-protein interfaces have been used to engineer novel high-order assemblies and to design from scratch fluorescent proteins with novel or enhanced properties, as well as signalling proteins with therapeutic potential. In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have enabled.
Affiliation(s)
- Brian Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
- Philip Bradley
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
|
47
|
Wang Y, Shi Q, Yang P, Zhang C, Mortuza SM, Xue Z, Ning K, Zhang Y. Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families. Genome Biol 2019; 20:229. [PMID: 31676016 PMCID: PMC6825341 DOI: 10.1186/s13059-019-1823-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/07/2019] [Accepted: 09/13/2019] [Indexed: 02/01/2023] Open
Abstract
INTRODUCTION: The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction.
RESULTS: By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, of which 20 are predicted to have a reliable fold with an estimated template modeling score (TM-score) of at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated with their frequency of occurrence in the modeled Pfam families, suggesting a significant role for the Tara Oceans genomes in the contact-map prediction and subsequent ab initio folding simulations. Of note, PF15461, which has a majority of members coming from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotations. The pipeline is extended to a set of 417 Pfam families, built on the combination of Tara with other metagenomics datasets, which results in 235 families with an estimated TM-score over 0.5.
CONCLUSIONS: These results demonstrate a new avenue to improve the capacity of protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences.
Affiliation(s)
- Yan Wang
  - College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Qiang Shi
  - College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Pengshuo Yang
  - College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- S M Mortuza
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Zhidong Xue
  - College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Kang Ning
  - College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
|
48
|
Xu J, Wang S. Analysis of distance-based protein structure prediction by deep learning in CASP13. Proteins 2019; 87:1069-1081. [PMID: 31471916 DOI: 10.1002/prot.25810] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 04/29/2019] [Revised: 07/24/2019] [Accepted: 08/27/2019] [Indexed: 12/30/2022]
Abstract
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TMscore > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strength and weakness of our methods, and why deep learning performed much better in CASP13.
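The "top L/5, L/2, and L long-range contact precision" figures above follow a standard evaluation recipe that can be sketched as below. The sketch assumes the usual CASP convention that a long-range pair has a sequence separation of at least 24 residues. The function name `top_long_range_precision` and its parameters are illustrative, and only the upper triangle of each matrix is read.

```python
import numpy as np


def top_long_range_precision(prob, contacts, k=1, min_sep=24):
    """Precision of the top L/k predicted long-range contacts.

    prob     : (L, L) array of predicted contact probabilities
    contacts : (L, L) boolean array of true contacts
    k        : 1, 2 or 5 for the top L, L/2 or L/5 predictions
    min_sep  : minimum sequence separation |i - j| (assumed 24 for long range)
    """
    length = prob.shape[0]
    i, j = np.triu_indices(length, k=min_sep)      # pairs with j - i >= min_sep
    n_top = max(1, length // k)
    order = np.argsort(prob[i, j])[::-1][:n_top]   # highest probabilities first
    return float(np.mean(contacts[i[order], j[order]]))


# Toy case: 30 residues, three confident predictions, two of them correct.
L = 30
prob = np.zeros((L, L))
prob[0, 29], prob[0, 28], prob[1, 29] = 0.9, 0.8, 0.7
contacts = np.zeros((L, L), dtype=bool)
contacts[0, 29] = contacts[1, 29] = True
precision = top_long_range_precision(prob, contacts, k=10)  # top L/10 = 3 pairs
```

Here two of the three top-ranked pairs are true contacts, so the precision is 2/3.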
Affiliation(s)
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Chicago, Illinois
- Sheng Wang
  - Toyota Technological Institute at Chicago, Chicago, Illinois
|
49
|
Abstract
Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we may construct 3D models without involving extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets with a median family size of 58 effective sequence homologs within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
|
50
|
Assembling multidomain protein structures through analogous global structural alignments. Proc Natl Acad Sci U S A 2019; 116:15930-15938. [PMID: 31341084 DOI: 10.1073/pnas.1905068116] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 01/07/2023] Open
Abstract
Most proteins exist with multiple domains in cells for cooperative functionality. However, structural biology and protein folding methods are often optimized for single-domain structures, resulting in a rapidly growing gap between the improved capability for tertiary structure determination and high demand for multidomain structure models. We have developed a pipeline, termed DEMO, for constructing multidomain protein structures by docking-based domain assembly simulations, with interdomain orientations determined by the distance profiles from analogous templates as detected through domain-level structure alignments. The pipeline was tested on a comprehensive benchmark set of 356 proteins consisting of 2-7 continuous and discontinuous domains, for which DEMO generated models with correct global fold (TM-score > 0.5) for 86% of cases with continuous domains and for 100% of cases with discontinuous domain structures, starting from randomly oriented target-domain structures. DEMO was also applied to reassemble multidomain targets in the CASP12 and CASP13 experiments using domain structures excised from the top server predictions, where the full-length DEMO models showed a significantly improved quality over the original server models. Finally, sparse restraints of mass spectrometry-generated cross-linking data and cryo-EM density maps are incorporated into DEMO, resulting in improvements in the average TM-score by 6.3% and 12.5%, respectively. The results demonstrate an efficient approach to assembling multidomain structures, which can be easily used for automated, genome-scale multidomain protein structure assembly.
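The "TM-score > 0.5" criterion for a correct global fold refers to the template modeling score, which sums 1 / (1 + (d_i / d0)^2) over aligned residues and divides by the target length L, with d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The sketch below evaluates that formula for structures that are already superimposed and residue-matched. The full score also maximises over superpositions, which is omitted here, and the function name `tm_score` is an illustrative choice. The d0 formula is only meaningful for targets longer than roughly 20 residues.

```python
import numpy as np


def tm_score(model, reference):
    """Simplified TM-score for ALREADY superimposed, residue-matched coordinates.

    `model` and `reference` are (L, 3) arrays in angstroms; L is taken as the
    target length. No superposition search is performed (assumption).
    """
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    length = len(reference)
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8   # length-dependent scale
    d = np.linalg.norm(model - reference, axis=1)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / length)


# A 50-residue toy chain compared with itself scores exactly 1.
reference = np.arange(150, dtype=float).reshape(50, 3)
perfect = tm_score(reference, reference)
```

Shifting every residue by exactly d0 halves each term, which gives a score of 0.5, the threshold quoted in the abstract.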
|