1
Singh NK, Agarwal M, Radhakrishna M. Statistical analysis of the unique characteristics of secondary structures in proteins. Comput Biol Chem 2024; 113:108237. [PMID: 39393289 DOI: 10.1016/j.compbiolchem.2024.108237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 09/11/2024] [Accepted: 09/30/2024] [Indexed: 10/13/2024]
Abstract
Protein folding is a complex process influenced by the primary sequence of amino acids. Early studies focused on understanding whether the specificity or the conservation of properties of amino acids was crucial for folding into secondary structures such as α-helices, β-sheets, turns, and coils. However, with the advent of artificial intelligence (AI) and machine learning (ML), the emphasis has shifted towards the precise nature and occurrence of specific amino acids. In our study, we analyzed a large set of proteins from diverse organisms to identify unique features of secondary structures, particularly in terms of the distribution of polar, non-polar, and charged amino acid residues. We found that α-helices tend to have a higher proportion of charged and non-polar groups compared to other secondary structures and that the presence of oppositely charged amino acid residues in helices stabilizes them, facilitating the formation of longer helices. These characteristics are distinct to α-helices. This study offers valuable insights for researchers in the field of protein design, enabling the de-novo creation of short helical peptides for a range of applications. We have also developed a web server for extensive analysis of proteins from different databases. The web server is housed at https://proseqanalyser.iitgn.ac.in/.
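The kind of composition analysis this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' method: the grouping of residues into charged, polar, and non-polar classes and the function name `class_fractions` are assumptions made here, and the paper's own classification may differ (histidine, for example, is sometimes counted as charged).

```python
from collections import Counter

# Assumed grouping of the 20 standard amino acids (one-letter codes).
# This grouping is illustrative and is not taken from the paper.
CHARGED = set("DEKR")
POLAR = set("STNQCYH")
NONPOLAR = set("AVLIMFWPG")


def class_fractions(sequence):
    """Return the fraction of charged, polar, and non-polar residues.

    Characters outside the 20 standard one-letter codes are ignored.
    """
    counts = Counter()
    for residue in sequence.upper():
        if residue in CHARGED:
            counts["charged"] += 1
        elif residue in POLAR:
            counts["polar"] += 1
        elif residue in NONPOLAR:
            counts["nonpolar"] += 1
    total = sum(counts.values())
    names = ("charged", "polar", "nonpolar")
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: counts[name] / total for name in names}
```

Applied to the residues of each helix, sheet, turn, and coil segment separately, such fractions could then be compared across secondary-structure types.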
Affiliation(s)
- Nitin Kumar Singh
  - Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
- Manish Agarwal
  - Computer Services Centre, Indian Institute of Technology (IIT) Delhi, Hauz Khas, New Delhi, Delhi 110016, India
- Mithun Radhakrishna
  - Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India; Center for Biomedical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India.
2
Wang BR, Zhi WX, Han SY, Zhao HF, Liu YX, Xu SY, Zhang YH, Mu ZS. Adaptability to the environment of protease by secondary structure changes and application to enzyme-selective hydrolysis. Int J Biol Macromol 2024; 278:134969. [PMID: 39179060 DOI: 10.1016/j.ijbiomac.2024.134969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 08/05/2024] [Accepted: 08/20/2024] [Indexed: 08/26/2024]
Abstract
Reactions involving enzymes are significantly influenced by various environmental factors. Understanding how the activity and structure of proteases affect their function is crucial for more efficient application of enzymes as a tool. The impact of temperature, pH, and ionic strength on changes in protease activity, secondary structure, and protein conformation during enzymatic hydrolysis was investigated in this study. The enzymatic activity and secondary structure of acid-base protease were found to undergo significant modifications under different physical conditions, as demonstrated by UV spectrophotometry and FTIR spectroscopy analysis. Specifically, variations in α-helix and β-fold content were observed to correlate with changes in enzyme activity. Molecular simulation analysis revealed that physical conditions have varying effects on the protease, particularly influencing enzyme activity and secondary structure. Evaluation of the proteases indicated alterations in both enzyme activity and structure. This treatment selectively hydrolyzed β-lactoglobulin and reduced sensitization. These findings offer novel perspectives on the functionalities and regulatory mechanisms of proteases, as well as potential industrial applications.
Affiliation(s)
- Bao-Rong Wang
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Wen-Xiu Zhi
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Si-Yi Han
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Hong-Fu Zhao
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Ye-Xuan Liu
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Shi-Yao Xu
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Ying-Hua Zhang
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China.
- Zhi-Shen Mu
  - Inner Mongolia Enterprise Key Laboratory of Dairy Nutrition, Health & Safety, Inner Mongolia Mengniu Dairy (Group) Co., Ltd., Huhhot 011500, PR China.
3
Coskuner-Weber O. Structures prediction and replica exchange molecular dynamics simulations of α-synuclein: A case study for intrinsically disordered proteins. Int J Biol Macromol 2024; 276:133813. [PMID: 38996889 DOI: 10.1016/j.ijbiomac.2024.133813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 07/08/2024] [Accepted: 07/09/2024] [Indexed: 07/14/2024]
Abstract
In recent years, a variety of three-dimensional structure prediction tools, including AlphaFold2, AlphaFold3, I-TASSER, C-I-TASSER, Phyre2, ESMFold, and RoseTTAFold, have been employed in the investigation of intrinsically disordered proteins. However, a comprehensive validation of these tools specifically for intrinsically disordered proteins has yet to be conducted. In this study, we utilize AlphaFold2, AlphaFold3, I-TASSER, C-I-TASSER, Phyre2, ESMFold, and RoseTTAFold to predict the structure of a model intrinsically disordered α-synuclein protein. Additionally, extensive replica exchange molecular dynamics simulations of the intrinsically disordered protein are conducted. The resulting structures from both structure prediction tools and replica exchange molecular dynamics simulations are analyzed for radius of gyration, secondary and tertiary structure properties, as well as Cα and Hα chemical shift values. A comparison of the obtained results with experimental data reveals that replica exchange molecular dynamics simulations provide results in excellent agreement with experimental observations. However, none of the structure prediction tools utilized in this study can fully capture the structural characteristics of the model intrinsically disordered protein. This study shows that a cluster of ensembles are required for intrinsically disordered proteins. Artificial-intelligence based structure prediction tools such as AlphaFold3 and C-I-TASSER could benefit from stochastic sampling or Monte Carlo simulations for generating an ensemble of structures for intrinsically disordered proteins.
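One of the quantities compared in this abstract, the radius of gyration, is straightforward to compute from atomic coordinates. The sketch below uses the standard mass-weighted definition; the function names and the plain-tuple input format are assumptions chosen for illustration and are not taken from the study.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration for a list of (x, y, z) tuples.

    If no masses are given, all atoms are weighted equally.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    center = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((c[i] - center[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


def ensemble_mean_rg(frames):
    """Average the radius of gyration over a list of conformations."""
    values = [radius_of_gyration(frame) for frame in frames]
    return sum(values) / len(values)
```

For a disordered protein, a single predicted structure yields one value, whereas a simulated ensemble yields a distribution whose mean can be compared with experiment.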
Affiliation(s)
- Orkid Coskuner-Weber
  - Turkish-German University, Molecular Biotechnology, Sahinkaya Caddesi, No. 106, Beykoz, Istanbul 34820, Turkey.
4
Pesce F, Bremer A, Tesei G, Hopkins JB, Grace CR, Mittag T, Lindorff-Larsen K. Design of intrinsically disordered protein variants with diverse structural properties. SCIENCE ADVANCES 2024; 10:eadm9926. [PMID: 39196930 PMCID: PMC11352843 DOI: 10.1126/sciadv.adm9926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 06/07/2024] [Indexed: 08/30/2024]
Abstract
Intrinsically disordered proteins (IDPs) perform a broad range of functions in biology, suggesting that the ability to design IDPs could help expand the repertoire of proteins with novel functions. Computational design of IDPs with specific conformational properties has, however, been difficult because of their substantial dynamics and structural complexity. We describe a general algorithm for designing IDPs with specific structural properties. We demonstrate the power of the algorithm by generating variants of naturally occurring IDPs that differ in compaction, long-range contacts, and propensity to phase separate. We experimentally tested and validated our designs and analyzed the sequence features that determine conformations. We show how our results are captured by a machine learning model, enabling us to speed up the algorithm. Our work expands the toolbox for computational protein design and will facilitate the design of proteins whose functions exploit the many properties afforded by protein disorder.
Affiliation(s)
- Francesco Pesce
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Anne Bremer
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Giulio Tesei
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jesse B. Hopkins
  - BioCAT, Department of Physics, Illinois Institute of Technology, Chicago, IL 60616, USA
- Christy R. Grace
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Tanja Mittag
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Kresten Lindorff-Larsen
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
5
Majila K, Viswanath S. StrIDR: a database of intrinsically disordered regions of proteins with experimentally resolved structures. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.22.609111. [PMID: 39253485 PMCID: PMC11382991 DOI: 10.1101/2024.08.22.609111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Motivation: Intrinsically disordered regions (IDRs) of proteins exist as an ensemble of conformations, and not as a single structure. Existing databases contain extensive, experimentally derived annotations of intrinsic disorder for millions of proteins at the sequence level. However, only a tiny fraction of these IDRs are associated with an experimentally determined protein structure. Moreover, even if a structure exists, parts of the disordered regions may still be unresolved.
Results: Here we organize Structures of Intrinsically Disordered Regions (StrIDR), a database of IDRs confirmed via experimental or homology-based evidence, resolved in experimentally determined structures. The database can provide useful insights into the dynamics, folding, and interactions of IDRs. It can also facilitate computational studies on IDRs, such as those using molecular dynamics simulations and/or machine learning.
Availability: StrIDR is available at https://isblab.ncbs.res.in/stridr. The web UI allows for downloading PDB structures and SIFTS mappings of individual entries. Additionally, the entire database can be downloaded in a JSON format. The source code for creating and updating the database is available at https://github.com/isblab/stridr.
Affiliation(s)
- Kartik Majila
  - National Center for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India 560065
- Shruthi Viswanath
  - National Center for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India 560065
6
Cagliani R, Forni D, Mozzi A, Fuchs R, Tussia-Cohen D, Arrigoni F, Pozzoli U, De Gioia L, Hagai T, Sironi M. Evolution of Virus-like Features and Intrinsically Disordered Regions in Retrotransposon-derived Mammalian Genes. Mol Biol Evol 2024; 41:msae154. [PMID: 39101471 PMCID: PMC11299033 DOI: 10.1093/molbev/msae154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 07/16/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024] Open
Abstract
Several mammalian genes have originated from the domestication of retrotransposons, selfish mobile elements related to retroviruses. Some of the proteins encoded by these genes have maintained virus-like features, including self-processing, capsid structure formation, and the generation of different isoforms through -1 programmed ribosomal frameshifting. Using quantitative approaches in molecular evolution and biophysical analyses, we studied 28 retrotransposon-derived genes, with a focus on the evolution of virus-like features. By analyzing the rate of synonymous substitutions, we show that the -1 programmed ribosomal frameshifting mechanism in three of these genes (PEG10, PNMA3, and PNMA5) is conserved across mammals and originates alternative proteins. These genes were targets of positive selection in primates, and one of the positively selected sites affects a B-cell epitope on the spike domain of the PNMA5 capsid, a finding reminiscent of observations in infectious viruses. More generally, we found that retrotransposon-derived proteins vary in their intrinsically disordered region content and this is directly associated with their evolutionary rates. Most positively selected sites in these proteins are located in intrinsically disordered regions and some of them impact protein posttranslational modifications, such as autocleavage and phosphorylation. Detailed analyses of the biophysical properties of intrinsically disordered regions showed that positive selection preferentially targeted regions with lower conformational entropy. Furthermore, positive selection introduces variation in binary sequence patterns across orthologues, as well as in chain compaction. Our results shed light on the evolutionary trajectories of a unique class of mammalian genes and suggest a novel approach to study how intrinsically disordered region biophysical characteristics are affected by evolution.
Affiliation(s)
- Rachele Cagliani
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
- Diego Forni
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
- Alessandra Mozzi
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
- Rotem Fuchs
  - Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dafna Tussia-Cohen
  - Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Federica Arrigoni
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan 20126, Italy
- Uberto Pozzoli
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
- Luca De Gioia
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan 20126, Italy
- Tzachi Hagai
  - Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Manuela Sironi
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
7
Agarwal V, McShan AC. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat Chem Biol 2024; 20:950-959. [PMID: 38907110 DOI: 10.1038/s41589-024-01638-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 04/29/2024] [Indexed: 06/23/2024]
Abstract
Artificial intelligence-driven advances in protein structure prediction in recent years have raised the question: has the protein structure-prediction problem been solved? Here, with a focus on nonglobular proteins, we highlight the many strengths and potential weaknesses of DeepMind's AlphaFold2 in the context of its biological and therapeutic applications. We summarize the subtleties associated with evaluation of AlphaFold2 model quality and reliability using the predicted local distance difference test (pLDDT) and predicted aligned error (PAE) values. We highlight various classes of proteins that AlphaFold2 can be applied to and the caveats involved. Concrete examples of how AlphaFold2 models can be integrated with experimental data in the form of small-angle X-ray scattering (SAXS), solution NMR, cryo-electron microscopy (cryo-EM) and X-ray diffraction are discussed. Finally, we highlight the need to move beyond structure prediction of rigid, static structural snapshots toward conformational ensembles and alternate biologically relevant states. The overarching theme is that careful consideration is due when using AlphaFold2-generated models to generate testable hypotheses and structural models, rather than treating predicted models as de facto ground truth structures.
Affiliation(s)
- Vinayak Agarwal
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA.
- Andrew C McShan
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA.
8
Li Y, Ma K, Dong Z, Gao S, Zhang J, Huang S, Yang J, Fang G, Li Y, Li X, Welch C, Griffin EL, Ramaswamy P, Valivullah Z, Liu X, Dong J, Wang DW, Du, Chung WK, Li Y. Frameshift variants in C10orf71 cause dilated cardiomyopathy in human, mouse, and organoid models. J Clin Invest 2024; 134:e177172. [PMID: 38950288 PMCID: PMC11178530 DOI: 10.1172/jci177172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 04/25/2024] [Indexed: 07/03/2024] Open
Abstract
Research advances over the past 30 years have confirmed a critical role for genetics in the etiology of dilated cardiomyopathies (DCMs). However, full knowledge of the genetic architecture of DCM remains incomplete. We identified a candidate DCM causal gene, C10orf71, in a large family with 8 patients with DCM by whole-exome sequencing. Four loss-of-function variants of C10orf71 were subsequently identified in an additional group of 492 patients with sporadic DCM from 2 independent cohorts. C10orf71 was found to be an intrinsically disordered protein specifically expressed in cardiomyocytes. C10orf71-KO mice had abnormal heart morphogenesis during embryonic development and cardiac dysfunction as adults with altered expression and splicing of contractile cardiac genes. C10orf71-null cardiomyocytes exhibited impaired contractile function with unaffected sarcomere structure. Cardiomyocytes and heart organoids derived from human induced pluripotent stem cells with C10orf71 frameshift variants also had contractile defects with normal electrophysiological activity. A rescue study using a cardiac myosin activator, omecamtiv mecarbil, restored contractile function in C10orf71-KO mice. These data support C10orf71 as a causal gene for DCM by contributing to the contractile function of cardiomyocytes. Mutation-specific pathophysiology may suggest therapeutic targets and more individualized therapy.
Affiliation(s)
- Yang Li
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Ke Ma
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Zhujun Dong
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Shijuan Gao
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Jing Zhang
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Shan Huang
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Jie Yang
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Guangming Fang
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Yujie Li
  - Novogene Co. Ltd., Beijing, China
- Xiaowei Li
  - Department of Cardiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Carrie Welch
  - Department of Pediatrics, Columbia University, New York, New York, USA
- Emily L. Griffin
  - Department of Pediatrics, Columbia University, New York, New York, USA
- Jianzeng Dong
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Department of Cardiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Dao Wen Wang
  - Division of Cardiology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Du
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
- Wendy K. Chung
  - Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Yulin Li
  - Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
  - The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing, China
9
Middendorf L, Eicholt LA. Random, de novo, and conserved proteins: How structure and disorder predictors perform differently. Proteins 2024; 92:757-767. [PMID: 38226524 DOI: 10.1002/prot.26652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 10/18/2023] [Accepted: 12/01/2023] [Indexed: 01/17/2024]
Abstract
Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advancements in protein structure prediction, particularly with AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and protein language model-based predictor ESMFold for de novo and conserved proteins from Drosophila and a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe fluctuating median predicted disorder among different sequence length quartiles for random proteins, suggesting an influence of sequence length on disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.
Affiliation(s)
- Lasse Middendorf
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Lars A Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
10
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. PLoS Comput Biol 2024; 20:e1012144. [PMID: 38781245 PMCID: PMC11152266 DOI: 10.1371/journal.pcbi.1012144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 06/05/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
11
Venanzi NE, Basciu A, Vargiu AV, Kiparissides A, Dalby PA, Dikicioglu D. Machine Learning Integrating Protein Structure, Sequence, and Dynamics to Predict the Enzyme Activity of Bovine Enterokinase Variants. J Chem Inf Model 2024; 64:2681-2694. [PMID: 38386417 PMCID: PMC11005043 DOI: 10.1021/acs.jcim.3c00999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 02/12/2024] [Accepted: 02/13/2024] [Indexed: 02/24/2024]
Abstract
Despite recent advances in computational protein science, the dynamic behavior of proteins, which directly governs their biological activity, cannot be gleaned from sequence information alone. To overcome this challenge, we propose a framework that integrates the peptide sequence, protein structure, and protein dynamics descriptors into machine learning algorithms to enhance their predictive capabilities and achieve improved prediction of the protein variant function. The resulting machine learning pipeline integrates traditional sequence and structure information with molecular dynamics simulation data to predict the effects of multiple point mutations on the fold improvement of the activity of bovine enterokinase variants. This study highlights how the combination of structural and dynamic data can provide predictive insights into protein functionality and address protein engineering challenges in industrial contexts.
Affiliation(s)
- Andrea Basciu
  - Department of Physics, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato, Cagliari, Italy
- Attilio Vittorio Vargiu
  - Department of Physics, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato, Cagliari, Italy
- Alexandros Kiparissides
  - Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
  - Department of Chemical Engineering, Aristotle University of Thessaloniki, 54 124 Thessaloniki, Greece
- Paul A. Dalby
  - Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
- Duygu Dikicioglu
  - Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
12
Yu Q, Wang Z, Tu Y, Cao Y, Zhu H, Shao J, Zhuang R, Zhou Y, Zhang J. Proteasome activation: A novel strategy for targeting undruggable intrinsically disordered proteins. Bioorg Chem 2024; 145:107217. [PMID: 38368657 DOI: 10.1016/j.bioorg.2024.107217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/23/2024] [Accepted: 02/14/2024] [Indexed: 02/20/2024]
Abstract
Intrinsically disordered proteins (IDPs) are characterized by their inability to adopt well-defined tertiary structures under physiological conditions. Nonetheless, they often play pivotal roles in the progression of various diseases, including cancer, neurodegenerative disorders, and cardiovascular ailments. Owing to their inherent dynamism, conventional drug design approaches based on structural considerations encounter substantial challenges when applied to IDPs. Consequently, the pursuit of therapeutic interventions directed towards IDPs presents a complex endeavor. While there are indeed existing methodologies for targeting IDPs, they are encumbered by noteworthy constraints. Hence, there exists an imminent imperative to investigate more efficacious and universally applicable strategies for modulating IDPs. Here, we present an overview of the latest advancements in the research pertaining to IDPs, along with the indirect regulation approach involving the modulation of IDP degradation through the proteasome. By comprehending these advancements in research, novel insights can be generated to facilitate the development of new drugs targeted at addressing the accumulation of IDPs in diverse pathological conditions.
Affiliation(s)
- Qian Yu
- Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou, 310015, Zhejiang Province, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China
- Zheng Wang
- Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou, 310015, Zhejiang Province, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China
- Yutong Tu
- The National Center for Drug Screening, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China
- Yu Cao
- Department of Pharmaceutical Preparation, Hangzhou Xixi Hospital, Hangzhou, 310023, Zhejiang Province, China
- Huajian Zhu
- Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou, 310015, Zhejiang Province, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China
- Jiaan Shao
- Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou, 310015, Zhejiang Province, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China
- Rangxiao Zhuang
- Department of Pharmaceutical Preparation, Hangzhou Xixi Hospital, Hangzhou, 310023, Zhejiang Province, China.
- Yubo Zhou
- The National Center for Drug Screening, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, 201203, China.
- Jiankang Zhang
- Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Hangzhou City University, Hangzhou, 310015, Zhejiang Province, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China.
13
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. bioRxiv 2024:2024.02.08.579522. [PMID: 38370653 PMCID: PMC10871340 DOI: 10.1101/2024.02.08.579522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
14
Tesei G, Trolle AI, Jonsson N, Betz J, Knudsen FE, Pesce F, Johansson KE, Lindorff-Larsen K. Conformational ensembles of the human intrinsically disordered proteome. Nature 2024; 626:897-904. [PMID: 38297118 DOI: 10.1038/s41586-023-07004-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/12/2023] [Accepted: 12/19/2023] [Indexed: 02/02/2024]
Abstract
Intrinsically disordered proteins and regions (collectively, IDRs) are pervasive across proteomes in all kingdoms of life, help to shape biological functions and are involved in numerous diseases. IDRs populate a diverse set of transiently formed structures and defy conventional sequence-structure-function relationships [1]. Developments in protein science have made it possible to predict the three-dimensional structures of folded proteins at the proteome scale [2]. By contrast, there is a lack of knowledge about the conformational properties of IDRs, partly because the sequences of disordered proteins are poorly conserved and also because only a few of these proteins have been characterized experimentally. The inability to predict structural properties of IDRs across the proteome has limited our understanding of the functional roles of IDRs and how evolution shapes them. As a supplement to previous structural studies of individual IDRs [3], we developed an efficient molecular model to generate conformational ensembles of IDRs and thereby to predict their conformational properties from sequences [4,5]. Here we use this model to simulate nearly all of the IDRs in the human proteome. Examining conformational ensembles of 28,058 IDRs, we show how chain compaction is correlated with cellular function and localization. We provide insights into how sequence features relate to chain compaction and, using a machine-learning model trained on our simulation data, show the conservation of conformational properties across orthologues. Our results recapitulate observations from previous studies of individual protein systems and exemplify how to link, at the proteome scale, conformational ensembles with cellular function and localization, amino acid sequence, evolutionary conservation and disease variants. Our freely available database of conformational properties will encourage further experimental investigation and enable the generation of hypotheses about the biological roles and evolution of IDRs.
Affiliation(s)
- Giulio Tesei
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
- Anna Ida Trolle
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Nicolas Jonsson
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Johannes Betz
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Frederik E Knudsen
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Francesco Pesce
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristoffer E Johansson
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
15
Versini R, Sritharan S, Aykac Fas B, Tubiana T, Aimeur SZ, Henri J, Erard M, Nüsse O, Andreani J, Baaden M, Fuchs P, Galochkina T, Chatzigoulas A, Cournia Z, Santuz H, Sacquin-Mora S, Taly A. A Perspective on the Prospective Use of AI in Protein Structure Prediction. J Chem Inf Model 2024; 64:26-41. [PMID: 38124369 DOI: 10.1021/acs.jcim.3c01361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
AlphaFold2 (AF2) and RoseTTaFold (RF) have revolutionized structural biology, serving as highly reliable and effective methods for predicting protein structures. This article explores their impact and limitations, focusing on their integration into experimental pipelines and their application in diverse protein classes, including membrane proteins, intrinsically disordered proteins (IDPs), and oligomers. In experimental pipelines, AF2 models help X-ray crystallography in resolving the phase problem, while complementarity with mass spectrometry and NMR data enhances structure determination and protein flexibility prediction. Predicting the structure of membrane proteins remains challenging for both AF2 and RF due to difficulties in capturing conformational ensembles and interactions with the membrane. Improvements in incorporating membrane-specific features and predicting the structural effect of mutations are crucial. For intrinsically disordered proteins, AF2's confidence score (pLDDT) serves as a competitive disorder predictor, but integrative approaches including molecular dynamics (MD) simulations or hydrophobic cluster analyses are advocated for accurate dynamics representation. AF2 and RF show promising results for oligomeric models, outperforming traditional docking methods, with AlphaFold-Multimer showing improved performance. However, some caveats remain in particular for membrane proteins. Real-life examples demonstrate AF2's predictive capabilities in unknown protein structures, but models should be evaluated for their agreement with experimental data. Furthermore, AF2 models can be used complementarily with MD simulations. In this Perspective, we propose a "wish list" for improving deep-learning-based protein folding prediction models, including using experimental data as constraints and modifying models with binding partners or post-translational modifications. Additionally, a meta-tool for ranking and suggesting composite models is suggested, driving future advancements in this rapidly evolving field.
Affiliation(s)
- Raphaelle Versini
- Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Sujith Sritharan
- Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Burcu Aykac Fas
- Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Thibault Tubiana
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Sana Zineb Aimeur
- Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Julien Henri
- Sorbonne Université, CNRS, Laboratoire de Biologie Computationnelle et Quantitative UMR 7238, Institut de Biologie Paris-Seine, 4 Place Jussieu, F-75005 Paris, France
- Marie Erard
- Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Oliver Nüsse
- Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Jessica Andreani
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Marc Baaden
- Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Patrick Fuchs
- Sorbonne Université, École Normale Supérieure, PSL University, CNRS, Laboratoire des Biomolécules, LBM, 75005 Paris, France
- Université de Paris, UFR Sciences du Vivant, 75013 Paris, France
- Tatiana Galochkina
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Alexios Chatzigoulas
- Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Hubert Santuz
- Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Sophie Sacquin-Mora
- Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Antoine Taly
- Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
16
Ghafouri H, Lazar T, Del Conte A, Tenorio Ku LG, Tompa P, Tosatto SCE, Monzon AM. PED in 2024: improving the community deposition of structural ensembles for intrinsically disordered proteins. Nucleic Acids Res 2024; 52:D536-D544. [PMID: 37904608 PMCID: PMC10767937 DOI: 10.1093/nar/gkad947] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/15/2023] [Revised: 10/10/2023] [Accepted: 10/13/2023] [Indexed: 11/01/2023] Open
Abstract
The Protein Ensemble Database (PED) (URL: https://proteinensemble.org) is the primary resource for depositing structural ensembles of intrinsically disordered proteins. This updated version of PED reflects advancements in the field, denoting a continual expansion with a total of 461 entries and 538 ensembles, including those generated without explicit experimental data through novel machine learning (ML) techniques. With this significant increment in the number of ensembles, a few yet-unprecedented new entries entered the database, including those also determined or refined by electron paramagnetic resonance or circular dichroism data. In addition, PED was enriched with several new features, including a novel deposition service, improved user interface, new database cross-referencing options and integration with the 3D-Beacons network, all representing efforts to improve the FAIRness of the database. Foreseeably, PED will keep growing in size and expanding with new types of ensembles generated by accurate and fast ML-based generative models and coarse-grained simulations. Therefore, among future efforts, priority will be given to further develop the database to be compatible with ensembles modeled at a coarse-grained level.
Affiliation(s)
- Tamas Lazar
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie (VIB), Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Alessio Del Conte
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Peter Tompa
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie (VIB), Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences (RCNS), Budapest, Hungary
17
Taneja I, Lasker K. Machine-learning-based methods to generate conformational ensembles of disordered proteins. Biophys J 2024; 123:101-113. [PMID: 38053335 PMCID: PMC10808026 DOI: 10.1016/j.bpj.2023.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 10/24/2023] [Accepted: 12/01/2023] [Indexed: 12/07/2023] Open
Abstract
Intrinsically disordered proteins are characterized by a conformational ensemble. While computational approaches such as molecular dynamics simulations have been used to generate such ensembles, their computational costs can be prohibitive. An alternative approach is to learn from data and train machine-learning models to generate conformational ensembles of disordered proteins. This has been a relatively unexplored approach, and in this work we demonstrate a proof-of-principle approach to do so. Specifically, we devised a two-stage computational pipeline: in the first stage, we employed supervised machine-learning models to predict ensemble-derived two-dimensional (2D) properties of a sequence, given the conformational ensemble of a closely related sequence. In the second stage, we used denoising diffusion models to generate three-dimensional (3D) coarse-grained conformational ensembles, given the two-dimensional predictions outputted by the first stage. We trained our models on a data set of coarse-grained molecular dynamics simulations of thousands of rationally designed synthetic sequences. The accuracy of our 2D and 3D predictions was validated across multiple metrics, and our work demonstrates the applicability of machine-learning techniques to predicting higher-dimensional properties of disordered proteins.
Affiliation(s)
- Ishan Taneja
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California
- Keren Lasker
- Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California.
18
Shahrajabian MH, Sun W. Characterization of Intrinsically Disordered Proteins in Healthy and Diseased States by Nuclear Magnetic Resonance. Rev Recent Clin Trials 2024; 19:176-188. [PMID: 38409704 DOI: 10.2174/0115748871271420240213064251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/10/2023] [Accepted: 12/13/2023] [Indexed: 02/28/2024]
Abstract
INTRODUCTION: Intrinsically Disordered Proteins (IDPs) are active in different cellular procedures like ordered assembly of chromatin and ribosomes, interaction with membrane, protein, and ligand binding, molecular recognition, binding, and transportation via nuclear pores, microfilaments and microtubules process and disassembly, protein functions, RNA chaperone, and nucleic acid binding, modulation of the central dogma, cell cycle, and other cellular activities, post-translational qualification and substitute splicing, and flexible entropic linker and management of signaling pathways. METHODS: The intrinsic disorder is a precise structural characteristic that permits IDPs/IDPRs to be involved in both one-to-many and many-to-one signaling. IDPs/IDPRs also exert some dynamical and structural ordering, being much less constrained in their activities than folded proteins. Nuclear magnetic resonance (NMR) spectroscopy is a major technique for the characterization of IDPs, and it can be used for dynamic and structural studies of IDPs. RESULTS AND CONCLUSION: This review was carried out to discuss intrinsically disordered proteins and their different goals, as well as the importance and effectiveness of NMR in characterizing intrinsically disordered proteins in healthy and diseased states.
Affiliation(s)
- Mohamad Hesam Shahrajabian
- National Key Laboratory of Agricultural Microbiology, Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Wenli Sun
- National Key Laboratory of Agricultural Microbiology, Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
19
Bajpai P, Singh AK, Kandagalla S, Chandra P, Kumar Sah V, Kumar P, Grishina M, Verma OP, Pathak P. Oxazoline/amide derivatives against M. tuberculosis: experimental, biological and computational investigations. J Biomol Struct Dyn 2023:1-11. [PMID: 37948157 DOI: 10.1080/07391102.2023.2276312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 10/18/2023] [Indexed: 11/12/2023]
Abstract
Tuberculosis (TB) is a treatable contagious disease that kills approximately 2 million people yearly. Different oxazoline/amide derivatives were synthesized, and their anti-tuberculosis activity was evaluated against different strains of Mtb. This study designed the anti-Mtb compounds based on amide and oxazoline, two different structural moieties. The compounds were further synthesized and characterized by spectral techniques. Their anti-TB activity was evaluated against the M. tuberculosis H37Rv strain. Selectivity and binding affinity of all synthesized compounds (2a-2e, 3a-3e) against PanK in Mtb were investigated through molecular docking. Molecular dynamics simulation studies for the promising compounds 2d and 3e were performed for 100 ns. The stability of these complexes was assessed by calculating the root mean square deviation, solvent-accessible surface area, and gyration radius relative to their parent structures. Additionally, free energy of binding calculations were performed. Among all synthesized compounds, 2d and 3e showed antitubercular activity comparable to that of the standard drug, as supported by the computational and biological studies.
Affiliation(s)
- Priyanka Bajpai
- Goel Institute of Pharmacy and Sciences, Lucknow, Uttar Pradesh, India
- Ankit Kumar Singh
- Department of Pharmaceutical Sciences and Natural Products, Central University of Punjab, Ghudda, Bathinda, India
- Shivanada Kandagalla
- Laboratory of Computational Modeling of Drugs, Higher Medical and Biological School, South Ural State University, Chelyabinsk, Russia
- Phool Chandra
- Teerthanker Mahaveer College of Pharmacy, Teerthanker Mahaveer University, Moradabad, India
- Vimlendu Kumar Sah
- Department of Pharmaceutical Sciences and Natural Products, Central University of Punjab, Ghudda, Bathinda, India
- Pradeep Kumar
- Department of Pharmaceutical Sciences and Natural Products, Central University of Punjab, Ghudda, Bathinda, India
- Maria Grishina
- Laboratory of Computational Modeling of Drugs, Higher Medical and Biological School, South Ural State University, Chelyabinsk, Russia
- Om Prakash Verma
- Goel Institute of Pharmacy and Sciences, Lucknow, Uttar Pradesh, India
- Prateek Pathak
- Laboratory of Computational Modeling of Drugs, Higher Medical and Biological School, South Ural State University, Chelyabinsk, Russia
- Department of Pharmaceutical Analysis, Quality Assurance and Pharmaceutical Chemistry, School of Pharmacy, GITAM (Deemed to be University), Hyderabad Campus, India
20
Gonzalez JP, Frandsen KEH, Kesten C. The role of intrinsic disorder in binding of plant microtubule-associated proteins to the cytoskeleton. Cytoskeleton (Hoboken) 2023; 80:404-436. [PMID: 37578201 DOI: 10.1002/cm.21773] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/02/2023] [Revised: 07/28/2023] [Accepted: 07/30/2023] [Indexed: 08/15/2023]
Abstract
Microtubules (MTs) represent one of the main components of the eukaryotic cytoskeleton and support numerous critical cellular functions. MTs are in principle tube-like structures that can grow and shrink in a highly dynamic manner; a process largely controlled by microtubule-associated proteins (MAPs). Plant MAPs are a phylogenetically diverse group of proteins that nonetheless share many common biophysical characteristics and often contain large stretches of intrinsic protein disorder. These intrinsically disordered regions are determinants of many MAP-MT interactions, in which structural flexibility enables low-affinity protein-protein interactions that enable a fine-tuned regulation of MT cytoskeleton dynamics. Notably, intrinsic disorder is one of the major obstacles in functional and structural studies of MAPs and represents the principal present-day challenge to decipher how MAPs interact with MTs. Here, we review plant MAPs from an intrinsic protein disorder perspective, by providing a complete and up-to-date summary of all currently known members, and address the current and future challenges in functional and structural characterization of MAPs.
Affiliation(s)
- Jordy Perez Gonzalez
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
- Kristian E H Frandsen
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
- Christopher Kesten
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
21
Alderson TR, Pritišanac I, Kolarić Đ, Moses AM, Forman-Kay JD. Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. Proc Natl Acad Sci U S A 2023; 120:e2304302120. [PMID: 37878721 PMCID: PMC10622901 DOI: 10.1073/pnas.2304302120] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 03/22/2023] [Accepted: 08/30/2023] [Indexed: 10/27/2023] Open
Abstract
The AlphaFold Protein Structure Database contains predicted structures for millions of proteins. For the majority of human proteins that contain intrinsically disordered regions (IDRs), which do not adopt a stable structure, it is generally assumed that these regions have low AlphaFold2 confidence scores that reflect low-confidence structural predictions. Here, we show that AlphaFold2 assigns confident structures to nearly 15% of human IDRs. By comparison to experimental NMR data for a subset of IDRs that are known to conditionally fold (i.e., upon binding or under other specific conditions), we find that AlphaFold2 often predicts the structure of the conditionally folded state. Based on databases of IDRs that are known to conditionally fold, we estimate that AlphaFold2 can identify conditionally folding IDRs at a precision as high as 88% at a 10% false positive rate, which is remarkable considering that conditionally folded IDR structures were minimally represented in its training data. We find that human disease mutations are nearly fivefold enriched in conditionally folded IDRs over IDRs in general and that up to 80% of IDRs in prokaryotes are predicted to conditionally fold, compared to less than 20% of eukaryotic IDRs. These results indicate that a large majority of IDRs in the proteomes of human and other eukaryotes function in the absence of conditional folding, but the regions that do acquire folds are more sensitive to mutations. We emphasize that the AlphaFold2 predictions do not reveal functionally relevant structural plasticity within IDRs and cannot offer realistic ensemble representations of conditionally folded IDRs.
Affiliation(s)
- T. Reid Alderson
- Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Iva Pritišanac
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 35G, Canada
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Biology and Biochemistry, Gottfried Schatz Research Center for Cell Signaling, Metabolism and Aging, Medical University of Graz, Graz 8010, Austria
- Đesika Kolarić
- Department of Molecular Biology and Biochemistry, Gottfried Schatz Research Center for Cell Signaling, Metabolism and Aging, Medical University of Graz, Graz 8010, Austria
- Alan M. Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 35G, Canada
- Julie D. Forman-Kay
- Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
22
Pesce F, Bremer A, Tesei G, Hopkins JB, Grace CR, Mittag T, Lindorff-Larsen K. Design of intrinsically disordered protein variants with diverse structural properties. bioRxiv 2023:2023.10.22.563461. [PMID: 37961110 PMCID: PMC10634714 DOI: 10.1101/2023.10.22.563461] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/15/2023]
Abstract
Intrinsically disordered proteins (IDPs) perform a wide range of functions in biology, suggesting that the ability to design IDPs could help expand the repertoire of proteins with novel functions. Designing IDPs with specific structural or functional properties has, however, been difficult, in part because determining accurate conformational ensembles of IDPs generally requires a combination of computational modelling and experiments. Motivated by recent advancements in efficient physics-based models for simulations of IDPs, we have developed a general algorithm for designing IDPs with specific structural properties. We demonstrate the power of the algorithm by generating variants of naturally occurring IDPs with different levels of compaction and that vary more than 100 fold in their propensity to undergo phase separation, even while keeping a fixed amino acid composition. We experimentally tested designs of variants of the low-complexity domain of hnRNPA1 and find high accuracy in our computational predictions, both in terms of single-chain compaction and propensity to undergo phase separation. We analyze the sequence features that determine changes in compaction and propensity to phase separate and find an overall good agreement with previous findings for naturally occurring sequences. Our general, physics-based method enables the design of disordered sequences with specified conformational properties. Our algorithm thus expands the toolbox for protein design to include also the most flexible proteins and will enable the design of proteins whose functions exploit the many properties afforded by protein disorder.
Affiliation(s)
- Francesco Pesce
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Anne Bremer
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Giulio Tesei
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jesse B. Hopkins
  - BioCAT, Department of Physics, Illinois Institute of Technology, Chicago, IL, USA
- Christy R. Grace
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Tanja Mittag
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Kresten Lindorff-Larsen
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark

23
Ho W, Huang H, Huang J. IFF: Identifying key residues in intrinsically disordered regions of proteins using machine learning. Protein Sci 2023; 32:e4739. [PMID: 37498545 PMCID: PMC10443345 DOI: 10.1002/pro.4739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 06/21/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023]
Abstract
Conserved residues in protein homolog sequence alignments are structurally or functionally important. For intrinsically disordered proteins or proteins with intrinsically disordered regions (IDRs), however, alignment often fails because they lack a steric structure to constrain evolution. Although sequences vary, the physicochemical features of IDRs may be preserved in maintaining function. Therefore, a method to retrieve common IDR features may help identify functionally important residues. We applied unsupervised contrastive learning to train a model with self-attention neuronal networks on human IDR orthologs. Parameters in the model were trained to match sequences in ortholog pairs but not in other IDRs. The trained model successfully identifies previously reported critical residues from experimental studies, especially those with an overall pattern (e.g., multiple aromatic residues or charged blocks) rather than short motifs. This predictive model can be used to identify potentially important residues in other proteins, improving our understanding of their functions. The trained model can be run directly from the Jupyter Notebook in the GitHub repository using Binder (mybinder.org). The only required input is the primary sequence. The training scripts are available on GitHub (https://github.com/allmwh/IFF). The training datasets have been deposited in an Open Science Framework repository (https://osf.io/jk29b).
Affiliation(s)
- Wen‐Lin Ho
  - Institute of Biochemistry and Molecular Biology, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Hsuan‐Cheng Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Jie‐rong Huang
  - Institute of Biochemistry and Molecular Biology, National Yang Ming Chiao Tung University, Taipei, Taiwan
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
  - Department of Life Sciences and Institute of Genome Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan

24
Singh NK, Bhardwaj P, Radhakrishna M. Hydrophobicity─A Single Parameter for the Accurate Prediction of Disordered Regions in Proteins. J Chem Inf Model 2023; 63:5375-5383. [PMID: 37581491 DOI: 10.1021/acs.jcim.3c00592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/16/2023]
Abstract
The prediction of disordered regions in proteins is crucial for understanding their functions, dynamics, and interactions. Intrinsically disordered proteins (IDPs) play a key role in many biological processes like cell signaling, recognition, and regulation, but experimentally determining these regions can be challenging due to their high mobility. To address this challenge, we present an algorithm called HydroDisPred (HDP). HDP uses a single parameter, the fraction of hydrophobicity (λ) in each segment of the protein, to accurately predict disordered regions. The algorithm was validated using experimental data from the DisProt database and was found to be on par and, in some cases, more effective than the existing algorithms. HDP is a simple and effective method for identifying disordered regions in proteins, and its prediction is not affected by the availability of training data, unlike other ML approaches. The application is housed in the web server and can be accessed through the URL https://proseqanalyser.iitgn.ac.in/hydrodispred/.
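The idea of scoring each segment by its hydrophobic fraction (λ) can be illustrated with a minimal sketch. This is not the HDP implementation: the hydrophobic residue set, the window length, and the threshold below are all hypothetical choices made for illustration, and the published algorithm may define its segments and cutoff differently.

```python
# Illustrative sketch only. HYDROPHOBIC, `window`, and `threshold` are
# hypothetical values, not parameters taken from the HDP paper.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_fraction(segment):
    """Return the fraction of residues in `segment` that are hydrophobic."""
    if not segment:
        return 0.0
    return sum(res in HYDROPHOBIC for res in segment) / len(segment)


def predict_disorder(sequence, window=9, threshold=0.3):
    """Label each residue 'D' (disordered) or 'O' (ordered).

    A residue is called disordered when the hydrophobic fraction of the
    window centred on it is below `threshold`. Windows are truncated at
    the ends of the sequence.
    """
    half = window // 2
    labels = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        lam = hydrophobic_fraction(segment)
        labels.append("D" if lam < threshold else "O")
    return "".join(labels)
```

Calling `predict_disorder(seq)` on a one-letter amino acid string returns a label string of the same length, which makes it easy to compare against per-residue annotations such as those in DisProt.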
Affiliation(s)
- Nitin Kumar Singh
  - Discipline of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
- Pratyasha Bhardwaj
  - Discipline of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
- Mithun Radhakrishna
  - Discipline of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
  - Center for Biomedical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India

25
Ginell GM, Flynn AJ, Holehouse AS. SHEPHARD: a modular and extensible software architecture for analyzing and annotating large protein datasets. Bioinformatics 2023; 39:btad488. [PMID: 37540173 PMCID: PMC10423030 DOI: 10.1093/bioinformatics/btad488] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/23/2022] [Revised: 07/02/2023] [Accepted: 08/03/2023] [Indexed: 08/05/2023] Open
Abstract
MOTIVATION: The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remains logistically challenging and introduces a major barrier to entry for even superficial integrative bioinformatics. RESULTS: To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly, SHEPHARD is easy to use and enables a Pythonic interrogation of large-scale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology. AVAILABILITY AND IMPLEMENTATION: We provide SHEPHARD as both a stand-alone software package (https://github.com/holehouse-lab/shephard) and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab).
Affiliation(s)
- Garrett M Ginell
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States
  - Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States
- Aidan J Flynn
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States
  - Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States
- Alex S Holehouse
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States
  - Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States

26
Pesce F, Lindorff-Larsen K. Combining Experiments and Simulations to Examine the Temperature-Dependent Behavior of a Disordered Protein. J Phys Chem B 2023. [PMID: 37433228 DOI: 10.1021/acs.jpcb.3c01862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2023]
Abstract
Intrinsically disordered proteins are a class of proteins that lack stable folded conformations and instead adopt a range of conformations that determine their biochemical functions. The temperature-dependent behavior of such disordered proteins is complex and can vary depending on the specific protein and environment. Here, we have used molecular dynamics simulations and previously published experimental data to investigate the temperature-dependent behavior of histatin 5, a 24-residue-long polypeptide. We examined the hypothesis that histatin 5 undergoes a loss of polyproline II (PPII) structure with increasing temperature, leading to more compact conformations. We found that the conformational ensembles generated by the simulations generally agree with small-angle X-ray scattering data for histatin 5, but show some discrepancies with the hydrodynamic radius as probed by pulsed-field gradient NMR spectroscopy, and with the secondary structure information derived from circular dichroism. We attempted to reconcile these differences by reweighting the conformational ensembles against the scattering and NMR data. By doing so, we were in part able to capture the temperature-dependent behavior of histatin 5 and to link the observed decrease in hydrodynamic radius with increasing temperature to a loss of PPII structure. We were, however, unable to achieve agreement with both the scattering and NMR data within experimental errors. We discuss different possible reasons for this including inaccuracies in the force field, differences in conditions of the NMR and scattering experiments, and issues related to the calculation of the hydrodynamic radius from conformational ensembles. Our study highlights the importance of integrating multiple types of experimental data when modeling conformational ensembles of disordered proteins and how environmental factors such as the temperature influence them.
Affiliation(s)
- Francesco Pesce
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark

27
Abbas U, Chen J, Shao Q. Assessing Fairness of AlphaFold2 Prediction of Protein 3D Structures. bioRxiv 2023:2023.05.23.542006. [PMID: 37293014 PMCID: PMC10245900 DOI: 10.1101/2023.05.23.542006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
AlphaFold2 is reshaping biomedical research by enabling the prediction of a protein's 3D structure solely based on its amino acid sequence. This breakthrough reduces reliance on labor-intensive experimental methods traditionally used to obtain protein structures, thereby accelerating the pace of scientific discovery. Despite the bright future, it remains unclear whether AlphaFold2 can predict the wide spectrum of proteins equally well. Systematic investigation into the fairness and unbiased nature of its predictions has yet to be thoroughly explored. In this paper, we conducted an in-depth analysis of AlphaFold2's fairness using data comprising five million reported protein structures from its open-access repository. Specifically, we assessed the variability in the distribution of pLDDT scores, considering factors such as amino acid type, secondary structure, and sequence length. Our findings reveal a systematic discrepancy in AlphaFold2's predictive reliability, varying across different types of amino acids and secondary structures. Furthermore, we observed that the size of the protein exerts a notable impact on the credibility of the 3D structural prediction. AlphaFold2 demonstrates enhanced prediction power for proteins of medium size compared to those that are either smaller or larger. These systematic biases could potentially stem from inherent biases present in its training data and model architecture. These factors need to be taken into account when expanding the applicability of AlphaFold2.
Affiliation(s)
- Usman Abbas
  - Chemical & Materials Engineering, University of Kentucky, Lexington, Kentucky, USA
- Jin Chen
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, Kentucky, USA
- Qing Shao
  - Chemical & Materials Engineering, University of Kentucky, Lexington, Kentucky, USA

28
Zheng LE, Barethiya S, Nordquist E, Chen J. Machine Learning Generation of Dynamic Protein Conformational Ensembles. Molecules 2023; 28:4047. [PMID: 37241789 PMCID: PMC10220786 DOI: 10.3390/molecules28104047] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/10/2023] [Revised: 05/04/2023] [Accepted: 05/09/2023] [Indexed: 05/28/2023] Open
Abstract
Machine learning has achieved remarkable success across a broad range of scientific and engineering disciplines, particularly its use for predicting native protein structures from sequence information alone. However, biomolecules are inherently dynamic, and there is a pressing need for accurate predictions of dynamic structural ensembles across multiple functional levels. These problems range from the relatively well-defined task of predicting conformational dynamics around the native state of a protein, which traditional molecular dynamics (MD) simulations are particularly adept at handling, to generating large-scale conformational transitions connecting distinct functional states of structured proteins or numerous marginally stable states within the dynamic ensembles of intrinsically disordered proteins. Machine learning has been increasingly applied to learn low-dimensional representations of protein conformational spaces, which can then be used to drive additional MD sampling or directly generate novel conformations. These methods promise to greatly reduce the computational cost of generating dynamic protein ensembles, compared to traditional MD simulations. In this review, we examine recent progress in machine learning approaches towards generative modeling of dynamic protein ensembles and emphasize the crucial importance of integrating advances in machine learning, structural data, and physical principles to achieve these ambitious goals.
Affiliation(s)
- Li-E Zheng
  - Department of Gynecology, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Shrishti Barethiya
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Erik Nordquist
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Jianhan Chen
  - Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA

29
Zhang O, Haghighatlari M, Li J, Liu ZH, Namini A, Teixeira JMC, Forman-Kay JD, Head-Gordon T. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. J Chem Phys 2023; 158:174113. [PMID: 37144719 PMCID: PMC10163956 DOI: 10.1063/5.0141474] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/05/2023] [Accepted: 04/11/2023] [Indexed: 05/06/2023] Open
Abstract
The structural characterization of proteins with a disorder requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments of disordered proteins highly depends on the initial pool of conformers, with currently available tools limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic resonance enhancements. We show that updating the generative model parameters according to the reward feedback on the basis of the agreement between experimental data and probabilistic selection of torsions from learned distributions provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.
Affiliation(s)
- Oufan Zhang
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Mojtaba Haghighatlari
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Jie Li
  - Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Ashley Namini
  - Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5S 1A8, Canada

30
Varadi M, Bordin N, Orengo C, Velankar S. The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors. Curr Opin Struct Biol 2023; 79:102543. [PMID: 36807079 DOI: 10.1016/j.sbi.2023.102543] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 10/11/2022] [Revised: 01/04/2023] [Accepted: 01/13/2023] [Indexed: 02/21/2023]
Abstract
The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously increases the gaps between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences. In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of the impacted field in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction.
Affiliation(s)
- Mihaly Varadi
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK. https://twitter.com/nicolabordin
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

31
Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/17/2023] [Indexed: 03/31/2023] Open
Abstract
Background: De novo protein coding genes emerge from scratch in the non-coding regions of the genome and have, per definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structures result in low confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability for de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2. Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (Omegafold, ESMfold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence. Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from flDPnn which has been found to outperform most other predictors in a comparative assessment study recently. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins. 
Conclusions: We suggest that, while in some cases protein language model based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.
Affiliation(s)
- Margaux Aubel
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Lars Eicholt
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
  - Department Protein Evolution, Max Planck-Institute for Biology, Tuebingen, 72076, Germany

32
González-Delgado J, Sagar A, Zanon C, Lindorff-Larsen K, Bernadó P, Neuvial P, Cortés J. WASCO: A Wasserstein-based statistical tool to compare conformational ensembles of intrinsically disordered proteins. J Mol Biol 2023:168053. [PMID: 36934808 DOI: 10.1016/j.jmb.2023.168053] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/02/2022] [Revised: 02/10/2023] [Accepted: 03/14/2023] [Indexed: 03/19/2023]
Abstract
The structural investigation of intrinsically disordered proteins (IDPs) requires ensemble models describing the diversity of the conformational states of the molecule. Due to their probabilistic nature, there is a need for new paradigms that understand and treat IDPs from a purely statistical point of view, considering their conformational ensembles as well-defined probability distributions. In this work, we define a conformational ensemble as an ordered set of probability distributions and provide a suitable metric to detect differences between two given ensembles at the residue level, both locally and globally. The underlying geometry of the conformational space is properly integrated, one ensemble being characterized by a set of probability distributions supported on the three-dimensional Euclidean space (for global-scale comparisons) and on the two-dimensional flat torus (for local-scale comparisons). The inherent uncertainty of the data is also taken into account to provide finer estimations of the differences between ensembles. Additionally, an overall distance between ensembles is defined from the differences at the residue level. We illustrate the interest of the approach with several examples of applications for the comparison of conformational ensembles: (i) produced from molecular dynamics (MD) simulations using different force fields, and (ii) before and after refinement with experimental data. We also show the usefulness of the method to assess the convergence of MD simulations, and discuss other potential applications such as in machine-learning-based approaches. The numerical tool has been implemented in Python through easy-to-use Jupyter Notebooks available at https://gitlab.laas.fr/moma/WASCO.
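To give a feel for the kind of distance involved, the sketch below computes the Wasserstein-1 distance between two equal-size one-dimensional samples. It is a deliberate simplification and not the WASCO method: the abstract describes distributions on three-dimensional Euclidean space and on the flat torus, together with uncertainty corrections, none of which are reproduced here. The function name and the example values are hypothetical.

```python
def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    For equally weighted samples of the same size, the optimal transport
    plan on the real line pairs the sorted values in order, so the
    distance is the mean absolute difference of the sorted samples.
    """
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    return sum(abs(x - y) for x, y in zip(a_sorted, b_sorted)) / len(a_sorted)


# Hypothetical example: one inter-residue distance (arbitrary units)
# sampled from two different ensembles of the same protein.
ensemble_1 = [10.2, 11.5, 9.8, 12.1]
ensemble_2 = [13.0, 12.4, 14.1, 11.9]
distance = wasserstein_1d(ensemble_1, ensemble_2)
```

A value of zero means the two samples are identical as distributions; larger values indicate that more "mass" has to be moved further to turn one distribution into the other.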
Affiliation(s)
- Javier González-Delgado
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France; Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, Toulouse, France
- Amin Sagar
  - Centre de Biologie Structurale, Université de Montpellier, INSERM, CNRS, Montpellier, France
- Kresten Lindorff-Larsen
  - The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Denmark
- Pau Bernadó
  - Centre de Biologie Structurale, Université de Montpellier, INSERM, CNRS, Montpellier, France
- Pierre Neuvial
  - Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, Toulouse, France
- Juan Cortés
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France

33
Pesce F, Newcombe EA, Seiffert P, Tranchant EE, Olsen JG, Grace CR, Kragelund BB, Lindorff-Larsen K. Assessment of models for calculating the hydrodynamic radius of intrinsically disordered proteins. Biophys J 2023; 122:310-321. [PMID: 36518077 PMCID: PMC9892621 DOI: 10.1016/j.bpj.2022.12.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/15/2022] [Revised: 11/18/2022] [Accepted: 12/09/2022] [Indexed: 12/15/2022] Open
Abstract
Diffusion measurements by pulsed-field gradient NMR and fluorescence correlation spectroscopy can be used to probe the hydrodynamic radius of proteins, which contains information about the overall dimension of a protein in solution. The comparison of this value with structural models of intrinsically disordered proteins is nonetheless impaired by the uncertainty of the accuracy of the methods for computing the hydrodynamic radius from atomic coordinates. To tackle this issue, we here build conformational ensembles of 11 intrinsically disordered proteins that we ensure are in agreement with measurements of compaction by small-angle x-ray scattering. We then use these ensembles to identify the forward model that more closely fits the radii derived from pulsed-field gradient NMR diffusion experiments. Of the models we examined, we find that the Kirkwood-Riseman equation provides the best description of the hydrodynamic radius probed by pulsed-field gradient NMR experiments. While some minor discrepancies remain, our results enable better use of measurements of the hydrodynamic radius in integrative modeling and for force field benchmarking and parameterization.
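As an illustration of a forward model of this kind, the sketch below evaluates the Kirkwood-Riseman approximation in its commonly quoted form, 1/R_h = (1/N²) Σ_{i≠j} ⟨1/r_ij⟩, for a set of bead coordinates. It is a minimal sketch under stated assumptions: one bead per residue (for example Cα positions) is assumed, the function names are hypothetical, and the paper's exact choice of atoms and averaging may differ.

```python
import math


def kirkwood_riseman_rh(coords):
    """Hydrodynamic radius of one conformation from bead coordinates.

    Uses 1/R_h = (1/N^2) * sum over i != j of 1/r_ij, where `coords`
    is a sequence of (x, y, z) tuples. The result has the same length
    unit as the input coordinates.
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two beads")
    inv_sum = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inv_sum += 1.0 / math.dist(coords[i], coords[j])
    # Each unordered pair appears twice in the sum over i != j.
    return (n * n) / (2.0 * inv_sum)


def ensemble_rh(conformations):
    """Ensemble R_h: average 1/R_h over conformations, then invert."""
    inv_values = [1.0 / kirkwood_riseman_rh(c) for c in conformations]
    return len(inv_values) / sum(inv_values)
```

Averaging the inverse radius rather than the radius itself follows from the ensemble average sitting inside the sum in the formula above; the two choices give different numbers for heterogeneous ensembles.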
Affiliation(s)
- Francesco Pesce
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Estella A Newcombe
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Pernille Seiffert
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Emil E Tranchant
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Johan G Olsen
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Christy R Grace
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Birthe B Kragelund
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark

34
Sun B, Kekenes-Huskey PM. Myofilament-associated proteins with intrinsic disorder (MAPIDs) and their resolution by computational modeling. Q Rev Biophys 2023; 56:e2. [PMID: 36628457 PMCID: PMC11070111 DOI: 10.1017/s003358352300001x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/12/2023]
Abstract
The cardiac sarcomere is a cellular structure in the heart that enables muscle cells to contract. Dozens of proteins belong to the cardiac sarcomere, which work in tandem to generate force and adapt to demands on cardiac output. Intriguingly, the majority of these proteins have significant intrinsic disorder that contributes to their functions, yet the biophysics of these intrinsically disordered regions (IDRs) have been characterized in limited detail. In this review, we first enumerate these myofilament-associated proteins with intrinsic disorder (MAPIDs) and recent biophysical studies to characterize their IDRs. We secondly summarize the biophysics governing IDR properties and the state-of-the-art in computational tools toward MAPID identification and characterization of their conformation ensembles. We conclude with an overview of future computational approaches toward broadening the understanding of intrinsic disorder in the cardiac sarcomere.
Affiliation(s)
- Bin Sun
  - Research Center for Pharmacoinformatics (The State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Department of Medicinal Chemistry and Natural Medicine Chemistry, College of Pharmacy, Harbin Medical University, Harbin 150081, China

35
Guo HB, Perminov A, Bekele S, Kedziora G, Farajollahi S, Varaljay V, Hinkle K, Molinero V, Meister K, Hung C, Dennis P, Kelley-Loughnane N, Berry R. AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Sci Rep 2022; 12:10696. [PMID: 35739160 PMCID: PMC9226352 DOI: 10.1038/s41598-022-14382-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 02/28/2022] [Accepted: 06/06/2022] [Indexed: 12/29/2022] Open
Abstract
AlphaFold 2 (AF2) has placed Molecular Biology in a new era where we can visualize, analyze and interpret the structures and functions of all proteins solely from their primary sequences. We performed AF2 structure predictions for various protein systems, including globular proteins, a multi-domain protein, an intrinsically disordered protein (IDP), a randomized protein, two larger proteins (> 1000 AA), a heterodimer and a homodimer protein complex. Our results show that along with the three dimensional (3D) structures, AF2 also decodes protein sequences into residue flexibilities via both the predicted local distance difference test (pLDDT) scores of the models, and the predicted aligned error (PAE) maps. We show that PAE maps from AF2 are correlated with the distance variation (DV) matrices from molecular dynamics (MD) simulations, which reveals that the PAE maps can predict the dynamical nature of protein residues. Here, we introduce the AF2-scores, which are simply derived from pLDDT scores and are in the range of [0, 1]. We found that for most protein models, including large proteins and protein complexes, the AF2-scores are highly correlated with the root mean square fluctuations (RMSF) calculated from MD simulations. However, for an IDP and a randomized protein, the AF2-scores do not correlate with the RMSF from MD, especially for the IDP. Our results indicate that the protein structures predicted by AF2 also convey information of the residue flexibility, i.e., protein dynamics.
Affiliation(s)
- Hao-Bo Guo
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
- UES Inc., Dayton, OH, USA
- Alexander Perminov
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
- Computer Science Department, Miami University, Oxford, OH, USA
- Selemon Bekele
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
- UES Inc., Dayton, OH, USA
- Gary Kedziora
- General Dynamics Information Technology, Inc., Wright-Patterson Air Force Base, 45433, OH, USA
- Sanaz Farajollahi
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
- UES Inc., Dayton, OH, USA
- Vanessa Varaljay
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
- Kevin Hinkle
- Department of Chemical and Materials Engineering, Dayton University, Dayton, OH, USA
- Valeria Molinero
- Department of Chemistry, The University of Utah, Salt Lake City, UT, USA
- Konrad Meister
- Department of Natural Sciences, University of Alaska Southeast, Juneau, AK, USA
- Max Planck Institute for Polymer Research, Mainz, Germany
- Chia Hung
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
- Patrick Dennis
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
- Nancy Kelley-Loughnane
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
- Rajiv Berry
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson Air Force Base, 45433, OH, USA
|
36
|
Laurents DV. AlphaFold 2 and NMR Spectroscopy: Partners to Understand Protein Structure, Dynamics and Function. Front Mol Biosci 2022; 9:906437. [PMID: 35655760 PMCID: PMC9152297 DOI: 10.3389/fmolb.2022.906437] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 03/28/2022] [Accepted: 04/25/2022] [Indexed: 11/29/2022] Open
Abstract
The artificial intelligence program AlphaFold 2 is revolutionizing the field of protein structure determination as it accurately predicts the 3D structure of two thirds of the human proteome. Its predictions can be used directly as structural models or indirectly as aids for experimental structure determination using X-ray crystallography, CryoEM or NMR spectroscopy. Nevertheless, AlphaFold 2 can neither afford insight into how proteins fold, nor can it determine protein stability or dynamics. Rare folds or minor alternative conformations are also not predicted by AlphaFold 2 and the program does not forecast the impact of post translational modifications, mutations or ligand binding. The remaining third of human proteome which is poorly predicted largely corresponds to intrinsically disordered regions of proteins. Key to regulation and signaling networks, these disordered regions often form biomolecular condensates or amyloids. Fortunately, the limitations of AlphaFold 2 are largely complemented by NMR spectroscopy. This experimental approach provides information on protein folding and dynamics as well as biomolecular condensates and amyloids and their modulation by experimental conditions, small molecules, post translational modifications, mutations, flanking sequence, interactions with other proteins, RNA and virus. Together, NMR spectroscopy and AlphaFold 2 can collaborate to advance our comprehension of proteins.
|
37
|
Wilson CJ, Choy WY, Karttunen M. AlphaFold2: A Role for Disordered Protein/Region Prediction? Int J Mol Sci 2022; 23:4591. [PMID: 35562983 PMCID: PMC9104326 DOI: 10.3390/ijms23094591] [Citation(s) in RCA: 70] [Impact Index Per Article: 35.0] [Received: 03/27/2022] [Revised: 04/18/2022] [Accepted: 04/19/2022] [Indexed: 01/27/2023] Open
Abstract
The development of AlphaFold2 marked a paradigm-shift in the structural biology community. Herein, we assess the ability of AlphaFold2 to predict disordered regions against traditional sequence-based disorder predictors. We find that AlphaFold2 performs well at discriminating disordered regions, but also note that the disorder predictor one constructs from an AlphaFold2 structure determines accuracy. In particular, a naïve, but non-trivial assumption that residues assigned to helices, strands, and H-bond stabilized turns are likely ordered and all other residues are disordered results in a dramatic overestimation in disorder; conversely, the predicted local distance difference test (pLDDT) provides an excellent measure of residue-wise disorder. Furthermore, by employing molecular dynamics (MD) simulations, we note an interesting relationship between the pLDDT and secondary structure, that may explain our observations and suggests a broader application of the pLDDT for characterizing the local dynamics of intrinsically disordered proteins and regions (IDPs/IDRs).
Affiliation(s)
- Carter J. Wilson
- Department of Mathematics, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Centre for Advanced Materials and Biomaterials Research, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Wing-Yiu Choy
- Department of Biochemistry, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5C1, Canada
- Mikko Karttunen
- Centre for Advanced Materials and Biomaterials Research, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Department of Physics and Astronomy, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 5B7, Canada
- Department of Chemistry, The University of Western Ontario, 1151 Richmond Street, London, ON N6A 3K7, Canada
|
38
|
Kulkarni P, Leite VBP, Roy S, Bhattacharyya S, Mohanty A, Achuthan S, Singh D, Appadurai R, Rangarajan G, Weninger K, Orban J, Srivastava A, Jolly MK, Onuchic JN, Uversky VN, Salgia R. Intrinsically disordered proteins: Ensembles at the limits of Anfinsen's dogma. Biophysics Reviews 2022; 3:011306. [PMID: 38505224 PMCID: PMC10903413 DOI: 10.1063/5.0080512] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/01/2021] [Accepted: 02/17/2022] [Indexed: 03/21/2024]
Abstract
Intrinsically disordered proteins (IDPs) are proteins that lack rigid 3D structure. Hence, they are often misconceived to present a challenge to Anfinsen's dogma. However, IDPs exist as ensembles that sample a quasi-continuum of rapidly interconverting conformations and, as such, may represent proteins at the extreme limit of the Anfinsen postulate. IDPs play important biological roles and are key components of the cellular protein interaction network (PIN). Many IDPs can interconvert between disordered and ordered states as they bind to appropriate partners. Conformational dynamics of IDPs contribute to conformational noise in the cell. Thus, the dysregulation of IDPs contributes to increased noise and "promiscuous" interactions. This leads to PIN rewiring to output an appropriate response underscoring the critical role of IDPs in cellular decision making. Nonetheless, IDPs are not easily tractable experimentally. Furthermore, in the absence of a reference conformation, discerning the energy landscape representation of the weakly funneled IDPs in terms of reaction coordinates is challenging. To understand conformational dynamics in real time and decipher how IDPs recognize multiple binding partners with high specificity, several sophisticated knowledge-based and physics-based in silico sampling techniques have been developed. Here, using specific examples, we highlight recent advances in energy landscape visualization and molecular dynamics simulations to discern conformational dynamics and discuss how the conformational preferences of IDPs modulate their function, especially in phenotypic switching. Finally, we discuss recent progress in identifying small molecules targeting IDPs underscoring the potential therapeutic value of IDPs. Understanding structure and function of IDPs can not only provide new insight on cellular decision making but may also help to refine and extend Anfinsen's structure/function paradigm.
Affiliation(s)
- Prakash Kulkarni
- Department of Medical Oncology and Therapeutics Research, City of Hope National Medical Center, Duarte, California 91010, USA
- Vitor B. P. Leite
- Departamento de Física, Instituto de Biociências, Letras e Ciências Exatas, Universidade Estadual Paulista (UNESP), São José do Rio Preto, São Paulo 15054-000, Brazil
- Susmita Roy
- Department of Chemical Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, West Bengal 741246, India
- Supriyo Bhattacharyya
- Translational Bioinformatics, Center for Informatics, Department of Computational and Quantitative Medicine, City of Hope National Medical Center, Duarte, California 91010, USA
- Atish Mohanty
- Department of Medical Oncology and Therapeutics Research, City of Hope National Medical Center, Duarte, California 91010, USA
- Srisairam Achuthan
- Center for Informatics, Division of Research Informatics, City of Hope National Medical Center, Duarte, California 91010, USA
- Divyoj Singh
- Center for BioSystems Science and Engineering, Indian Institute of Science, Bangalore 560012, India
- Rajeswari Appadurai
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Govindan Rangarajan
- Department of Mathematics, Indian Institute of Science, Bangalore 560012, India
- Keith Weninger
- Department of Physics, North Carolina State University, Raleigh, North Carolina 27695, USA
- Anand Srivastava
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Mohit Kumar Jolly
- Center for BioSystems Science and Engineering, Indian Institute of Science, Bangalore 560012, India
- Jose N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005-1892, USA
- Ravi Salgia
- Department of Medical Oncology and Therapeutics Research, City of Hope National Medical Center, Duarte, California 91010, USA
|
39
|
Conformational ensembles of intrinsically disordered proteins and flexible multidomain proteins. Biochem Soc Trans 2022; 50:541-554. [PMID: 35129612 DOI: 10.1042/bst20210499] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 12/10/2021] [Revised: 01/13/2022] [Accepted: 01/17/2022] [Indexed: 12/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) and multidomain proteins with flexible linkers show a high level of structural heterogeneity and are best described by ensembles consisting of multiple conformations with associated thermodynamic weights. Determining conformational ensembles usually involves the integration of biophysical experiments and computational models. In this review, we discuss current approaches to determine conformational ensembles of IDPs and multidomain proteins, including the choice of biophysical experiments, computational models used to sample protein conformations, models to calculate experimental observables from protein structure, and methods to refine ensembles against experimental data. We also provide examples of recent applications of integrative conformational ensemble determination to study IDPs and multidomain proteins and suggest future directions for research in the field.
|
40
|
Jephthah S, Pesce F, Lindorff-Larsen K, Skepö M. Force Field Effects in Simulations of Flexible Peptides with Varying Polyproline II Propensity. J Chem Theory Comput 2021; 17:6634-6646. [PMID: 34524800 PMCID: PMC8515809 DOI: 10.1021/acs.jctc.1c00408] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 01/13/2023]
Abstract
Five peptides previously suggested to possess polyproline II (PPII) structure have here been investigated by using atomistic molecular dynamics simulations to compare how well four different force fields known for simulating intrinsically disordered proteins relatively well (Amber ff99SB-disp, Amber ff99SB-ILDN, CHARMM36IDPSFF, and CHARMM36m) can capture this secondary structure element. The results revealed that all force fields sample PPII structures but to different extents and with different propensities toward other secondary structure elements, in particular, the β-sheet and "random coils". A cluster analysis of the simulations of histatin 5 also revealed that the conformational ensembles of the force fields are quite different. We compared the simulations to circular dichroism and nuclear magnetic resonance spectroscopy experiments and conclude that further experiments and methods for interpreting them are needed to assess the accuracy of force fields in determining PPII structure.
Affiliation(s)
- Stéphanie Jephthah
- Division of Theoretical Chemistry, Lund University, SE-221 00 Lund, Sweden
- Francesco Pesce
- Structural Biology and NMR Laboratory & the Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & the Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Marie Skepö
- Division of Theoretical Chemistry, Lund University, SE-221 00 Lund, Sweden
|
41
|
|
42
|
Griffith D, Holehouse AS. PARROT is a flexible recurrent neural network framework for analysis of large protein datasets. eLife 2021; 10:e70576. [PMID: 34533455 PMCID: PMC8448528 DOI: 10.7554/elife.70576] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/21/2021] [Accepted: 09/06/2021] [Indexed: 11/29/2022] Open
Abstract
The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.
Affiliation(s)
- Daniel Griffith
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
- Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States
- Center for Science and Engineering Living Systems, Washington University, St Louis, United States
|