1
Mollaei P, Sadasivam D, Guntuboina C, Barati Farimani A. IDP-Bert: Predicting Properties of Intrinsically Disordered Proteins Using Large Language Models. J Phys Chem B 2024; 128:12030-12037. [PMID: 39586094 DOI: 10.1021/acs.jpcb.4c02507] [Indexed: 11/27/2024]
Abstract
Intrinsically disordered proteins (IDPs) constitute a large and structureless class of proteins with significant functions. The existence of IDPs challenges the conventional notion that the biological functions of proteins rely on their three-dimensional structures. Despite lacking well-defined spatial arrangements, they exhibit diverse biological functions, influencing cellular processes and shedding light on disease mechanisms. However, it is expensive to run experiments or simulations to characterize this class of proteins. Consequently, we designed an ML model that relies solely on amino acid sequences. In this study, we introduce the IDP-Bert model, a deep-learning architecture leveraging Transformers and Protein Language Models to map sequences directly to IDP properties. Our experiments demonstrate accurate predictions of IDP properties, including Radius of Gyration, end-to-end Decorrelation Time, and Heat Capacity.
Affiliation(s)
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Danush Sadasivam
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Chakradhar Guntuboina
  - Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
2
Mathur A, Ghosh R, Nunes-Alves A. Recent Progress in Modeling and Simulation of Biomolecular Crowding and Condensation Inside Cells. J Chem Inf Model 2024. [PMID: 39660892 DOI: 10.1021/acs.jcim.4c01520] [Indexed: 12/12/2024]
Abstract
Macromolecular crowding in the cellular cytoplasm can potentially impact diffusion rates of proteins, their intrinsic structural stability, binding of proteins to their corresponding partners as well as biomolecular organization and phase separation. While such intracellular crowding can have a large impact on biomolecular structure and function, the molecular mechanisms and driving forces that determine the effect of crowding on dynamics and conformations of macromolecules are so far not well understood. At a molecular level, computational methods can provide a unique lens to investigate the effect of macromolecular crowding on biomolecular behavior, providing us with a resolution that is challenging to reach with experimental techniques alone. In this review, we focus on the various physics-based and data-driven computational methods developed in the past few years to investigate macromolecular crowding and intracellular protein condensation. We review recent progress in modeling and simulation of biomolecular systems of varying sizes, ranging from single protein molecules to the entire cellular cytoplasm. We further discuss the effects of macromolecular crowding on different phenomena, such as diffusion, protein-ligand binding, and mechanical and viscoelastic properties, such as surface tension of condensates. Finally, we discuss some of the outstanding challenges that we anticipate the community addressing in the next few years in order to investigate biological phenomena in model cellular environments by reproducing in vivo conditions as accurately as possible.
Affiliation(s)
- Apoorva Mathur
  - Institute of Chemistry, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
- Rikhia Ghosh
  - Institute of Chemistry, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
  - Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, Ridgefield, Connecticut 06877, United States
- Ariane Nunes-Alves
  - Institute of Chemistry, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
3
Yu M, Gruzinov AY, Ruan H, Scheidt T, Chowdhury A, Giofrè S, Mohammed ASA, Caria J, Sauter PF, Svergun DI, Lemke EA. A genetically encoded anomalous SAXS ruler to probe the dimensions of intrinsically disordered proteins. Proc Natl Acad Sci U S A 2024; 121:e2415220121. [PMID: 39642200 DOI: 10.1073/pnas.2415220121] [Received: 08/06/2024] [Accepted: 10/21/2024] [Indexed: 12/08/2024] Open
Abstract
Intrinsically disordered proteins (IDPs) adopt ensembles of rapidly fluctuating heterogeneous conformations, influencing their binding capabilities and supramolecular transitions. The primary conformational descriptors for understanding IDP ensembles, namely the radius of gyration (R_G), measured by small-angle X-ray scattering (SAXS), and the root mean square (rms) end-to-end distance (R_E), probed by fluorescent resonance energy transfer (FRET), are often reported to produce inconsistent results regarding IDP expansion as a function of denaturant concentration in the buffer. This ongoing debate surrounding the FRET-SAXS discrepancy raises questions about the overall reliability of either method for quantitatively studying IDP properties. To address this discrepancy, we introduce a genetically encoded anomalous SAXS (ASAXS) ruler, enabling simultaneous and direct measurements of R_G and R_E without assuming a specific structural model. This ruler utilizes a genetically encoded noncanonical amino acid with two bromine atoms, providing an anomalous X-ray scattering signal for precise distance measurements. Through this approach, we experimentally demonstrate that the ratio between R_E and R_G varies under different denaturing conditions, highlighting the intrinsic properties of IDPs as the primary source of the observed SAXS-FRET discrepancy rather than shortcomings in either of the two established methods. The developed genetically encoded ASAXS ruler emerges as a versatile tool for both IDPs and folded proteins, providing a unified approach for obtaining complementary and site-specific conformational information in scattering experiments, thereby contributing to a deeper understanding of protein functions.
Affiliation(s)
- Miao Yu
  - Biocenter, Johannes Gutenberg University Mainz, Mainz 55128, Germany
  - Institute of Molecular Biology postdoctoral program, Mainz 55128, Germany
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Andrey Yu Gruzinov
  - BIOSAXS Group, European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Hao Ruan
  - Biocenter, Johannes Gutenberg University Mainz, Mainz 55128, Germany
  - Institute of Molecular Biology postdoctoral program, Mainz 55128, Germany
- Tom Scheidt
  - Biocenter, Johannes Gutenberg University Mainz, Mainz 55128, Germany
  - Institute of Molecular Biology postdoctoral program, Mainz 55128, Germany
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Aritra Chowdhury
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Sabrina Giofrè
  - Biocenter, Johannes Gutenberg University Mainz, Mainz 55128, Germany
  - Institute of Molecular Biology postdoctoral program, Mainz 55128, Germany
- Ahmed S A Mohammed
  - BIOSAXS Group, European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Joana Caria
  - Biocenter, Johannes Gutenberg University Mainz, Mainz 55128, Germany
- Paul F Sauter
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Dmitri I Svergun
  - BIOSAXS Group, European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Edward A Lemke
  - Biocenter, Johannes Gutenberg University Mainz, Mainz 55128, Germany
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
  - Institute of Molecular Biology, Mainz 55128, Germany
4
Xu Z, Schahl A, Jolivet MD, Legrand A, Grélard A, Berbon M, Morvan E, Lagardere L, Piquemal JP, Loquet A, Germain V, Chavent M, Mongrand S, Habenstein B. Dynamic pre-structuration of lipid nanodomain-segregating remorin proteins. Commun Biol 2024; 7:1620. [PMID: 39639105 PMCID: PMC11621693 DOI: 10.1038/s42003-024-07330-y] [Received: 05/16/2024] [Accepted: 11/28/2024] [Indexed: 12/07/2024] Open
Abstract
Remorins are multifunctional proteins, regulating immunity, development and symbiosis in plants. When associating to the membrane, remorins sequester specific lipids into functional membrane nanodomains. The multigenic protein family contains six groups, classified upon their protein-domain composition. Membrane targeting of remorins occurs independently from the secretory pathway. Instead, they are directed into different nanodomains depending on their phylogenetic group. All family members contain a C-terminal membrane anchor and a homo-oligomerization domain, flanked by an intrinsically disordered region of variable length at the N-terminal end. We here combined molecular imaging, NMR spectroscopy, protein structure calculations and advanced molecular dynamics simulation to unveil a stable pre-structuration of coiled-coil dimers as nanodomain-targeting units, containing a tunable fuzzy coat and a bar code-like positive surface charge before membrane association. Our data suggest that remorins fold in the cytosol with the N-terminal disordered region as a structural ensemble around a dimeric anti-parallel coiled-coil core containing a symmetric interface motif reminiscent of a hydrophobic Leucine zipper. The domain geometry, the charge distribution in the coiled-coil remorins and the differences in structures and dynamics between C-terminal lipid anchors of the remorin groups provide a selective platform for phospholipid binding when encountering the membrane surface.
Affiliation(s)
- Zeren Xu
  - Univ. Bordeaux, CNRS, Bordeaux INP, CBMN, UMR 5248, IECB, F-33600, Pessac, France
- Adrien Schahl
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, Université Paul Sabatier, 31400, Toulouse, France
  - Sorbonne Université, LCT, UMR7616 CNRS, 75005 Paris, France; Qubit Pharmaceuticals, Advanced Research Department, 75014, Paris, France
- Marie-Dominique Jolivet
  - Laboratoire de Biogenèse Membranaire (LBM) UMR-5200, CNRS-Univ. Bordeaux, F-33140, Villenave d'Ornon, France
- Anthony Legrand
  - Univ. Bordeaux, CNRS, Bordeaux INP, CBMN, UMR 5248, IECB, F-33600, Pessac, France
- Axelle Grélard
  - Univ. Bordeaux, CNRS, Bordeaux INP, CBMN, UMR 5248, IECB, F-33600, Pessac, France
- Mélanie Berbon
  - Univ. Bordeaux, CNRS, Bordeaux INP, CBMN, UMR 5248, IECB, F-33600, Pessac, France
- Estelle Morvan
  - Univ. Bordeaux, CNRS, Inserm, IECB, UAR3033, US01, Pessac, France
- Louis Lagardere
  - Sorbonne Université, LCT, UMR7616 CNRS, 75005 Paris, France; Qubit Pharmaceuticals, Advanced Research Department, 75014, Paris, France
- Jean-Philip Piquemal
  - Sorbonne Université, LCT, UMR7616 CNRS, 75005 Paris, France; Qubit Pharmaceuticals, Advanced Research Department, 75014, Paris, France
- Antoine Loquet
  - Univ. Bordeaux, CNRS, Bordeaux INP, CBMN, UMR 5248, IECB, F-33600, Pessac, France
- Véronique Germain
  - Laboratoire de Biogenèse Membranaire (LBM) UMR-5200, CNRS-Univ. Bordeaux, F-33140, Villenave d'Ornon, France
- Matthieu Chavent
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, Université Paul Sabatier, 31400, Toulouse, France
  - Laboratoire de Microbiologie et Génétique Moléculaires (LMGM), Centre de Biologie Intégrative (CBI), Université de Toulouse, CNRS, UPS, Toulouse, France
- Sébastien Mongrand
  - Laboratoire de Biogenèse Membranaire (LBM) UMR-5200, CNRS-Univ. Bordeaux, F-33140, Villenave d'Ornon, France
- Birgit Habenstein
  - Univ. Bordeaux, CNRS, Bordeaux INP, CBMN, UMR 5248, IECB, F-33600, Pessac, France
5
Cagliani R, Forni D, Mozzi A, Fuchs R, Hagai T, Sironi M. Evolutionary analysis of ZAP and its cofactors identifies intrinsically disordered regions as central elements in host-pathogen interactions. Comput Struct Biotechnol J 2024; 23:3143-3154. [PMID: 39234301 PMCID: PMC11372611 DOI: 10.1016/j.csbj.2024.07.022] [Received: 06/19/2024] [Revised: 07/30/2024] [Accepted: 07/30/2024] [Indexed: 09/06/2024] Open
Abstract
The zinc-finger antiviral protein (ZAP) is an innate immunity sensor of non-self nucleic acids. Its antiviral activity is exerted through the physical interaction with different cofactors, including TRIM25, Riplet and KHNYN. Cellular proteins that interact with infectious agents are expected to be engaged in genetic conflicts that often result in their rapid evolution. To test this possibility and to identify the regions most strongly targeted by natural selection, we applied in silico molecular evolution tools to analyze the evolutionary history of ZAP and cofactors in four mammalian groups. We report evidence of positive selection in all genes and in most mammalian groups. On average, the intrinsically disordered regions (IDRs) embedded in the four proteins evolve significantly faster than folded domains and most positively selected sites fall within IDRs. In ZAP, the PARP domain also shows abundant signals of selection, and independent evolution in different mammalian groups suggests modulation of its ADP-ribose binding ability. Detailed analyses of the biophysical properties of IDRs revealed that chain compaction and conformational entropy are conserved across mammals. The IDRs in ZAP and KHNYN are particularly compact, indicating that they may promote phase separation (PS). In line with this hypothesis, we predicted several PS-promoting regions in ZAP and KHNYN, as well as in TRIM25. Positively selected sites are abundant in these regions, suggesting that PS may be important for the antiviral functions of these proteins and the evolutionary arms race with viruses. Our data shed light into the evolution of ZAP and cofactors and indicate that IDRs represent central elements in host-pathogen interactions.
Affiliation(s)
- Rachele Cagliani
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
- Diego Forni
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
- Alessandra Mozzi
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
- Rotem Fuchs
  - Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tzachi Hagai
  - Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Manuela Sironi
  - Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
6
Erdős G, Dosztányi Z. Deep learning for intrinsically disordered proteins: From improved predictions to deciphering conformational ensembles. Curr Opin Struct Biol 2024; 89:102950. [PMID: 39522439 DOI: 10.1016/j.sbi.2024.102950] [Received: 08/01/2024] [Revised: 09/19/2024] [Accepted: 10/16/2024] [Indexed: 11/16/2024]
Abstract
Intrinsically disordered proteins (IDPs) lack a stable three-dimensional structure under physiological conditions, challenging traditional structure-based prediction methods. This review explores how modern deep learning approaches, which have revolutionized structure prediction for globular proteins, have impacted protein disorder predictions. We highlight the role of community-driven efforts in curating data and assessing state-of-the-art, which have been crucial in advancing the field. We also review state-of-the-art methods utilizing deep learning techniques, highlighting innovative approaches. We also address advancements in characterizing protein conformational ensembles directly from sequence data using novel machine learning methods.
Affiliation(s)
- Gábor Erdős
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Zsuzsanna Dosztányi
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
7
Zhang X, Song X, Hu G, Yang Y, Liu R, Zhou N, Basu S, Qiao D, Hou Q. Landscape of intrinsically disordered proteins in mental disorder diseases. Comput Struct Biotechnol J 2024; 23:3839-3849. [PMID: 39534590 PMCID: PMC11554586 DOI: 10.1016/j.csbj.2024.10.043] [Received: 07/18/2024] [Revised: 10/12/2024] [Accepted: 10/24/2024] [Indexed: 11/16/2024] Open
Abstract
Disrupted genes linked to mental disorders sometimes exhibit characteristics of Intrinsically Disordered Proteins (IDPs). However, few studies have comprehensively explored the functional associations between protein disorder properties and different psychiatric disorders. In this study, we collected disrupted proteins for seven mental diseases (MDD, SCZ, BP, ID, AD, ADHD, ASD) and a control dataset from normal brains. After calculating the disorder scores for each protein, we thoroughly compared the proportions and functions of IDPs between differentially expressed proteins in each disease and healthy controls. Our findings revealed that disrupted proteins, particularly in ASD and ADHD, contain more IDPs than controls from normal brains. Distinct patterns in disorder properties were observed among different mental disorders. Functional enrichment analysis indicated that IDPs in mental disorders were associated with neurodevelopment, synaptic signaling, and gene expression regulatory pathways. In addition, we analyzed the proportion and function of liquid-phase-separated proteins (LLPS) in psychiatric disorders, finding that LLPS proteins are mainly enriched in pathways related to neurodevelopment and inter-synaptic signaling. Furthermore, to validate our findings, we conducted an analysis of differentially expressed genes in an ASD cohort, revealing that the encoded proteins also exhibit a higher proportion of IDPs. Notably, these IDPs were particularly enriched in pathways related to neurodevelopment, including head development, a process known to be disrupted in ASD. Our study sheds light on the crucial role of IDPs in psychiatric disorders, enhancing our understanding of their molecular mechanisms.
Affiliation(s)
- Xinwu Zhang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
- Xixi Song
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
- Guangchun Hu
  - School of Information Science and Engineering, University of Jinan, Jinan 250022, China
- Yaqing Yang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
- Ruotong Liu
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
- Na Zhou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
- Sankar Basu
  - Department of Microbiology, Asutosh College (affiliated with University of Calcutta), 92, Shyama Prasad Mukherjee Rd, Bhowanipore 700026, Kolkata, India
- Dongdong Qiao
  - Shandong Mental Health Center, Shandong University, Jinan 250014, China
- Qingzhen Hou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
8
Baratam K, Srivastava A. SOP-MULTI: A Self-Organized Polymer-Based Coarse-Grained Model for Multidomain and Intrinsically Disordered Proteins with Conformation Ensemble Consistent with Experimental Scattering Data. J Chem Theory Comput 2024; 20:10179-10198. [PMID: 39499823 DOI: 10.1021/acs.jctc.4c00579] [Indexed: 11/07/2024]
Abstract
Multidomain proteins with long flexible linkers and full-length intrinsically disordered proteins (IDPs) are best defined as an ensemble of conformations rather than a single structure. Determining high-resolution ensemble structures of such proteins poses various challenges by using tools from experimental structural biophysics. Integrative approaches combining available low-resolution ensemble-averaged experimental data and in silico biomolecular reconstructions are now often used for the purpose. However, extensive Boltzmann weighted conformation sampling for large proteins, especially for ones where both the folded and disordered domains exist in the same polypeptide chain, remains a challenge. In this work, we present a 2-site per amino-acid resolution SOP-MULTI force field for simulating coarse-grained models of multidomain proteins. SOP-MULTI combines two well-established self-organized polymer models: (i) SOP-SC models for folded systems and (ii) SOP-IDP for IDPs. For the SOP-MULTI, we introduce cross-interaction terms between the beads belonging to the folded and disordered regions to generate conformation ensembles for full-length multidomain proteins such as hnRNP A1, TDP-43, G3BP1, hGHR-ECD, TIA1, HIV-1 Gag, polyubiquitin, and FUS. When back-mapped to all-atom resolution, SOP-MULTI trajectories faithfully recapitulate the scattering data over the range of the reciprocal space. We also show that individual folded domains preserve native contacts with respect to solved folded structures, and root-mean-square fluctuations of residues in folded domains match those obtained from all-atom molecular dynamics simulation trajectories of the same folded systems. SOP-MULTI force field is made available as a LAMMPS-compatible user package along with setup codes for generating the required files for any full-length protein with folded and disordered regions.
Affiliation(s)
- Krishnakanth Baratam
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka 560012, India
- Anand Srivastava
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka 560012, India
9
Borthakur K, Sisk TR, Panei FP, Bonomi M, Robustelli P. Determining accurate conformational ensembles of intrinsically disordered proteins at atomic resolution. bioRxiv 2024:2024.10.04.616700. [PMID: 39651234 PMCID: PMC11623552 DOI: 10.1101/2024.10.04.616700] [Indexed: 12/11/2024]
Abstract
Determining accurate atomic resolution conformational ensembles of intrinsically disordered proteins (IDPs) is extremely challenging. Molecular dynamics (MD) simulations provide atomistic conformational ensembles of IDPs, but their accuracy is highly dependent on the quality of physical models, or force fields, used. Here, we demonstrate how to determine accurate atomic resolution conformational ensembles of IDPs by integrating all-atom MD simulations with experimental data from nuclear magnetic resonance (NMR) spectroscopy and small-angle x-ray scattering (SAXS) with a simple, robust and fully automated maximum entropy reweighting procedure. We demonstrate that when this approach is applied with sufficient experimental data, IDP ensembles derived from different MD force fields converge to highly similar conformational distributions. The maximum entropy reweighting procedure presented here facilitates the integration of MD simulations with extensive experimental datasets and enables the calculation of accurate, force-field independent atomic resolution conformational ensembles of IDPs.
10
Houston L, Phillips M, Torres A, Gaalswyk K, Ghosh K. Physics-Based Machine Learning Trains Hamiltonians and Decodes the Sequence-Conformation Relation in the Disordered Proteome. J Chem Theory Comput 2024; 20:10266-10274. [PMID: 39504303 DOI: 10.1021/acs.jctc.4c01114] [Indexed: 11/08/2024]
Abstract
Intrinsically disordered proteins and regions (IDPs) are involved in vital biological processes. To understand the IDP function, often controlled by conformation, we need to find the link between sequence and conformation. We decode this link by integrating theory, simulation, and machine learning (ML) where sequence-dependent electrostatics is modeled analytically while nonelectrostatic interaction is extracted from simulations for many sequences and subsequently trained using ML. The resulting Hamiltonian, combining physics-based electrostatics and machine-learned nonelectrostatics, accurately predicts sequence-specific global and local measures of conformations beyond the original observable used from the simulation. This is in contrast to traditional ML approaches that train and predict a specific observable, not a Hamiltonian. Our formalism reproduces experimental measurements, predicts multiple conformational features directly from sequence with high throughput that will give insights into IDP design and evolution, and illustrates the broad utility of using physics-based ML to train unknown parts of a Hamiltonian, rather than a specific observable, in combination with known physics.
Affiliation(s)
- Lilianna Houston
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
- Michael Phillips
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
- Andrew Torres
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
- Kari Gaalswyk
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
- Kingshuk Ghosh
  - Department of Physics and Astronomy, University of Denver, Denver, Colorado 80210, United States
  - Department of Molecular and Cellular Biophysics, University of Denver, Denver, Colorado 80210, United States
11
Chillón-Pino D, Badonyi M, Semple CA, Marsh JA. Protein structural context of cancer mutations reveals molecular mechanisms and candidate driver genes. Cell Rep 2024; 43:114905. [PMID: 39441719 DOI: 10.1016/j.celrep.2024.114905] [Received: 04/25/2024] [Revised: 08/23/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024] Open
Abstract
Advances in protein structure determination and modeling allow us to study the structural context of human genetic variants on an unprecedented scale. Here, we analyze millions of cancer-associated missense mutations based on their structural locations and predicted perturbative effects. By considering the collective properties of mutations at the level of individual proteins, we identify distinct patterns associated with tumor suppressors and oncogenes. Tumor suppressors are enriched in structurally damaging mutations, consistent with loss-of-function mechanisms, while oncogene mutations tend to be structurally mild, reflecting selection for gain-of-function driver mutations and against loss-of-function mutations. Although oncogenes are difficult to distinguish from genes with no role in cancer using only structural damage, we find that the three-dimensional clustering of mutations is highly predictive. These observations allow us to identify candidate driver genes and speculate about their molecular roles, which we expect will have general utility in the analysis of cancer sequencing data.
Affiliation(s)
- Diego Chillón-Pino
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Mihaly Badonyi
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Colin A Semple
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Joseph A Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
12
Sundaravadivelu Devarajan D, Mittal J. Sequence-Encoded Spatiotemporal Dependence of Viscoelasticity of Protein Condensates Using Computational Microrheology. JACS Au 2024; 4:4394-4405. [PMID: 39610751 PMCID: PMC11600178 DOI: 10.1021/jacsau.4c00740] [Received: 08/13/2024] [Revised: 10/09/2024] [Accepted: 10/29/2024] [Indexed: 11/30/2024]
Abstract
Many biomolecular condensates act as viscoelastic complex fluids with distinct cellular functions. Deciphering the viscoelastic behavior of biomolecular condensates can provide insights into their spatiotemporal organization and physiological roles within cells. Although there is significant interest in defining the role of condensate dynamics and rheology in physiological functions, the quantification of their time-dependent viscoelastic properties is limited and is mostly done through experimental rheological methods. Here, we demonstrate that a computational passive probe microrheology technique, coupled with continuum mechanics, can accurately characterize the linear viscoelasticity of condensates formed by intrinsically disordered proteins (IDPs). Using a transferable coarse-grained protein model, we first provide a physical basis for choosing optimal values that define the attributes of the probe particle, namely, its size and interaction strength with the residues in an IDP chain. We show that the technique captures the sequence-dependent viscoelasticity of heteropolymeric IDPs that differ in either sequence charge patterning or sequence hydrophobicity. We also illustrate the technique's potential in quantifying the spatial dependence of viscoelasticity in heterogeneous IDP condensates. The computational microrheology technique has important implications for investigating the time-dependent rheology of complex biomolecular architectures, resulting in the sequence-rheology-function relationship for condensates.
Affiliation(s)
| | - Jeetain Mittal
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States
- Department of Chemistry, Texas A&M University, College Station, Texas 77843, United States
- Interdisciplinary Graduate Program in Genetics and Genomics, Texas A&M University, College Station, Texas 77843, United States
|
13
|
Knechtel JW, Strickfaden H, Missiaen K, Hadfield JD, Hendzel MJ, Underhill DA. KMT5C leverages disorder to optimize cooperation with HP1 for heterochromatin retention. EMBO Rep 2024:10.1038/s44319-024-00320-5. [PMID: 39562713 DOI: 10.1038/s44319-024-00320-5] [Received: 02/03/2022] [Revised: 09/27/2024] [Accepted: 11/04/2024] [Indexed: 11/21/2024]
Abstract
A defining feature of constitutive heterochromatin compartments is the heterochromatin protein-1 (HP1) family, whose members display fast internal mobility and rapid exchange with the surrounding nucleoplasm. Here, we describe a paradoxical state for the lysine methyltransferase KMT5C characterized by rapid internal diffusion but minimal nucleoplasmic exchange. This retentive behavior is conferred by sparse sequence features that constitute two modules tethered by an intrinsically disordered linker. While both modules harbor variant HP1 interaction motifs, the first comprises adjacent sequences that increase affinity using avidity. The second motif increases HP1 effective concentration to further enhance affinity in a context-dependent manner, which is evident using distinct heterochromatin recruitment strategies and heterologous linkers with defined conformational ensembles. Despite the linker sequence being highly divergent, it is under evolutionary constraint for functional length, suggesting conformational buffering can support cooperativity between modules across distant orthologs. Overall, we show that KMT5C has evolved a robust tethering strategy that uses minimal sequence determinants to harness highly dynamic HP1 proteins for retention within heterochromatin compartments.
Affiliation(s)
- Justin W Knechtel
- Department of Oncology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
| | - Hilmar Strickfaden
- Department of Oncology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
| | - Kristal Missiaen
- Department of Oncology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
| | - Joanne D Hadfield
- Department of Oncology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
| | - Michael J Hendzel
- Department of Oncology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
- Department of Cell Biology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
| | - D Alan Underhill
- Department of Oncology, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
- Department of Medical Genetics, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
|
14
|
Zhu J, Robustelli PJ. Covalent adducts formed by the androgen receptor transactivation domain and small molecule drugs remain disordered. bioRxiv 2024:2024.11.12.623257. [PMID: 39605539 PMCID: PMC11601358 DOI: 10.1101/2024.11.12.623257] [Indexed: 11/29/2024]
Abstract
Intrinsically disordered proteins are implicated in many human diseases. Small molecules that target the disordered androgen receptor transactivation domain have entered human trials for the treatment of castration-resistant prostate cancer. These molecules have been shown to react with cysteine residues of the androgen receptor transactivation domain and form covalent adducts under physiological conditions. It is currently unclear how covalent attachment of these molecules alters the conformational ensemble of the androgen receptor. Here, we utilize all-atom molecular dynamics computer simulations to simulate covalent adducts of the small molecule ligands EPI-002 and EPI-7170 bound to the disordered androgen receptor transactivation domain. Our simulations reveal that the conformational ensembles of androgen receptor transactivation domain covalent adducts are heterogeneous and disordered. We find that covalent attachment of EPI-002 and EPI-7170 increases the population of collapsed helical transactivation domain conformations relative to the populations observed in non-covalent binding simulations and we identify networks of protein-ligand interactions that stabilize collapsed conformations in covalent adduct ensembles. We compare the populations of protein-ligand interactions observed in covalent adduct ensembles to those observed in non-covalent ligand-bound ensembles and find substantial differences. Our results provide atomically detailed descriptions of covalent adducts formed by small molecules and an intrinsically disordered protein and suggest strategies for developing more potent covalent inhibitors of intrinsically disordered proteins.
Affiliation(s)
- Jiaqi Zhu
- Department of Chemistry, Dartmouth College, Hanover, NH, USA
|
15
|
Day EC, Chittari SS, Cunha KC, Zhao RJ, Dodds JN, Davis DC, Baker ES, Berlow RB, Shea JE, Kulkarni RU, Knight AS. A High-Throughput Workflow to Analyze Sequence-Conformation Relationships and Explore Hydrophobic Patterning in Disordered Peptoids. Chem 2024; 10:3444-3458. [PMID: 39582487 PMCID: PMC11580747 DOI: 10.1016/j.chempr.2024.07.025] [Indexed: 11/26/2024]
Abstract
Understanding how a macromolecule's primary sequence governs its conformational landscape is crucial for elucidating its function, yet these design principles are still emerging for macromolecules with intrinsic disorder. Herein, we introduce a high-throughput workflow that implements a practical colorimetric conformational assay, introduces a semi-automated sequencing protocol using MALDI-MS/MS, and develops a generalizable sequence-structure algorithm. Using a model system of 20mer peptidomimetics containing polar glycine and hydrophobic N-butylglycine residues, we identified nine classifications of conformational disorder and isolated 122 unique sequences across varied compositions and conformations. Conformational distributions of three compositionally identical library sequences were corroborated through atomistic simulations and ion mobility spectrometry coupled with liquid chromatography. A data-driven strategy was developed using existing sequence variables and data-derived 'motifs' to inform a machine learning algorithm towards conformation prediction. This multifaceted approach enhances our understanding of sequence-conformation relationships and offers a powerful tool for accelerating the discovery of materials with conformational control.
Affiliation(s)
- Erin C. Day
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Supraja S. Chittari
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Keila C. Cunha
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, USA
| | - Roy J. Zhao
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, USA
| | - James N. Dodds
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Delaney C. Davis
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Erin S. Baker
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Rebecca B. Berlow
- Department of Biochemistry and Biophysics and Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599 USA
| | - Joan-Emma Shea
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, Santa Barbara, California 93106, USA
| | | | - Abigail S. Knight
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Lead contact
|
16
|
Colas K, Bindl D, Suga H. Selection of Nucleotide-Encoded Mass Libraries of Macrocyclic Peptides for Inaccessible Drug Targets. Chem Rev 2024; 124:12213-12241. [PMID: 39451037 PMCID: PMC11565579 DOI: 10.1021/acs.chemrev.4c00422] [Received: 06/03/2024] [Revised: 10/02/2024] [Accepted: 10/04/2024] [Indexed: 10/26/2024]
Abstract
Technological advances and breakthrough developments in the pharmaceutical field are knocking at the door of the "undruggable" fortress with increasing insistence. Notably, the 21st century has seen the emergence of macrocyclic compounds, among which cyclic peptides are of particular interest. This new class of potential drug candidates occupies the vast chemical space between classic small-molecule drugs and larger protein-based therapeutics, such as antibodies. As research advances toward clinical targets that have long been considered inaccessible, macrocyclic peptides are well-suited to tackle these challenges in a post-rule of 5 pharmaceutical landscape. Facilitating their discovery is an arsenal of high-throughput screening methods that exploit massive randomized libraries of genetically encoded compounds. These techniques benefit from the incorporation of non-natural moieties, such as non-proteinogenic amino acids or stabilizing hydrocarbon staples. Exploiting these features for the strategic architectural design of macrocyclic peptides has the potential to tackle challenging targets such as protein-protein interactions, which have long resisted research efforts. This Review summarizes the basic principles and recent developments of the main high-throughput techniques for the discovery of macrocyclic peptides and focuses on their specific deployment for targeting undruggable space. A particular focus is placed on the development of new design guidelines and principles for the cyclization and structural stabilization of cyclic peptides and the resulting success stories achieved against well-known inaccessible drug targets.
Affiliation(s)
- Kilian Colas
- University of Tokyo, Department of Chemistry, Graduate School of Science 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
| | - Daniel Bindl
- University of Tokyo, Department of Chemistry, Graduate School of Science 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
| | - Hiroaki Suga
- University of Tokyo, Department of Chemistry, Graduate School of Science 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
|
17
|
Pettitt AJ, Shukla VK, Figueiredo AM, Newton LS, McCarthy S, Tabor AB, Heller GT, Lorenz CD, Hansen DF. An integrative characterization of proline cis and trans conformers in a disordered peptide. Biophys J 2024; 123:3798-3811. [PMID: 39340152 PMCID: PMC11560310 DOI: 10.1016/j.bpj.2024.09.028] [Received: 06/09/2024] [Revised: 09/11/2024] [Accepted: 09/25/2024] [Indexed: 09/30/2024]
Abstract
Intrinsically disordered proteins (IDPs) often contain proline residues that undergo cis/trans isomerization. While molecular dynamics (MD) simulations have the potential to fully characterize the proline cis and trans subensembles, they are limited by the slow timescales of isomerization and force field inaccuracies. NMR spectroscopy can report on ensemble-averaged observables for both the cis-proline and trans-proline states, but a full atomistic characterization of these conformers is challenging. Given the importance of proline cis/trans isomerization for influencing the conformational sampling of disordered proteins, we employed a combination of all-atom MD simulations with enhanced sampling (metadynamics), NMR, and small-angle x-ray scattering (SAXS) to characterize the two subensembles of the ORF6 C-terminal region (ORF6CTR) from SARS-CoV-2 corresponding to the proline-57 (P57) cis and trans states. We performed MD simulations in three distinct force fields: AMBER03ws, AMBER99SB-disp, and CHARMM36m, which are all optimized for disordered proteins. Each simulation was run for an accumulated time of 180-220 μs until convergence was reached, as assessed by blocking analysis. A good agreement between the cis-P57 populations predicted from metadynamic simulations in AMBER03ws was observed with populations obtained from experimental NMR data. Moreover, we observed good agreement between the radius of gyration predicted from the metadynamic simulations in AMBER03ws and that measured using SAXS. Our findings suggest that both the cis-P57 and trans-P57 conformations of ORF6CTR are extremely dynamic and that interdisciplinary approaches combining both multiscale computations and experiments offer avenues to explore highly dynamic states that cannot be reliably characterized by either approach in isolation.
Affiliation(s)
- Alice J Pettitt
- Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom; Department of Engineering, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, London, United Kingdom; The Francis Crick Institute, London, United Kingdom
| | - Vaibhav Kumar Shukla
- Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom; The Francis Crick Institute, London, United Kingdom
| | | | - Lydia S Newton
- Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom
| | - Stephen McCarthy
- Department of Chemistry, Faculty of Mathematical and Physical Sciences, London, United Kingdom
| | - Alethea B Tabor
- Department of Chemistry, Faculty of Mathematical and Physical Sciences, London, United Kingdom
| | - Gabriella T Heller
- Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom
| | - Christian D Lorenz
- Department of Engineering, Faculty of Natural, Mathematical and Engineering Sciences, King's College London, London, United Kingdom
| | - D Flemming Hansen
- Department of Structural and Molecular Biology, Division of Biosciences, London, United Kingdom; The Francis Crick Institute, London, United Kingdom
|
18
|
Wohl S, Gilron Y, Zheng W. Structural and Functional Relevance of Charge Based Transient Interactions inside Intrinsically Disordered Proteins. bioRxiv 2024:2024.10.30.621161. [PMID: 39554085 PMCID: PMC11565980 DOI: 10.1101/2024.10.30.621161] [Indexed: 11/19/2024]
Abstract
Intrinsically disordered proteins (IDPs) perform a wide range of biological functions without adopting stable, well-defined, three-dimensional structures. Instead, IDPs exist as dynamic ensembles of flexible conformations, traditionally thought to be governed by weak, nonspecific interactions, which are well described by homopolymer theory. However, recent research highlights the presence of transient, specific interactions in several IDPs, suggesting that factors beyond overall size influence their conformational behavior. In this study, we investigate how the spatial arrangement of charged amino acids within IDP sequences shapes the prevalence of transient, specific interactions. Through a series of model peptides, we establish a quantitative empirical relationship between the fraction of transient interactions and a novel sequence metric, termed effective charged patch length, which characterizes the ability of charged patches to drive these interactions. By examining IDP ensembles with varying levels of transient interactions, we further explore their heteropolymeric structural behavior in phase-separated condensates, where we observe the formation of a condensate-spanning network structure. Additionally, we perform a proteome-wide scan for charge-based transient interactions within disordered regions of the human proteome, revealing that approximately 10% of these regions exhibit such charge-driven transient interactions, leading to heteropolymeric behaviors in their conformational ensembles. Finally, we examine how these charge-based transient interactions correlate with molecular functions, identifying specific biological roles in which these interactions are enriched.
Affiliation(s)
- Samuel Wohl
- Department of Physics, Arizona State University, Tempe, AZ 85287, USA
| | - Yishai Gilron
- College of Integrative Sciences and Arts, Arizona State University, Mesa, AZ 85212, USA
| | - Wenwei Zheng
- College of Integrative Sciences and Arts, Arizona State University, Mesa, AZ 85212, USA
|
19
|
González-Delgado J, Bernadó P, Neuvial P, Cortés J. Weighted families of contact maps to characterize conformational ensembles of (highly-)flexible proteins. Bioinformatics 2024; 40:btae627. [PMID: 39432675 PMCID: PMC11530230 DOI: 10.1093/bioinformatics/btae627] [Received: 07/12/2024] [Revised: 09/17/2024] [Accepted: 10/16/2024] [Indexed: 10/23/2024]
Abstract
MOTIVATION: Characterizing the structure of flexible proteins, particularly within the realm of intrinsic disorder, presents a formidable challenge due to their high conformational variability. Currently, their structural representation relies on (possibly large) conformational ensembles derived from a combination of experimental and computational methods. The detailed structural analysis of these ensembles is a difficult task, for which existing tools have limited effectiveness. RESULTS: This study proposes an innovative extension of the concept of contact maps to the ensemble framework, incorporating the intrinsic probabilistic nature of disordered proteins. Within this framework, a conformational ensemble is characterized through a weighted family of contact maps. To achieve this, conformations are first described using a refined definition of contact that appropriately accounts for the geometry of the inter-residue interactions and the sequence context. Representative structural features of the ensemble naturally emerge from the subsequent clustering of the resulting contact-based descriptors. Importantly, transiently populated structural features are readily identified within large ensembles. The performance of the method is illustrated by several use cases and compared with other existing approaches, highlighting its superiority in capturing relevant structural features of highly flexible proteins. AVAILABILITY AND IMPLEMENTATION: An open-source implementation of the method is provided together with an easy-to-use Jupyter notebook, available at https://gitlab.laas.fr/moma/WARIO.
Affiliation(s)
- Javier González-Delgado
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
- Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, 31400 Toulouse, France
| | - Pau Bernadó
- Centre de Biologie Structurale, Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
| | - Pierre Neuvial
- Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, 31400 Toulouse, France
| | - Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
|
20
|
Cao F, von Bülow S, Tesei G, Lindorff‐Larsen K. A coarse-grained model for disordered and multi-domain proteins. Protein Sci 2024; 33:e5172. [PMID: 39412378 PMCID: PMC11481261 DOI: 10.1002/pro.5172] [Received: 02/01/2024] [Revised: 07/12/2024] [Accepted: 08/23/2024] [Indexed: 10/20/2024]
Abstract
Many proteins contain more than one folded domain, and such modular multi-domain proteins help expand the functional repertoire of proteins. Because of their larger size and often substantial dynamics, it may be difficult to characterize the conformational ensembles of multi-domain proteins by simulations. Here, we present a coarse-grained model for multi-domain proteins that is both fast and provides an accurate description of the global conformational properties in solution. We show that the accuracy of a one-bead-per-residue coarse-grained model depends on how the interaction sites in the folded domains are represented. Specifically, we find excessive domain-domain interactions if the interaction sites are located at the position of the Cα atoms. We also show that if the interaction sites are located at the center of mass of the residue, we obtain good agreement between simulations and experiments across a wide range of proteins. We then optimize our previously described CALVADOS model using this center-of-mass representation, and validate the resulting model using independent data. Finally, we use our revised model to simulate phase separation of both disordered and multi-domain proteins, and to examine how the stability of folded domains may differ between the dilute and dense phases. Our results provide a starting point for understanding interactions between folded and disordered regions in proteins, and how these regions affect the propensity of proteins to self-associate and undergo phase separation.
Affiliation(s)
- Fan Cao
- Structural Biology and NMR Laboratory & the Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Sören von Bülow
- Structural Biology and NMR Laboratory & the Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Giulio Tesei
- Structural Biology and NMR Laboratory & the Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Kresten Lindorff‐Larsen
- Structural Biology and NMR Laboratory & the Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
21
|
Davis MC, André AAM, Kjaergaard M. Entering the Next Phase: Predicting Biological Effects of Biomolecular Condensates. J Mol Biol 2024; 436:168645. [PMID: 38848869 DOI: 10.1016/j.jmb.2024.168645] [Received: 04/12/2024] [Revised: 06/02/2024] [Accepted: 06/03/2024] [Indexed: 06/09/2024]
Abstract
Biomolecular condensates are increasingly recognized as important drivers of cellular function; their dysregulation leads to pathology and disease. We discuss three questions in terms of the impending utility of data-driven techniques to predict condensate-driven biological outcomes, i.e., the impact of cellular state changes on condensates, the effect of condensates on biochemical processes within, and condensate properties that result in cellular dysregulation and disease.
Affiliation(s)
- Maria C Davis
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | - Alain A M André
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | - Magnus Kjaergaard
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark; The Danish Research Institute for Translational Neuroscience (DANDRITE), Denmark
|
22
|
Piovesan D, Del Conte A, Mehdiabadi M, Aspromonte MC, Blum M, Tesei G, von Bülow S, Lindorff-Larsen K, Tosatto SCE. MobiDB in 2025: integrating ensemble properties and function annotations for intrinsically disordered proteins. Nucleic Acids Res 2024:gkae969. [PMID: 39470701 DOI: 10.1093/nar/gkae969] [Received: 09/13/2024] [Revised: 10/07/2024] [Accepted: 10/11/2024] [Indexed: 10/30/2024]
Abstract
The MobiDB database (URL: https://mobidb.org/) aims to provide structural and functional information about intrinsic protein disorder, aggregating annotations from the literature, experimental data, and predictions for all known protein sequences. Here, we describe the improvements made to our resource to capture more information, simplify access to the aggregated data, and increase documentation of all MobiDB features. Compared to the previous release, all underlying pipeline modules were updated. The prediction module is ten times faster and can detect if a predicted disordered region is structurally extended or compact. The PDB component is now able to process large cryo-EM structures extending the number of processed entries. The entry page has been restyled to highlight functional aspects of disorder and all graphical modules have been completely reimplemented for better flexibility and faster rendering. The server has been improved to optimise bulk downloads. Annotation provenance has been standardised by adopting ECO terms. Finally, we propagated disorder function (IDPO and GO terms) from the DisProt database exploiting sequence similarity and protein embeddings. These improvements, along with the addition of comprehensive training material, offer a more intuitive interface and novel functional knowledge about intrinsic disorder.
Affiliation(s)
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padua 35131, Italy
| | - Alessio Del Conte
- Department of Biomedical Sciences, University of Padova, Padua 35131, Italy
| | - Mahta Mehdiabadi
- Department of Biomedical Sciences, University of Padova, Padua 35131, Italy
| | | | - Matthias Blum
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Giulio Tesei
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Sören von Bülow
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Padua 35131, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
| |
Collapse
|
23
|
Poveda-Cuevas SA, Lohachova K, Markusic B, Dikic I, Hummer G, Bhaskara RM. Intrinsically disordered region amplifies membrane remodeling to augment selective ER-phagy. Proc Natl Acad Sci U S A 2024; 121:e2408071121. [PMID: 39453744 PMCID: PMC11536123 DOI: 10.1073/pnas.2408071121] [Received: 04/22/2024] [Accepted: 08/27/2024] [Indexed: 10/27/2024]
Abstract
Intrinsically disordered regions (IDRs) play a pivotal role in organellar remodeling. They transduce signals across membranes, scaffold signaling complexes, and mediate vesicular traffic. Their functions are regulated by constraining conformational ensembles through specific intra- and intermolecular interactions, physical tethering, and posttranslational modifications. The endoplasmic reticulum (ER)-phagy receptor FAM134B/RETREG1, known for its reticulon homology domain (RHD), includes a substantial C-terminal IDR housing the LC3 interacting motif. Beyond engaging the autophagic machinery, the function of the FAM134B-IDR is unclear. Here, we investigate the characteristics of the FAM134B-IDR by extensive modeling and molecular dynamics simulations. We present detailed structural models for the IDR, mapping its conformational landscape in solution and membrane-anchored configurations. Our analysis reveals that depending on the membrane anchor, the IDRs collapse onto the membrane and induce positive membrane curvature to varying degrees. The charge patterns underlying this Janus-like behavior are conserved across other ER-phagy receptors. We found that IDRs alone are sufficient to sense curvature. When combined with RHDs, they intensify membrane remodeling and drive efficient protein clustering, leading to faster budding, thereby amplifying RHD remodeling functions. Our simulations provide a perspective on IDRs of FAM134B, their Janus-like membrane interactions, and the resulting modulatory functions during large-scale ER remodeling.
Affiliation(s)
- Sergio Alejandro Poveda-Cuevas
- Goethe University Frankfurt, School of Medicine, Institute of Biochemistry II, Frankfurt am Main 60590, Germany
- Goethe University Frankfurt, Buchmann Institute for Molecular Life Sciences, Riedberg Campus, Frankfurt am Main 60438, Germany
| | - Kateryna Lohachova
- Goethe University Frankfurt, School of Medicine, Institute of Biochemistry II, Frankfurt am Main 60590, Germany
- Goethe University Frankfurt, Buchmann Institute for Molecular Life Sciences, Riedberg Campus, Frankfurt am Main 60438, Germany
| | - Borna Markusic
- Goethe University Frankfurt, School of Medicine, Institute of Biochemistry II, Frankfurt am Main 60590, Germany
- International Max Planck Research School on Cellular Biophysics, Max-von-Laue-Strasse 3, Frankfurt am Main 60438, Germany
| | - Ivan Dikic
- Goethe University Frankfurt, School of Medicine, Institute of Biochemistry II, Frankfurt am Main 60590, Germany
- Goethe University Frankfurt, Buchmann Institute for Molecular Life Sciences, Riedberg Campus, Frankfurt am Main 60438, Germany
| | - Gerhard Hummer
- Max-Planck Institute of Biophysics, Department of Theoretical Biophysics, Frankfurt am Main 60438, Germany
- Goethe University Frankfurt, Department of Physics, Institute of Biophysics, Frankfurt am Main 60438, Germany
| | - Ramachandra M. Bhaskara
- Goethe University Frankfurt, School of Medicine, Institute of Biochemistry II, Frankfurt am Main 60590, Germany
- Goethe University Frankfurt, Buchmann Institute for Molecular Life Sciences, Riedberg Campus, Frankfurt am Main 60438, Germany
|
24
|
Shokhen M, Albeck A, Borisov V, Israel Y, Levy NS, Levy AP. Conformational analysis of the IQSEC2 protein by statistical thermodynamics. Curr Res Struct Biol 2024; 8:100158. [PMID: 39431217 PMCID: PMC11490877 DOI: 10.1016/j.crstbi.2024.100158] [Received: 08/01/2024] [Revised: 09/09/2024] [Accepted: 10/01/2024] [Indexed: 10/22/2024]
Abstract
Mutations in the IQSEC2 gene result in severe intellectual disability, epilepsy and autism. The primary function of IQSEC2 is to serve as a guanine nucleotide exchange factor (GEF) controlling the activation of ARF6, which in turn mediates membrane trafficking and synaptic connections between neurons. As IQSEC2 is a large intrinsically disordered protein, little is known about its structure and how this influences its function. Understanding this structure-function relationship is critical for the development of novel therapies to treat IQSEC2 disease. We therefore sought to identify IQSEC2 conformers in unfolded and folded states and analyze how conformers differ when binding to ARF6 and thereby influence GEF catalysis. We simulated the folding process of IQSEC2 by accelerated molecular dynamics (aMD). Following the ensemble method of Gibbs, we proposed that the number of microstates in the ensemble replicating a protein macroscopic system is the total number of MD snapshots sampled on the production MD trajectory. We divided the entire range of the reaction coordinate into a series of consecutive, non-overlapping bins. Thermal fluctuations of biomolecules in local equilibrium states are Gaussian in form. To predict the free energy and entropy of different conformational states using statistical thermodynamics, the density of states was estimated taking into account how many MD snapshots constitute each conformational state. IQSEC2 dimers derived from the most stable folded and unfolded conformers of IQSEC2 were generated by protein-protein docking and then used to construct IQSEC2-ARF6 encounter complexes. We suggest that IQSEC2 folding and dimerization are two competing processes that may be used by nature to regulate the process of GDP exchange on ARF6 catalyzed by IQSEC2.
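The binning step described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible reading, in which the relative free energy of bin i is obtained from its snapshot count n_i as F_i = -k_B T ln(n_i / N), with N the total number of snapshots. The function name, the units, and the example values are hypothetical, and the sketch does not attempt the boost-potential reweighting that an aMD trajectory would normally require, nor the authors' actual density-of-states procedure.

```python
import math
from collections import Counter

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def bin_free_energies(values, lo, hi, n_bins, temperature=300.0):
    """Relative free energy per occupied bin of a reaction coordinate.

    values: reaction-coordinate value of each MD snapshot (hypothetical input).
    Returns {bin_index: free energy in kcal/mol}, shifted so the most
    populated bin sits at zero. Empty bins are omitted.
    """
    width = (hi - lo) / n_bins
    counts = Counter()
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts.values())
    kt = K_B * temperature
    # Boltzmann inversion of the bin populations.
    free = {i: -kt * math.log(c / total) for i, c in counts.items()}
    f_min = min(free.values())
    return {i: f - f_min for i, f in free.items()}


# Toy example: 8 snapshots in one state, 2 in another.
profile = bin_free_energies([0.1] * 8 + [0.9] * 2, lo=0.0, hi=1.0, n_bins=2)
```

In this toy case the less populated bin lies k_B T ln 4 above the more populated one, since the population ratio is 8:2.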
Affiliation(s)
- Michael Shokhen
- Department of Chemistry, Bar Ilan University, Ramat Gan, Israel
| | - Amnon Albeck
- Department of Chemistry, Bar Ilan University, Ramat Gan, Israel
| | - Veronika Borisov
- Technion Faculty of Medicine, Technion Israel Institute of Technology, Haifa, Israel
| | - Yonat Israel
- Technion Faculty of Medicine, Technion Israel Institute of Technology, Haifa, Israel
| | - Nina S. Levy
- Technion Faculty of Medicine, Technion Israel Institute of Technology, Haifa, Israel
| | - Andrew P. Levy
- Technion Faculty of Medicine, Technion Israel Institute of Technology, Haifa, Israel
|
25
|
Krokengen OC, Touma C, Mularski A, Sutinen A, Dunkel R, Ytterdal M, Raasakka A, Mertens HDT, Simonsen AC, Kursula P. The cytoplasmic tail of myelin protein zero induces morphological changes in lipid membranes. Biochimica et Biophysica Acta. Biomembranes 2024; 1866:184368. [PMID: 38971517 DOI: 10.1016/j.bbamem.2024.184368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 06/24/2024] [Accepted: 07/01/2024] [Indexed: 07/08/2024]
Abstract
The major myelin protein expressed by the peripheral nervous system Schwann cells is protein zero (P0), which represents 50% of the total protein content in myelin. This 30-kDa integral membrane protein consists of an immunoglobulin (Ig)-like domain, a transmembrane helix, and a 69-residue C-terminal cytoplasmic tail (P0ct). The basic residues in P0ct contribute to the tight packing of myelin lipid bilayers, and alterations in the tail affect how P0 functions as an adhesion molecule necessary for the stability of compact myelin. Several neurodegenerative neuropathies are related to P0, including the more common Charcot-Marie-Tooth disease (CMT) and Dejerine-Sottas syndrome (DSS) as well as rare cases of motor and sensory polyneuropathy. We found that high P0ct concentrations affected the membrane properties of bicelles and induced a lamellar-to-inverted hexagonal phase transition, which caused bicelles to fuse into long, protein-containing filament-like structures. These structures likely reflect the formation of semicrystalline lipid domains with potential relevance for myelination. Not only is P0ct important for stacking lipid membranes, but time-lapse fluorescence microscopy also shows that it might affect membrane properties during myelination. We further describe recombinant production and low-resolution structural characterization of full-length human P0. Our findings shed light on P0ct effects on membrane properties, and with the successful purification of full-length P0, we have new tools to study the role of P0 in myelin formation and maintenance in vitro.
Affiliation(s)
- Oda C Krokengen
- Department of Biomedicine, University of Bergen, Bergen, Norway
| | - Christine Touma
- Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, Oulu, Finland
| | - Anna Mularski
- Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense, Denmark
| | - Aleksi Sutinen
- Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, Oulu, Finland
| | - Ryan Dunkel
- Department of Biomedicine, University of Bergen, Bergen, Norway
| | - Marie Ytterdal
- Department of Biomedicine, University of Bergen, Bergen, Norway
| | - Arne Raasakka
- Department of Biomedicine, University of Bergen, Bergen, Norway
| | - Haydyn D T Mertens
- European Molecular Biology Laboratory EMBL, Hamburg Site, c/o DESY, Hamburg, Germany
| | - Adam Cohen Simonsen
- Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense, Denmark
| | - Petri Kursula
- Department of Biomedicine, University of Bergen, Bergen, Norway; Faculty of Biochemistry and Molecular Medicine & Biocenter Oulu, University of Oulu, Oulu, Finland.
|
26
|
Giraldo-Castaño MC, Littlejohn KA, Avecilla ARC, Barrera-Villamizar N, Quiroz FG. Programmability and biomedical utility of intrinsically-disordered protein polymers. Adv Drug Deliv Rev 2024; 212:115418. [PMID: 39094909 PMCID: PMC11389844 DOI: 10.1016/j.addr.2024.115418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Revised: 07/03/2024] [Accepted: 07/29/2024] [Indexed: 08/04/2024]
Abstract
Intrinsically disordered proteins (IDPs) exhibit molecular-level conformational dynamics that are functionally harnessed across a wide range of fascinating biological phenomena. The low sequence complexity of IDPs has led to the design and development of intrinsically-disordered protein polymers (IDPPs), a class of engineered repeat IDPs with stimuli-responsive properties. The perfect repetitive architecture of IDPPs allows for repeat-level encoding of tunable protein functionality. Designer IDPPs can be modeled on endogenous IDPs or engineered de novo as protein polymers with dual biophysical and biological functionality. Their properties can be rationally tailored to access enigmatic IDP biology and to create programmable smart biomaterials. With the goal of inspiring the bioengineering of multifunctional IDP-based materials, here we synthesize recent multidisciplinary progress in programming and exploiting the bio-functionality of IDPPs and IDPP-containing proteins. Collectively, expanding beyond the traditional sequence space of extracellular IDPs, emergent sequence-level control of IDPP functionality is fueling the bioengineering of self-assembling biomaterials, advanced drug delivery systems, tissue scaffolds, and biomolecular condensates (genetically encoded organelle-like structures). Looking forward, we emphasize open challenges and emerging opportunities, arguing that the intracellular behaviors of IDPPs represent a rich space for biomedical discovery and innovation. Combined with the intense focus on IDP biology, the growing landscape of IDPPs and their biomedical applications set the stage for the accelerated engineering of high-value biotechnologies and biomaterials.
Affiliation(s)
- Maria Camila Giraldo-Castaño
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Kai A Littlejohn
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Alexa Regina Chua Avecilla
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Natalia Barrera-Villamizar
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
| | - Felipe Garcia Quiroz
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA.
|
27
|
Phillips M, Muthukumar M, Ghosh K. Beyond monopole electrostatics in regulating conformations of intrinsically disordered proteins. PNAS Nexus 2024; 3:pgae367. [PMID: 39253398 PMCID: PMC11382291 DOI: 10.1093/pnasnexus/pgae367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 08/13/2024] [Indexed: 09/11/2024]
Abstract
Conformations and dynamics of an intrinsically disordered protein (IDP) depend on its composition of charged and uncharged amino acids, and their specific placement in the protein sequence. In general, the charge (positive or negative) on an amino acid residue in the protein is not a fixed quantity. Each of the ionizable groups can exist in an equilibrated distribution between a fully ionized (monopole) state and an ion-pair (dipole) state formed between the ionizing group and its counterion from the background electrolyte solution. The dipole formation (counterion condensation) depends on the protein conformation, which in turn depends on the distribution of charges and dipoles on the molecule. Consequently, effective charges of ionizable groups in the IDP backbone may differ from their chemical charges in isolation, a phenomenon termed charge-regulation. Accounting for the inevitable dipolar interactions, which have so far been ignored, and using a self-consistent procedure, we present a theory of charge-regulation as a function of sequence, temperature, and ionic strength. The theory quantitatively agrees with both charge reduction and salt-dependent conformation data of Prothymosin-alpha and makes several testable predictions. We predict that charged groups are less ionized in sequences where opposite charges are well mixed compared to sequences where they are strongly segregated. Emergence of dipolar interactions from charge-regulation allows spontaneous coexistence of two phases having different conformations and charge states, sensitively depending on the charge patterning. These findings highlight sequence-dependent charge-regulation and its potential exploitation by biological regulators such as phosphorylation and mutations in controlling protein conformation and function.
Affiliation(s)
- Michael Phillips
- Department of Physics and Astronomy, University of Denver, Denver, CO 80208, USA
| | - Murugappan Muthukumar
- Department of Polymer Science and Engineering, University of Massachusetts, Amherst, MA 01003, USA
| | - Kingshuk Ghosh
- Department of Physics and Astronomy, University of Denver, Denver, CO 80208, USA
- Molecular and Cellular Biophysics, University of Denver, Denver, CO 80208, USA
|
28
|
Manfredi M, Savojardo C, Iardukhin G, Salomoni D, Costantini A, Martelli PL, Casadio R. Alpha&ESMhFolds: A Web Server for Comparing AlphaFold2 and ESMFold Models of the Human Reference Proteome. J Mol Biol 2024; 436:168593. [PMID: 38718922 DOI: 10.1016/j.jmb.2024.168593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 04/22/2024] [Accepted: 04/30/2024] [Indexed: 05/16/2024]
Abstract
We develop a novel database, Alpha&ESMhFolds, which allows the direct comparison of AlphaFold2 and ESMFold predicted models for 42,942 proteins of the Reference Human Proteome, and when available, their comparison with 2,900 directly associated PDB structures with a structure-to-sequence coverage of at least 70%. Statistics indicate that good quality models tend to overlap with a TM-score >0.6 as long as some PDB structural information is available. As expected, a direct model superimposition to the PDB structure highlights that AlphaFold2 models are slightly superior to ESMFold ones. However, some 55% of the database is endowed with models overlapping with TM-score <0.6. This highlights the different outputs of the two methods. The database is freely available for usage at https://alpha-esmhfolds.biocomp.unibo.it/.
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy.
| | - Georgii Iardukhin
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
| | | | | | - Pier Luigi Martelli
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy.
| | - Rita Casadio
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
|
29
|
Robustelli P. Extending computational protein design to intrinsically disordered proteins. Science Advances 2024; 10:eadr3239. [PMID: 39196938 PMCID: PMC11352910 DOI: 10.1126/sciadv.adr3239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Accepted: 08/13/2024] [Indexed: 08/30/2024]
Abstract
Advances in the accuracy and throughput of molecular simulations usher in a new era in the structural biology of disordered proteins.
Affiliation(s)
- Paul Robustelli
- Department of Chemistry, Dartmouth College, Hanover, NH 03755, USA
|
30
|
Regina Chua Avecilla A, Thomas J, Quiroz FG. Genetically-encoded phase separation sensors for intracellular probing of biomolecular condensates. bioRxiv: The Preprint Server for Biology 2024:2024.08.29.610365. [PMID: 39257779 PMCID: PMC11383673 DOI: 10.1101/2024.08.29.610365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
Biomolecular condensates are dynamic membraneless compartments with enigmatic roles across intracellular phenomena. Intrinsically-disordered proteins (IDPs) often function as condensate scaffolds, fueled by their liquid-liquid phase separation (LLPS) dynamics. Intracellular probing of these condensates relies on live-cell imaging of IDP-scaffolds tagged with fluorescent proteins. Conformational heterogeneity in IDPs, however, renders them uniquely sensitive to molecular-level fusions, risking distortion of the native biophysical properties of IDP-scaffolds and their assemblies. Probing epidermal condensates in mouse skin, we recently introduced genetically encoded LLPS-sensors that circumvent the need for molecular-level tagging of skin IDPs. The concept of LLPS-sensors involves a shift in focus from subcellular tracking of IDP-scaffolds to higher-level observations that report on the assembly and liquid-dynamics of their condensates. Towards advancing the repertoire of intracellular LLPS-sensors, here we demonstrate biomolecular approaches for the evolution and tunability of epidermal LLPS-sensors and assess their impact in early and late stages of intracellular LLPS dynamics. Benchmarking against scaffold-bound fluorescent reporters, we found that tunable ultraweak scaffold-sensor interactions are key to the sensitive and innocuous probing of nascent and established biomolecular condensates. Our LLPS-sensitive tools pave the way for the high-fidelity intracellular probing of IDP-governed biomolecular condensates across biological systems.
Affiliation(s)
- Alexa Regina Chua Avecilla
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
| | - Jeremy Thomas
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
| | - Felipe Garcia Quiroz
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA
|
31
|
Pesce F, Bremer A, Tesei G, Hopkins JB, Grace CR, Mittag T, Lindorff-Larsen K. Design of intrinsically disordered protein variants with diverse structural properties. Science Advances 2024; 10:eadm9926. [PMID: 39196930 PMCID: PMC11352843 DOI: 10.1126/sciadv.adm9926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 06/07/2024] [Indexed: 08/30/2024]
Abstract
Intrinsically disordered proteins (IDPs) perform a broad range of functions in biology, suggesting that the ability to design IDPs could help expand the repertoire of proteins with novel functions. Computational design of IDPs with specific conformational properties has, however, been difficult because of their substantial dynamics and structural complexity. We describe a general algorithm for designing IDPs with specific structural properties. We demonstrate the power of the algorithm by generating variants of naturally occurring IDPs that differ in compaction, long-range contacts, and propensity to phase separate. We experimentally tested and validated our designs and analyzed the sequence features that determine conformations. We show how our results are captured by a machine learning model, enabling us to speed up the algorithm. Our work expands the toolbox for computational protein design and will facilitate the design of proteins whose functions exploit the many properties afforded by protein disorder.
Affiliation(s)
- Francesco Pesce
- Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Anne Bremer
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Giulio Tesei
- Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Jesse B. Hopkins
- BioCAT, Department of Physics, Illinois Institute of Technology, Chicago, IL 60616, USA
| | - Christy R. Grace
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Tanja Mittag
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
32
|
Aupič J, Pokorná P, Ruthstein S, Magistrato A. Predicting Conformational Ensembles of Intrinsically Disordered Proteins: From Molecular Dynamics to Machine Learning. J Phys Chem Lett 2024; 15:8177-8186. [PMID: 39093570 DOI: 10.1021/acs.jpclett.4c01544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
Intrinsically disordered proteins and regions (IDP/IDRs) are ubiquitous across all domains of life. Characterized by a lack of a stable tertiary structure, IDP/IDRs populate a diverse set of transiently formed structural states that can promiscuously adapt upon binding with specific interaction partners and/or certain alterations in environmental conditions. This malleability is foundational for their role as tunable interaction hubs in core cellular processes such as signaling, transcription, and translation. Tracing the conformational ensemble of an IDP/IDR and its perturbation in response to regulatory cues is thus paramount for illuminating its function. However, the conformational heterogeneity of IDP/IDRs poses several challenges. Here, we review experimental and computational methods devised to disentangle the conformational landscape of IDP/IDRs, highlighting recent computational advances that permit proteome-wide scans of IDP/IDRs conformations. We briefly evaluate selected computational methods using the disordered N-terminal of the human copper transporter 1 as a test case and outline further challenges in IDP/IDRs ensemble prediction.
Affiliation(s)
- Jana Aupič
- CNR-IOM at International School for Advanced Studies (SISSA/ISAS), via Bonomea 265, 34136 Trieste, Italy
| | - Pavlína Pokorná
- CNR-IOM at International School for Advanced Studies (SISSA/ISAS), via Bonomea 265, 34136 Trieste, Italy
| | - Sharon Ruthstein
- Department of Chemistry, Faculty of Exact Sciences and the Institute for Nanotechnology and Advanced Materials (BINA), Bar-Ilan University, 5290002 Ramat-Gan, Israel
| | - Alessandra Magistrato
- CNR-IOM at International School for Advanced Studies (SISSA/ISAS), via Bonomea 265, 34136 Trieste, Italy
|
33
|
Pal T, Wessén J, Das S, Chan HS. Differential Effects of Sequence-Local versus Nonlocal Charge Patterns on Phase Separation and Conformational Dimensions of Polyampholytes as Model Intrinsically Disordered Proteins. J Phys Chem Lett 2024; 15:8248-8256. [PMID: 39105804 DOI: 10.1021/acs.jpclett.4c01973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/07/2024]
Abstract
Conformational properties of intrinsically disordered proteins (IDPs) are governed by a sequence-ensemble relationship. To differentiate the impact of sequence-local versus sequence-nonlocal features of an IDP's charge pattern on its conformational dimensions and its phase-separation propensity, the charge "blockiness" κ and the nonlocality-weighted sequence charge decoration (SCD) parameters are compared for their correlations with isolated-chain radii of gyration (Rgs) and upper critical solution temperatures (UCSTs) of polyampholytes modeled by random phase approximation, field-theoretic simulation, and coarse-grained molecular dynamics. SCD is superior to κ in predicting Rg because SCD accounts for effects of contact order, i.e., nonlocality, on dimensions of isolated chains. In contrast, κ and SCD are comparably good, though nonideal, predictors of UCST because frequencies of interchain contacts in the multiple-chain condensed phase are less sensitive to sequence positions than frequencies of intrachain contacts of an isolated chain, as reflected by κ correlating better with condensed-phase interaction energy than SCD.
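The abstract names the SCD parameter without defining it. A minimal sketch follows, assuming the commonly used definition SCD = (1/N) Σ_{m>n} q_m q_n |m − n|^{1/2} over residue charges q. The charge table (K and R as +1, D and E as −1, all other residues including histidine as neutral) and the function names are illustrative assumptions, not taken from the paper.

```python
def charges_from_sequence(seq):
    """Map a one-letter amino acid sequence to integer charges.

    Assumption for illustration: K/R = +1, D/E = -1, everything else 0.
    """
    table = {"K": 1, "R": 1, "D": -1, "E": -1}
    return [table.get(residue, 0) for residue in seq]


def sequence_charge_decoration(charges):
    """SCD = (1/N) * sum over pairs m > n of q_m * q_n * sqrt(m - n)."""
    n_res = len(charges)
    total = 0.0
    for m in range(1, n_res):
        for n in range(m):
            total += charges[m] * charges[n] * (m - n) ** 0.5
    return total / n_res


# A blocky and a well-mixed polyampholyte of identical composition.
scd_blocky = sequence_charge_decoration(charges_from_sequence("EEKK"))
scd_mixed = sequence_charge_decoration(charges_from_sequence("EKEK"))
```

Because the pair weight grows with sequence separation, the charge-segregated sequence yields a more negative SCD than the alternating one, which is the nonlocal sensitivity the abstract contrasts with κ.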
Affiliation(s)
- Tanmoy Pal
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Jonas Wessén
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Suman Das
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Department of Chemistry, Gandhi Institute of Technology and Management, Visakhapatnam, Andhra Pradesh 530045, India
| | - Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
|
34
|
Cagliani R, Forni D, Mozzi A, Fuchs R, Tussia-Cohen D, Arrigoni F, Pozzoli U, De Gioia L, Hagai T, Sironi M. Evolution of Virus-like Features and Intrinsically Disordered Regions in Retrotransposon-derived Mammalian Genes. Mol Biol Evol 2024; 41:msae154. [PMID: 39101471 PMCID: PMC11299033 DOI: 10.1093/molbev/msae154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 07/16/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024] Open
Abstract
Several mammalian genes have originated from the domestication of retrotransposons, selfish mobile elements related to retroviruses. Some of the proteins encoded by these genes have maintained virus-like features, including self-processing, capsid structure formation, and the generation of different isoforms through -1 programmed ribosomal frameshifting. Using quantitative approaches in molecular evolution and biophysical analyses, we studied 28 retrotransposon-derived genes, with a focus on the evolution of virus-like features. By analyzing the rate of synonymous substitutions, we show that the -1 programmed ribosomal frameshifting mechanism in three of these genes (PEG10, PNMA3, and PNMA5) is conserved across mammals and gives rise to alternative proteins. These genes were targets of positive selection in primates, and one of the positively selected sites affects a B-cell epitope on the spike domain of the PNMA5 capsid, a finding reminiscent of observations in infectious viruses. More generally, we found that retrotransposon-derived proteins vary in their intrinsically disordered region content and this is directly associated with their evolutionary rates. Most positively selected sites in these proteins are located in intrinsically disordered regions and some of them impact protein posttranslational modifications, such as autocleavage and phosphorylation. Detailed analyses of the biophysical properties of intrinsically disordered regions showed that positive selection preferentially targeted regions with lower conformational entropy. Furthermore, positive selection introduces variation in binary sequence patterns across orthologues, as well as in chain compaction. Our results shed light on the evolutionary trajectories of a unique class of mammalian genes and suggest a novel approach to study how intrinsically disordered region biophysical characteristics are affected by evolution.
Affiliation(s)
- Rachele Cagliani
- Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
| | - Diego Forni
- Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
| | - Alessandra Mozzi
- Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
| | - Rotem Fuchs
- Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Dafna Tussia-Cohen
- Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Federica Arrigoni
- Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan 20126, Italy
| | - Uberto Pozzoli
- Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
| | - Luca De Gioia
- Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan 20126, Italy
| | - Tzachi Hagai
- Shmunis School of Biomedicine and Cancer Research, George S Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Manuela Sironi
- Scientific Institute IRCCS E. MEDEA, Computational Biology Unit, Bosisio Parini 23842, Italy
|
35
|
Hutchins CM, Gorfe AA. From disorder comes function: Regulation of small GTPase function by intrinsically disordered lipidated membrane anchor. Curr Opin Struct Biol 2024; 87:102869. [PMID: 38943706 PMCID: PMC11283958 DOI: 10.1016/j.sbi.2024.102869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/23/2024] [Accepted: 06/04/2024] [Indexed: 07/01/2024]
Abstract
The intrinsically disordered, lipid-modified membrane anchor of small GTPases is emerging as a critical modulator of function through its ability to sort lipids in a conformation-dependent manner. We reviewed recent computational and experimental studies that have begun to shed light on the sequence-ensemble-function relationship in this unique class of lipidated intrinsically disordered regions (LIDRs).
Affiliation(s)
- Chase M Hutchins
- Department of Integrative Biology and Pharmacology, McGovern Medical School, University of Texas Health Science Center at Houston, 6431 Fannin St., Houston, TX 77030, USA; Biochemistry and Cell Biology Program & Therapeutics and Pharmacology Program, MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, 6431 Fannin St., Houston, TX 77030, USA. https://twitter.com/chasedsims
| | - Alemayehu A Gorfe
- Department of Integrative Biology and Pharmacology, McGovern Medical School, University of Texas Health Science Center at Houston, 6431 Fannin St., Houston, TX 77030, USA; Biochemistry and Cell Biology Program & Therapeutics and Pharmacology Program, MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, 6431 Fannin St., Houston, TX 77030, USA.
|
36
|
Erdős G, Dosztányi Z. AIUPred: combining energy estimation with deep learning for the enhanced prediction of protein disorder. Nucleic Acids Res 2024; 52:W176-W181. [PMID: 38747347 PMCID: PMC11223784 DOI: 10.1093/nar/gkae385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 04/19/2024] [Accepted: 05/07/2024] [Indexed: 07/06/2024] Open
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) carry out important biological functions without relying on a single well-defined conformation. As these proteins are a challenge to study experimentally, computational methods play important roles in their characterization. One of the commonly used tools is the IUPred web server, which provides prediction of disordered regions and their binding sites. IUPred is rooted in a simple biophysical model and uses a limited number of parameters largely derived from globular protein structures only. This enabled an incredibly fast and robust prediction method; however, its limitations have also become apparent in light of recent breakthrough methods using deep learning techniques. Here, we present AIUPred, a novel version of IUPred which incorporates deep learning techniques into the energy estimation framework. It achieves improved performance while keeping the robustness of the original method. Based on the evaluation of recent benchmark datasets, AIUPred scored amongst the top three single-sequence-based methods. With a new web server we offer fast and reliable visual analysis for users as well as options to analyze whole genomes in mere seconds with the downloadable package. AIUPred is available at https://aiupred.elte.hu.
Affiliation(s)
- Gábor Erdős
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
| | - Zsuzsanna Dosztányi
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
|
37
|
Jung J, Yagi K, Tan C, Oshima H, Mori T, Yu I, Matsunaga Y, Kobayashi C, Ito S, Ugarte La Torre D, Sugita Y. GENESIS 2.1: High-Performance Molecular Dynamics Software for Enhanced Sampling and Free-Energy Calculations for Atomistic, Coarse-Grained, and Quantum Mechanics/Molecular Mechanics Models. J Phys Chem B 2024; 128:6028-6048. [PMID: 38876465 PMCID: PMC11215777 DOI: 10.1021/acs.jpcb.4c02096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 05/15/2024] [Accepted: 05/21/2024] [Indexed: 06/16/2024]
Abstract
GENeralized-Ensemble SImulation System (GENESIS) is a molecular dynamics (MD) software developed to simulate the conformational dynamics of a single biomolecule, as well as molecular interactions in large biomolecular assemblies and between multiple biomolecules in cellular environments. To achieve the latter purpose, the earlier versions of GENESIS emphasized high performance in atomistic MD simulations on massively parallel supercomputers, with or without graphics processing units (GPUs). Here, we implemented multiscale MD simulations that include atomistic, coarse-grained, and hybrid quantum mechanics/molecular mechanics (QM/MM) calculations. They demonstrate high performance and are integrated with enhanced conformational sampling algorithms and free-energy calculations without using external programs except for the QM programs. In this article, we review new functions, molecular models, and other essential features in GENESIS version 2.1 and discuss ongoing developments for future releases.
Affiliation(s)
- Jaewoon Jung
  - Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Kiyoshi Yagi
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Cheng Tan
  - Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- Hiraku Oshima
  - Laboratory for Biomolecular Function Simulation, RIKEN Center for Biosystems Dynamics Research, Kobe, Hyogo 650-0047, Japan
  - Graduate School of Life Science, University of Hyogo, Harima Science Park City, Hyogo 678-1297, Japan
- Takaharu Mori
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
  - Department of Chemistry, Tokyo University of Science, Shinjuku-ku, Tokyo 162-8601, Japan
- Isseki Yu
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
  - Department of Bioinformatics, Maebashi Institute of Technology, Maebashi, Gunma 371-0816, Japan
- Yasuhiro Matsunaga
  - Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
  - Graduate School of Science and Engineering, Saitama University, Saitama 338-8570, Japan
- Chigusa Kobayashi
  - Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- Shingo Ito
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Diego Ugarte La Torre
  - Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- Yuji Sugita
  - Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
  - Laboratory for Biomolecular Function Simulation, RIKEN Center for Biosystems Dynamics Research, Kobe, Hyogo 650-0047, Japan
38
Chen J, Li Q, Xia S, Arsala D, Sosa D, Wang D, Long M. The Rapid Evolution of De Novo Proteins in Structure and Complex. Genome Biol Evol 2024; 16:evae107. [PMID: 38753069 PMCID: PMC11149777 DOI: 10.1093/gbe/evae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/10/2024] [Indexed: 06/06/2024] Open
Abstract
Recent genome-wide studies in rice have established that de novo genes, evolving from noncoding sequences, enhance protein diversity through a stepwise process. However, the pattern and rate of their evolution in protein structure over time remain unclear. Here, we addressed these issues within a surprisingly short evolutionary timescale (<1 million years for 97% of Oryza de novo genes) with comparative approaches to gene duplicates. We found that de novo genes evolve faster than gene duplicates in the intrinsically disordered regions (such as random coils), secondary structure elements (such as α helix and β strand), hydrophobicity, and molecular recognition features. In de novo proteins, specifically, we observed an 8% to 14% decay in random coils and intrinsically disordered region lengths and a 2.3% to 6.5% increase in structured elements, hydrophobicity, and molecular recognition features, per million years on average. These patterns of structural evolution align with changes in amino acid composition over time as well. We also revealed higher positive charges but smaller molecular weights for de novo proteins than for duplicates. Tertiary structure predictions showed that most de novo proteins, though not typically well folded on their own, readily form low-energy and compact complexes with other proteins facilitated by extensive residue contacts and conformational flexibility, suggesting a faster-binding scenario in de novo proteins to promote interaction. These analyses illuminate a rapid evolution of protein structure in de novo genes in rice genomes, originating from noncoding sequences, highlighting their quick transformation into active, protein complex-forming components within a remarkably short evolutionary timeframe.
Affiliation(s)
- Jianhai Chen
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Qingrong Li
  - Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Shengqian Xia
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Deanna Arsala
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Dylan Sosa
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
- Dong Wang
  - Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Cellular & Molecular Medicine, School of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Manyuan Long
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA
39
Ginell GM, Emenecker RJ, Lotthammer JM, Usher ET, Holehouse AS. Direct prediction of intermolecular interactions driven by disordered regions. bioRxiv 2024:2024.06.03.597104. [PMID: 38895487 PMCID: PMC11185574 DOI: 10.1101/2024.06.03.597104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Intrinsically disordered regions (IDRs) are critical for a wide variety of cellular functions, many of which involve interactions with partner proteins. Molecular recognition is typically considered through the lens of sequence-specific binding events. However, a growing body of work has shown that IDRs often interact with partners in a manner that does not depend on the precise order of the amino acids and is instead driven by complementary chemical interactions leading to disordered bound-state complexes. Despite this emerging paradigm, we lack tools to describe, quantify, predict, and interpret these types of structurally heterogeneous interactions from the underlying amino acid sequences. Here, we repurpose the chemical physics developed originally for molecular simulations to develop an approach for predicting intermolecular interactions between IDRs and partner proteins. Our approach enables the direct prediction of phase diagrams, the identification of chemically specific interaction hotspots on IDRs, and a route to develop and test mechanistic hypotheses regarding IDR function in the context of molecular recognition. We use our approach to examine a range of systems and questions to highlight its versatility and applicability.
Affiliation(s)
- Garrett M. Ginell
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
- Ryan J. Emenecker
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
- Jeffrey M. Lotthammer
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
- Emery T. Usher
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
- Alex S. Holehouse
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO
40
Khan MI, Pathania S, Al-Rabia MW, Ethayathulla AS, Khan MI, Allemailem KS, Azam M, Hariprasad G, Imran MA. MolDy: molecular dynamics simulation made easy. Bioinformatics 2024; 40:btae313. [PMID: 38867698 PMCID: PMC11187490 DOI: 10.1093/bioinformatics/btae313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/26/2024] [Accepted: 06/11/2024] [Indexed: 06/14/2024] Open
Abstract
MOTIVATION: Molecular dynamics (MD) is a computational experiment that is crucial for understanding the structure of biological macro- and micromolecules, their folding, and their intermolecular interactions. Accurate knowledge of these structural features is the cornerstone of drug development and of elucidating macromolecular functions. The open-source GROMACS biomolecular MD simulation program is recognized as a reliable and frequently used simulation program for its precision. However, the user requires expertise and scripting skills to carry out MD simulations. RESULTS: We have developed an end-to-end interactive MD simulation application, MolDy, for GROMACS. This front-end application provides a customizable user interface integrated with a Python- and Perl-based logical backend connecting the Linux shell and the GROMACS software. The tool performs analysis and provides the user with simulation trajectories and graphical representations of relevant biophysical parameters. The advantages of MolDy are that it (i) is user-friendly and does not require the researcher to have prior knowledge of Linux; (ii) installs easily with a single command; (iii) is freely available for academic research; (iv) can run with a minimum configuration of operating systems; and (v) has valid default prefilled parameters for beginners while providing scope for modifications by expert users. AVAILABILITY AND IMPLEMENTATION: MolDy is freely available as compressed source code files with a user manual for installation and operation on GitHub: https://github.com/AIBResearchMolDy/Moldyv01.git and at https://aibresearch.com/innovations.
Affiliation(s)
- Mohd Imran Khan
  - Division of Bioinformatics, AIBR Artificial Intelligence and Biochemical Research Pvt. Ltd., New Delhi 110076, India
- Sheetal Pathania
  - Division of Bioinformatics, AIBR Artificial Intelligence and Biochemical Research Pvt. Ltd., New Delhi 110076, India
- Mohammed W Al-Rabia
  - Department of Clinical Microbiology and Immunology, Faculty of Medicine, King Abdul Aziz University, Jeddah 21589, Saudi Arabia
  - Department of Clinical and Molecular Microbiology Laboratory, King Abdulaziz University Hospital, Jeddah 21589, Saudi Arabia
- Abdul S Ethayathulla
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi 110029, India
- Mohammad Imran Khan
  - Research Center, King Faisal Specialist Hospital and Research Center, Jeddah 21589, Saudi Arabia
- Khaled S Allemailem
  - Department of Medical Laboratories, College of Applied Medical Sciences, Qassim University, Buraydah 51452, Saudi Arabia
- Mohd Azam
  - Department of Medical Laboratories, College of Applied Medical Sciences, Qassim University, Buraydah 51452, Saudi Arabia
- Gururao Hariprasad
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi 110029, India
- Mohammad Azhar Imran
  - Division of Bioinformatics, AIBR Artificial Intelligence and Biochemical Research Pvt. Ltd., New Delhi 110076, India
41
Hutson M. Software tools identify forgotten genes. Nature 2024:10.1038/d41586-024-01548-w. [PMID: 38789607 DOI: 10.1038/d41586-024-01548-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
42
Waszkiewicz R, Michaś A, Białobrzewski MK, Klepka BP, Cieplak-Rotowska MK, Staszałek Z, Cichocki B, Lisicki M, Szymczak P, Niedzwiecka A. Hydrodynamic Radii of Intrinsically Disordered Proteins: Fast Prediction by Minimum Dissipation Approximation and Experimental Validation. J Phys Chem Lett 2024; 15:5024-5033. [PMID: 38696815 PMCID: PMC11103702 DOI: 10.1021/acs.jpclett.4c00312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/12/2024] [Accepted: 04/26/2024] [Indexed: 05/04/2024]
Abstract
The diffusion coefficients of globular and fully unfolded proteins can be predicted with high accuracy solely from their mass or chain length. However, this approach fails for intrinsically disordered proteins (IDPs) containing structural domains. We propose a rapid predictive methodology for estimating the diffusion coefficients of IDPs. The methodology uses accelerated conformational sampling based on self-avoiding random walks and includes hydrodynamic interactions between coarse-grained protein subunits, modeled using the generalized Rotne-Prager-Yamakawa approximation. To estimate the hydrodynamic radius, we rely on the minimum dissipation approximation recently introduced by Cichocki et al. Using a large set of experimentally measured hydrodynamic radii of IDPs over a wide range of chain lengths and domain contributions, we demonstrate that our predictions are more accurate than the Kirkwood approximation and phenomenological approaches. Our technique may prove to be valuable in predicting the hydrodynamic properties of both fully unstructured and multidomain disordered proteins.
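For orientation, the Kirkwood approximation used as a comparison baseline in this abstract is commonly written, for N identical beads, as 1/R_h = (1/N^2) Σ_{i≠j} ⟨1/r_ij⟩, averaged over conformations. The following is a minimal sketch of that baseline only, not of the authors' minimum dissipation approximation or their code. It assumes point-like beads and, for brevity, a freely jointed chain rather than a self-avoiding walk; the function names `random_chain`, `mean_inverse_distance`, and `kirkwood_rh` are hypothetical.

```python
import math
import random


def random_chain(n_beads, bond_length=1.0, rng=random):
    """Freely jointed chain (assumption: NOT self-avoiding, for brevity)."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        z = rng.uniform(-1.0, 1.0)             # cos(theta), uniform on the sphere
        phi = rng.uniform(0.0, 2.0 * math.pi)  # azimuthal angle
        s = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + bond_length * s * math.cos(phi),
                       y0 + bond_length * s * math.sin(phi),
                       z0 + bond_length * z))
    return coords


def mean_inverse_distance(coords):
    """(1/N^2) * sum over i != j of 1/r_ij for a single conformation."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Factor 2 counts both ordered pairs (i, j) and (j, i).
            total += 2.0 / math.dist(coords[i], coords[j])
    return total / (n * n)


def kirkwood_rh(conformations):
    """Kirkwood estimate: R_h = 1 / <(1/N^2) sum_{i != j} 1/r_ij>."""
    avg = sum(mean_inverse_distance(c) for c in conformations) / len(conformations)
    return 1.0 / avg


if __name__ == "__main__":
    random.seed(0)
    ensemble = [random_chain(50) for _ in range(200)]
    print("Kirkwood R_h (in bond lengths):", kirkwood_rh(ensemble))
```

The result is in units of the bond length. A bead self-term and excluded volume are deliberately left out, so the numbers illustrate the formula rather than reproduce any published prediction.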
Affiliation(s)
- Radost Waszkiewicz
  - Institute of Theoretical Physics, Faculty of Physics, University of Warsaw, L. Pasteura 5, 02-093 Warsaw, Poland
- Agnieszka Michaś
  - Institute of Physics, Polish Academy of Sciences, Aleja Lotnikow 32/46, PL-02668 Warsaw, Poland
- Michał K. Białobrzewski
  - Institute of Physics, Polish Academy of Sciences, Aleja Lotnikow 32/46, PL-02668 Warsaw, Poland
- Barbara P. Klepka
  - Institute of Physics, Polish Academy of Sciences, Aleja Lotnikow 32/46, PL-02668 Warsaw, Poland
- Zuzanna Staszałek
  - Institute of Physics, Polish Academy of Sciences, Aleja Lotnikow 32/46, PL-02668 Warsaw, Poland
- Bogdan Cichocki
  - Institute of Theoretical Physics, Faculty of Physics, University of Warsaw, L. Pasteura 5, 02-093 Warsaw, Poland
- Maciej Lisicki
  - Institute of Theoretical Physics, Faculty of Physics, University of Warsaw, L. Pasteura 5, 02-093 Warsaw, Poland
- Piotr Szymczak
  - Institute of Theoretical Physics, Faculty of Physics, University of Warsaw, L. Pasteura 5, 02-093 Warsaw, Poland
- Anna Niedzwiecka
  - Institute of Physics, Polish Academy of Sciences, Aleja Lotnikow 32/46, PL-02668 Warsaw, Poland
43
Song FV, Su J, Huang S, Zhang N, Li K, Ni M, Liao M. DeepSS2GO: protein function prediction from secondary structure. Brief Bioinform 2024; 25:bbae196. [PMID: 38701416 PMCID: PMC11066904 DOI: 10.1093/bib/bbae196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 03/31/2024] [Accepted: 04/10/2024] [Indexed: 05/05/2024] Open
Abstract
Predicting protein function is crucial for understanding biological life processes, preventing diseases and developing new drug targets. In recent years, methods based on sequence, structure and biological networks for protein function annotation have been extensively researched. Although obtaining a protein's three-dimensional structure through experimental or computational methods enhances the accuracy of function prediction, the sheer volume of proteins sequenced by high-throughput technologies presents a significant challenge. To address this issue, we introduce a deep neural network model, DeepSS2GO (Secondary Structure to Gene Ontology). It is a predictor incorporating secondary structure features along with primary sequence and homology information. The algorithm combines the speed of sequence-based information with the accuracy of structure-based features while streamlining the redundant data in primary sequences and bypassing the time-consuming challenges of tertiary structure analysis. The results show that the prediction performance surpasses state-of-the-art algorithms. It has the ability to predict key functions by effectively utilizing secondary structure information, rather than broadly predicting general Gene Ontology terms. Additionally, DeepSS2GO predicts five times faster than advanced algorithms, making it highly applicable to massive sequencing data. The source code and trained models are available at https://github.com/orca233/DeepSS2GO.
Affiliation(s)
- Fu V Song
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
- Jiaqi Su
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
- Sixing Huang
  - Gemini Data Japan, Kitaku Oujikamiya 1-11-11, 115-0043, Tokyo, Japan
- Neng Zhang
  - Electronic Engineering and Computer Science, Queen Mary University of London, Mile End Road, E1 4NS, London, UK
- Kaiyue Li
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
- Ming Ni
  - MGI Tech, Beishan Industrial Zone, 518083, Shenzhen, China
- Maofu Liao
  - Department of Chemical Biology, School of Life Sciences, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
  - Institute for Biological Electron Microscopy, Southern University of Science and Technology, Xueyuan Avenue, 518055, Shenzhen, China
44
Lotthammer JM, Ginell GM, Griffith D, Emenecker RJ, Holehouse AS. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat Methods 2024; 21:465-476. [PMID: 38297184 PMCID: PMC10927563 DOI: 10.1038/s41592-023-02159-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 05/28/2023] [Accepted: 12/20/2023] [Indexed: 02/02/2024]
Abstract
Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence-ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes.
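Two of the ensemble dimensions named in this abstract, the radius of gyration and the end-to-end distance, have simple geometric definitions when computed from simulated coordinates. The sketch below illustrates those definitions only; it is not the ALBATROSS model or its API, which predicts such values directly from sequence. It assumes equal-mass beads and reports root-mean-square ensemble averages; the function names are hypothetical.

```python
import math


def radius_of_gyration_sq(coords):
    """Squared radius of gyration of one conformation (equal-mass beads assumed)."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance of the beads from their centroid.
    return sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
               for x, y, z in coords) / n


def end_to_end(coords):
    """Distance between the first and last bead of one conformation."""
    return math.dist(coords[0], coords[-1])


def ensemble_dimensions(conformations):
    """Root-mean-square R_g and end-to-end distance over an ensemble."""
    m = len(conformations)
    rg = math.sqrt(sum(radius_of_gyration_sq(c) for c in conformations) / m)
    ree = math.sqrt(sum(end_to_end(c) ** 2 for c in conformations) / m)
    return rg, ree


if __name__ == "__main__":
    # A straight three-bead chain as a toy "ensemble" of one conformation.
    rod = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(ensemble_dimensions([rod]))
```

Whether an ensemble average is reported as a root-mean-square or as a plain mean varies between studies, so the convention should be checked before comparing against predicted values.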
Affiliation(s)
- Jeffrey M Lotthammer
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Garrett M Ginell
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Daniel Griffith
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Ryan J Emenecker
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
- Alex S Holehouse
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
45
An easy-to-use computational tool for predicting 3D properties of disordered proteins. Nat Methods 2024; 21:385-386. [PMID: 38297185 DOI: 10.1038/s41592-023-02160-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024]