1
Hasenahuer MA, Sanchis-Juan A, Laskowski RA, Baker JA, Stephenson JD, Orengo CA, Raymond FL, Thornton JM. Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins. J Mol Biol 2023; 435:167892. [PMID: 36410474 PMCID: PMC9875310 DOI: 10.1016/j.jmb.2022.167892] [Received: 09/10/2022] [Revised: 11/08/2022] [Accepted: 11/14/2022] [Indexed: 11/23/2022]
Abstract
Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein-protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder-order transitions upon binding with other protein partners and liquid-liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intra-species constraints complement inter-species conservation and can provide more information to interpret variant effects.
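The abstract describes mapping CCRs from genomic coordinates onto protein residue positions. The sketch below illustrates one way such a mapping can work for a single transcript; it is not the authors' pipeline. It assumes hypothetical inputs: a list of non-overlapping CDS segments given as 1-based, inclusive genomic coordinates (UTRs excluded), a strand, and translation starting at the first coding base. The names `genomic_to_residue` and `region_to_residues` are illustrative only.

```python
from typing import List, Optional, Tuple

# Hypothetical input type: a CDS segment as (start, end), 1-based inclusive genomic coordinates.
CdsSegment = Tuple[int, int]


def genomic_to_residue(pos: int, cds: List[CdsSegment], strand: str) -> Optional[int]:
    """Return the 1-based residue encoded at genomic position `pos`, or None if non-coding.

    Assumes non-overlapping segments containing only coding bases, with
    translation starting at the first base of the first segment (phase 0).
    """
    # Walk segments in transcription order: ascending on '+', descending on '-'.
    ordered = sorted(cds, reverse=(strand == "-"))
    offset = 0  # coding bases already traversed in earlier segments
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            cds_index = offset + within  # 0-based position within the CDS
            return cds_index // 3 + 1  # three bases per codon
        offset += end - start + 1
    return None


def region_to_residues(region_start: int, region_end: int,
                       cds: List[CdsSegment], strand: str) -> List[int]:
    """Map a genomic interval (e.g. a CCR) to the sorted residues it overlaps."""
    residues = set()
    for pos in range(region_start, region_end + 1):
        residue = genomic_to_residue(pos, cds, strand)
        if residue is not None:
            residues.add(residue)
    return sorted(residues)
```

For example, with CDS segments (100, 105) and (200, 208) on the forward strand, a region spanning positions 104 to 201 maps to residues 2 and 3. A real pipeline would also have to handle non-zero coding phase, alternative transcripts and reference-build differences, which this sketch ignores.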
Affiliation(s)
- Marcia A. Hasenahuer
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK; Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK. Corresponding author at: European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK. @MarHasenahuer
- Alba Sanchis-Juan
- Department of Haematology, NHS Blood and Transplant Centre, University of Cambridge, Cambridge CB2 0XY, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Roman A. Laskowski
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- James A. Baker
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- James D. Stephenson
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Christine A. Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- F. Lucy Raymond
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, UK; NIHR BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Janet M. Thornton
- European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
2
Dey A, Maiti S. Determining the Stoichiometry of Amyloid Oligomers by Single-Molecule Photobleaching. Methods Mol Biol 2022; 2538:55-74. [PMID: 35951293 DOI: 10.1007/978-1-0716-2529-3_5] [Indexed: 06/15/2023]
Abstract
Small oligomers are the initial intermediates in the pathway to amyloid fibril formation. They have a distinct identity from the monomers as well as from the protofibrils and the fibrils, both in their structure and in their properties. In many cases, they play a crucial biological role. However, due to their transient nature, they are difficult to characterize. "Oligomer" is a diffuse definition, encompassing aggregates of many different sizes, and this lack of precise definition causes much confusion and disagreement between different research groups. Here, we define the small oligomers as "n"-mers with n < 10, which is the size range in which the amyloid proteins typically exist at the initial phase of the aggregation process. Since the oligomers dynamically interconvert into each other, a solution of aggregating amyloid proteins will contain a distribution of sizes. A precise characterization of an oligomeric solution will, therefore, require quantification of the relative population of each size. Size-based separation methods, such as size-exclusion chromatography, are typically used to characterize this distribution. However, if oligomers of different sizes interconvert rapidly, such separation will not yield reliable results. Single-molecule photobleaching (smPB) is a direct method to evaluate this size distribution in a heterogeneous solution without separation. In addition, understanding the mechanism of action of amyloid oligomers requires knowing the affinity of each oligomer type to different cellular components, such as the cell membrane. These measurements are also amenable to smPB. Here we show how to perform smPB, both for oligomers in solution and for oligomers attached to the membrane.
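smPB sizes an oligomer by counting the discrete intensity drops in a spot's fluorescence trace as its fluorophores bleach one by one. The sketch below is a deliberately naive illustration of that counting step, not the chapter's protocol. It assumes one fluorophore per monomer, a single-fluorophore step height `unit_step` known from calibration, and that a running median is enough to suppress noise; more robust step-detection methods are generally preferable for real data. The function names are hypothetical.

```python
import statistics
from collections import Counter
from typing import Dict, List, Sequence


def count_bleaching_steps(trace: Sequence[float], unit_step: float, window: int = 5) -> int:
    """Count photobleaching steps in one intensity trace (naive sketch).

    `unit_step` is the assumed intensity drop caused by bleaching one fluorophore.
    """
    half = window // 2
    # A running median suppresses noise while keeping step edges sharp.
    smoothed = [statistics.median(trace[max(0, i - half): i + half + 1])
                for i in range(len(trace))]
    level = smoothed[0]  # current plateau level
    steps = 0
    for value in smoothed[1:]:
        drop = level - value
        if drop > unit_step / 2:
            # A drop of k * unit_step counts as k fluorophores bleaching together.
            steps += round(drop / unit_step)
            level = value
    return steps


def size_distribution(traces: List[Sequence[float]], unit_step: float) -> Dict[int, float]:
    """Fraction of spots assigned to each oligomer size n (n = number of steps)."""
    counts = Counter(count_bleaching_steps(t, unit_step) for t in traces)
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}
```

Applied to idealized traces for one monomer, two dimers and one trimer, `size_distribution` returns fractions of 0.25, 0.5 and 0.25 for n = 1, 2 and 3. Labeling efficiency below 100% would bias such counts toward smaller n, which this sketch does not correct for.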
Affiliation(s)
- Arpan Dey
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, India
- Sudipta Maiti
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, India
3
Vishvakarma V, Maiti S. Measuring the Size and Spontaneous Fluctuations of Amyloid Aggregates with Fluorescence Correlation Spectroscopy. Methods Mol Biol 2022; 2538:35-54. [PMID: 35951292 DOI: 10.1007/978-1-0716-2529-3_4] [Indexed: 06/15/2023]
Abstract
Bacterial amyloids decorate the cell surface of many bacteria by forming functional amyloid fibers. These amyloids have structural and biochemical similarities with many disease-related amyloids in eukaryotes. Amyloid aggregation starts at the individual monomer level, and the end product is the amyloid fibril. The process of amyloid aggregation involves a continuous increase of the aggregate size, and therefore size is a critical parameter to measure in aggregation experiments. Also, our understanding of the aggregation process, and our ability to design interventions, can benefit from a measurement of the conformational dynamics of proteins undergoing aggregation. Fluorescence correlation spectroscopy (FCS) is perhaps the most sensitive and rapid technique available currently for this purpose. It can measure the average size and the size distribution of molecules and aggregates down to sub-nm length scales and can also measure fast nanosecond time-scale conformational dynamics, all in an equilibrium solution. FCS achieves this by measuring the fluorescence intensity fluctuations of freely diffusing protein molecules in an optically defined microscopic probe volume in a solution. Here, we present a set of instructions for effectively measuring the size and dynamics of amyloid aggregates with FCS.
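FCS recovers size from how quickly intensity fluctuations decorrelate. For a single species diffusing freely through a 3D Gaussian observation volume, the standard model autocorrelation is G(τ) = (1/N) · (1 + τ/τ_D)^(-1) · (1 + τ/(s²·τ_D))^(-1/2), where N is the mean number of molecules in the volume, τ_D the diffusion time and s the axial-to-lateral size ratio of the volume. The diffusion coefficient follows from D = w_xy²/(4τ_D), and for a sphere the Stokes-Einstein relation gives the hydrodynamic radius R_h = k_B·T/(6πηD). The sketch below encodes those relations under stated assumptions rather than reproducing the chapter's procedure: τ_D is taken as already fitted from measured data, the lateral radius `w_xy` as calibrated beforehand (for example with a dye of known diffusion coefficient), and the default viscosity as that of water near 25 °C. Polydisperse or strongly non-spherical aggregates will not follow this single-component model.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def autocorrelation_3d(tau: float, n_mean: float, tau_d: float, s: float) -> float:
    """Single-component free 3D diffusion model for a Gaussian probe volume."""
    lateral = 1.0 + tau / tau_d
    axial = 1.0 + tau / (s ** 2 * tau_d)
    return 1.0 / (n_mean * lateral * math.sqrt(axial))


def diffusion_coefficient(tau_d: float, w_xy: float) -> float:
    """D in m^2/s from the diffusion time (s) and the lateral 1/e^2 radius (m)."""
    return w_xy ** 2 / (4.0 * tau_d)


def hydrodynamic_radius(d: float, temperature: float = 298.15,
                        viscosity: float = 0.89e-3) -> float:
    """Stokes-Einstein radius in m; viscosity in Pa*s (default assumes water near 25 C)."""
    return BOLTZMANN * temperature / (6.0 * math.pi * viscosity * d)


# Hypothetical numbers: a 50 us diffusion time in a volume with 0.25 um lateral radius.
radius = hydrodynamic_radius(diffusion_coefficient(50e-6, 0.25e-6))
print(f"R_h = {radius * 1e9:.2f} nm")  # R_h = 0.79 nm
```

Because D scales as 1/τ_D, an aggregate whose diffusion time is twice that of the monomer has, under the spherical assumption, roughly twice its hydrodynamic radius; this is how a lengthening τ_D tracks aggregate growth.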
Affiliation(s)
- Vicky Vishvakarma
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, India
- Sudipta Maiti
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, India
4
Shillcock JC, Brochut M, Chénais E, Ipsen JH. Phase behaviour and structure of a model biomolecular condensate. Soft Matter 2020; 16:6413-6423. [PMID: 32584357 DOI: 10.1039/d0sm00813c] [Indexed: 06/11/2023]
Abstract
Phase separation of immiscible fluids is a common phenomenon in polymer chemistry and is recognized as an important mechanism by which cells compartmentalize their biochemical reactions. Biomolecular condensates are condensed fluid droplets in cells that form by liquid-liquid phase separation of intrinsically disordered proteins. They have a wide range of functions and are associated with chronic neurodegenerative diseases in which they become pathologically rigid. However, it remains unclear how their material properties depend on the molecular structure of the proteins. Here we explore the phase behaviour and structure of a model biomolecular condensate composed of semi-flexible polymers with attractive end-caps using coarse-grained simulations. The model contains the minimal molecular features that are sufficient to observe liquid-liquid phase separation of soluble polymers into a porous, three-dimensional network in which their end-caps reversibly bind at junctions. The distance between connected junctions scales with the polymer length as a self-avoiding random walk over a wide range of concentrations, with a weak affinity-dependent prefactor. By contrast, the average number of polymers that meet at the junctions depends on the end-cap affinity but only weakly on the polymer length. The structured porosity of the condensed phase suggests a mechanism for cells to regulate biomolecular condensates. Protein interaction sites may be turned on or off to modulate the condensate's porosity and therefore the diffusion and interaction of additional proteins.
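The scaling statement can be written as d ≈ a·N^ν, where d is the distance between connected junctions, N the polymer length, a the affinity-dependent prefactor and ν the scaling exponent; for a self-avoiding walk in three dimensions ν ≈ 0.588 (Flory's estimate is 3/5). A common way to estimate a and ν from simulation output is an ordinary least-squares line through log d versus log N, sketched below. The function name is hypothetical, and the example inputs are synthetic values generated from the power law itself to check the fit; they are not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(lengths: Sequence[float], distances: Sequence[float]) -> Tuple[float, float]:
    """Fit d = a * N**nu by least squares on log d versus log N; returns (a, nu)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(d) for d in distances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    nu = sxy / sxx  # slope of the log-log line
    a = math.exp(mean_y - nu * mean_x)  # intercept gives the prefactor
    return a, nu


# Synthetic check: data generated exactly from a = 1.3, nu = 0.588.
lengths = [4, 8, 16, 32, 64]
distances = [1.3 * n ** 0.588 for n in lengths]
a, nu = fit_power_law(lengths, distances)
print(f"a = {a:.3f}, nu = {nu:.3f}")  # a = 1.300, nu = 0.588
```

Fitting the same form separately for each end-cap affinity would expose the weak affinity dependence of a that the abstract reports, while ν should stay close to the self-avoiding-walk value.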
Affiliation(s)
- J C Shillcock
- Laboratory of Molecular and Chemical Biology of Neurodegeneration, Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland.
5
The Order-Disorder Continuum: Linking Predictions of Protein Structure and Disorder through Molecular Simulation. Sci Rep 2020; 10:2068. [PMID: 32034199 PMCID: PMC7005769 DOI: 10.1038/s41598-020-58868-w] [Received: 07/24/2019] [Accepted: 10/16/2019] [Indexed: 12/11/2022]
Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions within proteins (IDRs) serve an increasingly expansive list of biological functions, including regulation of transcription and translation, protein phosphorylation, cellular signal transduction, as well as mechanical roles. The strong link between protein function and disorder motivates a deeper fundamental characterization of IDPs and IDRs for discovering new functions and relevant mechanisms. We review recent advances in experimental techniques that have improved identification of disordered regions in proteins. Yet experimentally curated disorder information does not yet match the scale of experimentally determined structural information in folded-protein databases, and disorder predictors rely on several different binary definitions of disorder. To link secondary structure prediction algorithms developed for folded proteins and protein disorder predictors, we conduct molecular dynamics simulations on representative proteins from the Protein Data Bank, comparing secondary structure and disorder predictions with simulation results. We find that the performance of neural-network structure predictors can be leveraged to identify highly dynamic regions within molecules, which are linked to disorder. Low-accuracy structure predictions suggest a lack of static structure for regions that disorder predictors fail to identify. While disorder databases continue to expand, secondary structure predictors and molecular simulations can improve disorder predictor performance, which aids discovery of novel functions of IDPs and IDRs. These observations provide a platform for the development of new, integrated structural databases and fusion of prediction tools toward protein disorder characterization in health and disease.
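One simple way to set simulation results beside a binary disorder prediction is to compute the per-residue root-mean-square fluctuation (RMSF) from a trajectory, call residues above some threshold "dynamic", and tabulate agreement with the predictor. Residues that are dynamic in simulation but not predicted as disordered correspond to the case the abstract highlights, regions that disorder predictors fail to identify. This is a minimal sketch, not the authors' analysis: it assumes C-alpha coordinates already superposed onto a common reference, and `threshold` is an arbitrary, hypothetical parameter, as are the function names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-residue RMSF from pre-aligned coordinates; frames[t][i] is residue i at frame t."""
    n_frames = len(frames)
    values = []
    for i in range(len(frames[0])):
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def compare_with_predictor(fluctuations: Sequence[float],
                           predicted_disordered: Sequence[bool],
                           threshold: float) -> Dict[str, int]:
    """Count agreement between 'dynamic in simulation' and 'predicted disordered'."""
    counts = {"both": 0, "dynamic_only": 0, "predicted_only": 0, "neither": 0}
    for value, predicted in zip(fluctuations, predicted_disordered):
        dynamic = value > threshold
        if dynamic and predicted:
            counts["both"] += 1
        elif dynamic:
            counts["dynamic_only"] += 1  # flexible in simulation, missed by the predictor
        elif predicted:
            counts["predicted_only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

The resulting counts depend strongly on the threshold, the atoms chosen and the alignment protocol, so any real comparison would need to justify those choices.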