1
Basu S, Kurgan L. Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses. Comput Struct Biotechnol J 2024; 23:1968-1977. [PMID: 38765610 PMCID: PMC11098722 DOI: 10.1016/j.csbj.2024.04.059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
Intrinsic disorder predictors were evaluated in several studies including the two large CAID experiments. However, these studies are biased towards eukaryotic proteins and focus primarily on the residue-level predictions. We provide a first-of-its-kind assessment that comprehensively covers the taxonomy and evaluates predictions at the residue and disordered region levels. We curate a benchmark dataset that uniformly covers eukaryotic, archaeal, bacterial, and viral proteins. We find that predictive performance differs substantially across taxonomy, where viruses are predicted most accurately, followed by protists and higher eukaryotes, while bacterial and archaeal proteins suffer lower levels of accuracy. These trends are consistent across predictors. We also find that current tools, except for flDPnn, struggle with reproducing native distributions of the numbers and sizes of the disordered regions. Moreover, analysis of two variants of disorder predictions derived from the AlphaFold2 predicted structures reveals that they produce accurate residue-level propensities for archaea, bacteria and protists. However, they underperform for higher eukaryotes and generally struggle to accurately identify disordered regions. Our results motivate development of new predictors that target bacteria and archaea and which produce accurate results at both residue and region levels. We also stress the need to include region-level assessments in future evaluations.
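The distinction between residue-level and region-level assessment drawn in this abstract can be illustrated with a small sketch. The code below is not the authors' benchmark; the function names, the binary label strings, and the toy example are assumptions made for illustration. It shows how a prediction can agree with the native annotation at most residues while fragmenting one native region into several predicted ones.

```python
from itertools import groupby


def disordered_regions(labels, min_len=1):
    """Return (start, end) pairs (0-based, end exclusive) for runs of '1'."""
    regions = []
    pos = 0
    for value, run in groupby(labels):
        length = len(list(run))
        if value == "1" and length >= min_len:
            regions.append((pos, pos + length))
        pos += length
    return regions


def residue_accuracy(native, predicted):
    """Fraction of residues whose binary label matches the native annotation."""
    if len(native) != len(predicted):
        raise ValueError("label strings must have the same length")
    matches = sum(n == p for n, p in zip(native, predicted))
    return matches / len(native)


# Hypothetical toy annotation: '1' = disordered residue, '0' = ordered residue.
native = "0001111100000000"
predicted = "0001011100000100"

print(residue_accuracy(native, predicted))  # 0.875
print(disordered_regions(native))           # [(3, 8)]
print(disordered_regions(predicted))        # [(3, 4), (5, 8), (13, 14)]
```

In this toy case the residue-level agreement is high, yet the prediction reports three regions where the native annotation has one, which is the kind of mismatch a region-level assessment is meant to expose.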
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
2
Chen G, Zhang Z. IDRWalker: A Random Walk Based Tool for Generating Intrinsically Disordered Regions in Large Protein Complexes. ACS Omega 2024; 9:32059-32065. [PMID: 39072126 PMCID: PMC11270708 DOI: 10.1021/acsomega.4c04161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 06/16/2024] [Accepted: 06/27/2024] [Indexed: 07/30/2024]
Abstract
Intrinsically disordered regions (IDRs), which may be functionally important, are common in proteins. However, the structures of IDRs are often missing due to their highly dynamic nature. In the study of IDRs, integrative modeling combining computational simulations and experimental data is a common approach, for which initial structures of the IDRs need to be built. However, applying this method to large protein complexes is challenging because existing structure generation tools are sometimes unsuitable for IDRs in large systems. To facilitate convenient and rapid structure generation of IDRs in large protein complexes, we developed a computational tool named IDRWalker based on self-avoiding random walks. Three protein complexes were used to illustrate the efficiency of the tool, and it was found that IDRs in more than 800 chains of the nuclear pore complex could be generated in minutes. These structures of large protein complexes with added IDRs can be further used to run computational simulations for integrative modeling.
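The abstract names self-avoiding random walks as the basis of the tool. The sketch below shows the general technique on a simple cubic lattice; it is not IDRWalker's implementation, and the lattice representation, the `occupied` set standing in for the already-built parts of a complex, and the restart strategy are all assumptions made for illustration.

```python
import random

# The six unit steps on a cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def self_avoiding_walk(n_beads, start=(0, 0, 0), occupied=None, rng=None,
                       max_restarts=1000):
    """Grow a chain of n_beads lattice sites that never revisits a site
    and never enters a site listed in `occupied`."""
    rng = rng or random.Random()
    blocked = set(occupied or ())
    for _ in range(max_restarts):
        path = [start]
        visited = blocked | {start}
        while len(path) < n_beads:
            x, y, z = path[-1]
            free = [(x + dx, y + dy, z + dz) for dx, dy, dz in STEPS
                    if (x + dx, y + dy, z + dz) not in visited]
            if not free:
                break  # dead end: discard this attempt and restart
            nxt = rng.choice(free)
            path.append(nxt)
            visited.add(nxt)
        if len(path) == n_beads:
            return path
    raise RuntimeError("no self-avoiding walk found within max_restarts")


# Hypothetical usage: a 30-bead chain that avoids two pre-occupied sites.
chain = self_avoiding_walk(30, occupied={(1, 0, 0), (0, 1, 0)},
                           rng=random.Random(0))
print(len(chain), len(set(chain)))
```

Choosing uniformly among the free neighbours at each step is simple and fast, but it does not sample all self-avoiding paths with equal probability; that is acceptable here because the goal is only a clash-free starting structure.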
Affiliation(s)
- Guanglin Chen
  - Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, PR China
- Zhiyong Zhang
  - Department of Physics, University of Science and Technology of China, Hefei, Anhui 230026, PR China
  - MOE Key Laboratory for Cellular Dynamics, University of Science and Technology of China, Hefei, Anhui 230026, PR China
3
Maiti S, Singh A, Maji T, Saibo NV, De S. Experimental methods to study the structure and dynamics of intrinsically disordered regions in proteins. Curr Res Struct Biol 2024; 7:100138. [PMID: 38707546 PMCID: PMC11068507 DOI: 10.1016/j.crstbi.2024.100138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 03/12/2024] [Accepted: 03/15/2024] [Indexed: 05/07/2024] Open
Abstract
Eukaryotic proteins often feature long stretches of amino acids that lack a well-defined three-dimensional structure and are referred to as intrinsically disordered proteins (IDPs) or regions (IDRs). Although these proteins challenge conventional structure-function paradigms, they play vital roles in cellular processes. Recent progress in experimental techniques, such as NMR spectroscopy, single molecule FRET, high speed AFM and SAXS, has provided valuable insights into the biophysical basis of IDP function. This review discusses the advancements made in these techniques, particularly for the study of disordered regions in proteins. In NMR spectroscopy, new strategies such as 13C detection, non-uniform sampling, segmental isotope labeling, and rapid data acquisition methods address the challenges posed by spectral overcrowding and low stability of IDPs. The importance of various NMR parameters, including chemical shifts, hydrogen exchange rates, and relaxation measurements, in revealing transient secondary structures within IDRs and IDPs is presented. Given the high flexibility of IDPs, the review outlines NMR methods for assessing their dynamics at both fast (ps-ns) and slow (μs-ms) timescales. IDPs exert their functions through interactions with other molecules such as proteins, DNA, or RNA. NMR-based titration experiments yield insights into the thermodynamics and kinetics of these interactions. Detailed study of IDPs requires multiple experimental techniques, and thus, several methods are described for studying disordered proteins, highlighting their respective advantages and limitations. The potential for integrating these complementary techniques, each offering unique perspectives, is explored to achieve a comprehensive understanding of IDPs.
Affiliation(s)
- Aakanksha Singh
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Tanisha Maji
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Nikita V. Saibo
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
- Soumya De
  - School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India
4
Hannon CE, Eisen MB. Intrinsic protein disorder is insufficient to drive subnuclear clustering in embryonic transcription factors. eLife 2024; 12:RP88221. [PMID: 38275292 PMCID: PMC10945700 DOI: 10.7554/elife.88221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024] Open
Abstract
Modern microscopy has revealed that core nuclear functions, including transcription, replication, and heterochromatin formation, occur in spatially restricted clusters. Previous work from our lab has shown that subnuclear high-concentration clusters of transcription factors may play a role in regulating RNA synthesis in the early Drosophila embryo. A nearly ubiquitous feature of eukaryotic transcription factors is that they contain intrinsically disordered regions (IDRs) that often arise from low complexity amino acid sequences within the protein. It has been proposed that IDRs within transcription factors drive co-localization of transcriptional machinery and target genes into high-concentration clusters within nuclei. Here, we test that hypothesis directly, by conducting a broad survey of the subnuclear localization of IDRs derived from transcription factors. Using a novel algorithm to identify IDRs in the Drosophila proteome, we generated a library of IDRs from transcription factors expressed in the early Drosophila embryo. We used this library to perform a high-throughput imaging screen in Drosophila Schneider-2 (S2) cells. We found that while subnuclear clustering does not occur when the majority of IDRs are expressed alone, it is frequently seen in full-length transcription factors. These results are consistent in live Drosophila embryos, suggesting that IDRs are insufficient to drive the subnuclear clustering behavior of transcription factors. Furthermore, the clustering of transcription factors in living embryos was unaffected by the deletion of IDR sequences. Our results demonstrate that IDRs are unlikely to be the primary molecular drivers of the clustering observed during transcription, suggesting a more complex and nuanced role for these disordered protein sequences.
Affiliation(s)
- Colleen E Hannon
  - Howard Hughes Medical Institute, University of California, Berkeley, United States
- Michael B Eisen
  - Howard Hughes Medical Institute, University of California, Berkeley, United States
5
Pitman C, Santiago-McRae E, Lohia R, Bassi K, Joseph TT, Hansen MEB, Brannigan G. The blobulator: a webtool for identification and visual exploration of hydrophobic modularity in protein sequences. bioRxiv 2024:2024.01.15.575761. [PMID: 38293114 PMCID: PMC10827107 DOI: 10.1101/2024.01.15.575761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Motivation: Clusters of hydrophobic residues are known to promote structured protein stability and drive protein aggregation. Recent work has shown that identifying contiguous hydrophobic residue clusters (termed "blobs") is useful in both intrinsically disordered protein (IDP) simulation and human genome studies. However, a graphical interface was unavailable. Results: Here, we present the blobulator: an interactive and intuitive web interface to detect intrinsic modularity in any protein sequence based on hydrophobicity. We demonstrate three use cases of the blobulator and show how identifying blobs with biologically relevant parameters provides useful information about a globular protein, two orthologous membrane proteins, and an IDP. Other potential applications are discussed, including: predicting protein segments with critical roles in tertiary interactions, providing a definition of local order and disorder with clear edges, and aiding in predicting protein features from sequence. Availability: The blobulator GUI can be found at www.blobulator.branniganlab.org, and the source code with pip installable command line tool can be found on GitHub at www.GitHub.com/BranniganLab/blobulator.
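The idea of finding contiguous hydrophobic clusters in a sequence can be sketched in a few lines. The code below is only an illustration of that idea and is not the blobulator's algorithm: the abstract does not describe the tool's hydropathy scale, smoothing, or parameters, so the use of the Kyte-Doolittle scale, the `threshold` and `min_len` defaults, and the toy sequence are all assumptions.

```python
from itertools import groupby

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_blobs(sequence, threshold=0.0, min_len=4):
    """Return (start, end) pairs (0-based, end exclusive) for contiguous runs
    of residues whose hydropathy exceeds `threshold` and whose length is at
    least `min_len`."""
    flags = [KYTE_DOOLITTLE[aa] > threshold for aa in sequence]
    blobs = []
    pos = 0
    for is_hydrophobic, run in groupby(flags):
        length = len(list(run))
        if is_hydrophobic and length >= min_len:
            blobs.append((pos, pos + length))
        pos += length
    return blobs


# Hypothetical toy sequence with two hydrophobic stretches.
print(hydrophobic_blobs("DEKLLIVAGSEEKRAVILLD"))  # [(3, 8), (14, 19)]
```

Raising `min_len` or `threshold` makes the segmentation stricter, which is one simple way to see how parameter choices change where the edges of a cluster fall.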
Affiliation(s)
- Connor Pitman
  - Center for Computational and Integrative Biology, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
- Ezry Santiago-McRae
  - Center for Computational and Integrative Biology, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
- Ruchi Lohia
  - Department of Physiology, University of Toronto, 1 King's College Circle, M5S 1A8, Toronto, Ontario, Canada
- Kaitlin Bassi
  - Center for Computational and Integrative Biology, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
- Thomas T Joseph
  - Department of Anesthesiology and Critical Care, Perelman School of Medicine, University of Pennsylvania, JMB 305, 3620 Hamilton Walk, 19104, PA, USA
- Matthew E B Hansen
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd, 19104, PA, USA
- Grace Brannigan
  - Center for Computational and Integrative Biology, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
  - Department of Physics, Rutgers University-Camden, 201 Broadway, 08103, NJ, USA
6
Basu S, Zhao B, Biró B, Faraggi E, Gsponer J, Hu G, Kloczkowski A, Malhis N, Mirdita M, Söding J, Steinegger M, Wang D, Wang K, Xu D, Zhang J, Kurgan L. DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options. Nucleic Acids Res 2024; 52:D426-D433. [PMID: 37933852 PMCID: PMC10767971 DOI: 10.1093/nar/gkad985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/12/2023] [Accepted: 10/16/2023] [Indexed: 11/08/2023] Open
Abstract
The DescribePROT database of amino acid-level descriptors of protein structures and functions was substantially expanded since its release in 2020. This expansion includes substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and to support efforts in understanding molecular underpinnings of diseases and development of therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
  - Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
- Bálint Biró
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
  - Department of Animal Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Eshel Faraggi
  - Physics Department, Indiana University, Indianapolis, IN, USA
- Jörg Gsponer
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gang Hu
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, P.R. China
- Andrzej Kloczkowski
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, USA
- Nawar Malhis
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Milot Mirdita
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Johannes Söding
  - Quantitative and Computational Biology, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
  - Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Duolin Wang
  - Department of Electrical Engineer and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Kui Wang
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, P.R. China
- Dong Xu
  - Department of Electrical Engineer and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, P.R. China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
7
Patel KN, Chavda D, Manna M. Molecular Docking of Intrinsically Disordered Proteins: Challenges and Strategies. Methods Mol Biol 2024; 2780:165-201. [PMID: 38987470 DOI: 10.1007/978-1-0716-3985-6_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Intrinsically disordered proteins (IDPs) are a novel class of proteins that have gained significant importance and attention within a very short period of time. These proteins are essentially characterized by their inherent structural disorder, encoded mainly by their amino acid sequences. The profound abundance of IDPs and intrinsically disordered regions (IDRs) in the biological world delineates their deep-rooted functionality. IDPs and IDRs convey such extensive functionality through their unique dynamic nature, which enables them to carry out a huge number of multifaceted biomolecular interactions and makes them the "interaction hubs" of cellular systems. Additionally, with such widespread functions, their malfunction is also intimately associated with multiple diseases. Thus, understanding the dynamic heterogeneity of various IDPs along with their interactions with respective binding partners is an important field with immense potential in biomolecular research. In this context, molecular docking-based computational approaches have proven to be remarkably effective for ordered proteins. Molecular docking methods essentially model the biomolecular interactions in both structural and energetic terms and use this information to characterize the putative interactions between the two participant molecules. However, direct application of conventional docking methods to IDPs is largely limited by their structural heterogeneity and demands unique IDP-centric strategies. Thus, in this chapter, we present an overview of current methodologies for successful docking operations involving IDPs and IDRs. These specialized methods mainly include ensemble-based and fragment-based approaches, each with its own benefits and limitations. More recently, artificial intelligence and machine learning-assisted approaches have also been used to significantly reduce the complexity and computational burden associated with various docking applications. Thus, this chapter aims to provide a comprehensive summary of the major challenges and recent advancements of molecular docking approaches in the IDP field for their better utilization and greater applicability.
Affiliation(s)
- Keyur N Patel
  - Applied Phycology and Biotechnology Division, CSIR Central Salt and Marine Chemicals Research Institute, Bhavnagar, Gujarat, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Dhruvil Chavda
  - Applied Phycology and Biotechnology Division, CSIR Central Salt and Marine Chemicals Research Institute, Bhavnagar, Gujarat, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
- Moutusi Manna
  - Applied Phycology and Biotechnology Division, CSIR Central Salt and Marine Chemicals Research Institute, Bhavnagar, Gujarat, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, India
8
Song J, Kurgan L. Availability of web servers significantly boosts citations rates of bioinformatics methods for protein function and disorder prediction. Bioinformatics Advances 2023; 3:vbad184. [PMID: 38146538 PMCID: PMC10749743 DOI: 10.1093/bioadv/vbad184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 12/08/2023] [Accepted: 12/15/2023] [Indexed: 12/27/2023]
Abstract
Motivation: Development of bioinformatics methods is a long, complex and resource-hungry process. Hundreds of these tools have been released. While some methods are highly cited and used, many suffer relatively low citation rates. We empirically analyze a large collection of recently released methods in three diverse protein function and disorder prediction areas to identify key factors that contribute to increased citations. Results: We show that provision of a working web server significantly boosts citation rates. On average, methods with working web servers generate three times as many citations compared to tools that are available only as source code, have no code and no server, or are no longer available. This observation holds consistently across different research areas and publication years. We also find that differences in predictive performance are unlikely to impact citation rates. Overall, our empirical results suggest that a relatively low-cost investment into the provision and long-term support of web servers would substantially increase the impact of bioinformatics tools.
Affiliation(s)
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Clayton, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Clayton, VIC 3800, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, United States
9
Sharma B, Mattaparthi VSK. Prediction of interface between regions of varying degrees of order or disorderness in intrinsically disordered proteins from dihedral angles. J Biomol Struct Dyn 2023:1-11. [PMID: 38116756 DOI: 10.1080/07391102.2023.2294837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 12/06/2023] [Indexed: 12/21/2023]
Abstract
Intrinsically disordered proteins (IDPs) are proteins that do not form uniquely defined three-dimensional (3-D) structures. Experimental research on IDPs is difficult since they go against the traditional protein structure-function paradigm. Although there are several predictors of disorder based on amino acid sequences, very few are based on the 3-D structures of proteins. Dihedral angles have a significant role in predicting protein structure because they establish a protein's backbone, which, coupled with its side chain, establishes its overall shape. Here, we have carried out atomistic Molecular Dynamics (MD) simulations on four different proteins: one ordered protein (Monellin), two partially disordered proteins (p53-TAD and Amyloid beta (Aβ1-42) peptide), and one completely disordered protein (Histatin 5). The MD simulation trajectories for the corresponding four proteins were used to conduct dihedral angle (ϕ and ψ) analysis. Then, the average dihedral angles for each of the residues were calculated and plotted against the residue index. We noticed steep rises or falls in the average ϕ value at certain locations in the plot. These sudden shifts in the average ϕ value reflect the interface between regions of varying degrees of order or disorderness in intrinsically disordered proteins. Using this method, the probable conformer of a protein with a higher degree of disorder can be found among the ensembles of structures sampled during the MD simulations. The results of our study offer new insight into precisely identifying regions of various degrees of disorder in intrinsically disordered proteins. Communicated by Ramaswamy H. Sarma.
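The analysis described in this abstract, averaging ϕ per residue over a trajectory and looking for steep changes along the residue index, can be sketched as follows. This is a minimal illustration rather than the authors' protocol: the input layout (one list of ϕ values per frame), the `threshold` value, and the toy numbers are assumptions, and the plain arithmetic mean used here ignores the wrap-around of angles at ±180°.

```python
from statistics import fmean


def mean_phi_per_residue(frames):
    """Average the phi angle (degrees) of each residue over all frames.

    `frames` is a list of per-frame lists, each holding one phi value per residue.
    """
    n_residues = len(frames[0])
    return [fmean(frame[i] for frame in frames) for i in range(n_residues)]


def sharp_shifts(mean_phi, threshold=40.0):
    """Return indices i where the mean phi jumps by more than `threshold`
    degrees between residue i and residue i + 1."""
    return [i for i in range(len(mean_phi) - 1)
            if abs(mean_phi[i + 1] - mean_phi[i]) > threshold]


# Hypothetical two-frame trajectory for a five-residue peptide.
frames = [
    [-60.0, -62.0, -65.0, -120.0, -125.0],
    [-64.0, -58.0, -63.0, -130.0, -115.0],
]
averages = mean_phi_per_residue(frames)
print(averages)                # [-62.0, -60.0, -64.0, -125.0, -120.0]
print(sharp_shifts(averages))  # [2]
```

In the toy data the only steep change lies between residues 2 and 3, which is where such an analysis would place a candidate boundary between two segments.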
Affiliation(s)
- Babli Sharma
  - Molecular Modelling and Simulation Laboratory, Department of Molecular Biology and Biotechnology, Tezpur University, Assam, India
- Venkata Satish Kumar Mattaparthi
  - Molecular Modelling and Simulation Laboratory, Department of Molecular Biology and Biotechnology, Tezpur University, Assam, India
10
Conte AD, Mehdiabadi M, Bouhraoua A, Miguel Monzon A, Tosatto SCE, Piovesan D. Critical assessment of protein intrinsic disorder prediction (CAID) - Results of round 2. Proteins 2023; 91:1925-1934. [PMID: 37621223 DOI: 10.1002/prot.26582] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 04/21/2023] [Revised: 06/22/2023] [Accepted: 08/08/2023] [Indexed: 08/26/2023]
Abstract
Protein intrinsic disorder (ID) is a complex and context-dependent phenomenon that covers a continuum between fully disordered states and folded states with long dynamic regions. The lack of a ground truth that fits all ID flavors and the potential for order-to-disorder transitions depending on specific conditions makes ID prediction challenging. The CAID2 challenge aimed to evaluate the performance of different prediction methods across different benchmarks, leveraging the annotation provided by the DisProt database, which stores the coordinates of ID regions when there is experimental evidence in the literature. The CAID2 challenge demonstrated varying performance of different prediction methods across different benchmarks, highlighting the need for continued development of more versatile and efficient prediction software. Depending on the application, researchers may need to balance performance with execution time when selecting a predictor. Methods based on AlphaFold2 seem to be good ID predictors but they are better at detecting absence of order rather than ID regions as defined in DisProt. The CAID2 predictors can be freely used through the CAID Prediction Portal, and CAID has been integrated into OpenEBench, which will become the official platform for running future CAID challenges.
Affiliation(s)
- Alessio Del Conte
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Mahta Mehdiabadi
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Adel Bouhraoua
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
11
Yu S, Liao B, Zhu W, Peng D, Wu F. Accurate prediction and key protein sequence feature identification of cyclins. Brief Funct Genomics 2023; 22:411-419. [PMID: 37118891 DOI: 10.1093/bfgp/elad014] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/14/2023] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/30/2023] Open
Abstract
Cyclin proteins are a group of proteins that activate the cell cycle by forming complexes with cyclin-dependent kinases. Identifying cyclins correctly can provide key clues to understanding their function. However, due to the low similarity between cyclin protein sequences, a machine learning-based approach to identify cyclins is urgently needed. In this study, cyclin protein sequence features were extracted using the profile-based auto-cross covariance method. Then the features were ranked and selected with maximum relevance-maximum distance (MRMD) 1.0 and MRMD2.0. Finally, the prediction model was assessed through 10-fold cross-validation. The computational experiments showed that the best protein sequence features generated by MRMD1.0 could correctly predict 98.2% of cyclins using the random forest (RF) classifier, whereas seven-dimensional key protein sequence features identified with MRMD2.0 could correctly predict 96.1% of cyclins, which was superior to previous studies on the same dataset both in terms of dimensionality and performance comparisons. Therefore, our work provided a valuable tool for identifying cyclins. The model data can be downloaded from https://github.com/YUshunL/cyclin.
Affiliation(s)
- Shaoyou Yu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Bo Liao
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Wen Zhu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Dejun Peng
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fangxiang Wu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
12
Sigrist SJ, Haucke V. Orchestrating vesicular and nonvesicular membrane dynamics by intrinsically disordered proteins. EMBO Rep 2023; 24:e57758. [PMID: 37680133 PMCID: PMC10626433 DOI: 10.15252/embr.202357758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 08/22/2023] [Accepted: 08/24/2023] [Indexed: 09/09/2023] Open
Abstract
Compartmentalization by membranes is a common feature of eukaryotic cells and serves to spatiotemporally confine biochemical reactions to control physiology. Membrane-bound organelles such as the endoplasmic reticulum (ER), the Golgi complex, endosomes and lysosomes, and the plasma membrane, continuously exchange material via vesicular carriers. In addition to vesicular trafficking entailing budding, fission, and fusion processes, organelles can form membrane contact sites (MCSs) that enable the nonvesicular exchange of lipids, ions, and metabolites, or the secretion of neurotransmitters via subsequent membrane fusion. Recent data suggest that biomolecule and information transfer via vesicular carriers and via MCSs share common organizational principles and are often mediated by proteins with intrinsically disordered regions (IDRs). Intrinsically disordered proteins (IDPs) can assemble via low-affinity, multivalent interactions to facilitate membrane tethering, deformation, fission, or fusion. Here, we review our current understanding of how IDPs drive the formation of multivalent protein assemblies and protein condensates to orchestrate vesicular and nonvesicular transport with a special focus on presynaptic neurotransmission. We further discuss how dysfunction of IDPs causes disease and outline perspectives for future research.
Collapse
Affiliation(s)
- Stephan J Sigrist
- Department of Biology, Chemistry, PharmacyFreie Universität BerlinBerlinGermany
| | - Volker Haucke
- Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Berlin, Germany
- Department of Molecular Pharmacology and Cell Biology, Leibniz Forschungsinstitut für Molekulare Pharmakologie (FMP), Berlin, Germany
| |
|
13
|
Basu S, Hegedűs T, Kurgan L. CoMemMoRFPred: Sequence-based Prediction of MemMoRFs by Combining Predictors of Intrinsic Disorder, MoRFs and Disordered Lipid-binding Regions. J Mol Biol 2023; 435:168272. [PMID: 37709009 DOI: 10.1016/j.jmb.2023.168272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 09/01/2023] [Accepted: 09/07/2023] [Indexed: 09/16/2023]
Abstract
Molecular recognition features (MoRFs) are a commonly occurring type of intrinsically disordered regions (IDRs) that undergo disorder-to-order transition upon binding to partner molecules. We focus on recently characterized and functionally important membrane-binding MoRFs (MemMoRFs). Motivated by the lack of computational tools that predict MemMoRFs, we use a dataset of experimentally annotated MemMoRFs to conceptualize, design, evaluate and release an accurate sequence-based predictor. We rely on state-of-the-art tools that predict residues that possess key characteristics of MemMoRFs, such as intrinsic disorder, disorder-to-order transition and lipid-binding. We identify and combine results from three tools that include flDPnn for the disorder prediction, DisoLipPred for the prediction of disordered lipid-binding regions, and MoRFCHiBiLight for the prediction of disorder-to-order transitioning protein binding regions. Our empirical analysis demonstrates that combining results produced by these three methods generates accurate predictions of MemMoRFs. We also show that use of a smoothing operator produces predictions that closely mimic the number and sizes of the native MemMoRF regions. The resulting CoMemMoRFPred method is available as an easy-to-use webserver at http://biomine.cs.vcu.edu/servers/CoMemMoRFPred. This tool will aid future studies of MemMoRFs in the context of exploring their abundance, cellular functions, and roles in pathologic phenomena.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, USA
| | - Tamás Hegedűs
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary; ELKH-SE Biophysical Virology Research Group, Eötvös Loránd Research Network, Budapest, Hungary
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, USA.
| |
|
14
|
Kurgan L, Hu G, Wang K, Ghadermarzi S, Zhao B, Malhis N, Erdős G, Gsponer J, Uversky VN, Dosztányi Z. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat Protoc 2023; 18:3157-3172. [PMID: 37740110 DOI: 10.1038/s41596-023-00876-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/13/2022] [Accepted: 06/21/2023] [Indexed: 09/24/2023]
Abstract
Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as the timing and their limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting and collecting their own predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
| | - Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
| | - Kui Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
| | - Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada.
| | - Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Byrd Alzheimer's Center and Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
| | - Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary.
| |
|
15
|
Gonzalez JP, Frandsen KEH, Kesten C. The role of intrinsic disorder in binding of plant microtubule-associated proteins to the cytoskeleton. Cytoskeleton (Hoboken) 2023; 80:404-436. [PMID: 37578201 DOI: 10.1002/cm.21773] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/02/2023] [Revised: 07/28/2023] [Accepted: 07/30/2023] [Indexed: 08/15/2023]
Abstract
Microtubules (MTs) represent one of the main components of the eukaryotic cytoskeleton and support numerous critical cellular functions. MTs are in principle tube-like structures that can grow and shrink in a highly dynamic manner; a process largely controlled by microtubule-associated proteins (MAPs). Plant MAPs are a phylogenetically diverse group of proteins that nonetheless share many common biophysical characteristics and often contain large stretches of intrinsic protein disorder. These intrinsically disordered regions are determinants of many MAP-MT interactions, in which structural flexibility enables low-affinity protein-protein interactions that enable a fine-tuned regulation of MT cytoskeleton dynamics. Notably, intrinsic disorder is one of the major obstacles in functional and structural studies of MAPs and represents the principal present-day challenge to decipher how MAPs interact with MTs. Here, we review plant MAPs from an intrinsic protein disorder perspective, by providing a complete and up-to-date summary of all currently known members, and address the current and future challenges in functional and structural characterization of MAPs.
Affiliation(s)
- Jordy Perez Gonzalez
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
| | - Kristian E H Frandsen
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
| | - Christopher Kesten
- Department for Plant and Environmental Sciences, University of Copenhagen, Frederiksberg C, Denmark
| |
|
16
|
Pajkos M, Erdős G, Dosztányi Z. The Origin of Discrepancies between Predictions and Annotations in Intrinsically Disordered Proteins. Biomolecules 2023; 13:1442. [PMID: 37892124 PMCID: PMC10604070 DOI: 10.3390/biom13101442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 09/05/2023] [Accepted: 09/20/2023] [Indexed: 10/29/2023] Open
Abstract
Disorder prediction methods that can discriminate between ordered and disordered regions have contributed fundamentally to our understanding of the properties and prevalence of intrinsically disordered proteins (IDPs) in proteomes as well as their functional roles. However, a recent large-scale assessment of the performance of these methods indicated that there is still room for further improvements, necessitating novel approaches to understand the strengths and weaknesses of individual methods. In this study, we compared two methods, IUPred and disorder prediction, based on the pLDDT scores derived from AlphaFold2 (AF2) models. We evaluated these methods using a dataset from the DisProt database, consisting of experimentally characterized disordered regions and subsets associated with diverse experimental methods and functions. IUPred and AF2 provided consistent predictions in 79% of cases for long disordered regions; however, for 15% of these cases, they both suggested order in disagreement with annotations. These discrepancies arose primarily due to weak experimental support, the presence of intermediate states, or context-dependent behavior, such as binding-induced transitions. Furthermore, AF2 tended to predict helical regions with high pLDDT scores within disordered segments, while IUPred had limitations in identifying linker regions. These results provide valuable insights into the inherent limitations and potential biases of disorder prediction methods.
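A pLDDT-based disorder prediction of the kind compared above can be sketched as follows. The conversion 1 - pLDDT/100 is a commonly used convention assumed here for illustration, not a detail taken from the study; the function names and the 0.5 cutoff are likewise hypothetical.

```python
from typing import List


def plddt_to_disorder(plddt: List[float]) -> List[float]:
    """Map per-residue pLDDT (0-100) to a disorder propensity in [0, 1].

    Assumption: low pLDDT is read as high disorder propensity.
    """
    return [1.0 - score / 100.0 for score in plddt]


def binarize(propensities: List[float], cutoff: float = 0.5) -> List[bool]:
    """Label a residue as disordered when its propensity reaches the cutoff.

    The 0.5 cutoff is an illustrative assumption.
    """
    return [p >= cutoff for p in propensities]


def agreement(a: List[bool], b: List[bool]) -> float:
    """Fraction of residues on which two binary predictions agree."""
    if len(a) != len(b) or not a:
        raise ValueError("predictions must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Calling `agreement` on the binarized output of two methods gives a simple per-protein consistency score; the study itself reports consistency over long disordered regions, which this residue-level sketch does not reproduce.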
Affiliation(s)
| | | | - Zsuzsanna Dosztányi
- Department of Biochemistry, ELTE Eötvös Loránd University, Pázmány Péter Stny 1/c, H-1117 Budapest, Hungary; (M.P.); (G.E.)
| |
|
17
|
Tang YJ, Yan K, Zhang X, Tian Y, Liu B. Protein intrinsically disordered region prediction by combining neural architecture search and multi-objective genetic algorithm. BMC Biol 2023; 21:188. [PMID: 37674132 PMCID: PMC10483879 DOI: 10.1186/s12915-023-01672-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 07/31/2023] [Indexed: 09/08/2023] Open
Abstract
BACKGROUND Intrinsically disordered regions (IDRs) are widely distributed in proteins and related to many important biological functions. Accurately identifying IDRs is of great significance for protein structure and function analysis. Because the long disordered regions (LDRs) and short disordered regions (SDRs) have different characteristics, the existing predictors fail to achieve better and more stable performance on datasets with different ratios between LDRs and SDRs. There are two main reasons. First, the existing predictors rely on hand-designed network structures, such as convolutional neural networks (CNNs), which extract features of neighboring residues, and long short-term memory (LSTM) networks, which extract long-distance dependencies between residues. But these networks cannot capture the hidden features associated with the length dependence between residues. Second, many algorithms based on deep learning have been proposed but the complementarity of the existing predictors is not fully explored and used. RESULTS In this study, the neural architecture search (NAS) algorithm was employed to automatically construct the network structures so as to capture the hidden features in protein sequences. In order to stably predict both the LDRs and SDRs, the model constructed by NAS was combined with length-dependent models for capturing the unique features of SDRs or LDRs and general models for capturing the common features between LDRs and SDRs. A new predictor called IDP-Fusion was proposed. CONCLUSIONS Experimental results showed that IDP-Fusion can achieve more stable performance than the other existing predictors on independent test sets with different ratios between SDRs and LDRs.
Affiliation(s)
- Yi-Jun Tang
- School of Computer Science and Technology, Beijing Institute of Technology, Haidian District, No. 5, South Zhongguancun Street, Beijing, 100081, China
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Haidian District, No. 5, South Zhongguancun Street, Beijing, 100081, China
| | - Xingyi Zhang
- School of Artificial Intelligence, Anhui University, Hefei, 230601, China
| | - Ye Tian
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Haidian District, No. 5, South Zhongguancun Street, Beijing, 100081, China.
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China.
| |
|
18
|
Peck Y, Pickering D, Mobli M, Liddell MJ, Wilson DT, Ruscher R, Ryan S, Buitrago G, McHugh C, Love NC, Pinlac T, Haertlein M, Kron MA, Loukas A, Daly NL. Solution structure of the N-terminal extension domain of a Schistosoma japonicum asparaginyl-tRNA synthetase. J Biomol Struct Dyn 2023:1-11. [PMID: 37572327 DOI: 10.1080/07391102.2023.2241918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 07/24/2023] [Indexed: 08/14/2023]
Abstract
Several secreted proteins from helminths (parasitic worms) have been shown to have immunomodulatory activities. Asparaginyl-tRNA synthetases are abundantly secreted in the filarial nematode Brugia malayi (BmAsnRS) and the parasitic flatworm Schistosoma japonicum (SjAsnRS), indicating a possible immune function. The suggestion is supported by BmAsnRS alleviating disease symptoms in a T-cell transfer mouse model of colitis. This immunomodulatory function is potentially related to an N-terminal extension domain present in eukaryotic AsnRS proteins but few structure/function studies have been done on this domain. Here we have determined the three-dimensional solution structure of the N-terminal extension domain of SjAsnRS. A protein containing the 114 N-terminal amino acids of SjAsnRS was recombinantly expressed with isotopic labelling to allow structure determination using 3D NMR spectroscopy, and analysis of dynamics using NMR relaxation experiments. Structural comparisons of the N-terminal extension domain of SjAsnRS with filarial and human homologues highlight a high degree of variability in the β-hairpin region of these eukaryotic N-AsnRS proteins, but similarities in the disorder of the C-terminal regions. Limitations in PrDOS-based intrinsically disordered region (IDR) model predictions were also evident in this comparison. Empirical structural data such as that presented in our study for N-SjAsnRS will enhance the prediction of sequence-homology based structure modelling and prediction of IDRs in the future. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Yoshimi Peck
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Darren Pickering
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Mehdi Mobli
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, Australia
| | - Michael J Liddell
- College of Science and Engineering, James Cook University, Cairns, QLD, Australia
| | - David T Wilson
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Roland Ruscher
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Stephanie Ryan
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Geraldine Buitrago
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
| | - Connor McHugh
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | | | - Theresa Pinlac
- Department of Biochemistry, University of the Philippines, Manila, Philippines
| | | | - Michael A Kron
- Department of Medicine, Division of Infectious Diseases, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Alex Loukas
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| | - Norelle L Daly
- Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, QLD, Australia
| |
|
19
|
Zhao B, Ghadermarzi S, Kurgan L. Comparative evaluation of AlphaFold2 and disorder predictors for prediction of intrinsic disorder, disorder content and fully disordered proteins. Comput Struct Biotechnol J 2023; 21:3248-3258. [PMID: 38213902 PMCID: PMC10782001 DOI: 10.1016/j.csbj.2023.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/01/2023] [Indexed: 01/13/2024] Open
Abstract
We expand studies of AlphaFold2 (AF2) in the context of intrinsic disorder prediction by comparing it against a broad selection of 20 accurate, popular and recently released disorder predictors. We use a 25% larger benchmark dataset with 646 proteins and cover protein-level predictions of disorder content and fully disordered proteins. AF2-based disorder predictions secure a relatively high Area Under receiver operating characteristic Curve (AUC) of 0.77 and are statistically outperformed by several modern disorder predictors that secure AUCs around 0.8 with median runtime of about 20 s compared to 1200 s for AF2. Moreover, AF2 provides modestly accurate predictions of fully disordered proteins (F1 = 0.59 vs. 0.91 for the best disorder predictor) and disorder content (mean absolute error of 0.21 vs. 0.15). AF2 also generates statistically more accurate disorder predictions for about 20% of proteins that have relatively short sequences and a few disordered regions that tend to be located at the sequence termini, and which lack disordered protein-binding regions. Interestingly, AF2 and the most accurate disorder predictors rely on deep neural networks, suggesting that these models are useful for protein structure and disorder predictions.
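The disorder-content comparison mentioned above can be illustrated with a minimal sketch. It assumes that disorder content is the fraction of residues labelled as disordered in a protein and that the mean absolute error is averaged over proteins; the function names are hypothetical and the study's exact protocol may differ.

```python
from typing import Sequence


def disorder_content(labels: Sequence[int]) -> float:
    """Fraction of residues labelled 1 (disordered) in one protein."""
    if not labels:
        raise ValueError("protein must have at least one residue")
    return sum(labels) / len(labels)


def mean_absolute_error(predicted: Sequence[float], native: Sequence[float]) -> float:
    """Average absolute difference between predicted and native content,
    taken over a set of proteins (one value per protein)."""
    if len(predicted) != len(native) or not native:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - n) for p, n in zip(predicted, native)) / len(native)
```

For example, a protein with labels `[1, 1, 0, 0]` has a disorder content of 0.5, and that value would be compared against the content computed from the native annotation of the same protein.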
Affiliation(s)
- Bi Zhao
- Genomics program, College of Public Health, University of South Florida, Tampa, FL, United States
| | - Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| |
|
20
|
Abstract
There are over 100 computational predictors of intrinsic disorder. These methods predict amino acid-level propensities for disorder directly from protein sequences. The propensities can be used to annotate putative disordered residues and regions. This unit provides a practical and holistic introduction to the sequence-based intrinsic disorder prediction. We define intrinsic disorder, explain the format of computational prediction of disorder, and identify and describe several accurate predictors. We also introduce recently released databases of intrinsic disorder predictions and use an illustrative example to provide insights into how predictions should be interpreted and combined. Lastly, we summarize key experimental methods that can be used to validate computational predictions. © 2023 Wiley Periodicals LLC.
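Turning residue-level propensities into putative disordered regions, as described above, can be sketched as a simple thresholding pass. The 0.5 threshold, the `min_length` filter and the function name are illustrative assumptions; individual predictors define their own cutoffs.

```python
from typing import List, Tuple


def disordered_regions(
    propensities: List[float],
    threshold: float = 0.5,  # assumed cutoff; predictor-specific in practice
    min_length: int = 1,     # assumed filter for very short segments
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of disordered regions.

    A residue is treated as disordered when its propensity is at or above
    the threshold; consecutive disordered residues form one region.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(propensities, start=1):
        if p >= threshold:
            if start is None:
                start = i  # a new region opens at this residue
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(propensities) - start + 1 >= min_length:
        regions.append((start, len(propensities)))
    return regions
```

Raising `min_length` drops short segments, which matters when the number and sizes of predicted regions are compared against native annotations.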
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
| |
|
21
|
Basu S, Gsponer J, Kurgan L. DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction. Nucleic Acids Res 2023:7151337. [PMID: 37140058 DOI: 10.1093/nar/gkad330] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/15/2023] [Revised: 04/12/2023] [Accepted: 04/18/2023] [Indexed: 05/05/2023] Open
Abstract
Intrinsic disorder in proteins is relatively abundant in nature and essential for a broad spectrum of cellular functions. While disorder can be accurately predicted from protein sequences, as it was empirically demonstrated in recent community-organized assessments, it is rather challenging to collect and compile a comprehensive prediction that covers multiple disorder functions. To this end, we introduce the DEPICTER2 (DisorderEd PredictIon CenTER) webserver that offers convenient access to a curated collection of fast and accurate disorder and disorder function predictors. This server includes a state-of-the-art disorder predictor, flDPnn, and five modern methods that cover all currently predictable disorder functions: disordered linkers and protein, peptide, DNA, RNA and lipid binding. DEPICTER2 allows selection of any combination of the six methods, batch predictions of up to 25 proteins per request and provides interactive visualization of the resulting predictions. The webserver is freely available at http://biomine.cs.vcu.edu/servers/DEPICTER2/.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
22
|
Pang Y, Liu B. TransDFL: Identification of Disordered Flexible Linkers in Proteins by Transfer Learning. Genomics Proteomics Bioinformatics 2023; 21:359-369. [PMID: 36272675 PMCID: PMC10626177 DOI: 10.1016/j.gpb.2022.10.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/13/2021] [Revised: 09/21/2022] [Accepted: 10/14/2022] [Indexed: 11/27/2022]
Abstract
Disordered flexible linkers (DFLs) are the functional disordered regions in proteins, which are the sub-regions of intrinsically disordered regions (IDRs) and play important roles in connecting domains and maintaining inter-domain interactions. Trained with the limited available DFLs, the existing DFL predictors based on machine learning techniques tend to predict ordered residues as DFLs, leading to a high false positive rate (FPR) and low prediction accuracy. Previous studies have shown that DFLs are extremely flexible disordered regions, which are usually predicted as disordered residues with high confidence [P(D) > 0.9] by an IDR predictor. Therefore, transferring an IDR predictor to an accurate DFL predictor is of great significance for understanding the functions of IDRs. In this study, we proposed a new predictor called TransDFL for identifying DFLs by transferring the RFPR-IDP predictor for IDR identification to the DFL prediction. The RFPR-IDP was pre-trained with IDR sequences to learn the general features between IDRs and DFLs, which is helpful to reduce the false positives in the ordered regions. RFPR-IDP was fine-tuned with the DFL sequences to capture the specific features of DFLs so as to be transferred into the TransDFL. Experimental results of two application scenarios (prediction of DFLs only in IDRs or prediction of DFLs in entire proteins) showed that TransDFL consistently outperformed other existing DFL predictors with higher accuracy. The corresponding web server of TransDFL can be freely accessed at http://bliulab.net/TransDFL/.
Affiliation(s)
- Yihe Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China.
| |
|
23
|
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 60] [Impact Index Per Article: 60.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, with many of them providing preliminary evidence of its potential. To better understand AF2 and promote its applications, we will in this article summarize the principle and system architecture of AF2 as well as the recipe of its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction will also be discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
| | - Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
| | - Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
| | - Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China.
| |
|
24
|
Computational prediction of disordered binding regions. Comput Struct Biotechnol J 2023; 21:1487-1497. [PMID: 36851914 PMCID: PMC9957716 DOI: 10.1016/j.csbj.2023.02.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/26/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is their ability to interact with a broad range of partner molecules. Multiple types of interacting IDRs were identified including molecular recognition fragments (MoRFs), short linear sequence motifs (SLiMs), and protein-, nucleic acids- and lipid-binding regions. Prediction of binding IDRs in protein sequences is gaining momentum in recent years. We survey 38 predictors of binding IDRs that target interactions with a diverse set of partners, such as peptides, proteins, RNA, DNA and lipids. We offer a historical perspective and highlight key events that fueled efforts to develop these methods. These tools rely on a diverse range of predictive architectures that include scoring functions, regular expressions, traditional and deep machine learning and meta-models. Recent efforts focus on the development of deep neural network-based architectures and extending coverage to RNA, DNA and lipid-binding IDRs. We analyze availability of these methods and show that providing implementations and webservers results in much higher rates of citations/use. We also make several recommendations to take advantage of modern deep network architectures, develop tools that bundle predictions of multiple and different types of binding IDRs, and work on algorithms that model structures of the resulting complexes.
|
25
|
Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023] Open
Abstract
Intrinsically Disordered Proteins (IDPs) and Regions (IDRs) exist widely. Although they lack well-defined structures, they participate in many important biological processes. In addition, they are also widely related to human diseases and have become potential targets in drug discovery. However, there is a big gap between the experimental annotations related to IDPs/IDRs and their actual number. In recent decades, computational methods related to IDPs/IDRs have been developed vigorously for different tasks, including predicting IDPs/IDRs, their binding modes, their binding sites, and their molecular functions. In view of the correlation between these predictors, we have reviewed these prediction methods uniformly for the first time, summarized their computational methods and predictive performance, and discussed some problems and perspectives.
Affiliation(s)
- Bingqing Han
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Chongjiao Ren
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Wenda Wang
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Jiashan Li
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Xinqi Gong
  - Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
  - Beijing Academy of Intelligence, Beijing 100083, China
|
26
|
Peng Z, Li Z, Meng Q, Zhao B, Kurgan L. CLIP: accurate prediction of disordered linear interacting peptides from protein sequences using co-evolutionary information. Brief Bioinform 2023; 24:6858950. [PMID: 36458437 DOI: 10.1093/bib/bbac502] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/14/2022] [Revised: 09/30/2022] [Accepted: 10/24/2022] [Indexed: 12/04/2022] Open
Abstract
One of the key features of intrinsically disordered regions (IDRs) is facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo a disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
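As a reading aid for the AUC figure quoted in this abstract, the sketch below computes the area under the ROC curve for residue-level binary labels and predicted propensities using the pairwise-ranking definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not taken from CLIP or its benchmark.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 values (1 = residue in a binding region).
    scores: iterable of predicted propensities, same length as labels.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): three of the four positive/negative
# pairs are ranked correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation would be preferable for proteome-scale inputs.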
Affiliation(s)
- Zhenling Peng
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
  - Frontier Science Center for Nonlinear Expectations, Ministry of Education, Qingdao, 266237, China
- Zixia Li
  - Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Qiaozhen Meng
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
|
27
|
Sun B, Kekenes-Huskey PM. Myofilament-associated proteins with intrinsic disorder (MAPIDs) and their resolution by computational modeling. Q Rev Biophys 2023; 56:e2. [PMID: 36628457 PMCID: PMC11070111 DOI: 10.1017/s003358352300001x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/12/2023]
Abstract
The cardiac sarcomere is a cellular structure in the heart that enables muscle cells to contract. Dozens of proteins belong to the cardiac sarcomere and work in tandem to generate force and adapt to demands on cardiac output. Intriguingly, the majority of these proteins have significant intrinsic disorder that contributes to their functions, yet the biophysics of these intrinsically disordered regions (IDRs) has been characterized in limited detail. In this review, we first enumerate these myofilament-associated proteins with intrinsic disorder (MAPIDs) and recent biophysical studies that characterize their IDRs. Second, we summarize the biophysics governing IDR properties and the state of the art in computational tools for MAPID identification and characterization of their conformational ensembles. We conclude with an overview of future computational approaches toward broadening the understanding of intrinsic disorder in the cardiac sarcomere.
Affiliation(s)
- Bin Sun
  - Research Center for Pharmacoinformatics (The State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Department of Medicinal Chemistry and Natural Medicine Chemistry, College of Pharmacy, Harbin Medical University, Harbin 150081, China
|
28
|
Pedersen KB, Flores-Canales JC, Schiøtt B. Predicting molecular properties of α-synuclein using force fields for intrinsically disordered proteins. Proteins 2023; 91:47-61. [PMID: 35950933 PMCID: PMC10087257 DOI: 10.1002/prot.26409] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/12/2022] [Revised: 06/17/2022] [Accepted: 07/12/2022] [Indexed: 12/29/2022]
Abstract
Independent force field validation is an essential practice to keep track of developments and for performing meaningful molecular dynamics simulations. In this work, atomistic force fields for intrinsically disordered proteins (IDPs) are tested by simulating the archetypical IDP α-synuclein in solution for 2.5 μs. Four combinations of protein and water force fields were tested: ff19SB/OPC, ff19SB/TIP4P-D, ff03CMAP/TIP4P-D, and a99SB-disp/TIP4P-disp, with four independent repeat simulations for each combination. We compare our simulations to the results of a 73 μs simulation using the a99SB-disp/TIP4P-disp combination, provided by D. E. Shaw Research. From the trajectories, we predict a range of experimental observations of α-synuclein and compare them to literature data. This includes protein radius of gyration and hydration, intramolecular distances, NMR chemical shifts, and ³J-couplings. Both ff19SB/TIP4P-D and a99SB-disp/TIP4P-disp produce extended conformational ensembles of α-synuclein that agree well with experimental radius of gyration and intramolecular distances, while a99SB-disp/TIP4P-disp reproduces a balanced α-synuclein secondary structure content. It was found that ff19SB/OPC and ff03CMAP/TIP4P-D produce overly compact conformational ensembles and show discrepancies in the secondary structure content compared to the experimental data.
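To make the radius-of-gyration comparison mentioned in this abstract concrete, here is a minimal sketch of how a mass-weighted radius of gyration could be computed for each trajectory frame and then averaged. The function names and the toy coordinates are assumptions for illustration; the study's own analysis pipeline is not described in the abstract, and a plain mean over frames is only one of several averaging conventions.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one conformation.

    coords: list of (x, y, z) tuples, one per atom.
    masses: optional list of atomic masses; equal masses if omitted.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one coordinate axis at a time.
    center = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


def mean_radius_of_gyration(frames, masses=None):
    """Plain arithmetic mean of the per-frame radius of gyration."""
    return sum(radius_of_gyration(f, masses) for f in frames) / len(frames)


# Toy "trajectory" of two-atom frames (made-up coordinates).
compact_frame = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
extended_frame = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0)]
toy_mean_rg = mean_radius_of_gyration([compact_frame, extended_frame])
```

An overly compact ensemble, as reported for some force field combinations, would show up as a mean value below the experimental estimate.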
Affiliation(s)
- Birgit Schiøtt
  - Department of Chemistry, Aarhus University, Aarhus C, Denmark
  - Interdisciplinary Nanoscience Center, Aarhus University, Aarhus C, Denmark
|
29
|
Intrinsically Disordered Proteins: An Overview. Int J Mol Sci 2022; 23:ijms232214050. [PMID: 36430530 PMCID: PMC9693201 DOI: 10.3390/ijms232214050] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 09/04/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022] Open
Abstract
Many proteins and protein segments cannot attain a single stable three-dimensional structure under physiological conditions; instead, they adopt multiple interconverting conformational states. Such intrinsically disordered proteins or protein segments are highly abundant across proteomes, and are involved in various effector functions. This review focuses on different aspects of disordered proteins and disordered protein regions, which form the basis of the so-called "Disorder-function paradigm" of proteins. Additionally, various experimental approaches and computational tools used for characterizing disordered regions in proteins are discussed. Finally, the role of disordered proteins in diseases and their utility as potential drug targets are explored.
|
30
|
Fang M, He Y, Du Z, Uversky VN. DeepCLD: An Efficient Sequence-Based Predictor of Intrinsically Disordered Proteins. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3154-3159. [PMID: 34727037 DOI: 10.1109/tcbb.2021.3124273] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Intrinsic disorder is common in proteins, plays important roles in protein functionality, and is commonly associated with various human diseases. To provide an accurate tool for the annotation of intrinsic disorder in proteins, this paper proposes a novel algorithm, DeepCLD, for sequence-based prediction of intrinsically disordered proteins. The algorithm uses the amino acid position-specific scoring matrix (PSSM) to capture the intrinsic variability characteristic of sequence patterns, ResNet to preserve the feature space structure, and a bidirectional CudnnLSTM as the recurrent layer to further improve efficiency. Furthermore, DeepCLD utilizes an attention mechanism to address the vanishing-gradient problem in deep networks. Comparative analyses show that DeepCLD has faster training speed and higher prediction accuracy than comparable methods.
|
31
|
Compositional Bias of Intrinsically Disordered Proteins and Regions and Their Predictions. Biomolecules 2022; 12:biom12070888. [PMID: 35883444 PMCID: PMC9313023 DOI: 10.3390/biom12070888] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/27/2022] [Revised: 06/10/2022] [Accepted: 06/10/2022] [Indexed: 11/17/2022] Open
Abstract
Intrinsically disordered regions (IDRs) carry out many cellular functions and vary in length and placement in protein sequences. This diversity leads to variations in the underlying compositional biases, which were demonstrated for the short vs. long IDRs. We analyze compositional biases across four classes of disorder: fully disordered proteins; short IDRs; long IDRs; and binding IDRs. We identify three distinct biases: for the fully disordered proteins, the short IDRs and the long and binding IDRs combined. We also investigate compositional bias for putative disorder produced by leading disorder predictors and find that it is similar to the bias of the native disorder. Interestingly, the accuracy of disorder predictions across different methods is correlated with the correctness of the compositional bias of their predictions highlighting the importance of the compositional bias. The predictive quality is relatively low for the disorder classes with compositional bias that is the most different from the “generic” disorder bias, while being much higher for the classes with the most similar bias. We discover that different predictors perform best across different classes of disorder. This suggests that no single predictor is universally best and motivates the development of new architectures that combine models that target specific disorder classes.
|
32
|
Biró B, Zhao B, Kurgan L. Complementarity of the residue-level protein function and structure predictions in human proteins. Comput Struct Biotechnol J 2022; 20:2223-2234. [PMID: 35615015 PMCID: PMC9118482 DOI: 10.1016/j.csbj.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 05/02/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022] Open
Abstract
Sequence-based predictors of the residue-level protein function and structure cover a broad spectrum of characteristics including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They were catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, while they are typically used on the same proteins. We fill this void by studying complementarity of a representative collection of methods that target different predictions using a large, taxonomically consistent, and low similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of the protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of the DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
Affiliation(s)
- Bálint Biró
  - Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
33
|
Micsonai A, Moussong É, Murvai N, Tantos Á, Tőke O, Réfrégiers M, Wien F, Kardos J. Disordered–Ordered Protein Binary Classification by Circular Dichroism Spectroscopy. Front Mol Biosci 2022; 9:863141. [PMID: 35591946 PMCID: PMC9110821 DOI: 10.3389/fmolb.2022.863141] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 01/26/2022] [Accepted: 03/24/2022] [Indexed: 12/31/2022] Open
Abstract
Intrinsically disordered proteins lack a stable tertiary structure and form dynamic conformational ensembles due to their characteristic physicochemical properties and amino acid composition. They are abundant in nature and responsible for a large variety of cellular functions. While numerous bioinformatics tools have been developed for in silico disorder prediction in recent decades, there is a need for experimental methods to verify the disordered state. CD spectroscopy is widely used for protein secondary structure analysis. It is usable in a wide concentration range under various buffer conditions. Even without providing high-resolution information, it is especially useful when NMR, X-ray, or other techniques are problematic or when one simply needs a fast technique to verify the structure of proteins. Here, we propose an automated binary disorder–order classification method that analyzes far-UV CD spectroscopy data. The method needs CD data at only three wavelength points, making high-throughput data collection possible. The mathematical analysis applies the k-nearest neighbor algorithm with a cosine distance function, which is independent of the spectral amplitude and thus free of concentration determination errors. Moreover, the method can be used even for strongly absorbing samples, such as under crowded environmental conditions, if the spectrum can be recorded down to a wavelength of 212 nm. We believe the classification method will be useful in identifying disorder and will also facilitate the growth of experimental data in IDP databases. The method is implemented on a webserver and freely available for academic users.
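The classification idea in this abstract, k-nearest neighbors with a cosine distance over a three-point CD signal, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the value of `k`, and the reference vectors are all hypothetical placeholders, and the numbers are not real CD measurements.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity; unchanged when either vector is rescaled
    by a positive factor, which is why amplitude errors cancel out."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_classify(query, references, k=3):
    """Majority vote among the k reference spectra closest to the query.

    references: list of (three_point_signal, label) pairs.
    """
    ranked = sorted(references, key=lambda item: cosine_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reference set: made-up three-point signals, not real spectra.
toy_references = [
    ((-1.0, -0.2, -0.1), "disordered"),
    ((-0.9, -0.3, -0.1), "disordered"),
    ((-0.8, -0.1, -0.2), "disordered"),
    ((0.5, -1.0, -0.9), "ordered"),
    ((0.6, -0.9, -1.0), "ordered"),
    ((0.4, -1.1, -0.8), "ordered"),
]

# The query is the first reference scaled by 2, mimicking a concentration error.
toy_call = knn_classify((-2.0, -0.4, -0.2), toy_references, k=3)
```

Because only the direction of the three-point vector matters, a mis-estimated concentration that scales the whole spectrum leaves the nearest neighbors unchanged.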
Affiliation(s)
- András Micsonai
  - ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
- Éva Moussong
  - ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
- Nikoletta Murvai
  - Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
  - Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Ágnes Tantos
  - Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Orsolya Tőke
  - Laboratory for NMR Spectroscopy, Research Centre for Natural Sciences, Budapest, Hungary
- Matthieu Réfrégiers
  - Synchrotron SOLEIL, Gif-sur-Yvette, France
  - Centre de Biophysique Moléculaire, CNRS UPR4301, Orléans, France
- Frank Wien
  - Synchrotron SOLEIL, Gif-sur-Yvette, France
- József Kardos
  - ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary
  - *Correspondence: József Kardos
|
34
|
AlphaFold2: A Role for Disordered Protein/Region Prediction? Int J Mol Sci 2022; 23:ijms23094591. [PMID: 35562983 PMCID: PMC9104326 DOI: 10.3390/ijms23094591] [Citation(s) in RCA: 57] [Impact Index Per Article: 28.5] [Received: 03/27/2022] [Revised: 04/18/2022] [Accepted: 04/19/2022] [Indexed: 01/27/2023] Open
Abstract
The development of AlphaFold2 marked a paradigm-shift in the structural biology community. Herein, we assess the ability of AlphaFold2 to predict disordered regions against traditional sequence-based disorder predictors. We find that AlphaFold2 performs well at discriminating disordered regions, but also note that the disorder predictor one constructs from an AlphaFold2 structure determines accuracy. In particular, a naïve, but non-trivial assumption that residues assigned to helices, strands, and H-bond stabilized turns are likely ordered and all other residues are disordered results in a dramatic overestimation in disorder; conversely, the predicted local distance difference test (pLDDT) provides an excellent measure of residue-wise disorder. Furthermore, by employing molecular dynamics (MD) simulations, we note an interesting relationship between the pLDDT and secondary structure, that may explain our observations and suggests a broader application of the pLDDT for characterizing the local dynamics of intrinsically disordered proteins and regions (IDPs/IDRs).
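To illustrate how a residue-wise disorder predictor might be built from pLDDT, as this abstract discusses, the sketch below converts per-residue pLDDT values (0 to 100) into a disorder propensity, applies a cutoff, and groups consecutive disordered residues into regions. The mapping `1 - pLDDT/100`, the cutoff of 0.5, and all function names are assumptions made for illustration; the abstract does not specify the exact construction used in the paper.

```python
def plddt_to_propensity(plddt):
    """Map pLDDT (0-100, high = confident) to a 0-1 disorder propensity.
    The linear mapping is an assumption, not the published method."""
    return [1.0 - value / 100.0 for value in plddt]


def call_disorder(plddt, cutoff=0.5):
    """Binary residue-level calls; the cutoff is a hypothetical choice."""
    return [score >= cutoff for score in plddt_to_propensity(plddt)]


def disordered_regions(flags):
    """Group consecutive True flags into (start, end) index pairs, inclusive."""
    regions = []
    start = None
    for index, flag in enumerate(flags):
        if flag and start is None:
            start = index
        elif not flag and start is not None:
            regions.append((start, index - 1))
            start = None
    if start is not None:
        regions.append((start, len(flags) - 1))
    return regions


# Made-up pLDDT profile for a seven-residue toy sequence.
toy_plddt = [95, 90, 40, 30, 35, 88, 20]
toy_flags = call_disorder(toy_plddt)
toy_regions = disordered_regions(toy_flags)
```

Separating the residue-level calls from the region grouping makes it easy to evaluate both levels, which matters because good residue-level propensities do not guarantee accurate region boundaries.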
|
35
|
McFadden WM, Yanowitz JL. idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R. PLoS One 2022; 17:e0266929. [PMID: 35436286 PMCID: PMC9015136 DOI: 10.1371/journal.pone.0266929] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/06/2021] [Accepted: 03/29/2022] [Indexed: 12/23/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are proteins or protein domains that do not have a single native structure; rather, they are a class of flexible peptides that can rapidly adopt multiple conformations. IDPs are quite abundant, and their dynamic characteristics provide unique advantages for various biological processes. The field of “unstructured biology” has emerged, in part, because of numerous computational studies that identified the unique characteristics of IDPs and IDRs. The package ‘idpr’, short for Intrinsically Disordered Proteins in R, implements several R functions that match the established characteristics of IDPs to protein sequences of interest. This includes calculations of residue composition, charge-hydropathy relationships, and predictions of intrinsic disorder. Additionally, idpr integrates several amino acid substitution matrices and calculators to supplement IDP-based workflows. Overall, idpr aims to integrate tools for the computational analysis of IDPs within R, facilitating the analysis of these important, yet under-characterized, proteins. The idpr package can be downloaded from Bioconductor (https://bioconductor.org/packages/idpr/).
Affiliation(s)
- Judith L. Yanowitz
  - Magee-Womens Research Institute, Pittsburgh, PA, United States of America
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States of America
|
36
|
Pei H, Guo W, Peng Y, Xiong H, Chen Y. Targeting key proteins involved in transcriptional regulation for cancer therapy: Current strategies and future prospective. Med Res Rev 2022; 42:1607-1660. [PMID: 35312190 DOI: 10.1002/med.21886] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/12/2022] [Revised: 02/10/2022] [Accepted: 02/22/2022] [Indexed: 12/14/2022]
Abstract
The key proteins involved in transcriptional regulation play convergent roles in cellular homeostasis, and their dysfunction mediates aberrant gene expression that underlies the hallmarks of tumorigenesis. As tumor progression depends on such abnormal regulation of transcription, it is important to discover novel chemical entities as antitumor drugs that target key tumor-associated proteins involved in transcriptional regulation. Although most key proteins (especially transcription factors) involved in transcriptional regulation have historically been regarded as undruggable targets, multiple targeting approaches at diverse levels of transcriptional regulation, such as epigenetic intervention, inhibition of DNA binding by transcription factors, and inhibition of protein-protein interactions (PPIs), have been established in preclinical or clinical studies. In addition, several new approaches have recently been described, such as targeting proteasomal degradation and eliciting synthetic lethality. This review emphasizes these developing therapeutic approaches and provides a thorough conspectus of drug development targeting key proteins involved in transcriptional regulation and its impact on future oncotherapy.
Affiliation(s)
- Haixiang Pei
  - Institute for Advanced Study, Shenzhen University and Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University Health Science Center, Shenzhen, China
  - Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Weikai Guo
  - Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
  - Joint National Laboratory for Antibody Drug Engineering, School of Basic Medical Science, Henan University, Kaifeng, China
- Yangrui Peng
  - Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Hai Xiong
  - Institute for Advanced Study, Shenzhen University and Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Shenzhen University Health Science Center, Shenzhen, China
- Yihua Chen
  - Shanghai Key Laboratory of Regulatory Biology, The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
|
37
|
Zhao B, Kurgan L. Deep Learning in Prediction of Intrinsic Disorder in Proteins. Comput Struct Biotechnol J 2022; 20:1286-1294. [PMID: 35356546 PMCID: PMC8927795 DOI: 10.1016/j.csbj.2022.03.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 01/10/2022] [Revised: 03/04/2022] [Accepted: 03/04/2022] [Indexed: 12/12/2022] Open
|
38
|
Kurgan L. Resources for computational prediction of intrinsic disorder in proteins. Methods 2022; 204:132-141. [DOI: 10.1016/j.ymeth.2022.03.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/28/2022] [Revised: 03/25/2022] [Accepted: 03/29/2022] [Indexed: 12/26/2022] Open
|
39
|
Bondos SE, Dunker AK, Uversky VN. Intrinsically disordered proteins play diverse roles in cell signaling. Cell Commun Signal 2022; 20:20. [PMID: 35177069 PMCID: PMC8851865 DOI: 10.1186/s12964-022-00821-7] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 09/22/2021] [Accepted: 12/11/2021] [Indexed: 11/29/2022] Open
Abstract
Signaling pathways allow cells to detect and respond to a wide variety of chemical (e.g., Ca²⁺ or chemokine proteins) and physical stimuli (e.g., shear stress, light). Together, these pathways form an extensive communication network that regulates basic cell activities and coordinates the function of multiple cells or tissues. The process of cell signaling imposes many demands on the proteins that comprise these pathways, including the abilities to form active and inactive states and to engage in multiple protein interactions. Furthermore, successful signaling often requires amplifying the signal, regulating or tuning the response to the signal, and combining information sourced from multiple pathways, all while ensuring fidelity of the process. This sensitivity, adaptability, and tunability are possible, in part, due to the inclusion of intrinsically disordered regions in many proteins involved in cell signaling. The goal of this collection is to highlight the many roles of intrinsic disorder in cell signaling. Following an overview of resources that can be used to study intrinsically disordered proteins, this review highlights the critical role of intrinsically disordered proteins for signaling in widely diverse organisms (animals, plants, bacteria, fungi), in every category of cell signaling pathway (autocrine, juxtacrine, intracrine, paracrine, and endocrine) and at each stage (ligand, receptor, transducer, effector, terminator) in the cell signaling process. Thus, a cell signaling pathway cannot be fully described without understanding how intrinsically disordered protein regions contribute to its function. The ubiquitous presence of intrinsic disorder in different stages of diverse cell signaling pathways suggests that more mechanisms by which disorder modulates intra- and inter-cell signals remain to be discovered.
Affiliation(s)
- Sarah E Bondos
  - Department of Molecular and Cellular Medicine, Texas A&M Health Science Center, College Station, TX, 77843, USA
- A Keith Dunker
  - Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
  - Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Moscow Region, Russia, 142290
|
40
|
Computational Prediction of Intrinsically Disordered Proteins Based on Protein Sequences and Convolutional Neural Networks. Comput Intell Neurosci 2022; 2021:4455604. [PMID: 34992646 PMCID: PMC8727116 DOI: 10.1155/2021/4455604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2021] [Accepted: 12/08/2021] [Indexed: 11/17/2022]
Abstract
Intrinsically disordered proteins (IDPs) possess at least one region that lacks a single stable structure in vivo, which allows them to play an important role in a variety of biological functions. We propose a prediction method for IDPs based on convolutional neural networks (CNNs) and feature selection. A combination of sequence and evolutionary properties is used to describe the differences between disordered and ordered regions. In particular, to highlight the correlation between the target residue and adjacent residues, multiple windows are selected to preprocess the protein sequence through the selected properties. The shorter windows reflect the characteristics of the central residue, and the longer windows reflect the characteristics of the surroundings of the central residue. Moreover, to highlight the specificity of the sequence and evolutionary properties, they are preprocessed separately. After that, the preprocessed properties are combined into feature matrices that serve as the input of the constructed CNN. Our method is trained and tested on the DisProt database. The simulation results show that the proposed method can predict IDPs effectively, and that its performance is competitive with IsUnstruct and ESpritz.
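The multi-window preprocessing described in this abstract can be pictured with the sketch below, which summarizes a per-residue property over several window sizes centered on each residue. Averaging within each window, the specific half-widths, and the function names are assumptions for illustration; the abstract does not state which window sizes or summary statistic the authors use.

```python
def window_mean(values, center, half_width):
    """Mean of a per-residue property over a window centered on `center`.
    The window is truncated at the sequence termini."""
    low = max(0, center - half_width)
    high = min(len(values), center + half_width + 1)
    return sum(values[low:high]) / (high - low)


def multi_window_features(values, half_widths=(1, 5, 10)):
    """One feature row per residue, one column per window size.

    Short windows track the central residue; long windows summarize
    its sequence neighborhood. The half-widths are hypothetical.
    """
    return [
        [window_mean(values, index, half_width) for half_width in half_widths]
        for index in range(len(values))
    ]


# Made-up per-residue property profile for a five-residue toy sequence.
toy_profile = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_features = multi_window_features(toy_profile, half_widths=(0, 1))
```

Stacking the rows produced for several properties would give the kind of feature matrix that a CNN can take as input.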
|
41
|
Tamburrini KC, Pesce G, Nilsson J, Gondelaud F, Kajava AV, Berrin JG, Longhi S. Predicting Protein Conformational Disorder and Disordered Binding Sites. Methods Mol Biol 2022; 2449:95-147. [PMID: 35507260 DOI: 10.1007/978-1-0716-2095-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In the last two decades it has become increasingly evident that a large number of proteins adopt either a fully or a partially disordered conformation. Intrinsically disordered proteins are ubiquitous proteins that fulfill essential biological functions while lacking a stable 3D structure. Their conformational heterogeneity is encoded by the amino acid sequence, thereby allowing intrinsically disordered proteins or regions to be recognized based on their sequence properties. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to crystallization. This chapter focuses on the methods currently employed for predicting protein disorder and identifying intrinsically disordered binding sites.
Affiliation(s)
- Ketty C Tamburrini
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
  - INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Giulia Pesce
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Juliet Nilsson
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Frank Gondelaud
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Université Montpellier, Montpellier, France
- Jean-Guy Berrin
  - INRAE, Aix Marseille Univ, Biodiversité et Biotechnologie Fongiques (BBF), UMR 1163, Marseille, France
- Sonia Longhi
  - Aix Marseille Univ, CNRS, Architecture et Fonction des Macromolécules Biologiques, AFMB, UMR 7257, Marseille, France
|
42
|
Abstract
INTRODUCTION: The intrinsic disorder prediction field develops, assesses, and deploys computational predictors of disorder in protein sequences, and constructs and disseminates databases of these predictions. Over 40 years of research has resulted in the release of numerous resources. AREAS COVERED: We identify and briefly summarize the most comprehensive collection to date of over 100 disorder predictors. We focus on their predictive models, availability and predictive performance. We categorize and study them from a historical point of view to highlight informative trends. EXPERT OPINION: We find a consistent trend of improvements in predictive quality as newer and more advanced predictors are developed. The original focus on machine learning methods shifted to meta-predictors in the early 2010s, followed by a recent transition to deep learning. The use of deep learners will continue in the foreseeable future given the recent and convincing success of these methods. Moreover, a broad range of resources that facilitate convenient collection of accurate disorder predictions is available to users. They include web servers and standalone programs for disorder prediction, servers that combine prediction of disorder and disorder functions, and large databases of pre-computed predictions. We also point to the need to address the shortage of accurate methods that predict disordered binding regions.
Affiliation(s)
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
|
43
|
Nijhawan AK, Chan AM, Hsu DJ, Chen LX, Kohlstedt KL. Resolving Dynamics in the Ensemble: Finding Paths through Intermediate States and Disordered Protein Structures. J Phys Chem B 2021; 125:12401-12412. [PMID: 34748336 PMCID: PMC9096987 DOI: 10.1021/acs.jpcb.1c05820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Proteins have been found to inhabit a diverse set of three-dimensional structures. The dynamics that govern protein interconversion between structures happen over a wide range of time scales, from picoseconds to seconds. Our understanding of protein functions and dynamics is largely reliant upon our ability to elucidate physically populated structures. From an experimental structural characterization perspective, we are often limited to measuring the ensemble-averaged structure both in the steady-state and time-resolved regimes. Generating kinetic models and understanding protein structure-function relationships require atomistic knowledge of the populated states in the ensemble. In this Perspective, we present ensemble refinement methodologies that integrate time-resolved experimental signals with molecular dynamics models. We first discuss integration of experimental structural restraints to molecular models in disordered protein systems that adhere to the principle of maximum entropy for creating a complete set of ensemble structures. We then propose strategies to find kinetic pathways between the refined structures, using time-resolved inputs to guide molecular dynamics trajectories and the use of inference to generate tailored stimuli to prepare a desired ensemble of protein states.
Affiliation(s)
- Adam K Nijhawan
  - Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States
- Arnold M Chan
  - Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States
- Darren J Hsu
  - Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States
- Lin X Chen
  - Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States
  - Chemical Sciences and Engineering Division, Argonne National Laboratory, Argonne, Illinois 60439, United States
- Kevin L Kohlstedt
  - Department of Chemistry, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States
|
44
|
Structural and thermodynamical insights into the binding and inhibition of FIH-1 by the N-terminal disordered region of Mint3. J Biol Chem 2021; 297:101304. [PMID: 34655613 PMCID: PMC8571082 DOI: 10.1016/j.jbc.2021.101304] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/08/2021] [Revised: 10/09/2021] [Accepted: 10/12/2021] [Indexed: 11/29/2022] Open
Abstract
Mint3 is known to enhance aerobic ATP production, known as the Warburg effect, by binding to FIH-1. Since this effect is considered to be beneficial for cancer cells, the interaction is a promising target for cancer therapy. However, previous research has suggested that the interacting region of Mint3 with FIH-1 is intrinsically disordered, which makes investigation of this interaction challenging. Therefore, we adopted thermodynamic and structural studies in solution to clarify the structural and thermodynamical changes of Mint3 binding to FIH-1. First, using a combination of circular dichroism, nuclear magnetic resonance, and hydrogen/deuterium exchange–mass spectrometry (HDX-MS), we confirmed that the N-terminal half, which is the interacting part of Mint3, is mostly disordered. Next, we revealed a large enthalpy and entropy change in the interaction of Mint3 using isothermal titration calorimetry (ITC). The profile is consistent with the model that the flexibility of disordered Mint3 is drastically reduced upon binding to FIH-1. Moreover, we performed a series of ITC experiments with several types of truncated Mint3s, an effective approach since the interacting part of Mint3 is disordered, and identified amino acids 78 to 88 as a novel core site for binding to FIH-1. The truncation study of Mint3 also revealed the thermodynamic contribution of each part of Mint3 to the interaction with FIH-1, where the core sites contribute to the affinity (ΔG), while other sites only affect enthalpy (ΔH), by forming noncovalent bonds. This insight can serve as a foothold for further investigation of intrinsically disordered regions (IDRs) and drug development for cancer therapy.
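The quantities this abstract reports from ITC are linked by two standard relations, ΔG = RT ln(Kd) and ΔG = ΔH − TΔS. The sketch below works through them. The dissociation constant, enthalpy, and temperature are hypothetical placeholders rather than values from the study, and the function name is illustrative.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def binding_thermodynamics(kd_molar: float, delta_h_kj: float, temp_k: float = 298.15):
    """Return (delta_G, -T*delta_S) in kJ/mol from Kd and delta_H.

    Uses delta_G = R*T*ln(Kd) (1 M standard state) and
    delta_G = delta_H - T*delta_S.
    """
    delta_g = R * temp_k * math.log(kd_molar) / 1000.0
    minus_t_delta_s = delta_g - delta_h_kj
    return delta_g, minus_t_delta_s


# Hypothetical inputs: Kd = 1 uM and delta_H = -80 kJ/mol at 25 degrees C.
delta_g, minus_t_delta_s = binding_thermodynamics(1e-6, -80.0)
```

With these placeholder inputs the entropic term −TΔS is positive, so it opposes binding. That is the kind of profile one would expect when a flexible, disordered chain becomes more ordered on binding, as the abstract's model describes.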
|
45
|
Ruff KM, Pappu RV. AlphaFold and Implications for Intrinsically Disordered Proteins. J Mol Biol 2021; 433:167208. [PMID: 34418423 DOI: 10.1016/j.jmb.2021.167208] [Citation(s) in RCA: 226] [Impact Index Per Article: 75.3] [Received: 08/02/2021] [Revised: 08/11/2021] [Accepted: 08/12/2021] [Indexed: 10/20/2022]
Abstract
Accurate predictions of the three-dimensional structures of proteins from their amino acid sequences have come of age. AlphaFold, a deep learning-based approach to protein structure prediction, shows remarkable success in independent assessments of prediction accuracy. A significant epoch in structural bioinformatics was the structural annotation of over 98% of protein sequences in the human proteome. Interestingly, many predictions feature regions of very low confidence, and these regions largely overlap with intrinsically disordered regions (IDRs). That over 30% of regions within the proteome are disordered is congruent with estimates that have been made over the past two decades, as intense efforts have been undertaken to generalize the structure-function paradigm to include the importance of conformational heterogeneity and dynamics. With structural annotations from AlphaFold in hand, there is the temptation to draw inferences regarding the "structures" of IDRs and their interactomes. Here, we offer a cautionary note regarding the misinterpretations that might ensue and highlight efforts that provide concrete understanding of sequence-ensemble-function relationships of IDRs. This perspective is intended to emphasize the importance of IDRs in sequence-function relationships (SERs) and to highlight how one might go about extracting quantitative SERs to make sense of how IDRs function.
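To make the overlap between low-confidence predictions and IDRs concrete, the sketch below groups residues with low per-residue confidence (AlphaFold reports pLDDT on a 0 to 100 scale) into contiguous segments. The cutoff of 50, the minimum segment length, the function name, and the toy scores are all assumptions chosen for illustration. In keeping with the abstract's caution, a low-confidence segment is only a candidate region, not evidence about an IDR's "structure".

```python
from typing import List, Sequence, Tuple


def low_confidence_regions(
    scores: Sequence[float],
    threshold: float = 50.0,  # assumed cutoff for "very low" confidence
    min_len: int = 3,         # assumed minimum segment length
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs with score < threshold."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i  # a new low-confidence run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores) - 1))
    return regions


# Toy per-residue confidence values (hypothetical).
toy_scores = [90, 85, 40, 35, 30, 88, 45, 92, 20, 25, 30, 10]
regions = low_confidence_regions(toy_scores)
```

In this toy input the single low-scoring residue at index 6 is dropped by the minimum-length filter. This shows how region-level calls depend on post-processing choices as well as on the per-residue scores.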
Affiliation(s)
- Kiersten M Ruff
  - Department of Biomedical Engineering and Center for Science & Engineering of Living Systems (CSELS), Washington University in St. Louis, Campus Box 1097, St. Louis, MO 63130, USA
- Rohit V Pappu
  - Department of Biomedical Engineering and Center for Science & Engineering of Living Systems (CSELS), Washington University in St. Louis, Campus Box 1097, St. Louis, MO 63130, USA
|
46
|
Yan K, Wen J, Liu JX, Xu Y, Liu B. Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2008-2016. [PMID: 31940548 DOI: 10.1109/tcbb.2020.2966450] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]
Abstract
Protein fold recognition is one of the most essential steps for protein structure prediction, aiming to classify proteins into known protein folds. There are two main computational approaches: one is the template-based method, which relies on the alignment scores between query-template protein pairs, and the other is the machine learning method, which relies on a feature representation and a classifier. These two approaches have their own advantages and disadvantages. Can we combine these methods to establish more accurate predictors for protein fold recognition? In this study, we made an initial attempt and proposed two novel algorithms: TSVM-fold and ESVM-fold. TSVM-fold is based on Support Vector Machines (SVMs) and utilizes a set of pairwise sequence similarity scores generated by three complementary template-based methods: HHblits, SPARKS-X, and DeepFR. These scores measure the global relationships between query sequences and templates. The comprehensive features of the attributes of the sequences were fed into the SVMs for the prediction. TSVM-fold was then further combined with the HHblits algorithm to improve its generalization ability. The combined method is called ESVM-fold. Experimental results on two rigorous benchmark datasets (LE and YK datasets) showed that the proposed methods outperform some state-of-the-art methods, indicating that TSVM-fold and ESVM-fold are efficient predictors for protein fold recognition.
|
47
|
Hu G, Katuwawala A, Wang K, Wu Z, Ghadermarzi S, Gao J, Kurgan L. flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat Commun 2021; 12:4438. [PMID: 34290238 PMCID: PMC8295265 DOI: 10.1038/s41467-021-24773-7] [Citation(s) in RCA: 127] [Impact Index Per Article: 42.3] [Received: 04/07/2021] [Accepted: 07/06/2021] [Indexed: 01/05/2023] Open
Abstract
Identification of intrinsic disorder in proteins relies in large part on computational predictors, which demands that their accuracy should be high. Since intrinsic disorder carries out a broad range of cellular functions, it is desirable to couple the disorder and disorder function predictions. We report a computational tool, flDPnn, that provides accurate, fast and comprehensive disorder and disorder function predictions from protein sequences. The recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment and results on other test datasets demonstrate that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions. These predictions are substantially better than the results of the existing disorder predictors and methods that predict functions of disorder. Ablation tests reveal that the high predictive performance stems from innovative ways used in flDPnn to derive sequence profiles and encode inputs. flDPnn's webserver is available at http://biomine.cs.vcu.edu/servers/flDPnn/.
Affiliation(s)
- Gang Hu
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Kui Wang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sina Ghadermarzi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
|
48
|
Erdős G, Pajkos M, Dosztányi Z. IUPred3: prediction of protein disorder enhanced with unambiguous experimental annotation and visualization of evolutionary conservation. Nucleic Acids Res 2021; 49:W297-W303. [PMID: 34048569 PMCID: PMC8262696 DOI: 10.1093/nar/gkab408] [Citation(s) in RCA: 223] [Impact Index Per Article: 74.3] [Received: 03/15/2021] [Revised: 04/21/2021] [Accepted: 05/14/2021] [Indexed: 12/22/2022] Open
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) exist without a single well-defined conformation. They carry out important biological functions with multifaceted roles, which are also reflected in their evolutionary behavior. Computational methods play important roles in the characterization of IDRs. One of the commonly used disorder prediction methods is IUPred, which relies on an energy estimation approach. The IUPred web server takes an amino acid sequence or a UniProt ID/accession as an input and predicts the tendency for each amino acid to be in a disordered region, with an option to also predict context-dependent disordered regions. In this new iteration of IUPred, we added multiple novel features to enhance the prediction capabilities of the server. First, learning from the latest evaluation of disorder prediction methods, we introduced multiple new smoothing functions to the prediction that decrease noise and increase the performance of the predictions. We constructed a dataset consisting of experimentally verified ordered/disordered regions with unambiguous annotations, which were added to the prediction. We also introduced a novel tool that enables the exploration of the evolutionary conservation of protein disorder coupled to sequence conservation in model organisms. The web server is freely available to users and accessible at https://iupred3.elte.hu.
Affiliation(s)
- Gábor Erdős
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Mátyás Pajkos
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Zsuzsanna Dosztányi
  - Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
|
49
|
Clerc I, Sagar A, Barducci A, Sibille N, Bernadó P, Cortés J. The diversity of molecular interactions involving intrinsically disordered proteins: A molecular modeling perspective. Comput Struct Biotechnol J 2021; 19:3817-3828. [PMID: 34285781 PMCID: PMC8273358 DOI: 10.1016/j.csbj.2021.06.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/19/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 01/15/2023] Open
Abstract
Intrinsically Disordered Proteins and Regions (IDPs/IDRs) are key components of a multitude of biological processes. Conformational malleability enables IDPs/IDRs to perform very specialized functions that cannot be accomplished by globular proteins. The functional role of most of these proteins is related to the recognition of other biomolecules, either to regulate biological processes or as part of signaling pathways. Depending on the extent of disorder, the number of interacting sites and the type of partner, very different architectures for the resulting assemblies are possible. More recently, molecular condensates with liquid-like properties composed of multiple copies of IDPs and nucleic acids have been proven to regulate key processes in eukaryotic cells. The structural and kinetic details of disordered biomolecular complexes are difficult to unveil experimentally due to their inherent conformational heterogeneity. Computational approaches, alone or in combination with experimental data, have emerged as indispensable tools to understand the functional mechanisms of this elusive type of assembly. The level of description used, all-atom or coarse-grained, strongly depends on the size of the molecular systems and on the timescale of the investigated mechanism. In this mini-review, we describe the most relevant architectures found for molecular interactions involving IDPs/IDRs and the computational strategies applied for their investigation.
Affiliation(s)
- Ilinka Clerc
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Amin Sagar
  - Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, France
- Alessandro Barducci
  - Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, France
- Nathalie Sibille
  - Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, France
- Pau Bernadó
  - Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, France
- Juan Cortés
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
|
50
|
Nomoto A, Nishinami S, Shiraki K. Solubility Parameters of Amino Acids on Liquid-Liquid Phase Separation and Aggregation of Proteins. Front Cell Dev Biol 2021; 9:691052. [PMID: 34222258 PMCID: PMC8242209 DOI: 10.3389/fcell.2021.691052] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/05/2021] [Accepted: 05/20/2021] [Indexed: 11/21/2022] Open
Abstract
The solution properties of amino acids determine the folding, aggregation, and liquid–liquid phase separation (LLPS) behaviors of proteins. Various indices of amino acids, such as solubility, hydropathy, and conformational parameter, describe the behaviors of protein folding and solubility both in vitro and in vivo. However, understanding the propensity of LLPS and aggregation is difficult due to the multiple interactions among different amino acids. Here, the solubilities of aromatic amino acids (SAs) were investigated in solution containing 20 types of amino acids as amino acid solvents. The parameters of SAs in amino acid solvents (PSASs) were varied and dependent on the type of the solvent. Specifically, Tyr and Trp had the highest positive values while Glu and Asp had the lowest. The PSAS values represent soluble and insoluble interactions, which collectively are the driving force underlying the formation of droplets and aggregates. Interestingly, the PSAS of a soluble solvent reflected the affinity between amino acids and aromatic rings, while that of an insoluble solvent reflected the affinity between amino acids and water. These findings suggest that the PSAS can distinguish amino acids that contribute to droplet and aggregate formation, and provide a deeper understanding of LLPS and aggregation of proteins.
Affiliation(s)
- Akira Nomoto
  - Pure and Applied Sciences, University of Tsukuba, Tsukuba, Japan
- Suguru Nishinami
  - Pure and Applied Sciences, University of Tsukuba, Tsukuba, Japan
- Kentaro Shiraki
  - Pure and Applied Sciences, University of Tsukuba, Tsukuba, Japan
|