1
El-Maradny YA, Badawy MA, Mohamed KI, Ragab RF, Moharm HM, Abdallah NA, Elgammal EM, Rubio-Casillas A, Uversky VN, Redwan EM. Unraveling the role of the nucleocapsid protein in SARS-CoV-2 pathogenesis: From viral life cycle to vaccine development. Int J Biol Macromol 2024; 279:135201. [PMID: 39216563 DOI: 10.1016/j.ijbiomac.2024.135201] [Received: 05/30/2024] [Revised: 08/24/2024] [Accepted: 08/28/2024] [Indexed: 09/04/2024]
Abstract
BACKGROUND The nucleocapsid protein (N protein) is the most abundant protein in SARS-CoV-2. The N protein binds viral RNA through electrostatic interactions, forming helical cytoplasmic structures known as nucleocapsids. These nucleocapsids subsequently interact with the membrane (M) protein, facilitating virus budding into early secretory compartments. SCOPE OF REVIEW Exploring the role of the N protein in the SARS-CoV-2 life cycle, pathogenesis, post-acute sequelae, and interaction with host immunity has enhanced our understanding of its function and of potential strategies for preventing SARS-CoV-2 infection. MAJOR CONCLUSION This review provides an overview of the N protein's involvement in SARS-CoV-2 infectivity, highlighting its crucial role in virus-host protein interactions and immune modulation, which in turn influence viral spread. GENERAL SIGNIFICANCE Understanding these aspects identifies the N protein as a promising target for developing effective antiviral treatments and vaccines against SARS-CoV-2.
Affiliation(s)
- Yousra A El-Maradny
- Pharmaceutical and Fermentation Industries Development Center, City of Scientific Research and Technological Applications (SRTA-City), New Borg EL-Arab, Alexandria 21934, Egypt; Microbiology and Immunology, Faculty of Pharmacy, Arab Academy for Science, Technology and Maritime Transport (AASTMT), El Alamein 51718, Egypt.
- Moustafa A Badawy
- Industrial Microbiology and Applied Chemistry program, Faculty of Science, Alexandria University, Egypt.
- Kareem I Mohamed
- Microbiology and Immunology, Faculty of Pharmacy, Arab Academy for Science, Technology and Maritime Transport (AASTMT), El Alamein 51718, Egypt.
- Renad F Ragab
- Microbiology and Immunology, Faculty of Pharmacy, Arab Academy for Science, Technology and Maritime Transport (AASTMT), El Alamein 51718, Egypt.
- Hamssa M Moharm
- Genetics, Biotechnology Department, Faculty of Agriculture, Alexandria University, Egypt.
- Nada A Abdallah
- Medicinal Plants Department, Faculty of Agriculture, Alexandria University, Egypt.
- Esraa M Elgammal
- Microbiology and Immunology, Faculty of Pharmacy, Arab Academy for Science, Technology and Maritime Transport (AASTMT), El Alamein 51718, Egypt.
- Alberto Rubio-Casillas
- Autlan Regional Hospital, Health Secretariat, Autlan, JAL 48900, Mexico; Biology Laboratory, Autlan Regional Preparatory School, University of Guadalajara, Autlan, JAL 48900, Mexico.
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA.
- Elrashdy M Redwan
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia; Centre of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Therapeutic and Protective Proteins Laboratory, Protein Research Department, Genetic Engineering and Biotechnology Research Institute, City of Scientific Research and Technological Applications (SRTA-City), New Borg EL-Arab, 21934 Alexandria, Egypt.
2
Ghosh D, Biswas A, Radhakrishna M. Advanced computational approaches to understand protein aggregation. Biophysics Reviews 2024; 5:021302. [PMID: 38681860 PMCID: PMC11045254 DOI: 10.1063/5.0180691] [Received: 10/11/2023] [Accepted: 03/18/2024] [Indexed: 05/01/2024]
Abstract
Protein aggregation is a widespread phenomenon implicated in debilitating diseases such as Alzheimer's disease, Parkinson's disease, and cataracts, and it poses complex challenges for molecular biology. In this review, we explore the evolving range of computational methods and bioinformatics tools that have transformed our understanding of protein aggregation. We begin by discussing the multifaceted challenges of understanding this process and the critical need for precise predictive tools. We then focus on molecular simulations, notably molecular dynamics (MD) simulations at atomistic to coarse-grained resolution, which have become pivotal for unraveling the complex dynamics of protein aggregation in cataracts, Alzheimer's disease, and Parkinson's disease. MD simulations provide microscopic insight into protein interactions and the subtleties of aggregation pathways, and enhanced sampling techniques such as replica exchange molecular dynamics, metadynamics (MetaD), and umbrella sampling probe intricate energy landscapes and transition states. We examine specific applications of MD simulations, including the use of Markov state modeling to elucidate the chaperone mechanism underlying cataract formation, and the pathways and interactions that drive toxic aggregate formation in Alzheimer's and Parkinson's disease. Next, we show how bioinformatics, sequence analysis, structural data, machine learning algorithms, and artificial intelligence have become indispensable for predicting protein aggregation propensity and locating aggregation-prone regions within protein sequences.
Throughout, we underscore the symbiotic relationship between computational approaches and empirical data, which has paved the way for potential therapeutic strategies against protein aggregation-related diseases. In conclusion, this review offers a comprehensive overview of advanced computational methods and bioinformatics tools, at the intersection of computational biology and experimental research, that have catalyzed breakthroughs in unraveling the molecular basis of protein aggregation, with significant implications for clinical interventions.
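The Markov state modeling step mentioned above can be illustrated with a minimal sketch. It assumes MD frames have already been clustered into discrete states; the function names and the toy trajectory are hypothetical and not taken from the review. It uses the simplest count-and-normalize estimator, whereas established MSM packages typically add reversibility constraints and uncertainty estimates, which are omitted here.

```python
from collections import Counter


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Count-based MSM estimate from one discretized trajectory.

    dtraj    -- list of integer state labels, one per MD frame (assumed input)
    n_states -- number of discrete states
    lag      -- lag time in frames (must be >= 1)
    """
    # Count transitions i -> j between frames t and t + lag.
    counts = Counter(zip(dtraj[:-lag], dtraj[lag:]))
    matrix = []
    for i in range(n_states):
        row = [counts[(i, j)] for j in range(n_states)]
        total = sum(row)
        # Row-normalize; a state never left within the data gets a zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def stationary_distribution(matrix, tol=1e-12, max_iter=10_000):
    """Power iteration for the stationary distribution pi = pi T."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Toy trajectory over three hypothetical conformational states.
dtraj = [0, 0, 1, 1, 0, 1, 2, 2, 1, 0]
T = estimate_transition_matrix(dtraj, n_states=3)
pi = stationary_distribution(T)
print(T[0])
print(pi)
```

Each row of `T` gives the probability of moving from one state to every other state within one lag time, and `pi` gives the long-time population of each state; in real analyses, the choice of lag time is validated before the model is interpreted.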
Affiliation(s)
- Deepshikha Ghosh
- Department of Biological Sciences and Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
- Anushka Biswas
- Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gujarat 382355, India
3
Chao TH, Rekhi S, Mittal J, Tabor DP. Data-Driven Models for Predicting Intrinsically Disordered Protein Polymer Physics Directly from Composition or Sequence. Molecular Systems Design & Engineering 2023; 8:1146-1155. [PMID: 38222029 PMCID: PMC10786636 DOI: 10.1039/d3me00053b] [Indexed: 01/16/2024]
Abstract
The molecular-level understanding of intrinsically disordered proteins (IDPs) is challenging because they are difficult to characterize experimentally. Computational understanding of IDPs also requires fundamental advances, as the leading tools for predicting protein structure (e.g., AlphaFold) typically fail to describe the structural ensembles of IDPs. The focus of this paper is to 1) develop new representations for IDPs and 2) pair these representations with classical machine learning and deep learning models to predict the radius of gyration and derived scaling exponent of IDPs. Here, we build a new physically motivated feature, the bag of amino acid interactions representation, which explicitly encodes pairwise interactions. This feature counts and weights all possible non-bonded interactions in a sequence and is thus, in principle, compatible with arbitrary sequence lengths. To evaluate this new feature, both categorical and physically motivated featurization techniques are tested on a computational dataset of 10,000 sequences simulated at the coarse-grained level. The results indicate that the new feature outperforms the other purely categorical and physically motivated features and has solid extrapolation capabilities. In future work, this feature could provide physical insight into amino acid interactions, including their temperature dependence, and could be applied to other protein spaces.
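A rough sketch of a bag-of-interactions style featurization follows. The abstract states only that the feature counts and weights all non-bonded pairwise interactions; the sequence-separation cutoff `min_sep` and the `1/separation` weighting used here are hypothetical placeholders, not the authors' published scheme. What the sketch does illustrate is that every sequence, whatever its length, maps to the same fixed-length vector over the 210 unordered residue-type pairs.

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical
# All 210 unordered residue-type pairs: ("A", "A"), ("A", "C"), ...
PAIRS = list(combinations_with_replacement(AMINO_ACIDS, 2))
PAIR_INDEX = {pair: k for k, pair in enumerate(PAIRS)}


def inverse_separation(sep):
    """Hypothetical weight: pairs closer along the chain count more."""
    return 1.0 / sep


def bag_of_interactions(seq, min_sep=2, weight=inverse_separation):
    """Fixed-length (210) vector of weighted pair counts for any sequence.

    Pairs closer than `min_sep` along the chain are treated as bonded and
    skipped; both `min_sep` and `weight` are illustrative assumptions.
    Non-standard letters (e.g. X) raise KeyError and should be filtered first.
    """
    vec = [0.0] * len(PAIRS)
    n = len(seq)
    for i in range(n):
        for j in range(i + min_sep, n):
            # sorted() makes ("K", "E") and ("E", "K") share one bin.
            pair = tuple(sorted((seq[i], seq[j])))
            vec[PAIR_INDEX[pair]] += weight(j - i)
    return vec


short = bag_of_interactions("MEEKK")
long_vec = bag_of_interactions("MEEKK" * 40)
print(len(short), len(long_vec))  # both are 210
```

Because the output length is fixed, vectors from sequences of different lengths can be fed directly to the same regression model, which is the property the abstract highlights.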
Affiliation(s)
- Tzu-Hsuan Chao
- Department of Chemistry, Texas A&M University, PO Box 30012, College Station, TX 77842-3012, USA
- Shiv Rekhi
- Department of Chemistry, Texas A&M University, PO Box 30012, College Station, TX 77842-3012, USA
- Jeetain Mittal
- Department of Chemistry, Texas A&M University, PO Box 30012, College Station, TX 77842-3012, USA
- Daniel P Tabor
- Department of Chemistry, Texas A&M University, PO Box 30012, College Station, TX 77842-3012, USA
4
Computational prediction of disordered binding regions. Comput Struct Biotechnol J 2023; 21:1487-1497. [PMID: 36851914 PMCID: PMC9957716 DOI: 10.1016/j.csbj.2023.02.018] [Received: 10/26/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]
Abstract
One of the key features of intrinsically disordered regions (IDRs) is their ability to interact with a broad range of partner molecules. Multiple types of interacting IDRs have been identified, including molecular recognition fragments (MoRFs), short linear sequence motifs (SLiMs), and protein-, nucleic acid- and lipid-binding regions. Prediction of binding IDRs in protein sequences has gained momentum in recent years. We survey 38 predictors of binding IDRs that target interactions with a diverse set of partners, such as peptides, proteins, RNA, DNA and lipids. We offer a historical perspective and highlight key events that fueled efforts to develop these methods. These tools rely on a diverse range of predictive architectures, including scoring functions, regular expressions, traditional and deep machine learning, and meta-models. Recent efforts focus on developing deep neural network-based architectures and extending coverage to RNA-, DNA- and lipid-binding IDRs. We analyze the availability of these methods and show that providing implementations and webservers results in much higher rates of citation and use. We also make several recommendations: take advantage of modern deep network architectures, develop tools that bundle predictions of multiple types of binding IDRs, and work on algorithms that model the structures of the resulting complexes.
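Regular-expression predictors, one of the architectures listed above, scan a sequence for a motif pattern. The sketch below assumes the PxxP core motif recognized by SH3 domains as an illustrative pattern, plus an optional, hypothetical set of disordered residue indices used to keep only hits that lie entirely inside predicted disorder. It is not one of the 38 surveyed tools.

```python
import re


def find_motif(seq, pattern, disordered=None):
    """Return (0-based start, matched text) for every match of `pattern`.

    Wrapping the pattern in a lookahead lets overlapping hits be reported.
    If `disordered` (a set of residue indices) is given, keep only hits that
    lie entirely inside it.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start, text = m.start(), m.group(1)
        span = range(start, start + len(text))
        if disordered is None or all(k in disordered for k in span):
            hits.append((start, text))
    return hits


SH3_CORE = "P..P"  # PxxP: proline, any two residues, proline
seq = "APPLPPAP"
print(find_motif(seq, SH3_CORE))
print(find_motif(seq, SH3_CORE, disordered=set(range(5))))
```

Short, degenerate patterns like this match often by chance, which is why filtering by disorder context (or by conservation) is commonly layered on top of the raw regular-expression scan.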
5
Khan A, Singh A, Singh P, Kumar R, Ojha KK, Singh VK, Srivastava A. LCN2-Fungal siderophore-iron binding and uptake leads to oxidative stress and cell death in hepatocellular carcinoma cell line HepG2. J Biomol Struct Dyn 2023; 41:12714-12733. [PMID: 36762696 DOI: 10.1080/07391102.2023.2175380] [Received: 10/27/2022] [Accepted: 01/05/2023] [Indexed: 02/11/2023]
Abstract
Microorganisms produce non-ribosomal peptides called siderophores to acquire iron. The mammalian immune system is known to produce small secreted proteins called lipocalins in response to bacterial infection. These proteins sequester siderophores produced by invading bacterial pathogens, rendering the pathogens unable to acquire iron from the host. However, this is not their sole function: in addition to transferrin and lactoferrin, lipocalins are also known to transport siderophore-bound iron to host cells. While binding of bacterial siderophores to human lipocalin is well studied, binding of their fungal counterparts has not been confirmed or fully understood. Apart from pathogen-affected cells, developing cancerous cells also show varying expression levels of different proteins, including those involved in iron transport. The possibility of exogenous fungal siderophore-mediated iron transport via lipocalin and its receptor in mammalian cells has not yet been explored much. In the present investigation, we examined differential expression of the human lipocalin LCN2 in the hepatocellular carcinoma cell line HepG2 and its normal counterpart WRL-68, and computationally determined the feasibility of LCN2 binding to a fungal siderophore. For the case in which a stable complex forms, we further assessed whether this complex can transport iron through its specific receptor. We also explored a possible mechanism by which fungal siderophores mediate oxidative stress leading to significant cell death in cancerous cells. This study will thus be useful in finding a new way of treating hepatocellular carcinoma by inducing siderophore-mediated cell death in cancerous cells. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Azmi Khan
- Department of Life Science, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, Bihar
- Ashutosh Singh
- Department of Life Science, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, Bihar
- Pratika Singh
- Department of Life Science, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, Bihar
- Rakesh Kumar
- Department of Bioinformatics, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, Bihar
- Krishna Kumar Ojha
- Department of Bioinformatics, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, Bihar
- Vijay Kumar Singh
- Department of Bioinformatics, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, Bihar
- Amrita Srivastava
- Department of Life Science, School of Earth, Biological and Environmental Sciences, Central University of South Bihar, Gaya, Bihar
6
Han B, Ren C, Wang W, Li J, Gong X. Computational Prediction of Protein Intrinsically Disordered Region Related Interactions and Functions. Genes (Basel) 2023; 14:432. [PMID: 36833360 PMCID: PMC9956190 DOI: 10.3390/genes14020432] [Received: 12/31/2022] [Revised: 02/02/2023] [Accepted: 02/05/2023] [Indexed: 02/11/2023]
Abstract
Intrinsically disordered proteins (IDPs) and regions (IDRs) are widespread. Although they lack well-defined structures, they participate in many important biological processes. They are also widely associated with human diseases and have become potential targets in drug discovery. However, there is a large gap between the number of experimentally annotated IDPs/IDRs and their actual number. In recent decades, computational methods for IDPs/IDRs have developed rapidly, addressing different tasks: predicting IDPs/IDRs themselves, their binding modes, their binding sites, and their molecular functions. Given the correlations among these predictors, we have, for the first time, reviewed these prediction methods within a unified framework, summarized their computational methods and predictive performance, and discussed open problems and perspectives.
Affiliation(s)
- Bingqing Han
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Chongjiao Ren
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Wenda Wang
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Jiashan Li
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Xinqi Gong
- Mathematical Intelligence Application Lab, Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Beijing Academy of Intelligence, Beijing 100083, China
7
Chen R, Li X, Yang Y, Song X, Wang C, Qiao D. Prediction of protein-protein interaction sites in intrinsically disordered proteins. Front Mol Biosci 2022; 9:985022. [PMID: 36250006 PMCID: PMC9567019 DOI: 10.3389/fmolb.2022.985022] [Received: 07/03/2022] [Accepted: 07/27/2022] [Indexed: 11/25/2022]
Abstract
Intrinsically disordered proteins (IDPs) participate in many biological processes, including the regulation of transcription, translation, and the cell cycle, by interacting with other proteins. As the amount of available sequence data on disordered proteins grows, identifying IDP binding sites has become crucial for the functional annotation of these proteins. Over the past decades, many computational approaches have been developed to predict the protein-protein binding sites of IDPs (IDP-PPIS) from protein sequence information, and with the rapid development of artificial intelligence, new IDP-PPIS predictors appear every year. An up-to-date overview of these methods is therefore needed. In this paper, we collected 30 representative predictors published recently and summarized their databases, features and algorithms. We described how the features were generated from public data and used to predict IDP-PPIS, along with the methods used to build feature representations. We divided the predictors into three categories: scoring functions, machine learning-based prediction, and consensus approaches, and for each category we described the algorithms and their performance in detail. We hope this manuscript provides both a full picture of the current state of IDP binding prediction and a guide for selecting among methods and, more importantly, that it informs future development trends and principles.
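Consensus approaches, the third category above, combine the outputs of several predictors. A minimal per-residue majority vote over binary predictions is sketched below; the function name and the default strict-majority threshold are assumptions for illustration, and published consensus predictors may instead weight or average real-valued scores.

```python
def consensus(predictions, min_votes=None):
    """Per-residue vote over binary (0/1) predictions from several predictors.

    predictions -- list of equal-length 0/1 lists, one per predictor
    min_votes   -- votes needed to call a residue binding; defaults to a
                   strict majority of the predictors (an assumption)
    """
    if not predictions:
        raise ValueError("need at least one predictor")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all predictors must cover the same residues")
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    return [int(sum(p[i] for p in predictions) >= min_votes) for i in range(n)]


# Three hypothetical predictors scoring the same 4 residues.
preds = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
]
print(consensus(preds))
```

Raising `min_votes` trades recall for precision: requiring unanimity keeps only residues that every predictor flags.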
Affiliation(s)
- Ranran Chen
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- National Institute of Health Data Science of China, Shandong University, Jinan, China
- Xinlu Li
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- National Institute of Health Data Science of China, Shandong University, Jinan, China
- Yaqing Yang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- National Institute of Health Data Science of China, Shandong University, Jinan, China
- Xixi Song
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- National Institute of Health Data Science of China, Shandong University, Jinan, China
- Cheng Wang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
- National Institute of Health Data Science of China, Shandong University, Jinan, China
- Dongdong Qiao
- Shandong Mental Health Center, Shandong University, Jinan, China
8
Caswell RC, Gunning AC, Owens MM, Ellard S, Wright CF. Assessing the clinical utility of protein structural analysis in genomic variant classification: experiences from a diagnostic laboratory. Genome Med 2022; 14:77. [PMID: 35869530 PMCID: PMC9308257 DOI: 10.1186/s13073-022-01082-2] [Received: 12/15/2021] [Accepted: 07/04/2022] [Indexed: 12/21/2022]
Abstract
BACKGROUND The widespread clinical application of genome-wide sequencing has resulted in many new diagnoses for rare genetic conditions, but testing regularly identifies variants of uncertain significance (VUS). The remarkable rise in the amount of genomic data has been paralleled by a rise in the number of publicly available protein structures, which may have clinical utility for interpreting missense variants and in-frame insertions or deletions. METHODS Within a UK National Health Service genomic medicine diagnostic laboratory, we investigated how many VUS over a 5-year period were evaluated using protein structural analysis and how often this analysis aided variant classification. RESULTS We found 99 novel missense and in-frame variants across 67 genes that were initially classified as VUS by our diagnostic laboratory using standard variant classification guidelines and for which further analysis of protein structure was requested. Evidence from protein structural analysis was used in the re-assessment of 64 variants, of which 47 were subsequently reclassified as pathogenic or likely pathogenic and 17 remained VUS. We identified several case studies where protein structural analysis aided variant interpretation by predicting disease mechanisms consistent with the observed phenotypes, including loss of function through thermodynamic destabilisation or disruption of ligand binding, and gain of function through de-repression or escape from proteasomal degradation. CONCLUSIONS We have shown that in silico protein structural analysis can aid classification of VUS and give insights into mechanisms of pathogenicity. Based on our experience, we propose a generic evidence-based workflow for incorporating protein structural information into diagnostic practice to facilitate variant classification.
Affiliation(s)
- Richard C Caswell
- Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, EX2 5DW, UK.
- Adam C Gunning
- Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, EX2 5DW, UK
- Institute of Biomedical and Clinical Science, University of Exeter School of Medicine, Exeter, EX2 5DW, UK
- Martina M Owens
- Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, EX2 5DW, UK
- Sian Ellard
- Exeter Genomics Laboratory, Royal Devon University Healthcare NHS Foundation Trust, Exeter, EX2 5DW, UK
- Institute of Biomedical and Clinical Science, University of Exeter School of Medicine, Exeter, EX2 5DW, UK
- Caroline F Wright
- Institute of Biomedical and Clinical Science, University of Exeter School of Medicine, Exeter, EX2 5DW, UK.
9
Garg A, Dabburu GR, Singhal N, Kumar M. Investigating the disordered regions (MoRFs, SLiMs and LCRs) and functions of mimicry proteins/peptides in silico. PLoS One 2022; 17:e0265657. [PMID: 35421114 PMCID: PMC9009644 DOI: 10.1371/journal.pone.0265657] [Received: 11/09/2021] [Accepted: 03/04/2022] [Indexed: 11/24/2022]
Abstract
Microbial mimicry of host proteins/peptides can elicit auto-reactive host T or B cells, resulting in autoimmune disease(s). Since intrinsically disordered protein regions (IDPRs) are involved in several host cell-signaling and protein-protein interaction (PPI) networks, molecular mimicry of IDPRs can help pathogens substitute their own proteins into the host cell-signaling and PPI networks and, ultimately, hijack the host cellular machinery. The present study was therefore conducted to characterize structural disorder and IDPRs, such as molecular recognition features (MoRFs), short linear motifs (SLiMs), and low complexity regions (LCRs), in experimentally verified mimicry proteins and peptides (mimitopes) of bacteria, viruses and the host. The functional characteristics of the mimicry proteins were also studied in silico. Our results indicated that 78% of the bacterial host mimicry proteins and 45% of the bacterial host mimitopes were moderately/highly disordered, whereas 73% of the viral host mimicry proteins and 31% of the viral host mimitopes were moderately/highly disordered. Among the pathogens, 27% of the bacterial mimicry proteins and 13% of the bacterial mimitopes were moderately/highly disordered, whereas 53% of the viral mimicry proteins and 21% of the viral mimitopes were moderately/highly disordered. Although IDPRs were frequent in host, bacterial and viral mimicry proteins, only a few mimitopes overlapped with IDPRs such as MoRFs, SLiMs and LCRs. This suggests that most microbes cannot use molecular mimicry to modulate host PPIs and hijack the host cell machinery. Functional analyses indicated that most of the pathogens exhibited mimicry with host proteins involved in ion binding and signaling pathways. This is the first report on the disordered regions and functional aspects of experimentally proven host and microbial mimicry proteins.
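The "moderately/highly disordered" labels above depend on the fraction of residues predicted to be disordered. The sketch below assumes a per-residue score cutoff of 0.5 and the commonly used 10% and 30% boundaries (below 10%: highly ordered; 10 to 30%: moderately disordered; 30% or more: highly disordered); the abstract does not state the exact predictor or cutoffs used in the study, and the example score lists are hypothetical.

```python
def disorder_class(scores, cutoff=0.5):
    """Percent of residues predicted disordered and the resulting class.

    scores -- per-residue disorder scores in [0, 1] from any predictor
    cutoff -- score at or above which a residue counts as disordered (assumed)
    """
    if not scores:
        raise ValueError("empty score list")
    n_dis = sum(1 for s in scores if s >= cutoff)
    pct = 100.0 * n_dis / len(scores)
    # Assumed 10% / 30% boundaries, a common convention in disorder studies.
    if pct < 10:
        label = "highly ordered"
    elif pct < 30:
        label = "moderately disordered"
    else:
        label = "highly disordered"
    return pct, label


def fraction_disordered(proteins, cutoff=0.5):
    """Share of proteins classed as moderately or highly disordered."""
    labels = [disorder_class(s, cutoff)[1] for s in proteins]
    return sum(label != "highly ordered" for label in labels) / len(labels)


# Three hypothetical 10-residue proteins.
proteins = [[0.9] * 3 + [0.1] * 7, [0.6] * 2 + [0.2] * 8, [0.1] * 10]
print([disorder_class(p) for p in proteins])
print(fraction_disordered(proteins))
```

Percentages such as "78% of the bacterial host mimicry proteins" correspond to the output of a function like `fraction_disordered` applied to a set of proteins, so they shift if either the per-residue cutoff or the class boundaries change.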
Affiliation(s)
- Anjali Garg
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
- Govinda Rao Dabburu
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
- Neelja Singhal
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
- Manish Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
10
Lindorff-Larsen K, Kragelund BB. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J Mol Biol 2021; 433:167196. [PMID: 34390736 DOI: 10.1016/j.jmb.2021.167196] [Received: 06/02/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs, and intrinsically disordered regions (IDRs) interspersed between folded domains, are generally characterized as having no persistent tertiary structure; instead, they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interaction, and while they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. We here discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods aimed at predicting transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs and through the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and the need for new methods to interpret the mechanistic effects of genomic variants in IDPs.
Affiliation(s)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
- Birthe B Kragelund
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen. Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
11
Oldfield CJ, Peng Z, Kurgan L. Disordered RNA-Binding Region Prediction with DisoRDPbind. Methods Mol Biol 2021; 2106:225-239. [PMID: 31889261 DOI: 10.1007/978-1-0716-0231-7_14] [Indexed: 12/12/2022]
Abstract
RNA chaperone activity is one of the many functions of intrinsically disordered regions (IDRs). IDRs function without the prerequisite of a stable structure; instead, their functions arise from structural ensembles. A common theme in IDR function is molecular recognition: IDRs mediate interactions with other proteins, RNA, and DNA. Many computational methods are available to predict IDRs from protein sequence, but relatively few predict IDR functions, and the available methods focus primarily on protein-protein interactions. DisoRDPbind was developed to predict several protein functions, including interactions with RNA. This method is available as a user-friendly web interface at http://biomine.cs.vcu.edu/servers/DisoRDPbind/ . The development and architecture of DisoRDPbind are briefly presented, and its accuracy relative to other RNA-binding residue predictors is discussed. We explain the use of the web interface in detail and provide an example of prediction results and their interpretation. While DisoRDPbind does not identify RNA chaperones directly, we provide a case study of an RNA chaperone, the HCV core protein, as an example of the method's utility in the study of RNA chaperones.
Affiliation(s)
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, People's Republic of China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
12
Kurgan L, Li M, Li Y. The Methods and Tools for Intrinsic Disorder Prediction and their Application to Systems Medicine. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11320-0] [Indexed: 10/26/2022]
13
Abstract
Functions of intrinsically disordered proteins do not require structure. Such structure-independent functionality has melted away the classic rigid "lock and key" representation of structure-function relationships in proteins, opening a new page in protein science, where molten keys operate on melted locks and where conformational flexibility and intrinsic disorder, structural plasticity and extreme malleability, multifunctionality and binding promiscuity represent a new-fangled reality. Analyzing and understanding this new reality requires novel tools, and some of the techniques developed to examine the functions of intrinsically disordered proteins are outlined in this review.
Affiliation(s)
- Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33620, USA
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Russian Federation
14
O’Brien KT, Mooney C, Lopez C, Pollastri G, Shields DC. Prediction of polyproline II secondary structure propensity in proteins. R Soc Open Sci 2020; 7:191239. [PMID: 32218953 PMCID: PMC7029904 DOI: 10.1098/rsos.191239] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/17/2019] [Accepted: 12/04/2019] [Indexed: 05/29/2023]
Abstract
Background: The polyproline II helix (PPIIH) is an extended protein left-handed secondary structure that usually but not necessarily involves prolines. Short PPIIHs are frequently, but not exclusively, found in disordered protein regions, where they may interact with peptide-binding domains. However, no readily usable software is available to predict this state. Results: We developed PPIIPRED to predict polyproline II helix secondary structure from protein sequences, using bidirectional recurrent neural networks trained on known three-dimensional structures with dihedral angle filtering. The performance of the method was evaluated in an external validation set. In addition to proline, PPIIPRED favours amino acids whose side chains extend from the backbone (Leu, Met, Lys, Arg, Glu, Gln), as well as Ala and Val. Utility for individual residue predictions is restricted by the rarity of the PPIIH feature compared to structurally common features. Conclusion: The software, available at http://bioware.ucd.ie/PPIIPRED, is useful in large-scale studies, such as evolutionary analyses of PPIIH, or computationally reducing large datasets of candidate binding peptides for further experimental validation.
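The abstract notes that PPIIPRED was trained on known three-dimensional structures with dihedral angle filtering. The sketch below illustrates what such a filter might look like; it is not PPIIPRED's actual procedure. It assumes the commonly quoted PPII backbone region near φ ≈ −75°, ψ ≈ +145°, and the tolerances and minimum run length are hypothetical parameters chosen only for illustration.

```python
from typing import List, Tuple

# Approximate canonical PPII backbone dihedrals in degrees (assumed reference point).
PPII_PHI = -75.0
PPII_PSI = 145.0


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, handling the 360-degree wrap."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def ppii_labels(dihedrals: List[Tuple[float, float]],
                phi_tol: float = 29.0,  # hypothetical tolerance around PPII_PHI
                psi_tol: float = 29.0,  # hypothetical tolerance around PPII_PSI
                min_run: int = 3) -> List[bool]:
    """Label residues as PPII when (phi, psi) lie near the PPII region and the
    residue belongs to a run of at least `min_run` such residues (assumed filter)."""
    near = [angular_diff(phi, PPII_PHI) <= phi_tol and
            angular_diff(psi, PPII_PSI) <= psi_tol
            for phi, psi in dihedrals]
    labels = [False] * len(near)
    i = 0
    while i < len(near):
        if not near[i]:
            i += 1
            continue
        # Find the end of the current run of PPII-compatible residues.
        j = i
        while j < len(near) and near[j]:
            j += 1
        if j - i >= min_run:
            for k in range(i, j):
                labels[k] = True
        i = j
    return labels
```

Requiring a minimum run length reflects the idea that a helix is a contiguous segment; isolated residues that happen to fall in the PPII region of the Ramachandran plot are discarded.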
Affiliation(s)
- Kevin T. O’Brien
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- Catherine Mooney
- School of Computer Science, University College Dublin, Dublin, Ireland
- Cyril Lopez
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin, Dublin, Ireland
- Institute for Discovery, University College Dublin, Dublin, Ireland
- Denis C. Shields
- School of Medicine, University College Dublin, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
15
Poddar S, Chakravarty D, Chakrabarti P. Structural changes in DNA-binding proteins on complexation. Nucleic Acids Res 2019. [PMID: 29534202 PMCID: PMC6283420 DOI: 10.1093/nar/gky170] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/24/2022]
Abstract
Characterization and prediction of the DNA-binding regions in proteins are essential for our understanding of how proteins recognize and bind DNA. We analyze the unbound (U) and bound (B) forms of proteins from the protein–DNA docking benchmark, which contains 66 binary protein–DNA complexes along with their unbound counterparts. Proteins binding DNA undergo greater structural changes on complexation (in particular, those in the enzyme category) than those involved in protein–protein interactions (PPI). While interface atoms involved in PPI exhibit an increase in their solvent-accessible surface area (ASA) in the bound form in the majority of cases compared to the unbound interface, protein–DNA interfaces show increases and decreases in equal measure. In 25% of structures, the U form has missing residues that are located in the interface in the B form. The missing atoms contribute more to the buried surface area than other interface atoms. Lys, Gly and Arg are prominent in disordered segments that become ordered in the interface on complexation. In going from U to B, there may be an increase in coil and helical content at the expense of turns and strands. Consideration of flexibility cannot distinguish the interface residues from the surface residues in the U form.
Affiliation(s)
- Sayan Poddar
- Department of Biochemistry, Bose Institute, P1/12 CIT Scheme VIIM, Kolkata 700054, India
- Devlina Chakravarty
- Bioinformatics Centre, Bose Institute, P1/12CIT Scheme VIIM, Kolkata 700054, India
- Pinak Chakrabarti
- Department of Biochemistry, Bose Institute, P1/12 CIT Scheme VIIM, Kolkata 700054, India; Bioinformatics Centre, Bose Institute, P1/12CIT Scheme VIIM, Kolkata 700054, India
16
Katuwawala A, Ghadermarzi S, Kurgan L. Computational prediction of functions of intrinsically disordered regions. Prog Mol Biol Transl Sci 2019; 166:341-369. [PMID: 31521235 DOI: 10.1016/bs.pmbts.2019.04.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/20/2022]
Abstract
Intrinsically disordered regions (IDRs) are abundant in nature, particularly among eukaryotes. While they facilitate a wide spectrum of cellular functions, including signaling, molecular assembly and recognition, translation, transcription and regulation, only several hundred IDRs are functionally annotated. This annotation gap motivates the development of fast and accurate computational methods that predict IDR functions directly from protein sequences. We introduce and describe a comprehensive collection of 25 methods that provide accurate predictions of IDRs that interact with proteins and nucleic acids, that function as flexible linkers, and that moonlight multiple functions. Virtually all of these predictors can be accessed online, and many were developed in the last few years. They utilize a wide range of predictive architectures and take advantage of modern machine learning algorithms. Our empirical analysis shows that predictors available as webservers enjoy high citation rates, attesting to their practical value and popularity. The most cited methods include DISOPRED3, ANCHOR, alpha-MoRFpred, MoRFpred, fMoRFpred and MoRFCHiBi. We present two case studies to demonstrate that predictions produced by these computational tools are relatively easy to interpret and that they deliver valuable functional clues. However, the current computational tools cover a relatively narrow range of disorder functions, and further development efforts that cover a broader range of functions should be pursued. We demonstrate that enough functionally annotated IDRs associated with several other disorder functions are already available to design and validate novel predictors.
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States.
17
Katuwawala A, Peng Z, Yang J, Kurgan L. Computational Prediction of MoRFs, Short Disorder-to-order Transitioning Protein Binding Regions. Comput Struct Biotechnol J 2019; 17:454-462. [PMID: 31007871 PMCID: PMC6453775 DOI: 10.1016/j.csbj.2019.03.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 02/06/2019] [Revised: 03/22/2019] [Accepted: 03/23/2019] [Indexed: 12/28/2022]
Abstract
Molecular recognition features (MoRFs) are short protein-binding regions that undergo disorder-to-order transitions (induced folding) upon binding protein partners. These regions are abundant in nature and can be predicted from protein sequences based on their distinctive sequence signatures. This first-of-its-kind survey covers 14 MoRF predictors and six related methods for the prediction of short protein-binding linear motifs, disordered protein-binding regions and semi-disordered regions. We show that the development of MoRF predictors has accelerated in recent years. These predictors depend on machine learning-derived models that were generated using training datasets in which MoRFs are annotated using putative disorder. Our analysis reveals that they generate accurate predictions: we identified eight methods that offer an area under the ROC curve (AUC) ≥ 0.7 on experimentally validated test datasets. We show that modern MoRF predictors accurately find experimentally annotated MoRFs even though they were trained using putative disorder annotations. They are relatively highly cited, particularly the methods available as webservers, which on average secure three times more citations than methods without this option. MoRF predictions contribute to the experimental discovery of protein-protein interactions, annotation of protein functions and computational analysis of a variety of proteomes, protein families, and pathways. We outline future development and application directions for these tools, stressing the importance of developing novel tools that target interactions of disordered regions with other types of partners.
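The survey ranks MoRF predictors by the area under the ROC curve (AUC), with eight methods reaching AUC ≥ 0.7. As a reminder of what that number measures, here is a minimal sketch computing a per-residue AUC from predictor scores and binary MoRF annotations, using the rank-based (Mann-Whitney) definition with ties counted as one half. This is an illustrative implementation, not the survey's evaluation code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen MoRF residue (label 1)
    scores higher than a randomly chosen non-MoRF residue (label 0).
    Ties count as half a win. O(P * N), fine for small illustrative inputs."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, invented scores `[0.9, 0.8, 0.4, 0.3, 0.2]` with labels `[1, 0, 1, 0, 0]` rank 5 of the 6 positive-negative pairs correctly, giving an AUC of about 0.83; an AUC of 0.5 corresponds to random ranking.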
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, USA
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, China
- Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, USA
18
Oldfield CJ, Uversky VN, Dunker AK, Kurgan L. Introduction to intrinsically disordered proteins and regions. Proteins 2019. [DOI: 10.1016/b978-0-12-816348-1.00001-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022]
19
Zhao B, Xue B. Decision-Tree Based Meta-Strategy Improved Accuracy of Disorder Prediction and Identified Novel Disordered Residues Inside Binding Motifs. Int J Mol Sci 2018; 19:E3052. [PMID: 30301243 PMCID: PMC6213717 DOI: 10.3390/ijms19103052] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/20/2018] [Revised: 09/24/2018] [Accepted: 10/04/2018] [Indexed: 02/06/2023]
Abstract
Using computational techniques to identify intrinsically disordered residues is practical and effective in biological studies. Designing novel high-accuracy strategies is therefore worthwhile, since existing strategies leave considerable room for improvement. Among many possibilities, a meta-strategy that integrates the results of multiple individual predictors has been broadly used to improve overall predictive performance. Nonetheless, a simple and direct integration of individual predictors may not effectively improve performance. In this project, dual-threshold two-step significance voting and neural networks were used to integrate the predictions of four individual predictors: DisEMBL, IUPred, VSL2, and ESpritz. The new meta-strategy significantly improved the prediction of intrinsically disordered residues compared to all four individual predictors and four other recently designed predictors. The improvement was validated using five-fold cross-validation and on independent test datasets.
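The meta-strategy above integrates four disorder predictors through dual-threshold voting. The abstract does not specify the two-step significance test or the neural-network stage, so the following is only a simplified sketch of the dual-threshold idea: a residue is called disordered or ordered when enough predictors confidently agree, and is left undecided otherwise. The function name, both thresholds, and the vote count are hypothetical, and the demo scores are invented, not real outputs of DisEMBL, IUPred, VSL2, or ESpritz.

```python
from typing import Dict, List, Optional


def dual_threshold_vote(scores: Dict[str, List[float]],
                        high: float = 0.7,   # hypothetical "confidently disordered" cut-off
                        low: float = 0.3,    # hypothetical "confidently ordered" cut-off
                        min_votes: int = 3) -> List[Optional[bool]]:
    """Per residue: True if at least `min_votes` predictors score >= high,
    False if at least `min_votes` score <= low, otherwise None (undecided)."""
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same, non-empty set of sequences")
    n = lengths.pop()
    calls: List[Optional[bool]] = []
    for i in range(n):
        column = [s[i] for s in scores.values()]
        disordered_votes = sum(1 for x in column if x >= high)
        ordered_votes = sum(1 for x in column if x <= low)
        if disordered_votes >= min_votes:
            calls.append(True)
        elif ordered_votes >= min_votes:
            calls.append(False)
        else:
            calls.append(None)
    return calls


if __name__ == "__main__":
    # Invented per-residue scores for a three-residue toy sequence.
    demo = {
        "DisEMBL": [0.9, 0.1, 0.5],
        "IUPred": [0.8, 0.2, 0.6],
        "VSL2": [0.75, 0.25, 0.2],
        "ESpritz": [0.4, 0.1, 0.9],
    }
    print(dual_threshold_vote(demo))  # [True, False, None]
```

Undecided residues (`None`) mark where a further integration step would be needed; this sketch does not model the neural networks the authors used for that purpose.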
Affiliation(s)
- Bi Zhao
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, FL 33620, USA.
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, FL 33620, USA.
20
Abstract
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field and surveys current applications of deep learning to biomedical data, organized around five subareas of roughly increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Affiliation(s)
- Pierre Baldi
- Department of Computer Science, Institute for Genomics and Bioinformatics, and Center for Machine Learning and Intelligent Systems, University of California, Irvine, California 92697, USA
21
Uversky VN. The roles of intrinsic disorder-based liquid-liquid phase transitions in the "Dr. Jekyll-Mr. Hyde" behavior of proteins involved in amyotrophic lateral sclerosis and frontotemporal lobar degeneration. Autophagy 2017; 13:2115-2162. [PMID: 28980860 DOI: 10.1080/15548627.2017.1384889] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022]
Abstract
Pathological developments leading to amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD) are associated with misbehavior of several key proteins, such as SOD1 (superoxide dismutase 1), TARDBP/TDP-43, FUS, C9orf72, and dipeptide repeat proteins generated as a result of the translation of the intronic hexanucleotide expansions in the C9orf72 gene, PFN1 (profilin 1), GLE1 (GLE1, RNA export mediator), PURA (purine rich element binding protein A), FLCN (folliculin), RBM45 (RNA binding motif protein 45), SS18L1/CREST, HNRNPA1 (heterogeneous nuclear ribonucleoprotein A1), HNRNPA2B1 (heterogeneous nuclear ribonucleoprotein A2/B1), ATXN2 (ataxin 2), MAPT (microtubule associated protein tau), and TIA1 (TIA1 cytotoxic granule associated RNA binding protein). Although these proteins are structurally and functionally different and have rather different pathological functions, they all possess some level of intrinsic disorder and are either directly engaged in, or at least related to, the physiological liquid-liquid phase transitions (LLPTs) leading to the formation of various proteinaceous membrane-less organelles (PMLOs), both normal and pathological. This review describes the normal and pathological functions of these ALS- and FTLD-related proteins, describes their major structural properties, glances at their intrinsic disorder status, and analyzes the involvement of these proteins in the formation of normal and pathological PMLOs, with the ultimate goal of better understanding the roles of LLPTs and intrinsic disorder in the "Dr. Jekyll-Mr. Hyde" behavior of those proteins.
Affiliation(s)
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA; Institute for Biological Instrumentation of the Russian Academy of Sciences, Pushchino, Moscow region, Russia
22
Meng F, Uversky VN, Kurgan L. Comprehensive review of methods for prediction of intrinsic disorder and its molecular functions. Cell Mol Life Sci 2017; 74:3069-3090. [PMID: 28589442 PMCID: PMC11107660 DOI: 10.1007/s00018-017-2555-4] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Received: 05/18/2017] [Accepted: 06/01/2017] [Indexed: 12/19/2022]
Abstract
Computational prediction of intrinsic disorder in protein sequences dates back to the late 1970s and has flourished in the last two decades. We provide a brief historical overview and review over 30 recent predictors of disorder. We are also the first to cover predictors of molecular functions of disorder, including 13 methods that focus on disordered linkers and disordered protein-protein, protein-RNA, and protein-DNA binding regions. We review their predictive models, usability, and predictive performance. We highlight the newest methods and the predictors that offer strong predictive performance in recent comparative assessments. We conclude that modern predictors are relatively accurate and widely used, and many of them are fast. Their predictions are conveniently accessible to end users via web servers and databases that store pre-computed predictions for millions of proteins. However, developing methods that predict the many functions of intrinsic disorder not yet addressed remains an outstanding challenge.
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Vladimir N Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, USA.
23
Du Z, Uversky VN. Functional roles of intrinsic disorder in CRISPR-associated protein Cas9. Mol Biosyst 2017; 13:1770-1780. [PMID: 28692085 DOI: 10.1039/c7mb00279c] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/06/2023]
Abstract
Protein intrinsic disorder is an important characteristic commonly detected in multifunctional or RNA- and DNA-binding proteins. Due to their high conformational flexibility and solvent accessibility, intrinsically disordered proteins (IDPs) and IDP regions (IDPRs) execute diverse functions including interaction with multiple partners, and are frequently subjected to various post-translational modifications. Recent studies on the components comprising the CRISPR (clustered regularly interspaced short palindromic repeats) system have elucidated the crystal structure of Cas9 proteins and the mechanism by which the Cas9-sgRNA complex recognizes and cleaves its target DNA. Yet the extent and functional implications of intrinsic disorder in the Cas9 protein have never been fully assessed. Here, we present a comprehensive computational analysis based on both sequence and structural data in an attempt to investigate the roles of IDPRs in the functioning of Cas9 proteins of different origin. We conclude that among the functional roles of IDPRs in Cas9 proteins are recognition of the target DNA and mediation of nucleic acid and protein binding.
Affiliation(s)
- Zhihua Du
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, Florida, USA
24
Meng F, Kurgan L. DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences. Bioinformatics 2017; 32:i341-i350. [PMID: 27307636 PMCID: PMC4908364 DOI: 10.1093/bioinformatics/btw280] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 11/22/2022]
Abstract
Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They differ from flexible linkers/residues because they are disordered and longer. The availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, no computational method directly predicts DFLs; they can be found only indirectly by filtering predicted flexible residues with predictions of disorder. Results: We conceptualized, developed and empirically assessed a first-of-its-kind sequence-based predictor of DFLs, DFLpred. This method outputs a propensity to form DFLs for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, which are processed by a fast linear model. Our high-throughput predictor can be used on the whole-proteome scale; it needs <1 h to predict an entire proteome on a single CPU. When assessed on an independent test dataset of low sequence-identity proteins, it secures an area under the receiver operating characteristic curve of 0.715 and outperforms existing alternatives, including methods for the prediction of flexible linkers, flexible residues, intrinsically disordered residues and various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a large content (over 30%) of DFL residues. We also estimate that about 6000 DFL regions are long, with ≥30 consecutive residues. Availability and implementation: http://biomine.ece.ualberta.ca/DFLpred/. Contact: lkurgan@vcu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
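DFLpred is described as a fast linear model over a small set of per-residue features, and the human-proteome analysis counts long DFL regions of ≥30 consecutive residues. A minimal sketch of both steps follows, assuming the features are already computed for each residue; the function names, weights, bias, and threshold are hypothetical placeholders, not DFLpred's fitted parameters or code.

```python
from typing import List, Sequence, Tuple


def linear_propensity(features: Sequence[Sequence[float]],
                      weights: Sequence[float],
                      bias: float) -> List[float]:
    """Score each residue as bias + w . x over its feature vector (hypothetical model)."""
    scores = []
    for row in features:
        if len(row) != len(weights):
            raise ValueError("each feature vector must match the weight vector length")
        scores.append(bias + sum(w * x for w, x in zip(weights, row)))
    return scores


def long_regions(propensity: Sequence[float], threshold: float,
                 min_len: int = 30) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs for runs of residues with
    propensity >= threshold that are at least `min_len` residues long."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(propensity):
        if p >= threshold and start is None:
            start = i                      # a candidate run begins
        elif p < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None                   # the run has ended
    # Close a run that extends to the C-terminus.
    if start is not None and len(propensity) - start >= min_len:
        regions.append((start, len(propensity)))
    return regions
```

A linear model keeps per-residue cost to a single dot product, which is consistent with the proteome-scale speed the abstract reports; the run-length step only post-processes the scores.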
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada
- Lukasz Kurgan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 2V4, Canada; Department of Computer Science, Virginia Commonwealth University, Richmond, 23284, U.S.A
25
Abstract
Intrinsically disordered proteins and regions (IDPs and IDRs) are involved in a wide range of cellular functions, and they often facilitate interactions with RNAs, DNAs, and proteins. Although many computational methods can predict IDPs and IDRs in protein sequences, only a few predict their functions, and these functions primarily concern protein binding. We describe how to use DisoRDPbind, the first computational method for high-throughput prediction of multiple functions of disordered regions. Our method predicts the RNA-, DNA-, and protein-binding residues located in IDRs in the input protein sequences. DisoRDPbind provides accurate predictions and is sufficiently fast to make predictions for full genomes. Our method is implemented as a user-friendly webserver that is freely available at http://biomine.ece.ualberta.ca/DisoRDPbind/ . We give an overview of our predictor, discuss how to run the webserver, and show how to interpret the corresponding results. We also demonstrate the utility of our method with two case studies: human BRCA1 protein, which binds various proteins and DNA, and yeast 60S ribosomal protein L4, which interacts with proteins and RNA.
26
Dinkel H, Van Roey K, Michael S, Kumar M, Uyar B, Altenberg B, Milchevskaya V, Schneider M, Kühn H, Behrendt A, Dahl SL, Damerell V, Diebel S, Kalman S, Klein S, Knudsen AC, Mäder C, Merrill S, Staudt A, Thiel V, Welti L, Davey NE, Diella F, Gibson TJ. ELM 2016--data update and new functionality of the eukaryotic linear motif resource. Nucleic Acids Res 2016; 44:D294-300. [PMID: 26615199 PMCID: PMC4702912 DOI: 10.1093/nar/gkv1291] [Citation(s) in RCA: 224] [Impact Index Per Article: 28.0] [Received: 10/11/2015] [Revised: 11/04/2015] [Accepted: 11/05/2015] [Indexed: 01/18/2023]
Abstract
The Eukaryotic Linear Motif (ELM) resource (http://elm.eu.org) is a manually curated database of short linear motifs (SLiMs). In this update, we present the latest additions to this resource, along with further improvements to the web interface. ELM 2016 contains more than 240 different motif classes with over 2700 experimentally validated instances, manually curated from more than 2400 scientific publications. In addition, more data have been made available as individually searchable pages and are downloadable in various formats.
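ELM motif classes are described by regular expressions that are matched against protein sequences. A minimal sketch of scanning a sequence for all matches, including overlapping ones, with Python's `re` module; the pattern `P..P` (a generic PxxP proline-rich core of the kind recognized by SH3 domains), the test sequence, and the function name are illustrative assumptions, not entries or code taken from ELM.

```python
import re
from typing import List, Tuple


def find_motif_instances(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched peptide) for every match of `pattern`,
    including overlapping matches, by wrapping the pattern in a zero-width lookahead."""
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical proline-rich peptide scanned with a generic PxxP pattern.
    print(find_motif_instances("APPLPPKPAP", "P..P"))
    # [(2, 'PPLP'), (3, 'PLPP'), (5, 'PPKP')]
```

The lookahead matters because `re.finditer` resumes scanning after the end of each consumed match, so a plain `P..P` search would miss instances that overlap a previous one, which are common in proline-rich regions.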
Affiliation(s)
- Holger Dinkel
- Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Kim Van Roey
- Health Services Research Unit, Operational Direction Public Health and Surveillance, Scientific Institute of Public Health (WIV-ISP), 1050 Brussels, Belgium
- Sushama Michael
- Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Manjeet Kumar
- Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Bora Uyar
- Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Brigitte Altenberg
- Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Vladislava Milchevskaya
- Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Helen Kühn
- Ruprecht-Karls-Universität, Heidelberg, Germany
- Sara Kalman
- Ruprecht-Karls-Universität, Heidelberg, Germany
- Vera Thiel
- Ruprecht-Karls-Universität, Heidelberg, Germany
- Lukas Welti
- Ruprecht-Karls-Universität, Heidelberg, Germany
- Norman E Davey
- Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin 4, Ireland
- Francesca Diella
- Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Toby J Gibson
- Structural and Computational Biology, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
27
Peng Z, Kurgan L. High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder. Nucleic Acids Res 2015; 43:e121. [PMID: 26109352 PMCID: PMC4605291 DOI: 10.1093/nar/gkv585] [Citation(s) in RCA: 117] [Impact Index Per Article: 13.0] [Received: 01/23/2015] [Revised: 04/24/2015] [Accepted: 05/24/2015] [Indexed: 01/05/2023]
Abstract
Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that address predictions of protein-binding regions. We report a first-of-its-kind computational method, DisoRDPbind, for high-throughput prediction of RNA, DNA and protein binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder, and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application to the Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of putative protein-binding regions predicted with DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein-protein interaction networks. Webserver: http://biomine.ece.ualberta.ca/DisoRDPbind/.
Affiliation(s)
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, P.R. China; Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, T6G 2V4, Canada
- Lukasz Kurgan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, T6G 2V4, Canada
28
Kuenemann MA, Sperandio O, Labbé CM, Lagorce D, Miteva MA, Villoutreix BO. In silico design of low molecular weight protein-protein interaction inhibitors: Overall concept and recent advances. Prog Biophys Mol Biol 2015; 119:20-32. [PMID: 25748546 DOI: 10.1016/j.pbiomolbio.2015.02.006] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 11/22/2014] [Revised: 02/18/2015] [Accepted: 02/24/2015] [Indexed: 12/22/2022]
Abstract
Protein-protein interactions (PPIs) carry out diverse functions in living systems and play a major role in health and disease. Low molecular weight (LMW) "drug-like" inhibitors of PPIs would be very valuable, not only to enhance our understanding of physiological processes but also for drug discovery endeavors. However, PPIs were deemed intractable by LMW chemicals for many years. Today, with the new experimental and in silico technologies that have been developed, about 50 PPIs have already been inhibited by LMW molecules. Here, we first focus on general concepts about protein-protein interactions, present a consensus view of ligandable pockets at protein interfaces, and discuss the possibilities of using fast and cost-effective structure-based virtual screening methods to identify PPI hits. We then discuss the design of compound collections dedicated to PPIs. Recent financial analyses of the field suggest that LMW PPI modulators could gain momentum over biologics in the coming years, supporting further research in this area.
Affiliation(s)
- Mélaine A Kuenemann
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France; Inserm, U973, Paris 75013, France
- Olivier Sperandio
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France; Inserm, U973, Paris 75013, France; CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- Céline M Labbé
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France; Inserm, U973, Paris 75013, France; CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France
- David Lagorce
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France; Inserm, U973, Paris 75013, France
- Maria A Miteva
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France; Inserm, U973, Paris 75013, France
- Bruno O Villoutreix
- Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France; Inserm, U973, Paris 75013, France; CDithem, Faculté de Pharmacie, 1 rue du Prof Laguesse, 59000 Lille, France.
| |
Collapse
29
Malhis N, Gsponer J. Computational identification of MoRFs in protein sequences. Bioinformatics 2015; 31:1738-44. [PMID: 25637562 DOI: 10.1093/bioinformatics/btv060] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Received: 10/21/2014] [Accepted: 01/25/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is the binding of molecular recognition features (MoRFs) to globular protein domains in a process known as a disorder-to-order transition. Predicting the location of MoRFs in protein sequences with high accuracy remains an important computational challenge. METHOD In this study, we introduce MoRFCHiBi, a new computational approach for fast and accurate prediction of MoRFs in protein sequences. MoRFCHiBi combines the outcomes of two support vector machine (SVM) models that take advantage of two different kernels with high noise tolerance. The first, SVMS, is designed to extract maximal information from the general contrast in amino acid compositions between MoRFs, their surrounding regions (Flanks), and the remainders of the sequences. The second, SVMT, is used to identify similarities between regions in a query sequence and MoRFs of the training set. RESULTS We evaluated the performance of our predictor by comparing its results with those of two currently available MoRF predictors, MoRFpred and ANCHOR. Using three test sets that have previously been collected and used to evaluate MoRFpred and ANCHOR, we demonstrate that MoRFCHiBi outperforms the other predictors with respect to different evaluation metrics. In addition, MoRFCHiBi is downloadable and fast, which makes it useful as a component in other computational prediction tools. AVAILABILITY AND IMPLEMENTATION http://www.chibi.ubc.ca/morf/.
Affiliation(s)
- Nawar Malhis
- Centre for High-Throughput Biology and Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Jörg Gsponer
- Centre for High-Throughput Biology and Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
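The SVMS component of MoRFCHiBi is described as exploiting the contrast in amino acid composition between a MoRF, its flanks, and the rest of the sequence. The following is a minimal sketch of computing those three composition vectors only; the SVM itself is not reproduced, and the flank width of 10 residues and the helper names `composition` and `region_features` are hypothetical choices for illustration, not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids; other symbols are ignored when counting.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> list[float]:
    """Fraction of each standard amino acid in a segment (all zeros if empty)."""
    counts = Counter(segment)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def region_features(seq: str, start: int, end: int, flank: int = 10):
    """Compositions of a candidate region seq[start:end], its flanks, and the remainder.

    `flank` is a hypothetical window width; the published method may differ.
    """
    region = seq[start:end]
    left = seq[max(0, start - flank):start]
    right = seq[end:end + flank]
    # Everything outside the region and both flanks.
    rest = seq[:max(0, start - flank)] + seq[end + flank:]
    return composition(region), composition(left + right), composition(rest)
```

In a real predictor these vectors would be fed, possibly with other features, to a trained classifier; here they only illustrate the three-way compositional contrast the abstract describes.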
30
Duffy FJ, Devocelle M, Shields DC. Computational approaches to developing short cyclic peptide modulators of protein-protein interactions. Methods Mol Biol 2015; 1268:241-71. [PMID: 25555728 DOI: 10.1007/978-1-4939-2285-7_11] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 12/25/2022]
Abstract
Cyclic peptides are a promising class of bioactive molecules potentially capable of modulating "difficult" targets, such as protein-protein interactions. Cyclic peptides have long been used as therapeutics derived from natural product derivatives, but remain an underexplored class of compounds from the perspective of rational drug design, possibly due to the known weaknesses of peptide drugs in general. While cyclic peptides are non-"druglike" by the accepted empirical rules, their unique structure may lend itself to both membrane permeability and proteolytic resistance, the main barriers to oral delivery. The constrained shape of cyclic peptides also lends itself better to virtual screening approaches, and new tools and successes in this area have been recently noted. An increasing number of strategies are available, both to generate and screen cyclic peptide libraries, and best practises and current successes are described within. This chapter will describe various computational strategies for virtual screening cyclic peptides, along with known implementations and applications. We will explore the generation and screening of diverse combinatorial virtual libraries, incorporating a range of cyclization strategies and structural modifications. More advanced approaches covered include evolutionary algorithms designed to aid in screening large structural libraries, machine learning approaches, and harnessing bioinformatics resources to bias cyclic peptide virtual libraries towards known bioactive structures.
Affiliation(s)
- Fergal J Duffy
- School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
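One step in generating a combinatorial virtual library of cyclic peptides is removing duplicates: for head-to-tail (backbone) cyclization, every rotation of a sequence is the same molecule. A minimal sketch of enumerating distinct head-to-tail cyclic sequences, assuming only that one cyclization strategy and ignoring stereochemistry, side-chain cyclization, and other structural modifications; `canonical_rotation` and `enumerate_cyclic_library` are hypothetical names, not functions from the chapter.

```python
from itertools import product


def canonical_rotation(seq: str) -> str:
    """Lexicographically smallest rotation of a sequence.

    Rotations of a head-to-tail cyclic peptide describe the same molecule,
    so this serves as a canonical key. Reversals are NOT merged, because
    reversing a sequence changes the N-to-C backbone direction.
    """
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def enumerate_cyclic_library(alphabet: str, length: int) -> list[str]:
    """All distinct head-to-tail cyclic sequences of `length` over `alphabet`."""
    seen = set()
    for residues in product(alphabet, repeat=length):
        seen.add(canonical_rotation("".join(residues)))
    return sorted(seen)


if __name__ == "__main__":
    # Two residue types, ring size 4.
    print(enumerate_cyclic_library("GA", 4))
```

Brute-force enumeration grows as `len(alphabet) ** length`, which is why large libraries call for the sampling-based approaches (such as evolutionary algorithms) mentioned in the abstract rather than exhaustive listing.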
31
Jones DT, Cozzetto D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 2014; 31:857-63. [PMID: 25391399 PMCID: PMC4380029 DOI: 10.1093/bioinformatics/btu744] [Citation(s) in RCA: 616] [Impact Index Per Article: 61.6] [Indexed: 01/13/2023]
Abstract
Motivation: A sizeable fraction of eukaryotic proteins contain intrinsically disordered regions (IDRs), which act in unfolded states or by undergoing transitions between structured and unstructured conformations. Over time, sequence-based classifiers of IDRs have become fairly accurate and currently a major challenge is linking IDRs to their biological roles from the molecular to the systems level. Results: We describe DISOPRED3, which extends its predecessor with new modules to predict IDRs and protein-binding sites within them. Based on recent CASP evaluation results, DISOPRED3 can be regarded as state of the art in the identification of IDRs, and our self-assessment shows that it significantly improves over DISOPRED2 because its predictions are more specific across the board and more sensitive to IDRs longer than 20 amino acids. Predicted IDRs are annotated as protein binding through a novel SVM-based classifier, which uses profile data and additional sequence-derived features. Based on benchmarking experiments with full cross-validation, we show that this predictor generates precise assignments of disordered protein binding regions and that it compares well with other publicly available tools. Availability and implementation: http://bioinf.cs.ucl.ac.uk/disopred Contact: d.t.jones@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
- Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
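Disorder predictors of this kind emit a per-residue score, and IDRs are then reported as contiguous runs of residues above a cutoff; the abstract's remark about IDRs "longer than 20 amino acids" refers to such runs. A minimal sketch of that post-processing step, assuming scores are already available as a list of floats (no DISOPRED3 output file is parsed) and using a hypothetical cutoff of 0.5 and a hypothetical helper name `disordered_regions`:

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Group per-residue disorder scores into regions.

    Returns 1-based, inclusive (start, end) residue ranges where
    score >= threshold, keeping only runs of at least `min_length`
    residues. `threshold` and `min_length` are illustrative assumptions.
    """
    regions = []
    start = None  # 0-based index where the current run began
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start + 1, len(scores)))
    return regions


if __name__ == "__main__":
    toy = [0.1] * 5 + [0.9] * 25 + [0.2] * 3 + [0.8] * 4
    # Only runs longer than 20 residues.
    print(disordered_regions(toy, min_length=21))
```

The scores in the usage example are made-up toy values, not DISOPRED3 predictions.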