1
Basu S, Zhao B, Biró B, Faraggi E, Gsponer J, Hu G, Kloczkowski A, Malhis N, Mirdita M, Söding J, Steinegger M, Wang D, Wang K, Xu D, Zhang J, Kurgan L. DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options. Nucleic Acids Res 2024; 52:D426-D433. [PMID: 37933852 PMCID: PMC10767971 DOI: 10.1093/nar/gkad985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/12/2023] [Accepted: 10/16/2023] [Indexed: 11/08/2023]
Abstract
The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Affiliation(s)
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
- Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
- Bálint Biró
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Department of Animal Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Eshel Faraggi
- Physics Department, Indiana University, Indianapolis, IN, USA
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, P.R. China
- Andrzej Kloczkowski
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Duolin Wang
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Kui Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, P.R. China
- Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, USA
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang, P.R. China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
2
Biró B, Zhao B, Kurgan L. Complementarity of the residue-level protein function and structure predictions in human proteins. Comput Struct Biotechnol J 2022; 20:2223-2234. [PMID: 35615015 PMCID: PMC9118482 DOI: 10.1016/j.csbj.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 05/02/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022]
Abstract
Sequence-based predictors of the residue-level protein function and structure cover a broad spectrum of characteristics including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They were catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, while they are typically used on the same proteins. We fill this void by studying complementarity of a representative collection of methods that target different predictions using a large, taxonomically consistent, and low similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of the protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of the DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
Affiliation(s)
- Bálint Biró
- Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
3
Gunther MJ, Pavlović RZ, Finnegan TJ, Wang X, Badjić JD. Enantioselective Construction of Modular and Asymmetric Baskets. Angew Chem Int Ed Engl 2021; 60:25075-25081. [PMID: 34672062 DOI: 10.1002/anie.202110849] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/12/2021] [Indexed: 12/19/2022]
Abstract
The precise positioning of functional groups about the inner space of abiotic hosts is a challenging task and of interest for developing more effective receptors and catalysts akin to those found in nature. To address it, we herein report a synthetic methodology for preparing basket-like cavitands comprised of three different aromatics as side arms with orthogonal esters at the rim for further functionalization. First, enantioenriched A (borochloronorbornene), B (iodobromonorbornene), and C (boronorbornene) building blocks were obtained by stereoselective syntheses. Second, consecutive A-to-B and then AB-to-C Suzuki-Miyaura (SM) couplings were optimized to give enantioenriched ABC cavitand as the principal product. The robust synthetic protocol allowed us to prepare (a) an enantioenriched basket with three benzene sides and each holding either tBu, Et, or Me esters, (b) both enantiomers of a so-called "spiral staircase" basket with benzene, naphthalene, and anthracene groups surrounding the inner space, and (c) a photo-responsive basket bearing one anthracene and two benzene arms.
Affiliation(s)
- Michael J Gunther
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
- Radoslav Z Pavlović
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
- Tyler J Finnegan
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
- Xiuze Wang
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
- Jovica D Badjić
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
4
Gunther MJ, Pavlović RZ, Finnegan TJ, Wang X, Badjić JD. Enantioselective Construction of Modular and Asymmetric Baskets. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202110849] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/19/2022]
Affiliation(s)
- Michael J. Gunther
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
- Radoslav Z. Pavlović
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
- Tyler J. Finnegan
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
- Xiuze Wang
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
- Jovica D. Badjić
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH, USA
5
Zhao B, Katuwawala A, Oldfield CJ, Dunker AK, Faraggi E, Gsponer J, Kloczkowski A, Malhis N, Mirdita M, Obradovic Z, Söding J, Steinegger M, Zhou Y, Kurgan L. DescribePROT: database of amino acid-level protein structure and function predictions. Nucleic Acids Res 2021; 49:D298-D308. [PMID: 33119734 PMCID: PMC7778963 DOI: 10.1093/nar/gkaa931] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 08/14/2020] [Revised: 09/11/2020] [Accepted: 10/05/2020] [Indexed: 12/30/2022]
Abstract
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- A Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Eshel Faraggi
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Milot Mirdita
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Zoran Obradovic
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Martin Steinegger
- School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
- Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
6
Izumi H, Nafie LA, Dukor RK. SSSCPreds: Deep Neural Network-Based Software for the Prediction of Conformational Variability and Application to SARS-CoV-2. ACS Omega 2020; 5:30556-30567. [PMID: 33283104 PMCID: PMC7687297 DOI: 10.1021/acsomega.0c04472] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/11/2020] [Accepted: 11/05/2020] [Indexed: 05/05/2023]
Abstract
Amino acid mutations that improve protein stability and rigidity can accompany increases in binding affinity. Therefore, conserved amino acids located on a protein surface may be successfully targeted by antibodies. The quantitative deep mutational scanning approach is an excellent technique to understand viral evolution, and the obtained data can be utilized to develop a vaccine. However, applying the approach to proteins in general is difficult in terms of cost. To address this need, we report the construction of a deep neural network-based program for sequence-based prediction of supersecondary structure codes (SSSCs), called SSSCPrediction (SSSCPred). Further, to predict conformational flexibility or rigidity in proteins, a comparison program called SSSCPreds that consists of three deep neural network-based prediction systems (SSSCPred, SSSCPred100, and SSSCPred200) has also been developed. The calculation performed here with our algorithms shows the degree of flexibility of the receptor-binding motif of the SARS-CoV-2 spike protein and the rigidity of the unique motif (SSSC: SSSHSSHHHH) at the S2 subunit, and has a value independent of the X-ray and cryo-EM structures. The fact that the sequence flexibility/rigidity map of SARS-CoV-2 RBD resembles the sequence-to-phenotype maps of ACE2-binding affinity and expression, which were experimentally obtained by deep mutational scanning, suggests that the identical SSSC sequences among the ones predicted by three deep neural network-based systems correlate well with the sequences with both lower ACE2-binding affinity and lower expression. The combined analysis of predicted and observed SSSCs with keyword-tagged datasets would be helpful in understanding the structural correlation to the examined system.
Affiliation(s)
- Hiroshi Izumi
- National Institute of Advanced Industrial Science and Technology (AIST), AIST Tsukuba West, 16-1 Onogawa, Tsukuba, Ibaraki 305-8569, Japan
- Laurence A. Nafie
- Department of Chemistry, Syracuse University, Syracuse, New York 13244-4100, United States
- BioTools Inc., 17546 SR 710 (Bee Line Hwy), Jupiter, Florida 33458, United States
- Rina K. Dukor
- BioTools Inc., 17546 SR 710 (Bee Line Hwy), Jupiter, Florida 33458, United States
7
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023]
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Ireland
- Quan Le
- Centre for Applied Data Analytics Research, University College Dublin, Ireland
8
Zamora-Carreras H, Maestro B, Sanz JM, Jiménez MA. Turncoat Polypeptides: We Adapt to Our Environment. Chembiochem 2019; 21:432-441. [PMID: 31456307 DOI: 10.1002/cbic.201900446] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/17/2019] [Indexed: 01/25/2023]
Abstract
A common interpretation of Anfinsen's hypothesis states that one amino acid sequence should fold into a single, native, ordered state, or a highly similar set thereof, coinciding with the global minimum in the folding-energy landscape, which, in turn, is responsible for the function of the protein. However, this classical view is challenged by many proteins and peptide sequences, which can adopt exchangeable, significantly dissimilar conformations that even fulfill different biological roles. The similarities and differences of concepts related to these proteins, mainly chameleon sequences, metamorphic proteins, and switch peptides, which are all denoted herein "turncoat" polypeptides, are reviewed. As well as adding a twist to the conventional view of protein folding, the lack of structural definition adds clear versatility to the activity of proteins and can be used as a tool for protein design and further application in biotechnology and biomedicine.
Affiliation(s)
- Héctor Zamora-Carreras
- Instituto de Química-Física Rocasolano (IQFR), Consejo Superior de Investigaciones Científicas (CSIC), Serrano 119, 28006, Madrid, Spain
- Beatriz Maestro
- Centro de Investigaciones Biológicas (CIB), Consejo Superior de Investigaciones Científicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain
- Jesús M Sanz
- Centro de Investigaciones Biológicas (CIB), Consejo Superior de Investigaciones Científicas (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain; Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Av. Monforte de Lemos, 3-5. Pabellón, 28029, Madrid, Spain
- M Angeles Jiménez
- Instituto de Química-Física Rocasolano (IQFR), Consejo Superior de Investigaciones Científicas (CSIC), Serrano 119, 28006, Madrid, Spain