1
Pei J, Andreeva A, Chuguransky S, Lázaro Pinto B, Paysan-Lafosse T, Schaeffer RD, Bateman A, Cong Q, Grishin NV. Bridging the Gap between Sequence and Structure Classifications of Proteins with AlphaFold Models. J Mol Biol 2024; 436:168764. [PMID: 39197652 DOI: 10.1016/j.jmb.2024.168764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 08/13/2024] [Accepted: 08/20/2024] [Indexed: 09/01/2024]
Abstract
Classification of protein domains based on homology and structural similarity serves as a fundamental tool to gain biological insights into protein function. Recent advancements in protein structure prediction, exemplified by AlphaFold, have revolutionized the availability of protein structural data. We focus on classifying about 9000 Pfam families into ECOD (Evolutionary Classification of Domains) by using predicted AlphaFold models and the DPAM (Domain Parser for AlphaFold Models) tool. Our results offer insights into their homologous relationships and domain boundaries. More than half of these Pfam families contain DPAM domains that can be confidently assigned to the ECOD hierarchy. Most assigned domains belong to highly populated folds such as Immunoglobulin-like (IgL), Armadillo (ARM), helix-turn-helix (HTH), and Src homology 3 (SH3). A large fraction of DPAM domains, however, cannot be confidently assigned to ECOD homologous groups. These unassigned domains exhibit statistically different characteristics, including shorter average length, fewer secondary structure elements, and more abundant transmembrane segments. They could potentially define novel families remotely related to domains with known structures or novel superfamilies and folds. Manual scrutiny of a subset of these domains revealed an abundance of internal duplications and recurring structural motifs. Exploring sequence and structural features such as disulfide bond patterns, metal-binding sites, and enzyme active sites helped uncover novel structural folds as well as remote evolutionary relationships. By bridging the gap between sequence-based Pfam and structure-based ECOD domain classifications, our study contributes to a more comprehensive understanding of the protein universe by providing structural and functional insights into previously uncharacterized proteins.
Affiliation(s)
- Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Antonina Andreeva
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Sara Chuguransky
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Beatriz Lázaro Pinto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Nick V Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
2
Toledo-Patiño S, Goetz SK, Shanmugaratnam S, Höcker B, Farías-Rico JA. Molecular handcraft of a well-folded protein chimera. FEBS Lett 2024; 598:1375-1386. [PMID: 38508768 DOI: 10.1002/1873-3468.14856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 02/11/2024] [Accepted: 02/12/2024] [Indexed: 03/22/2024]
Abstract
Modular assembly is a compelling pathway to create new proteins, a concept supported by protein engineering and millennia of evolution. Natural evolution provided a repository of building blocks, known as domains, which trace back to even shorter segments that underwent numerous 'copy-paste' processes culminating in the scaffolds we see today. Utilizing the subdomain-database Fuzzle, we constructed a fold-chimera by integrating a flavodoxin-like fragment into a periplasmic binding protein. This chimera is well-folded and a crystal structure reveals stable interfaces between the fragments. These findings demonstrate the adaptability of α/β-proteins and offer a stepping stone for optimization. By emphasizing the practicality of fragment databases, our work pioneers new pathways in protein engineering. Ultimately, the results substantiate the conjecture that periplasmic binding proteins originated from a flavodoxin-like ancestor.
Affiliation(s)
- Saacnicteh Toledo-Patiño
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Okinawa Institute of Science and Technology Graduate University, Japan
- Sooruban Shanmugaratnam
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Department of Biochemistry, University of Bayreuth, Germany
- Birte Höcker
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Department of Biochemistry, University of Bayreuth, Germany
- José Arcadio Farías-Rico
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
3
Zhang J, Chen Q, Liu B. iNucRes-ASSH: Identifying nucleic acid-binding residues in proteins by using self-attention-based structure-sequence hybrid neural network. Proteins 2024; 92:395-410. [PMID: 37915276 DOI: 10.1002/prot.26626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 09/27/2023] [Accepted: 10/17/2023] [Indexed: 11/03/2023]
Abstract
Interaction between proteins and nucleic acids is crucial to many cellular activities. Accurately detecting nucleic acid-binding residues (NABRs) in proteins can help researchers better understand the interaction mechanism between proteins and nucleic acids. Structure-based methods can generally make more accurate predictions than sequence-based methods. However, the existing structure-based methods are sensitive to protein conformational changes, causing limited generalizability. More effective and robust approaches should be further explored. In this study, we propose iNucRes-ASSH to identify nucleic acid-binding residues with a self-attention-based structure-sequence hybrid neural network. It improves the generalizability and robustness of NABR prediction from two levels: residue representation and prediction model. Experimental results show that iNucRes-ASSH can predict the nucleic acid-binding residues even when the experimentally validated structures are unavailable and outperforms five competing methods on a recent benchmark dataset and a widely used test dataset.
Affiliation(s)
- Jun Zhang
- National Engineering Laboratory for Big Data System Computing Technology, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, China
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qingcai Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
4
Schaeffer RD, Zhang J, Medvedev KE, Kinch LN, Cong Q, Grishin NV. ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2. PLoS Comput Biol 2024; 20:e1011586. [PMID: 38416793 PMCID: PMC10927120 DOI: 10.1371/journal.pcbi.1011586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 02/20/2024] [Indexed: 03/01/2024]
Abstract
Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprising over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared by both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implications of these results for protein target selection for future classification strategies for very large protein sets.
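The totals reported above can be turned into per-protein averages to give a sense of scale. This is a minimal sketch that uses only the three numbers in the abstract. It assumes the 536,808 proteins are the whole analysed set, which the abstract implies but does not state outright. It says nothing about how domains are actually distributed across proteins.

```python
# Totals quoted in the abstract (48 AlphaFold DB proteomes).
DOMAINS = 746_349
PROTEINS = 536_808
RESIDUES = 226_424_000  # the abstract says "over", so this is a lower bound


def per_protein(total: int, proteins: int = PROTEINS) -> float:
    """Average count per protein across the whole set."""
    return total / proteins


domains_per_protein = per_protein(DOMAINS)
residues_per_protein = per_protein(RESIDUES)

print(f"{domains_per_protein:.2f} classified domains per protein")  # 1.39
print(f"at least {residues_per_protein:.0f} residues per protein")  # 422
```

Roughly 1.4 classified domains per protein, set against proteins averaging over 400 residues, is consistent with the abstract's point that many residues fall outside classified globular domains.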
Affiliation(s)
- R. Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Jing Zhang
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Kirill E. Medvedev
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lisa N. Kinch
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Qian Cong
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Nick V. Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
5
Vander Meersche Y, Cretin G, Gheeraert A, Gelly JC, Galochkina T. ATLAS: protein flexibility description from atomistic molecular dynamics simulations. Nucleic Acids Res 2024; 52:D384-D392. [PMID: 37986215 PMCID: PMC10767941 DOI: 10.1093/nar/gkad1084] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/11/2023] [Revised: 10/15/2023] [Accepted: 10/30/2023] [Indexed: 11/22/2023]
Abstract
Dynamical behaviour is one of the most crucial protein characteristics. Despite the advances in the field of protein structure resolution and prediction, analysis and prediction of protein dynamic properties remains a major challenge, mostly due to the low accessibility of data and its diversity and heterogeneity. To address this issue, we present ATLAS, a database of standardised all-atom molecular dynamics simulations, accompanied by their analysis in the form of interactive diagrams and trajectory visualisation. ATLAS offers a large-scale view and valuable insights on protein dynamics for a large and representative set of proteins, by combining data obtained through molecular dynamics simulations with information extracted from experimental structures. Users can easily analyse dynamic properties of functional protein regions, such as domain limits (hinge positions) and residues involved in interaction with other biological molecules. Additionally, the database enables exploration of proteins with uncommon dynamic properties conditioned by their environment such as chameleon subsequences and Dual Personality Fragments. The ATLAS database is freely available at https://www.dsimb.inserm.fr/ATLAS.
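The abstract does not list the exact flexibility descriptors that ATLAS reports. One standard per-atom (or per-residue) measure that can be computed from any all-atom trajectory is the root-mean-square fluctuation (RMSF). Peaks in RMSF often coincide with flexible loops or hinge regions. The sketch below is a minimal pure-Python illustration, not ATLAS code. It assumes the frames have already been superposed on a common reference, and the toy trajectory is invented.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: List[List[Coord]]) -> List[float]:
    """Per-atom root-mean-square fluctuation over a trajectory.

    frames[t][i] is the (x, y, z) position of atom i in frame t.
    Assumes frames are pre-aligned, so global rotation/translation
    does not inflate the fluctuations.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Average position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory: atom 0 is rigid, atom 1 moves 1 unit either side of x = 2.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy))  # [0.0, 1.0]
```

In practice this would be applied to, for example, C-alpha atoms only, giving one value per residue that can be plotted along the sequence.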
Affiliation(s)
- Yann Vander Meersche
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Gabriel Cretin
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Aria Gheeraert
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Jean-Christophe Gelly
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Tatiana Galochkina
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
6
Alvarez-Carreño C, Arciniega M, Ribas de Pouplana L, Petrov AS, Hernández-González A, Dimas-Torres JU, Valencia-Sánchez MI, Williams LD, Torres-Larios A. Common evolutionary origins of the bacterial glycyl tRNA synthetase and alanyl tRNA synthetase. Protein Sci 2023; 33:e4844. [PMID: 38009704 PMCID: PMC10895455 DOI: 10.1002/pro.4844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 11/07/2023] [Accepted: 11/18/2023] [Indexed: 11/29/2023]
Abstract
Aminoacyl-tRNA synthetases (aaRSs) establish the genetic code. Each aaRS covalently links a given canonical amino acid to a cognate set of tRNA isoacceptors. Glycyl tRNA aminoacylation is unusual in that it is catalyzed by different aaRSs in different lineages of the Tree of Life. We have investigated the phylogenetic distribution and evolutionary history of bacterial glycyl tRNA synthetase (bacGlyRS). This enzyme is found in early diverging bacterial phyla such as Firmicutes, Acidobacteria, and Proteobacteria, but not in archaea or eukarya. We observe relationships between each of six domains of bacGlyRS and six domains of four different RNA-modifying proteins. Component domains of bacGlyRS show common ancestry with (i) the catalytic domain of class II tRNA synthetases; (ii) the HD domain of the bacterial RNase Y; (iii) the body and tail domains of the archaeal CCA-adding enzyme; (iv) the anti-codon binding domain of the arginyl tRNA synthetase; and (v) a previously unrecognized domain that we call ATL (Ancient tRNA latch). The ATL domain has been found thus far only in bacGlyRS and in the universal alanyl tRNA synthetase (uniAlaRS). Further, the catalytic domain of bacGlyRS is more closely related to the catalytic domain of uniAlaRS than to any other aminoacyl tRNA synthetase. The combined results suggest that the ATL and catalytic domains of these two enzymes are ancestral to bacGlyRS and uniAlaRS, which emerged from common protein ancestors by bricolage (the stepwise accumulation of protein domains) before the last universal common ancestor of life.
Affiliation(s)
- Claudia Alvarez-Carreño
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, Georgia, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Marcelino Arciniega
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Lluís Ribas de Pouplana
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Catalan Institution for Research and Advanced Studies, Barcelona, Catalonia, Spain
- Anton S Petrov
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, Georgia, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Adriana Hernández-González
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Jorge-Uriel Dimas-Torres
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Marco Igor Valencia-Sánchez
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Loren Dean Williams
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, Georgia, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Alfredo Torres-Larios
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
7
Medvedev KE, Schaeffer RD, Chen KS, Grishin NV. Pan-cancer structurome reveals overrepresentation of beta sandwiches and underrepresentation of alpha helical domains. Sci Rep 2023; 13:11988. [PMID: 37491511 PMCID: PMC10368619 DOI: 10.1038/s41598-023-39273-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 07/22/2023] [Indexed: 07/27/2023]
Abstract
The recent progress in the prediction of protein structures marked a historical milestone. AlphaFold predicted 200 million protein models with an accuracy comparable to experimental methods. Protein structures are widely used to understand evolution and to identify potential drug targets for the treatment of various diseases, including cancer. Thus, these recently predicted structures might convey previously unavailable information about cancer biology. Evolutionary classification of protein domains is challenging, and different approaches exist. Recently, our team presented a classification of domains from human protein models released by AlphaFold. Here we evaluated the pan-cancer structurome (domains from over- and underexpressed proteins in 21 cancer types) using the broadest levels of the ECOD classification: the architecture (A-groups) and possible homology (X-groups) levels. Our analysis reveals that AlphaFold has greatly increased the three-dimensional structural landscape for proteins that are differentially expressed in these 21 cancer types. We show that beta sandwich domains are significantly overrepresented and alpha helical domains are significantly underrepresented in the majority of cancer types. Our data suggest that the prevalence of the beta sandwiches is due to the high levels of immunoglobulins and immunoglobulin-like domains that arise during tumor development-related inflammation. On the other hand, proteins with exclusively alpha domains are important elements of homeostasis, apoptosis and transmembrane transport. Therefore, cancer cells tend to reduce the representation of these proteins to promote successful oncogenesis.
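The abstract reports that some ECOD architectures are "significantly" over- or underrepresented but does not say which statistical test was used. A common choice for this kind of question is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below assumes that test. The function names and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric p-value for over-representation, P(X >= k).

    N: domains in the background set (e.g. all classified human domains)
    K: background domains in the architecture (A-group) of interest
    n: domains in the cancer-associated set
    k: cancer-associated domains in that A-group
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def depletion_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric p-value for under-representation, P(X <= k)."""
    total = comb(N, n)
    lower = max(0, n - (N - K))  # fewest A-group domains the set can contain
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, k + 1)) / total


# Hypothetical counts: 10 background domains, 5 of them beta sandwiches;
# all 5 domains in the cancer set are beta sandwiches.
print(enrichment_p(5, N=10, K=5, n=5))  # 1/252, about 0.004
```

With many A-groups and 21 cancer types tested at once, a real analysis would also need a multiple-testing correction on these p-values.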
Affiliation(s)
- Kirill E Medvedev
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Kenneth S Chen
- Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Children's Medical Center Research Institute, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Nick V Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
8
Schaeffer RD, Zhang J, Kinch LN, Pei J, Cong Q, Grishin NV. Classification of domains in predicted structures of the human proteome. Proc Natl Acad Sci U S A 2023; 120:e2214069120. [PMID: 36917664 PMCID: PMC10041065 DOI: 10.1073/pnas.2214069120] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 08/16/2022] [Accepted: 02/06/2023] [Indexed: 03/16/2023]
Abstract
Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. Nearly a quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).
Affiliation(s)
- R. Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jing Zhang
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Lisa N. Kinch
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390
- HHMI, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Jimin Pei
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Qian Cong
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Nick V. Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390
9
Pan T, Li C, Bi Y, Wang Z, Gasser RB, Purcell AW, Akutsu T, Webb GI, Imoto S, Song J. PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships. Bioinformatics 2023; 39:7043095. [PMID: 36794913 PMCID: PMC9978587 DOI: 10.1093/bioinformatics/btad094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 02/10/2023] [Accepted: 02/15/2023] [Indexed: 02/17/2023]
Abstract
MOTIVATION The rapid accumulation of high-throughput sequence data demands the development of effective and efficient data-driven computational methods to functionally annotate proteins. However, most current approaches used for functional annotation simply focus on the use of protein-level information but ignore inter-relationships among annotations. RESULTS Here, we established PFresGO, an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and advances in natural language processing algorithms for the functional annotation of proteins. PFresGO employs a self-attention operation to capture the inter-relationships of GO terms, updates its embedding accordingly and uses a cross-attention operation to project protein representations and GO embedding into a common latent space to identify global protein sequence patterns and local functional residues. We demonstrate that PFresGO consistently achieves superior performance across GO categories when compared with 'state-of-the-art' methods. Importantly, we show that PFresGO can identify functionally important residues in protein sequences by assessing the distribution of attention weightings. PFresGO should serve as an effective tool for the accurate functional annotation of proteins and functional domains within proteins. AVAILABILITY AND IMPLEMENTATION PFresGO is available for academic purposes at https://github.com/BioColLab/PFresGO. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
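The PFresGO abstract describes a cross-attention step between GO-term embeddings and protein representations, with attention weights then used to highlight functional residues. The sketch below shows only the generic scaled dot-product cross-attention operation, in pure Python. Several details are assumptions rather than PFresGO's code: which side supplies the queries and which the keys, the absence of learned projection matrices, and the toy vectors. The actual implementation is in the GitHub repository linked above.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x m) @ (m x p) matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product cross-attention without learned projections.

    queries: one row per GO-term embedding (hypothetical, d-dimensional)
    keys, values: one row per residue representation (d-dimensional)
    Returns (outputs, weights); weights[i][j] is the attention GO term i
    pays to residue j, and each row of weights sums to 1.
    """
    d = len(queries[0])
    keys_t = [list(col) for col in zip(*keys)]  # transpose to d x n_residues
    scores = matmul(queries, keys_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


# Toy example: one GO term attending over three residues with 2-d features.
go_terms = [[1.0, 0.0]]
residues = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, w = cross_attention(go_terms, residues, residues)
print([round(x, 3) for x in w[0]])
```

Ranking the entries of a weight row, as in `w[0]` here, is analogous to the attention-weight inspection the abstract uses to point at functionally important residues.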
Affiliation(s)
- Tong Pan
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Chen Li
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Yue Bi
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Zhikang Wang
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, VIC 3010, Australia
- Anthony W Purcell
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
- Geoffrey I Webb
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Seiya Imoto
- Division of Health Medical Intelligence, Human Genome Center, Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo 108-8639, Japan; Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Jiangning Song
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
10
Dapkūnas J, Margelevičius M. The COMER web server for protein analysis by homology. Bioinformatics 2022; 39:6909010. [PMID: 36519835 PMCID: PMC9825750 DOI: 10.1093/bioinformatics/btac807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Revised: 11/04/2022] [Accepted: 12/14/2022] [Indexed: 12/23/2022]
Abstract
SUMMARY Sequence homology is a basic concept in protein evolution, structure and function studies. However, few tools and services for homology searches are sensitive, accurate and fast at the same time. We present a new web server for protein analysis based on COMER2, a sequence alignment and homology search method that exhibits these characteristics. COMER2 has been upgraded since its last publication to improve its alignment quality and ease of use. We demonstrate how the user can benefit from using it by providing examples of extensive annotation of proteins of unknown function. Among the distinctive features of the web server is the user's ability to submit multiple queries with one click of a button. This and other features allow for transparently running homology searches (in a command-line, programmatic or graphical environment) across multiple databases with multiple queries. They also promote extensive simultaneous protein analysis at the sequence, structure and function levels. AVAILABILITY AND IMPLEMENTATION The COMER web server is available at https://bioinformatics.lt/comer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
11
A proteome-wide map of chaperone-assisted protein refolding in a cytosol-like milieu. Proc Natl Acad Sci U S A 2022; 119:e2210536119. [PMID: 36417429 PMCID: PMC9860312 DOI: 10.1073/pnas.2210536119] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]
Abstract
The journey by which proteins navigate their energy landscapes to their native structures is complex, involving (and sometimes requiring) many cellular factors and processes operating in partnership with a given polypeptide chain's intrinsic energy landscape. The cytosolic environment and its complement of chaperones play critical roles in granting many proteins safe passage to their native states; however, it is challenging to interrogate the folding process for large numbers of proteins in a complex background with most biophysical techniques. Hence, most chaperone-assisted protein refolding studies are conducted in defined buffers on single purified clients. Here, we develop a limited proteolysis-mass spectrometry approach paired with an isotope-labeling strategy to globally monitor the structures of refolding Escherichia coli proteins in the cytosolic medium and with the chaperones, GroEL/ES (Hsp60) and DnaK/DnaJ/GrpE (Hsp70/40). GroEL can refold the majority (85%) of the E. coli proteins for which we have data and is particularly important for restoring acidic proteins and proteins with high molecular weight, trends that come to light because our assay measures the structural outcome of the refolding process itself, rather than binding or aggregation. For the most part, DnaK and GroEL refold a similar set of proteins, supporting the view that despite their vastly different structures, these two chaperones unfold misfolded states, as one mechanism in common. Finally, we identify a cohort of proteins that are intransigent to being refolded with either chaperone. We suggest that these proteins may fold most efficiently cotranslationally, and then remain kinetically trapped in their native conformations.
12
Johansson-Åkhe I, Wallner B. Improving peptide-protein docking with AlphaFold-Multimer using forced sampling. FRONTIERS IN BIOINFORMATICS 2022; 2:959160. [PMID: 36304330 PMCID: PMC9580857 DOI: 10.3389/fbinf.2022.959160] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 06/01/2022] [Accepted: 08/16/2022] [Indexed: 12/02/2022]
Abstract
Protein interactions are key in vital biological processes. In many cases, particularly in regulation, this interaction is between a protein and a shorter peptide fragment. Such peptides are often part of larger disordered regions in other proteins. The flexible nature of peptides enables the rapid yet specific regulation of important functions in cells, such as their life cycle. Consequently, knowledge of the molecular details of peptide-protein interactions is crucial for understanding and altering their function, and many specialized computational methods have been developed to study them. The recent release of AlphaFold and AlphaFold-Multimer has led to a leap in accuracy for the computational modeling of proteins. In this study, the ability of AlphaFold to predict which peptides and proteins interact, as well as its accuracy in modeling the resulting interaction complexes, is benchmarked against established methods. We find that AlphaFold-Multimer predicts the structure of peptide-protein complexes with acceptable or better quality (DockQ ≥0.23) for 66 of the 112 complexes investigated, 25 of which were high quality (DockQ ≥0.8). This is a massive improvement on previous methods, which produced 23 or 47 acceptable models and only four or eight high-quality models when using energy-based docking or interaction templates, respectively. In addition, AlphaFold-Multimer can be used to predict whether a peptide and a protein will interact. At 1% false positives, AlphaFold-Multimer found 26% of the possible interactions with a precision of 85%, the best among the methods benchmarked. However, the most interesting result is the possibility of improving AlphaFold by randomly perturbing the neural network weights to force the network to sample more of the conformational space. This increases the number of acceptable models from 66 to 75 and improves the median DockQ from 0.47 to 0.55 (17%) for first-ranked models. The best possible DockQ improves from 0.58 to 0.72 (24%), indicating that selecting the best possible model is still a challenge. This scheme of generating more structures with AlphaFold should be generally useful for many applications involving multiple states, flexible regions, and disorder.
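The abstract grades models by DockQ cut-offs (≥0.23 acceptable, ≥0.8 high) and reports relative gains in percent. The minimal Python sketch below makes both explicit. It is an illustration, not code from the paper or from any DockQ package: the function names are hypothetical, and the intermediate "medium" cut-off of 0.49 is taken from the original DockQ quality classes rather than from this abstract.

```python
def dockq_class(score: float) -> str:
    """Map a DockQ score in [0, 1] to a quality class.

    Cut-offs 0.23 and 0.80 are quoted in the abstract; the 0.49 'medium'
    boundary is an assumption taken from the original DockQ classes.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ is defined on [0, 1]")
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


def relative_gain(before: float, after: float) -> float:
    """Relative improvement (after - before) / before, in percent."""
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # Median DockQ of first-ranked models, then best possible DockQ,
    # using the values reported in the abstract.
    print(round(relative_gain(0.47, 0.55)))  # 17
    print(round(relative_gain(0.58, 0.72)))  # 24
    print(dockq_class(0.47), dockq_class(0.55))  # acceptable medium
```

Under these assumed classes, the forced-sampling gain moves the median first-ranked model from the acceptable band into the medium band.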
Affiliation(s)
- Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
13
Kozlova MI, Shalaeva DN, Dibrova DV, Mulkidjanian AY. Common Patterns of Hydrolysis Initiation in P-loop Fold Nucleoside Triphosphatases. Biomolecules 2022; 12:1345. [PMID: 36291554 PMCID: PMC9599529 DOI: 10.3390/biom12101345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2022] [Revised: 08/20/2022] [Accepted: 09/14/2022] [Indexed: 11/24/2022]
Abstract
The P-loop fold nucleoside triphosphate (NTP) hydrolases (also known as Walker NTPases) function as ATPases, GTPases, and ATP synthases, are often of medical importance, and represent one of the largest and evolutionarily oldest families of enzymes. There is still no consensus on their catalytic mechanism. To clarify this, we performed the first comparative structural analysis of more than 3100 structures of P-loop NTPases that contain bound substrate Mg-NTPs or their analogues. We proceeded on the assumption that structural features common to these P-loop NTPases may be essential for catalysis. Our results are presented in two articles. Here, in the first, we consider the structural elements that stimulate hydrolysis. Upon interaction of P-loop NTPases with their cognate activating partners (RNA/DNA/protein domains), specific stimulatory moieties, usually Arg or Lys residues, are inserted into the catalytic site and initiate the cleavage of the gamma-phosphate. By analyzing a plethora of structures, we found that the only shared feature was the mechanistic interaction of stimulators with the oxygen atoms of the gamma-phosphate group, capable of causing its rotation. One of the oxygen atoms of the gamma-phosphate coordinates the cofactor Mg ion. The rotation must pull this oxygen atom away from the Mg ion. This rearrangement should affect the properties of the other Mg ligands and may initiate hydrolysis according to the mechanism elaborated in the second article.
Affiliation(s)
- Maria I. Kozlova
- School of Physics, Osnabrueck University, D-49069 Osnabrueck, Germany
- Daria N. Shalaeva
- School of Physics, Osnabrueck University, D-49069 Osnabrueck, Germany
- Daria V. Dibrova
- School of Physics, Osnabrueck University, D-49069 Osnabrueck, Germany
- Armen Y. Mulkidjanian
- School of Physics, Osnabrueck University, D-49069 Osnabrueck, Germany
- Center of Cellular Nanoanalytics, Osnabrueck University, D-49069 Osnabrueck, Germany
14
Mushegian A. Methyltransferases of Riboviria. Biomolecules 2022; 12:1247. [PMID: 36139088 PMCID: PMC9496149 DOI: 10.3390/biom12091247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 09/01/2022] [Accepted: 09/03/2022] [Indexed: 11/17/2022]
Abstract
Many viruses from the realm Riboviria infecting eukaryotic hosts encode protein domains with sequence similarity to S-adenosylmethionine-dependent methyltransferases. These protein domains are thought to be involved in methylation of the 5'-terminal cap structures in virus mRNAs. Some methyltransferase-like domains of Riboviria are homologous to the widespread cellular FtsJ/RrmJ-like methyltransferases involved in modification of cellular RNAs; other methyltransferases, found in a subset of positive-strand RNA viruses, have been assigned to a separate "Sindbis-like" family; and coronavirus-specific Nsp13/14-like methyltransferases appeared to be different from both those classes. The representative structures of proteins from all three groups belong to a specific variety of the Rossmann fold with a seven-stranded β-sheet, but it was unclear whether this structural similarity extends to the level of conserved sequence signatures. Here I survey methyltransferases in Riboviria and derive a joint sequence alignment model that covers all groups of virus methyltransferases and subsumes the previously defined conserved sequence motifs. Analysis of the spatial structures indicates that two highly conserved residues, a lysine and an aspartate, frequently contact a water molecule, which is located in the enzyme active center next to the methyl group of S-adenosylmethionine cofactor and could play a key role in the catalytic mechanism of the enzyme. Phylogenetic evidence indicates a likely origin of all methyltransferases of Riboviria from cellular RrmJ-like enzymes and their rapid divergence with infrequent horizontal transfer between distantly related viruses.
Affiliation(s)
- Arcady Mushegian
- Division of Molecular and Cellular Biosciences, National Science Foundation, 2415 Eisenhower Ave., Alexandria, VA 22314, USA
15
Lecoy J, Sachin Ranade S, Rosario García-Gil M. Analysis of the ASR and LP3 homologous gene families reveal positive selection acting on LP3-3 gene. Gene 2022; 850:146935. [DOI: 10.1016/j.gene.2022.146935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 09/20/2022] [Accepted: 09/26/2022] [Indexed: 11/17/2022]
16
Holm L. Dali server: structural unification of protein families. Nucleic Acids Res 2022; 50:W210-W215. [PMID: 35610055 PMCID: PMC9252788 DOI: 10.1093/nar/gkac387] [Citation(s) in RCA: 392] [Impact Index Per Article: 196.0] [Received: 04/11/2022] [Revised: 04/27/2022] [Accepted: 05/02/2022] [Indexed: 12/26/2022]
Abstract
Protein structure is key to understanding biological function. Structure comparison deciphers deep phylogenies, providing insight into functional conservation and functional shifts during evolution. Until recently, structural coverage of the protein universe was limited by the cost and labour involved in experimental structure determination. Recent breakthroughs in deep learning have revolutionized structural bioinformatics by providing accurate structural models of numerous protein families for which no structural information existed. The Dali server for 3D protein structure comparison is widely used by crystallographers to relate new structures to pre-existing ones. Here, we report the two most recent upgrades to the web server: (i) the foldomes of key organisms in the AlphaFold Database (version 1) are searchable by Dali, and (ii) structural alignments are annotated with protein families. Using these new features, we discovered a novel functionally diverse subgroup within the WRKY/GCM1 clan. This was accomplished by linking the structurally characterized SWI/SNF and NAM families as well as the structural models of the CG-1 family and uncharacterized proteins to the structure of Gti1/Pac2, a previously known member of the WRKY/GCM1 clan. The Dali server is available at http://ekhidna2.biocenter.helsinki.fi/dali. This website is free and open to all users and there is no login requirement.
Affiliation(s)
- Liisa Holm
- Institute of Biotechnology, Helsinki Institute of Life Sciences, and Organismal and Evolutionary Biology Research Program, Faculty of Biosciences, University of Helsinki, Finland
17
Mestre MR, Gao LA, Shah SA, López-Beltrán A, González-Delgado A, Martínez-Abarca F, Iranzo J, Redrejo-Rodríguez M, Zhang F, Toro N. UG/Abi: a highly diverse family of prokaryotic reverse transcriptases associated with defense functions. Nucleic Acids Res 2022; 50:6084-6101. [PMID: 35648479 PMCID: PMC9226505 DOI: 10.1093/nar/gkac467] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/17/2021] [Revised: 04/11/2022] [Accepted: 05/17/2022] [Indexed: 11/20/2022]
Abstract
Reverse transcriptases (RTs) are enzymes capable of synthesizing DNA using RNA as a template. Within the last few years, a burst of research has led to the discovery of novel prokaryotic RTs with diverse antiviral properties, such as DRTs (Defense-associated RTs), which belong to the so-called group of unknown RTs (UG) and are closely related to the Abortive Infection system (Abi) RTs. In this work, we performed a systematic analysis of UG and Abi RTs, expanding the UG/Abi family to 42 highly diverse groups, most of which are predicted to be functionally associated with other gene(s) or domain(s). Based on this information, we classified these systems into three major classes. In addition, we reveal that most of these groups are associated with defense functions and/or mobile genetic elements, and demonstrate the antiphage role of four novel groups. Furthermore, we highlight the presence of one of these systems in novel families of human gut viruses infecting members of the Bacteroidetes and Firmicutes phyla. This work lays the foundation for a comprehensive and unified understanding of these highly diverse RTs with enormous biotechnological potential.
Affiliation(s)
- Mario Rodríguez Mestre
- Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas Alberto Sols (CSIC-UAM), Madrid, Spain
- Linyi Alex Gao
- Howard Hughes Medical Institute, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Society of Fellows, Harvard University, Cambridge, MA 02138, USA
- Shiraz A Shah
- Copenhagen Prospective Studies on Asthma in Childhood, Copenhagen University Hospital, Herlev-Gentofte, Ledreborg Allé 34, DK-2820 Gentofte, Denmark
- Adrián López-Beltrán
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) – Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
- Alejandro González-Delgado
- Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
- Francisco Martínez-Abarca
- Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
- Jaime Iranzo
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) – Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
- Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain
- Modesto Redrejo-Rodríguez
- Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas Alberto Sols (CSIC-UAM), Madrid, Spain
- Feng Zhang
- Howard Hughes Medical Institute, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nicolás Toro
- Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
18
Young RT, Czapla L, Wefers ZO, Cohen BM, Olson WK. Revisiting DNA Sequence-Dependent Deformability in High-Resolution Structures: Effects of Flanking Base Pairs on Dinucleotide Morphology and Global Chain Configuration. LIFE (BASEL, SWITZERLAND) 2022; 12:life12050759. [PMID: 35629425 PMCID: PMC9146901 DOI: 10.3390/life12050759] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/01/2022] [Revised: 05/13/2022] [Accepted: 05/15/2022] [Indexed: 11/24/2022]
Abstract
DNA carries more than the list of biochemical ingredients that drive the basic functions of living systems. The sequence of base pairs includes a multitude of structural and energetic signals, which determine the degree to which the long, threadlike molecule moves and how it responds to proteins and other molecules that control its processing and govern its packaging. The chemical composition of base pairs directs the spatial disposition and fluctuations of successive residues. The observed arrangements of these moieties in high-resolution protein–DNA crystal structures provide one of the best available estimates of the natural, sequence-dependent structure and deformability of the double-helical molecule. Here, we update the set of knowledge-based elastic potentials designed to describe the observed equilibrium structures and configurational fluctuations of the ten unique base-pair steps. The large number of currently available structures makes it possible to characterize the configurational preferences of the DNA base-pair steps within the context of their immediate neighbors, i.e., tetrameric context. Use of these knowledge-based potentials shows promise in accounting for known effects of sequence in long chain molecules, e.g., the degree of curvature reported in classic gel mobility studies and the recently reported sequence-dependent responses of supercoiled minicircles to nuclease cleavage.
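The figure of "ten unique base-pair steps" follows from the two-fold symmetry of the double helix: a dinucleotide step read 5'->3' on one strand is the same physical step as its reverse complement read on the other. Of the 16 dinucleotides, 4 are self-complementary, giving (16 + 4) / 2 = 10. The minimal Python sketch below (an illustration, not the authors' code) enumerates one representative per reverse-complement pair; applying the same counting to the tetrameric context gives 136 distinct tetramers.

```python
from itertools import product

# Watson-Crick complement table for uppercase DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def unique_oligomers(length: int) -> list:
    """Enumerate DNA oligomers of a given length, keeping one representative
    per reverse-complement pair (the alphabetically smaller string).

    A step read on one strand and its reverse complement read on the
    other strand describe the same double-helical unit, so they count once.
    """
    representatives = set()
    for letters in product("ACGT", repeat=length):
        oligo = "".join(letters)
        representatives.add(min(oligo, reverse_complement(oligo)))
    return sorted(representatives)


if __name__ == "__main__":
    steps = unique_oligomers(2)
    print(len(steps), steps)
    # 10 ['AA', 'AC', 'AG', 'AT', 'CA', 'CC', 'CG', 'GA', 'GC', 'TA']
    print(len(unique_oligomers(4)))  # 136 tetrameric contexts
```

Choosing the alphabetically smaller string as the representative is an arbitrary convention; any consistent choice yields the same counts.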
Affiliation(s)
- Robert T. Young
- Department of Chemistry & Chemical Biology, Center for Quantitative Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Luke Czapla
- Department of Chemistry & Chemical Biology, Center for Quantitative Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Zoe O. Wefers
- Department of Chemistry & Chemical Biology, Center for Quantitative Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Benjamin M. Cohen
- Department of Chemistry & Chemical Biology, Center for Quantitative Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Wilma K. Olson
- Department of Chemistry & Chemical Biology, Center for Quantitative Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
19
Black MH, Gradowski M, Pawłowski K, Tagliabracci VS. Methods for discovering catalytic activities for pseudokinases. Methods Enzymol 2022; 667:575-610. [PMID: 35525554 DOI: 10.1016/bs.mie.2022.03.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/18/2022]
Abstract
Pseudoenzymes resemble active enzymes, but lack key catalytic residues believed to be required for activity. Many pseudoenzymes appear to be inactive in conventional enzyme assays. However, an alternative explanation for their apparent lack of activity is that pseudoenzymes are being assayed for the wrong reaction. We have discovered several new protein kinase-like families which have revealed how different binding orientations of adenosine triphosphate (ATP) and active site residue migration can generate a novel reaction from a common kinase scaffold. These results have exposed the catalytic versatility of the protein kinase fold and suggest that atypical kinases and pseudokinases should be analyzed for alternative transferase activities. In this chapter, we discuss a general approach for bioinformatically identifying divergent or atypical members of an enzyme superfamily, then present an experimental approach to characterize their catalytic activity.
Affiliation(s)
- Miles H Black
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Marcin Gradowski
- Department of Biochemistry and Microbiology, Institute of Biology, Warsaw University of Life Sciences, Warsaw, Poland
- Krzysztof Pawłowski
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Department of Biochemistry and Microbiology, Institute of Biology, Warsaw University of Life Sciences, Warsaw, Poland
- Vincent S Tagliabracci
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, United States
20
Kojima K, Sunagawa N, Yoshimi Y, Tryfona T, Samejima M, Dupree P, Igarashi K. Acetylated xylan degradation by glycoside hydrolase family 10 and 11 xylanases from the white-rot fungus Phanerochaete chrysosporium. J Appl Glycosci (1999) 2022; 69:35-43. [PMID: 35891899 PMCID: PMC9276525 DOI: 10.5458/jag.jag.jag-2021_0017] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/27/2021] [Accepted: 02/28/2022] [Indexed: 11/22/2022]
Abstract
Endo-type xylanases are key enzymes in microbial xylanolytic systems, and xylanases belonging to glycoside hydrolase (GH) families 10 or 11 are the major enzymes degrading xylan in nature. These enzymes have typically been characterized using xylan prepared by alkaline extraction, which removes acetyl sidechains from the substrate, and thus the effect of acetyl groups on xylan degradation remains unclear. Here, we compare the ability of GH10 and GH11 xylanases, PcXyn10A and PcXyn11B, from the white-rot basidiomycete Phanerochaete chrysosporium to degrade acetylated and deacetylated xylan from various plants. Product quantification revealed that PcXyn10A effectively degraded both acetylated xylan extracted from Arabidopsis thaliana and the deacetylated xylan obtained by alkaline treatment, generating xylooligosaccharides. In contrast, PcXyn11B showed limited activity towards acetyl xylan, but its activity increased significantly after deacetylation of the xylan. Polysaccharide analysis using carbohydrate gel electrophoresis showed that PcXyn11B generated a broad range of products from native acetylated xylans extracted from birch wood and rice straw, including large residual xylooligosaccharides, while non-acetylated xylan from Japanese cedar was readily degraded into xylooligosaccharides. These results suggest that the degradability of native xylan by GH11 xylanases is highly dependent on the extent of acetyl group substitution. Analysis of 31 fungal genomes in the Carbohydrate-Active enZymes database indicated that the presence of GH11 xylanases is correlated with that of carbohydrate esterase (CE) family 1 acetyl xylan esterases (AXEs), while this is not the case for GH10 xylanases. These findings may imply co-evolution of GH11 xylanases and CE1 AXEs.
Affiliation(s)
- Keisuke Kojima
- Department of Biomaterial Sciences, The University of Tokyo
- Naoki Sunagawa
- Department of Biomaterial Sciences, The University of Tokyo
- Paul Dupree
- Department of Biochemistry, University of Cambridge
21
Ludwiczak J, Winski A, Dunin-Horkawicz S. Localpdb: a Python package to manage protein structures and their annotations. Bioinformatics 2022; 38:2633-2635. [PMID: 35199148 PMCID: PMC9048648 DOI: 10.1093/bioinformatics/btac121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2021] [Revised: 01/07/2022] [Accepted: 02/21/2022] [Indexed: 12/02/2022]
Abstract
Motivation: The wealth of protein structures collected in the Protein Data Bank has enabled large-scale studies of their function and evolution. Such studies, however, require the generation of customized datasets combining the structural data with miscellaneous accessory resources providing functional, taxonomic and other annotations. Unfortunately, the functionality of currently available tools for the creation of such datasets is limited and their usage frequently requires laborious surveying of various data sources and resolving inconsistencies between their versions. Results: To address this problem, we developed localpdb, a versatile Python library for the management of protein structures and their annotations. The library features a flexible plugin system enabling seamless unification of the structural data with diverse auxiliary resources, full version control and powerful functionality for creating highly customized datasets. localpdb can be used in a wide range of bioinformatic tasks, in particular those involving large-scale protein structural analyses and machine learning. Availability and implementation: localpdb is freely available at https://github.com/labstructbioinf/localpdb. Documentation along with the usage examples can be accessed at https://labstructbioinf.github.io/localpdb/.
Affiliation(s)
- Jan Ludwiczak
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland
- Aleksander Winski
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland
- Stanislaw Dunin-Horkawicz
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland
22
Longo LM, Kolodny R, McGlynn SE. Evidence for the emergence of β-trefoils by 'Peptide Budding' from an IgG-like β-sandwich. PLoS Comput Biol 2022; 18:e1009833. [PMID: 35157697 PMCID: PMC8880906 DOI: 10.1371/journal.pcbi.1009833] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/07/2022] [Revised: 02/25/2022] [Accepted: 01/13/2022] [Indexed: 12/02/2022]
Abstract
As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, this would imply that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both 'β-trefoil bridging themes' (evolutionarily related sequence segments) and 'β-trefoil-like motifs' (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself, namely, that it is a derived fold formed by 'budding' from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the evolution of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature's sewing table.
Affiliation(s)
- Liam M. Longo
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Blue Marble Space Institute of Science, Seattle, Washington, United States of America
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa, Israel
- Shawn E. McGlynn
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Blue Marble Space Institute of Science, Seattle, Washington, United States of America
23
Structural dynamics in the evolution of a bilobed protein scaffold. Proc Natl Acad Sci U S A 2021; 118:2026165118. [PMID: 34845009 PMCID: PMC8694067 DOI: 10.1073/pnas.2026165118] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 10/20/2021] [Indexed: 11/18/2022]
Abstract
Proteins conduct numerous complex biological functions by use of tailored structural dynamics. The molecular details of how these emerged from ancestral peptides remain mysterious. How does nature utilize the same repertoire of folds to diversify function? To shed light on this, we analyzed bilobed proteins with a common structural core, which is spread throughout the tree of life and is involved in diverse biological functions such as transcription, enzymatic catalysis, membrane transport, and signaling. We show here that the structural dynamics of the structural core differentiate predominantly via terminal additions over long evolutionary periods. This diversifies substrate specificity and, ultimately, biological function.
Novel biophysical tools allow the structural dynamics of proteins and the regulation of such dynamics by binding partners to be explored in unprecedented detail. Although this has provided critical insights into protein function, the means by which structural dynamics direct protein evolution remain poorly understood. Here, we investigated how proteins with a bilobed structure, composed of two related domains from the periplasmic-binding protein–like II domain family, have undergone divergent evolution, leading to adaptation of their structural dynamics. We performed a structural analysis of ∼600 bilobed proteins with a common primordial structural core, which we complemented with biophysical studies to explore the structural dynamics of selected examples by single-molecule Förster resonance energy transfer and Hydrogen–Deuterium exchange mass spectrometry. We show that evolutionary modifications of the structural core, largely at its termini, enable distinct structural dynamics, allowing the diversification of these proteins into transcription factors, enzymes, and extracytoplasmic transport-related proteins. Structural embellishments of the core created interdomain interactions that stabilized structural states, reshaped the active-site geometry, and ultimately altered substrate specificity. Our findings reveal an as-yet-unrecognized mechanism for the emergence of functional promiscuity during long periods of evolution and are applicable to a large number of domain architectures.
24
PDB-wide identification of physiological hetero-oligomeric assemblies based on conserved quaternary structure geometry. Structure 2021; 29:1303-1311.e3. [PMID: 34520740 PMCID: PMC8575123 DOI: 10.1016/j.str.2021.07.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/15/2020] [Revised: 03/22/2021] [Accepted: 07/23/2021] [Indexed: 11/21/2022]
Abstract
An accurate understanding of biomolecular mechanisms and diseases requires information on protein quaternary structure (QS). A critical challenge in inferring QS information from crystallography data is distinguishing biological interfaces from fortuitous crystal-packing contacts. Here, we employ QS conservation across homologs to infer the biological relevance of hetero-oligomers. Comparing the structures and compositions of hetero-oligomers allows us to annotate 7,810 complexes as physiologically relevant, 1,060 as likely errors, and 1,432 with comparative information on subunit stoichiometry and composition. Excluding immunoglobulins, these annotations encompass over 51% of hetero-oligomers in the PDB. We curate a dataset of 577 hetero-oligomeric complexes to benchmark these annotations, which reveals an accuracy of >94%. When homology information is not available, we compare QS across repositories (PDB, PISA, and EPPIC) to derive confidence estimates. This work provides high-quality annotations along with a large benchmark dataset of hetero-assemblies.
25
|
Alvarez-Carreño C, Penev PI, Petrov AS, Williams LD. Fold Evolution before LUCA: Common Ancestry of SH3 Domains and OB Domains. Mol Biol Evol 2021; 38:5134-5143. [PMID: 34383917 PMCID: PMC8557408 DOI: 10.1093/molbev/msab240] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022]
Abstract
SH3 and OB are the simplest, oldest, and most common protein domains within the translation system. SH3 and OB domains are β-barrels that are structurally similar but topologically distinct. To transform an OB domain into an SH3 domain, β-strands must be permuted in a multistep and evolutionarily implausible mechanism. Here, we explored relationships between the SH3 and OB domains of ribosomal proteins and of initiation and elongation factors using a combined sequence- and structure-based approach. We detect a common core of SH3 and OB domains as a region of significant structure and sequence similarity. The common core contains four β-strands and a loop but omits the fifth β-strand, which is variable and absent from some OB and SH3 domain proteins. The structure of the common core immediately suggests a simple permutation mechanism for interconversion between SH3 and OB domains, which appear to share an ancestor. The OB domain was formed by duplication and adaptation of the SH3 domain core, or vice versa, in a simple and probable transformation. By employing the folding algorithm AlphaFold2, we demonstrated that an ancestral reconstruction of a permuted SH3 sequence folds into an OB structure, and an ancestral reconstruction of a permuted OB sequence folds into an SH3 structure. The tandem SH3 and OB domains in the universal ribosomal protein uL2 share a common ancestor, suggesting that the divergence of these two domains occurred before the last universal common ancestor.
Affiliation(s)
- Claudia Alvarez-Carreño
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
- Petar I Penev
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Anton S Petrov
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
- Loren Dean Williams
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
26
|
Youkharibache P. Topological and Structural Plasticity of the Single Ig Fold and the Double Ig Fold Present in CD19. Biomolecules 2021; 11:biom11091290. [PMID: 34572502 PMCID: PMC8470474 DOI: 10.3390/biom11091290] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/17/2021] [Revised: 08/18/2021] [Accepted: 08/25/2021] [Indexed: 12/12/2022]
Abstract
The Ig fold has had remarkable success in vertebrate evolution, being present in over 2% of human genes. The Ig fold is not just the elementary structural domain of antibodies and TCRs; it is also at the heart of a staggering 30% of immunologic cell surface receptors, making it a major orchestrator of cell–cell interactions. While BCRs, TCRs, and numerous Ig-based cell surface receptors form homo- or heterodimers on the same cell surface (in cis), many of them interface as ligand-receptors (checkpoints) on interacting cells (in trans) through their Ig domains. New Ig-Ig interfaces are still being discovered between Ig-based cell surface receptors, even in well-known families such as B7. What is largely ignored, however, is that the Ig fold itself is pseudosymmetric, a property that makes the Ig domain a versatile self-associative 3D structure and may, in part, explain its evolutionary success, especially through its ability to bind in cis or in trans in the context of cell surface receptor–ligand interactions. In this paper, we review the tertiary and quaternary pseudosymmetries of Ig domains, with particular attention to the newly identified double Ig fold in the solved CD19 molecular structure, to highlight the underlying fundamental folding elements of Ig domains, i.e., Ig protodomains. This pseudosymmetric property of Ig domains gives us a decoding frame of reference to understand the fold, relate all Ig domain forms, single or double, and suggest new protein engineering avenues.
27
|
Gruic-Sovulj I, Longo LM, Jabłońska J, Tawfik DS. The evolutionary history of the HUP domain. Crit Rev Biochem Mol Biol 2021; 57:1-15. [PMID: 34384295 DOI: 10.1080/10409238.2021.1957764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Among the enzyme lineages that undoubtedly emerged prior to the last universal common ancestor is the so-called HUP, which includes Class I aminoacyl-tRNA synthetases (AARSs) as well as enzymes mediating NAD, FAD, and CoA biosynthesis. Here, we provide a detailed analysis of HUP evolution, from emergence to structural and functional diversification. The HUP is a nucleotide-binding domain that uniquely catalyzes adenylation via the release of pyrophosphate. In contrast to other ancient nucleotide-binding domains with the αβα sandwich architecture, such as P-loop NTPases, the HUP's most conserved feature is not phosphate binding but rather ribose binding via backbone interactions at the tips of β1 and/or β4. Indeed, the HUP exhibits unusual evolutionary plasticity: while ribose binding is conserved, the location and mode of binding to the base and phosphate moieties of the nucleotide, and to the substrate(s) reacting with it, have diverged over time, most notably with the emergence of the AARSs. The HUP also beautifully demonstrates how a well-packed scaffold combined with evolvable surface elements promotes evolutionary innovation. Finally, we offer a scenario for the emergence of the HUP from a seed βαβ fragment, and suggest that despite an identical architecture, the HUP and the Rossmann represent independent emergences.
Affiliation(s)
- Ita Gruic-Sovulj
- Department of Chemistry, Faculty of Science, University of Zagreb, Zagreb, Croatia
- Liam M Longo
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Jagoda Jabłońska
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Dan S Tawfik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
28
|
The expanding world of protein kinase-like families in bacteria: forty families and counting. Biochem Soc Trans 2021; 48:1337-1352. [PMID: 32677675 DOI: 10.1042/bst20190712] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/15/2020] [Revised: 06/24/2020] [Accepted: 06/29/2020] [Indexed: 12/14/2022]
Abstract
The protein kinase-like clan/superfamily is a large group of regulatory, signaling and biosynthetic enzymes that were historically regarded as typically eukaryotic proteins, although bacterial members have also been known for a long time. In this review, we explore the diversity of bacterial protein kinase-like families and discuss the functional versatility of these enzymes, both those acting within the bacterial cell and those acting within eukaryotic cells as effectors during infection. We focus on novel bacterial kinase-like families discovered in the last five years. We take a bioinformatics perspective, presenting an overview of sequence and structure comparisons as well as a comparison of the families' genomic neighbourhoods. We perform a phylum-level census of the families. We also discuss apparent pseudokinases that turned out to perform alternative catalytic functions by repurposing their atypical kinase-like active sites. Finally, we highlight some 'unpopular' kinase-like families that await characterisation.
29
|
Konagurthu AS, Subramanian R, Allison L, Abramson D, Stuckey PJ, Garcia de la Banda M, Lesk AM. Universal Architectural Concepts Underlying Protein Folding Patterns. Front Mol Biosci 2021; 7:612920. [PMID: 33996891 PMCID: PMC8120156 DOI: 10.3389/fmolb.2020.612920] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/01/2020] [Accepted: 12/16/2020] [Indexed: 11/17/2022]
Abstract
What is the architectural “basis set” of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures, called concepts, typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, PROSODIC, at http://lcb.infotech.monash.edu.au/prosodic, provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.
Affiliation(s)
- Arun S Konagurthu
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Ramanan Subramanian
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Lloyd Allison
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- David Abramson
- Research Computing Center, University of Queensland, Brisbane, QLD, Australia
- Peter J Stuckey
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia
- Maria Garcia de la Banda
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Arthur M Lesk
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, United States
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
30
|
Skolnick J, Gao M. The role of local versus nonlocal physicochemical restraints in determining protein native structure. Curr Opin Struct Biol 2020; 68:1-8. [PMID: 33129066 DOI: 10.1016/j.sbi.2020.10.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/17/2020] [Revised: 10/03/2020] [Accepted: 10/05/2020] [Indexed: 12/15/2022]
Abstract
The tertiary structure of a native protein is dictated by the interplay of local secondary structure propensities, hydrogen bonding, and tertiary interactions. It is argued that the space of known protein topologies covers all single-domain folds and results from the compactness of the native structure and excluded volume. Protein compactness combined with the chirality of the protein's side chains also yields native-like Ramachandran plots. It is the many-body, tertiary interactions among residues that collectively select the global structure that a particular protein sequence adopts. This explains why recent deep-learning approaches that predict protein side-chain contacts, the distance matrix between residues, and sequence alignments are successful: they implicitly learn the many-body interactions among protein residues.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States.
- Mu Gao
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, GA 30332, United States.
31
|
Sucher J, Mbengue M, Dresen A, Barascud M, Didelon M, Barbacci A, Raffaele S. Phylotranscriptomics of the Pentapetalae Reveals Frequent Regulatory Variation in Plant Local Responses to the Fungal Pathogen Sclerotinia sclerotiorum. Plant Cell 2020; 32:1820-1844. [PMID: 32265317 PMCID: PMC7268813 DOI: 10.1105/tpc.19.00806] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/24/2019] [Revised: 03/16/2020] [Accepted: 03/30/2020] [Indexed: 05/13/2023]
Abstract
Quantitative disease resistance (QDR) is a conserved form of plant immunity that limits infections caused by a broad range of pathogens. QDR has a complex genetic determinism. The extent to which molecular components of the QDR response vary across plant species remains elusive. The fungal pathogen Sclerotinia sclerotiorum, causal agent of white mold diseases on hundreds of plant species, triggers QDR in host populations. To document the diversity of local responses to S. sclerotiorum at the molecular level, we analyzed the complete transcriptomes of six species spanning the Pentapetalae (Phaseolus vulgaris, Ricinus communis, Arabidopsis [Arabidopsis thaliana], Helianthus annuus, Solanum lycopersicum, and Beta vulgaris) inoculated with the same strain of S. sclerotiorum. About one-third of plant transcriptomes responded locally to S. sclerotiorum, including a high proportion of broadly conserved genes showing frequent regulatory divergence at the interspecific level. Evolutionary inferences suggested a trend toward the relatively recent acquisition of gene induction in several lineages. Focusing on a group of ABCG transporters, we propose that exaptation by regulatory divergence contributed to the evolution of QDR. This evolutionary scenario has implications for understanding the QDR spectrum and durability. Our work provides resources for functional studies of gene regulation and QDR molecular mechanisms across the Pentapetalae.
Affiliation(s)
- Justine Sucher
- Laboratoire des Interactions Plantes-Microorganismes (LIPM), Institut National de Recherche pour l'Agriculture, l'alimentation et l'Environnement (INRAE) - Centre National de la Recherche Scientifique (CNRS), F31326 Castanet Tolosan, France
- Malick Mbengue
- Laboratoire des Interactions Plantes-Microorganismes (LIPM), Institut National de Recherche pour l'Agriculture, l'alimentation et l'Environnement (INRAE) - Centre National de la Recherche Scientifique (CNRS), F31326 Castanet Tolosan, France
- Axel Dresen
- Laboratoire des Interactions Plantes-Microorganismes (LIPM), Institut National de Recherche pour l'Agriculture, l'alimentation et l'Environnement (INRAE) - Centre National de la Recherche Scientifique (CNRS), F31326 Castanet Tolosan, France
- Marielle Barascud
- Laboratoire des Interactions Plantes-Microorganismes (LIPM), Institut National de Recherche pour l'Agriculture, l'alimentation et l'Environnement (INRAE) - Centre National de la Recherche Scientifique (CNRS), F31326 Castanet Tolosan, France
- Marie Didelon
- Laboratoire des Interactions Plantes-Microorganismes (LIPM), Institut National de Recherche pour l'Agriculture, l'alimentation et l'Environnement (INRAE) - Centre National de la Recherche Scientifique (CNRS), F31326 Castanet Tolosan, France
- Adelin Barbacci
- Laboratoire des Interactions Plantes-Microorganismes (LIPM), Institut National de Recherche pour l'Agriculture, l'alimentation et l'Environnement (INRAE) - Centre National de la Recherche Scientifique (CNRS), F31326 Castanet Tolosan, France
- Sylvain Raffaele
- Laboratoire des Interactions Plantes-Microorganismes (LIPM), Institut National de Recherche pour l'Agriculture, l'alimentation et l'Environnement (INRAE) - Centre National de la Recherche Scientifique (CNRS), F31326 Castanet Tolosan, France
32
|
El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer ELL, Hirsh L, Paladin L, Piovesan D, Tosatto SCE, Finn RD. The Pfam protein families database in 2019. Nucleic Acids Res 2020; 47:D427-D432. [PMID: 30357350 PMCID: PMC6324024 DOI: 10.1093/nar/gky995] [Citation(s) in RCA: 2904] [Impact Index Per Article: 726.0] [Received: 09/28/2018] [Accepted: 10/09/2018] [Indexed: 12/11/2022]
Abstract
The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially, to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, and their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out an extensive comparison with the structural classification database ECOD (Evolutionary Classification of Protein Domains), which led to the creation of 825 new families based on ECOD's set of uncharacterized families (EUFs). Furthermore, we connected Pfam entries to the Sequence Ontology (SO) by mapping the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking of authorship of all Pfam entries to the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link it to their ORCID record.
Affiliation(s)
- Sara El-Gebali
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sean R Eddy
- HHMI, Harvard University, 16 Divinity Ave, Cambridge, MA 02138, USA
- Aurélien Luciani
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Simon C Potter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matloob Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Lorna J Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alfredo Smart
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Erik L L Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 17121 Solna, Sweden
- Layla Hirsh
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
- Dept. of Engineering, Pontificia Universidad Católica del Perú 1801, San Miguel 15088, Lima, Perú
- Lisanna Paladin
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
33
|
Pseudo-Symmetric Assembly of Protodomains as a Common Denominator in the Evolution of Polytopic Helical Membrane Proteins. J Mol Evol 2020; 88:319-344. [PMID: 32189026 PMCID: PMC7162841 DOI: 10.1007/s00239-020-09934-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/26/2019] [Accepted: 02/16/2020] [Indexed: 11/05/2022]
Abstract
The polytopic helical membrane proteome is dominated by proteins containing seven transmembrane helices (7TMHs). They cannot be grouped under a monolithic fold or superfold. However, a parallel structural analysis of folds around that magic number of seven in distinct protein superfamilies (SWEET, PnuC, TRIC, FocA, Aquaporin, GPCRs) reveals a common homology, not in their structural fold, but in their systematic pseudo-symmetric construction during their evolution. Our analysis leads to guiding principles of intragenic duplication and pseudo-symmetric assembly of ancestral transmembrane helical protodomains, consisting of 3 (or 4) helices. A parallel deconstruction and reconstruction of these domains provides a structural and mechanistic framework for their evolutionary paths. It highlights the conformational plasticity inherent to fold formation itself, the role of structural as well as functional constraints in shaping that fold, and the usefulness of protodomains as a tool to probe convergent vs divergent evolution. In the case of FocA vs. Aquaporin, this protodomain analysis sheds new light on their potential divergent evolution at the protodomain level followed by duplication and parallel evolution of the two folds. GPCR domains, whose function does not seem to require symmetry, nevertheless exhibit structural pseudo-symmetry. Their construction follows the same protodomain assembly as any other pseudo-symmetric protein suggesting their potential evolutionary origins. Interestingly, all the 6/7/8TMH pseudo-symmetric folds in this study also assemble as oligomeric forms in the membrane, emphasizing the role of symmetry in evolution, revealing self-assembly and co-evolution not only at the protodomain level but also at the domain level.
34
|
Rosas‐Lemus M, Minasov G, Shuvalova L, Wawrzak Z, Kiryukhina O, Mih N, Jaroszewski L, Palsson B, Godzik A, Satchell KJF. Structure of galactarate dehydratase, a new fold in an enolase involved in bacterial fitness after antibiotic treatment. Protein Sci 2020; 29:711-722. [PMID: 31811683 PMCID: PMC7021002 DOI: 10.1002/pro.3796] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2019] [Revised: 11/24/2019] [Accepted: 12/04/2019] [Indexed: 11/06/2022]
Abstract
Galactarate dehydratase (GarD) is the first enzyme in the galactarate/glucarate pathway and catalyzes the dehydration of galactarate to 3-keto-5-dehydroxygalactarate. This protein is known to increase the colonization fitness of intestinal pathogens in antibiotic-treated mice and to promote bacterial survival during stress. The galactarate/glucarate pathway is widespread in bacteria but absent in humans, and thus could be a target for developing new inhibitors for use in combination therapy to combat antibiotic resistance. The structures of almost all the enzymes of the galactarate/glucarate pathway had been solved previously, except for GarD, for which only the structure of the N-terminal domain had been determined. Herein, we report the first crystal structure of full-length GarD, solved using a selenomethionine derivative and revealing a new protein fold. In the crystal structure, GarD forms dimers, and each monomer consists of three domains, each presenting a novel twist compared with its distant homologs. The N-terminal domain comprises a β-clip fold, connected to the second domain by a long unstructured linker. The second domain serves as the dimerization interface between the two monomers. The C-terminal domain forms an unusual variant of a Rossmann fold with a crossover and is built around a seven-stranded parallel β-sheet supported by nine α-helices. A metal-binding site in the C-terminal domain is occupied by Ca2+. The activity of GarD was corroborated by the production of 5-keto-4-deoxy-D-glucarate under reducing conditions and in the presence of iron. Thus, GarD is an unusual enolase with a novel protein fold never previously seen in this class of enzymes.
Affiliation(s)
- Monica Rosas‐Lemus
- Department of Microbiology‐Immunology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Center for Structural Genomics of Infectious Diseases, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- George Minasov
- Department of Microbiology‐Immunology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Center for Structural Genomics of Infectious Diseases, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Ludmilla Shuvalova
- Department of Microbiology‐Immunology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Center for Structural Genomics of Infectious Diseases, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Zdzislaw Wawrzak
- Northwestern Synchrotron Research Center–LS‐CAT, Northwestern University, Argonne, Illinois
- Olga Kiryukhina
- Department of Microbiology‐Immunology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Center for Structural Genomics of Infectious Diseases, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Nathan Mih
- Department of Bioengineering, University of California San Diego, La Jolla, California
- Lukasz Jaroszewski
- Center for Structural Genomics of Infectious Diseases, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Department of Biomedical Sciences, University of California at Riverside, Riverside, California
- Bernhard Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, California
- Systems Biology Center for Antibiotic Resistance, University of California San Diego, La Jolla, California
- Adam Godzik
- Center for Structural Genomics of Infectious Diseases, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Department of Biomedical Sciences, University of California at Riverside, Riverside, California
- Karla J. F. Satchell
- Department of Microbiology‐Immunology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
- Center for Structural Genomics of Infectious Diseases, Northwestern University, Feinberg School of Medicine, Chicago, Illinois
35
|
Medvedev KE, Kinch LN, Schaeffer RD, Grishin NV. Functional analysis of Rossmann-like domains reveals convergent evolution of topology and reaction pathways. PLoS Comput Biol 2019; 15:e1007569. [PMID: 31869345 PMCID: PMC6957218 DOI: 10.1371/journal.pcbi.1007569] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/19/2019] [Revised: 01/13/2020] [Accepted: 11/26/2019] [Indexed: 12/18/2022]
Abstract
Rossmann folds are ancient, frequently diverged domains found in many biological reaction pathways, where they have adapted to different functions. Consequently, discerning and classifying their homologous relationships and functions can be complicated. We define a minimal Rossmann-like structure motif (RLM) that corresponds to the common core of known Rossmann domains and use this motif to identify all RLM domains in the Protein Data Bank (PDB), finding that they constitute about 20% of all known 3D structures. The Evolutionary Classification of protein structure Domains (ECOD) classifies RLM domains into a number of groups that lack evidence for homology (X-groups), which suggests that they could have evolved independently multiple times. Closely related, homologous RLM enzyme families can diverge to bind different ligands using similar binding sites and to catalyze different reactions. Conversely, non-homologous RLM domains can converge to catalyze the same reactions or to bind the same ligand with alternate binding modes. We discuss a special case of such convergent evolution that is relevant to the polypharmacology paradigm, wherein the same drug (methotrexate) binds to multiple non-homologous RLM drug targets with different topologies. Finally, assigning proteins with RLM domains to the Enzyme Commission classification suggests that RLM enzymes function mainly in metabolism (comprising 38% of reference metabolic pathways) and are overrepresented in extant pathways that represent ancient biosynthetic routes, such as nucleotide metabolism, energy metabolism, and amino acid metabolism. In fact, RLM enzymes take part in five of the eight enzymatic reactions of the Wood-Ljungdahl metabolic pathway, which is thought to have been used by the last universal common ancestor (LUCA). The prevalence of RLM domains in this ancient metabolism might explain their wide distribution among enzymes.
Affiliation(s)
- Kirill E. Medvedev
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lisa N. Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- R. Dustin Schaeffer
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Nick V. Grishin
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
36
|
Guo J, Quensen JF, Sun Y, Wang Q, Brown CT, Cole JR, Tiedje JM. Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes. Front Genet 2019; 10:957. [PMID: 31749830 PMCID: PMC6843070 DOI: 10.3389/fgene.2019.00957] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/23/2019] [Accepted: 09/09/2019] [Indexed: 12/28/2022]
Abstract
Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, computationally daunting tasks, especially for big data from complex environments such as soil and sediments. In many studies, however, only a subset of genes and pathways involved in specific functions is of interest; thus, it is not necessary to attempt global assembly. In addition, methods that target genes can be computationally more efficient and produce more accurate assemblies by leveraging rich databases, especially for genes of broad interest such as those involved in biogeochemical cycles, biodegradation, and antibiotic resistance or used as phylogenetic markers. Here, we review six gene-targeted assemblers with unique algorithms for extracting and/or assembling targeted genes: Xander, MegaGTA, SAT-Assembler, HMM-GRASPx, GenSeed-HMM, and MEGAN. We tested these tools using two datasets with known genomes (a synthetic community of artificial reads derived from the genomes of 17 bacteria, and shotgun sequence data from a mock community with 48 bacterial and 16 archaeal genomes) and a large soil shotgun metagenomic dataset. We compared assemblies of a universal single-copy gene (rplB) and two N cycle genes (nifH and nirK). We measured their computational efficiency, sensitivity, specificity, and chimera rate and found that Xander and MegaGTA, which both use a probabilistic graph structure to model the genes, have the best overall performance with all three datasets, although MEGAN, a reference-matching assembler, had better sensitivity with synthetic and mock community members chosen from its reference collection. Also, Xander and MegaGTA are the only tools that include post-assembly scripts tuned for common molecular ecology and diversity analyses. Additionally, we provide a mathematical model for estimating the probability of assembling targeted genes in a metagenome, which can be used to estimate the required sequencing depth.
Affiliation(s)
- Jiarong Guo
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- John F. Quensen
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- Yanni Sun
- Department of Electronical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Qiong Wang
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- C. Titus Brown
- Department of Population Health and Reproduction, University of California, Davis, Davis, CA, United States
- James R. Cole
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
- James M. Tiedje
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, United States
37
Kandathil SM, Greener JG, Jones DT. Recent developments in deep learning applied to protein structure prediction. Proteins 2019; 87:1179-1189. [PMID: 31589782 PMCID: PMC6899861 DOI: 10.1002/prot.25824] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 05/15/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/29/2022]
Abstract
Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end with some thoughts about how and why these types of models can be so effective, as well as a discussion of potential pitfalls.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, London, UK; Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
- Joe G Greener
- Department of Computer Science, University College London, London, UK; Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
- David T Jones
- Department of Computer Science, University College London, London, UK; Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
38
Liao Y, Schaeffer RD, Pei J, Grishin NV. A sequence family database built on ECOD structural domains. Bioinformatics 2019; 34:2997-3003. [PMID: 29659718 DOI: 10.1093/bioinformatics/bty214] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/20/2018] [Accepted: 04/03/2018] [Indexed: 11/12/2022]
Abstract
Motivation: The ECOD database classifies protein domains based on their evolutionary relationships, considering both remote and close homology. The family group in ECOD provides classification of domains that are closely related to each other based on sequence similarity. Because of different perspectives on domain definition, direct application of existing sequence domain databases, such as Pfam, to ECOD suffers from several shortcomings.
Results: We created multiple sequence alignments and profiles from ECOD domains, using structural information in alignment building and boundary delineation. We validated alignment quality by scoring structure superposition, demonstrating that the alignments are comparable to curated seed alignments in Pfam. Comparison to Pfam and CDD reveals that 27% and 16% of ECOD families, respectively, are new, but these new families are dominated by small families, likely because of the sampling bias of the PDB database. The boundaries of 35% and 48% of families are modified compared to their counterparts in Pfam and CDD, respectively.
Availability and implementation: The new families are now integrated in the ECOD website. The aggregate HMMER profile library and alignment are available for download on the ECOD website (http://prodata.swmed.edu/ecod).
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuxing Liao
- Department of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- R Dustin Schaeffer
- Department of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA; Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jimin Pei
- Department of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA; Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Nick V Grishin
- Department of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, USA; Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
39
Greener JG, Kandathil SM, Jones DT. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints. Nat Commun 2019; 10:3977. [PMID: 31484923 PMCID: PMC6726615 DOI: 10.1038/s41467-019-11994-0] [Citation(s) in RCA: 115] [Impact Index Per Article: 23.0] [Received: 04/01/2019] [Accepted: 08/14/2019] [Indexed: 01/30/2023]
Abstract
The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise in allowing accurate residue-residue contact prediction even for shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, which it uses to build models in an iterative fashion. DMPfold produces more accurate models than two popular methods for a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, confident models for 25% of these so-called dark families were produced in under a week on a small 200-core cluster. DMPfold provides models for 16% of human proteome UniProt entries without structures, generates accurate models with fewer than 100 sequences in some cases, and is freely available.
Affiliation(s)
- Joe G Greener
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Shaun M Kandathil
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- David T Jones
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
40
Identification of functional signatures in the metabolism of the three cellular domains of life. PLoS One 2019; 14:e0217083. [PMID: 31136618 PMCID: PMC6538242 DOI: 10.1371/journal.pone.0217083] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/14/2019] [Accepted: 05/04/2019] [Indexed: 11/19/2022]
Abstract
In order to identify common and specific enzymatic activities associated with the metabolism of the three cellular domains of life, the conservation and variations between the enzyme contents of Bacteria, Archaea, and Eukarya organisms were evaluated. To this end, the content of enzymes belonging to a particular pathway and their abundance and distribution in 1507 organisms that have been annotated and deposited in the KEGG database were assessed. In addition, we evaluated the consecutive enzymatic reaction pairs obtained from metabolic pathway reactions and transformed into sequences of enzymatic reactions, with catalytic activities encoded in the Enzyme Commission numbers, which are linked by a substrate. Both analyses are complementary: the first considers individual reactions associated with each organism and metabolic map, and the second evaluates the functional associations between pairs of consecutive reactions. From these comparisons, we found a set of five enzymatic reactions that were widely distributed in all the organisms and considered here as universal to Bacteria, Archaea, and Eukarya; whereas 132 pairs out of 3151 reactions were identified as significant, only 5 of them were found to be widely distributed in all the taxonomic divisions. However, these universal reactions are not widely distributed along the metabolic maps, suggesting their dispensability to all metabolic processes. Finally, we found that universal reactions are also associated with ancestral domains, such as those related to phosphorus-containing groups with a phosphate group as acceptor or those related to the ribulose-phosphate binding barrel, triosephosphate isomerase, and D-ribose-5-phosphate isomerase (RpiA) lid domain, among others. Therefore, we consider that this analysis provides clues about the functional constraints associated with the repertoire of enzymatic functions per organism.
41
A global map of the protein shape universe. PLoS Comput Biol 2019; 15:e1006969. [PMID: 30978181 PMCID: PMC6481876 DOI: 10.1371/journal.pcbi.1006969] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/15/2018] [Revised: 04/24/2019] [Accepted: 03/20/2019] [Indexed: 11/19/2022]
Abstract
Proteins are involved in almost all functions in a living cell, and functions of proteins are realized by their tertiary structures. Obtaining a global perspective of the variety and distribution of protein structures lays a foundation for our understanding of the building principles of protein structures. In light of the rapid accumulation of low-resolution structure data from electron tomography and cryo-electron microscopy, here we map and classify three-dimensional (3D) surface shapes of proteins into a similarity space. Surface shapes of proteins were represented with 3D Zernike descriptors, mathematical moment-based invariants, which have previously been demonstrated effective for biomolecular structure similarity search. In addition to single chains of proteins, we have also analyzed the shape space occupied by protein complexes. From the mapping, we have obtained various new insights into the relationship between shapes, main-chain folds, and complex formation. The unique view obtained from shape mapping opens up new ways to understand design principles, functions, and evolution of proteins.
Proteins are the major molecules involved in almost all cellular processes. In this work, we present a novel mapping of protein shapes that represents the variety and the similarities of 3D shapes of proteins and their assemblies. This mapping provides various novel insights into protein shapes, including determinant factors of protein 3D shapes, which enhance our understanding of the design principles of protein shapes. The mapping will also be a valuable resource for artificial protein design as well as a reference for classifying medium- to low-resolution protein structure images determined by cryo-electron microscopy and tomography.
42
Rigden DJ, Fernández X. The 26th annual Nucleic Acids Research database issue and Molecular Biology Database Collection. Nucleic Acids Res 2019; 47:D1-D7. [PMID: 30626175 PMCID: PMC6323895 DOI: 10.1093/nar/gky1267] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 01/01/2023]
Abstract
The 2019 Nucleic Acids Research (NAR) Database Issue contains 168 papers spanning molecular biology. Among them, 64 are new and another 92 are updates describing resources that appeared in the Issue previously. The remaining 12 are updates on databases most recently published elsewhere. This Issue contains two Breakthrough articles, on the Virtual Metabolic Human (VMH) database which links human and gut microbiota metabolism with diet and disease, and Vibrism DB, a database of mouse brain anatomy and gene (co-)expression with sophisticated visualization and session sharing. Major returning nucleic acid databases include RNAcentral, miRBase and LncRNA2Target. Protein sequence databases include UniProtKB, InterPro and Pfam, while wwPDB and RCSB cover protein structure. STRING and KEGG update in the section on metabolism and pathways. Microbial genomes are covered by IMG/M and resources for human and model organism genomics include Ensembl, UCSC Genome Browser, GENCODE and Flybase. Genomic variation and disease are well-covered by GWAS Catalog, PopHumanScan, OMIM and COSMIC, CADD being another major newcomer. Major new proteomics resources reporting here include iProX and jPOSTdb. The entire database issue is freely available online on the NAR website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 506 entries, adding 66 new resources and eliminating 147 discontinued URLs, bringing the current total to 1613 databases. It is available at http://www.oxfordjournals.org/nar/database/c.
Affiliation(s)
- Daniel J Rigden
- Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
43
MacCarthy E, Perry D, Kc DB. Advances in Protein Super-Secondary Structure Prediction and Application to Protein Structure Prediction. Methods Mol Biol 2019; 1958:15-45. [PMID: 30945212 DOI: 10.1007/978-1-4939-9161-7_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/09/2023]
Abstract
Due to advances in various sequencing technologies, the gap between the number of protein sequences and the number of experimental protein structures is ever increasing. Community-wide initiatives like CASP have resulted in considerable efforts in the development of computational methods to accurately model protein structures from sequences. Sequence-based prediction of super-secondary structure has direct application in protein structure prediction, and there have been significant efforts in the prediction of super-secondary structure in the last decade. In this chapter, we first introduce the protein structure prediction problem and highlight some of the important progress in the field of protein structure prediction. Next, we discuss recent methods for the prediction of super-secondary structures. We then discuss applications of super-secondary structure prediction in structure prediction/analysis of proteins. We also discuss prediction of protein structures that are composed of simple super-secondary structure repeats and protein structures that are composed of complex super-secondary structure repeats. Finally, we discuss recent trends in the field.
Affiliation(s)
- Elijah MacCarthy
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Derrick Perry
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Dukka B Kc
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
44
White JT, Li J, Grasso E, Wrabl JO, Hilser VJ. Ensemble allosteric model: energetic frustration within the intrinsically disordered glucocorticoid receptor. Philos Trans R Soc Lond B Biol Sci 2018; 373:20170175. [PMID: 29735729 PMCID: PMC5941170 DOI: 10.1098/rstb.2017.0175] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Accepted: 03/22/2018] [Indexed: 01/21/2023]
Abstract
Allostery is an important regulatory phenomenon enabling precise control of biological function. Initial understanding of allostery was gained from seminal work on conformational changes exhibited by structured proteins. Within the last decade, protein allostery has also been demonstrated to occur within intrinsically disordered proteins. This emerging concept of disorder-mediated allostery can be usefully understood in the context of a thermodynamic ensemble. The advantage of this ensemble allosteric model is that it unifies the explanations of allostery occurring within both structured and disordered proteins. One central finding from this model is that energetic coupling, the transmission of a signal between separate regions (or domains) of a protein, is maximized when one or more domains are disordered. This is due to a disorder-order transition that contributes additional coupling energy to the allosteric system through formation of a molecular interaction surface or interface. A second key finding is that multiple interfaces may constructively or destructively interfere with each other, resulting in a new form of allosteric regulation called 'energetic frustration'. Articulating protein allostery in terms of the thermodynamic ensemble permits formulation of experimentally testable hypotheses which can increase fundamental understanding and direct drug-design efforts. These ideas are illustrated here with the specific case of human glucocorticoid receptor, a medically important multi-domain allosteric protein that contains both structured and disordered regions and exemplifies 'energetic frustration'. This article is part of a discussion meeting issue 'Allostery and molecular machines'.
Affiliation(s)
- Jordan T White
- Department of Biology, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Jing Li
- Department of Biology, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Thomas C. Jenkins Department of Biophysics, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Emily Grasso
- Department of Biology, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Thomas C. Jenkins Department of Biophysics, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- James O Wrabl
- Department of Biology, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Vincent J Hilser
- Department of Biology, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
- Thomas C. Jenkins Department of Biophysics, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
45
Yang Z, Tsui SKW. Functional Annotation of Proteins Encoded by the Minimal Bacterial Genome Based on Secondary Structure Element Alignment. J Proteome Res 2018; 17:2511-2520. [PMID: 29757649 DOI: 10.1021/acs.jproteome.8b00262] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/28/2022]
Abstract
In synthetic biology, one of the key focuses is building a minimal artificial cell that can provide a basic chassis for functional study. Recently, the J. Craig Venter Institute published the latest version of the minimal bacterial genome, JCVI-syn3.0, which encodes only 438 essential proteins. However, the functions of 149 of these proteins remain unknown because of the lack of an effective annotation method. Here, we report a secondary structure element alignment method called SSEalign, based on an effective training data set extracted from various bacterial genomes. Experimentally validated homologous genes in different species were selected as training positives, while unrelated genes in different species were selected as training negatives. Moreover, SSEalign used a set of well-defined basic alignment elements with the backtracking line search algorithm to derive the best parameters for accurate prediction. Experimental results showed that SSEalign achieved 88.2% test accuracy, which is higher than that of existing prediction methods. SSEalign was subsequently applied to identify the functions of the unannotated proteins in the latest published minimal bacterial genome, JCVI-syn3.0. Results indicated that at least 136 of the 149 unannotated proteins in the JCVI-syn3.0 genome could be annotated by SSEalign. Our method is effective for identifying protein homology in JCVI-syn3.0 and can be used to annotate hypothetical proteins in other bacterial genomes.
Affiliation(s)
- Zhiyuan Yang
- College of Life Information Science & Instrument Engineering, Hangzhou Dianzi University, Hangzhou 310018, China; School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
- Stephen Kwok-Wing Tsui
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; Centre for Microbial Genomics and Proteomics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
46
Liao Q, Li S, Siu SWI, Yang B, Huang C, Chan JYW, Morlighem JÉRL, Wong CTT, Rádis-Baptista G, Lee SMY. Novel Kunitz-like Peptides Discovered in the Zoanthid Palythoa caribaeorum through Transcriptome Sequencing. J Proteome Res 2018; 17:891-902. [PMID: 29285938 DOI: 10.1021/acs.jproteome.7b00686] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022]
Abstract
Palythoa caribaeorum (class Anthozoa) is a zoanthid that, together with jellyfish, hydra, and sea anemones, which are venomous and predatory, belongs to the phylum Cnidaria. The distinguishing feature of these marine animals is the cnidocytes in their body tissues, which are responsible for producing and injecting toxins used mainly for prey capture and defense. In contrast to other anthozoans, the toxin cocktails of zoanthids have been scarcely studied and are poorly known. Here, on the basis of the analysis of the P. caribaeorum transcriptome, numerous predicted venom-featured polypeptides were identified, including allergens, neurotoxins, membrane-active, and Kunitz-like peptides (PcKuz). The three predicted PcKuz isotoxins (1-3) were selected for functional studies. Through computational processing comprising structural phylogenetic analysis, molecular docking, and dynamics simulation, PcKuz3 was shown to be a potential voltage-gated potassium-channel inhibitor. PcKuz3 fits well as a new functional Kunitz-type toxin, with strong antilocomotor activity in zebrafish larvae, as assessed in vivo, and a weak inhibitory effect toward proteases, as evaluated in vitro. Notably, PcKuz3 can suppress, at low concentration, the effect of 6-OHDA-induced neurotoxicity on the locomotor behavior of zebrafish, which indicates that PcKuz3 may have a neuroprotective effect. Taken together, PcKuz3 figures as a novel neurotoxin structure, which differs from known homologous peptides expressed in sea anemones. Moreover, the novel PcKuz3 offers a useful lead for biodrug development toward the prospective treatment of neurodegenerative diseases.
Affiliation(s)
- Jean-Étienne R L Morlighem
- Laboratory of Biochemistry and Biotechnology, Institute for Marine Sciences, Federal University of Ceará, Fortaleza 60020-181, Brazil
- Gandhi Rádis-Baptista
- Laboratory of Biochemistry and Biotechnology, Institute for Marine Sciences, Federal University of Ceará, Fortaleza 60020-181, Brazil
47
Zimmermann L, Stephens A, Nam SZ, Rau D, Kübler J, Lozajic M, Gabler F, Söding J, Lupas AN, Alva V. A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core. J Mol Biol 2017; 430:2237-2243. [PMID: 29258817 DOI: 10.1016/j.jmb.2017.12.007] [Citation(s) in RCA: 1554] [Impact Index Per Article: 222.0] [Received: 11/04/2017] [Revised: 12/10/2017] [Accepted: 12/11/2017] [Indexed: 12/12/2022]
Abstract
The MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) is a free, one-stop web service for protein bioinformatic analysis. It currently offers 34 interconnected external and in-house tools, whose functionality covers sequence similarity searching, alignment construction, detection of sequence features, structure prediction, and sequence classification. This breadth has made the Toolkit an important resource for experimental biology and for teaching bioinformatic inquiry. Recently, we replaced the first version of the Toolkit, which was released in 2005 and had served around 2.5 million queries, with an entirely new version, focusing on improved features for the comprehensive analysis of proteins, as well as on promoting teaching. For instance, our popular remote homology detection server, HHpred, now allows pairwise comparison of two sequences or alignments and offers additional profile HMMs for several model organisms and domain databases. Here, we introduce the new version of our Toolkit and its application to the analysis of proteins.
Affiliation(s)
- Lukas Zimmermann
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
- Andrew Stephens
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
- Seung-Zin Nam
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
- David Rau
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
- Jonas Kübler
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
- Marko Lozajic
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
- Felix Gabler
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
- Johannes Söding
- Group for Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen D-37077, Germany
- Andrei N Lupas
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen D-72076, Germany
48
Structure-based prediction of ligand-protein interactions on a genome-wide scale. Proc Natl Acad Sci U S A 2017; 114:13685-13690. [PMID: 29229851 DOI: 10.1073/pnas.1705381114] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 11/18/2022]
Abstract
We report a template-based method, LT-scanner, which scans the human proteome using protein structural alignment to identify proteins that are likely to bind ligands that are present in experimentally determined complexes. A scoring function that rapidly accounts for binding site similarities between the template and the proteins being scanned is a crucial feature of the method. The overall approach is first tested based on its ability to predict the residues on the surface of a protein that are likely to bind small-molecule ligands. The algorithm that we present, LBias, is shown to compare very favorably to existing algorithms for binding site residue prediction. LT-scanner's performance is evaluated based on its ability to identify known targets of Food and Drug Administration (FDA)-approved drugs and it too proves to be highly effective. The specificity of the scoring function that we use is demonstrated by the ability of LT-scanner to identify the known targets of FDA-approved kinase inhibitors based on templates involving other kinases. Combining sequence with structural information further improves LT-scanner performance. The approach we describe is extendable to the more general problem of identifying binding partners of known ligands even if they do not appear in a structurally determined complex, although this will require the integration of methods that combine protein structure and chemical compound databases.
49
Zhou Q, Wang A, Duan R, Yan J, Zhao G, Nevo E, Chen G. Comparative transcriptome profile of the leaf elongation zone of wild barley (Hordeum spontaneum) eibi1 mutant and its isogenic wild type. Genet Mol Biol 2017; 40:834-843. [PMID: 29064514 PMCID: PMC5738607 DOI: 10.1590/1678-4685-gmb-2016-0321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2016] [Accepted: 08/13/2017] [Indexed: 11/21/2022]
Abstract
The naturally occurring wild barley mutant eibi1/hvabcg31 suffers from severe water loss due to its permeable leaf cuticle. Eibi1/HvABCG31 encodes a full ATP-binding cassette (ABC) transporter, HvABCG31, that plays a role in cutin deposition in the elongation zone of growing barley leaves. The eibi1 allele has pleiotropic effects on leaf appearance, plant stature, fertility, spike and grain size, and germination rate. Comparative transcriptome profiling of the leaf elongation zone of the eibi1 mutant and its isogenic wild type showed that various pathogenesis-related genes were up-regulated in the eibi1 mutant. The known cuticle-related genes that we analyzed did not show significant expression differences between the mutant and the wild type. These results suggest that the pleiotropic effects may be a compensatory consequence of the activation of defense genes in the eibi1 mutant. Furthermore, we were able to identify the mutation in the eibi1/hvabcg31 allele by comparing transcript sequences, which indicates that RNA-Seq is useful not only for research on general molecular mechanisms but also for identifying candidate mutant genes.
Affiliation(s)
- Qin Zhou: Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, China; University of Chinese Academy of Sciences, Beijing, China
- Aidong Wang: Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, China
- Ruijun Duan: College of Eco-Environmental Engineering, Qinghai University, Xining, Qinghai, China
- Jun Yan: School of Pharmacy and Bioengineering, Chengdu University, Chengdu, Sichuan, China
- Gang Zhao: School of Pharmacy and Bioengineering, Chengdu University, Chengdu, Sichuan, China
- Eviatar Nevo: Institute of Evolution, University of Haifa, Haifa, Israel
- Guoxiong Chen: Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, China
50
Lupas AN, Alva V. Ribosomal proteins as documents of the transition from unstructured (poly)peptides to folded proteins. J Struct Biol 2017; 198:74-81. [DOI: 10.1016/j.jsb.2017.04.007] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 02/02/2017] [Revised: 04/23/2017] [Accepted: 04/24/2017] [Indexed: 11/16/2022]