1
Genomics-based strategies toward the identification of a Z-ISO carotenoid biosynthetic enzyme suitable for structural studies. Methods Enzymol 2022; 671:171-205. [PMID: 35878977 DOI: 10.1016/bs.mie.2021.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Over the past 20 years, structural genomics efforts have proven enormously successful for the determination of integral membrane protein structures, particularly for those of prokaryotic origin. However, traditional genomic expansion screens have included up to hundreds of targets, necessitating the use of robotics and other automation not available to most laboratories. Moreover, such large-scale screens of eukaryotic targets are not easily performed at such a scale. To have broader appeal, traditional structural genomic approaches need to be modified and improved such that they are feasible for most laboratories and especially so for proteins from eukaryotic organisms. One such refinement, termed "microgenomic expansion," has been recently described. This approach improves the process of target selection by making target screening a two-step process, with a minimal number of targets tested at each step. Microgenomic expansion methods are applied here theoretically to a project that has the objective of acquiring a structure for the plant 15-cis-ζ-carotene isomerase, Z-ISO.
2
Loch JI, Imiolczyk B, Sliwiak J, Wantuch A, Bejger M, Gilski M, Jaskolski M. Crystal structures of the elusive Rhizobium etli L-asparaginase reveal a peculiar active site. Nat Commun 2021; 12:6717. [PMID: 34795296 PMCID: PMC8602277 DOI: 10.1038/s41467-021-27105-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/29/2021] [Accepted: 11/01/2021] [Indexed: 12/04/2022] Open
Abstract
Rhizobium etli, a nitrogen-fixing bacterial symbiont of legume plants, encodes an essential L-asparaginase (ReAV) with no sequence homology to known enzymes with this activity. High-resolution crystal structures of ReAV show indeed a structurally distinct, dimeric enzyme, with some resemblance to glutaminases and β-lactamases. However, ReAV has no glutaminase or lactamase activity, and at pH 9 its allosteric asparaginase activity is relatively high, with Km for L-Asn at 4.2 mM and kcat of 438 s⁻¹. The active site of ReAV, deduced from structural comparisons and confirmed by mutagenesis experiments, contains a highly specific Zn²⁺ binding site without a catalytic role. The extensive active site includes residues with unusual chemical properties. There are two Ser-Lys tandems, all connected through a network of H-bonds to the Zn center, and three tightly bound water molecules near Ser48, which clearly indicate the catalytic nucleophile.
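As a rough illustration of what the kinetic constants quoted above imply, the following sketch computes the specificity constant kcat/Km and a rate under a simple Michaelis-Menten model. The abstract describes the activity as allosteric, so the hyperbolic Michaelis-Menten form is an assumption made only for illustration, and the function names and the enzyme concentration are hypothetical.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return kcat/Km in M^-1 s^-1 (assumes Km is given in mol/L)."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


def michaelis_menten_rate(kcat_per_s, km_molar, substrate_molar, enzyme_molar):
    """Rate in M/s under the simple (non-allosteric) Michaelis-Menten model.

    This hyperbolic form is an illustrative assumption; an allosteric enzyme
    may deviate from it.
    """
    return kcat_per_s * enzyme_molar * substrate_molar / (km_molar + substrate_molar)


if __name__ == "__main__":
    KCAT = 438.0   # s^-1, value quoted in the abstract
    KM = 4.2e-3    # mol/L (4.2 mM), value quoted in the abstract
    # Roughly 1.04e5 M^-1 s^-1 for the quoted constants.
    print(f"kcat/Km = {catalytic_efficiency(KCAT, KM):.3g} M^-1 s^-1")
    # Hypothetical 1 uM enzyme with substrate at a concentration equal to Km.
    print(f"rate at [S] = Km: {michaelis_menten_rate(KCAT, KM, KM, 1e-6):.3g} M/s")
```

With the substrate concentration equal to Km, the model gives half of the maximal rate kcat·[E], which is a quick sanity check on the arithmetic.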
Affiliation(s)
- Joanna I Loch
  - Department of Crystal Chemistry and Crystal Physics, Faculty of Chemistry, Jagiellonian University, Krakow, Poland
- Barbara Imiolczyk
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Joanna Sliwiak
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Anna Wantuch
  - Department of Crystal Chemistry and Crystal Physics, Faculty of Chemistry, Jagiellonian University, Krakow, Poland
- Magdalena Bejger
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Miroslaw Gilski
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
  - Department of Crystallography, Faculty of Chemistry, A. Mickiewicz University, Poznan, Poland
- Mariusz Jaskolski
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
  - Department of Crystallography, Faculty of Chemistry, A. Mickiewicz University, Poznan, Poland
3
Structural genomics and the Protein Data Bank. J Biol Chem 2021; 296:100747. [PMID: 33957120 PMCID: PMC8166929 DOI: 10.1016/j.jbc.2021.100747] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/17/2021] [Revised: 04/16/2021] [Accepted: 04/30/2021] [Indexed: 12/14/2022] Open
Abstract
The field of Structural Genomics arose over the last 3 decades to address a large and rapidly growing divergence between microbial genomic, functional, and structural data. Several international programs took advantage of the vast genomic sequence information and evaluated the feasibility of structure determination for expanded and newly discovered protein families. As a consequence, structural genomics has developed structure-determination pipelines and applied them to a wide range of novel, uncharacterized proteins, often from “microbial dark matter,” and later to proteins from human pathogens. Advances were especially needed in protein production and rapid de novo structure solution. The experimental three-dimensional models were promptly made public, facilitating structure determination of other members of the family and helping to understand their molecular and biochemical functions. Improvements in experimental methods and databases resulted in fast progress in molecular and structural biology. The Protein Data Bank structure repository played a central role in the coordination of structural genomics efforts and the structural biology community as a whole. It facilitated development of standards and validation tools essential for maintaining high quality of deposited structural data.
4
Ortega C, Abreu C, Oppezzo P, Correa A. Overview of High-Throughput Cloning Methods for the Post-genomic Era. Methods Mol Biol 2019; 2025:3-32. [PMID: 31267446 DOI: 10.1007/978-1-4939-9624-7_1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
Abstract
The advent of new DNA sequencing technologies leads to a dramatic increase in the number of available genome sequences and therefore of target genes with potential for functional analysis. The insertion of these sequences into proper expression vectors requires a simple and efficient cloning method. In addition, when expressing a target protein, quite often it is necessary to evaluate different DNA constructs to achieve a soluble and homogeneous expression of the target with satisfactory yields. The development of new molecular methods made possible the cloning of a huge number of DNA sequences in a high-throughput manner, necessary for meeting the increasing demands for soluble protein expression and characterization. In this chapter several molecular methods suitable for high-throughput cloning are reviewed.
Affiliation(s)
- Claudia Ortega
  - Recombinant Protein Unit, Institut Pasteur de Montevideo, Montevideo, Uruguay
  - Research Laboratory on Chronic Lymphocytic Leukemia, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Cecilia Abreu
  - Recombinant Protein Unit, Institut Pasteur de Montevideo, Montevideo, Uruguay
  - Molecular, Cellular and Animal Technology Program, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Pablo Oppezzo
  - Recombinant Protein Unit, Institut Pasteur de Montevideo, Montevideo, Uruguay
  - Research Laboratory on Chronic Lymphocytic Leukemia, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Agustín Correa
  - Recombinant Protein Unit, Institut Pasteur de Montevideo, Montevideo, Uruguay
  - Research Laboratory on Chronic Lymphocytic Leukemia, Institut Pasteur de Montevideo, Montevideo, Uruguay
5
Abstract
The ProFunc web server is a tool for helping identify the function of a given protein whose 3D coordinates have been experimentally determined or homology modeled. It uses a cocktail of both sequence- and structure-based methods to identify matches to other proteins that may, in turn, suggest the query protein's most likely function. The server was originally developed to aid the worldwide structural genomics effort at the start of the millennium. It accepts a file containing the protein's 3D coordinates in PDB format, and, when processing is complete, sends an email containing a link to the password-protected result pages. The results include an at-a-glance summary, as well as separate pages containing more detailed analyses. The server can be found at: http://www.ebi.ac.uk/thornton-srv/databases/profunc .
Affiliation(s)
- Roman A Laskowski
  - European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
6
Sousa FL, Parente DJ, Hessman JA, Chazelle A, Teichmann SA, Swint-Kruse L. Data on publications, structural analyses, and queries used to build and utilize the AlloRep database. Data Brief 2016; 8:948-57. [PMID: 27508249 PMCID: PMC4961497 DOI: 10.1016/j.dib.2016.07.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/03/2016] [Revised: 06/22/2016] [Accepted: 07/04/2016] [Indexed: 01/08/2023] Open
Abstract
The AlloRep database (www.AlloRep.org) (Sousa et al., 2016) [1] compiles extensive sequence, mutagenesis, and structural information for the LacI/GalR family of transcription regulators. Sequence alignments are presented for >3000 proteins in 45 paralog subfamilies and as a subsampled alignment of the whole family. Phenotypic and biochemical data on almost 6000 mutants have been compiled from an exhaustive search of the literature; citations for these data are included herein. These data include information about oligomerization state, stability, DNA binding and allosteric regulation. Protein structural data for 65 proteins are presented as easily-accessible, residue-contact networks. Finally, this article includes example queries to enable the use of the AlloRep database. See the related article, “AlloRep: a repository of sequence, structural and mutagenesis data for the LacI/GalR transcription regulators” (Sousa et al., 2016) [1].
Affiliation(s)
- Filipa L Sousa
  - Institute of Molecular Evolution, Heinrich-Heine Universität Düsseldorf, Universitätstrasse 1, 40225 Düsseldorf, Germany
- Daniel J Parente
  - The Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160, USA
- Jacob A Hessman
  - The Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160, USA
- Allen Chazelle
  - The Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160, USA
- Sarah A Teichmann
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Liskin Swint-Kruse
  - The Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160, USA
7
AlloRep: A Repository of Sequence, Structural and Mutagenesis Data for the LacI/GalR Transcription Regulators. J Mol Biol 2015; 428:671-678. [PMID: 26410588 DOI: 10.1016/j.jmb.2015.09.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/30/2015] [Revised: 09/04/2015] [Accepted: 09/17/2015] [Indexed: 11/20/2022]
Abstract
Protein families evolve functional variation by accumulating point mutations at functionally important amino acid positions. Homologs in the LacI/GalR family of transcription regulators have evolved to bind diverse DNA sequences and allosteric regulatory molecules. In addition to playing key roles in bacterial metabolism, these proteins have been widely used as a model family for benchmarking structural and functional prediction algorithms. We have collected manually curated sequence alignments for >3000 sequences, in vivo phenotypic and biochemical data for >5750 LacI/GalR mutational variants, and noncovalent residue contact networks for 65 LacI/GalR homolog structures. Using this rich data resource, we compared the noncovalent residue contact networks of the LacI/GalR subfamilies to design and experimentally validate an allosteric mutant of a synthetic LacI/GalR repressor for use in biotechnology. The AlloRep database (freely available at www.AlloRep.org) is a key resource for future evolutionary studies of LacI/GalR homologs and for benchmarking computational predictions of functional change.
8
Bhattacharyya M, Upadhyay R, Vishveshwara S. Interaction signatures stabilizing the NAD(P)-binding Rossmann fold: a structure network approach. PLoS One 2012; 7:e51676. [PMID: 23284738 PMCID: PMC3524241 DOI: 10.1371/journal.pone.0051676] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 04/10/2012] [Accepted: 11/05/2012] [Indexed: 11/19/2022] Open
Abstract
The fidelity of the folding pathways being encoded in the amino acid sequence is met with challenge in instances where proteins with no sequence homology, performing different functions and no apparent evolutionary linkage, adopt a similar fold. The problem stated otherwise is that a limited fold space is available to a repertoire of diverse sequences. The key question is what factors lead to the formation of a fold from diverse sequences. Here, with the NAD(P)-binding Rossmann fold domains as a case study and using the concepts of network theory, we have unveiled the consensus structural features that drive the formation of this fold. We have proposed a graph theoretic formalism to capture the structural details in terms of the conserved atomic interactions in global milieu, and hence extract the essential topological features from diverse sequences. A unified mathematical representation of the different structures together with a judicious concoction of several network parameters enabled us to probe into the structural features driving the adoption of the NAD(P)-binding Rossmann fold. The atomic interactions at key positions seem to be better conserved in proteins, as compared to the residues participating in these interactions. We propose a "spatial motif" and several "fold specific hot spots" that form the signature structural blueprints of the NAD(P)-binding Rossmann fold domain. Excellent agreement of our data with previous experimental and theoretical studies validates the robustness and validity of the approach. Additionally, comparison of our results with statistical coupling analysis (SCA) provides further support. The methodology proposed here is general and can be applied to similar problems of interest.
Affiliation(s)
- Roopali Upadhyay
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
9
Current challenges in genome annotation through structural biology and bioinformatics. Curr Opin Struct Biol 2012; 22:594-601. [PMID: 22884875 DOI: 10.1016/j.sbi.2012.07.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 04/11/2012] [Revised: 06/29/2012] [Accepted: 07/09/2012] [Indexed: 01/25/2023]
Abstract
With the huge volume of genomic sequences being generated from high-throughput sequencing projects, the requirement for providing accurate and detailed annotations of gene products has never been greater. It is proving to be a huge challenge for computational biologists to use as much information as possible from experimental data to provide annotations for genome data of unknown function. A central component to this process is to use experimentally determined structures, which provide a means to detect homology that is not discernible from just the sequence and permit the consequences of genomic variation to be realized at the molecular level. In particular, structures also form the basis of many bioinformatics methods for improving the detailed functional annotations of enzymes in combination with similarities in sequence and chemistry.
10
Parisien M, Freed KF, Sosnick TR. On docking, scoring and assessing protein-DNA complexes in a rigid-body framework. PLoS One 2012; 7:e32647. [PMID: 22393431 PMCID: PMC3290582 DOI: 10.1371/journal.pone.0032647] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 11/24/2011] [Accepted: 01/28/2012] [Indexed: 01/20/2023] Open
Abstract
We consider the identification of interacting protein-nucleic acid partners using the rigid body docking method FTdock, which is systematic and exhaustive in the exploration of docking conformations. The accuracy of rigid body docking methods is tested using known protein-DNA complexes for which the docked and undocked structures are both available. Additional tests with large decoy sets probe the efficacy of two published statistically derived scoring functions that contain a huge number of parameters. In contrast, we demonstrate that state-of-the-art machine learning techniques can enormously reduce the number of parameters required, thereby identifying the relevant docking features using a minuscule fraction of the number of parameters in the prior works. The present machine learning study considers a 300 dimensional vector (dependent on only 15 parameters), termed the Chemical Context Profile (CCP), where each dimension reflects a specific type of protein amino acid-nucleic acid base interaction. The CCP is designed to capture the chemical complementarities of the interface and is well suited for machine learning techniques. Our objective function is the Chemical Context Discrepancy (CCD), which is defined as the angle between the native system's CCP vector and the decoy's vector and which serves as a substitute for the more commonly used root mean squared deviation (RMSD). We demonstrate that the CCP provides a useful scoring function when certain dimensions are properly weighted. Finally, we explore how the amino acids on a protein's surface can help guide DNA binding, first through long-range interactions, followed by direct contacts, according to specific preferences for either the major or minor grooves of the DNA.
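The definition of the CCD as an angle between two profile vectors can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the choice of radians, and the toy three-dimensional vectors are assumptions, and the abstract does not say how the 300 CCP dimensions are computed or weighted.

```python
import math


def chemical_context_discrepancy(native_ccp, decoy_ccp):
    """Angle (radians) between two profile vectors of equal length.

    Hypothetical helper: computes the angle via the normalized dot product.
    """
    if len(native_ccp) != len(decoy_ccp):
        raise ValueError("profile vectors must have the same length")
    dot = sum(a * b for a, b in zip(native_ccp, decoy_ccp))
    norm_native = math.sqrt(sum(a * a for a in native_ccp))
    norm_decoy = math.sqrt(sum(b * b for b in decoy_ccp))
    if norm_native == 0.0 or norm_decoy == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_native * norm_decoy)))
    return math.acos(cosine)


if __name__ == "__main__":
    native = [3.0, 0.0, 1.0]   # toy stand-in for a 300-dimensional CCP
    decoy = [2.0, 1.0, 0.0]
    print(chemical_context_discrepancy(native, decoy))
```

A smaller angle means the decoy's interaction profile points in nearly the same direction as the native one; because an angle ignores vector length, two profiles that differ only by an overall scale factor give a discrepancy of zero.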
Affiliation(s)
- Marc Parisien
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, United States of America
- Karl F. Freed
  - Department of Chemistry, University of Chicago, Chicago, Illinois, United States of America
  - Computation Institute, University of Chicago, Chicago, Illinois, United States of America
  - The James Franck Institute, University of Chicago, Chicago, Illinois, United States of America
- Tobin R. Sosnick
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, United States of America
  - Computation Institute, University of Chicago, Chicago, Illinois, United States of America
  - Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois, United States of America
11
Shoemaker BA, Zhang D, Tyagi M, Thangudu RR, Fong JH, Marchler-Bauer A, Bryant SH, Madej T, Panchenko AR. IBIS (Inferred Biomolecular Interaction Server) reports, predicts and integrates multiple types of conserved interactions for proteins. Nucleic Acids Res 2011; 40:D834-40. [PMID: 22102591 PMCID: PMC3245142 DOI: 10.1093/nar/gkr997] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022] Open
Abstract
We have recently developed the Inferred Biomolecular Interaction Server (IBIS) and database, which reports, predicts and integrates different types of interaction partners and locations of binding sites in proteins based on the analysis of homologous structural complexes. Here, we highlight several new IBIS features and options. The server's webpage is now redesigned to allow users easier access to data for different interaction types. An entry page is added to give a quick summary of available results and to now accept protein sequence accessions. To elucidate the formation of protein complexes, not just binary interactions, IBIS currently presents an expandable interaction network. Previously, IBIS provided annotations for four different types of binding partners: proteins, small molecules, nucleic acids and peptides; in the current version a new protein-ion interaction type has been added. Several options provide easy downloads of IBIS data for all Protein Data Bank (PDB) protein chains and the results for each query. In this study, we show that about one-third of all RefSeq sequences can be annotated with IBIS interaction partners and binding sites. The IBIS server is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi and updated biweekly.
Affiliation(s)
- Benjamin A Shoemaker
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA
12
Stacy R, Begley DW, Phan I, Staker BL, Van Voorhis WC, Varani G, Buchko GW, Stewart LJ, Myler PJ. Structural genomics of infectious disease drug targets: the SSGCID. Acta Crystallogr Sect F Struct Biol Cryst Commun 2011; 67:979-84. [PMID: 21904037 PMCID: PMC3169389 DOI: 10.1107/s1744309111029204] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 04/22/2011] [Accepted: 07/19/2011] [Indexed: 11/29/2022]
Abstract
The Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a consortium of researchers at Seattle BioMed, Emerald BioStructures, the University of Washington and Pacific Northwest National Laboratory that was established to apply structural genomics approaches to drug targets from infectious disease organisms. The SSGCID is currently funded over a five-year period by the National Institute of Allergy and Infectious Diseases (NIAID) to determine the three-dimensional structures of 400 proteins from a variety of Category A, B and C pathogens. Target selection engages the infectious disease research and drug-therapy communities to identify drug targets, essential enzymes, virulence factors and vaccine candidates of biomedical relevance to combat infectious diseases. The protein-expression systems, purified proteins, ligand screens and three-dimensional structures produced by SSGCID constitute a valuable resource for drug-discovery research, all of which is made freely available to the greater scientific community. This issue of Acta Crystallographica Section F, entirely devoted to the work of the SSGCID, covers the details of the high-throughput pipeline and presents a series of structures from a broad array of pathogenic organisms. Here, a background is provided on the structural genomics of infectious disease, the essential components of the SSGCID pipeline are discussed and a survey of progress to date is presented.
Affiliation(s)
- Robin Stacy
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Seattle Biomedical Research Institute, 307 Westlake Avenue North, Suite 500, Seattle, WA 98109-5219, USA
- Darren W. Begley
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Emerald BioStructures, 7869 NE Day Road West, Bainbridge Island, WA 98110, USA
- Isabelle Phan
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Seattle Biomedical Research Institute, 307 Westlake Avenue North, Suite 500, Seattle, WA 98109-5219, USA
- Bart L. Staker
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Emerald BioStructures, 7869 NE Day Road West, Bainbridge Island, WA 98110, USA
- Wesley C. Van Voorhis
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Department of Medicine, Division of Allergy and Infectious Diseases, University of Washington, Box 357185, Seattle, WA 98195, USA
- Gabriele Varani
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Departments of Chemistry and Biochemistry, University of Washington, Box 351700, Seattle, WA 98185, USA
- Garry W. Buchko
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Lance J. Stewart
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Emerald BioStructures, 7869 NE Day Road West, Bainbridge Island, WA 98110, USA
- Peter J. Myler
  - Seattle Structural Genomics Center for Infectious Disease, USA
  - Seattle Biomedical Research Institute, 307 Westlake Avenue North, Suite 500, Seattle, WA 98109-5219, USA
  - Departments of Global Health and Medical Education and Biomedical Informatics, University of Washington, Box 357238, Seattle, WA 98195, USA