1. Structural genomics and the Protein Data Bank. J Biol Chem 2021; 296:100747. [PMID: 33957120 PMCID: PMC8166929 DOI: 10.1016/j.jbc.2021.100747] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/17/2021] [Revised: 04/16/2021] [Accepted: 04/30/2021] [Indexed: 12/14/2022]
Abstract
The field of Structural Genomics arose over the last 3 decades to address a large and rapidly growing divergence between microbial genomic, functional, and structural data. Several international programs took advantage of the vast genomic sequence information and evaluated the feasibility of structure determination for expanded and newly discovered protein families. As a consequence, structural genomics has developed structure-determination pipelines and applied them to a wide range of novel, uncharacterized proteins, often from “microbial dark matter,” and later to proteins from human pathogens. Advances were especially needed in protein production and rapid de novo structure solution. The experimental three-dimensional models were promptly made public, facilitating structure determination of other members of the family and helping to understand their molecular and biochemical functions. Improvements in experimental methods and databases resulted in fast progress in molecular and structural biology. The Protein Data Bank structure repository played a central role in the coordination of structural genomics efforts and the structural biology community as a whole. It facilitated development of standards and validation tools essential for maintaining high quality of deposited structural data.
2. Romero PR, Kobayashi N, Wedell JR, Baskaran K, Iwata T, Yokochi M, Maziuk D, Yao H, Fujiwara T, Kurusu G, Ulrich EL, Hoch JC, Markley JL. BioMagResBank (BMRB) as a Resource for Structural Biology. Methods Mol Biol 2020; 2112:187-218. [PMID: 32006287 DOI: 10.1007/978-1-0716-0270-6_14] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 01/01/2023]
Abstract
The Biological Magnetic Resonance Data Bank (BioMagResBank or BMRB), founded in 1988, serves as the archive for data generated by nuclear magnetic resonance (NMR) spectroscopy of biological systems. NMR spectroscopy is unique among biophysical approaches in its ability to provide a broad range of atomic and higher-level information relevant to the structural, dynamic, and chemical properties of biological macromolecules, as well as report on metabolite and natural product concentrations in complex mixtures and their chemical structures. BMRB became a core member of the Worldwide Protein Data Bank (wwPDB) in 2007, and the BMRB archive is now a core archive of the wwPDB. Currently, about 10% of the structures deposited into the PDB archive are based on NMR spectroscopy. BMRB stores experimental and derived data from biomolecular NMR studies. Newer BMRB biopolymer depositions are divided about evenly between those associated with structure determinations (atomic coordinates and supporting information archived in the PDB) and those reporting experimental information on molecular dynamics, conformational transitions, ligand binding, assigned chemical shifts, or other results from NMR spectroscopy. BMRB also provides resources for NMR studies of metabolites and other small molecules that are often macromolecular ligands and/or nonstandard residues. This chapter is directed to the structural biology community rather than the metabolomics and natural products community. Our goal is to describe various BMRB services offered to structural biology researchers and how they can be accessed and utilized. These services can be classified into four main groups: (1) data deposition, (2) data retrieval, (3) data analysis, and (4) services for NMR spectroscopists and software developers. The chapter also describes the NMR-STAR data format used by BMRB and the tools provided to facilitate its use. 
For programmers, BMRB offers an application programming interface (API) and libraries in the Python and R languages that enable users to develop their own BMRB-based tools for data analysis, visualization, and manipulation of NMR-STAR formatted files. BMRB also provides users with direct access tools through the NMRbox platform.
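The NMR-STAR format mentioned above stores tabular data in `loop_ ... stop_` blocks: the tag names are listed first, followed by whitespace-separated values filled in row by row. The sketch below is a deliberately simplified, hypothetical parser written only to illustrate that layout; it assumes unquoted, single-token values and ignores save frames, comments, and multi-line values. For real files, the BMRB-provided Python library should be preferred. The tag names follow NMR-STAR conventions, but the chemical shift values are made up.

```python
from typing import Dict, List


def parse_first_loop(text: str) -> Dict[str, List[str]]:
    """Parse the first loop_ ... stop_ block into a {tag: column values} dict.

    Simplified sketch: assumes tags and values contain no whitespace or quotes,
    and ignores save frames, comments and multi-line values.
    """
    tokens = text.split()
    i = tokens.index("loop_") + 1
    # Tag names come first; each one starts with an underscore.
    tags: List[str] = []
    while i < len(tokens) and tokens[i].startswith("_"):
        tags.append(tokens[i])
        i += 1
    if not tags:
        raise ValueError("loop_ has no tags")
    # Values follow until stop_ and fill the columns row by row.
    end = tokens.index("stop_", i)
    values = tokens[i:end]
    if len(values) % len(tags) != 0:
        raise ValueError("number of values is not a multiple of the number of tags")
    columns: Dict[str, List[str]] = {tag: [] for tag in tags}
    for k, value in enumerate(values):
        columns[tags[k % len(tags)]].append(value)
    return columns


# Illustrative (made-up) assigned chemical shifts in NMR-STAR loop layout.
SAMPLE = """
loop_
   _Atom_chem_shift.ID
   _Atom_chem_shift.Comp_ID
   _Atom_chem_shift.Atom_ID
   _Atom_chem_shift.Val
   1   MET   H    8.42
   2   MET   CA   55.30
stop_
"""

if __name__ == "__main__":
    shifts = parse_first_loop(SAMPLE)
    print(shifts["_Atom_chem_shift.Val"])  # ['8.42', '55.30']
```

Returning columns keyed by tag mirrors how a loop is read conceptually (one column per tag); a production parser would also need to handle quoted strings and semicolon-delimited text blocks.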
Affiliation(s)
- Pedro R Romero: BMRB, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
- Naohiro Kobayashi: PDBj-BMRB, Institute for Protein Research, Osaka University, Suita, Osaka, Japan
- Jonathan R Wedell: BMRB, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
- Kumaran Baskaran: BMRB, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
- Takeshi Iwata: PDBj-BMRB, Institute for Protein Research, Osaka University, Suita, Osaka, Japan
- Masashi Yokochi: PDBj-BMRB, Institute for Protein Research, Osaka University, Suita, Osaka, Japan
- Dimitri Maziuk: BMRB, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
- Hongyang Yao: BMRB, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
- Toshimichi Fujiwara: PDBj-BMRB, Institute for Protein Research, Osaka University, Suita, Osaka, Japan
- Genji Kurusu: PDBj-BMRB, Institute for Protein Research, Osaka University, Suita, Osaka, Japan
- Eldon L Ulrich: BMRB, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
- Jeffrey C Hoch: BMRB, Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, USA
- John L Markley: BMRB, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
3. Malhotra AG, Singh S, Jha M, Pandey KM. A Parametric Targetability Evaluation Approach for Vitiligo Proteome Extracted through Integration of Gene Ontologies and Protein Interaction Topologies. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1830-1842. [PMID: 29994537 DOI: 10.1109/tcbb.2018.2835459] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
Vitiligo is a well-known skin disorder with complex etiology, and its pathogenesis is multifaceted, with many ramifications. A systems-level computational workflow was designed to first propose candidate disease proteins by merging properties from protein interaction networks and gene ontology terms. In total, 109 proteins were identified and suggested to be involved in the onset or progression of the disease. A composite approach was then employed to prioritize vitiligo disease proteins by benchmarking their properties against standard target identification criteria: sequence-based, structural, functional, essentiality, protein-protein interaction, vulnerability, secretability, assayability, and druggability information. The existing information was seamlessly integrated into efficient pipelines to propose a novel protocol for assessing the targetability of disease proteins. Using online data resources and scripting, an illustrative list of 68 potential drug targets was generated for vitiligo. This list is broadly consistent with the research community's current interest in certain specific proteins and suggests novel target candidates that may merit further study; it can also be adapted to a user-specific environment, either by adjusting the weights for chosen criteria (i.e., a quantitative approach) or by changing the considered criteria (i.e., a qualitative approach).
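The quantitative customization described above (adjusting criterion weights) can be pictured as a normalized weighted-sum ranking. The sketch below is an illustrative assumption, not the authors' scoring protocol: the protein names, the subset of criteria, the per-criterion scores in [0, 1], and the weights are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-criterion scores in [0, 1]; names and values are illustrative only.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "protein_A": {"druggability": 0.9, "essentiality": 0.4, "assayability": 0.7},
    "protein_B": {"druggability": 0.5, "essentiality": 0.9, "assayability": 0.6},
    "protein_C": {"druggability": 0.2, "essentiality": 0.3, "assayability": 0.9},
}


def rank_targets(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank candidates by a weighted sum over the chosen criteria, normalized by total weight.

    Changing the weight values corresponds to the 'quantitative' adjustment;
    adding or dropping keys in `weights` corresponds to the 'qualitative' one.
    A criterion missing for a candidate contributes 0.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scored = [
        (name, sum(w * scores.get(crit, 0.0) for crit, w in weights.items()) / total)
        for name, scores in candidates.items()
    ]
    # Highest composite score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Emphasizing druggability ranks protein_A first ...
    print(rank_targets(CANDIDATES, {"druggability": 2, "essentiality": 1, "assayability": 1}))
    # ... while emphasizing essentiality moves protein_B to the top.
    print(rank_targets(CANDIDATES, {"druggability": 1, "essentiality": 3, "assayability": 1}))
```

Normalizing by the total weight keeps composite scores in [0, 1] regardless of how many criteria are selected, so rankings from different criterion sets stay comparable in scale.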
4. Sun Y, Du P, Lu X, Xie P, Qian Z, Fan S, Zhu Z. Quantitative characterization of bovine serum albumin thin-films using terahertz spectroscopy and machine learning methods. Biomed Opt Express 2018; 9:2917-2929. [PMID: 29984075 PMCID: PMC6033555 DOI: 10.1364/boe.9.002917] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/20/2017] [Revised: 05/02/2018] [Accepted: 05/27/2018] [Indexed: 05/04/2023]
Abstract
The development of new spectral analysis methods for detecting biological thin films has generated intense interest in terahertz (THz) spectroscopy and its application in a wide range of fields. This paper is the first to apply machine learning methods to the quantitative characterization of bovine serum albumin (BSA) thin films detected by terahertz time-domain spectroscopy. Spectral data from BSA thin films prepared from solutions with concentrations ranging from 0.5 to 35 mg/ml are analyzed using support vector regression to learn the underlying model relating the frequency-domain spectra to the target concentration. The learned model successfully predicts the concentrations of unknown test samples with a coefficient of determination of R² = 0.97932. Furthermore, to identify the relevance of each frequency to the concentration, maximal information coefficient analysis is used, and the three most discriminating frequencies in the THz range are identified at 1.2, 1.1 and 0.5 THz, which means that a good prediction of BSA concentration can be achieved using the three most relevant frequencies. Moreover, these top discriminating frequencies agree well with the frequencies predicted by a long-wavelength elastic vibration model for the BSA protein.
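The R² value reported above is the coefficient of determination, R² = 1 − SS_res/SS_tot, comparing the residual sum of squares of the predictions with the total variance of the true values. The sketch below computes it for a pair of true and predicted concentration lists; it does not reproduce the support vector regression itself, and any example inputs are hypothetical rather than data from the paper.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: how far predictions fall from the true values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical concentrations (mg/ml) and model predictions, for illustration only.
    true_conc = [0.5, 5.0, 15.0, 35.0]
    predicted = [0.7, 4.6, 15.5, 34.8]
    print(round(r_squared(true_conc, predicted), 5))
```

A perfect fit gives R² = 1, and a model that always predicts the mean gives R² = 0, so a value near 0.98 indicates that most of the variance in concentration is explained.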
Affiliation(s)
- Yiwen Sun: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Department of Biomedical Engineering, School of Medicine, Shenzhen University, Shenzhen 518060, China
- Pengju Du: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Department of Biomedical Engineering, School of Medicine, Shenzhen University, Shenzhen 518060, China
- Xingxing Lu: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Department of Biomedical Engineering, School of Medicine, Shenzhen University, Shenzhen 518060, China
- Pengfei Xie: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, Department of Biomedical Engineering, School of Medicine, Shenzhen University, Shenzhen 518060, China
- Zhengfang Qian: College of Electronic Science and Technology, Shenzhen University, Shenzhen 518060, China
- Shuting Fan: College of Electronic Science and Technology, Shenzhen University, Shenzhen 518060, China; M013, School of Physics and Astrophysics, The University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
- Zexuan Zhu: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
5.
Abstract
In this review, we describe how the interplay among science, technology and community interests contributed to the evolution of four structural biology data resources. We present the method by which data deposited by scientists are prepared for worldwide distribution, and argue that data archiving in a trusted repository must be an integral part of any scientific investigation.
Affiliation(s)
- Helen M. Berman: Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, Department of Chemistry and Chemical Biology, 174 Frelinghuysen Road, Piscataway, New Jersey 08854
- Catherine L. Lawson: Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, Department of Chemistry and Chemical Biology, 174 Frelinghuysen Road, Piscataway, New Jersey 08854
- Brinda Vallat: Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, Department of Chemistry and Chemical Biology, 174 Frelinghuysen Road, Piscataway, New Jersey 08854
- Margaret J. Gabanyi: Center for Integrative Proteomics Research, Institute for Quantitative Biomedicine, Department of Chemistry and Chemical Biology, 174 Frelinghuysen Road, Piscataway, New Jersey 08854
6. The impact of structural genomics: the first quindecennial. J Struct Funct Genomics 2016; 17:1-16. [PMID: 26935210 DOI: 10.1007/s10969-016-9201-5] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 12/12/2015] [Accepted: 02/17/2016] [Indexed: 12/21/2022]
Abstract
The period 2000-2015 brought the advent of high-throughput approaches to protein structure determination. With overall funding on the order of $2 billion (in 2010 dollars), the structural genomics (SG) consortia established worldwide have developed pipelines for target selection, protein production, sample preparation, crystallization, and structure determination by X-ray crystallography and NMR. These efforts resulted in the determination of over 13,500 protein structures, mostly from unique protein families, and increased the structural coverage of the expanding protein universe. SG programs contributed over 4400 publications to the scientific literature. The NIH-funded Protein Structure Initiatives alone have produced over 2000 scientific publications, which to date have attracted more than 93,000 citations. Software and database developments that were necessary to handle high-throughput structure determination workflows have led to structures of better quality and improved integrity of the associated data. Organized and accessible data have a positive impact on the reproducibility of scientific experiments. Most of the experimental data generated by the SG centers are freely available to the community and have been utilized by scientists in various fields of research. SG projects have created, improved, streamlined, and validated many protocols for protein production and crystallization, data collection, and functional analysis, significantly benefiting biological and biomedical research.
7. Polyclonal Antibody Production for Membrane Proteins via Genetic Immunization. Sci Rep 2016; 6:21925. [PMID: 26908053 PMCID: PMC4764931 DOI: 10.1038/srep21925] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/15/2015] [Accepted: 02/02/2016] [Indexed: 01/08/2023]
Abstract
Antibodies are essential for structural determinations and functional studies of membrane proteins, but antibody generation is limited by the availability of properly-folded and purified antigen. We describe the first application of genetic immunization to a structurally diverse set of membrane proteins to show that immunization of mice with DNA alone produced antibodies against 71% (n = 17) of the bacterial and viral targets. Antibody production correlated with prior reports of target immunogenicity in host organisms, underscoring the efficiency of this DNA-gold micronanoplex approach. To generate each antigen for antibody characterization, we also developed a simple in vitro membrane protein expression and capture method. Antibody specificity was demonstrated upon identifying, for the first time, membrane-directed heterologous expression of the native sequences of the FopA and FTT1525 virulence determinants from the select agent Francisella tularensis SCHU S4. These approaches will accelerate future structural and functional investigations of therapeutically-relevant membrane proteins.
8. McKay T, Hart K, Horn A, Kessler H, Dodge G, Bardhi K, Bardhi K, Mills JL, Bernstein HJ, Craig PA. Annotation of proteins of unknown function: initial enzyme results. J Struct Funct Genomics 2015; 16:43-54. [PMID: 25630330 DOI: 10.1007/s10969-015-9194-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 11/10/2014] [Accepted: 01/16/2015] [Indexed: 01/21/2023]
Abstract
Working with a combination of ProMOL (a plugin for PyMOL that searches a library of enzymatic motifs for local structural homologs), BLAST and Pfam (servers that identify global sequence homologs), and Dali (a server that identifies global structural homologs), we have begun the process of assigning functional annotations to the approximately 3,500 structures in the Protein Data Bank that are currently classified as having "unknown function". Using a limited template library of 388 motifs, over 500 promising in silico matches have been identified by ProMOL, among which 65 exceptionally good matches have been identified. The characteristics of the exceptionally good matches are discussed.
9.
Abstract
A key reason three-dimensional (3-D) protein structures are annotated with supporting or derived information is to understand the molecular basis of protein function. To this end, protein structure annotation databases curate key facts and observations, based on community-accepted standards, about the ~100,000 3-D experimental protein structures to date. This review will introduce the primary structure repositories, databases, and value-added structural annotation databases, as well as the range of information they provide. The different levels of annotation data (primary vs. derived vs. inferred) and how they should all be considered accordingly will also be described.
Affiliation(s)
- Margaret J. Gabanyi: Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Helen M. Berman: Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
10. Berman HM, Gabanyi MJ, Groom CR, Johnson JE, Murshudov GN, Nicholls RA, Reddy V, Schwede T, Zimmerman MD, Westbrook J, Minor W. Data to knowledge: how to get meaning from your result. IUCrJ 2015; 2:45-58. [PMID: 25610627 PMCID: PMC4285880 DOI: 10.1107/s2052252514023306] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/16/2014] [Accepted: 10/22/2014] [Indexed: 05/19/2023]
Abstract
Structural and functional studies require the development of sophisticated 'Big Data' technologies and software to increase the knowledge derived and ensure reproducibility of the data. This paper presents summaries of the Structural Biology Knowledge Base, the VIPERdb Virus Structure Database, evaluation of homology modeling by the Protein Model Portal, the ProSMART tool for conformation-independent structure comparison, the LabDB 'super' laboratory information management system and the Cambridge Structural Database. These techniques and technologies represent important tools for the transformation of crystallographic data into knowledge and information, in an effort to address the problem of non-reproducibility of experimental results.
Affiliation(s)
- Helen M. Berman: Center for Integrative Proteomics Research, Department of Chemistry and Chemical Biology, Rutgers, State University of New Jersey, Piscataway, NJ 08854, USA
- Margaret J. Gabanyi: Center for Integrative Proteomics Research, Department of Chemistry and Chemical Biology, Rutgers, State University of New Jersey, Piscataway, NJ 08854, USA
- Colin R. Groom: Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England
- John E. Johnson: Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA 92037, USA
- Garib N. Murshudov: MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, England
- Robert A. Nicholls: MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, England
- Vijay Reddy: Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA 92037, USA
- Torsten Schwede: Biozentrum, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland; SIB-Swiss Institute of Bioinformatics, Basel, Switzerland
- Matthew D. Zimmerman: Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
- John Westbrook: Center for Integrative Proteomics Research, Department of Chemistry and Chemical Biology, Rutgers, State University of New Jersey, Piscataway, NJ 08854, USA
- Wladek Minor: Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908, USA
11. Guo X, Crawford JM. An atypical orphan carbohydrate-NRPS genomic island encodes a novel lytic transglycosylase. Chem Biol 2014; 21:1271-1277. [PMID: 25219963 DOI: 10.1016/j.chembiol.2014.07.025] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/28/2014] [Revised: 07/19/2014] [Accepted: 07/22/2014] [Indexed: 10/24/2022]
Abstract
Microbial genome sequencing platforms have produced a deluge of orphan biosynthetic pathways suspected of biosynthesizing new small molecules with pharmacological relevance. Genome synteny analysis provides an assessment of genomic island content, which is enriched in natural product gene clusters. Here we identified an atypical orphan carbohydrate-nonribosomal peptide synthetase genomic island in Photorhabdus luminescens using genome synteny analysis. Heterologous expression of the pathway led to the characterization of five oligosaccharide metabolites with lysozyme inhibitory activities. The oligosaccharides harbor a 1,6-anhydro-β-D-N-acetyl-glucosamine moiety, a rare structural feature for natural products. Gene deletion analysis and biochemical reconstruction of oligosaccharide production led to the discovery that a hypothetical protein in the pathway is a lytic transglycosylase responsible for bicyclic sugar formation. The example presented here supports the notion that targeting select genomic islands with reduced reliance on known protein homologies could enhance the discovery of new metabolic chemistry and biology.
Affiliation(s)
- Xun Guo: Department of Chemistry, Yale University, New Haven, CT 06520, USA; Chemical Biology Institute, Yale University, West Haven, CT 06516, USA
- Jason M Crawford: Department of Chemistry, Yale University, New Haven, CT 06520, USA; Chemical Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, CT 06510, USA
12. Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative. Proc Natl Acad Sci U S A 2014; 111:3733-8. [PMID: 24567391 DOI: 10.1073/pnas.1321614111] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022]
Abstract
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
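"Structural coverage on the residue level," as used above, can be read as the fraction of all residues in a sequence set that fall inside a region covered by an experimental structure or a reliable homology model. The sketch below computes that fraction under simplifying assumptions of our own: each sequence is given as a length plus a list of covered residue ranges (1-based, inclusive, possibly overlapping), and the criteria for what counts as a "reliable" model are outside its scope. The sequence names and ranges are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue range


def covered_residues(length: int, intervals: List[Interval]) -> int:
    """Count residues of one sequence covered by at least one interval."""
    covered = set()
    for start, end in intervals:
        # Clip each range to the sequence; overlapping ranges are counted once.
        covered.update(range(max(start, 1), min(end, length) + 1))
    return len(covered)


def structural_coverage(sequences: Dict[str, Tuple[int, List[Interval]]]) -> float:
    """Fraction of all residues (over every sequence) that are structurally covered."""
    total = sum(length for length, _ in sequences.values())
    if total == 0:
        raise ValueError("no residues to cover")
    covered = sum(covered_residues(length, ivs) for length, ivs in sequences.values())
    return covered / total


# Hypothetical sequence set: lengths and covered ranges are illustrative only.
EXAMPLE = {
    "seq1": (100, [(1, 40), (30, 60)]),  # residues 1-60 covered (overlap counted once)
    "seq2": (50, []),                    # no structure or model
    "seq3": (50, [(11, 30)]),            # residues 11-30 covered
}

if __name__ == "__main__":
    print(structural_coverage(EXAMPLE))  # 80 of 200 residues -> 0.4
```

Measuring at the residue level, rather than counting whole proteins, means that a structure of one domain in a multidomain protein contributes only the residues it actually spans.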
13. Webb B, Eswar N, Fan H, Khuri N, Pieper U, Dong G, Sali A. Comparative Modeling of Drug Target Proteins. Reference Module in Chemistry, Molecular Sciences and Chemical Engineering 2014. [PMCID: PMC7157477 DOI: 10.1016/b978-0-12-409547-2.11133-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state-of-the-art by a number of specific examples.
14. Pieper U, Webb BM, Dong GQ, Schneidman-Duhovny D, Fan H, Kim SJ, Khuri N, Spill YG, Weinkam P, Hammel M, Tainer JA, Nilges M, Sali A. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 2013; 42:D336-46. [PMID: 24271400 PMCID: PMC3965011 DOI: 10.1093/nar/gkt1144] [Citation(s) in RCA: 216] [Impact Index Per Article: 19.6] [Indexed: 12/29/2022]
Abstract
ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller/). ModBase currently contains almost 30 million reliable models for domains in 4.7 million unique protein sequences. ModBase allows users to compute or update comparative models on demand, through an interface to the ModWeb modeling server (http://salilab.org/modweb). ModBase models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/). Recently developed associated resources include the AllosMod server for modeling ligand-induced protein dynamics (http://salilab.org/allosmod), the AllosMod-FoXS server for predicting a structural ensemble that fits an SAXS profile (http://salilab.org/allosmod-foxs), the FoXSDock server for protein–protein docking filtered by an SAXS profile (http://salilab.org/foxsdock), the SAXS Merge server for automatic merging of SAXS profiles (http://salilab.org/saxsmerge) and the Pose & Rank server for scoring protein–ligand complexes (http://salilab.org/poseandrank). In this update, we also highlight two applications of ModBase: a PSI:Biology initiative to maximize the structural coverage of the human alpha-helical transmembrane proteome and a determination of structural determinants of human immunodeficiency virus-1 protease specificity.
Affiliation(s)
- Ursula Pieper: Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA; Department of Pharmaceutical Chemistry, California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA; Graduate Group in Biophysics, University of California at San Francisco, CA 94158, USA; Structural Bioinformatics Unit, Structural Biology and Chemistry department, Institut Pasteur, 25 rue du Docteur Roux, 75015 Paris, France; Université Paris Diderot-Paris 7, école doctorale iViv, Paris Rive Gauche, 5 rue Thomas Mann, 75013 Paris, France; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Molecular Biology, Skaggs Institute of Chemical Biology, The Scripps Research Institute, La Jolla, CA 92037, USA; Life Sciences Division, Department of Molecular Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
15. Spitzer R, Cleves AE, Varela R, Jain AN. Protein function annotation by local binding site surface similarity. Proteins 2013; 82:679-94. [PMID: 24166661 DOI: 10.1002/prot.24450] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/15/2013] [Revised: 10/02/2013] [Accepted: 10/10/2013] [Indexed: 11/06/2022]
Abstract
Hundreds of protein crystal structures exist for proteins whose function cannot be confidently determined from sequence similarity. Surflex-PSIM, a previously reported surface-based protein similarity algorithm, provides an alternative method for hypothesizing function for such proteins. The method now supports fully automatic binding site detection and is fast enough to screen comprehensive databases of protein binding sites. The binding site detection methodology was validated on apo/holo cognate protein pairs, correctly identifying 91% of ligand binding sites in holo structures and 88% in apo structures where corresponding sites existed. For correctly detected apo binding sites, the cognate holo site was the most similar binding site 87% of the time. PSIM was used to screen a set of proteins that had poorly characterized functions at the time of crystallization, but were later biochemically annotated. Using a fully automated protocol, this set of 8 proteins was screened against ∼60,000 ligand binding sites from the PDB. PSIM correctly identified functional matches that predated query protein biochemical annotation for five out of the eight query proteins. A panel of 12 currently unannotated proteins was also screened, resulting in a large number of statistically significant binding site matches, some of which suggest likely functions for the poorly characterized proteins.
Affiliation(s)
- Russell Spitzer: Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California
16. Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol 2013; 10:42-9. [DOI: 10.1038/nchembio.1387] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 07/12/2013] [Accepted: 10/02/2013] [Indexed: 11/08/2022]
17. Seiler CY, Park JG, Sharma A, Hunter P, Surapaneni P, Sedillo C, Field J, Algar R, Price A, Steel J, Throop A, Fiacco M, LaBaer J. DNASU plasmid and PSI:Biology-Materials repositories: resources to accelerate biological research. Nucleic Acids Res 2013; 42:D1253-60. [PMID: 24225319 PMCID: PMC3964992 DOI: 10.1093/nar/gkt1060] [Citation(s) in RCA: 166] [Impact Index Per Article: 15.1] [Indexed: 12/16/2022]
Abstract
The mission of the DNASU Plasmid Repository is to accelerate research by providing high-quality, annotated plasmid samples and online plasmid resources to the research community through the curated DNASU database, website and repository (http://dnasu.asu.edu or http://dnasu.org). The collection includes plasmids from grant-funded, high-throughput cloning projects performed in our laboratory, plasmids from external researchers, and large collections from consortia such as the ORFeome Collaboration and the NIGMS-funded Protein Structure Initiative: Biology (PSI:Biology). Through DNASU, researchers can search for and access detailed information about each plasmid such as the full length gene insert sequence, vector information, associated publications, and links to external resources that provide additional protein annotations and experimental protocols. Plasmids can be requested directly through the DNASU website. DNASU and the PSI:Biology-Materials Repositories were previously described in the 2010 NAR Database Issue (Cormier, C.Y., Mohr, S.E., Zuo, D., Hu, Y., Rolfs, A., Kramer, J., Taycher, E., Kelley, F., Fiacco, M., Turnbull, G. et al. (2010) Protein Structure Initiative Material Repository: an open shared public resource of structural genomics plasmids for the biological community. Nucleic Acids Res., 38, D743-D749.). In this update we will describe the plasmid collection and highlight the new features in the website redesign, including new browse/search options, plasmid annotations and a dynamic vector mapping feature that was developed in collaboration with LabGenius. Overall, these plasmid resources continue to enable research with the goal of elucidating the role of proteins in both normal biological processes and disease.
Affiliation(s)
- Catherine Y Seiler: Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, 1001 S. McAllister Dr., Tempe, AZ 85287-6401, USA; LabGenius, 20-22 Bedford Row, London, WC1R 4JS, UK
18. Alcántara R, Onwubiko J, Cao H, Matos PD, Cham JA, Jacobsen J, Holliday GL, Fischer JD, Rahman SA, Jassal B, Goujon M, Rowland F, Velankar S, López R, Overington JP, Kleywegt GJ, Hermjakob H, O'Donovan C, Martín MJ, Thornton JM, Steinbeck C. The EBI enzyme portal. Nucleic Acids Res 2012; 41:D773-80. [PMID: 23175605 PMCID: PMC3531056 DOI: 10.1093/nar/gks1112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
The availability of comprehensive information about enzymes plays an important role in answering questions relevant to interdisciplinary fields such as biochemistry, enzymology, biofuels, bioengineering and drug discovery. At the EMBL European Bioinformatics Institute, we have developed an enzyme portal (http://www.ebi.ac.uk/enzymeportal) to provide this wealth of information on enzymes from multiple in-house resources addressing particular data classes: protein sequence and structure, reactions, pathways and small molecules. The fact that these data reside in separate databases makes information discovery cumbersome. The main goal of the portal is to simplify this process for end users.
Affiliation(s)
- Rafael Alcántara: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK