1
|
Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
Collapse
Affiliation(s)
- Luis Sanchez-Pulido
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
| | - Chris P Ponting
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK.
| |
Collapse
|
2
|
Bordin N, Sillitoe I, Lees JG, Orengo C. Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds. Front Mol Biosci 2021; 8:668184. [PMID: 34041266 PMCID: PMC8141709 DOI: 10.3389/fmolb.2021.668184] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022] Open
Abstract
This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.
Collapse
Affiliation(s)
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Jonathan G Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
| |
Collapse
|
3
|
Wilson IA, Stanfield RL. 50 Years of structural immunology. J Biol Chem 2021; 296:100745. [PMID: 33957119 PMCID: PMC8163984 DOI: 10.1016/j.jbc.2021.100745] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Revised: 03/24/2021] [Accepted: 04/30/2021] [Indexed: 12/12/2022] Open
Abstract
Fifty years ago, the first landmark structures of antibodies heralded the dawn of structural immunology. Momentum then started to build toward understanding how antibodies could recognize the vast universe of potential antigens and how antibody-combining sites could be tailored to engage antigens with high specificity and affinity through recombination of germline genes (V, D, J) and somatic mutation. Equivalent groundbreaking structures in the cellular immune system appeared some 15 to 20 years later and illustrated how processed protein antigens in the form of peptides are presented by MHC molecules to T cell receptors. Structures of antigen receptors in the innate immune system then explained their inherent specificity for particular microbial antigens including lipids, carbohydrates, nucleic acids, small molecules, and specific proteins. These two sides of the immune system act immediately (innate) to particular microbial antigens or evolve (adaptive) to attain high specificity and affinity to a much wider range of antigens. We also include examples of other key receptors in the immune system (cytokine receptors) that regulate immunity and inflammation. Furthermore, these antigen receptors use a limited set of protein folds to accomplish their various immunological roles. The other main players are the antigens themselves. We focus on surface glycoproteins in enveloped viruses including SARS-CoV-2 that enable entry and egress into host cells and are targets for the antibody response. This review covers what we have learned over the past half century about the structural basis of the immune response to microbial pathogens and how that information can be utilized to design vaccines and therapeutics.
Collapse
MESH Headings
- Adaptive Immunity
- Allergy and Immunology/history
- Animals
- Antibodies, Viral/chemistry
- Antibodies, Viral/genetics
- Antibodies, Viral/immunology
- Antibody Specificity
- Antigen Presentation
- Antigens, Viral/chemistry
- Antigens, Viral/genetics
- Antigens, Viral/immunology
- COVID-19/immunology
- COVID-19/virology
- Crystallography/history
- Crystallography/methods
- History, 20th Century
- History, 21st Century
- Humans
- Immunity, Innate
- Protein Folding
- Protein Interaction Domains and Motifs
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- Receptors, Cytokine/chemistry
- Receptors, Cytokine/genetics
- Receptors, Cytokine/immunology
- SARS-CoV-2/immunology
- SARS-CoV-2/pathogenicity
- V(D)J Recombination
Collapse
Affiliation(s)
- Ian A Wilson
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA; The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California, USA.
| | - Robyn L Stanfield
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
| |
Collapse
|
4
|
Abstract
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has lead to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships.In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
Collapse
|
5
|
Ferreira de Freitas R, Schapira M. A systematic analysis of atomic protein-ligand interactions in the PDB. MEDCHEMCOMM 2017; 8:1970-1981. [PMID: 29308120 PMCID: PMC5708362 DOI: 10.1039/c7md00381a] [Citation(s) in RCA: 232] [Impact Index Per Article: 33.1] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2017] [Accepted: 09/15/2017] [Indexed: 12/20/2022]
Abstract
As the protein databank (PDB) recently passed the cap of 123 456 structures, it stands more than ever as an important resource not only to analyze structural features of specific biological systems, but also to study the prevalence of structural patterns observed in a large body of unrelated structures, that may reflect rules governing protein folding or molecular recognition. Here, we compiled a list of 11 016 unique structures of small-molecule ligands bound to proteins - 6444 of which have experimental binding affinity - representing 750 873 protein-ligand atomic interactions, and analyzed the frequency, geometry and impact of each interaction type. We find that hydrophobic interactions are generally enriched in high-efficiency ligands, but polar interactions are over-represented in fragment inhibitors. While most observations extracted from the PDB will be familiar to seasoned medicinal chemists, less expected findings, such as the high number of C-H···O hydrogen bonds or the relatively frequent amide-π stacking between the backbone amide of proteins and aromatic rings of ligands, uncover underused ligand design strategies.
Collapse
Affiliation(s)
| | - Matthieu Schapira
- Structural Genomics Consortium , University of Toronto , Toronto , ON M5G 1L7 , Canada .
- Department of Pharmacology and Toxicology , University of Toronto , Toronto , ON M5S 1A8 , Canada
| |
Collapse
|
6
|
Murthy T, Wang Y, Reynolds C, Boggon TJ. Automated Protein Crystallization Trials Using the Thermo Scientific Matrix Hydra II eDrop. ACTA ACUST UNITED AC 2016. [DOI: 10.1016/j.jala.2007.04.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
Obtaining suitable crystals for X-ray diffraction experiments is one of the rate-limiting steps in macromolecular crystallography. Robotic platforms that can increase throughput are highly desirable for crystal screening operations. The Thermo Scientific Matrix Hydra II eDrop is a liquid handling robot capable of dispensing sub-microliter and microliter volumes of solutions. Equipped with single-channel noncontact microdispenser, 96-channel syringe-based contact microdispenser and user-friendly ControlMate software, the Thermo Scientific Matrix Hydra II eDrop is appropriately designed for crystal screening experiments. The microdispensers are used in a single operational process to rapidly dispense sub-microliter volumes of liquids for automated crystallization plate setup. In this report, we discuss the design of the Thermo Scientific Matrix Hydra II eDrop. We also discuss an automated protocol for macromolecular crystallization trials that has successfully been used to produce crystals of a membrane protein, the rhomboid family intramembrane protease, GlpG. (JALA 2007;12:213–8)
Collapse
Affiliation(s)
- Tal Murthy
- Thermo Fisher Scientific Inc, Hudson, NH
| | | | | | | |
Collapse
|
7
|
Hu J, Han K, Li Y, Yang JY, Shen HB, Yu DJ. TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM. Amino Acids 2016; 48:2533-2547. [DOI: 10.1007/s00726-016-2274-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2015] [Accepted: 06/07/2016] [Indexed: 12/12/2022]
|
8
|
An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life. Sci Rep 2015; 5:14717. [PMID: 26434770 PMCID: PMC4592975 DOI: 10.1038/srep14717] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2015] [Accepted: 09/07/2015] [Indexed: 11/14/2022] Open
Abstract
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been -and remains to a large extent- heavily biased, focusing on culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla accessed through single cell sequencing, metagenomics and high-throughput culturing efforts have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, iii) the discovery rate of novel folds among sequences uncovered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into a marginal increase in global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse grained tertiary structure level novelty.
Collapse
|
9
|
Ofer D, Linial M. ProFET: Feature engineering captures high-level protein functions. Bioinformatics 2015; 31:3429-36. [DOI: 10.1093/bioinformatics/btv345] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2015] [Accepted: 05/29/2015] [Indexed: 11/13/2022] Open
|
10
|
Molloy K, Van MJ, Barbara D, Shehu A. Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space. BMC Bioinformatics 2014; 15 Suppl 8:S4. [PMID: 25080993 PMCID: PMC4120149 DOI: 10.1186/1471-2105-15-s8-s4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. METHODS Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. RESULTS We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. CONCLUSIONS This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.
Collapse
Affiliation(s)
- Kevin Molloy
- Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
| | - M Jennifer Van
- Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
| | - Daniel Barbara
- Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
| | - Amarda Shehu
- Department of Computer Science, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
- Department of Bioengineering, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
- School of Systems Biology, George Mason University, 4400 University Drive, 22030 Fairfax, VA, USA
| |
Collapse
|
11
|
|
12
|
Dynamical Aspects of Biomacromolecular Multi-resolution Modelling Using the UltraScan Solution Modeler (US-SOMO) Suite. ACTA ACUST UNITED AC 2013. [DOI: 10.1007/978-94-017-8550-1_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
|
13
|
Brookes E, Pérez J, Cardinali B, Profumo A, Vachette P, Rocco M. Fibrinogen species as resolved by HPLC-SAXS data processing within the UltraScan Solution Modeler ( US-SOMO) enhanced SAS module. J Appl Crystallogr 2013; 46:1823-1833. [PMID: 24282333 PMCID: PMC3831300 DOI: 10.1107/s0021889813027751] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2013] [Accepted: 10/09/2013] [Indexed: 12/04/2022] Open
Abstract
Fibrinogen is a large heterogeneous aggregation/degradation-prone protein playing a central role in blood coagulation and associated pathologies, whose structure is not completely resolved. When a high-molecular-weight fraction was analyzed by size-exclusion high-performance liquid chromatography/small-angle X-ray scattering (HPLC-SAXS), several composite peaks were apparent and because of the stickiness of fibrinogen the analysis was complicated by severe capillary fouling. Novel SAS analysis tools developed as a part of the UltraScan Solution Modeler (US-SOMO; http://somo.uthscsa.edu/), an open-source suite of utilities with advanced graphical user interfaces whose initial goal was the hydrodynamic modeling of biomacromolecules, were implemented and applied to this problem. They include the correction of baseline drift due to the accumulation of material on the SAXS capillary walls, and the Gaussian decomposition of non-baseline-resolved HPLC-SAXS elution peaks. It was thus possible to resolve at least two species co-eluting under the fibrinogen main monomer peak, probably resulting from in-column degradation, and two others under an oligomers peak. The overall and cross-sectional radii of gyration, molecular mass and mass/length ratio of all species were determined using the manual or semi-automated procedures available within the US-SOMO SAS module. Differences between monomeric species and linear and sideways oligomers were thus identified and rationalized. This new US-SOMO version additionally contains several computational and graphical tools, implementing functionalities such as the mapping of residues contributing to particular regions of P(r), and an advanced module for the comparison of primary I(q) versus q data with model curves computed from atomic level structures or bead models. It should be of great help in multi-resolution studies involving hydrodynamics, solution scattering and crystallographic/NMR data.
Collapse
Affiliation(s)
- Emre Brookes
- Department of Biochemistry, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229-3900, USA
| | - Javier Pérez
- Beamline SWING, Synchrotron SOLEIL, L’Orme des Merisiers, BP48, Saint-Aubin, Gif sur Yvette, France
| | - Barbara Cardinali
- Biopolimeri e Proteomica, IRCCS AOU San Martino-IST, Istituto Nazionale per la Ricerca sul Cancro, Genova, Italy
- Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Aldo Profumo
- Biopolimeri e Proteomica, IRCCS AOU San Martino-IST, Istituto Nazionale per la Ricerca sul Cancro, Genova, Italy
| | - Patrice Vachette
- Institut de Biochimie et de Biophysique Moléculaire et Cellulaire, CNRS UMR 8619, UPS 11, Orsay, France
- Université Paris-Sud 11, Bâtiment 430, Orsay, France
| | - Mattia Rocco
- Biopolimeri e Proteomica, IRCCS AOU San Martino-IST, Istituto Nazionale per la Ricerca sul Cancro, Genova, Italy
| |
Collapse
|
14
|
Lounnas V, Ritschel T, Kelder J, McGuire R, Bywater RP, Foloppe N. Current progress in Structure-Based Rational Drug Design marks a new mindset in drug discovery. Comput Struct Biotechnol J 2013; 5:e201302011. [PMID: 24688704 PMCID: PMC3962124 DOI: 10.5936/csbj.201302011] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2012] [Revised: 01/26/2013] [Accepted: 02/08/2013] [Indexed: 12/20/2022] Open
Abstract
The past decade has witnessed a paradigm shift in preclinical drug discovery with structure-based drug design (SBDD) making a comeback while high-throughput screening (HTS) methods have continued to generate disappointing results. There is a deficit of information between identified hits and the many criteria that must be fulfilled in parallel to convert them into preclinical candidates that have a real chance to become a drug. This gap can be bridged by investigating the interactions between the ligands and their receptors. Accurate calculations of the free energy of binding are still elusive; however progresses were made with respect to how one may deal with the versatile role of water. A corpus of knowledge combining X-ray structures, bioinformatics and molecular modeling techniques now allows drug designers to routinely produce receptor homology models of increasing quality. These models serve as a basis to establish and validate efficient rationales used to tailor and/or screen virtual libraries with enhanced chances of obtaining hits. Many case reports of successful SBDD show how synergy can be gained from the combined use of several techniques. The role of SBDD with respect to two different classes of widely investigated pharmaceutical targets: (a) protein kinases (PK) and (b) G-protein coupled receptors (GPCR) is discussed. Throughout these examples prototypical situations covering the current possibilities and limitations of SBDD are presented.
Collapse
Affiliation(s)
- Valère Lounnas
- CMBI, NCMLS Radboud University, Nijmegen Medical Centre, Geert Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands
| | - Tina Ritschel
- Computational Drug Discovery, CMBI, NCMLS, Radboud University Medical Centre, Geert Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands
| | - Jan Kelder
- Beethovengaarde 97, 5344 CD Oss, The Netherlands
| | - Ross McGuire
- BioAxis Research BV, Pivot Park, Molenstraat 110, 5342 CC Oss, The Netherlands
| | | | | |
Collapse
|
15
|
Tiwari MK, Singh R, Singh RK, Kim IW, Lee JK. Computational approaches for rational design of proteins with novel functionalities. Comput Struct Biotechnol J 2012; 2:e201209002. [PMID: 24688643 PMCID: PMC3962203 DOI: 10.5936/csbj.201209002] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2012] [Revised: 08/17/2012] [Accepted: 08/23/2012] [Indexed: 11/22/2022] Open
Abstract
Proteins are the most multifaceted macromolecules in living systems and have various important functions, including structural, catalytic, sensory, and regulatory functions. Rational design of enzymes is a great challenge to our understanding of protein structure and physical chemistry and has numerous potential applications. Protein design algorithms have been applied to design or engineer proteins that fold, fold faster, catalyze, catalyze faster, signal, and adopt preferred conformational states. The field of de novo protein design, although only a few decades old, is beginning to produce exciting results. Developments in this field are already having a significant impact on biotechnology and chemical biology. The application of powerful computational methods for functional protein designing has recently succeeded at engineering target activities. Here, we review recently reported de novo functional proteins that were developed using various protein design approaches, including rational design, computational optimization, and selection from combinatorial libraries, highlighting recent advances and successes.
Collapse
Affiliation(s)
- Manish Kumar Tiwari
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea ; These authors contributed equally
| | - Ranjitha Singh
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea ; These authors contributed equally
| | - Raushan Kumar Singh
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
| | - In-Won Kim
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
| | - Jung-Kul Lee
- Department of Chemical Engineering, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea ; Institute of SK-KU Biomaterials, Konkuk University, 1 Hwayang-Dong, Gwangjin-Gu, Seoul 143-701, Korea
| |
Collapse
|
16
|
Jamroz M, Kolinski A, Kihara D. Structural features that predict real-value fluctuations of globular proteins. Proteins 2012; 80:1425-35. [PMID: 22328193 DOI: 10.1002/prot.24040] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2011] [Revised: 01/03/2012] [Accepted: 01/11/2012] [Indexed: 12/20/2022]
Abstract
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convienient practial way to predict fluctuations of proteins using easily computed static structural features of proteins.
Collapse
Affiliation(s)
- Michal Jamroz
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warszawa, Poland
| | | | | |
Collapse
|
17
|
Sael L, Chitale M, Kihara D. Structure- and sequence-based function prediction for non-homologous proteins. ACTA ACUST UNITED AC 2012; 13:111-23. [PMID: 22270458 DOI: 10.1007/s10969-012-9126-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2011] [Accepted: 01/10/2012] [Indexed: 01/14/2023]
Abstract
The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel effort to experimental methods, computational methods are expected to make a significant contribution for functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with the known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e. structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology based function annotation. Combined together with experimental methods we hope that computational methods will make leading contribution in functional elucidation of the protein structures.
Collapse
Affiliation(s)
- Lee Sael
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
| | | | | |
Collapse
|
18
|
Mullins JGL. Structural modelling pipelines in next generation sequencing projects. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2012; 89:117-67. [PMID: 23046884 DOI: 10.1016/b978-0-12-394287-6.00005-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
Our capacity to reliably predict protein structure from sequence is steadily improving due to the increased numbers and better targeting of protein structures being experimentally determined by structural genomics projects, along with the development of better modeling methodologies. Template-based (homology) modeling and de novo modeling methods are being combined to fill in remaining gaps in template coverage, and powerful automated structural modeling pipelines are being applied to large data sets of protein sequences. The improved quality of 3D models of proteins has led to their routine use in assessing the functional impact of nonsynonymous single nucleotide polymorphisms (nsSNPs) in specific protein systems, with the development of approaches that may be applied in a predictive fashion to nsSNPs emerging from next-generation sequencing projects. The challenges encountered in deriving functionally meaningful deductions from structural modeling can be quite different for proteins of different protein functional classes. The specific challenges to the assessment of the structural and functional impact of nsSNPs in globular proteins such as binding and regulatory proteins, structural proteins, and enzymes are discussed, as well as membrane transport proteins and ion channels. The mapping of reliable predictions of the structural and functional impact of SNPs, generated from automated modeling pipelines, on to protein-protein interaction networks will facilitate new approaches to understanding complex polygenic disorders and predisposition to disease.
Collapse
Affiliation(s)
- Jonathan G L Mullins
- Genome and Structural Bioinformatics, Institute of Life Science, College of Medicine, Swansea University, Singleton Park, Swansea, Wales, UK.
| |
Collapse
|
19
|
La D, Kihara D. A novel method for protein-protein interaction site prediction using phylogenetic substitution models. Proteins 2011; 80:126-41. [PMID: 21989996 DOI: 10.1002/prot.23169] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2011] [Revised: 07/07/2011] [Accepted: 08/17/2011] [Indexed: 11/10/2022]
Abstract
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the mutational behavior leading to weak sequence conservation poses significant challenges to the protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through the comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution as compared with other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which utilizes the substitution models to predict protein-protein binding sites of protein with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interface and nonbinding surfaces. BindML is shown to perform well compared to alternative methods for protein binding interface prediction. The methodology developed in this study is very versatile in the sense that it can be generally applied for predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins.
Collapse
Affiliation(s)
- David La
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA
| | | |
Collapse
|
20
|
Xie L, Xie L, Bourne PE. Structure-based systems biology for analyzing off-target binding. Curr Opin Struct Biol 2011; 21:189-99. [PMID: 21292475 PMCID: PMC3070778 DOI: 10.1016/j.sbi.2011.01.004] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2010] [Revised: 01/11/2011] [Accepted: 01/13/2011] [Indexed: 12/24/2022]
Abstract
Here off-target binding implies the binding of a small molecule of therapeutic interest to a protein target other than the primary target for which it was intended. Increasingly such off-targeting appears to be the norm rather than the exception, rational drug design notwithstanding, and can lead to detrimental side-effects, or opportunities to reposition a therapeutic agent to treat a different condition. Not surprisingly, there is significant interest in determining a priori what off-targets exist on a proteome-wide scale. Beyond determining putative off-targets is the need to understand the impact of such binding on the complete biological system, with the ultimate goal of being able to predict the phenotypic outcome. While a very ambitious goal, some progress is being made.
Collapse
Affiliation(s)
- Lei Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Department of Computer Science, Hunter College, the City University of New York, 695 Park Avenue, New York City, NY 10065, USA
| | - Li Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
| | - Philip E. Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
| |
Collapse
|
21
|
Lee D, de Beer TAP, Laskowski RA, Thornton JM, Orengo CA. 1,000 structures and more from the MCSG. BMC STRUCTURAL BIOLOGY 2011; 11:2. [PMID: 21219649 PMCID: PMC3024214 DOI: 10.1186/1472-6807-11-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/16/2010] [Accepted: 01/10/2011] [Indexed: 11/10/2022]
Abstract
Background The Midwest Center for Structural Genomics (MCSG) is one of the large-scale centres of the Protein Structure Initiative (PSI). During the first two phases of the PSI the MCSG has solved over a thousand protein structures. A criticism of structural genomics is that target selection strategies mean that some structures are solved without having a known function and thus are of little biomedical significance. Structures of unknown function have stimulated the development of methods for function prediction from structure. Results We show that the MCSG has met the stated goals of the PSI and use online resources and readily available function prediction methods to provide functional annotations for more than 90% of the MCSG structures. The structure-to-function prediction method ProFunc provides likely functions for many of the MCSG structures that cannot be annotated by sequence-based methods. Conclusions Although the focus of the PSI was structural coverage, many of the structures solved by the MCSG can also be associated with functional classes and biological roles of possible biomedical value.
Collapse
Affiliation(s)
- David Lee
- Department of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.
| | | | | | | | | |
Collapse
|
22
|
Sael L, Kihara D. Improved protein surface comparison and application to low-resolution protein structure data. BMC Bioinformatics 2010; 11 Suppl 11:S2. [PMID: 21172052 PMCID: PMC3024873 DOI: 10.1186/1471-2105-11-s11-s2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Background Recent advancements of experimental techniques for determining protein tertiary structures raise significant challenges for protein bioinformatics. With the number of known structures of unknown function expanding at a rapid pace, an urgent task is to provide reliable clues to their biological function on a large scale. Conventional approaches for structure comparison are not suitable for a real-time database search due to their slow speed. Moreover, a new challenge has arisen from recent techniques such as electron microscopy (EM), which provide low-resolution structure data. Previously, we have introduced a method for protein surface shape representation using the 3D Zernike descriptors (3DZDs). The 3DZD enables fast structure database searches, taking advantage of its rotation invariance and compact representation. The search results of protein surface represented with the 3DZD has showngood agreement with the existing structure classifications, but some discrepancies were also observed. Results The three new surface representations of backbone atoms, originally devised all-atom-surface representation, and the combination of all-atom surface with the backbone representation are examined. All representations are encoded with the 3DZD. Also, we have investigated the applicability of the 3DZD for searching protein EM density maps of varying resolutions. The surface representations are evaluated on structure retrieval using two existing classifications, SCOP and the CE-based classification. Conclusions Overall, the 3DZDs representing backbone atoms show better retrieval performance than the original all-atom surface representation. The performance further improved when the two representations are combined. Moreover, we observed that the 3DZD is also powerful in comparing low-resolution structures obtained by electron microscopy.
Collapse
Affiliation(s)
- Lee Sael
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA.
| | | |
Collapse
|
23
|
Brylinski M, Skolnick J. FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins 2010; 79:735-51. [PMID: 21287609 DOI: 10.1002/prot.22913] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2010] [Revised: 09/27/2010] [Accepted: 10/07/2010] [Indexed: 12/13/2022]
Abstract
The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure-based approaches showing considerable promise. In this article, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal-binding sites in modeled protein structures. Comprehensive benchmarks using different quality protein structures show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal-binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal-binding sites are detected with the best predicted binding site at rank 1 and within the top two ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium, and magnesium ions, the binding metal can be predicted with high, typically 70% to 90%, accuracy. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal that quantifies the metal-binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/.
Collapse
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
| | | |
Collapse
|
24
|
Cuff AL, Sillitoe I, Lewis T, Clegg AB, Rentzsch R, Furnham N, Pellegrini-Calace M, Jones D, Thornton J, Orengo CA. Extending CATH: increasing coverage of the protein structure universe and linking structure with function. Nucleic Acids Res 2010; 39:D420-6. [PMID: 21097779 PMCID: PMC3013636 DOI: 10.1093/nar/gkq1001] [Citation(s) in RCA: 118] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.
Collapse
Affiliation(s)
- Alison L Cuff
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.
| | | | | | | | | | | | | | | | | | | |
Collapse
|
25
|
Doppelt-Azeroual O, Delfaud F, Moriaud F, de Brevern AG. Fast and automated functional classification with MED-SuMo: an application on purine-binding proteins. Protein Sci 2010; 19:847-67. [PMID: 20162627 DOI: 10.1002/pro.364] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understand protein functions. MED-SuMo is a powerful technology to localize similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features associating chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein DataBank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are gathered. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms are gathered, for example, the Small GTPases. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated as it gathers binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures absent of cocrystallized ligands to the purine clusters enabling those structures to be associated with a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore potential toxic side effects.
Collapse
Affiliation(s)
- Olivia Doppelt-Azeroual
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France.
| | | | | | | |
Collapse
|
26
|
Yang YD, Spratt P, Chen H, Park C, Kihara D. Sub-AQUA: real-value quality assessment of protein structure models. Protein Eng Des Sel 2010; 23:617-32. [PMID: 20525730 DOI: 10.1093/protein/gzq030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
Computational protein tertiary structure prediction has made significant progress over the past years. However, most of the existing structure prediction methods are not equipped with functionality to predict accuracy of constructed models. Knowing the accuracy of a structure model is crucial for its practical use since the accuracy determines potential applications of the model. Here we have developed quality assessment methods, which predict real value of the global and local quality of protein structure models. The global quality of a model is defined as the root mean square deviation (RMSD) and the LGA score to its native structure. The local quality is defined as the distance between the corresponding Calpha positions of a model and its native structure when they are superimposed. Three regression methods are employed to combine different types of quality assessment measures of models, including alignment-level scores, residue-position level scores, atomic-detailed structure level scores and composite scores. The regression models were tested on a large benchmark data set of template-based protein structure models of various qualities. In predicting RMSD and the LGA score, a combination of two terms, length-normalized SPAD, a score that assesses alignment stability by considering suboptimal alignments, and Verify3D normalized by the square of the model length shows a significant performance, achieving 97.1 and 83.6% accuracy in identifying models with an RMSD of <2 and 6 A, respectively. For predicting the local quality of models, we find that a two-step approach, in which the global RMSD predicted in the first step is further combined with the other terms, can dramatically increase the accuracy. Finally, the developed regression equations are applied to assess the quality of structure models of whole E. coli proteome.
Collapse
Affiliation(s)
- Yifeng David Yang
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN 47907, USA
| | | | | | | | | |
Collapse
|
27
|
Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010; 5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2009] [Accepted: 03/31/2010] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. METHODOLOGY/PRINCIPAL FINDINGS WE EXPLORE THIS STRATEGY FOR FOUR SCOP FAMILIES: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences ressemble moderately-distant, natural homologues of the initial templates; e.g., the SUPERFAMILY, profile Hidden-Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. CONCLUSIONS/SIGNIFICANCE For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
Collapse
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Audrey Sedano
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| |
Collapse
|
28
|
Cuff A, Redfern OC, Greene L, Sillitoe I, Lewis T, Dibley M, Reid A, Pearl F, Dallman T, Todd A, Garratt R, Thornton J, Orengo C. The CATH hierarchy revisited-structural divergence in domain superfamilies and the continuity of fold space. Structure 2010; 17:1051-62. [PMID: 19679085 PMCID: PMC2741583 DOI: 10.1016/j.str.2009.06.015] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2008] [Revised: 06/24/2009] [Accepted: 06/25/2009] [Indexed: 11/29/2022]
Abstract
This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionary conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.
Collapse
Affiliation(s)
- Alison Cuff
- Institute of Structural and Molecular Biology, University College London, London, UK.
| | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
29
|
'Unknown' proteins and 'orphan' enzymes: the missing half of the engineering parts list--and how to find it. Biochem J 2009; 425:1-11. [PMID: 20001958 DOI: 10.1042/bj20091328] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Like other forms of engineering, metabolic engineering requires knowledge of the components (the 'parts list') of the target system. Lack of such knowledge impairs both rational engineering design and diagnosis of the reasons for failures; it also poses problems for the related field of metabolic reconstruction, which uses a cell's parts list to recreate its metabolic activities in silico. Despite spectacular progress in genome sequencing, the parts lists for most organisms that we seek to manipulate remain highly incomplete, due to the dual problem of 'unknown' proteins and 'orphan' enzymes. The former are all the proteins deduced from genome sequence that have no known function, and the latter are all the enzymes described in the literature (and often catalogued in the EC database) for which no corresponding gene has been reported. Unknown proteins constitute up to about half of the proteins in prokaryotic genomes, and much more than this in higher plants and animals. Orphan enzymes make up more than a third of the EC database. Attacking the 'missing parts list' problem is accordingly one of the great challenges for post-genomic biology, and a tremendous opportunity to discover new facets of life's machinery. Success will require a co-ordinated community-wide attack, sustained over years. In this attack, comparative genomics is probably the single most effective strategy, for it can reliably predict functions for unknown proteins and genes for orphan enzymes. Furthermore, it is cost-efficient and increasingly straightforward to deploy owing to a proliferation of databases and associated tools.
Collapse
|
30
|
|
31
|
Grabowski M, Chruszcz M, Zimmerman MD, Kirillova O, Minor W. Benefits of structural genomics for drug discovery research. Infect Disord Drug Targets 2009; 9:459-74. [PMID: 19594422 PMCID: PMC2866842 DOI: 10.2174/187152609789105704] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2009] [Accepted: 06/15/2009] [Indexed: 11/22/2022]
Abstract
While three dimensional structures have long been used to search for new drug targets, only a fraction of new drugs coming to the market has been developed with the use of a structure-based drug discovery approach. However, the recent years have brought not only an avalanche of new macromolecular structures, but also significant advances in the protein structure determination methodology only now making their way into structure-based drug discovery. In this paper, we review recent developments resulting from the Structural Genomics (SG) programs, focusing on the methods and results most likely to improve our understanding of the molecular foundation of human diseases. SG programs have been around for almost a decade, and in that time, have contributed a significant part of the structural coverage of both the genomes of pathogens causing infectious diseases and structurally uncharacterized biological processes in general. Perhaps most importantly, SG programs have developed new methodology at all steps of the structure determination process, not only to determine new structures highly efficiently, but also to screen protein/ligand interactions. We describe the methodologies, experience and technologies developed by SG, which range from improvements to cloning protocols to improved procedures for crystallographic structure solution that may be applied in "traditional" structural biology laboratories particularly those performing drug discovery. We also discuss the conditions that must be met to convert the present high-throughput structure determination pipeline into a high-output structure-based drug discovery system.
Collapse
Affiliation(s)
- Marek Grabowski
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
- Center for Structural Genomics of Infectious Diseases
| | - Maksymilian Chruszcz
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
- Center for Structural Genomics of Infectious Diseases
| | - Matthew D. Zimmerman
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
- Center for Structural Genomics of Infectious Diseases
| | - Olga Kirillova
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
| | - Wladek Minor
- Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA
- Midwest Center for Structural Genomics
- Center for Structural Genomics of Infectious Diseases
| |
Collapse
|
32
|
am Busch MS, Mignon D, Simonson T. Computational protein design as a tool for fold recognition. Proteins 2009; 77:139-58. [PMID: 19408297 DOI: 10.1002/prot.22426] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Computationally designed protein sequences have been proposed as a basis to perform fold recognition and homology searching. To investigate this possibility, an automated procedure is used to completely redesign 24 SH3 proteins and 22 SH2 proteins. We use the experimental backbone coordinates as fixed templates in the folded state and a molecular mechanics model to compute the pairwise interaction energies between all sidechain types and conformations. Energy calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is then used to scan the sequence and conformational space for optimal solutions. We produced 200,000-450,000 sequences for each backbone template. The designed sequences ressemble moderately-distant, natural homologues of the initial templates, according to their identity scores and their similarity with respect to the Pfam sets of SH2 and SH3 domains. Standard homology detection tools document their native-like character: the Conserved Domain Database recognizes 61% (52%) of our low-energy sequences as SH3 (SH2) domains; the SUPERFAMILY, Hidden-Markov Model library recognizes 81% (84%). Conversely, position specific scoring matrices (PSSMs) derived from our designed sequences can be used to detect natural homologues in sequence databases. Within SwissProt, a set of natural SH3 PSSMs detects 772 SH3 domains, for example; our designed PSSMs detect 67% of these, plus one additional sequence and two false positives. If six amino acids involved in substrate binding (a selective pressure not accounted for in our design) are reset to their experimental types, then 77% of the experimental SH3 domains are detected. Results for the SH2 domains are similar. Several directions to improve the method further are discussed.
Collapse
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
| | | | | |
Collapse
|
33
|
Abstract
It can be argued that the arrival of the “genomics era” has significantly shifted the paradigm of vaccine and therapeutics development from microbiological to sequence-based approaches. Genome sequences provide a previously unattainable route to investigate the mechanisms that underpin pathogenesis. Genomics, transcriptomics, metabolomics, structural genomics, proteomics, and immunomics are being exploited to perfect the identification of targets, to design new vaccines and drugs, and to predict their effects in patients. Furthermore, human genomics and related studies are providing insights into aspects of host biology that are important in infectious disease. This ever-growing body of genomic data and new genome-based approaches will play a critical role in the future to enable timely development of vaccines and therapeutics to control emerging infectious diseases.
Collapse
|
34
|
Rinaudo CD, Telford JL, Rappuoli R, Seib KL. Vaccinology in the genome era. J Clin Invest 2009; 119:2515-25. [PMID: 19729849 DOI: 10.1172/jci38330] [Citation(s) in RCA: 115] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023] Open
Abstract
Vaccination has played a significant role in controlling and eliminating life-threatening infectious diseases throughout the world, and yet currently licensed vaccines represent only the tip of the iceberg in terms of controlling human pathogens. However, as we discuss in this Review, the arrival of the genome era has revolutionized vaccine development and catalyzed a shift from conventional culture-based approaches to genome-based vaccinology. The availability of complete bacterial genomes has led to the development and application of high-throughput analyses that enable rapid targeted identification of novel vaccine antigens. Furthermore, structural vaccinology is emerging as a powerful tool for the rational design or modification of vaccine antigens to improve their immunogenicity and safety.
Collapse
|
35
|
Dessailly BH, Nair R, Jaroszewski L, Fajardo JE, Kouranov A, Lee D, Fiser A, Godzik A, Rost B, Orengo C. PSI-2: structural genomics to cover protein domain family space. Structure 2009; 17:869-81. [PMID: 19523904 DOI: 10.1016/j.str.2009.03.015] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2008] [Revised: 03/18/2009] [Accepted: 03/22/2009] [Indexed: 11/25/2022]
Abstract
One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy, jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centers, targets representatives from large, structurally uncharacterized protein domain families, and from structurally uncharacterized subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly overrepresented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first 3 years, has increased the structural coverage of genomes, and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and provides useful insights into structural and functional space.
Collapse
Affiliation(s)
- Benoît H Dessailly
- Department of Structural and Molecular Biology, University College of London, London WC1E6BT, UK.
| | | | | | | | | | | | | | | | | | | |
Collapse
|
36
|
Ausiello G, Gherardini PF, Gatti E, Incani O, Helmer-Citterich M. Structural motifs recurring in different folds recognize the same ligand fragments. BMC Bioinformatics 2009; 10:182. [PMID: 19527512 PMCID: PMC2704211 DOI: 10.1186/1471-2105-10-182] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2009] [Accepted: 06/15/2009] [Indexed: 12/11/2022] Open
Abstract
Background The structural analysis of protein ligand binding sites can provide information relevant for assigning functions to unknown proteins, to guide the drug discovery process and to infer relations among distant protein folds. Previous approaches to the comparative analysis of binding pockets have usually been focused either on the ligand or the protein component. Even though several useful observations have been made with these approaches they both have limitations. In the former case the analysis is restricted to binding pockets interacting with similar ligands, while in the latter it is difficult to systematically check whether the observed structural similarities have a functional significance. Results Here we propose a novel methodology that takes into account the structure of both the binding pocket and the ligand. We first look for local similarities in a set of binding pockets and then check whether the bound ligands, even if completely different, share a common fragment that can account for the presence of the structural motif. Thanks to this method we can identify structural motifs whose functional significance is explained by the presence of shared features in the interacting ligands. Conclusion The application of this method to a large dataset of binding pockets allows the identification of recurring protein motifs that bind specific ligand fragments, even in the context of molecules with a different overall structure. In addition some of these motifs are present in a high number of evolutionarily unrelated proteins.
Collapse
Affiliation(s)
- Gabriele Ausiello
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, Rome, Italy.
| | | | | | | | | |
Collapse
|
37
|
Potential for protein surface shape analysis using spherical harmonics and 3D Zernike descriptors. Cell Biochem Biophys 2009; 54:23-32. [PMID: 19521674 DOI: 10.1007/s12013-009-9051-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2009] [Accepted: 05/22/2009] [Indexed: 10/20/2022]
Abstract
With structure databases expanding at a rapid rate, the task at hand is to provide reliable clues to their molecular function and to be able to do so on a large scale. This, however, requires suitable encodings of the molecular structure which are amenable to fast screening. To this end, moment-based representations provide a compact and nonredundant description of molecular shape and other associated properties. In this article, we present an overview of some commonly used representations with specific focus on two schemes namely spherical harmonics and their extension, the 3D Zernike descriptors. Key features and differences of the two are reviewed and selected applications are highlighted. We further discuss recent advances covering aspects of shape and property-based comparison at both global and local levels and demonstrate their applicability through some of our studies.
Collapse
|
38
|
Peterson ME, Chen F, Saven JG, Roos DS, Babbitt PC, Sali A. Evolutionary constraints on structural similarity in orthologs and paralogs. Protein Sci 2009; 18:1306-15. [PMID: 19472362 PMCID: PMC2774440 DOI: 10.1002/pro.143] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2008] [Revised: 03/29/2009] [Accepted: 03/30/2009] [Indexed: 11/10/2022]
Abstract
Although a quantitative relationship between sequence similarity and structural similarity has long been established, little is known about the impact of orthology on the relationship between protein sequence and structure. Among homologs, orthologs (derived by speciation) more frequently have similar functions than paralogs (derived by duplication). Here, we hypothesize that an orthologous pair will tend to exhibit greater structural similarity than a paralogous pair at the same level of sequence similarity. To test this hypothesis, we used 284,459 pairwise structure-based alignments of 12,634 unique domains from SCOP as well as orthology and paralogy assignments from OrthoMCL DB. We divided the comparisons by sequence identity and determined whether the sequence-structure relationship differed between the orthologs and paralogs. We found that at levels of sequence identity between 30 and 70%, orthologous domain pairs indeed tend to be significantly more structurally similar than paralogous pairs at the same level of sequence identity. An even larger difference is found when comparing ligand binding residues instead of whole domains. These differences between orthologs and paralogs are expected to be useful for selecting template structures in comparative modeling and target proteins in structural genomics.
Collapse
Affiliation(s)
- Mark E Peterson
- Department of Bioengineering and Therapeutic Sciences, University of CaliforniaSan Francisco, San Francisco, California 94158
- Department of Pharmaceutical Chemistry, University of CaliforniaSan Francisco, San Francisco, California 94158
- California Institute for Quantitative Biosciences, University of CaliforniaSan Francisco, San Francisco, California 94158
| | - Feng Chen
- Department of Chemistry, University of PennsylvaniaPhiladelphia, PA 19104
- Department of Biology and Penn Genomics Institute, University of PennsylvaniaPhiladelphia, PA 19104
| | - Jeffery G Saven
- Department of Chemistry, University of PennsylvaniaPhiladelphia, PA 19104
| | - David S Roos
- Department of Biology and Penn Genomics Institute, University of PennsylvaniaPhiladelphia, PA 19104
| | - Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of CaliforniaSan Francisco, San Francisco, California 94158
- Department of Pharmaceutical Chemistry, University of CaliforniaSan Francisco, San Francisco, California 94158
- California Institute for Quantitative Biosciences, University of CaliforniaSan Francisco, San Francisco, California 94158
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, University of CaliforniaSan Francisco, San Francisco, California 94158
- Department of Pharmaceutical Chemistry, University of CaliforniaSan Francisco, San Francisco, California 94158
- California Institute for Quantitative Biosciences, University of CaliforniaSan Francisco, San Francisco, California 94158
| |
Collapse
|
39
|
Tartaglia GG, Pechmann S, Dobson CM, Vendruscolo M. A relationship between mRNA expression levels and protein solubility in E. coli. J Mol Biol 2009; 388:381-9. [PMID: 19281824 DOI: 10.1016/j.jmb.2009.03.002] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2008] [Revised: 02/26/2009] [Accepted: 03/03/2009] [Indexed: 10/21/2022]
Abstract
Each step in the process of gene expression, from the transcription of DNA into mRNA to the folding and posttranslational modification of proteins, is regulated by complex cellular mechanisms. At the same time, stringent conditions on the physicochemical properties of proteins, and hence on the nature of their amino acids, are imposed by the need to avoid aggregation at the concentrations required for optimal cellular function. A relationship is therefore expected to exist between mRNA expression levels and protein solubility in the cell. By investigating such a relationship, we formulate a method that enables the prediction of the maximal levels of mRNA expression in Escherichia coli with an accuracy of 83% and of the solubility of recombinant human proteins expressed in E. coli with an accuracy of 86%.
Collapse
Affiliation(s)
- Gian Gaetano Tartaglia
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK.
| | | | | | | |
Collapse
|
40
|
Nicola G, Smith CA, Abagyan R. New method for the assessment of all drug-like pockets across a structural genome. J Comput Biol 2008; 15:231-40. [PMID: 18333758 DOI: 10.1089/cmb.2007.0178] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
With the increasing wealth of structural information available for human pathogens, it is now becoming possible to leverage that information to aid in rational selection of targets for inhibitor discovery. We present a methodology for assessing the drugability of all small-molecule binding pockets in a pathogen. Our approach incorporates accurate pocket identification, sequence conservation with a similar organism, sequence conservation with the host, and structure resolution. This novel method is applied to 21 structures from the malarial parasite Plasmodium falciparum. Based on our survey of the structural genome, we selected enoyl-acyl carrier protein reductase (ENR) as a promising candidate for virtual screening based inhibitor discovery.
Collapse
Affiliation(s)
- George Nicola
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, USA
| | | | | |
Collapse
|
41
|
Redfern OC, Dessailly B, Orengo CA. Exploring the structure and function paradigm. Curr Opin Struct Biol 2008; 18:394-402. [PMID: 18554899 DOI: 10.1016/j.sbi.2008.05.007] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2008] [Revised: 04/16/2008] [Accepted: 05/07/2008] [Indexed: 11/29/2022]
Abstract
Advances in protein structure determination, led by the structural genomics initiatives have increased the proportion of novel folds deposited in the Protein Data Bank. However, these structures are often not accompanied by functional annotations with experimental confirmation. In this review, we reassess the meaning of structural novelty and examine its relevance to the complexity of the structure-function paradigm. Recent advances in the prediction of protein function from structure are discussed, as well as new sequence-based methods for partitioning large, diverse superfamilies into biologically meaningful clusters. Obtaining structural data for these functionally coherent groups of proteins will allow us to better understand the relationship between structure and function.
Collapse
Affiliation(s)
- Oliver C Redfern
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
| | | | | |
Collapse
|
42
|
Abstract
The success of the whole genome sequencing projects brought considerable credence to the belief that high-throughput approaches, rather than traditional hypothesis-driven research, would be essential to structurally and functionally annotate the rapid growth in available sequence data within a reasonable time frame. Such observations supported the emerging field of structural genomics, which is now faced with the task of providing a library of protein structures that represent the biological diversity of the protein universe. To run efficiently, structural genomics projects aim to define a set of targets that maximize the potential of each structure discovery whether it represents a novel structure, novel function, or missing evolutionary link. However, not all protein sequences make suitable structural genomics targets: It takes considerably more effort to determine the structure of a protein than the sequence of its gene because of the increased complexity of the methods involved and also because the behavior of targeted proteins can be extremely variable at the different stages in the structural genomics "pipeline." Therefore, structural genomics target selection must identify and prioritize the most suitable candidate proteins for structure determination, avoiding "problematic" proteins while also ensuring the ultimate goals of the project are followed.
Collapse
|
43
|
Hunjan J, Tovchigrechko A, Gao Y, Vakser IA. The size of the intermolecular energy funnel in protein-protein interactions. Proteins 2008; 72:344-52. [PMID: 18214966 DOI: 10.1002/prot.21930] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Revealing the fundamental principles of protein interactions is essential for the basic knowledge of molecular processes and designing better predictive tools. Protein docking procedures allow systematic sampling of intermolecular energy landscapes, revealing the distribution of energy basins and their characteristics. A systematic search docking procedure GRAMM-X was applied to a comprehensive nonredundant database of nonobligate protein-protein complexes to determine the size of the intermolecular energy funnel. The unbound structures were simulated using rotamer library. The procedure generated grid-based matches, based on a smoothed Lennard-Jones potential, and minimized them off the grid with the same potential. The minimization generated a distribution of distances, based on a variety of metrics, between the grid-based and the minimized matches. The metric selected for the analysis, ligand interface RMSD, provided three independent estimates of the funnel size: based on the distribution amplitude for the near-native matches, deviation from random, and correlation with the energy values. The three methods converge to similar estimates of approximately 6-8 A ligand interface RMSD. The results indicated dependence of the funnel size on the type of the complex (smaller for antigen-antibody, medium for enzyme-inhibitor, and larger for the rest of the complexes) and the funnel size correlation with the size of the interface. Guidelines for the optimal sampling of docking coordinates, based on the funnel size estimates, were explored.
Collapse
Affiliation(s)
- Jagtar Hunjan
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas 66047, USA
| | | | | | | |
Collapse
|
44
|
Powers R, Mercier KA, Copeland JC. The application of FAST-NMR for the identification of novel drug discovery targets. Drug Discov Today 2008; 13:172-9. [PMID: 18275915 DOI: 10.1016/j.drudis.2007.11.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2007] [Revised: 10/30/2007] [Accepted: 11/01/2007] [Indexed: 10/22/2022]
Abstract
The continued success of genome sequencing projects has resulted in a wealth of information, but 40-50% of identified genes correspond to hypothetical proteins or proteins of unknown function. The functional annotation screening technology by NMR (FAST-NMR) screen was developed to assign a biological function for these unannotated proteins with a structure solved by the protein structure initiative. FAST-NMR is based on the premise that a biological function can be described by a similarity in binding sites and ligand interactions with proteins of known function. The resulting co-structure and functional assignment may provide a starting point for a drug discovery effort.
Collapse
Affiliation(s)
- Robert Powers
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE 68522, USA.
| | | | | |
Collapse
|
45
|
An approach to quality management in structural biology: Biophysical selection of proteins for successful crystallization. J Struct Biol 2008; 162:451-9. [DOI: 10.1016/j.jsb.2008.03.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2007] [Revised: 03/05/2008] [Accepted: 03/06/2008] [Indexed: 11/23/2022]
|
46
|
Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS One 2008; 3:e2136. [PMID: 18461181 PMCID: PMC2362850 DOI: 10.1371/journal.pone.0002136] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2008] [Accepted: 03/25/2008] [Indexed: 12/01/2022] Open
Abstract
Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is mushrooming, however, we asked here whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). Therefore a series of algorithms was built to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) to assess their geometric and evolutionary similarity to other structures; and (c) to transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had only been 80% accurate and was not scalable, here a speedy new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively—in one case even identifying an annotation error—while maintaining sensitivity (∼60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes and reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on first pass and 62 more on second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
Collapse
Affiliation(s)
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Tuan A. Tran
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - David M. Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- * E-mail:
| |
Collapse
|
47
|
Xie L, Bourne PE. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc Natl Acad Sci U S A 2008; 105:5441-6. [PMID: 18385384 PMCID: PMC2291117 DOI: 10.1073/pnas.0704422105] [Citation(s) in RCA: 209] [Impact Index Per Article: 13.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2007] [Indexed: 11/18/2022] Open
Abstract
Here, a scalable, accurate, reliable, and robust protein functional site comparison algorithm is presented. The key components of the algorithm consist of a reduced representation of the protein structure and a sequence order-independent profile-profile alignment (SOIPPA). We show that SOIPPA is able to detect distant evolutionary relationships in cases where both a global sequence and structure relationship remains obscure. Results suggest evolutionary relationships across several previously evolutionary distinct protein structure superfamilies. SOIPPA, along with an increased coverage of protein fold space afforded by the structural genomics initiative, can be used to further test the notion that fold space is continuous rather than discrete.
Collapse
Affiliation(s)
- Lei Xie
- *San Diego Supercomputer Center and
| | - Philip E. Bourne
- *San Diego Supercomputer Center and
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093
| |
Collapse
|
48
|
Overton IM, Padovani G, Girolami MA, Barton GJ. ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction. Bioinformatics 2008; 24:901-7. [DOI: 10.1093/bioinformatics/btn055] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
49
|
Piedra D, Lois S, de la Cruz X. Preservation of protein clefts in comparative models. BMC STRUCTURAL BIOLOGY 2008; 8:2. [PMID: 18199319 PMCID: PMC2249585 DOI: 10.1186/1472-6807-8-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/29/2007] [Accepted: 01/16/2008] [Indexed: 11/29/2022]
Abstract
BACKGROUND Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Given that the quality of a model may vary greatly, several studies have been devoted to identifying the factors that influence modelling results. These studies usually consider the protein as a whole, and only a few provide a separate discussion of the behaviour of biologically relevant features of the protein. Given the value of the latter for many applications, here we extended previous work by analysing the preservation of native protein clefts in homology models. We chose to examine clefts because of their role in protein function/structure, as they are usually the locus of protein-protein interactions, host the enzymes' active site, or, in the case of protein domains, can also be the locus of domain-domain interactions that lead to the structure of the whole protein. RESULTS We studied how the largest cleft of a protein varies in comparative models. To this end, we analysed a set of 53507 homology models that cover the whole sequence identity range, with a special emphasis on medium and low similarities. More precisely we examined how cleft quality - measured using six complementary parameters related to both global shape and local atomic environment, depends on the sequence identity between target and template proteins. In addition to this general analysis, we also explored the impact of a number of factors on cleft quality, and found that the relationship between quality and sequence identity varies depending on cleft rank amongst the set of protein clefts (when ordered according to size), and number of aligned residues. CONCLUSION We have examined cleft quality in homology models at a range of seq.id. levels. Our results provide a detailed view of how quality is affected by distinct parameters and thus may help the user of comparative modelling to determine the final quality and applicability of his/her cleft models. In addition, the large variability in model quality that we observed within each sequence bin, with good models present even at low sequence identities (between 20% and 30%), indicates that properly developed identification methods could be used to recover good cleft models in this sequence range.
Collapse
Affiliation(s)
- David Piedra
- Institut de Recerca Biomèdica, C/Josep Samitier, 1-5, 08028 Barcelona, Spain
| | - Sergi Lois
- Institut de Recerca Biomèdica, C/Josep Samitier, 1-5, 08028 Barcelona, Spain
- Instituto de Biología Molecular de Barcelona, CID, Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
| | - Xavier de la Cruz
- Institut de Recerca Biomèdica, C/Josep Samitier, 1-5, 08028 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
Collapse
|
50
|
Kristensen DM, Ward RM, Lisewski AM, Erdin S, Chen BY, Fofanov VY, Kimmel M, Kavraki LE, Lichtarge O. Prediction of enzyme function based on 3D templates of evolutionarily important amino acids. BMC Bioinformatics 2008; 9:17. [PMID: 18190718 PMCID: PMC2219985 DOI: 10.1186/1471-2105-9-17] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2007] [Accepted: 01/11/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates - structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates. RESULTS Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission digits as the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required to both geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable. CONCLUSION These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.
Collapse
Affiliation(s)
- David M Kristensen
- Department of Molecular and Human Genetics, Biophysics, Baylor College of Medicine, Houston, TX 77030, USA.
| | | | | | | | | | | | | | | | | |
Collapse
|