1
Porter LL. Fluid protein fold space and its implications. Bioessays 2023; 45:e2300057. [PMID: 37431685 PMCID: PMC10529699 DOI: 10.1002/bies.202300057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/21/2023] [Accepted: 06/23/2023] [Indexed: 07/12/2023]
Abstract
Fold-switching proteins, which remodel their secondary and tertiary structures in response to cellular stimuli, suggest a new view of protein fold space. For decades, experimental evidence has indicated that protein fold space is discrete: dissimilar folds are encoded by dissimilar amino acid sequences. Challenging this assumption, fold-switching proteins interconnect discrete groups of dissimilar protein folds, making protein fold space fluid. Three recent observations support the concept of fluid fold space: (1) some amino acid sequences interconvert between folds with distinct secondary structures, (2) some naturally occurring sequences have switched folds by stepwise mutation, and (3) fold switching is evolutionarily selected and likely confers advantage. These observations indicate that minor amino acid sequence modifications can transform protein structure and function. Consequently, proteomic structural and functional diversity may be expanded by alternative splicing, small nucleotide polymorphisms, post-translational modifications, and modified translation rates.
Affiliation(s)
- Lauren L. Porter
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD
2
Koehler Leman J, Szczerbiak P, Renfrew PD, Gligorijevic V, Berenberg D, Vatanen T, Taylor BC, Chandler C, Janssen S, Pataki A, Carriero N, Fisk I, Xavier RJ, Knight R, Bonneau R, Kosciolek T. Sequence-structure-function relationships in the microbial protein universe. Nat Commun 2023; 14:2351. [PMID: 37100781 PMCID: PMC10133388 DOI: 10.1038/s41467-023-37896-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/18/2022] [Accepted: 04/05/2023] [Indexed: 04/28/2023]
Abstract
For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don't rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regards to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function based meta-omics analyses.
Affiliation(s)
- Julia Koehler Leman
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA.
- Department of Biology, New York University, New York, NY, USA.
- Pawel Szczerbiak
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- P Douglas Renfrew
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Vladimir Gligorijevic
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
- Daniel Berenberg
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
- Center for Data Science, New York University, New York, NY, 10011, USA
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, USA
- Tommi Vatanen
- Broad Institute, Cambridge, MA, USA
- Liggins Institute, University of Auckland, Auckland, New Zealand
- Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, 00014 University of Helsinki, Helsinki, Finland
- Bryn C Taylor
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- In Silico Discovery and External Innovation, Janssen Research and Development, San Diego, CA, 92122, USA
- Chris Chandler
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Stefan Janssen
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92093, USA
- Algorithmic Bioinformatics, Justus Liebig University Giessen, Giessen, Germany
- Andras Pataki
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Nick Carriero
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Ian Fisk
- Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
- Ramnik J Xavier
- Broad Institute, Cambridge, MA, USA
- Center for Microbiome Informatics and Therapeutics, MIT, Cambridge, MA, 02139, USA
- Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, 92093, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California, San Diego, USA
- Richard Bonneau
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Department of Biology, New York University, New York, NY, USA
- Center for Data Science, New York University, New York, NY, 10011, USA
- Courant Institute of Mathematical Sciences, Department of Computer Science, New York University, New York, NY, USA
- Prescient Design, a Genentech accelerator, New York, NY, 10010, USA
- Tomasz Kosciolek
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.
3
Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023; 98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]
Abstract
Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus, facilitate improved protein structure design and prediction.
Affiliation(s)
- Janan Sykes
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Barbara R Holland
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Michael A Charleston
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
4
Pražnikar J, Attygalle NT. Quantitative analysis of visual codewords of a protein distance matrix. PLoS One 2022; 17:e0263566. [PMID: 35120181 PMCID: PMC8815937 DOI: 10.1371/journal.pone.0263566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2021] [Accepted: 01/24/2022] [Indexed: 12/02/2022]
Abstract
3D protein structures can be analyzed using a distance matrix calculated as the pairwise distance between all Cα atoms in the protein model. Although researchers have efficiently used distance matrices to classify proteins and find homologous proteins, much less work has been done on quantitative analysis of distance matrix features. Therefore, the distance matrix was analyzed as a grayscale image using the KAZE feature extractor algorithm with a Bag of Visual Words model. In this study, each protein was represented as a histogram of visual codewords. The analysis showed that a very small number of codewords (~1%) have a high relative frequency (> 0.25) and that the majority of codewords have a relative frequency around 0.05. We have also shown that there is a relationship between the frequency of codewords and the position of the features in a distance matrix. The codewords that are more frequent are located closer to the main diagonal. Less frequent codewords, on the other hand, are located in the corners of the distance matrix, far from the main diagonal. Moreover, the analysis showed a correlation between the number of unique codewords and the 3D repeats in the protein structure. Solenoid and tandem-repeat proteins have a significantly lower number of unique codewords than globular proteins. Finally, the codeword histograms and a Support Vector Machine (SVM) classifier were used to classify solenoid and globular proteins. The results showed that the SVM classifier fed with codeword histograms correctly classified 352 out of 354 proteins.
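The first step described in this abstract, building a Cα distance matrix and viewing it as a grayscale image, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `ca_distance_matrix` and `to_grayscale` are hypothetical, and the linear rescaling to 8-bit gray levels is an assumption, since the abstract does not say how distances were mapped to pixel intensities. Feature extraction (KAZE) and the codeword histogram are not shown.

```python
import numpy as np


def ca_distance_matrix(ca_coords):
    """Return the symmetric matrix of pairwise Euclidean distances
    between C-alpha atoms, given an (n_residues, 3) coordinate array."""
    coords = np.asarray(ca_coords, dtype=float)
    # Broadcasting yields an (n, n, 3) array of coordinate differences.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def to_grayscale(dist):
    """Linearly rescale a distance matrix to 8-bit gray levels.
    ASSUMPTION: the scaling used in the paper is not given in the abstract."""
    max_dist = dist.max()
    if max_dist == 0:
        return np.zeros(dist.shape, dtype=np.uint8)
    return np.round(dist / max_dist * 255).astype(np.uint8)


# Toy coordinates (in angstroms) chosen so the distances are easy to check.
toy_ca = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 12.0]]
dist = ca_distance_matrix(toy_ca)   # off-diagonal distances: 5, 12, 13
image = to_grayscale(dist)          # largest distance maps to gray level 255
```

Residues close in sequence sit near the main diagonal of such a matrix, which is consistent with the abstract's observation that the most frequent codewords are found there.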
Affiliation(s)
- Jure Pražnikar
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
- Department of Biochemistry, Molecular and Structural Biology, Institute Jožef Stefan, Ljubljana, Slovenia
- Nuwan Tharanga Attygalle
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
5
Carrillo-Cabada H, Benson J, Razavi AM, Mulligan B, Cuendet MA, Weinstein H, Taufer M, Estrada T. A Graphic Encoding Method for Quantitative Classification of Protein Structure and Representation of Conformational Changes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1336-1349. [PMID: 31603792 PMCID: PMC9119144 DOI: 10.1109/tcbb.2019.2945291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
In order to successfully predict a protein's function throughout its trajectory, in addition to uncovering changes in its conformational state, it is necessary to employ techniques that maintain its 3D information while performing at scale. We extend a protein representation that encodes secondary and tertiary structure into fixed-size color images, and a neural network architecture (called GEM-net) that leverages our encoded representation. We show the applicability of our method in two ways: (1) performing protein function prediction, hitting accuracy between 78 and 83 percent, and (2) visualizing and detecting conformational changes in protein trajectories during molecular dynamics simulations.
6
Cao Y, Das P, Chenthamarakshan V, Chen PY, Melnyk I, Shen Y. Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design. Proceedings of Machine Learning Research 2021; 139:1261-1271. [PMID: 34423306 PMCID: PMC8375603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Designing novel protein sequences for a desired 3D topological fold is a fundamental yet nontrivial task in protein engineering. Challenges exist due to the complex sequence-fold relationship, as well as the difficulties to capture the diversity of the sequences (therefore structures and functions) within a fold. To overcome these challenges, we propose Fold2Seq, a novel transformer-based generative framework for designing protein sequences conditioned on a specific target fold. To model the complex sequence-structure relationship, Fold2Seq jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. On test sets with single, high-resolution and complete structure inputs for individual folds, our experiments demonstrate improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design, when compared to existing state-of-the-art methods that include data-driven deep generative models and physics-based RosettaDesign. The unique advantages of fold-based Fold2Seq, in comparison to a structure-based deep model and RosettaDesign, become more evident on three additional real-world challenges originating from low-quality, incomplete, or ambiguous input structures. Source code and data are available at https://github.com/IBM/fold2seq.
Affiliation(s)
- Yue Cao
- IBM Research
- Texas A&M University
7
Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021; 68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]
Abstract
Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.
8
Shukla P, Verma S, Kumar M. A rotation based regularization method for semi-supervised learning. Pattern Anal Appl 2021; 24:887-905. [PMID: 33424433 PMCID: PMC7781196 DOI: 10.1007/s10044-020-00947-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2019] [Accepted: 12/09/2020] [Indexed: 12/01/2022]
Abstract
In manifold learning, the intrinsic geometry of the manifold is explored and preserved by identifying the optimal local neighborhood around each observation. It is well known that when a Riemannian manifold is unfolded correctly, the observations lying spatially near to the manifold should remain near on the lower dimension as well. Due to the nonlinear properties of the manifold around each observation, finding such an optimal neighborhood on the manifold is a challenge. Thus, a sub-optimal neighborhood may lead to erroneous representation and incorrect inferences. In this paper, we propose a rotation-based affinity metric for accurate graph Laplacian approximation. It exploits the property of aligned tangent spaces of observations in an optimal neighborhood to approximate correct affinity between them. Extensive experiments on both synthetic and real-world datasets have been performed. It is observed that the proposed method outperforms existing nonlinear dimensionality reduction techniques in low-dimensional representation for synthetic datasets. The results on real-world datasets like COVID-19 prove that our approach increases the accuracy of classification by enhancing Laplacian regularization.
Affiliation(s)
- Prashant Shukla
- Department of Information Technology, Indian Institute of Information Technology Allahabad, Deoghat, Jhalwa, Allahabad, U.P. 211012 India
- Shekhar Verma
- Department of Information Technology, Indian Institute of Information Technology Allahabad, Deoghat, Jhalwa, Allahabad, U.P. 211012 India
- Manish Kumar
- Department of Information Technology, Indian Institute of Information Technology Allahabad, Deoghat, Jhalwa, Allahabad, U.P. 211012 India
9
Karimi M, Zhu S, Cao Y, Shen Y. De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks. J Chem Inf Model 2020; 60:5667-5681. [PMID: 32945673 PMCID: PMC7775287 DOI: 10.1021/acs.jcim.0c00593] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 01/04/2023]
Abstract
Although massive data is quickly accumulating on protein sequence and structure, there is a small and limited number of protein architectural types (or structural folds). This study is addressing the following question: how well could one reveal underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? In response to the question, we have developed novel deep generative models, namely, semisupervised gcWGAN (guided, conditional, Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design qualities, we build our models on conditional Wasserstein GAN (WGAN) that uses Wasserstein distance in the loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the conditional input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into WGAN as a loss to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds compared to a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one not even part of basis folds, gcWGAN designs have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign), through generating design seeds and tailoring design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data. 
Data, source codes, and trained models are available at https://github.com/Shen-Lab/gcWGAN.
Affiliation(s)
- Mostafa Karimi
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
- Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yue Cao
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
10
Exploring Protein Fold Space. Biomolecules 2020; 10:biom10020193. [PMID: 32012781 PMCID: PMC7072414 DOI: 10.3390/biom10020193] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/23/2019] [Revised: 01/22/2020] [Accepted: 01/24/2020] [Indexed: 11/17/2022]
Abstract
The model of protein folding proposed by Ptitsyn and colleagues involves the accretion of secondary structures around a nucleus. As developed by Efimov, this model also provides a useful way to view the relationships among structures. Although somewhat eclipsed by later databases based on the pairwise comparison of structures, Efimov’s approach provides a guide for the more automatic comparison of proteins based on an encoding of their topology as a string. Being restricted to layers of secondary structures based on beta sheets, this too has limitations which are partly overcome by moving to a more generalised secondary structure lattice that can encompass both open and closed (barrel) sheets as well as helical packing of the type encoded by Murzin and Finkelstein on small polyhedra. Regular (crystalline) lattices, such as close-packed hexagonals, were found to be too limited, so pseudo-lattices were investigated including those found in quasicrystals and the Bernal tetrahedron-based lattice that he used to represent liquid water. The Bernal lattice was considered best and used to generate model protein structures. These were much more numerous than those seen in Nature, posing the open question of why this might be.
11
A global map of the protein shape universe. PLoS Comput Biol 2019; 15:e1006969. [PMID: 30978181 PMCID: PMC6481876 DOI: 10.1371/journal.pcbi.1006969] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/15/2018] [Revised: 04/24/2019] [Accepted: 03/20/2019] [Indexed: 11/19/2022]
Abstract
Proteins are involved in almost all functions in a living cell, and functions of proteins are realized by their tertiary structures. Obtaining a global perspective of the variety and distribution of protein structures lays a foundation for our understanding of the building principle of protein structures. In light of the rapid accumulation of low-resolution structure data from electron tomography and cryo-electron microscopy, here we map and classify three-dimensional (3D) surface shapes of proteins into a similarity space. Surface shapes of proteins were represented with 3D Zernike descriptors, mathematical moment-based invariants, which have previously been demonstrated effective for biomolecular structure similarity search. In addition to single chains of proteins, we have also analyzed the shape space occupied by protein complexes. From the mapping, we have obtained various new insights into the relationship between shapes, main-chain folds, and complex formation. The unique view obtained from shape mapping opens up new ways to understand design principles, functions, and evolution of proteins. Proteins are the major molecules involved in almost all cellular processes. In this work, we present a novel mapping of protein shapes that represents the variety and the similarities of 3D shapes of proteins and their assemblies. This mapping provides various novel insights into protein shapes including determinant factors of protein 3D shapes, which enhance our understanding of the design principles of protein shapes. The mapping will also be a valuable resource for artificial protein design as well as a reference for classifying medium- to low-resolution protein structure images determined by cryo-electron microscopy and tomography.
12
13
What Can We Learn from Wide-Angle Solution Scattering? Advances in Experimental Medicine and Biology 2018; 1009:131-147. [PMID: 29218557 DOI: 10.1007/978-981-10-6038-0_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
Abstract
Extending collection of X-ray solution scattering data into the wide-angle regime (WAXS) can provide information not readily extracted from small-angle (SAXS) data. It is possible to accurately predict WAXS scattering on the basis of atomic coordinate sets and thus use it as a means of testing molecular models constructed on the basis of crystallography, molecular dynamics (MD), cryo-electron microscopy or ab initio modeling. WAXS data may provide insights into the secondary, tertiary and quaternary structural organization of macromolecules. It can provide information on protein folding and unfolding beyond that attainable from SAXS data. It is particularly sensitive to structural fluctuations in macromolecules and can be used to generate information about the conformational makeup of ensembles of structures co-existing in solution. Novel approaches to modeling of structural fluctuations can provide information on the spatial extent of large-scale structural fluctuations that are difficult to obtain by other means. Direct comparisons with the results of MD simulations are becoming possible. Because it is particularly sensitive to small changes in structure and flexibility, it provides unique capabilities for the screening of ligand libraries for detection of functional interactions. WAXS thereby provides an important extension of SAXS that can generate structural and dynamic information complementary to that obtainable by other biophysical techniques.
14
Rajendran S, Jothi A. Sequentially distant but structurally similar proteins exhibit fold specific patterns based on their biophysical properties. Comput Biol Chem 2018; 75:143-153. [PMID: 29783123 DOI: 10.1016/j.compbiolchem.2018.05.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/25/2017] [Revised: 05/06/2018] [Accepted: 05/07/2018] [Indexed: 11/25/2022]
Abstract
The three-dimensional structure of a protein depends on the interactions between its amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold, some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions arise from the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at such a property level. In this line, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity less than 40% were selected for ten different subfolds from three different mainfolds (according to CATH classification) and were used for this analysis. We used the normalized values of the 49 physicochemical, energetic and conformational properties of amino acids. We characterize the folds based on the average biophysical property values. We also observed a fold-specific correlational behavior of biophysical properties despite a very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes, NB; Support Vector Machines, SVM; and Bayesian Generalized Linear Model, BGLM) which could discriminate mainfold based on the biophysical properties. We also show that among the three generated models, the BGLM classifier model was able to discriminate protein sequences in the all-beta category with 81.43% accuracy and all-alpha and alpha-beta proteins with 83.37% accuracy.
Affiliation(s)
- Senthilnathan Rajendran
- Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed University, Thanjavur, Tamil Nadu, 613401, India.
- Arunachalam Jothi
- Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed University, Thanjavur, Tamil Nadu, 613401, India.
15
A proteome view of structural, functional, and taxonomic characteristics of major protein domain clusters. Sci Rep 2017; 7:14210. [PMID: 29079755 PMCID: PMC5660162 DOI: 10.1038/s41598-017-13297-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/18/2016] [Accepted: 09/21/2017] [Indexed: 12/28/2022]
Abstract
Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis on presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering a complete set of proteins, for the three superkingdoms of life, Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters, which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit structural and functional properties that are taxa-characteristic. For example, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains created early in evolution and retained in living organisms and characterized by basic cellular functions and ancient structural architectures, while an Archaea and Eukarya bi-superkingdom cluster suggests its PDs may have existed in the ancestor of the two superkingdoms, and others are single superkingdom- or taxa (e.g. Fungi)-specific. These results increase our appreciation of PD diversity and our knowledge of how PDs are used in species, with implications for species evolution.
16
Lee J, Konc J, Janežič D, Brooks BR. Global organization of a binding site network gives insight into evolution and structure-function relationships of proteins. Sci Rep 2017; 7:11652. [PMID: 28912495 PMCID: PMC5599562 DOI: 10.1038/s41598-017-10412-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/26/2017] [Accepted: 08/07/2017] [Indexed: 01/06/2023]
Abstract
The global organization of protein binding sites is analyzed by constructing a weighted network of binding sites based on their structural similarities and detecting communities of structurally similar binding sites based on the minimum description length principle. The analysis reveals that there are two central binding site communities that play the roles of the network hubs of smaller peripheral communities. The sizes of communities follow a power-law distribution, which indicates that the binding sites included in larger communities may be older and have been evolutionary structural scaffolds of more recent ones. Structurally similar binding sites in the same community bind to diverse ligands promiscuously and they are also embedded in diverse domain structures. Understanding the general principles of binding site interplay will pave the way for improved drug design and protein design.
Affiliation(s)
- Juyong Lee
- Department of Chemistry, Kangwon National University, 1 Kangwondaehak-gil, Chuncheon, 24341, Republic of Korea; Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, 20892, United States
- Janez Konc
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000, Koper, Slovenia; National Institute of Chemistry, Hajdrihova 19, SI-1000, Ljubljana, Slovenia
- Dušanka Janežič
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000, Koper, Slovenia
- Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, 20892, United States
|
17
|
Garland J. Unravelling the complexity of signalling networks in cancer: A review of the increasing role for computational modelling. Crit Rev Oncol Hematol 2017; 117:73-113. [PMID: 28807238 DOI: 10.1016/j.critrevonc.2017.06.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 11/25/2016] [Revised: 06/01/2017] [Accepted: 06/08/2017] [Indexed: 02/06/2023] Open
Abstract
Cancer induction is a highly complex process involving hundreds of different inducers but whose eventual outcome is the same. Clearly, it is essential to understand how signalling pathways and networks generated by these inducers interact to regulate cell behaviour and create the cancer phenotype. While enormous strides have been made in identifying key networking profiles, the amount of data generated far exceeds our ability to understand how it all "fits together". The number of potential interactions is astronomically large and requires novel approaches and extreme computation methods to dissect them out. However, such methodologies have high intrinsic mathematical and conceptual content which is difficult to follow. This review explains how computation modelling is progressively finding solutions and also revealing unexpected and unpredictable nano-scale molecular behaviours extremely relevant to how signalling and networking are coherently integrated. It is divided into linked sections illustrated by numerous figures from the literature describing different approaches and offering visual portrayals of networking and major conceptual advances in the field. First, the problem of signalling complexity and data collection is illustrated for only a small selection of known oncogenes. Next, new concepts from biophysics, molecular behaviours, kinetics, organisation at the nano level and predictive models are presented. These areas include: visual representations of networking, Energy Landscapes and energy transfer/dissemination (entropy); diffusion, percolation; molecular crowding; protein allostery; quinary structure and fractal distributions; energy management, metabolism and re-examination of the Warburg effect. The importance of unravelling complex network interactions is then illustrated for some widely-used drugs in cancer therapy whose interactions are very extensive. 
Finally, use of computational modelling to develop micro- and nano- functional models ("bottom-up" research) is highlighted. The review concludes that computational modelling is an essential part of cancer research and is vital to understanding network formation and molecular behaviours that are associated with it. Its role is increasingly essential because it is unravelling the huge complexity of cancer induction otherwise unattainable by any other approach.
Affiliation(s)
- John Garland
- Manchester Interdisciplinary Biocentre, Manchester University, Manchester, UK
|
18
|
Dybas JM, Fiser A. Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins 2016; 84:1859-1874. [PMID: 27671894 PMCID: PMC5118133 DOI: 10.1002/prot.25169] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/31/2016] [Revised: 08/17/2016] [Accepted: 08/25/2016] [Indexed: 11/09/2022]
Abstract
Structure conservation, functional similarities, and homologous relationships that exist across diverse protein topologies suggest that some regions of the protein fold universe are continuous. However, the current structure classification systems are based on hierarchical organizations, which cannot accommodate structural relationships that span fold definitions. Here, we describe a novel, super-secondary-structure motif-based, topology-independent structure comparison method (SmotifCOMP) that is able to quantitatively identify structural relationships between disparate topologies. The basis of SmotifCOMP is a systematically defined super-secondary-structure motif library whose representative geometries are shown to be saturated in the Protein Data Bank and exhibit a unique distribution within the known folds. SmotifCOMP offers a robust and quantitative technique to compare domains that adopt different topologies since the method does not rely on a global superposition. SmotifCOMP is used to perform an exhaustive comparison of the known folds and the identified relationships are used to produce a nonhierarchical representation of the fold space that reflects the notion of a continuous and connected fold universe. The current work offers insight into previously hypothesized evolutionary relationships between disparate folds and provides a resource for exploring novel ones. Proteins 2016; 84:1859-1874. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Joseph M. Dybas
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
|
19
|
Semantic Signature: Comparative Interpretation of Gene Expression on a Semantic Space. Computational and Mathematical Methods in Medicine 2016; 2016:5174503. [PMID: 27242916 PMCID: PMC4868886 DOI: 10.1155/2016/5174503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2015] [Accepted: 03/23/2016] [Indexed: 11/17/2022]
Abstract
Background. Interpretation of microarray data remains challenging because biological meaning must be extracted from enormous numeric matrices and presented explicitly. Moreover, huge public repositories of microarray datasets are ready to be exploited for comparative analysis. This study aimed to provide a platform where the essential implications of a microarray experiment could be visually expressed and various microarray datasets could be intuitively compared. Results. On the semantic space, gene sets from the Molecular Signature Database (MSigDB) were plotted as landmarks and their relative distances were calculated by Lin's semantic similarity measure. By formal concept analysis, a microarray dataset was transformed into a concept lattice with gene clusters as objects and Gene Ontology terms as attributes. Concepts of a lattice were located on the semantic space reflecting semantic distance from landmarks and edges between concepts were drawn; consequently, a specific geographic pattern could be observed from a microarray dataset. We termed a distinctive geography shared by microarray datasets of the same category as “semantic signature.” Conclusions. “Semantic space,” a map of biological entities, could serve as a universal platform for comparative microarray analysis. When microarray data were displayed on the semantic space as concept lattices, “semantic signature,” characteristic geography for a microarray experiment, could be discovered.
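The abstract above names Lin's semantic similarity measure but does not say how it was applied to gene sets. As a minimal sketch of the term-level measure only, Lin's similarity between two ontology terms is twice the information content (IC) of their most informative common ancestor divided by the sum of their own ICs. The toy hierarchy, the term names, and the annotation probabilities below are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parents); not from the paper.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
}

# Hypothetical probability that an annotation falls at or below each term.
PROB = {
    "root": 1.0,
    "metabolism": 0.5,
    "signaling": 0.5,
    "lipid_metabolism": 0.125,
    "sugar_metabolism": 0.25,
}


def ancestors(term):
    """Return the term itself plus every term reachable through PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = log(1 / p(t)); rarer terms carry more information."""
    return math.log(1.0 / PROB[term])


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(best common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a) & ancestors(b)
    best = max(information_content(t) for t in common)
    denom = information_content(a) + information_content(b)
    # Two copies of the root have zero IC; treat them as identical.
    return 2.0 * best / denom if denom > 0 else 1.0


sim = lin_similarity("lipid_metabolism", "sugar_metabolism")
```

A distance for plotting landmarks could then be taken as `1 - sim`, which is one common convention and again an assumption here rather than the paper's stated choice.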
|
20
|
Zhou H, Li S, Makowski L. Visualizing global properties of a molecular dynamics trajectory. Proteins 2015; 84:82-91. [PMID: 26522428 DOI: 10.1002/prot.24957] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/21/2014] [Revised: 08/13/2015] [Accepted: 10/14/2015] [Indexed: 11/10/2022]
Abstract
Molecular dynamics (MD) trajectories are very large data sets that contain substantial information about the dynamic behavior of a protein. Condensing these data into a form that can provide intuitively useful understanding of the molecular behavior during the trajectory is a substantial challenge that has received relatively little attention. Here, we introduce the sigma-r plot, a plot of the standard deviation of intermolecular distances as a function of that distance. This representation of global dynamics contains within a single, one-dimensional plot, the average range of motion between pairs of atoms within a macromolecule. Comparison of sigma-r plots calculated from 10 ns trajectories of proteins representing the four major SCOP fold classes indicates diversity of dynamic behaviors which are recognizably different among the four classes. Differences in domain structure and molecular weight also produce recognizable features in sigma-r plots, reflective of differences in global dynamics. Plots generated from trajectories with progressively increasing simulation time reflect the increased sampling of the structural ensemble as a function of time. Single amino acid replacements can give rise to changes in global dynamics detectable through comparison of sigma-r plots. Dynamic behavior of substructures can be monitored by careful choice of interatomic vectors included in the calculation. These examples provide demonstrations of the utility of the sigma-r plot to provide a simple measure of the global dynamics of a macromolecule.
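A rough sketch of how a sigma-r plot might be computed from a trajectory follows. It takes the distances to be those between pairs of atoms, as the abstract's later sentences suggest. For each atom pair it records the mean distance r and the standard deviation sigma over frames, then averages sigma within bins of r. The function names, the use of the population standard deviation, and the bin width are assumptions made for illustration, not details from the paper.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def sigma_r_points(frames):
    """For every atom pair, return (mean distance, std of distance) over frames.

    frames: list of frames, each a list of (x, y, z) tuples in a fixed atom order.
    """
    n_atoms = len(frames[0])
    points = []
    for i, j in combinations(range(n_atoms), 2):
        distances = [math.dist(frame[i], frame[j]) for frame in frames]
        points.append((mean(distances), pstdev(distances)))
    return points


def bin_sigma_r(points, bin_width=1.0):
    """Average sigma within bins of r; returns (bin centre, mean sigma) pairs."""
    bins = {}
    for r, sigma in points:
        bins.setdefault(int(r // bin_width), []).append(sigma)
    return [((k + 0.5) * bin_width, mean(v)) for k, v in sorted(bins.items())]


# Hypothetical two-frame, three-atom trajectory: atom 2 moves, atoms 0 and 1 do not.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
curve = bin_sigma_r(sigma_r_points(toy_frames))
```

Restricting the pair loop to a chosen subset of atoms would correspond to the abstract's remark about monitoring substructures through the choice of interatomic vectors.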
Affiliation(s)
- Hao Zhou
- Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts
- Shangyang Li
- Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts
- Lee Makowski
- Department of Bioengineering, Northeastern University, Boston, Massachusetts; Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
|
21
|
Machine Learnable Fold Space Representation based on Residue Cluster Classes. Comput Biol Chem 2015; 59 Pt A:1-7. [PMID: 26366526 DOI: 10.1016/j.compbiolchem.2015.07.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/12/2014] [Revised: 07/17/2015] [Accepted: 07/25/2015] [Indexed: 11/21/2022]
Abstract
MOTIVATION: Protein fold space is a conceptual framework where all possible protein folds exist and ideas about protein structure, function and evolution may be analyzed. Classification of protein folds in this space is commonly achieved by using similarity indexes and/or machine learning approaches, each with different limitations. RESULTS: We propose a method for constructing a compact vector space model of protein fold space by representing each protein structure by its residues' local contacts. We developed an efficient method to statistically test for the separability of points in a space and showed that our protein fold space representation is learnable by any machine-learning algorithm. AVAILABILITY: An API is freely available at https://code.google.com/p/pyrcc/.
|
22
|
Edwards H, Deane CM. Structural Bridges through Fold Space. PLoS Comput Biol 2015; 11:e1004466. [PMID: 26372166 PMCID: PMC4570669 DOI: 10.1371/journal.pcbi.1004466] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/20/2015] [Accepted: 07/12/2015] [Indexed: 12/05/2022] Open
Abstract
Several protein structure classification schemes exist that partition the protein universe into structural units called folds. Yet these schemes do not discuss how these units sit relative to each other in a global structure space. In this paper we construct networks that describe such global relationships between folds in the form of structural bridges. We generate these networks using four different structural alignment methods across multiple score thresholds. The networks constructed using the different methods remain a similar distance apart regardless of the probability threshold defining a structural bridge. This suggests that at least some structural bridges are method specific and that any attempt to build a picture of structural space should not be reliant on a single structural superposition method. Despite these differences all representations agree on an organisation of fold space into five principal community structures: all-α, all-β sandwiches, all-β barrels, α/β and α + β. We project estimated fold ages onto the networks and find that not only are the pairings of unconnected folds associated with higher age differences than bridged folds, but this difference increases with the number of networks displaying an edge. We also examine different centrality measures for folds within the networks and how these relate to fold age. While these measures interpret the central core of fold space in varied ways they all identify the disposition of ancestral folds to fall within this core and that of the more recently evolved structures to provide the peripheral landscape. These findings suggest that evolutionary information is encoded along these structural bridges. Finally, we identify four highly central pivotal folds representing dominant topological features which act as key attractors within our landscapes.
Folds are considered to be the structural units which make up the protein universe. Structural classification schemes focus on the assignment and organisation of protein domains into folds. However, they do not suggest how different folds might relate to one another in a global way. We introduce the concept of bridges through fold space: significant similarities between these units. We consider four alignment methods and a dynamic approach to placing these bridges. A greater consensus between these methods cannot be achieved by simply increasing the stringency with which edges are assigned. Instead, we emphasise the importance of considering consensus maps and only report results where there is agreement across all networks. It is possible that a study of the bridges may reveal evolutionary relationships. Based on a phylogenetic analysis of structures, we find that bridges consistently fall between folds which evolved at similar times. Moreover, the landscapes all consist of a core of older folds, with younger structures more often seen at the periphery. Finally we identify four pivotal folds in the landscapes. They contain topological motifs which unite disparate regions of fold space.
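The idea of reporting only bridges that all networks agree on can be illustrated with a small sketch. It treats each alignment method's network as a set of undirected edges between folds, counts how many networks display each edge, and keeps the edges present in every network. The fold labels and method names are hypothetical placeholders, and the paper's actual procedure for assigning edges is not reproduced here.

```python
from collections import Counter


def normalise(edges):
    """Treat each bridge as undirected: (a, b) and (b, a) are the same edge."""
    return {frozenset(edge) for edge in edges}


def edge_support(networks):
    """Count how many networks display each bridge."""
    support = Counter()
    for edges in networks:
        support.update(normalise(edges))
    return support


def consensus_edges(networks):
    """Keep only bridges present in every network."""
    support = edge_support(networks)
    return {edge for edge, count in support.items() if count == len(networks)}


# Hypothetical bridges found by three hypothetical alignment methods.
method_a = [("fold1", "fold2"), ("fold2", "fold3")]
method_b = [("fold2", "fold1"), ("fold3", "fold4")]
method_c = [("fold1", "fold2"), ("fold2", "fold3")]

support = edge_support([method_a, method_b, method_c])
consensus = consensus_edges([method_a, method_b, method_c])
```

The per-edge support count is the quantity the abstract relates to fold age differences ("the number of networks displaying an edge"), so it is kept separate from the stricter all-networks consensus.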
Affiliation(s)
- Hannah Edwards
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Charlotte M. Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
|
23
|
Rackovsky S. Nonlinearities in protein space limit the utility of informatics in protein biophysics. Proteins 2015; 83:1923-8. [PMID: 26315852 DOI: 10.1002/prot.24916] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/26/2015] [Revised: 08/12/2015] [Accepted: 08/20/2015] [Indexed: 11/08/2022]
Abstract
We examine the utility of informatic-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.
Affiliation(s)
- S Rackovsky
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York, 14853; Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, 10029
|
24
|
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Indexed: 01/05/2023] Open
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
|
25
|
Minami S, Sawada K, Chikenji G. How a spatial arrangement of secondary structure elements is dispersed in the universe of protein folds. PLoS One 2014; 9:e107959. [PMID: 25243952 PMCID: PMC4171485 DOI: 10.1371/journal.pone.0107959] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 08/12/2014] [Accepted: 08/18/2014] [Indexed: 11/18/2022] Open
Abstract
It has been known that topologically different proteins of the same class sometimes share the same spatial arrangement of secondary structure elements (SSEs). However, the frequency by which topologically different structures share the same spatial arrangement of SSEs is unclear. It is important to estimate this frequency because it provides both a deeper understanding of the geometry of protein folds and a valuable suggestion for predicting protein structures with novel folds. Here we clarified the frequency with which protein folds share the same SSE packing arrangement with other folds, the types of spatial arrangement of SSEs that are frequently observed across different folds, and the diversity of protein folds that share the same spatial arrangement of SSEs with a given fold, using a protein structure alignment program MICAN, which we have been developing. By performing comprehensive structural comparison of SCOP fold representatives, we found that approximately 80% of protein folds share the same spatial arrangement of SSEs with other folds. We also observed that many protein pairs that share the same spatial arrangement of SSEs belong to the different classes, often with an opposing N- to C-terminal direction of the polypeptide chain. The most frequently observed spatial arrangement of SSEs was the 2-layer α/β packing arrangement and it was dispersed among as many as 27% of SCOP fold representatives. These results suggest that the same spatial arrangements of SSEs are adopted by a wide variety of different folds and that the spatial arrangement of SSEs is highly robust against the N- to C-terminal direction of the polypeptide chain.
Affiliation(s)
- Shintaro Minami
- Department of Complex Systems Science, Nagoya University, Nagoya, Aichi, Japan
- Kengo Sawada
- Department of Applied Physics, Nagoya University, Nagoya, Aichi, Japan
- George Chikenji
- Department of Computational Science and Engineering, Nagoya University, Nagoya, Aichi, Japan
|
26
|
Ben-Tal N, Kolodny R. Representation of the Protein Universe using Classifications, Maps, and Networks. Isr J Chem 2014. [DOI: 10.1002/ijch.201400001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
|
27
|
Shokry AM, Al-Karim S, Ramadan A, Gadallah N, Al Attas SG, Sabir JSM, Hassan SM, Madkour MA, Bressan R, Mahfouz M, Bahieldin A. Detection of a Usp-like gene in Calotropis procera plant from the de novo assembled genome contigs of the high-throughput sequencing dataset. C R Biol 2014; 337:86-94. [PMID: 24581802 DOI: 10.1016/j.crvi.2013.12.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/02/2013] [Accepted: 12/20/2013] [Indexed: 11/18/2022]
Abstract
The wild plant species Calotropis procera (C. procera) has many potential applications and beneficial uses in medicine, industry and ornamental field. It also represents an excellent source of genes for drought and salt tolerance. Genes encoding proteins that contain the conserved universal stress protein (USP) domain are known to provide organisms like bacteria, archaea, fungi, protozoa and plants with the ability to respond to a plethora of environmental stresses. However, information on the possible occurrence of Usp in C. procera is not available. In this study, we uncovered and characterized a one-class A Usp-like (UspA-like, NCBI accession No. KC954274) gene in this medicinal plant from the de novo assembled genome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for Usp sequences were blasted with the recovered de novo assembled contigs. Homology modelling of the deduced amino acids (NCBI accession No. AGT02387) was further carried out using Swiss-Model, accessible via the EXPASY. Superimposition of C. procera USPA-like full sequence model on Thermus thermophilus USP UniProt protein (PDB accession No. Q5SJV7) was constructed using RasMol and Deep-View programs. The functional domains of the novel USPA-like amino acids sequence were identified from the NCBI conserved domain database (CDD) that provide insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM).
Affiliation(s)
- Ahmed M Shokry
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), P.O. Box 80141, Jeddah 21589, Saudi Arabia; Agricultural Genetic Engineering Research Institute (AGERI), Agriculture Research Center (ARC), Giza, Egypt
- Saleh Al-Karim
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), P.O. Box 80141, Jeddah 21589, Saudi Arabia
- Ahmed Ramadan
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), P.O. Box 80141, Jeddah 21589, Saudi Arabia; Agricultural Genetic Engineering Research Institute (AGERI), Agriculture Research Center (ARC), Giza, Egypt
- Nour Gadallah
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), P.O. Box 80141, Jeddah 21589, Saudi Arabia; Genetics and Cytology Department, Genetic Engineering and Biotechnology Division, National Research Center, Dokki, Egypt
- Sanaa G Al Attas
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), P.O. Box 80141, Jeddah 21589, Saudi Arabia
- Jamal S M Sabir
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), P.O. Box 80141, Jeddah 21589, Saudi Arabia
- Sabah M Hassan
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), P.O. Box 80141, Jeddah 21589, Saudi Arabia; Department of Genetics, Faculty of Agriculture, Ain Shams University, Cairo, Egypt
- Magdy A Madkour
- Arid Lands Agricultural Research Institute, Ain Shams University, Cairo, Egypt
- Ray Bressan
- School of Agriculture, Purdue University, West Lafayette, Indiana, USA
- Magdy Mahfouz
- Division of Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Ahmed Bahieldin
- Department of Biological Sciences, Faculty of Science, King Abdulaziz University (KAU), P.O. Box 80141, Jeddah 21589, Saudi Arabia; Department of Genetics, Faculty of Agriculture, Ain Shams University, Cairo, Egypt
|
28
|
Shi JY, Yiu SM, Zhang YN, Chin FYL. Effective moment feature vectors for protein domain structures. PLoS One 2014; 8:e83788. [PMID: 24391828 DOI: 10.1371/journal.pone.0083788] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/02/2013] [Accepted: 11/08/2013] [Indexed: 11/19/2022] Open
Abstract
Image processing techniques have been shown to be useful in studying protein domain structures. The idea is to represent the pairwise distances of any two residues of the structure in a 2D distance matrix (DM). Features and/or submatrices are extracted from this DM to represent a domain. Existing approaches, however, may involve a large number of features (100-400) or complicated mathematical operations. Finding fewer but more effective features is always desirable. In this paper, based on some key observations on DMs, we are able to decompose a DM image into four basic binary images, each representing the structural characteristics of a fundamental secondary structure element (SSE) or a motif in the domain. Using the concept of moments in image processing, we further derive 45 structural features based on the four binary images. Together with 4 features extracted from the basic images, we represent the structure of a domain using 49 features. We show that our feature vectors can represent domain structures effectively in terms of the following. (1) We show a higher accuracy for domain classification. (2) We show a clear and consistent distribution of domains using our proposed structural vector space. (3) We are able to cluster the domains according to our moment features and demonstrate a relationship between structural variation and functional diversity.
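To make the distance-matrix-plus-moments idea concrete, the sketch below builds a residue distance matrix from coordinates, thresholds it into a single binary image, and computes raw and central image moments. The single distance cutoff is a simplification assumed here; the abstract does not give the rules used to obtain its four binary images or its 49 features, so this is not a reimplementation of that method. All function names and the example coordinates are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def binarize(dm, cutoff):
    """Binary image: 1 where two residues lie within the cutoff, else 0."""
    return [[1 if d <= cutoff else 0 for d in row] for row in dm]


def raw_moment(img, p, q):
    """Raw image moment M_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def central_moment(img, p, q):
    """Central moment mu_pq, taken about the image centroid."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00
    return sum(((x - xc) ** p) * ((y - yc) ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


# Hypothetical three-residue chain, 4 units apart along one axis.
toy_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
image = binarize(distance_matrix(toy_coords), cutoff=5.0)
features = [raw_moment(image, 0, 0),
            central_moment(image, 2, 0),
            central_moment(image, 1, 1)]
```

Central moments are unchanged when the pattern is shifted within the image, which is one reason moments are a common choice for compact shape descriptors.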
Affiliation(s)
- Jian-Yu Shi
- School of Life Science, Northwestern Polytechnical University, Xi'an, Shaanxi Province, China; Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Siu-Ming Yiu
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Yan-Ning Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi Province, China
|
29
|
Asarnow D, Singh R. The impact of structural diversity and parameterization on maps of the protein universe. BMC Proc 2013; 7:S1. [PMID: 24565442 PMCID: PMC4029320 DOI: 10.1186/1753-6561-7-s7-s1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Low dimensional maps of protein structure space (MPSS) provide a powerful global representation of all proteins. In such mappings structural relationships are depicted through spatial adjacency of points, each of which represents a molecule. MPSS can help in understanding the local and global topological characteristics of the structure space, as well as elucidate structure-function relationships within and between sets of proteins. A number of meta- and method-dependent parameters are involved in creating MPSS. However, at the state-of-the-art, a systematic investigation of the influence of these parameters on MPSS construction has yet to be carried out. Further, while specific cases in which MPSS out-perform pairwise distances for prediction of functional annotations have been noted, no general explanation for this phenomenon has yet been advanced. METHODS: We address the above questions within the technical context of creating MPSS by utilizing multidimensional scaling (MDS) for obtaining low-dimensional projections of structure alignment distances. RESULTS AND CONCLUSION: MDS is demonstrated as an effective method for construction of MPSS where related structures are co-located, even when their functional and evolutionary proximity cannot be deduced from distributions of pairwise comparisons alone. In particular, we show that MPSS exceed pairwise distance distributions in predictive capability for those annotations of shared function or origin which are characterized by a high level of structural diversity. We also determine the impact of the choice of structure alignment and MDS algorithms on the accuracy of such predictions.
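As an illustration of projecting pairwise structure distances into a low-dimensional map, the sketch below implements classical (Torgerson) MDS in plain Python: square the distances, double-centre them into a Gram matrix, and take the leading eigenvectors by power iteration with deflation. Classical MDS is only one of several MDS variants and is assumed here for simplicity; the abstract compares multiple MDS algorithms without naming them. The distance values are hypothetical stand-ins for structure alignment distances.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Embed n points in `dims` dimensions from an n x n symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J (column means equal row means by symmetry).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Fixed, arbitrary start vector keeps the result deterministic.
        v = [1.0 / (i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return coords  # no variance left to explain
            v = [x / norm for x in w]
        # Rayleigh quotient gives the signed eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 0:
            return coords  # remaining structure is not Euclidean
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Hypothetical distances between four structures (corners of a 3 x 4 rectangle).
toy_dist = [
    [0.0, 3.0, 5.0, 4.0],
    [3.0, 0.0, 4.0, 5.0],
    [5.0, 4.0, 0.0, 3.0],
    [4.0, 5.0, 3.0, 0.0],
]
embedding = classical_mds(toy_dist, dims=2)
```

Power iteration keeps the sketch dependency-free; for real datasets a library eigensolver would be the more practical choice.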
|
30
|
Edwards H, Abeln S, Deane CM. Exploring fold space preferences of new-born and ancient protein superfamilies. PLoS Comput Biol 2013; 9:e1003325. [PMID: 24244135 PMCID: PMC3828129 DOI: 10.1371/journal.pcbi.1003325] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 04/30/2013] [Accepted: 09/23/2013] [Indexed: 11/18/2022] Open
Abstract
The evolution of proteins is one of the fundamental processes that has delivered the diversity and complexity of life we see around ourselves today. While we tend to define protein evolution in terms of sequence level mutations, insertions and deletions, it is hard to translate these processes to a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We apply an annotation of superfamily age to this space and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies when compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate the robustness of our results, using significant variation in the algorithm used to estimate the ages. We present these age estimates as a useful tool to analyse protein populations. In particular, we apply this in a comparison of domains containing Greek key or jelly roll motifs.
Affiliation(s)
- Hannah Edwards
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Sanne Abeln
- Department of Computer Science, Vrije Universiteit, Amsterdam, The Netherlands
- Charlotte M. Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
|
31
|
Singh R, Yang H, Dalziel B, Asarnow D, Murad W, Foote D, Gormley M, Stillman J, Fisher S. Towards human-computer synergetic analysis of large-scale biological data. BMC Bioinformatics 2013; 14 Suppl 14:S10. [PMID: 24267485 PMCID: PMC3851181 DOI: 10.1186/1471-2105-14-s14-s10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Advances in technology have led to the generation of massive amounts of complex and multifarious biological data in areas ranging from genomics to structural biology. The volume and complexity of such data leads to significant challenges in terms of its analysis, especially when one seeks to generate hypotheses or explore the underlying biological processes. At the state-of-the-art, the application of automated algorithms followed by perusal and analysis of the results by an expert continues to be the predominant paradigm for analyzing biological data. This paradigm works well in many problem domains. However, it also is limiting, since domain experts are forced to apply their instincts and expertise such as contextual reasoning, hypothesis formulation, and exploratory analysis after the algorithm has produced its results. In many areas where the organization and interaction of the biological processes is poorly understood and exploratory analysis is crucial, what is needed is to integrate domain expertise during the data analysis process and use it to drive the analysis itself. RESULTS In context of the aforementioned background, the results presented in this paper describe advancements along two methodological directions. First, given the context of biological data, we utilize and extend a design approach called experiential computing from multimedia information system design. This paradigm combines information visualization and human-computer interaction with algorithms for exploratory analysis of large-scale and complex data. 
In the proposed approach, emphasis is laid on: (1) allowing users to directly visualize, interact, experience, and explore the data through interoperable visualization-based and algorithmic components, (2) supporting unified query and presentation spaces to facilitate experimentation and exploration, (3) providing external contextual information by assimilating relevant supplementary data, and (4) encouraging user-directed information visualization, data exploration, and hypotheses formulation. Second, to illustrate the proposed design paradigm and measure its efficacy, we describe two prototype web applications. The first, called XMAS (Experiential Microarray Analysis System), is designed for analysis of time-series transcriptional data. The second system, called PSPACE (Protein Space Explorer), is designed for holistic analysis of structural and structure-function relationships using interactive low-dimensional maps of the protein structure space. Both these systems promote and facilitate human-computer synergy, where cognitive elements such as domain knowledge, contextual reasoning, and purpose-driven exploration are integrated with a host of powerful algorithmic operations that support large-scale data analysis, multifaceted data visualization, and multi-source information integration. CONCLUSIONS The proposed design philosophy combines visualization, algorithmic components, and cognitive expertise into a seamless processing-analysis-exploration framework that facilitates sense-making, exploration, and discovery. Using XMAS, we present case studies that analyze transcriptional data from two highly complex domains: gene expression in the placenta during human pregnancy and reaction of marine organisms to heat stress. With PSPACE, we demonstrate how complex structure-function relationships can be explored. These results demonstrate the novelty, advantages, and distinctions of the proposed paradigm.
Furthermore, the results also highlight how domain insights can be combined with algorithms to discover meaningful knowledge and formulate evidence-based hypotheses during the data analysis process. Finally, user studies against comparable systems indicate that both XMAS and PSPACE deliver results with better interpretability while placing lower cognitive loads on the users. XMAS is available at: http://tintin.sfsu.edu:8080/xmas. PSPACE is available at: http://pspace.info/.
|
32
|
Sequence and structure space model of protein divergence driven by point mutations. J Theor Biol 2013; 330:1-8. [DOI: 10.1016/j.jtbi.2013.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/27/2012] [Revised: 03/07/2013] [Accepted: 03/18/2013] [Indexed: 12/11/2022]
|
33
|
Mach P, Koehl P. Capturing protein sequence-structure specificity using computational sequence design. Proteins 2013; 81:1556-70. [DOI: 10.1002/prot.24307] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/04/2012] [Revised: 03/28/2013] [Accepted: 04/11/2013] [Indexed: 02/05/2023]
Affiliation(s)
- Paul Mach
- Department of Applied Mathematics, Genome Center, University of California, Davis, California 95616
- Patrice Koehl
- Department of Computer Science, Genome Center, University of California, Davis, California 95616
|
34
|
Kolodny R, Kosloff M. From Protein Structure to Function via Computational Tools and Approaches. Isr J Chem 2013. [DOI: 10.1002/ijch.201200078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
|
35
|
Robertson JWF, Kasianowicz JJ, Banerjee S. Analytical Approaches for Studying Transporters, Channels and Porins. Chem Rev 2012; 112:6227-49. [DOI: 10.1021/cr300317z] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022]
Affiliation(s)
- Joseph W. F. Robertson
- Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- John J. Kasianowicz
- Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Soojay Banerjee
- National Institute of Neurological Disorders and Stroke, Bethesda, Maryland 20824, United States
|
36
|
Interaction between soluble and membrane-embedded potassium channel peptides monitored by Fourier transform infrared spectroscopy. PLoS One 2012; 7:e49070. [PMID: 23145073 PMCID: PMC3493504 DOI: 10.1371/journal.pone.0049070] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/31/2012] [Accepted: 10/08/2012] [Indexed: 11/19/2022] Open
Abstract
Recent studies have explored the utility of Fourier transform infrared spectroscopy (FTIR) in dynamic monitoring of soluble protein-protein interactions. Here, we investigated the applicability of FTIR to detect interaction between synthetic soluble and phospholipid-embedded peptides corresponding to, respectively, a voltage-gated potassium (Kv) channel inactivation domain (ID) and S4–S6 of the Shaker Kv channel (KV1; including the S4–S5 linker “pre-inactivation” ID binding site). KV1 was predominantly α-helical at 30°C when incorporated into dimyristoyl-l-α-phosphatidylcholine (DMPC) bilayers. Cooling to induce a shift in DMPC from liquid crystalline to gel phase reversibly decreased KV1 helicity, and was previously shown to partially extrude a synthetic S4 peptide. While no interaction was detected in liquid crystalline DMPC, upon cooling to induce the DMPC gel phase a reversible amide I peak (1633 cm−1) consistent with novel hydrogen bond formation was detected. This spectral shift was not observed for KV1 in the absence of ID (or vice versa), nor when the non-inactivating mutant V7E ID was applied to KV1 under similar conditions. Alteration of salt or redox conditions affected KV1-ID hydrogen bonding in a manner suggesting electrostatic KV1-ID interaction favored by a hairpin conformation for the ID and requiring extrusion of one or more KV1 domains from DMPC, consistent with ID binding to S4–S5. These findings support the utility of FTIR in detecting reversible interactions between soluble and membrane-embedded proteins, with lipid state-sensitivity of the conformation of the latter facilitating control of the interaction.
|
37
|
A mapping of an ensemble of mitochondrial sequences for various organisms into 3D space based on the word composition. Mol Phylogenet Evol 2012; 65:380-9. [DOI: 10.1016/j.ympev.2012.06.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2012] [Revised: 06/01/2012] [Accepted: 06/25/2012] [Indexed: 11/24/2022]
|
38
|
Classification of protein functional surfaces using structural characteristics. Proc Natl Acad Sci U S A 2012; 109:1170-5. [PMID: 22238424 DOI: 10.1073/pnas.1119684109] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
Protein structure and function are closely related, especially in functional surfaces, which are local spatial regions that perform the biological functions. Also, protein structures tend to evolve more slowly than amino acid sequences. We have therefore developed a method to classify proteins using the structures of functional surfaces; we call it protein surface classification (PSC). PSC may reflect functional relationships among proteins and may detect evolutionary relationships among highly divergent sequences. We focused on the surfaces of ligand-bound regions because they represent well-defined structures. Specifically, we used structural attributes to measure similarities between binding surfaces and constructed a PSC library of ~2,000 binding surface types from the bound forms. Using flavin mononucleotide-binding proteins and glycosidases as examples, we show how the evolutionary position of an uncharacterized protein can be defined and its function inferred from the characterized members of the same surface subtype. We found that proteins with the same enzyme nomenclature may be divided into subtypes and that two proteins in the same CATH (Class, Architecture, Topology, Homologous superfamily) fold may belong to two different surface types. In conclusion, our approach complements the sequence-based and fold-domain classifications and has the advantage of associating the shape of a protein with its biological function. As an expandable library, PSC provides a resource of spatial patterns for studying the evolution of protein structure and function.
|
39
|
Wang C, Mao X, Yang A, Niu L, Wang S, Li D, Guo Y, Wang Y, Yang Y, Wang C. Determination of relative binding affinities of labeling molecules with amino acids by using scanning tunneling microscopy. Chem Commun (Camb) 2011; 47:10638-40. [PMID: 21869951 DOI: 10.1039/c1cc12380g] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/21/2022]
Abstract
The binding behaviour of the labeling molecule copper phthalocyanine tetrasulfonate sodium (PcCu(SO₃Na)₄) on the assemblies of representative polyamino acids has been studied by using scanning tunneling microscopy (STM). By directly visualizing the adsorption and distribution of the labeling species on the peptide assemblies in STM images, one could obtain relative binding affinities of the labeling molecule with different amino acid residues.
Affiliation(s)
- Chenxuan Wang
- National Center for Nanoscience and Technology, Beijing 100190, China
|
40
|
Maps of protein structure space reveal a fundamental relationship between protein structure and function. Proc Natl Acad Sci U S A 2011; 108:12301-6. [PMID: 21737750 DOI: 10.1073/pnas.1102727108] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022] Open
Abstract
To study the protein structure-function relationship, we propose a method to efficiently create three-dimensional maps of structure space using a very large dataset of > 30,000 Structural Classification of Proteins (SCOP) domains. In our maps, each domain is represented by a point, and the distance between any two points approximates the structural distance between their corresponding domains. We use these maps to study the spatial distributions of properties of proteins, and in particular those of local vicinities in structure space such as structural density and functional diversity. These maps provide a unique broad view of protein space and thus reveal previously undescribed fundamental properties thereof. At the same time, the maps are consistent with previous knowledge (e.g., domains cluster by their SCOP class) and organize previous observations concerning specific protein folds in a unified, coherent representation. To investigate the function-structure relationship, we measure the functional diversity (using the Gene Ontology controlled vocabulary) in local structural vicinities. Our most striking finding is that functional diversity varies considerably across structure space: The space has a highly diverse region, and diversity abates when moving away from it. Interestingly, the domains in this region are mostly alpha/beta structures, which are known to be the most ancient proteins. We believe that our unique perspective of structure space will open previously undescribed ways of studying proteins, their evolution, and the relationship between their structure and function.
|
41
|
Aita T, Nishigaki K. A visualization of 3D proteome universe: mapping of a proteome ensemble into 3D space based on the protein-structure composition. Mol Phylogenet Evol 2011; 61:484-94. [PMID: 21762784 DOI: 10.1016/j.ympev.2011.06.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/18/2011] [Revised: 06/23/2011] [Accepted: 06/25/2011] [Indexed: 10/18/2022]
Abstract
To visualize a bird's-eye view of an ensemble of proteomes for various species, we recently developed a novel method of mapping a proteome ensemble into three-dimensional (3D) vector space. In this study, the "proteome" is defined as the entire set of all proteins encoded in a genome sequence, and these proteins were dealt with at the level of the SCOP Fold. First, we represented the proteome of a species s by a 1053-dimensional vector x(s), where its length ∣x(s)∣ represents the overall amount of all the SCOP Folds in the proteome, its unit vector x(s)/∣x(s)∣ represents the relative composition of the SCOP Folds in the proteome, and the size of the dimension, 1053, is the number of all possible Folds in the given proteome ensemble. Second, we mapped the vector x(s) to the 3D vector y(s), based on two simple principles: (1) ∣y(s)∣=∣x(s)∣, and (2) the angle between y(s) and y(t) maximally correlates with the angle between x(s) and x(t). We applied the method to a proteome ensemble of 456 species, which were retrieved from the Genomes TO Protein structures and functions (GTOP) database. The mapping succeeded in that the properties of the 1053-dimensional vectors were quantitatively conserved in the 3D vectors. In particular, the angles between vectors before and after the mapping were highly correlated (correlation coefficients were 0.95-0.96). This new mapping method will allow researchers to intuitively interpret the visual information presented in the maps in a highly effective manner.
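The two mapping principles stated in this abstract (length preservation and angle correlation) can be checked numerically for any candidate mapping. The sketch below is not the authors' mapping procedure, which the abstract does not specify. It only evaluates the two criteria on hypothetical toy vectors, and all function and variable names are illustrative assumptions.

```python
import math
from itertools import combinations

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    cos = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos)))

def pairwise_angles(vectors):
    return [angle(u, v) for u, v in combinations(vectors, 2)]

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: 5-dimensional "fold composition" vectors that happen
# to lie in a 3D subspace, so dropping the last two coordinates is an exact map.
high_dim = [[3, 0, 0, 0, 0], [1, 2, 0, 0, 0], [0, 1, 4, 0, 0], [2, 2, 1, 0, 0]]
low_dim = [v[:3] for v in high_dim]

# Criterion (1): each mapped vector keeps the length of its original.
norms_preserved = all(math.isclose(norm(x), norm(y)) for x, y in zip(high_dim, low_dim))
# Criterion (2): pairwise angles before and after the mapping should correlate.
angle_correlation = pearson(pairwise_angles(high_dim), pairwise_angles(low_dim))
```

With real 1053-dimensional data the correlation would fall below 1. The abstract reports values of 0.95-0.96 for the authors' own mapping.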
Affiliation(s)
- Takuyo Aita
- Graduate School of Science and Engineering, Department of Functional Materials Science, Faculty of Engineering, Saitama University, 255 Shimo-okubo, Saitama 338-8570, Japan.
|
42
|
Nguyen MN, Tan KP, Madhusudhan MS. CLICK--topology-independent comparison of biomolecular 3D structures. Nucleic Acids Res 2011; 39:W24-8. [PMID: 21602266 PMCID: PMC3125785 DOI: 10.1093/nar/gkr393] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.7] [Received: 02/26/2011] [Revised: 04/19/2011] [Accepted: 05/03/2011] [Indexed: 01/28/2023] Open
Abstract
Our server, CLICK: http://mspc.bii.a-star.edu.sg/click, is capable of superimposing the 3D structures of any pair of biomolecules (proteins, DNA, RNA, etc.). The server makes use of the Cartesian coordinates of the molecules with the option of using other structural features such as secondary structure, solvent accessible surface area and residue depth to guide the alignment. CLICK first looks for cliques of points (3-7 residues) that are structurally similar in the pair of structures to be aligned. Using these local similarities, a one-to-one equivalence is charted between the residues of the two structures. A least square fit then superimposes the two structures. Our method is especially powerful in establishing protein relationships by detecting similarities in structural subdomains, domains and topological variants. CLICK has been extensively benchmarked and compared with other popular methods for protein and RNA structural alignments. In most cases, CLICK alignments were statistically significantly better in terms of structure overlap. The method also recognizes conformational changes that may have occurred in structural domains or subdomains in one structure with respect to the other. For this purpose, the server produces complementary alignments to maximize the extent of detectable similarity. Various examples showcase the utility of our web server.
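The final step described in this abstract, a least-squares superposition once residue equivalences are known, is commonly done with the Kabsch algorithm. The sketch below assumes that choice. The abstract does not say how CLICK performs the fit, and this is not the CLICK implementation. The function name `kabsch_superpose` and the toy coordinates are hypothetical.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Rigidly fit `mobile` onto `target` (both N x 3, rows already paired)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    target_centroid = target.mean(axis=0)
    p = mobile - mobile.mean(axis=0)
    q = target - target_centroid
    # SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    sign = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, sign]) @ vt
    fitted = p @ rotation + target_centroid
    rmsd = np.sqrt(((fitted - target) ** 2).sum(axis=1).mean())
    return fitted, rmsd

# Toy check: the target is the mobile set rotated about z and translated.
theta = np.pi / 3
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
mobile = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
target = mobile @ rot_z.T + np.array([5.0, -2.0, 1.0])

fitted, rmsd = kabsch_superpose(mobile, target)
```

The function takes the pairing of rows as given. Finding that pairing (the clique-matching stage in the abstract) is a separate problem that this sketch does not address.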
Affiliation(s)
- M. N. Nguyen
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
- K. P. Tan
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
- M. S. Madhusudhan
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
|
43
|
Practical applications of structural genomics technologies for mutagen research. Mutat Res 2011; 722:165-70. [PMID: 21182983 DOI: 10.1016/j.mrgentox.2010.12.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2010] [Accepted: 12/10/2010] [Indexed: 11/23/2022]
Abstract
Here we present a perspective on a range of practical uses of structural genomics for mutagen research. Structural genomics is an overloaded term and requires some definition to bound the discussion; we give a brief description of public and private structural genomics endeavors, along with some of their objectives, their activities, their capabilities, and their limitations. We discuss how structural genomics might impact mutagen research in three different scenarios: at a structural genomics center, at a lab with modest resources that also conducts structural biology research, and at a lab that is conducting mutagen research without in-house experimental structural biology. Applications span functional annotation of single genes or SNPs, to constructing gene networks and pathways, to an integrated systems biology approach. Structural genomics centers can take advantage of systems biology models to select high-value targets for structure determination and in turn extend systems models to better understand systems biology diseases or phenomena. Individual investigator-run structural biology laboratories can collaborate with structural genomics centers, but can also take advantage of technical advances and tools developed by structural genomics centers and can employ a structural genomics approach to advancing biological understanding. Individual investigator-run non-structural biology laboratories can also collaborate with structural genomics centers, possibly influencing targeting decisions, but can also use structure-based annotation tools enabled by the growing coverage of protein fold space provided by structural genomics. Better functional annotation can inform pathway and systems biology models.
|
44
|
Newman J. One plate, two plates, a thousand plates. How crystallisation changes with large numbers of samples. Methods 2011; 55:73-80. [PMID: 21571072 DOI: 10.1016/j.ymeth.2011.04.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 12/24/2010] [Revised: 04/28/2011] [Accepted: 04/29/2011] [Indexed: 11/15/2022] Open
Abstract
Turning commercial lab automation into a high-throughput centre requires an underlying process, and implementing checks to ensure that the process is working as it should. At the Collaborative Crystallisation Centre (C3), protein samples from local, national and international groups are set up in crystallisation screening and optimisation experiments with two thousand 96 well plates being set up each year. During its five years of operation, the C3 has implemented a series of enabling protocols - from simple 'reality checks' to determine if a screen has evaporated during storage to more sophisticated systems such as a sample labelling and tracking system. The most important - and perhaps surprising - lesson has been how much effort is required to effectively communicate between the centre and its clients, as well as between the centre's staff members. It is easy to confuse the concept of 'high throughput' in any field with the idea of setting up an experiment quickly. Although automation can be used to set up a single experiment more rapidly than can be done by hand, the distinguishing feature of a high throughput technology is the sustainability of the increased rate.
Affiliation(s)
- Janet Newman
- CSIRO Material Science and Engineering, 343 Royal Parade, Parkville VIC 3052, Australia.
|
45
|
Pelé J, Abdi H, Moreau M, Thybert D, Chabbert M. Multidimensional scaling reveals the main evolutionary pathways of class A G-protein-coupled receptors. PLoS One 2011; 6:e19094. [PMID: 21544207 PMCID: PMC3081337 DOI: 10.1371/journal.pone.0019094] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 11/04/2010] [Accepted: 03/16/2011] [Indexed: 11/21/2022] Open
Abstract
Class A G-protein-coupled receptors (GPCRs) constitute the largest family of transmembrane receptors in the human genome. Understanding the mechanisms which drove the evolution of such a large family would help understand the specificity of each GPCR sub-family with applications to drug design. To gain evolutionary information on class A GPCRs, we explored their sequence space by metric multidimensional scaling analysis (MDS). Three-dimensional mapping of human sequences shows a non-uniform distribution of GPCRs, organized in clusters that lay along four privileged directions. To interpret these directions, we projected supplementary sequences from different species onto the human space used as a reference. With this technique, we can easily monitor the evolutionary drift of several GPCR sub-families from cnidarians to humans. Results support a model of radiative evolution of class A GPCRs from a central node formed by peptide receptors. The privileged directions obtained from the MDS analysis are interpretable in terms of three main evolutionary pathways related to specific sequence determinants. The first pathway was initiated by a deletion in transmembrane helix 2 (TM2) and led to three sub-families by divergent evolution. The second pathway corresponds to the differentiation of the amine receptors. The third pathway corresponds to parallel evolution of several sub-families in relation with a covarion process involving proline residues in TM2 and TM5. As exemplified with GPCRs, the MDS projection technique is an important tool to compare orthologous sequence sets and to help decipher the mutational events that drove the evolution of protein families.
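Projecting supplementary points onto a reference MDS space, as this abstract describes, can be sketched with the standard out-of-sample formula for classical (metric) MDS. The code below is an assumption-laden illustration, not the authors' pipeline. The function names are hypothetical, random 3D points stand in for sequence distances, and the retained eigenvalues are assumed to be strictly positive.

```python
import numpy as np

def fit_reference_mds(dist, n_components=3):
    """Classical MDS of the reference set; also returns what projection needs."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumes the retained eigenvalues are strictly positive.
    coords = eigvecs * np.sqrt(eigvals)
    return coords, eigvals, np.diag(gram)

def project_supplementary(dist_to_reference, coords, eigvals, gram_diag):
    """Place new points using only their distances to the reference points."""
    d2_new = np.asarray(dist_to_reference, dtype=float) ** 2   # shape (m, n)
    # y = 1/2 * Lambda^-1 * X^T (diag(B) - d^2), applied row by row.
    return 0.5 * (gram_diag - d2_new) @ coords / eigvals

# Hypothetical data: 6 reference points and 2 supplementary points in 3D.
rng = np.random.default_rng(0)
reference = rng.normal(size=(6, 3))
extra = rng.normal(size=(2, 3))
ref_dist = np.linalg.norm(reference[:, None] - reference[None, :], axis=-1)
extra_dist = np.linalg.norm(extra[:, None] - reference[None, :], axis=-1)

coords, eigvals, gram_diag = fit_reference_mds(ref_dist, n_components=3)
placed = project_supplementary(extra_dist, coords, eigvals, gram_diag)
placed_dist = np.linalg.norm(placed[:, None] - coords[None, :], axis=-1)
```

Because the supplementary points never enter the eigendecomposition, the reference map stays fixed however many extra points are projected onto it.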
Affiliation(s)
- Julien Pelé
- CNRS UMR 6214 – INSERM 771, Faculté de Médecine, Angers, France
- Hervé Abdi
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas, United States of America
- Matthieu Moreau
- CNRS UMR 6214 – INSERM 771, Faculté de Médecine, Angers, France
- David Thybert
- CNRS UMR 6214 – INSERM 771, Faculté de Médecine, Angers, France
- Marie Chabbert
- CNRS UMR 6214 – INSERM 771, Faculté de Médecine, Angers, France
|
46
|
Ikebe J, Standley DM, Nakamura H, Higo J. Ab initio simulation of a 57-residue protein in explicit solvent reproduces the native conformation in the lowest free-energy cluster. Protein Sci 2011; 20:187-96. [PMID: 21082745 DOI: 10.1002/pro.553] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
An enhanced conformational sampling method, multicanonical molecular dynamics (McMD), was applied to the ab initio folding of the 57-residue first repeat of human glutamyl-prolyl-tRNA synthetase (EPRS-R1) in explicit solvent. The simulation started from a fully extended structure of EPRS-R1 and did not utilize prior structural knowledge. A canonical ensemble, which is a conformational ensemble thermodynamically probable at an arbitrary temperature, was constructed by reweighting the sampled structures. Conformational clusters were obtained from the canonical ensemble at 300 K, and the largest cluster (i.e., the lowest free-energy cluster), which contained 34% of the structures in the ensemble, was characterized by the highest similarity to the NMR structure relative to all alternative clusters. This lowest free-energy cluster included native-like structures composed of two anti-parallel α-helices. The canonical ensemble at 300 K also showed that a short Gly-containing segment, which adopts an α-helix in the native structure, has a tendency to be structurally disordered. Atomic-level analyses demonstrated clearly that inter-residue hydrophobic interactions drive the helix formation of the Gly-containing segment, and that increasing the hydrophobic contacts accompanies exclusion of water molecules from the vicinity of this segment. This study has shown, for the first time, that the free-energy landscape of a structurally well-ordered protein of about 60 residues is obtainable with an all-atom model in explicit water without prior structural knowledge.
Affiliation(s)
- Jinzen Ikebe
- Graduate School of Frontier Biosciences, Osaka University, Open Laboratories for Advanced Bioscience and Biotechnology, Suita, Osaka 565-0874, Japan
|
47
|
Abstract
It is well known that the set of observed topological arrangements of secondary structures in globular proteins is highly limited. These limitations have been explained as the consequence of several rules of thumb including a strong preference for right-handed connections, against crossing loops and certain beta strand patterns. We present a critical evaluation of the power of these rules to distinguish known from possible topologies in a large set of two- and three-layer protein structures and determine that although these rules are still largely valid, an increasing number of exceptions can be found to many of them. The rules are then used to construct a generalised linear model for assessing the probability of occurrence of an arbitrary topology in the PDB. Application of the model to a large set of topologies generated during structure prediction showed that many had a similar probability of occurrence to known PDB folds.
Affiliation(s)
- Ben Grainger
- Division of Mathematical Biology, National Institute for Medical Research, London, United Kingdom.
|
48
|
Abstract
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
Affiliation(s)
- Vikram Alva
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen 72076, Germany
|
49
|
Carugo O. Clustering tendency in the protein fold space. Bioinformation 2010; 4:347-51. [PMID: 20975898 PMCID: PMC2951670 DOI: 10.6026/97320630004347] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/20/2009] [Accepted: 07/23/2009] [Indexed: 11/23/2022] Open
Abstract
Several non-redundant ensembles of protein three-dimensional structures were analyzed in order to estimate their natural clustering tendency by means of the Cox-Lewis coefficient. It was observed that, although proteins tend to aggregate into different and well-separated groups, some overlap between different clusters occurs. This suggests that classifications based only on structural data cannot allow a systematic classification of proteins. In particular, additional information is needed in order to completely monitor the complex evolutionary relationships between proteins.
Affiliation(s)
- Oliviero Carugo
- Department of General Chemistry, Pavia University, viale Taramelli 12, I-27100 Pavia, Italy and Department of Biomolecular Structural Chemistry, MFPL - Vienna University, Campus Vienna Biocenter 5, A-1030 Vienna, Austria.
|
50
|
Sadowski MI, Taylor WR. Protein structures, folds and fold spaces. J Phys Condens Matter 2010; 22:033103. [PMID: 21386276 DOI: 10.1088/0953-8984/22/3/033103] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
There has been considerable progress towards the goal of understanding the space of possible tertiary structures adopted by proteins. Despite a greatly increased rate of structure determination and a deliberate strategy of sequencing proteins expected to be very different from those already known, it is now rare to see a genuinely new fold, leading to the conclusion that we have seen the majority of natural structural types. The increase in knowledge has also led to a critical examination of traditional fold-based classifications and their meaning for evolution and protein structures. We review these issues and discuss possible solutions.
Affiliation(s)
- Michael I Sadowski
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
|