1
|
Carpentier M, Chomilier J. Protein multiple alignments: sequence-based versus structure-based programs. Bioinformatics 2020; 35:3970-3980. [PMID: 30942864 DOI: 10.1093/bioinformatics/btz236] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2018] [Revised: 03/05/2019] [Accepted: 04/02/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Multiple sequence alignment programs have proved to be very useful and have already been evaluated in the literature yet not alignment programs based on structure or both sequence and structure. In the present article we wish to evaluate the added value provided through considering structures. RESULTS We compared the multiple alignments resulting from 25 programs either based on sequence, structure or both, to reference alignments deposited in five databases (BALIBASE 2 and 3, HOMSTRAD, OXBENCH and SISYPHUS). On the whole, the structure-based methods compute more reliable alignments than the sequence-based ones, and even than the sequence+structure-based programs whatever the databases. Two programs lead, MAMMOTH and MATRAS, nevertheless the performances of MUSTANG, MATT, 3DCOMB, TCOFFEE+TM_ALIGN and TCOFFEE+SAP are better for some alignments. The advantage of structure-based methods increases at low levels of sequence identity, or for residues in regular secondary structures or buried ones. Concerning gap management, sequence-based programs set less gaps than structure-based programs. Concerning the databases, the alignments of the manually built databases are more challenging for the programs. AVAILABILITY AND IMPLEMENTATION All data and results presented in this study are available at: http://wwwabi.snv.jussieu.fr/people/mathilde/download/AliMulComp/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, Paris, France
| | - Jacques Chomilier
- Sorbonne Université, MNHN, CNRS, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), BiBiP, Paris, France
| |
Collapse
|
2
|
Fallaize CJ, Green PJ, Mardia KV, Barber S. Bayesian protein sequence and structure alignment. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Affiliation(s)
| | - Peter J. Green
- University of Bristol UK
- University of Technology Sydney Australia
| | | | | |
Collapse
|
3
|
Basile W, Salvatore M, Bassot C, Elofsson A. Why do eukaryotic proteins contain more intrinsically disordered regions? PLoS Comput Biol 2019; 15:e1007186. [PMID: 31329574 PMCID: PMC6675126 DOI: 10.1371/journal.pcbi.1007186] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2018] [Revised: 08/01/2019] [Accepted: 06/14/2019] [Indexed: 12/12/2022] Open
Abstract
Intrinsic disorder is more abundant in eukaryotic than prokaryotic proteins. Methods predicting intrinsic disorder are based on the amino acid sequence of a protein. Therefore, there must exist an underlying difference in the sequences between eukaryotic and prokaryotic proteins causing the (predicted) difference in intrinsic disorder. By comparing proteins, from complete eukaryotic and prokaryotic proteomes, we show that the difference in intrinsic disorder emerges from the linker regions connecting Pfam domains. Eukaryotic proteins have more extended linker regions, and in addition, the eukaryotic linkers are significantly more disordered, 38% vs. 12-16% disordered residues. Next, we examined the underlying reason for the increase in disorder in eukaryotic linkers, and we found that the changes in abundance of only three amino acids cause the increase. Eukaryotic proteins contain 8.6% serine; while prokaryotic proteins have 6.5%, eukaryotic proteins also contain 5.4% proline and 5.3% isoleucine compared with 4.0% proline and ≈ 7.5% isoleucine in the prokaryotes. All these three differences contribute to the increased disorder in eukaryotic proteins. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. The differences are observed in all phyla, protein families, structural regions and type of protein but are most pronounced in disordered and linker regions. The observation that differences in the abundance of three amino acids cause the difference in disorder between eukaryotic and prokaryotic proteins raises the question: Are amino acid frequencies different in eukaryotic linkers because the linkers are more disordered or do the differences cause the increased disorder? Intrinsic disorder is essential for various functions in eukaryotic cells and is a signature of eukaryotic proteins. Here, we try to understand the origin of the difference in disorder between eukaryotic and prokaryotic proteins. We show that eukaryotic proteins contain more extended linker regions and that these linker regions are significantly more disordered. Further, we show, for the first time, that the difference in disorder originates from a systematic difference in amino acid frequencies between eukaryotic and prokaryotic proteins. Three amino acids contribute to the difference in disorder; serine and proline are more abundant in eukaryotic linkers, while isoleucine is less frequent. These shifts in frequencies are observed in all phyla, protein families, structural regions and type of protein but are most pronounced in disordered and linker regions. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. Anyhow the widespread of the shifts in abundance indicates that the differences are ancient and caused be some yet not fully understood selective difference acting on eukaryotic and prokaryotic proteins.
Collapse
Affiliation(s)
- Walter Basile
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | - Marco Salvatore
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | - Claudio Bassot
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | - Arne Elofsson
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Swedish e-Science Research Center (SeRC), Stockholm, Sweden
- * E-mail:
| |
Collapse
|
4
|
Kryshtafovych A, Adams PD, Lawson CL, Chiu W. Evaluation system and web infrastructure for the second cryo-EM model challenge. J Struct Biol 2018; 204:96-108. [PMID: 30017700 DOI: 10.1016/j.jsb.2018.07.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2018] [Revised: 07/06/2018] [Accepted: 07/10/2018] [Indexed: 01/01/2023]
Abstract
An evaluation system and a web infrastructure were developed for the second cryo-EM model challenge. The evaluation system includes tools to validate stereo-chemical plausibility of submitted models, check their fit to the corresponding density maps, estimate their overall and per-residue accuracy, and assess their similarity to reference cryo-EM or X-ray structures as well as other models submitted in this challenge. The web infrastructure provides a convenient interface for analyzing models at different levels of detail. It includes interactively sortable tables of evaluation scores for different subsets of models and different sublevels of structure organization, and a suite of visualization tools facilitating model analysis. The results are publicly accessible at http://model-compare.emdatabank.org.
Collapse
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA.
| | - Paul D Adams
- Molecular Biophysics & Integrated Bioimaging, LBNL, CA 94720, USA; Department of Bioengineering, University of California Berkeley, CA 94720, USA
| | - Catherine L Lawson
- Institute for Quantitative Biomedicine and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Wah Chiu
- Departments of Bioengineering and Microbiology & Immunology, Stanford University, Stanford, CA 94305-5447, USA; Division of CryoEM and Bioimaging, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
| |
Collapse
|
5
|
Abstract
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has lead to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships.In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
Collapse
|
6
|
Computational Modeling of the Staphylococcal Enterotoxins and Their Interaction with Natural Antitoxin Compounds. Int J Mol Sci 2018; 19:ijms19010133. [PMID: 29301344 PMCID: PMC5796082 DOI: 10.3390/ijms19010133] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2017] [Revised: 12/26/2017] [Accepted: 12/27/2017] [Indexed: 01/08/2023] Open
Abstract
Staphylococcus aureus is an opportunistic bacterium that produces various types of toxins, resulting in serious food poisoning. Staphylococcal enterotoxins (SEs) are heat-stable and resistant to hydrolysis by digestive enzymes, representing a potential hazard for consumers worldwide. In the present study, we used amino-acid sequences encoding SEA and SEB-like to identify their respective template structure and build the three-dimensional (3-D) models using homology modeling method. Two natural compounds, Betulin and 28-Norolean-12-en-3-one, were selected for docking study on the basis of the criteria that they satisfied the Lipinski’s Rule-of-Five. A total of 14 and 13 amino-acid residues were present in the best binding site predicted in the SEA and SEB-like, respectively, using the Computer Atlas of Surface Topology of Proteins (CASTp). Among these residues, the docking study with natural compounds Betulin and 28-Norolean-12-en-3-one revealed that GLN43 and GLY227 in the binding site of the SEA, each formed a hydrogen-bond interaction with 28-Norolean-12-en-3-one; while GLY227 residue established a hydrogen bond with Betulin. In the case of SEB-like, the docking study demonstrated that ASN87 and TYR88 residues in its binding site formed hydrogen bonds with Betulin; whereas HIS59 in the binding site formed a hydrogen-bond interaction with 28-Norolean-12-en-3-one. Our results demonstrate that the toxic effects of these two SEs can be effectively treated with antitoxins like Betulin and 28-Norolean-12-en-3-one, which could provide an effective drug therapy for this pathogen.
Collapse
|
7
|
Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4628592. [PMID: 27747230 PMCID: PMC5056246 DOI: 10.1155/2016/4628592] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/24/2016] [Accepted: 08/25/2016] [Indexed: 11/24/2022]
Abstract
Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10–15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors.
Collapse
|
8
|
ProQ3: Improved model quality assessments using Rosetta energy terms. Sci Rep 2016; 6:33509. [PMID: 27698390 PMCID: PMC5048106 DOI: 10.1038/srep33509] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2016] [Accepted: 08/26/2016] [Indexed: 01/17/2023] Open
Abstract
Quality assessment of protein models using no other information than the structure of the model itself has been shown to be useful for structure prediction. Here, we introduce two novel methods, ProQRosFA and ProQRosCen, inspired by the state-of-art method ProQ2, but using a completely different description of a protein model. ProQ2 uses contacts and other features calculated from a model, while the new predictors are based on Rosetta energies: ProQRosFA uses the full-atom energy function that takes into account all atoms, while ProQRosCen uses the coarse-grained centroid energy function. The two new predictors also include residue conservation and terms corresponding to the agreement of a model with predicted secondary structure and surface area, as in ProQ2. We show that the performance of these predictors is on par with ProQ2 and significantly better than all other model quality assessment programs. Furthermore, we show that combining the input features from all three predictors, the resulting predictor ProQ3 performs better than any of the individual methods. ProQ3, ProQRosFA and ProQRosCen are freely available both as a webserver and stand-alone programs at http://proq3.bioinfo.se/.
Collapse
|
9
|
Fakhar Z, Naiker S, Alves CN, Govender T, Maguire GEM, Lameira J, Lamichhane G, Kruger HG, Honarparvar B. A comparative modeling and molecular docking study on Mycobacterium tuberculosis targets involved in peptidoglycan biosynthesis. J Biomol Struct Dyn 2016; 34:2399-417. [PMID: 26612108 DOI: 10.1080/07391102.2015.1117397] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
An alarming rise of multidrug-resistant Mycobacterium tuberculosis strains and the continuous high global morbidity of tuberculosis have reinvigorated the need to identify novel targets to combat the disease. The enzymes that catalyze the biosynthesis of peptidoglycan in M. tuberculosis are essential and noteworthy therapeutic targets. In this study, the biochemical function and homology modeling of MurI, MurG, MraY, DapE, DapA, Alr, and Ddl enzymes of the CDC1551 M. tuberculosis strain involved in the biosynthesis of peptidoglycan cell wall are reported. Generation of the 3D structures was achieved with Modeller 9.13. To assess the structural quality of the obtained homology modeled targets, the models were validated using PROCHECK, PDBsum, QMEAN, and ERRAT scores. Molecular dynamics simulations were performed to calculate root mean square deviation (RMSD) and radius of gyration (Rg) of MurI and MurG target proteins and their corresponding templates. For further model validation, RMSD and Rg for selected targets/templates were investigated to compare the close proximity of their dynamic behavior in terms of protein stability and average distances. To identify the potential binding mode required for molecular docking, binding site information of all modeled targets was obtained using two prediction algorithms. A docking study was performed for MurI to determine the potential mode of interaction between the inhibitor and the active site residues. This study presents the first accounts of the 3D structural information for the selected M. tuberculosis targets involved in peptidoglycan biosynthesis.
Collapse
Affiliation(s)
- Zeynab Fakhar
- a Catalysis and Peptide Research Unit, School of Health Sciences , University of KwaZulu-Natal , Durban 4001 , South Africa
| | - Suhashni Naiker
- a Catalysis and Peptide Research Unit, School of Health Sciences , University of KwaZulu-Natal , Durban 4001 , South Africa
| | - Claudio N Alves
- b Laboratório de Planejamento de Fármacos, Instituto de Ciências Exatas e Naturais , Instituto de Ciências Biológicas, Universidade Federal do Pará , CEP 66075-110, Belém , Pará , Brazil
| | - Thavendran Govender
- a Catalysis and Peptide Research Unit, School of Health Sciences , University of KwaZulu-Natal , Durban 4001 , South Africa
| | - Glenn E M Maguire
- a Catalysis and Peptide Research Unit, School of Health Sciences , University of KwaZulu-Natal , Durban 4001 , South Africa.,c School of Chemistry and Physics , University of KwaZulu-Natal , 4001 Durban , South Africa
| | - Jeronimo Lameira
- b Laboratório de Planejamento de Fármacos, Instituto de Ciências Exatas e Naturais , Instituto de Ciências Biológicas, Universidade Federal do Pará , CEP 66075-110, Belém , Pará , Brazil
| | - Gyanu Lamichhane
- d Division of Infectious Diseases, Center for Tuberculosis Research , Johns Hopkins University School of Medicine , Baltimore , MD 21205 , USA
| | - Hendrik G Kruger
- a Catalysis and Peptide Research Unit, School of Health Sciences , University of KwaZulu-Natal , Durban 4001 , South Africa
| | - Bahareh Honarparvar
- a Catalysis and Peptide Research Unit, School of Health Sciences , University of KwaZulu-Natal , Durban 4001 , South Africa
| |
Collapse
|
10
|
Aslam N, Nadeem A, Babar ME, Pervez MT, Aslam M, Naveed N, Hussain T, Shehzad W, Wasim M, Bao Z, Javed M. The accuracy of protein structure alignment servers. ELECTRON J BIOTECHN 2016. [DOI: 10.1016/j.ejbt.2016.01.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open
|
11
|
Zhao C, Sacan A. UniAlign: protein structure alignment meets evolution. Bioinformatics 2015; 31:3139-46. [PMID: 26059715 DOI: 10.1093/bioinformatics/btv354] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2015] [Accepted: 06/02/2015] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION During the evolution, functional sites on the surface of the protein as well as the hydrophobic core maintaining the structural integrity are well-conserved. However, available protein structure alignment methods align protein structures based solely on the 3D geometric similarity, limiting their ability to detect functionally relevant correspondences between the residues of the proteins, especially for distantly related homologous proteins. RESULTS In this article, we propose a new protein pairwise structure alignment algorithm (UniAlign) that incorporates additional evolutionary information captured in the form of sequence similarity, sequence profiles and residue conservation. We define a per-residue score (UniScore) as a weighted sum of these and other features and develop an iterative optimization procedure to search for an alignment with the best overall UniScore. Our extensive experiments on CDD, HOMSTRAD and BAliBASE benchmark datasets show that UniAlign outperforms commonly used structure alignment methods. We further demonstrate UniAlign's ability to develop family-specific models to drastically improve the quality of the alignments. AVAILABILITY AND IMPLEMENTATION UniAlign is available as a web service at: http://sacan.biomed.drexel.edu/unialign CONTACT ahmet.sacan@drexel.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Chunyu Zhao
- Center for Integrated Bioinformatics, School of Biomedical Engineering, Science and Health System, Drexel University, Philadelphia, PA 19104, USA
| | - Ahmet Sacan
- Center for Integrated Bioinformatics, School of Biomedical Engineering, Science and Health System, Drexel University, Philadelphia, PA 19104, USA
| |
Collapse
|
12
|
Pang B, Schlessman D, Kuang X, Zhao N, Shyu D, Korkin D, Shyu CR. An Integrated Approach to Sequence-Independent Local Alignment of Protein Binding Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:298-308. [PMID: 26357218 DOI: 10.1109/tcbb.2014.2355208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Accurate alignment of protein-protein binding sites can aid in protein docking studies and constructing templates for predicting structure of protein complexes, along with in-depth understanding of evolutionary and functional relationships. However, over the past three decades, structural alignment algorithms have focused predominantly on global alignments with little effort on the alignment of local interfaces. In this paper, we introduce the PBSalign (Protein-protein Binding Site alignment) method, which integrates techniques in graph theory, 3D localized shape analysis, geometric scoring, and utilization of physicochemical and geometrical properties. Computational results demonstrate that PBSalign is capable of identifying similar homologous and analogous binding sites accurately and performing alignments with better geometric match measures than existing protein-protein interface comparison tools. The proportion of better alignment quality generated by PBSalign is 46, 56, and 70 percent more than iAlign as judged by the average match index (MI), similarity index (SI), and structural alignment score (SAS), respectively. PBSalign provides the life science community an efficient and accurate solution to binding-site alignment while striking the balance between topological details and computational complexity.
Collapse
|
13
|
Tai CH, Paul R, Dukka KC, Shilling JD, Lee B. SymD webserver: a platform for detecting internally symmetric protein structures. Nucleic Acids Res 2014; 42:W296-300. [PMID: 24799435 PMCID: PMC4086132 DOI: 10.1093/nar/gku364] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023] Open
Abstract
Internal symmetry of a protein structure is the pseudo-symmetry that a single protein chain sometimes exhibits. This is in contrast to the symmetry with which monomers are arranged in many multimeric protein complexes. SymD is a program that detects proteins with internal symmetry. It proved to be useful for analyzing protein structure, function and modeling. This web-based interactive tool was developed by implementing the SymD algorithm. To the best of our knowledge, SymD webserver is the first tool of its kind with which users can easily study the symmetry of the protein they are interested in by uploading the structure or retrieving it from databases. It uses the Galaxy platform to take advantage of its extensibility and displays the symmetry properties, the symmetry axis and the sequence alignment of the structures before and after the symmetry transformation via an interactive graphical visualization environment in any modern web browser. An Example Run video displays the workflow to help users navigate. SymD webserver is publicly available at http://symd.nci.nih.gov.
Collapse
Affiliation(s)
- Chin-Hsien Tai
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Rohit Paul
- Office of Information Technology, National Cancer Institute, National Institutes of Health, Rockville, MD 20850, USA
| | | | - Jeffery D Shilling
- Office of Information Technology, National Cancer Institute, National Institutes of Health, Rockville, MD 20850, USA
| | - Byungkook Lee
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| |
Collapse
|
14
|
Abstract
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures, however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
Collapse
Affiliation(s)
- Abel Rodriguez
- University of California, Santa Cruz and Duke University
| | | |
Collapse
|
15
|
Topham CM, Rouquier M, Tarrat N, André I. Adaptive Smith-Waterman residue match seeding for protein structural alignment. Proteins 2013; 81:1823-39. [DOI: 10.1002/prot.24327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2013] [Revised: 04/22/2013] [Accepted: 05/15/2013] [Indexed: 12/30/2022]
Affiliation(s)
- Christopher M. Topham
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
| | - Mickaël Rouquier
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
| | - Nathalie Tarrat
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
| | - Isabelle André
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
| |
Collapse
|
16
|
Vyas VK, Ukawala RD, Ghate M, Chintha C. Homology modeling a fast tool for drug discovery: current perspectives. Indian J Pharm Sci 2012. [PMID: 23204616 PMCID: PMC3507339 DOI: 10.4103/0250-474x.102537] [Citation(s) in RCA: 139] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Major goal of structural biology involve formation of protein-ligand complexes; in which the protein molecules act energetically in the course of binding. Therefore, perceptive of protein-ligand interaction will be very important for structure based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with protein. With increasing in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains as well as detecting errors in a model contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focused on the features and a role of homology modeling in predicting protein structure and described current developments in this field with victorious applications at the different stages of the drug design and discovery.
Collapse
Affiliation(s)
- V K Vyas
- Department of Pharmaceutical Chemistry, Institute of Pharmacy, Nirma University, Ahmedabad-382 481, India
| | | | | | | |
Collapse
|
17
|
Ritchie DW, Ghoorah AW, Mavridis L, Venkatraman V. Fast protein structure alignment using Gaussian overlap scoring of backbone peptide fragment similarity. Bioinformatics 2012; 28:3274-81. [DOI: 10.1093/bioinformatics/bts618] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
|
18
|
Kim JK, Kim DS. BetaSuperposer: superposition of protein surfaces using beta-shapes. J Biomol Struct Dyn 2012; 30:684-700. [DOI: 10.1080/07391102.2012.689700] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
19
|
Korenblat K, Volkovich Z, Bolshoy A. Robust classifying of prokaryotic genomes. Comput Biol Chem 2012; 40:20-9. [PMID: 22940609 DOI: 10.1016/j.compbiolchem.2012.07.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2012] [Revised: 07/03/2012] [Accepted: 07/03/2012] [Indexed: 01/07/2023]
Abstract
In this paper, we propose a method to classify prokaryotic genomes using the agglomerative information bottleneck method for unsupervised clustering. Although the method we present here is closely related to a group of methods based on detecting the presence or absence of genes, our method is different because it uses gene lengths as well. We show that this amended method is reliable. For robustness evaluation, we apply bootstrap and jackknife techniques to input data. As a result, we are able to propose an approach to determine the stability level of a cladogram. We demonstrate that the genome tree produced for a selected small group of genomes looks a lot like a phylogenetic tree of this group.
Collapse
Affiliation(s)
- Katerina Korenblat
- Software Engineering Department, ORT Braude Academic College, Karmiel, Israel
| | | | | |
Collapse
|
20
|
Khazanov NA, Damm-Ganamet KL, Quang DX, Carlson HA. Overcoming sequence misalignments with weighted structural superposition. Proteins 2012; 80:2523-35. [PMID: 22733542 DOI: 10.1002/prot.24134] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2011] [Revised: 06/05/2012] [Accepted: 06/10/2012] [Indexed: 11/09/2022]
Abstract
An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD's robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, secondary-structure matching, combinatorial extension, and Dalilite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low-sequence identity and large conformational differences, cases where both sequence-based and structural-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs.
Collapse
Affiliation(s)
- Nickolay A Khazanov
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
| | | | | | | |
Collapse
|
21
|
Hung K, Wang JC, Chen CW, Chuang CL, Tsai KN, Chen CM. Enhancement of initial equivalency for protein structure alignment based on encoded local structures. IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE : A PUBLICATION OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY 2012; 16:1185-92. [PMID: 22717522 DOI: 10.1109/titb.2012.2204892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Most alignment algorithms find an initial equivalent residue pair followed by an iterative optimization process to explore better near-optimal alignments in the surrounding solution space of the initial alignment. It plays a decisive role in determining the alignment quality since a poor initial alignment may make the final alignment trapped in an undesirable local optimum even with an iterative optimization. We proposed a vector-based alignment algorithm with a new initial alignment approach accounting for local structure features called MIRAGE-align. The new idea is to enhance the quality of the initial alignment based on encoded local structural alphabets to identify the protein structure pair whose sequence identity falls in or below twilight zone. The statistical analysis of alignment quality based on Match Index (MI) and computation time demonstrated that MIRAGE-align algorithm outperformed four previously published algorithms, i.e., the residue-based algorithm (CE), the vector-based algorithm (SSM), TM-align, and Fr-TM-align. MIRAGE-align yields a better estimate of initial solution to enhance the quality of initial alignment and enable the employment of a non-iterative optimization process to achieve a better alignment.
Collapse
|
22
|
Abstract
Motivation: Accurate comparisons of different protein structures play important roles in structural biology, structure prediction and functional annotation. The root-mean-square-deviation (RMSD) after optimal superposition is the predominant measure of similarity due to the ease and speed of computation. However, global RMSD is dependent on the length of the protein and can be dominated by divergent loops that can obscure local regions of similarity. A more sophisticated measure of structure similarity, Template Modeling (TM)-score, avoids these problems, and it is one of the measures used by the community-wide experiments of critical assessment of protein structure prediction to compare predicted models with experimental structures. TM-score calculations are, however, much slower than RMSD calculations. We have therefore implemented a very fast version of TM-score for Graphical Processing Units (TM-score-GPU), using a new and novel hybrid Kabsch/quaternion method for calculating the optimal superposition and RMSD that is designed for parallel applications. This acceleration in speed allows TM-score to be used efficiently in computationally intensive applications such as for clustering of protein models and genome-wide comparisons of structure. Results: TM-score-GPU was applied to six sets of models from Nutritious Rice for the World for a total of 3 million comparisons. TM-score-GPU is 68 times faster on an ATI 5870 GPU, on average, than the original CPU single-threaded implementation on an AMD Phenom II 810 quad-core processor. Availability and implementation: The complete source, including the GPU code and the hybrid RMSD subroutine, can be downloaded and used without restriction at http://software.compbio.washington.edu/misc/downloads/tmscore/. The implementation is in C++/OpenCL. Contact:ram@compbio.washington.edu Supplementary Information:Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ling-Hong Hung
- Department of Microbiology, University of Washington, Seattle, WA 98195-7735, USA
| | | |
Collapse
|
23
|
Sacan A, Ekins S, Kortagere S. Applications and limitations of in silico models in drug discovery. Methods Mol Biol 2012; 910:87-124. [PMID: 22821594 DOI: 10.1007/978-1-61779-965-5_6] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Drug discovery in the late twentieth and early twenty-first century has witnessed a myriad of changes that were adopted to predict whether a compound is likely to be successful, or conversely enable identification of molecules with liabilities as early as possible. These changes include integration of in silico strategies for lead design and optimization that perform complementary roles to that of the traditional in vitro and in vivo approaches. The in silico models are facilitated by the availability of large datasets associated with high-throughput screening, bioinformatics algorithms to mine and annotate the data from a target perspective, and chemoinformatics methods to integrate chemistry methods into lead design process. This chapter highlights the applications of some of these methods and their limitations. We hope this serves as an introduction to in silico drug discovery.
Collapse
Affiliation(s)
- Ahmet Sacan
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
| | | | | |
Collapse
|
24
|
Sun H, Sacan A, Ferhatosmanoglu H, Wang Y. Smolign: a spatial motifs-based protein multiple structural alignment method. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:249-261. [PMID: 21464513 DOI: 10.1109/tcbb.2011.67] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Availability of an effective tool for protein multiple structural alignment (MSTA) is essential for discovery and analysis of biologically significant structural motifs that can help solve functional annotation and drug design problems. Existing MSTA methods collect residue correspondences mostly through pairwise comparison of consecutive fragments, which can lead to suboptimal alignments, especially when the similarity among the proteins is low. We introduce a novel strategy based on: building a contact-window based motif library from the protein structural data, discovery and extension of common alignment seeds from this library, and optimal superimposition of multiple structures according to these alignment seeds by an enhanced partial order curve comparison method. The ability of our strategy to detect multiple correspondences simultaneously, to catch alignments globally, and to support flexible alignments, endorse a sensitive and robust automated algorithm that can expose similarities among protein structures even under low similarity conditions. Our method yields better alignment results compared to other popular MSTA methods, on several protein structure data sets that span various structural folds and represent different protein similarity levels. A web-based alignment tool, a downloadable executable, and detailed alignment results for the data sets used here are available at http://sacan.biomed. drexel.edu/Smolign and http://bio.cse.ohio-state.edu/Smolign.
Collapse
Affiliation(s)
- Hong Sun
- The Ohio State University, Columbus
| | | | | | | |
Collapse
|
25
|
Daniels NM, Kumar A, Cowen LJ, Menke M. Touring protein space with Matt. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:286-93. [PMID: 21464511 PMCID: PMC3355523 DOI: 10.1109/tcbb.2011.70] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely structural, geometric, distance-based measures of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it was an open question as to whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative, by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level, and demonstrates qualitative differences in performance between Matt and DaliLite. Implications for the debate over the organization of protein fold space are discussed. Based on our clustering of protein space, we introduce the Mattbench benchmark set, a new collection of structural alignments useful for testing sequence aligners on more distantly homologous proteins.
Collapse
Affiliation(s)
- Noah M. Daniels
- The authors are with the Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
| | - Anoop Kumar
- The authors are with the Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
| | - Lenore J. Cowen
- The authors are with the Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
| | - Matt Menke
- The authors are with the Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
| |
Collapse
|
26
|
PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences. ACTA ACUST UNITED AC 2011; 12:181-9. [DOI: 10.1007/s10969-011-9119-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2011] [Accepted: 11/24/2011] [Indexed: 10/14/2022]
|
27
|
Can T, Wang YF. PROTEIN STRUCTURE ALIGNMENT AND FAST SIMILARITY SEARCH USING LOCAL SHAPE SIGNATURES. J Bioinform Comput Biol 2011; 2:215-39. [PMID: 15272439 DOI: 10.1142/s0219720004000533] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2003] [Revised: 11/17/2003] [Accepted: 01/28/2004] [Indexed: 11/18/2022]
Abstract
We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariancy of the shape signatures allows us to improve similarity searching efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique we screen out unlikely candidates and perform detailed pairwise alignments only for a small number of candidates that survive the screening process. Contrary to other hashing based techniques, our technique employs domain specific information (not just geometric information) in constructing the hash key, and hence, is more tuned to the domain of biology. Furthermore, the invariancy, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on the average) than a well-known method while keeping the quality of the query results at an approximately similar level.
Collapse
Affiliation(s)
- Tolga Can
- Department of Computer Science, University of California at Santa Barbara, Santa Barbara, CA 93106, USA.
| | | |
Collapse
|
28
|
SALEM SAEED, ZAKI MOHAMMEDJ, BYSTROFF CHRISTOPHER. ITERATIVE NON-SEQUENTIAL PROTEIN STRUCTURAL ALIGNMENT. J Bioinform Comput Biol 2011; 7:571-96. [PMID: 19507290 DOI: 10.1142/s0219720009004205] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2008] [Revised: 11/05/2008] [Accepted: 11/06/2008] [Indexed: 11/18/2022]
Abstract
Structural similarity between proteins gives us insights into their evolutionary relationships when there is low sequence similarity. In this paper, we present a novel approach called SNAP for non-sequential pair-wise structural alignment. Starting from an initial alignment, our approach iterates over a two-step process consisting of a superposition step and an alignment step, until convergence. We propose a novel greedy algorithm to construct both sequential and non-sequential alignments. The quality of SNAP alignments were assessed by comparing against the manually curated reference alignments in the challenging SISY and RIPC datasets. Moreover, when applied to a dataset of 4410 protein pairs selected from the CATH database, SNAP produced longer alignments with lower rmsd than several state-of-the-art alignment methods. Classification of folds using SNAP alignments was both highly sensitive and highly selective. The SNAP software along with the datasets are available online at
Collapse
Affiliation(s)
- SAEED SALEM
- Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th st. Troy, New York 12180, USA
| | - MOHAMMED J. ZAKI
- Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th st. Troy, New York 12180, USA
| | - CHRISTOPHER BYSTROFF
- Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th st. Troy, New York 12180, USA
- Department of Biology, Rensselaer Polytechnic Institute, 110 8th st. Troy, New York 12180, USA
| |
Collapse
|
29
|
Mavridis L, Ghoorah AW, Venkatraman V, Ritchie DW. Representing and comparing protein folds and fold families using three-dimensional shape-density representations. Proteins 2011; 80:530-45. [DOI: 10.1002/prot.23218] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2011] [Revised: 09/02/2011] [Accepted: 09/04/2011] [Indexed: 11/11/2022]
|
30
|
Finzel BC, Akavaram R, Ragipindi A, Van Voorst JR, Cahn M, Davis ME, Pokross ME, Sheriff S, Baldwin ET. Conserved Core Substructures in the Overlay of Protein–Ligand Complexes. J Chem Inf Model 2011; 51:1931-41. [DOI: 10.1021/ci100475y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Barry C. Finzel
- Department of Medicinal Chemistry, University of Minnesota College of Pharmacy, Minneapolis, Minnesota 55455, United States
| | - Ramprasad Akavaram
- Department of Medicinal Chemistry, University of Minnesota College of Pharmacy, Minneapolis, Minnesota 55455, United States
| | - Aravind Ragipindi
- Department of Medicinal Chemistry, University of Minnesota College of Pharmacy, Minneapolis, Minnesota 55455, United States
| | - Jeffrey R. Van Voorst
- Department of Medicinal Chemistry, University of Minnesota College of Pharmacy, Minneapolis, Minnesota 55455, United States
| | - Matthew Cahn
- BioPharma Information Technologies, Bristol-Myers Squibb Company, Princeton, New Jersey 08543, United States
| | - Malcolm E. Davis
- Research & Development, Chemical and Protein Technologies, Molecular Sciences and Candidate Optimization, Bristol-Myers Squibb Company, Princeton, New Jersey 08543, United States
| | - Matt E. Pokross
- Research & Development, Chemical and Protein Technologies, Molecular Sciences and Candidate Optimization, Bristol-Myers Squibb Company, Princeton, New Jersey 08543, United States
| | - Steven Sheriff
- Research & Development, Chemical and Protein Technologies, Molecular Sciences and Candidate Optimization, Bristol-Myers Squibb Company, Princeton, New Jersey 08543, United States
| | - Eric T. Baldwin
- Research & Development, Chemical and Protein Technologies, Molecular Sciences and Candidate Optimization, Bristol-Myers Squibb Company, Princeton, New Jersey 08543, United States
| |
Collapse
|
31
|
Shen YF, Li B, Liu ZP. Protein structure alignment based on internal coordinates. Interdiscip Sci 2010; 2:308-19. [DOI: 10.1007/s12539-010-0019-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2008] [Revised: 01/05/2010] [Accepted: 01/06/2010] [Indexed: 10/18/2022]
|
32
|
Bertolazzi P, Guerra C, Liuzzi G. A global optimization algorithm for protein surface alignment. BMC Bioinformatics 2010; 11:488. [PMID: 20920230 PMCID: PMC2957401 DOI: 10.1186/1471-2105-11-488] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2009] [Accepted: 09/29/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A relevant problem in drug design is the comparison and recognition of protein binding sites. Binding sites recognition is generally based on geometry often combined with physico-chemical properties of the site since the conformation, size and chemical composition of the protein surface are all relevant for the interaction with a specific ligand. Several matching strategies have been designed for the recognition of protein-ligand binding sites and of protein-protein interfaces but the problem cannot be considered solved. RESULTS In this paper we propose a new method for local structural alignment of protein surfaces based on continuous global optimization techniques. Given the three-dimensional structures of two proteins, the method finds the isometric transformation (rotation plus translation) that best superimposes active regions of two structures. We draw our inspiration from the well-known Iterative Closest Point (ICP) method for three-dimensional (3D) shapes registration. Our main contribution is in the adoption of a controlled random search as a more efficient global optimization approach along with a new dissimilarity measure. The reported computational experience and comparison show viability of the proposed approach. CONCLUSIONS Our method performs well to detect similarity in binding sites when this in fact exists. In the future we plan to do a more comprehensive evaluation of the method by considering large datasets of non-redundant proteins and applying a clustering technique to the results of all comparisons to classify binding sites.
Collapse
Affiliation(s)
- Paola Bertolazzi
- Istituto di Analisi dei Sistemi ed Informatica A. Ruberti, Consiglio Nazionale delle Ricerche, Viale Manzoni, 30, 00185 Rome, Italy
| | | | | |
Collapse
|
33
|
Jain P, Hirst JD. Automatic structure classification of small proteins using random forest. BMC Bioinformatics 2010; 11:364. [PMID: 20594334 PMCID: PMC2916923 DOI: 10.1186/1471-2105-11-364] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2010] [Accepted: 07/01/2010] [Indexed: 11/29/2022] Open
Abstract
Background Random forest, an ensemble based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs. Results Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of Matthew's correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases the MCC for classification to different structural levels decreases. Conclusions The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification for those protein structures from the PDB, which are yet to be assigned to the SCOP classification hierarchy.
Collapse
Affiliation(s)
- Pooja Jain
- School of Chemistry, The University of Nottingham, University Park, Nottingham, NG7 2RD, UK
| | | |
Collapse
|
34
|
Zhang ZH, Bharatham K, Sherman WA, Mihalek I. deconSTRUCT: general purpose protein database search on the substructure level. Nucleic Acids Res 2010; 38:W590-4. [PMID: 20522512 PMCID: PMC2896154 DOI: 10.1093/nar/gkq489] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
deconSTRUCT webserver offers an interface to a protein database search engine, usable for a general purpose detection of similar protein (sub)structures. Initially, it deconstructs the query structure into its secondary structure elements (SSEs) and reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enables deconSTRUCT to achieve the sensitivity and specificity of the established search engines at orders of magnitude increased speed, without tying up irretrievably the substructure information in the form of a hash. In a post-processing step, a match on the level of the backbone atoms is constructed. The results presented to the user consist of the list of the matched SSEs, the transformation matrix for rigid superposition of the structures and several ways of visualization, both downloadable and implemented as a web-browser plug-in. The server is available at http://epsf.bmad.bii.a-star.edu.sg/struct_server.html.
Collapse
Affiliation(s)
- Zong Hong Zhang
- Bioinformatics Institute 30 Biopolis Street, #07-01 Matrix, Singapore
| | | | | | | |
Collapse
|
35
|
Zhang ZH, Lee HK, Mihalek I. Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity. BMC Bioinformatics 2010; 11:155. [PMID: 20338066 PMCID: PMC3098053 DOI: 10.1186/1471-2105-11-155] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2009] [Accepted: 03/26/2010] [Indexed: 11/10/2022] Open
Abstract
Background Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within. Results Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well in terms of resolution in the space of protein structures, and its ability to make new predictions. Conclusions Perhaps unexpectedly, a substantial discriminating power already exists at the level of main features of protein structure, such as directions of secondary structural elements, possibly constrained by their sequential order. This can be used toward efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for modeling by homology of protein structure and dynamics.
Collapse
Affiliation(s)
- Zong Hong Zhang
- Bioinformatics Institute, A*STAR, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
| | | | | |
Collapse
|
36
|
Nicolotti O, Giangreco I, Miscioscia TF, Convertino M, Leonetti F, Pisani L, Carotti A. Screening of benzamidine-based thrombin inhibitors via a linear interaction energy in continuum electrostatics model. J Comput Aided Mol Des 2010; 24:117-29. [DOI: 10.1007/s10822-010-9320-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2009] [Accepted: 01/28/2010] [Indexed: 10/19/2022]
|
37
|
Pisanti N, Soldano H, Carpentier M, Pothier J. A Relational Extension of the Notion of Motifs: Application to the Common 3D Protein Substructures Searching Problem. J Comput Biol 2009; 16:1635-60. [PMID: 20047489 DOI: 10.1089/cmb.2008.0019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- Nadia Pisanti
- Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo, Pisa, Italy
- LIPN–UMR 7030 CNRS, Université Paris 13, Villetaneuse, France
| | - Henry Soldano
- LIPN–UMR 7030 CNRS, Université Paris 13, Villetaneuse, France
- Université Pierre et Marie Curie-Paris6, Atelier de BioInformatique, Paris, France
| | - Mathilde Carpentier
- Université Pierre et Marie Curie-Paris6, Equipe de Génomique Analytique, INSERM511 Paris, France
| | - Joel Pothier
- Université Pierre et Marie Curie-Paris6, Atelier de BioInformatique, Paris, France
| |
Collapse
|
38
|
Huang D, Zhou T, Lafleur K, Nevado C, Caflisch A. Kinase selectivity potential for inhibitors targeting the ATP binding site: a network analysis. ACTA ACUST UNITED AC 2009; 26:198-204. [PMID: 19942586 DOI: 10.1093/bioinformatics/btp650] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
UNLABELLED MOTIVATION AND METHOD: Small-molecule inhibitors targeting the adenosine triphosphate (ATP) binding pocket of the catalytic domain of protein kinases have potential to become drugs devoid of (major) side effects, particularly if they bind selectively. Here, the sequences of the 518 human kinases are first mapped onto the structural alignment of 116 kinases of known three-dimensional structure. The multiple structure alignment is then used to encode the known strategies for developing selective inhibitors into a fingerprint. Finally, a network analysis is used to partition the kinases into clusters according to similarity of their fingerprints, i.e. physico-chemical characteristics of the residues responsible for selective binding. RESULTS For each kinase the network analysis reveals the likelihood to find selective inhibitors targeting the ATP binding site. Systematic guidelines are proposed to develop selective inhibitors. Importantly, the network analysis suggests that the tyrosine kinase EphB4 has high selectivity potential, which is consistent with the selectivity profile of two novel EphB4 inhibitors. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Danzhi Huang
- Department of Biochemistry, University of Zürich, Winterthurerstrasse 190.
| | | | | | | | | |
Collapse
|
39
|
Micheletti C, Orland H. MISTRAL: a tool for energy-based multiple structural alignment of proteins. ACTA ACUST UNITED AC 2009; 25:2663-9. [PMID: 19692555 DOI: 10.1093/bioinformatics/btp506] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION The steady growth of the number of available protein structures has constantly motivated the development of new algorithms for detecting structural correspondences in proteins. Detecting structural equivalences in two or more proteins is computationally demanding as it typically entails the exploration of the combinatorial space of all possible amino acid pairings in the parent proteins. The search is often aided by the introduction of various constraints such as considering protein fragments, rather than single amino acids, and/or seeking only sequential correspondences in the given proteins. An additional challenge is represented by the difficulty of associating to a given alignment, a reliable a priori measure of its statistical significance. RESULTS Here, we present and discuss MISTRAL (Multiple STRuctural ALignment), a novel strategy for multiple protein alignment based on the minimization of an energy function over the low-dimensional space of the relative rotations and translations of the molecules. The energy minimization avoids combinatorial searches and returns pairwise alignment scores for which a reliable a priori statistical significance can be given. AVAILABILITY MISTRAL is freely available for academic users as a standalone program and as a web service at http://ipht.cea.fr/protein.php.
Collapse
Affiliation(s)
- Cristian Micheletti
- SISSA, CNR-INFM Democritos and Italian Institute of Technology, Via Beirut 2-4, 34014 Trieste, Italy.
| | | |
Collapse
|
40
|
Kim C, Tai CH, Lee B. Iterative refinement of structure-based sequence alignments by Seed Extension. BMC Bioinformatics 2009; 10:210. [PMID: 19589133 PMCID: PMC2753854 DOI: 10.1186/1471-2105-10-210] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2009] [Accepted: 07/09/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. RESULTS RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. CONCLUSION RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs.
Collapse
Affiliation(s)
- Changhoon Kim
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD 20892, USA.
| | | | | |
Collapse
|
41
|
Angaran S, Bock ME, Garutti C, Guerra C. MolLoc: a web tool for the local structural alignment of molecular surfaces. Nucleic Acids Res 2009; 37:W565-70. [PMID: 19465382 PMCID: PMC2703929 DOI: 10.1093/nar/gkp405] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
MolLoc stands for Molecular Local surface comparison, and is a web server for the structural comparison of molecular surfaces. Given two structures in PDB format, the user can compare their binding sites, cavities or any arbitrary residue selection. Moreover, the web server allows the comparison of a query structure with a list of structures. Each comparison produces a structural alignment that maximizes the extension of the superimposition of the surfaces, and returns the pairs of atoms with similar physicochemical properties that are close in space after the superimposition. Based on this subset of atoms sharing similar physicochemical properties a new rototranslation is derived that best superimposes them. MolLoc approach is both local and surface-oriented, and therefore it can be particularly useful when testing if molecules with different sequences and folds share any local surface similarity. The MolLoc web server is available at http://bcb.dei.unipd.it/MolLoc.
Collapse
Affiliation(s)
- Stefano Angaran
- Department of Information Engineering, University of Padova, Via Gradenigo 6a, Padova, Italy
| | | | | | | |
Collapse
|
42
|
Eslahchi C, Pezeshk H, Sadeghi M, Massoud Rahimi A, Maboudi Afkham H, Arab S. STON: A novel method for protein three-dimensional structure comparison. Comput Biol Med 2009; 39:166-72. [DOI: 10.1016/j.compbiomed.2008.12.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2007] [Revised: 11/27/2008] [Accepted: 12/05/2008] [Indexed: 11/29/2022]
|
43
|
Basu MK, Poliakov E, Rogozin IB. Domain mobility in proteins: functional and evolutionary implications. Brief Bioinform 2009; 10:205-16. [PMID: 19151098 DOI: 10.1093/bib/bbn057] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
A substantial fraction of eukaryotic proteins contains multiple domains, some of which show a tendency to occur in diverse domain architectures and can be considered mobile (or 'promiscuous'). These promiscuous domains are typically involved in protein-protein interactions and play crucial roles in interaction networks, particularly those contributing to signal transduction. They also play a major role in creating diversity of protein domain architecture in the proteome. It is now apparent that promiscuity is a volatile and relatively fast-changing feature in evolution, and that only a few domains retain their promiscuity status throughout evolution. Many such domains attained their promiscuity status independently in different lineages. Only recently, we have begun to understand the diversity of protein domain architectures and the role the promiscuous domains play in evolution of this diversity. However, many of the biological mechanisms of protein domain mobility remain shrouded in mystery. In this review, we discuss our present understanding of protein domain promiscuity, its evolution and its role in cellular function.
Collapse
Affiliation(s)
- Malay Kumar Basu
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
| | | | | |
Collapse
|
44
|
Pirovano W, Feenstra KA, Heringa J. The meaning of alignment: lessons from structural diversity. BMC Bioinformatics 2008; 9:556. [PMID: 19105835 PMCID: PMC2630330 DOI: 10.1186/1471-2105-9-556] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2008] [Accepted: 12/23/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein structural alignment provides a fundamental basis for deriving principles of functional and evolutionary relationships. It is routinely used for structural classification and functional characterization of proteins and for the construction of sequence alignment benchmarks. However, the available techniques do not fully consider the implications of protein structural diversity and typically generate a single alignment between sequences. RESULTS We have taken alternative protein crystal structures and generated simulation snapshots to explicitly investigate the impact of structural changes on the alignments. We show that structural diversity has a significant effect on structural alignment. Moreover, we observe alignment inconsistencies even for modest spatial divergence, implying that the biological interpretation of alignments is less straightforward than commonly assumed. A salient example is the GroES 'mobile loop' where sub-Angstrom variations give rise to contradictory sequence alignments. CONCLUSION A comprehensive treatment of ambiguous alignment regions is crucial for further development of structural alignment applications and for the representation of alignments in general. For this purpose we have developed an on-line database containing our data and new ways of visualizing alignment inconsistencies, which can be found at http://www.ibi.vu.nl/databases/stralivari.
Collapse
Affiliation(s)
- Walter Pirovano
- Centre for Integrative Bioinformatics VU (IBIVU), VU University Amsterdam, De Boelelaan 1081A, 1081HV Amsterdam, the Netherlands.
| | | | | |
Collapse
|
45
|
Sun H, Ferhatosmanoglu H, Ota M, Wang Y. An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories. BMC Bioinformatics 2008; 9:344. [PMID: 18710565 PMCID: PMC2571979 DOI: 10.1186/1471-2105-9-344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2008] [Accepted: 08/18/2008] [Indexed: 11/13/2022] Open
Abstract
Background Understanding how proteins fold is essential to our quest in discovering how life works at the molecular level. Current computation power enables researchers to produce a huge amount of folding simulation data. Hence there is a pressing need to be able to interpret and identify novel folding features from them. Results In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges presented by comparing high dimensional curves coming from folding trajectories. A detailed case study on miniprotein Trp-cage [1] demonstrates that our algorithm can detect similarities at rather low level, and extract biologically meaningful folding events. Conclusion The EPO algorithm is general and applicable to a wide range of applications. We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities. For user's convenience, we provide a web server for the algorithm at .
Collapse
|
46
|
Wang D, Yang H, Shi L, Ma L, Fujii T, Engelstad K, Pascual JM, De Vivo DC. Functional studies of the T295M mutation causing Glut1 deficiency: glucose efflux preferentially affected by T295M. Pediatr Res 2008; 64:538-43. [PMID: 18614966 DOI: 10.1203/pdr.0b013e318184d2b5] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Glucose transporter type 1 (Glut1) deficiency syndrome (Glut1 DS, OMIM: #606777) is characterized by infantile seizures, acquired microcephaly, developmental delay, hypoglycorrhachia (CSF glucose <40 mg/dL), and decreased erythrocyte glucose uptake (56.1 +/- 17% of control). Previously, we reported two patients with a mild Glut1 deficiency phenotype associated with a heterozygous GLUT1 T295M mutation and normal erythrocyte glucose uptake. We assessed the pathogenicity of T295M in the Xenopus laevis oocyte expression system. Under zero-trans influx conditions, the T295M Vmax (590 pmol/min/oocyte) was 79% of the WT value and the Km (14.3 mM) was increased compared with WT (9.6 mM). Under zero-trans efflux conditions, both the Vmax (1216 pmol/min/oocyte) and Km (8.8 mM) in T295M mutant Glut1 were markedly decreased in comparison to the WT values (7443 pmol/min/oocyte and 90.8 mM). Western blot analysis and confocal studies confirmed incorporation of the T295M mutant protein into the plasma membrane. The side chain of M295 is predicted to block the extracellular "gate" for glucose efflux in our Glut-1 molecular model. We conclude that the T295M mutation specifically alters Glut1 conformation and asymmetrically affects glucose flux across the cell by perturbing efflux more than influx. These findings explain the seemingly paradoxical findings of Glut1 DS with hypoglycorrhachia and "normal" erythrocyte glucose uptake.
Collapse
Affiliation(s)
- Dong Wang
- Department of Neurology, Columbia University, New York, New York 10032, USA
| | | | | | | | | | | | | | | |
Collapse
|
47
|
Wang X, Snoeyink J. Defining and computing optimum RMSD for gapped and weighted multiple-structure alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2008; 5:525-533. [PMID: 18989040 DOI: 10.1109/tcbb.2008.92] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
Pairwise structure alignment commonly uses root mean square deviation (RMSD) to measure the structural similarity, and methods for optimizing RMSD are well established. We extend RMSD to weighted RMSD for multiple structures. By using multiplicative weights, we show that weighted RMSD for all pairs is the same as weighted RMSD to an average of the structures. Thus, using RMSD or weighted RMSD implies that the average is a consensus structure. Although we show that in general, the two tasks of finding the optimal translations and rotations for minimizing weighted RMSD cannot be separated for multiple structures like they can for pairs, an inherent difficulty and a fact ignored by previous work, we develop a near-linear iterative algorithm to converge weighted RMSD to a local minimum. 10,000 experiments of gapped alignment done on each of 23 protein families from HOMSTRAD (where each structure starts with a random translation and rotation) converge rapidly to the same minimum. Finally we propose a heuristic method to iteratively remove the effect of outliers and find well-aligned positions that determine the structural conserved region by modeling B-factors and deviations from the average positions as weights and iteratively assigning higher weights to better aligned atoms.
Collapse
Affiliation(s)
- Xueyi Wang
- Department of Mathematics and Computer Science, Northwest Nazarene University, 623 Holly St., Nampa, ID 83686-5897, USA.
| | | |
Collapse
|
48
|
Bernsel A, Viklund H, Elofsson A. Remote homology detection of integral membrane proteins using conserved sequence features. Proteins 2008; 71:1387-99. [PMID: 18076048 DOI: 10.1002/prot.21825] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
Compared with globular proteins, transmembrane proteins are surrounded by a more intricate environment and, consequently, amino acid composition varies between the different compartments. Existing algorithms for homology detection are generally developed with globular proteins in mind and may not be optimal to detect distant homology between transmembrane proteins. Here, we introduce a new profile-profile based alignment method for remote homology detection of transmembrane proteins in a hidden Markov model framework that takes advantage of the sequence constraints placed by the hydrophobic interior of the membrane. We expect that, for distant membrane protein homologs, even if the sequences have diverged too far to be recognized, the hydrophobicity pattern and the transmembrane topology are better conserved. By using this information in parallel with sequence information, we show that both sensitivity and specificity can be substantially improved for remote homology detection in two independent test sets. In addition, we show that alignment quality can be improved for the most distant homologs in a public dataset of membrane protein structures. Applying the method to the Pfam domain database, we are able to suggest new putative evolutionary relationships for a few relatively uncharacterized protein domain families, of which several are confirmed by other methods. The method is called Searcher for Homology Relationships of Integral Membrane Proteins (SHRIMP) and is available for download at http://www.sbc.su.se/shrimp/.
Collapse
Affiliation(s)
- Andreas Bernsel
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
| | | | | |
Collapse
|
49
|
Pascual JM, Wang D, Yang R, Shi L, Yang H, De Vivo DC. Structural signatures and membrane helix 4 in GLUT1: inferences from human blood-brain glucose transport mutants. J Biol Chem 2008; 283:16732-42. [PMID: 18387950 PMCID: PMC2423257 DOI: 10.1074/jbc.m801403200] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2008] [Indexed: 12/11/2022] Open
Abstract
Exon IV of SLC2A1, a multiple facilitator superfamily (MFS) transporter gene, is particularly susceptible to mutations that cause GLUT1 deficiency syndrome, a human encephalopathy that results from decreased glucose flux through the blood-brain barrier. Genotyping of 100 patients revealed that in a third of them who harbor missense mutations in the GLUT1 transporter, transmembrane domain 4 (TM4), encoded by SLC2A1 exon IV, contains mutant residues that have the periodicity of one face of a kinked alpha-helix. Arg-126, located at the amino terminus of TM4, is the locus for most of the mutations followed by other arginine and glycine residues located elsewhere in the transporter but conserved among MFS proteins. The Arg-126 mutants were constructed and assayed for protein expression, targeting, and transport capacity in Xenopus oocytes. The role of charge at position 126, as well as its accessibility, was investigated in R126H by determining its activity as a function of extracellular pH. The results indicate that intracellular charges at the MFS TM2-3 and TM8-9 signature loops and flanking TMs 3, 5, and 6 are critical for the structure of GLUT1 as are TM glycines and that TM4, located at the catalytic core of MFS proteins, forms a helix that surfaces into the extracellular solution where another proton facilitates transport.
Collapse
Affiliation(s)
- Juan M Pascual
- Department of Neurology, The University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA.
| | | | | | | | | | | |
Collapse
|
50
|
Viklund H, Elofsson A. OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammar. ACTA ACUST UNITED AC 2008; 24:1662-8. [PMID: 18474507 DOI: 10.1093/bioinformatics/btn221] [Citation(s) in RCA: 281] [Impact Index Per Article: 17.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION As alpha-helical transmembrane proteins constitute roughly 25% of a typical genome and are vital parts of many essential biological processes, structural knowledge of these proteins is necessary for increasing our understanding of such processes. Because structural knowledge of transmembrane proteins is difficult to attain experimentally, improved methods for prediction of structural features of these proteins are important. RESULTS OCTOPUS, a new method for predicting transmembrane protein topology is presented and benchmarked using a dataset of 124 sequences with known structures. Using a novel combination of hidden Markov models and artificial neural networks, OCTOPUS predicts the correct topology for 94% of the sequences. In particular, OCTOPUS is the first topology predictor to fully integrate modeling of reentrant/membrane-dipping regions and transmembrane hairpins in the topological grammar. AVAILABILITY OCTOPUS is available as a web server at http://octopus.cbr.su.se.
Collapse
Affiliation(s)
- Håkan Viklund
- Department of Biochemistry and Biophysics/Center for Biomembrane Research/Stockholm Bioinformatics Center, The Arrhenius Laboratories for Natural Sciences, Stockholm University, SE-10691 Stockholm, Sweden
| | | |
Collapse
|