1
|
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67%, increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
|
2
|
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
Affiliation(s)
- Emily N. Kennedy
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
  - Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
|
3
|
Heinzinger M, Littmann M, Sillitoe I, Bordin N, Orengo C, Rost B. Contrastive learning on protein embeddings enlightens midnight zone. NAR Genom Bioinform 2022; 4:lqac043. [PMID: 35702380 PMCID: PMC9188115 DOI: 10.1093/nargab/lqac043] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/22/2021] [Revised: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/23/2022] Open
Abstract
Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to a query without any annotation. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein Language Models (pLMs). Here, we introduce using single protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by hierarchical classifications of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, recognizes distant homologous relationships better than more traditional techniques such as threading or fold recognition. Thus, these embeddings have allowed sequence comparison to step into the 'midnight zone' of protein similarity, i.e. the region in which distantly related sequences have a seemingly random pairwise sequence similarity. The novelty of this work is in the particular combination of tools and sampling techniques that ensured performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, since this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT.
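The idea of embedding-based annotation transfer described in this abstract can be illustrated with a minimal sketch: a query inherits the annotation of its nearest neighbour in embedding space. Everything below is an assumption made for illustration and is not taken from the Rostlab/EAT code: the function names, the toy 3-dimensional vectors (real pLM embeddings are far higher-dimensional), the superfamily labels, the use of Euclidean distance, and the optional distance cutoff.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def transfer_annotation(query, lookup, max_distance=None):
    """Return (label, distance) for the lookup embedding nearest to `query`.

    `lookup` is a list of (label, embedding) pairs. If `max_distance` is set
    and the nearest hit is farther away, no label is transferred (None).
    """
    label, dist = min(
        ((lab, euclidean(query, emb)) for lab, emb in lookup),
        key=lambda pair: pair[1],
    )
    if max_distance is not None and dist > max_distance:
        return None, dist
    return label, dist

# Hypothetical toy embeddings with made-up labels.
lookup = [
    ("superfamily_A", [0.9, 0.1, 0.0]),
    ("superfamily_B", [0.0, 1.0, 0.2]),
]
hit, distance = transfer_annotation([0.8, 0.2, 0.1], lookup)
print(hit, round(distance, 3))
```

Because the lookup is a plain distance computation rather than an alignment, its cost grows only with the number of stored embeddings, which is consistent with the speed advantage the abstract reports.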
Affiliation(s)
- Michael Heinzinger
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Maria Littmann
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Burkhard Rost
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
|
4
|
O’Donoghue SI, Schafferhans A, Sikta N, Stolte C, Kaur S, Ho BK, Anderson S, Procter JB, Dallago C, Bordin N, Adcock M, Rost B. SARS-CoV-2 structural coverage map reveals viral protein assembly, mimicry, and hijacking mechanisms. Mol Syst Biol 2021; 17:e10079. [PMID: 34519429 PMCID: PMC8438690 DOI: 10.15252/msb.202010079] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/24/2020] [Revised: 08/05/2021] [Accepted: 08/06/2021] [Indexed: 01/18/2023] Open
Abstract
We modeled 3D structures of all SARS-CoV-2 proteins, generating 2,060 models that span 69% of the viral proteome and provide details not available elsewhere. We found that ~6% of the proteome mimicked human proteins, while ~7% was implicated in hijacking mechanisms that reverse post-translational modifications, block host translation, and disable host defenses; a further ~29% self-assembled into heteromeric states that provided insight into how the viral replication and translation complex forms. To make these 3D models more accessible, we devised a structural coverage map, a novel visualization method to show what is, and is not, known about the 3D structure of the viral proteome. We integrated the coverage map into an accompanying online resource (https://aquaria.ws/covid) that can be used to find and explore models corresponding to the 79 structural states identified in this work. The resulting Aquaria-COVID resource helps scientists use emerging structural data to understand the mechanisms underlying coronavirus infection and draws attention to the 31% of the viral proteome that remains structurally unknown or dark.
MESH Headings
- Amino Acid Transport Systems, Neutral/chemistry
- Amino Acid Transport Systems, Neutral/genetics
- Amino Acid Transport Systems, Neutral/metabolism
- Angiotensin-Converting Enzyme 2/chemistry
- Angiotensin-Converting Enzyme 2/genetics
- Angiotensin-Converting Enzyme 2/metabolism
- Binding Sites
- COVID-19/genetics
- COVID-19/metabolism
- COVID-19/virology
- Computational Biology/methods
- Coronavirus Envelope Proteins/chemistry
- Coronavirus Envelope Proteins/genetics
- Coronavirus Envelope Proteins/metabolism
- Coronavirus Nucleocapsid Proteins/chemistry
- Coronavirus Nucleocapsid Proteins/genetics
- Coronavirus Nucleocapsid Proteins/metabolism
- Host-Pathogen Interactions/genetics
- Humans
- Mitochondrial Membrane Transport Proteins/chemistry
- Mitochondrial Membrane Transport Proteins/genetics
- Mitochondrial Membrane Transport Proteins/metabolism
- Mitochondrial Precursor Protein Import Complex Proteins
- Models, Molecular
- Molecular Mimicry
- Neuropilin-1/chemistry
- Neuropilin-1/genetics
- Neuropilin-1/metabolism
- Phosphoproteins/chemistry
- Phosphoproteins/genetics
- Phosphoproteins/metabolism
- Protein Binding
- Protein Conformation, alpha-Helical
- Protein Conformation, beta-Strand
- Protein Interaction Domains and Motifs
- Protein Interaction Mapping/methods
- Protein Multimerization
- Protein Processing, Post-Translational
- SARS-CoV-2/chemistry
- SARS-CoV-2/genetics
- SARS-CoV-2/metabolism
- Spike Glycoprotein, Coronavirus/chemistry
- Spike Glycoprotein, Coronavirus/genetics
- Spike Glycoprotein, Coronavirus/metabolism
- Viral Matrix Proteins/chemistry
- Viral Matrix Proteins/genetics
- Viral Matrix Proteins/metabolism
- Viroporin Proteins/chemistry
- Viroporin Proteins/genetics
- Viroporin Proteins/metabolism
- Virus Replication
Affiliation(s)
- Seán I O’Donoghue
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - CSIRO Data61, Canberra, ACT, Australia
  - School of Biotechnology and Biomolecular Sciences (UNSW), Kensington, NSW, Australia
- Andrea Schafferhans
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - Department of Bioengineering Sciences, Weihenstephan‐Tr. University of Applied Sciences, Freising, Germany
  - Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
- Neblina Sikta
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Sandeep Kaur
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences (UNSW), Kensington, NSW, Australia
- Bosco K Ho
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Christian Dallago
  - Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Burkhard Rost
  - Department of Informatics, Bioinformatics & Computational Biology, Technical University of Munich, Munich, Germany
|
5
|
Littmann M, Bordin N, Heinzinger M, Schütze K, Dallago C, Orengo C, Rost B. Clustering FunFams using sequence embeddings improves EC purity. Bioinformatics 2021; 37:3449-3455. [PMID: 33978744 PMCID: PMC8545299 DOI: 10.1093/bioinformatics/btab371] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/25/2021] [Revised: 04/02/2021] [Accepted: 05/11/2021] [Indexed: 12/05/2022] Open
Abstract
Motivation: Classifying proteins into functional families can improve our understanding of protein function and can allow transferring annotations within one family. For this, functional families need to be ‘pure’, i.e., contain only proteins with identical function. Functional Families (FunFams) cluster proteins within CATH superfamilies into such groups of proteins sharing function. 11% of all FunFams (22 830 of 203 639) contain EC annotations and of those, 7% (1526 of 22 830) have inconsistent functional annotations. Results: We propose an approach to further cluster FunFams into functionally more consistent sub-families by encoding their sequences through embeddings. These embeddings originate from language models transferring knowledge gained from predicting missing amino acids in a sequence (ProtBERT) and have been further optimized to distinguish between proteins belonging to the same or a different CATH superfamily (PB-Tucker). Using distances between embeddings and DBSCAN to cluster FunFams and identify outliers doubled the number of pure clusters per FunFam compared to random clustering. Our approach was not limited to FunFams but also succeeded on families created using sequence similarity alone. Complementing EC annotations, we observed similar results for binding annotations. Thus, we expect an increased purity also for other aspects of function. Our results can help generate FunFams; the resulting clusters with improved functional consistency allow more reliable inference of annotations. We expect this approach to succeed equally for any other grouping of proteins by their phenotypes. Availability and implementation: Code and embeddings are available via GitHub: https://github.com/Rostlab/FunFamsClustering. Supplementary information: Supplementary data are available at Bioinformatics online.
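The clustering step mentioned in this abstract (DBSCAN on distances between embeddings, with outliers flagged) can be sketched in plain Python. This is a textbook DBSCAN written for illustration, not the code from the FunFamsClustering repository; the 2-dimensional points stand in for embeddings, and the `eps` and `min_samples` values are arbitrary assumptions.

```python
import math

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks an outlier."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)      # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1             # provisional outlier; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster    # outlier reachable from a core point: border
            if labels[j] is not None:
                continue               # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

# Hypothetical toy "embeddings": two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.5, min_samples=2)
print(labels)
```

DBSCAN suits this use because it does not need the number of clusters in advance and it reports isolated points as outliers instead of forcing them into a cluster.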
Affiliation(s)
- Maria Littmann
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Michael Heinzinger
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Konstantin Schütze
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Christian Dallago
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Burkhard Rost
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany & TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
|
6
|
Waglechner N, Culp EJ, Wright GD. Ancient Antibiotics, Ancient Resistance. EcoSal Plus 2021; 9:eESP-0027-2020. [PMID: 33734062 PMCID: PMC11163840 DOI: 10.1128/ecosalplus.esp-0027-2020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/13/2020] [Accepted: 01/26/2021] [Indexed: 02/06/2023]
Abstract
As the spread of antibiotic resistance threatens our ability to treat infections, avoiding the return of a preantibiotic era requires the discovery of new drugs. While therapeutic use of antibiotics followed by the inevitable selection of resistance is a modern phenomenon, these molecules and the genetic determinants of resistance were in use by environmental microbes long before humans discovered them. In this review, we discuss evidence that antibiotics and resistance were present in the environment before anthropogenic use, describing techniques including direct sampling of ancient DNA and phylogenetic analyses that are used to reconstruct the past. We also pay special attention to the ecological and evolutionary forces that have shaped the natural history of antibiotic biosynthesis, including a discussion of competitive versus signaling roles for antibiotics, proto-resistance, and substrate promiscuity of biosynthetic and resistance enzymes. Finally, by applying an evolutionary lens, we describe concepts governing the origins and evolution of biosynthetic gene clusters and cluster-associated resistance determinants. These insights into microbes' use of antibiotics in nature, a game they have been playing for millennia, can provide inspiration for discovery technologies and management strategies to combat the growing resistance crisis.
Affiliation(s)
- Nicholas Waglechner
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
- Elizabeth J. Culp
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
- Gerard D. Wright
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
|
7
|
Rahman A, Susmi TF, Yasmin F, Karim ME, Hossain MU. Functional annotation of an ecologically important protein from Chloroflexus aurantiacus involved in polyhydroxyalkanoates (PHA) biosynthetic pathway. SN APPLIED SCIENCES 2020. [DOI: 10.1007/s42452-020-03598-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022] Open
|
8
|
Polyanovsky V, Lifanov A, Esipova N, Tumanyan V. The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion. BMC Bioinformatics 2020; 21:294. [PMID: 32921315 PMCID: PMC7489204 DOI: 10.1186/s12859-020-03616-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2020] [Accepted: 06/18/2020] [Indexed: 11/15/2022] Open
Abstract
Background: The alignment of character sequences is important in bioinformatics. The quality of this procedure is determined by the substitution matrix and parameters of the insertion-deletion penalty function. These matrices are derived from sequence alignment and thus reflect the evolutionary process. Currently, in addition to evolutionary matrices, a large number of different background matrices have been obtained. To make an optimal choice of the substitution matrix and the penalty parameters, we conducted a numerical experiment using a representative sample of existing matrices of various types and origins. Results: We tested the classical evolutionary matrix series (PAM, Blosum, VTML, Pfasum) as well as structural-alignment-based matrices, a contact energy matrix, and a matrix based on the properties of the genetic code. This study presents results for two test set types: first, we simulated sequences that reflect the divergent evolution; second, we performed tests on Balibase sequences. In both cases, we obtained the dependence of the alignment quality (Accuracy, Confidence) on the evolutionary distance between sequences and the evolutionary distance to which the substitution matrices correspond. The combination of matrix and penalty function parameters was optimized for both local and global alignment. Consequently, we found that the best alignment quality is achieved with matrices corresponding to the largest evolutionary distance. These matrices prove to be universal, i.e. suitable for aligning sequences separated by both large and small evolutionary distances. We analysed the correspondence of the correlation coefficients of matrices to the alignment quality. It was found that matrices showing high quality alignment have an above average correlation value, but the converse is not true.
Conclusions: This study showed that the best alignment quality is achieved with evolutionary matrices designed for long distances: Gonnet, VTML250, PAM250, MIQS, and Pfasum050. The same property holds not only for matrices of evolutionary origin but also for matrices of other backgrounds that correspond to a large evolutionary distance. Accordingly, matrices based on structural data show alignment quality close to that of evolutionary matrices. This agrees with the idea that the spatial structure is more conserved than the protein sequence.
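To make concrete how a substitution matrix and a gap penalty jointly determine an alignment, here is a minimal global-alignment (Needleman-Wunsch) scoring sketch. It is not the authors' experimental setup: the match/mismatch function is a toy stand-in for a real matrix such as PAM250, the gap penalty is linear rather than whatever penalty function the study used, and all names are hypothetical.

```python
def global_alignment_score(a, b, score, gap):
    """Needleman-Wunsch optimal global alignment score.

    `score(x, y)` plays the role of the substitution matrix;
    `gap` is a (negative) linear penalty applied per gap position.
    """
    prev = [j * gap for j in range(len(b) + 1)]   # row for the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i * gap]                          # prefix of a against empty b
        for j, y in enumerate(b, start=1):
            curr.append(max(
                prev[j - 1] + score(x, y),        # align x with y
                prev[j] + gap,                    # x aligned to a gap
                curr[j - 1] + gap,                # y aligned to a gap
            ))
        prev = curr
    return prev[-1]

def toy_score(x, y):
    """Toy substitution scores: +2 for identity, -1 otherwise (an assumption)."""
    return 2 if x == y else -1

identical = global_alignment_score("GATTACA", "GATTACA", toy_score, gap=-2)
one_deletion = global_alignment_score("GATTACA", "GATACA", toy_score, gap=-2)
print(identical, one_deletion)
```

Swapping `toy_score` for a lookup into a real matrix, or changing `gap`, changes which alignment scores best, which is the dependence the study explores across matrix families.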
|
9
|
Nissley DA, Vu QV, Trovato F, Ahmed N, Jiang Y, Li MS, O'Brien EP. Electrostatic Interactions Govern Extreme Nascent Protein Ejection Times from Ribosomes and Can Delay Ribosome Recycling. J Am Chem Soc 2020; 142:6103-6110. [PMID: 32138505 DOI: 10.1021/jacs.9b12264] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/17/2022]
Abstract
The ejection of nascent proteins out of the ribosome exit tunnel, after their covalent bond to transfer-RNA has been broken, has not been experimentally studied due to challenges in sample preparation. Here, we investigate this process using a combination of multiscale modeling, ribosome profiling, and gene ontology analyses. Simulating the ejection of a representative set of 122 E. coli proteins we find a greater than 1000-fold variation in ejection times. Nascent proteins enriched in negatively charged residues near their C-terminus eject the fastest, while nascent chains enriched in positively charged residues tend to eject much more slowly. More work is required to pull slowly ejecting proteins out of the exit tunnel than quickly ejecting proteins, according to all-atom simulations. An energetic decomposition reveals, for slowly ejecting proteins, that this is due to the strong attractive electrostatic interactions between the nascent chain and the negatively charged ribosomal-RNA lining the exit tunnel, and for quickly ejecting proteins, it is due to their repulsive electrostatic interactions with the exit tunnel. Ribosome profiling data from E. coli reveals that the presence of slowly ejecting sequences correlates with ribosomes spending more time at stop codons, indicating that the ejection process might delay ribosome recycling. Proteins that have the highest positive charge density at their C-terminus are overwhelmingly ribosomal proteins, suggesting the possibility that this sequence feature may aid in the cotranslational assembly of ribosomes by delaying the release of nascent ribosomal proteins into the cytosol. Thus, nascent chain ejection times from the ribosome can vary greatly between proteins due to differential electrostatic interactions, can influence ribosome recycling, and could be particularly relevant to the synthesis and cotranslational behavior of some proteins.
Affiliation(s)
- Quyen V Vu
  - Institute of Physics, Polish Academy of Sciences, Al. Lotnikow 32/46, 02-668 Warsaw, Poland
- Mai Suan Li
  - Institute of Physics, Polish Academy of Sciences, Al. Lotnikow 32/46, 02-668 Warsaw, Poland
  - Institute for Computational Sciences and Technology, Ho Chi Minh City, Vietnam
|
10
|
Scheibenreif L, Littmann M, Orengo C, Rost B. FunFam protein families improve residue level molecular function prediction. BMC Bioinformatics 2019; 20:400. [PMID: 31319797 PMCID: PMC6639920 DOI: 10.1186/s12859-019-2988-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/18/2019] [Accepted: 07/09/2019] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. RESULTS: FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold. CONCLUSIONS: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.
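The threshold-controlled consensus described in this abstract, and its effect on precision and recall, can be illustrated with a simplified sketch. It assumes the per-residue predictions of all family members are already aligned column by column and contain no gaps; the function names, the toy data, and the exact voting rule are assumptions for illustration, not the published procedure.

```python
def consensus_binding(predictions, threshold):
    """Column-wise consensus over aligned per-residue predictions.

    `predictions` holds one equal-length list of booleans per family member
    (True = predicted binding). A column is called binding when the fraction
    of members predicting binding is at least `threshold`.
    """
    n = len(predictions)
    return [sum(column) / n >= threshold for column in zip(*predictions)]

def precision_recall_f1(predicted, observed):
    """Precision, recall and F1 for the binding (True) class."""
    tp = sum(p and o for p, o in zip(predicted, observed))
    fp = sum(p and not o for p, o in zip(predicted, observed))
    fn = sum(o and not p for p, o in zip(predicted, observed))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy family: three members, five aligned positions.
members = [
    [True, True, False, False, True],
    [True, False, False, True, True],
    [True, True, False, False, False],
]
observed = [True, True, False, False, False]   # made-up "true" binding residues

lenient = consensus_binding(members, threshold=0.5)
strict = consensus_binding(members, threshold=1.0)
print(precision_recall_f1(lenient, observed))
print(precision_recall_f1(strict, observed))
```

In this toy case the stricter threshold raises precision while lowering recall, the same kind of trade-off the abstract describes when varying how many proteins must agree.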
Affiliation(s)
- Linus Scheibenreif
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Maria Littmann
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Christine Orengo
  - Department of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Burkhard Rost
  - Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
  - Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY 10032, USA
|
11
|
Environmental conditions shape the nature of a minimal bacterial genome. Nat Commun 2019; 10:3100. [PMID: 31308405 PMCID: PMC6629657 DOI: 10.1038/s41467-019-10837-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/14/2018] [Accepted: 06/04/2019] [Indexed: 12/16/2022] Open
Abstract
Of the 473 genes in the genome of the bacterium with the smallest genome generated to date, 149 genes have unknown function, emphasising a universal problem: less than 1% of proteins have experimentally determined annotations. Here, we combine the results from state-of-the-art in silico methods for functional annotation and assign functions to 66 of the 149 proteins. Proteins that are still not annotated lack orthologues, lack protein domains, and/or are membrane proteins. Twenty-four likely transporter proteins are identified, indicating the importance of nutrient uptake into and waste disposal out of the minimal bacterial cell in a nutrient-rich environment after removal of metabolic enzymes. Hence, the environment shapes the nature of a minimal genome. Our findings also show that the combination of multiple different state-of-the-art in silico methods for annotating proteins is able to predict functions, even for difficult-to-characterise proteins, and to identify crucial gaps for further development.
|
12
|
Liu J, Dai J, He J, Peng X, Niemi AJ. Can the geometry of all-atom protein trajectories be reconstructed from the knowledge of Cα time evolution? A study of peptide plane O and side chain Cβ atoms. J Chem Phys 2019; 150:225103. [PMID: 31202245 DOI: 10.1063/1.5082627] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
Abstract
We ask to what extent the geometry of protein peptide plane and side chain atoms can be reconstructed from the knowledge of Cα time evolution. Due to the lack of experimental data, we analyze all-atom molecular dynamics trajectories from the Anton supercomputer, and for clarity, we limit our attention to the peptide plane O atoms and side chain Cβ atoms. We reconstruct their positions using four different approaches. Three of these are the publicly available reconstruction programs Pulchra, Remo, and Scwrl4. The fourth, Statistical Method, builds entirely on the statistical analysis of Protein Data Bank structures. All four methods place the O and Cβ atoms accurately along the Anton trajectories; the Statistical Method gives results that are closest to the Anton data. The results suggest that when a protein moves under physiological conditions, its all-atom structures can be reconstructed with high accuracy from the knowledge of the Cα atom positions. This can help to better understand and improve all-atom force fields, and advance reconstruction and refinement methods for reduced protein structures. The results provide impetus for the development of effective coarse-grained force fields in terms of reduced coordinates.
Affiliation(s)
- Jiaojiao Liu
  - School of Physics, Beijing Institute of Technology, Beijing 100081, People's Republic of China
- Jin Dai
  - Nordita, Stockholm University, Roslagstullsbacken 23, SE-106 91 Stockholm, Sweden
- Jianfeng He
  - School of Physics, Beijing Institute of Technology, Beijing 100081, People's Republic of China
- Xubiao Peng
  - Center for Quantum Technology Research, School of Physics, Beijing Institute of Technology, Beijing 100081, People's Republic of China
- Antti J Niemi
  - School of Physics, Beijing Institute of Technology, Beijing 100081, People's Republic of China
|
13
|
Pascual-García A, Arenas M, Bastolla U. The Molecular Clock in the Evolution of Protein Structures. Syst Biol 2019; 68:987-1002. [DOI: 10.1093/sysbio/syz022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/08/2018] [Revised: 03/20/2019] [Accepted: 04/09/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular clock hypothesis, which states that substitutions accumulate in protein sequences at a constant rate, plays a fundamental role in molecular evolution, but it is violated when selective or mutational processes vary with time. Such violations of the molecular clock have been widely investigated for protein sequences, but not yet for protein structures. Here, we introduce a novel statistical test (Significant Clock Violations) and perform a large-scale assessment of the molecular clock in the evolution of both protein sequences and structures in three large superfamilies. After validating our method with computer simulations, we find that clock violations are generally consistent in sequence and structure evolution, but they tend to be larger and more significant in structure evolution. Moreover, changes of function assessed through Gene Ontology and InterPro terms are associated with large and significant clock violations in structure evolution. We found that almost one third of significant clock violations are significant in structure evolution but not in sequence evolution, highlighting the advantage of using structure information for assessing accelerated evolution and gathering hints of positive selection. Clock violations between closely related pairs are frequently significant in sequence evolution, consistent with the observed time dependence of the substitution rate attributed to segregation of neutral and slightly deleterious polymorphisms, but not in structure evolution, suggesting that these substitutions do not affect protein structure although they may affect stability. These results are consistent with the view that natural selection, both negative and positive, constrains protein structures more strongly than protein sequences. Our code for computing clock violations is freely available at https://github.com/ugobas/Molecular_clock.
Affiliation(s)
- Alberto Pascual-García
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
  - Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, UK
  - Institute of Integrative Biology, ETH Zürich, Zürich, Switzerland
- Miguel Arenas
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Spain
- Ugo Bastolla
  - Centro de Biologia Molecular “Severo Ochoa” CSIC-UAM Cantoblanco, 28049 Madrid, Spain
|
14
|
Facchiano A, Pignone D, Servillo L, Castaldo D, De Masi L. Structure and Ligands Interactions of Citrus Tryptophan Decarboxylase by Molecular Modeling and Docking Simulations. Biomolecules 2019; 9:E117. [PMID: 30917613 PMCID: PMC6468663 DOI: 10.3390/biom9030117] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/29/2019] [Accepted: 03/22/2019] [Indexed: 01/21/2023] Open
Abstract
In a previous work, we in silico annotated protein sequences of Citrus genus plants as putative tryptophan decarboxylase (pTDC). Here, we investigated the structural properties of Citrus pTDCs by using the TDC sequence of Catharanthus roseus as an experimentally annotated reference to carry out comparative modeling and substrate docking analyses. The functional annotation as TDC was verified by combining 3D molecular modeling and docking simulations, evidencing the peculiarities and the structural similarities with C. roseus TDC. Docking with l-tryptophan as a ligand showed specificity of pTDC for this substrate. These combined results confirm our previous in silico annotation of the examined protein sequences of Citrus as TDC and provide support for TDC activity in this plant genus.
Affiliation(s)
- Angelo Facchiano
  - Consiglio Nazionale delle Ricerche (CNR), Istituto di Scienze dell'Alimentazione (ISA), 83100 Avellino, Italy.
- Domenico Pignone
  - CNR, Istituto di Bioscienze e BioRisorse (IBBR), 70126 Bari, Italy.
- Luigi Servillo
  - Dipartimento di Medicina di Precisione, Università degli Studi della Campania "Luigi Vanvitelli", 80138 Napoli, Italy.
- Domenico Castaldo
  - Stazione Sperimentale per le Industrie delle Essenze e dei Derivati dagli Agrumi (SSEA), Azienda Speciale della Camera di Commercio di Reggio Calabria, 89125 Reggio Calabria, Italy.
|
15
|
Networks of electrostatic and hydrophobic interactions modulate the complex folding free energy surface of a designed βα protein. Proc Natl Acad Sci U S A 2019; 116:6806-6811. [PMID: 30877249 DOI: 10.1073/pnas.1818744116] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/24/2022] Open
Abstract
The successful de novo design of proteins can provide insights into the physical chemical basis of stability, the role of evolution in constraining amino acid sequences, and the production of customizable platforms for engineering applications. Previous guanidine hydrochloride (GdnHCl; an ionic denaturant) experiments of a designed, naturally occurring βα fold, Di-III_14, revealed a cooperative, two-state unfolding transition and a modest stability. Continuous-flow mixing experiments in our laboratory revealed a simple two-state reaction in the microsecond to millisecond time range, consistent with the thermodynamic results. In striking contrast, the protein remains folded up to 9.25 M in urea, a neutral denaturant, and hydrogen exchange (HDX) NMR analysis in water revealed the presence of numerous high-energy states that interconvert on a time scale greater than seconds. The complex protection pattern for HDX corresponds closely with a pair of electrostatic networks on the surface and an extensive network of hydrophobic side chains in the interior of the protein. Mutational analysis showed that electrostatic and hydrophobic networks contribute to the resistance to urea denaturation for the WT protein; remarkably, single charge reversals on the protein surface restore the expected urea sensitivity. The roughness of the energy surface reflects the densely packed hydrophobic core; the removal of only two methyl groups eliminates the high-energy states and creates a smooth surface. The design of a very stable βα fold containing electrostatic and hydrophobic networks has created a complex energy surface rarely observed in natural proteins.
|
16
|
Sillitoe I, Dawson N, Lewis TE, Das S, Lees JG, Ashford P, Tolulope A, Scholes HM, Senatorov I, Bujan A, Ceballos Rodriguez-Conde F, Dowling B, Thornton J, Orengo CA. CATH: expanding the horizons of structure-based functional annotations for genome sequences. Nucleic Acids Res 2019; 47:D280-D284. [PMID: 30398663 PMCID: PMC6323983 DOI: 10.1093/nar/gky1097] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 09/15/2018] [Revised: 10/16/2018] [Accepted: 11/02/2018] [Indexed: 01/20/2023] Open
Abstract
This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a huge update in the coverage of structural data. This release increases the number of fully-classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The coverage of high-resolution, protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: allowing the user to view an alignment between their query sequence and a representative FunFam structure and providing tools that make it easier to view the full structural context (multi-domain architecture) of domains and chains.
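As a quick arithmetic check on the growth figures quoted in this abstract, the sketch below recomputes them from the stated counts. Only the numbers come from the abstract; the variable and function names are illustrative and are not part of CATH.

```python
# Figures quoted in the abstract; all names here are illustrative only.
domains_before, domains_after = 308_999, 434_857
seqs_before_m, seqs_after_m = 53, 95          # millions of predicted domains ("over 95")
coverage_before, coverage_after = 82.3, 90.2  # % of PDB chains with an assigned CATH domain


def relative_increase(before: float, after: float) -> float:
    """Return the fractional increase from `before` to `after`."""
    return (after - before) / before


domain_growth = relative_increase(domains_before, domains_after)
sequence_fold = seqs_after_m / seqs_before_m  # lower bound, since the abstract says "over 95 million"
coverage_gain_points = coverage_after - coverage_before

print(f"Classified domains grew by {domain_growth:.1%}")
print(f"Sequence data grew at least {sequence_fold:.2f}-fold")
print(f"Chain coverage rose by {coverage_gain_points:.1f} percentage points")
```

The computed values are consistent with the abstract's wording of "over 40%" and "almost two-fold".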
Affiliation(s)
- Ian Sillitoe
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Natalie Dawson
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Tony E Lewis
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Sayoni Das
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Jonathan G Lees
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Paul Ashford
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Adeyelu Tolulope
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Harry M Scholes
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Ilya Senatorov
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Andra Bujan
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Benjamin Dowling
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
- Janet Thornton
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Christine A Orengo
  - Structural and Molecular Biology, University College London WC1E 6BT, UK
|
17
|
Studer G, Tauriello G, Bienert S, Waterhouse AM, Bertoni M, Bordoli L, Schwede T, Lepore R. Modeling of Protein Tertiary and Quaternary Structures Based on Evolutionary Information. Methods Mol Biol 2019; 1851:301-316. [PMID: 30298405 DOI: 10.1007/978-1-4939-8736-8_17] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Abstract
Proteins are subject to evolutionary forces that shape their three-dimensional structure to meet specific functional demands. The knowledge of the structure of a protein is therefore instrumental to gain information about the molecular basis of its function. However, experimental structure determination is inherently time consuming and expensive, making it impossible to follow the explosion of sequence data deriving from genome-scale projects. As a consequence, computational structural modeling techniques have received much attention and established themselves as a valuable complement to experimental structural biology efforts. Among these, comparative modeling remains the method of choice to model the three-dimensional structure of a protein when homology to a protein of known structure can be detected. The general strategy consists of using experimentally determined structures of proteins as templates for the generation of three-dimensional models of related family members (targets) of which the structure is unknown. This chapter provides a description of the individual steps needed to obtain a comparative model using SWISS-MODEL, one of the most widely used automated servers for protein structure homology modeling.
Affiliation(s)
- Gabriel Studer
  - Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Gerardo Tauriello
  - Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Stefan Bienert
  - Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Andrew Mark Waterhouse
  - Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Martino Bertoni
  - Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Lorenza Bordoli
  - Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Rosalba Lepore
  - Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland.
|
18
|
Gazi MA, Mahmud S, Fahim SM, Kibria MG, Palit P, Islam MR, Rashid H, Das S, Mahfuz M, Ahmeed T. Functional Prediction of Hypothetical Proteins from Shigella flexneri and Validation of the Predicted Models by Using ROC Curve Analysis. Genomics Inform 2018; 16:e26. [PMID: 30602087 PMCID: PMC6440662 DOI: 10.5808/gi.2018.16.4.e26] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/02/2018] [Accepted: 09/16/2018] [Indexed: 01/04/2023] Open
Abstract
Shigella spp. constitute some of the key pathogens responsible for the global burden of diarrhoeal disease. With over 164 million reported cases per annum, shigellosis accounts for 1.1 million deaths each year. The majority of these cases occur among children in developing nations, and the emergence of multi-drug-resistant Shigella strains in clinical isolates demands the development of better or new drugs against this pathogen. The genome of Shigella flexneri was extensively analyzed and found to contain 4,362 proteins, among which the functions of 674 proteins, termed hypothetical proteins (HPs), had not previously been elucidated. The amino acid sequences of all 674 HPs were studied, and functions were assigned to a total of 39 HPs with a high level of confidence. Here we have utilized a combination of the latest versions of databases to assign the precise function of HPs for which no experimental information is available. These HPs were found to belong to various classes of proteins such as enzymes, binding proteins, signal transducers, lipoproteins, transporters, virulence and other proteins. The performance of the various computational tools was evaluated using receiver operating characteristic curve analysis, and a resoundingly high average accuracy of 93.6% was obtained. Our comprehensive analysis will help to gain a greater understanding for the development of many novel potential therapeutic interventions to defeat Shigella infection.
Affiliation(s)
- Md Amran Gazi
  - Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
- Sultan Mahmud
  - Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
- Shah Mohammad Fahim
  - Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
- Mohammad Golam Kibria
  - Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
- Parag Palit
  - Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
- Md Rezaul Islam
  - International Max Planck Research School, Grisebachstraße 5, 37077 Göttingen, Germany
- Humaira Rashid
  - Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
- Subhasish Das
  - Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
- Mustafa Mahfuz
  - Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
- Tahmeed Ahmeed
  - Nutrition and Clinical Services Division, International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b), Dhaka 1212, Bangladesh
|
19
|
Ivanov AA, Revennaugh B, Rusnak L, Gonzalez-Pecchi V, Mo X, Johns MA, Du Y, Cooper LAD, Moreno CS, Khuri FR, Fu H. The OncoPPi Portal: an integrative resource to explore and prioritize protein-protein interactions for cancer target discovery. Bioinformatics 2018; 34:1183-1191. [PMID: 29186335 DOI: 10.1093/bioinformatics/btx743] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 07/24/2017] [Accepted: 11/23/2017] [Indexed: 12/21/2022] Open
Abstract
Motivation: As cancer genomics initiatives move toward comprehensive identification of genetic alterations in cancer, attention is now turning to understanding how interactions among these genes lead to the acquisition of tumor hallmarks. Emerging pharmacological and clinical data suggest a highly promising role of cancer-specific protein-protein interactions (PPIs) as druggable cancer targets. However, large-scale experimental identification of cancer-related PPIs remains challenging, and currently available resources to explore oncogenic PPI networks are limited. Results: Recently, we have developed a PPI high-throughput screening platform to detect PPIs between cancer-associated proteins in the context of cancer cells. Here, we present the OncoPPi Portal, an interactive web resource that allows investigators to access, manipulate and interpret a high-quality cancer-focused network of PPIs experimentally detected in cancer cell lines. To facilitate prioritization of PPIs for further biological studies, this resource combines network connectivity analysis, mutual exclusivity analysis of genomic alterations, cellular co-localization of interacting proteins and domain-domain interactions. Estimates of PPI essentiality allow users to evaluate the functional impact of PPI disruption on cancer cell proliferation. Furthermore, connecting the OncoPPi network with the approved drugs and compounds in clinical trials enables discovery of new tumor dependencies to inform strategies to interrogate undruggable targets like tumor suppressors. The OncoPPi Portal serves as a resource for the cancer research community to facilitate discovery of cancer targets and therapeutic development. Availability and implementation: The OncoPPi Portal is available at http://oncoppi.emory.edu. Contact: andrey.ivanov@emory.edu or hfu@emory.edu.
Affiliation(s)
- Andrei A Ivanov
  - Department of Pharmacology and Emory Chemical Biology Discovery Center, Emory University School of Medicine
  - Winship Cancer Institute of Emory University
- Brian Revennaugh
  - Department of Pharmacology and Emory Chemical Biology Discovery Center, Emory University School of Medicine
- Lauren Rusnak
  - Department of Pharmacology and Emory Chemical Biology Discovery Center, Emory University School of Medicine
- Valentina Gonzalez-Pecchi
  - Department of Pharmacology and Emory Chemical Biology Discovery Center, Emory University School of Medicine
- Xiulei Mo
  - Department of Pharmacology and Emory Chemical Biology Discovery Center, Emory University School of Medicine
- Margaret A Johns
  - Department of Pharmacology and Emory Chemical Biology Discovery Center, Emory University School of Medicine
- Yuhong Du
  - Department of Pharmacology and Emory Chemical Biology Discovery Center, Emory University School of Medicine
  - Winship Cancer Institute of Emory University
- Lee A D Cooper
  - Winship Cancer Institute of Emory University
  - Department of Biomedical Informatics
  - Department of Biomedical Engineering
- Carlos S Moreno
  - Winship Cancer Institute of Emory University
  - Department of Biomedical Informatics
  - Department of Pathology and Laboratory Medicine
- Fadlo R Khuri
  - Winship Cancer Institute of Emory University
  - Department of Hematology and Medical Oncology, Emory University School of Medicine, Emory University, Atlanta, GA 30322, USA
- Haian Fu
  - Department of Pharmacology and Emory Chemical Biology Discovery Center, Emory University School of Medicine
  - Winship Cancer Institute of Emory University
  - Department of Hematology and Medical Oncology, Emory University School of Medicine, Emory University, Atlanta, GA 30322, USA
|
20
|
Esque J, Sansom MSP, Baaden M, Oguey C. Analyzing protein topology based on Laguerre tessellation of a pore-traversing water network. Sci Rep 2018; 8:13540. [PMID: 30202114 PMCID: PMC6131185 DOI: 10.1038/s41598-018-31422-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/23/2018] [Accepted: 08/17/2018] [Indexed: 11/15/2022] Open
Abstract
Given the tight relation between protein structure and function, we present a set of methods to analyze protein topology, implemented in the VLDP program, relying on Laguerre space partitions built from series of molecular dynamics snapshots. The Laguerre partition specifies inter-atomic contacts, formalized in graphs. The deduced properties are the existence and count of water aggregates, possible passageways and constrictions, and the structure, connectivity, stability and depth of the water network. As a test case, the membrane protein FepA is investigated in its full environment, yielding a more precise description of the protein surface. Inside FepA, the solvent splits into isolated clusters and an intricate network connecting both sides of the lipid bilayer. The network is dynamic: connections switch on and off, occasionally relocating traversing paths substantially. Subtle differences are detected between two forms of FepA, ligand-free and complexed with its natural iron carrier, enterobactin. The complexed form has more constricted and more centered openings in the upper part whereas, in the lower part, constriction is released: two main channels between the plug and barrel lead directly to the periplasm. Reliability, precision and the variety of topological features are the main interest of the method.
Affiliation(s)
- Jérémy Esque
  - LPTM, CNRS UMR 8089, Université de Cergy-Pontoise, 95302, Cergy-Pontoise, France.
  - LISBP, Université de Toulouse, CNRS, INSA, INRA, 135 Avenue de Rangueil, 31400, Toulouse, France.
- Mark S P Sansom
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, United Kingdom
- Marc Baaden
  - Laboratoire de Biochimie Théorique, CNRS, UPR9080, Univ Paris Diderot, Sorbonne Paris Cité, 13 rue Pierre et Marie Curie, 75005, Paris, France
- Christophe Oguey
  - LPTM, CNRS UMR 8089, Université de Cergy-Pontoise, 95302, Cergy-Pontoise, France.
|
21
|
Koch I, Schäfer T. Protein super-secondary structure and quaternary structure topology: theoretical description and application. Curr Opin Struct Biol 2018; 50:134-143. [DOI: 10.1016/j.sbi.2018.02.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/05/2017] [Revised: 01/26/2018] [Accepted: 02/17/2018] [Indexed: 12/13/2022]
|
22
|
Nasedkin A, Davidsson J, Niemi AJ, Peng X. Solution x-ray scattering and structure formation in protein dynamics. Phys Rev E 2018; 96:062405. [PMID: 29347365 DOI: 10.1103/physreve.96.062405] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/14/2017] [Indexed: 11/07/2022]
Abstract
We propose a computationally effective approach that builds on Landau mean-field theory in combination with modern nonequilibrium statistical mechanics to model and interpret protein dynamics and structure formation in small- to wide-angle x-ray scattering (S/WAXS) experiments. We develop the methodology by analyzing experimental data in the case of Engrailed homeodomain protein as an example. We demonstrate how to interpret S/WAXS data qualitatively with a good precision and over an extended temperature range. We explain experimental observations in terms of protein phase structure, and we make predictions for future experiments and for how to analyze data at different ambient temperature values. We conclude that the approach we propose has the potential to become a highly accurate, computationally effective, and predictive tool for analyzing S/WAXS data. For this, we compare our results with those obtained previously in an all-atom molecular dynamics simulation.
Affiliation(s)
- Alexandr Nasedkin
  - Department of Physics, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Jan Davidsson
  - Department of Chemistry, Uppsala University, P. O. Box 803, S-75108, Uppsala, Sweden
- Antti J Niemi
  - Department of Physics, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
  - Nordita, Stockholm University, Roslagstullsbacken 23, SE-106 91 Stockholm, Sweden
  - Department of Physics and Astronomy, Uppsala University, P. O. Box 803, S-75108, Uppsala, Sweden
  - Laboratoire de Mathematiques et Physique Theorique CNRS UMR 6083, Fédération Denis Poisson, Université de Tours, Parc de Grandmont, F37200, Tours, France
  - School of Physics, Beijing Institute of Technology, Beijing 100081, P.R. China
  - Laboratory of Physics of Living Matter, School of Biomedicine, Far Eastern Federal University, Vladivostok 690090, Russia
- Xubiao Peng
  - Department of Physics and Astronomy, University of British Columbia, Vancouver, British Columbia V6T1Z4, Canada
|
23
|
Sam E, Athri P. Web-based drug repurposing tools: a survey. Brief Bioinform 2017; 20:299-316. [DOI: 10.1093/bib/bbx125] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 07/03/2017] [Indexed: 12/15/2022] Open
Affiliation(s)
- Elizabeth Sam
  - Department of Computer Science & Engineering, Amrita University, Bengaluru, India
- Prashanth Athri
  - Department of Computer Science & Engineering, Amrita University, Bengaluru, India
|
24
|
Kalinowska B, Banach M, Wiśniowski Z, Konieczny L, Roterman I. Is the hydrophobic core a universal structural element in proteins? J Mol Model 2017. [PMID: 28623601 PMCID: PMC5487895 DOI: 10.1007/s00894-017-3367-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/08/2023]
Abstract
The hydrophobic core, when subjected to analysis based on the fuzzy oil drop model, appears to be a universal structural component of proteins irrespective of their secondary, supersecondary, and tertiary conformations. A study has been performed on a set of nonhomologous proteins representing a variety of CATH categories. The presence of a well-ordered hydrophobic core has been confirmed in each case, regardless of the protein’s biological function, chain length or source organism. In light of fuzzy oil drop (FOD) analysis, various supersecondary forms seem to share a common structural factor in the form of a hydrophobic core, emerging either as part of the whole protein or a specific domain. The variable status of individual folds with respect to the FOD model reflects their propensity for conformational changes, frequently associated with biological function. Such flexibility is expressed as variable stability of the hydrophobic core, along with specific encoding of potential conformational changes which depend on the properties of helices and β-folds.
Affiliation(s)
- Barbara Kalinowska
  - Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Lazarza 16, 31-530, Krakow, Poland
  - Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Łojasiewicza 11, 30-348, Krakow, Poland
- Mateusz Banach
  - Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Lazarza 16, 31-530, Krakow, Poland
  - Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Łojasiewicza 11, 30-348, Krakow, Poland
- Zdzisław Wiśniowski
  - Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Lazarza 16, 31-530, Krakow, Poland
- Leszek Konieczny
  - Chair of Medical Biochemistry, Jagiellonian University - Medical College, Kopernika 7, 31-034, Krakow, Poland
- Irena Roterman
  - Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Lazarza 16, 31-530, Krakow, Poland.
|
25
|
Knutson ST, Westwood BM, Leuthaeuser JB, Turner BE, Nguyendac D, Shea G, Kumar K, Hayden JD, Harper AF, Brown SD, Morris JH, Ferrin TE, Babbitt PC, Fetrow JS. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences. Protein Sci 2017; 26:677-699. [PMID: 28054422 PMCID: PMC5368075 DOI: 10.1002/pro.3112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/10/2016] [Accepted: 12/22/2016] [Indexed: 01/11/2023]
Abstract
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification (amino acids that distinguish details between functional families within a superfamily). Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
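The stopping rule described in this abstract (keep dividing until every structure sits in a self-identifying group or is a singlet) can be illustrated with a generic divisive loop. This is only a sketch under stated assumptions, not the TuLIP implementation: `split` and `is_coherent` are hypothetical placeholders for the active-site-profile clustering step and the DASP2 self-identification test, and the toy functions below merely group names by their first letter.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Cluster = FrozenSet[str]


def divisive_cluster(
    items: Iterable[str],
    split: Callable[[Cluster], Iterable[Iterable[str]]],
    is_coherent: Callable[[Cluster], bool],
) -> Tuple[List[Cluster], List[Cluster]]:
    """Divide clusters until each one is coherent or a singlet.

    `split` and `is_coherent` are hypothetical callbacks, not DASP2 calls.
    `split` is assumed to return a partition of the cluster it is given.
    """
    accepted: List[Cluster] = []
    singlets: List[Cluster] = []
    pending: List[Cluster] = [frozenset(items)]
    while pending:
        cluster = pending.pop()
        if not cluster:
            continue
        if len(cluster) == 1:
            singlets.append(cluster)
        elif is_coherent(cluster):
            accepted.append(cluster)
        else:
            parts = [frozenset(p) for p in split(cluster)]
            parts = [p for p in parts if p]
            # If the split made no progress, fall back to singlets so the loop terminates.
            if len(parts) < 2 or any(len(p) >= len(cluster) for p in parts):
                parts = [frozenset([name]) for name in cluster]
            pending.extend(parts)
    return accepted, singlets


def toy_is_coherent(cluster: Cluster) -> bool:
    """Toy stand-in: a cluster is coherent if all names share a first letter."""
    return len({name[0] for name in cluster}) == 1


def toy_split(cluster: Cluster) -> List[List[str]]:
    """Toy stand-in: peel off the alphabetically first family from the rest."""
    first = min(name[0] for name in cluster)
    head = [name for name in cluster if name[0] == first]
    rest = [name for name in cluster if name[0] != first]
    return [head, rest]


accepted, singlets = divisive_cluster(
    ["A1", "A2", "B1", "B2", "C1"], toy_split, toy_is_coherent
)
print(sorted(sorted(c) for c in accepted))
print(sorted(sorted(c) for c in singlets))
```

Passing the two decisions in as callbacks keeps the loop independent of how clusters are proposed or validated, which is the only part of the procedure the abstract specifies.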
Affiliation(s)
- Stacy T. Knutson
- Department of Physics, Wake Forest University, Winston‐Salem, North Carolina 27106
- Department of Computer Science, Wake Forest University, Winston‐Salem, North Carolina 27106
| | - Brian M. Westwood
- Department of Physics, Wake Forest University, Winston‐Salem, North Carolina 27106
- Department of Computer Science, Wake Forest University, Winston‐Salem, North Carolina 27106
| | - Janelle B. Leuthaeuser
- Molecular Genetics and Genomics Program, Wake Forest School of Medicine, Winston‐Salem, North Carolina 27157
| | - Brandon E. Turner
- Department of Physics, Wake Forest University, Winston‐Salem, North Carolina 27106
| | - Don Nguyendac
- Department of Physics, Wake Forest University, Winston‐Salem, North Carolina 27106
| | - Gabrielle Shea
- Department of Physics, Wake Forest University, Winston‐Salem, North Carolina 27106
| | - Kiran Kumar
- Department of Physics, Wake Forest University, Winston‐Salem, North Carolina 27106
| | - Julia D. Hayden
- Biochemistry Program, Dickinson College, Carlisle, Pennsylvania 17013
| | - Angela F. Harper
- Department of Physics, Wake Forest University, Winston‐Salem, North Carolina 27106
| | - Shoshana D. Brown
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
| | - John H. Morris
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
| | - Thomas E. Ferrin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
| | - Patricia C. Babbitt
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
|
26
|
Landim PGC, Correia TO, Silva FD, Nepomuceno DR, Costa HP, Pereira HM, Lobo MD, Moreno FB, Brandão-Neto J, Medeiros SC, Vasconcelos IM, Oliveira JT, Sousa BL, Barroso-Neto IL, Freire VN, Carvalho CP, Monteiro-Moreira AC, Grangeiro TB. Production in Pichia pastoris, antifungal activity and crystal structure of a class I chitinase from cowpea (Vigna unguiculata): Insights into sugar binding mode and hydrolytic action. Biochimie 2017; 135:89-103. [DOI: 10.1016/j.biochi.2017.01.014] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 11/30/2016] [Accepted: 01/27/2017] [Indexed: 02/02/2023]
|
27
|
Singh S, Shrivastava AK. In silico characterization and transcriptomic analysis of nif family genes from Anabaena sp. PCC7120. Cell Biol Toxicol 2017; 33:467-482. [DOI: 10.1007/s10565-017-9388-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/14/2016] [Accepted: 02/13/2017] [Indexed: 12/12/2022]
|
28
|
Harsch T, Schneider P, Kieninger B, Donaubauer H, Kalbitzer HR. Stereospecific assignment of the asparagine and glutamine sidechain amide protons in proteins from chemical shift analysis. J Biomol NMR 2017; 67:157-164. [PMID: 28197852 DOI: 10.1007/s10858-017-0093-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/21/2016] [Accepted: 01/31/2017] [Indexed: 06/06/2023]
Abstract
Side chain amide protons of asparagine and glutamine residues in random-coil peptides are characterized by large chemical shift differences and can be stereospecifically assigned on the basis of their chemical shift values only. The bimodal chemical shift distributions stored in the Biological Magnetic Resonance Data Bank (BMRB) do not allow such an assignment. However, an analysis of the BMRB shows that a substantial part of all stored stereospecific assignments is not correct. We show here that in most cases stereospecific assignment can also be done for folded proteins using an unbiased artificial chemical shift database (UACSB). For a separation of the chemical shifts of the two amide resonance lines with differences ≥0.40 ppm for asparagine and differences ≥0.42 ppm for glutamine, the downfield shifted resonance lines can be assigned to Hδ21 and Hε21, respectively, at a confidence level >95%. A classifier derived from UACSB can also be used to correct the BMRB data. The program tool AssignmentChecker implemented in AUREMOL calculates the Bayesian probability for a given stereospecific assignment and automatically corrects the assignments for a given list of chemical shifts.
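The threshold rule quoted in this abstract can be written as a short Python sketch. It covers only the stated cut-offs (≥0.40 ppm for Asn, ≥0.42 ppm for Gln), not the Bayesian classifier in AssignmentChecker. The function name is hypothetical, and labelling the upfield partner as HD22/HE22 is an assumption based on standard atom nomenclature rather than something the abstract states.

```python
from typing import Dict, Optional

# Minimum shift separation (ppm) quoted in the abstract for a >95% confident call.
MIN_SEPARATION_PPM = {"ASN": 0.40, "GLN": 0.42}

# Downfield proton name from the abstract; the upfield partner name is an assumption.
PROTON_NAMES = {"ASN": ("HD21", "HD22"), "GLN": ("HE21", "HE22")}


def assign_amide_protons(
    residue: str, shift_a: float, shift_b: float
) -> Optional[Dict[str, float]]:
    """Return a proton-to-shift mapping, or None if the separation is too small."""
    key = residue.upper()
    if key not in MIN_SEPARATION_PPM:
        raise ValueError(f"rule only covers ASN and GLN, got {residue!r}")
    if abs(shift_a - shift_b) < MIN_SEPARATION_PPM[key]:
        return None  # below the quoted threshold: no assignment from this rule
    downfield_name, upfield_name = PROTON_NAMES[key]
    # Downfield means the larger chemical shift value in ppm.
    return {
        downfield_name: max(shift_a, shift_b),
        upfield_name: min(shift_a, shift_b),
    }


asn_call = assign_amide_protons("ASN", 6.90, 7.60)
gln_call = assign_amide_protons("GLN", 7.20, 7.00)
```

Returning `None` for small separations leaves those cases open for a probabilistic method instead of forcing a label.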
Affiliation(s)
- Tobias Harsch
- Institute of Biophysics and Physical Biochemistry and Centre of Magnetic Resonance in Chemistry and Biomedicine, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany
| | - Philipp Schneider
- Institute of Biophysics and Physical Biochemistry and Centre of Magnetic Resonance in Chemistry and Biomedicine, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany
| | - Bärbel Kieninger
- Institute of Biophysics and Physical Biochemistry and Centre of Magnetic Resonance in Chemistry and Biomedicine, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany
| | - Harald Donaubauer
- Institute of Biophysics and Physical Biochemistry and Centre of Magnetic Resonance in Chemistry and Biomedicine, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany
| | - Hans Robert Kalbitzer
- Institute of Biophysics and Physical Biochemistry and Centre of Magnetic Resonance in Chemistry and Biomedicine, University of Regensburg, Universitätsstraße 31, 93053, Regensburg, Germany.
|
29
|
Dybas JM, Fiser A. Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins 2016; 84:1859-1874. [PMID: 27671894 PMCID: PMC5118133 DOI: 10.1002/prot.25169] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/31/2016] [Revised: 08/17/2016] [Accepted: 08/25/2016] [Indexed: 11/09/2022]
Abstract
Structure conservation, functional similarities, and homologous relationships that exist across diverse protein topologies suggest that some regions of the protein fold universe are continuous. However, the current structure classification systems are based on hierarchical organizations, which cannot accommodate structural relationships that span fold definitions. Here, we describe a novel, super-secondary-structure motif-based, topology-independent structure comparison method (SmotifCOMP) that is able to quantitatively identify structural relationships between disparate topologies. The basis of SmotifCOMP is a systematically defined super-secondary-structure motif library whose representative geometries are shown to be saturated in the Protein Data Bank and exhibit a unique distribution within the known folds. SmotifCOMP offers a robust and quantitative technique to compare domains that adopt different topologies since the method does not rely on a global superposition. SmotifCOMP is used to perform an exhaustive comparison of the known folds and the identified relationships are used to produce a nonhierarchical representation of the fold space that reflects the notion of a continuous and connected fold universe. The current work offers insight into previously hypothesized evolutionary relationships between disparate folds and provides a resource for exploring novel ones.
Affiliation(s)
- Joseph M. Dybas
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
| | - Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue Bronx, NY 10461, USA
|
30
|
Abstract
We identify new entangled motifs in proteins that we call complex lassos. Lassos arise in proteins with disulfide bridges (or in proteins with amide linkages) when the termini of a protein backbone pierce through an auxiliary surface of minimal area spanned on a covalent loop. We find that as many as 18% of all proteins with disulfide bridges in a non-redundant subset of the PDB form complex lassos, and we classify them into six distinct geometric classes, one of which resembles the supercoiling known from DNA. Based on the biological classification of proteins, we find that lassos are much more common in viruses, plants and fungi than in other kingdoms of life. We also discuss how changes in the oxidation/reduction potential may affect the function of proteins with lassos. Lassos and the associated surfaces of minimal area provide new and interesting geometric characteristics, with many potential applications, not only of proteins but also of other biomolecules.
|
31
|
Dygut J, Kalinowska B, Banach M, Piwowar M, Konieczny L, Roterman I. Structural Interface Forms and Their Involvement in Stabilization of Multidomain Proteins or Protein Complexes. Int J Mol Sci 2016; 17:ijms17101741. [PMID: 27763556 PMCID: PMC5085769 DOI: 10.3390/ijms17101741] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 08/14/2016] [Revised: 09/30/2016] [Accepted: 10/11/2016] [Indexed: 12/20/2022] Open
Abstract
The presented analysis concerns the inter-domain and inter-protein interface in protein complexes. We propose extending the traditional understanding of the protein domain as a function of local compactness with an additional criterion which refers to the presence of a well-defined hydrophobic core. Interface areas in selected homodimers vary with respect to their contribution to shared as well as individual (domain-specific) hydrophobic cores. The basic definition of a protein domain, i.e., a structural unit characterized by tighter packing than its immediate environment, is extended in order to acknowledge the role of a structured hydrophobic core, which includes the interface area. The hydrophobic properties of interfaces vary depending on the status of the interacting domains. In this context we can distinguish: (1) shared hydrophobic cores (spanning the whole dimer); (2) individual hydrophobic cores present in each monomer irrespective of whether the dimer contains a shared core. Analysis of interfaces in dystrophin and utrophin indicates the presence of an additional quasi-domain with a prominent hydrophobic core, consisting of fragments contributed by both monomers. In addition, we have also attempted to determine the relationship between the type of interface (as categorized above) and the biological function of each complex. This analysis is entirely based on the fuzzy oil drop model.
Affiliation(s)
- Jacek Dygut
- Department of Rehabilitation, Hospital in Przemyśl, Monte Cassino 18, 37-700 Przemyśl, Poland.
| | - Barbara Kalinowska
- Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Łojasiewicza 11, 30-348 Krakow, Poland.
| | - Mateusz Banach
- Department of Bioinformatics and Telemedicine, Jagiellonian University-Medical College, Łazarza 16, 31-530 Krakow, Poland.
| | - Monika Piwowar
- Department of Bioinformatics and Telemedicine, Jagiellonian University-Medical College, Łazarza 16, 31-530 Krakow, Poland.
| | - Leszek Konieczny
- Chair of Medical Biochemistry, Jagiellonian University-Medical College, Kopernika 7, 31-034 Krakow, Poland.
| | - Irena Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University-Medical College, Łazarza 16, 31-530 Krakow, Poland.
|
32
|
Naqvi AAT, Anjum F, Khan FI, Islam A, Ahmad F, Hassan MI. Sequence Analysis of Hypothetical Proteins from Helicobacter pylori 26695 to Identify Potential Virulence Factors. Genomics Inform 2016; 14:125-135. [PMID: 27729842 PMCID: PMC5056897 DOI: 10.5808/gi.2016.14.3.125] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/20/2016] [Revised: 08/05/2016] [Accepted: 08/29/2016] [Indexed: 12/16/2022] Open
Abstract
Helicobacter pylori is a Gram-negative bacterium that is responsible for gastritis in humans. Its spiral flagellated body helps in locomotion and colonization in the host environment. It is capable of living in the highly acidic environment of the stomach with the help of acid-adaptive genes. The genome of the H. pylori 26695 strain contains 1,555 coding genes that encode 1,445 proteins. Of these, 340 proteins are characterized as hypothetical proteins (HPs). This study involves extensive analysis of the HPs using an established pipeline, comprising various bioinformatics tools and databases, to find probable functions of the HPs and to identify virulence factors. After extensive analysis of all 340 HPs, we found that 104 HPs show characteristic similarities to proteins with known functions. On the basis of these similarities, we assigned probable functions to 104 HPs with high confidence and precision. The predicted HPs include representative members of diverse functional classes of proteins such as enzymes, transporters, binding proteins, regulatory proteins, proteins involved in cellular processes and other proteins with miscellaneous functions. We therefore classified the 104 HPs into these functional groups. During the virulence factor analysis of the HPs, we found that 11 HPs show significant virulence. The identification of virulence proteins with the help of their predicted functions may pave the way for drug target estimation and the development of effective drugs to counter the activity of those proteins.
Affiliation(s)
- Ahmad Abu Turab Naqvi
- Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
| | - Farah Anjum
- Female College of Applied Medical Science, Taif University, Al-Taif 21974, Kingdom of Saudi Arabia
| | - Faez Iqbal Khan
- School of Chemistry and Chemical Engineering, Henan University of Technology, Henan 450001, China
| | - Asimul Islam
- Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
| | - Faizan Ahmad
- Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
| | - Md Imtaiyaz Hassan
- Center for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi 110025, India
|
33
|
Assar Z, Nossoni Z, Wang W, Santos EM, Kramer K, McCornack C, Vasileiou C, Borhan B, Geiger JH. Domain-Swapped Dimers of Intracellular Lipid-Binding Proteins: Evidence for Ordered Folding Intermediates. Structure 2016; 24:1590-8. [PMID: 27524203 PMCID: PMC5330279 DOI: 10.1016/j.str.2016.05.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/26/2016] [Revised: 05/13/2016] [Accepted: 05/16/2016] [Indexed: 11/29/2022]
Abstract
Human Cellular Retinol Binding Protein II (hCRBPII), a member of the intracellular lipid-binding protein family, is a monomeric protein responsible for the intracellular transport of retinol and retinal. Herein we report that hCRBPII forms an extensive domain-swapped dimer during bacterial expression. The domain-swapped region encompasses almost half of the protein. The dimer represents a novel structural architecture with the mouths of the two binding cavities facing each other, producing a new binding cavity that spans the length of the protein complex. Although wild-type hCRBPII forms the dimer, the propensity for dimerization can be substantially increased via mutation at Tyr60. The monomeric form of the wild-type protein represents the thermodynamically more stable species, making the domain-swapped dimer a kinetically trapped entity. Hypothetically, the wild-type protein has evolved to minimize dimerization of the folding intermediate through a critical hydrogen bond (Tyr60-Glu72) that disfavors the dimeric form.
Affiliation(s)
- Zahra Assar
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| | - Zahra Nossoni
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| | - Wenjing Wang
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| | - Elizabeth M Santos
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| | - Kevin Kramer
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| | - Colin McCornack
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| | - Chrysoula Vasileiou
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| | - Babak Borhan
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA.
| | - James H Geiger
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA.
|
34
|
Dai J, Niemi AJ, He J. Conformational landscape of an amyloid intra-cellular domain and Landau-Ginzburg-Wilson paradigm in protein dynamics. J Chem Phys 2016; 145:045103. [DOI: 10.1063/1.4959582] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022] Open
Affiliation(s)
- Jin Dai
- School of Physics, Beijing Institute of Technology, Beijing 100081, People’s Republic of China
| | - Antti J. Niemi
- School of Physics, Beijing Institute of Technology, Beijing 100081, People’s Republic of China
- Department of Physics and Astronomy, Uppsala University, P.O. Box 803, S-75108 Uppsala, Sweden
- Laboratoire de Mathematiques et Physique Theorique CNRS UMR 6083, Fédération Denis Poisson, Université de Tours, Parc de Grandmont, F37200 Tours, France
| | - Jianfeng He
- School of Physics, Beijing Institute of Technology, Beijing 100081, People’s Republic of China
|
35
|
Arenas-Salinas M, Vargas-Pérez JI, Morales W, Pinto C, Muñoz-Díaz P, Cornejo FA, Pugin B, Sandoval JM, Díaz-Vásquez WA, Muñoz-Villagrán C, Rodríguez-Rojas F, Morales EH, Vásquez CC, Arenas FA. Flavoprotein-Mediated Tellurite Reduction: Structural Basis and Applications to the Synthesis of Tellurium-Containing Nanostructures. Front Microbiol 2016; 7:1160. [PMID: 27507969 PMCID: PMC4960239 DOI: 10.3389/fmicb.2016.01160] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/29/2016] [Accepted: 07/12/2016] [Indexed: 01/24/2023] Open
Abstract
The tellurium oxyanion tellurite (TeO₃²⁻) is extremely harmful to most organisms. It has been suggested that a potential bacterial tellurite resistance mechanism would consist of an enzymatic, NAD(P)H-dependent reduction to the less toxic elemental tellurium (Te⁰). To date, a number of enzymes such as catalase, type II NADH dehydrogenase and terminal oxidases from the electron transport chain, nitrate reductases, and dihydrolipoamide dehydrogenase (E3), among others, have been shown to display tellurite-reducing activity. This activity is generically referred to as tellurite reductase (TR). Bioinformatic analysis of some of the abovementioned enzymes enabled the identification of common structures involved in tellurite reduction, including vicinal catalytic cysteine residues and the FAD/NAD(P)⁺-binding domain, which is characteristic of some flavoproteins. Along this line, thioredoxin reductase (TrxB), alkyl hydroperoxide reductase (AhpF), glutathione reductase (GorA), mercuric reductase (MerA), NADH:flavorubredoxin reductase (NorW), dihydrolipoamide dehydrogenase, and the putative oxidoreductase YkgC from Escherichia coli or environmental bacteria were purified and assessed for TR activity. All of them displayed in vitro TR activity at the expense of NADH or NADPH oxidation. In general, optimal reducing conditions occurred around pH 9–10 and 37°C. Enzymes exhibiting strong TR activity produced Te-containing nanostructures (TeNS). While GorA and AhpF generated TeNS of 75 nm average diameter, E3 and YkgC produced larger structures (>100 nm). Electron-dense structures were observed in cells over-expressing genes encoding TrxB, GorA, and YkgC.
Affiliation(s)
| | - Joaquín I Vargas-Pérez
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Wladimir Morales
- Centro de Bioinformática y Simulación Molecular, Universidad de Talca Talca, Chile
| | - Camilo Pinto
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Pablo Muñoz-Díaz
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Fabián A Cornejo
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Benoit Pugin
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Juan M Sandoval
- Facultad de Ciencias de la Salud e Instituto de Etnofarmacología, Universidad Arturo Prat Iquique, Chile
| | - Waldo A Díaz-Vásquez
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile; Facultad de Ciencias de la Salud, Universidad San Sebastián Santiago, Chile
| | - Claudia Muñoz-Villagrán
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Fernanda Rodríguez-Rojas
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Eduardo H Morales
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Claudio C Vásquez
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
| | - Felipe A Arenas
- Departamento de Biología, Facultad de Química y Biología, Universidad de Santiago de Chile Santiago, Chile
|
36
|
Identification of Novel Abiotic Stress Proteins in Triticum aestivum Through Functional Annotation of Hypothetical Proteins. Interdiscip Sci 2016; 10:205-220. [PMID: 27421996 DOI: 10.1007/s12539-016-0178-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/01/2015] [Revised: 06/15/2016] [Accepted: 07/07/2016] [Indexed: 01/14/2023]
Abstract
Bread wheat (T. aestivum), a cereal grain of the Poaceae family, is an important source of food. Hypothetical proteins (HPs), i.e., proteins with unknown functions, make up a substantial portion of the wheat proteome and play important roles in plant growth and physiology. Several functional annotation studies that use protein sequences to characterize the role of individual proteins in plant physiology have been reported in the recent past. In this study, an integrated pipeline of software and servers was used to identify and functionally annotate 124 unique HPs of T. aestivum, considering the data available in NCBI to date. All HPs were broadly annotated; functions were assigned with high confidence to 77 HPs, and the remaining 47 HPs were annotated with low confidence. The latest versions of several protein family databases, pathway information, genomic context methods and in silico tools were used to identify and assign a function to each HP. The annotated HPs mainly belong to cellular proteins, metabolic enzymes, binding proteins, transmembrane proteins, transcription factors and photosystem regulator proteins. Subsequent functional analysis revealed a role for a few HPs in abiotic stress, which was further verified by phylogenetic analysis. Proteins functionally associated with each of these abiotic stress-related proteins were identified through protein-protein interaction network analysis. The outcome of this study may help in formulating a general pipeline or set of protocols for better understanding the role of HPs in the physiological development of various plant systems.
|
37
|
Dixit K, Rahman M, Nath A, Sundaram S. Elucidating hydrogenase surfaces and tracing the intramolecular tunnels for hydrogenase inhibition in microalgal species. Bioinformation 2016; 12:165-171. [PMID: 28149051 PMCID: PMC5267960 DOI: 10.6026/97320630012165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2016] [Revised: 05/25/2016] [Accepted: 05/26/2016] [Indexed: 11/23/2022] Open
Abstract
Intramolecular tunnels are attracting considerable attention as possible pathways by which inhibitors such as oxygen and carbon monoxide reach the active sites of hydrogenases. This work presents the results of homology modeling of the HydSL protein, a NiFe-hydrogenase, from Chlamydomonas reinhardtii and Chlorella vulgaris. Here we identify and describe molecular tunnels observed in HydSL hydrogenase enzyme systems. A possible determinant of the oxygen stability of previously studied hydrogenases could be the lack of several intramolecular tunnels. The possible tunnels were traced using MOLE 2 software, which showed several intramolecular pathways that may connect the active sites of the enzyme. The RMSD value was consistent with significant homology between the enzymes. This is the first report of its kind in which mapping of the intramolecular tunnels in the four hydrogenase enzymes disclosed potential variations between the designed models and known structures. We seek explanations for the oxygen sensitivity of the studied hydrogenases in the structure of their intramolecular tunnels. Local and global RMSD (root mean square deviation) values were calculated for models and templates, giving a value of 1.284, which indicates a successful homology model. The tunnel tracing study with MOLE 2 indicated two tunnels joined into one in the C. reinhardtii model, whereas the C. vulgaris model showed one tunnel that almost resembles two. Both templates, the A. vinosum and D. vulgaris hydrogenases, contained six tunnels. For HydSL from the Chlamydomonas and Chlorella species, the maximal potential was set to 250 kcal/mol (1,046 kJ/mol) and the areas of positive potential were marked. Electrostatic studies define the electrostatic potential (ESP) that helps shuttle protons to the active site.
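For reference, the RMSD between a model and a template is the square root of the mean squared distance over matched atom pairs. The Python sketch below shows only that arithmetic. It assumes the two coordinate lists are already superimposed and matched atom by atom, and it does not reproduce the software or atom selection used in the study; the function name and coordinates are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between two pre-aligned, equally sized coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Illustrative coordinates: every atom is displaced by exactly 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(model, template)
```

A global RMSD applies this to all matched atoms, while a local RMSD restricts the sum to a chosen region such as the residues around the active site.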
Affiliation(s)
- Kritika Dixit
- Centre of Biotechnology, University of Allahabad, Allahabad, Uttar Pradesh 211002 India
| | - Md. Akhlaqur Rahman
- Centre of Biotechnology, University of Allahabad, Allahabad, Uttar Pradesh 211002 India
| | - Adi Nath
- Centre of Biotechnology, University of Allahabad, Allahabad, Uttar Pradesh 211002 India
| | - Shanthy Sundaram
- Centre of Biotechnology, University of Allahabad, Allahabad, Uttar Pradesh 211002 India
|
38
|
Venev SV, Zeldovich KB. Massively parallel sampling of lattice proteins reveals foundations of thermal adaptation. J Chem Phys 2016; 143:055101. [PMID: 26254668 DOI: 10.1063/1.4927565] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023] Open
Abstract
Evolution of proteins in bacteria and archaea living in different conditions leads to significant correlations between amino acid usage and environmental temperature. The origins of these correlations are poorly understood, and an important question of protein theory, the physics-based prediction of the types of amino acids overrepresented in highly thermostable proteins, remains largely unsolved. Here, we extend the random energy model of protein folding by weighting the interaction energies of amino acids by their frequencies in protein sequences and predict the energy gap of proteins designed to fold well at elevated temperatures. To test the model, we present a novel scalable algorithm for simultaneous energy calculation for many sequences in many structures, targeting massively parallel computing architectures such as graphics processing units. The energy calculation is performed by multiplying two matrices, one representing the complete set of sequences, and the other describing the contact maps of all structural templates. An implementation of the algorithm for the CUDA platform is available at http://www.github.com/kzeldovich/galeprot and calculates protein folding energies over 250 times faster than a single central processing unit. Analysis of amino acid usage in 64-mer cubic lattice proteins designed to fold well at different temperatures demonstrates an excellent agreement between theoretical and simulated values of energy gap. The theoretical predictions of temperature trends of amino acid frequencies are significantly correlated with bioinformatics data on 191 bacteria and archaea, and highlight protein folding constraints as a fundamental selection pressure during thermal adaptation in biological evolution.
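One way to read the matrix formulation in this abstract is sketched below in plain Python. It is not the authors' CUDA code, and the encoding is an assumption: each sequence becomes a row of pair interaction energies, each structure becomes a row of 0/1 contact indicators over the same residue pairs, and the product of the first matrix with the transpose of the second gives the energy of every sequence in every structure. The two-letter alphabet, energy table and contact maps are toy values chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def residue_pairs(length: int) -> List[Pair]:
    """All residue index pairs (i < j); these index the shared matrix columns."""
    return list(combinations(range(length), 2))


def sequence_matrix(
    sequences: Sequence[str], pairs: Sequence[Pair], eps: Dict[Tuple[str, str], float]
) -> List[List[float]]:
    """Row per sequence: interaction energy of the residues at each pair."""
    return [[eps[(seq[i], seq[j])] for i, j in pairs] for seq in sequences]


def contact_matrix(contact_maps: Sequence[Set[Pair]], pairs: Sequence[Pair]) -> List[List[int]]:
    """Row per structure: 1 where the pair is in contact, else 0."""
    return [[1 if pair in contacts else 0 for pair in pairs] for contacts in contact_maps]


def energies(seq_rows: List[List[float]], contact_rows: List[List[int]]) -> List[List[float]]:
    """E[s][t] = sum over pairs of seq_rows[s][p] * contact_rows[t][p]."""
    return [
        [sum(s * c for s, c in zip(srow, crow)) for crow in contact_rows]
        for srow in seq_rows
    ]


# Toy data (assumptions): only H-H contacts are favourable.
eps = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}
sequences = ["HHPH", "HPHP"]
contact_maps = [{(0, 3)}, {(0, 2), (1, 3)}]

pairs = residue_pairs(4)
E = energies(sequence_matrix(sequences, pairs, eps), contact_matrix(contact_maps, pairs))
```

Because every entry of `E` is an independent dot product, the same computation maps directly onto a dense matrix multiply, which is the kind of operation a GPU handles efficiently.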
Affiliation(s)
- Sergey V Venev
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, Massachusetts 01605, USA
| | - Konstantin B Zeldovich
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 368 Plantation St, Worcester, Massachusetts 01605, USA
|
39
|
Sirota FL, Maurer-Stroh S, Eisenhaber B, Eisenhaber F. Single-residue posttranslational modification sites at the N-terminus, C-terminus or in-between: To be or not to be exposed for enzyme access. Proteomics 2016; 15:2525-46. [PMID: 26038108 PMCID: PMC4745020 DOI: 10.1002/pmic.201400633] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/31/2014] [Revised: 04/17/2015] [Accepted: 05/29/2015] [Indexed: 11/30/2022]
Abstract
Many protein posttranslational modifications (PTMs) are the result of an enzymatic reaction. The modifying enzyme has to recognize the substrate protein's sequence motif containing the residue(s) to be modified; thus, the enzyme's catalytic cleft engulfs these residue(s) and the respective sequence environment. This residue accessibility condition principally limits the range where enzymatic PTMs can occur in the protein sequence. Non‐globular, flexible, intrinsically disordered segments or large loops/accessible long side chains should be preferred, whereas residues buried in the core of structures should be void of what we call canonical, enzyme‐generated PTMs. We investigate whether PTM sites annotated in UniProtKB (with MOD_RES/LIPID keys) are situated within sequence ranges that can be mapped to known 3D structures. We find that N‐ or C‐termini harbor essentially exclusively canonical PTMs. We also find that the overwhelming majority of all other PTMs are also canonical though, later in the protein's life cycle, the PTM sites can become buried due to complex formation. Among the remaining cases, some can be explained (i) with autocatalysis, (ii) with modification before folding or after temporary unfolding, or (iii) as products of interaction with small, diffusible reactants. Others require further research into how these PTMs are mechanistically generated in vivo.
Affiliation(s)
- Fernanda L Sirota
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore
- Sebastian Maurer-Stroh
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore.,School of Biological Sciences (SBS), Nanyang Technological University (NTU), Singapore
- Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore
- Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore.,Department of Biological Sciences (DBS), National University of Singapore (NUS), Singapore.,School of Computer Engineering (SCE), Nanyang Technological University (NTU), Singapore
|
40
|
Xu J, Zhang J. Impact of structure space continuity on protein fold classification. Sci Rep 2016; 6:23263. [PMID: 27006112 PMCID: PMC4804218 DOI: 10.1038/srep23263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/24/2015] [Accepted: 03/03/2016] [Indexed: 11/09/2022] Open
Abstract
Protein structure classification hierarchically clusters domain structures based on structure and/or sequence similarities and plays important roles in the study of protein structure-function relationships and protein evolution. Among many classifications, SCOP and CATH are widely viewed as the gold standards. Fold classification is of special interest because this is the lowest level of classification that does not depend on protein sequence similarity. The current fold classifications such as those in SCOP and CATH are controversial because they implicitly assume that folds are discrete islands in the structure space, whereas increasing evidence suggests significant similarities among folds and supports a continuous fold space. Although this problem is widely recognized, its impact on fold classification has not been quantitatively evaluated. Here we develop a likelihood method to classify a domain into the existing folds of CATH or SCOP using both query-fold structure similarities and within-fold structure heterogeneities. The new classification differs from the original classification for 3.4-12% of domains, depending on factors such as the structure similarity score and original classification scheme used. Because these factors differ for different biological purposes, our results indicate that the importance of considering structure space continuity in fold classification depends on the specific question asked.
Affiliation(s)
- Jinrui Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
|
41
|
Tongsook C, Uhl MK, Jankowitsch F, Mack M, Gruber K, Macheroux P. Structural and kinetic studies on RosA, the enzyme catalysing the methylation of 8-demethyl-8-amino-d-riboflavin to the antibiotic roseoflavin. FEBS J 2016; 283:1531-49. [PMID: 26913589 PMCID: PMC4982073 DOI: 10.1111/febs.13690] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/07/2015] [Revised: 01/26/2016] [Accepted: 02/18/2016] [Indexed: 11/28/2022]
Abstract
N,N‐8‐demethyl‐8‐amino‐d‐riboflavin dimethyltransferase (RosA) catalyses the final dimethylation of 8‐demethyl‐8‐amino‐d‐riboflavin (AF) to the antibiotic roseoflavin (RoF) in Streptomyces davawensis. In the present study, we solved the X‐ray structure of RosA, and determined the binding properties of substrates and products. Moreover, we used steady‐state and rapid reaction kinetic studies to obtain detailed information on the reaction mechanism. The structure of RosA was found to be similar to that of previously described S‐adenosylmethionine (SAM)‐dependent methyltransferases, featuring two domains: a mainly α‐helical ‘orthogonal bundle’ and a Rossmann‐like domain (α/β twisted open sheet). Bioinformatics studies and molecular modelling enabled us to predict the potential SAM and AF binding sites in RosA, suggesting that both substrates, AF and SAM, bind independently to their respective binding pockets. This finding was confirmed by kinetic experiments that demonstrated a random‐order ‘bi‐bi’ reaction mechanism. Furthermore, we determined the dissociation constants for substrates and products by either isothermal titration calorimetry or UV/Vis absorption spectroscopy, revealing that both products, RoF and S‐adenosylhomocysteine (SAH), bind more tightly to RosA compared with the substrates, AF and SAM. This suggests that RosA may contribute to roseoflavin resistance in S. davawensis. The tighter binding of products is also reflected by the results of inhibition experiments, in which RoF and SAH behave as competitive inhibitors for AF and SAM, respectively. We also showed that formation of a ternary complex of RosA, RoF and SAH (or SAM) leads to drastic spectral changes that are indicative of a hydrophobic environment. Database: Structural data are available in the Protein Data Bank under accession number 4D7K.
Affiliation(s)
- Michael K Uhl
- Institute of Molecular Biosciences, University of Graz, Austria
- Frank Jankowitsch
- Department of Biotechnology, Institute for Technical Microbiology, Mannheim University of Applied Sciences, Germany
- Matthias Mack
- Department of Biotechnology, Institute for Technical Microbiology, Mannheim University of Applied Sciences, Germany
- Karl Gruber
- Institute of Molecular Biosciences, University of Graz, Austria
- Peter Macheroux
- Institute of Biochemistry, Graz University of Technology, Austria
|
42
|
Warne B, Harkins CP, Harris SR, Vatsiou A, Stanley-Wall N, Parkhill J, Peacock SJ, Palmer T, Holden MTG. The Ess/Type VII secretion system of Staphylococcus aureus shows unexpected genetic diversity. BMC Genomics 2016; 17:222. [PMID: 26969225 PMCID: PMC4788903 DOI: 10.1186/s12864-016-2426-7] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Received: 10/01/2015] [Accepted: 02/01/2016] [Indexed: 02/03/2023] Open
Abstract
Background: Type VII protein secretion (T7SS) is a specialised system for excreting extracellular proteins across bacterial cell membranes and has been associated with virulence in Staphylococcus aureus. The genetic diversity of the ess locus, which encodes the T7SS, and the functions of proteins encoded within it are poorly understood. Results: We used whole genome sequence data from 153 isolates representative of the diversity of the species to investigate the genetic variability of T7SS across S. aureus. The ess loci were found to comprise four distinct modules based on gene content and relative conservation. Modules 1 and 4, comprising the 5’ and 3’ modules of the ess locus, contained the most conserved clusters of genes across the species. Module 1 contained genes encoding the secreted protein EsxA, and the EsaAB and EssAB components of the T7SS machinery, and Module 4 contained two functionally uncharacterized conserved membrane proteins. Across the species four variants of Module 2 were identified containing the essC gene, each of which was associated with a specific group of downstream genes. The most diverse module of the ess locus was Module 3 comprising a highly variable arrangement of hypothetical proteins. RNA-Seq was performed on representatives of the four Module 2 variants and demonstrated strain-specific differences in the levels of transcription in the conserved Module 1 components and transcriptional linkage Module 2, and provided evidence of the expression of genes in the variable regions of the ess loci. Conclusions: The ess locus of S. aureus exhibits modularity and organisational variation across the species and transcriptional variation. In silico analysis of ess loci encoded hypothetical proteins identified potential novel secreted substrates for the T7SS. The considerable variety in operon arrangement between otherwise closely related isolates provides strong evidence for recombination at this locus. Comparison of these recombination regions with each other, and with the genomes of other Staphylococcal species, failed to identify evidence of intra- and inter-species recombination; however, the analysis identified a novel T7SS in another pathogenic staphylococcus, Staphylococcus lugdunensis. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2426-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ben Warne
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.,University of Cambridge, Addenbrooke's Hospital, Cambridge, CB2 0QQ, UK
- Catriona P Harkins
- Division of Molecular Microbiology, College of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK.,School of Medicine, University of St Andrews, St Andrews, KY16 9TF, UK
- Simon R Harris
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Alexandra Vatsiou
- School of Medicine, University of St Andrews, St Andrews, KY16 9TF, UK
- Nicola Stanley-Wall
- Division of Molecular Microbiology, College of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- Julian Parkhill
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Sharon J Peacock
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.,University of Cambridge, Addenbrooke's Hospital, Cambridge, CB2 0QQ, UK
- Tracy Palmer
- Division of Molecular Microbiology, College of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
|
43
|
Accurate prediction of cellular co-translational folding indicates proteins can switch from post- to co-translational folding. Nat Commun 2016; 7:10341. [PMID: 26887592 PMCID: PMC4759629 DOI: 10.1038/ncomms10341] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 01/26/2015] [Accepted: 12/01/2015] [Indexed: 12/29/2022] Open
Abstract
The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally, a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process.
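As a rough illustration of the kind of calculation this abstract describes, the Python sketch below computes a folding curve from a domain's bulk folding and unfolding rates and a list of per-codon translation rates. It is not the authors' model: the two-state relaxation, the use of the mean dwell time 1/rate at each codon, the `fold_start` parameter, and every name and number here are assumptions chosen for the example.

```python
import math


def cotranslational_folding_curve(k_fold, k_unfold, codon_rates, fold_start):
    """Probability that a domain is folded after each codon is translated.

    Assumptions (hypothetical, for illustration only):
      * the domain is a two-state folder with rates k_fold and k_unfold (1/s);
      * it cannot fold before codon index `fold_start` has been reached;
      * the ribosome spends exactly the mean dwell time 1/rate at each codon.
    """
    p_eq = k_fold / (k_fold + k_unfold)   # equilibrium folded fraction
    k_relax = k_fold + k_unfold           # two-state relaxation rate
    p_folded = 0.0
    curve = []
    for index, rate in enumerate(codon_rates):
        if index >= fold_start:
            dwell = 1.0 / rate
            # Exact solution of dP/dt = k_fold*(1 - P) - k_unfold*P over `dwell`.
            p_folded = p_eq + (p_folded - p_eq) * math.exp(-k_relax * dwell)
        curve.append(p_folded)
    return curve


if __name__ == "__main__":
    # Same domain, translated with fast or slow (e.g. synonymous) codons.
    fast = cotranslational_folding_curve(1.0, 0.01, [20.0] * 10, fold_start=3)
    slow = cotranslational_folding_curve(1.0, 0.01, [2.0] * 10, fold_start=3)
    print("fast codons:", round(fast[-1], 3))
    print("slow codons:", round(slow[-1], 3))
```

Under these assumptions, slower codons give the domain more time to fold before synthesis finishes, which is the qualitative effect of synonymous substitutions that the abstract reports.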
|
44
|
Computational based functional analysis of Bacillus phytases. Comput Biol Chem 2016; 60:53-8. [DOI: 10.1016/j.compbiolchem.2015.11.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 04/20/2015] [Revised: 09/25/2015] [Accepted: 11/06/2015] [Indexed: 11/19/2022]
|
45
|
Krossa S, Faust A, Ober D, Scheidig AJ. Comprehensive Structural Characterization of the Bacterial Homospermidine Synthase-an Essential Enzyme of the Polyamine Metabolism. Sci Rep 2016; 6:19501. [PMID: 26776105 PMCID: PMC4725965 DOI: 10.1038/srep19501] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/01/2015] [Accepted: 12/15/2015] [Indexed: 12/14/2022] Open
Abstract
The highly conserved bacterial homospermidine synthase (HSS) is a key enzyme of the polyamine metabolism of many proteobacteria including pathogenic strains such as Legionella pneumophila and Pseudomonas aeruginosa. The unique usage of NAD(H) as a prosthetic group is a common feature of bacterial HSS, eukaryotic HSS and deoxyhypusine synthase (DHS). The structure of the bacterial enzyme does not possess a lysine residue in the active center and thus does not form an enzyme-substrate Schiff base intermediate as observed for the DHS. In contrast to the DHS the active site is not formed by the interface of two subunits but resides within one subunit of the bacterial HSS. Crystal structures of Blastochloris viridis HSS (BvHSS) reveal two distinct substrate binding sites, one of which is highly specific for putrescine. BvHSS features a side pocket in the direct vicinity of the active site formed by conserved amino acids and a potential substrate discrimination, guiding, and sensing mechanism. The proposed reaction steps for the catalysis of BvHSS emphasize cation-π interaction through a conserved Trp residue as a key stabilizer of high-energy transition states.
Affiliation(s)
- Sebastian Krossa
- Structural Biology-Zoological Institute, Kiel University, Am Botanischen Garten 11, 24118 Kiel, Germany
- Annette Faust
- Structural Biology-Zoological Institute, Kiel University, Am Botanischen Garten 11, 24118 Kiel, Germany
- Dietrich Ober
- Botanical Institute - Biochemical Ecology and Molecular Evolution, Kiel University, Am Botanischen Garten 1-9, 24118 Kiel, Germany
- Axel J Scheidig
- Structural Biology-Zoological Institute, Kiel University, Am Botanischen Garten 11, 24118 Kiel, Germany
|
46
|
Wojciechowski M, Gómez-Sicilia À, Carrión-Vázquez M, Cieplak M. Unfolding knots by proteasome-like systems: simulations of the behaviour of folded and neurotoxic proteins. MOLECULAR BIOSYSTEMS 2016; 12:2700-12. [DOI: 10.1039/c6mb00214e] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 12/18/2022]
Abstract
Knots in proteins have been proposed to resist proteasomal degradation, thought in turn to be related to neurodegenerative diseases such as Huntington's disease.
Affiliation(s)
- Àngel Gómez-Sicilia
- Instituto Cajal, Consejo Superior de Investigaciones Científicas (CSIC), 28002 Madrid, Spain
- Marek Cieplak
- Institute of Physics, Polish Academy of Sciences, PL-02668 Warsaw, Poland
47
|
De Laet M, Gilis D, Rooman M. Stability strengths and weaknesses in protein structures detected by statistical potentials: Application to bovine seminal ribonuclease. Proteins 2015; 84:143-58. [DOI: 10.1002/prot.24962] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/15/2015] [Revised: 10/27/2015] [Accepted: 11/09/2015] [Indexed: 11/10/2022]
Affiliation(s)
- Marie De Laet
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
- Dimitri Gilis
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
- Marianne Rooman
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
|
48
|
Ramos MV, de Oliveira RSB, Pereira HM, Moreno FBMB, Lobo MDP, Rebelo LM, Brandão-Neto J, de Sousa JS, Monteiro-Moreira ACO, Freitas CDT, Grangeiro TB. Crystal structure of an antifungal osmotin-like protein from Calotropis procera and its effects on Fusarium solani spores, as revealed by atomic force microscopy: Insights into the mechanism of action. PHYTOCHEMISTRY 2015; 119:5-18. [PMID: 26456062 DOI: 10.1016/j.phytochem.2015.09.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/08/2015] [Revised: 08/25/2015] [Accepted: 09/30/2015] [Indexed: 05/11/2023]
Abstract
CpOsm is an antifungal osmotin/thaumatin-like protein purified from the latex of Calotropis procera. The protein is relatively thermostable and retains its antifungal activity over a wide pH range; therefore, it may be useful in the development of new antifungal drugs or transgenic crops with enhanced resistance to phytopathogenic fungi. To gain further insight into the mechanism of action of CpOsm, its three-dimensional structure was determined, and the effects of the protein on Fusarium solani spores were investigated by atomic force microscopy (AFM). The atomic structure of CpOsm was solved at a resolution of 1.61 Å, and it contained 205 amino acid residues and 192 water molecules, with a final R-factor of 18.12% and an Rfree of 21.59%. The CpOsm structure belongs to the thaumatin superfamily fold and is characterized by three domains stabilized by eight disulfide bonds and a prominent charged cleft, which runs the length of the front side of the molecule. Similarly to other antifungal thaumatin-like proteins, the cleft of CpOsm is predominantly acidic. AFM images of F. solani spores treated with CpOsm revealed striking morphological changes induced by the protein. Spores treated with CpOsm were wrinkled, and the volume of these cells was reduced by approximately 80%. Treated cells were covered by a shell of CpOsm molecules, and the leakage of cytoplasmic content from these cells was also observed. Based on the structural features of CpOsm and the effects that the protein produces on F. solani spores, a possible mechanism of action is suggested and discussed.
Affiliation(s)
- Marcio V Ramos
- Departamento de Bioquímica e Biologia Molecular, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, Ceará, Brazil
- Raquel S B de Oliveira
- Departamento de Bioquímica e Biologia Molecular, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, Ceará, Brazil
- Humberto M Pereira
- Instituto de Física de São Carlos, Universidade de São Paulo, 13563-120 São Carlos, São Paulo, Brazil
- Marina D P Lobo
- Núcleo de Biologia Experimental, Universidade de Fortaleza, Fortaleza, Ceará, Brazil
- Luciana M Rebelo
- Departamento de Física, Centro de Ciências, Universidade Federal do Ceará, Caixa Postal 6030, Campus do Pici, 60440-900 Fortaleza, Ceará, Brazil
- José Brandão-Neto
- Diamond Light Source, Harwell Science and Innovation Campus Didcot, Oxfordshire OX11 0DE, United Kingdom
- Jeanlex S de Sousa
- Departamento de Física, Centro de Ciências, Universidade Federal do Ceará, Caixa Postal 6030, Campus do Pici, 60440-900 Fortaleza, Ceará, Brazil
- Cléverson D T Freitas
- Departamento de Bioquímica e Biologia Molecular, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, Ceará, Brazil
- Thalles Barbosa Grangeiro
- Departamento de Biologia, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, Ceará, Brazil
|
49
|
Peng X, He J, Niemi AJ. Clustering and percolation in protein loop structures. BMC STRUCTURAL BIOLOGY 2015; 15:22. [PMID: 26510704 PMCID: PMC4625449 DOI: 10.1186/s12900-015-0049-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/07/2015] [Accepted: 10/13/2015] [Indexed: 11/24/2022]
Abstract
Background: High precision protein loop modelling remains a challenge, both in template based and template independent approaches to protein structure prediction. Method: We introduce the concepts of protein loop clustering and percolation, to develop a quantitative approach to systematically classify the modular building blocks of loops in crystallographic folded proteins. These fragments are all different parameterisations of a unique kink solution to a generalised discrete nonlinear Schrödinger (DNLS) equation. Accordingly, the fragments are also local energy minima of the ensuing energy function. Results: We show how the loop fragments cover practically all ultrahigh resolution crystallographic protein structures in Protein Data Bank (PDB), with a 0.2 Ångström root-mean-square (RMS) precision. We find that no more than 12 different loop fragments are needed to describe around 38% of ultrahigh resolution loops in PDB. But there is also a large number of loop fragments that are either unique, or very rare, and examples of unique fragments are found even in the structure of a myoglobin. Conclusions: Protein loops are built in a modular fashion. The loops are composed of fragments that can be modelled by the kink of the DNLS equation. The majority of loop fragments are common and are shared by many proteins. These common fragments are probably important for supporting the overall protein conformation. But there are also several fragments that are either unique to a given protein, or very rare. Such fragments are probably related to the function of the protein. Furthermore, we have found that the amino acid sequence does not determine the structure in a unique fashion. There are many examples of loop fragments with an identical amino acid sequence, but with a very different structure. Electronic supplementary material: The online version of this article (doi:10.1186/s12900-015-0049-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xubiao Peng
- Department of Physics and Astronomy, Uppsala University, P.O. Box 803, Uppsala, S-75108, Sweden
- Jianfeng He
- School of Physics, Beijing Institute of Technology, Beijing, 100081, People's Republic of China
- Antti J Niemi
- Department of Physics and Astronomy, Uppsala University, P.O. Box 803, Uppsala, S-75108, Sweden.,Laboratoire de Mathematiques et Physique Theorique CNRS UMR 6083, Fédération Denis Poisson, Université de Tours, Parc de Grandmont, Tours, F37200, France
|
50
|
Terashi G, Takeda-Shitaka M. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area. PLoS One 2015; 10:e0141440. [PMID: 26502070 PMCID: PMC4621035 DOI: 10.1371/journal.pone.0141440] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/19/2015] [Accepted: 10/08/2015] [Indexed: 12/26/2022] Open
Abstract
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. 
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.
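The dynamic-programming step mentioned in this abstract can be pictured with a generic sketch. The Python code below runs a single global dynamic-programming pass over a residue-by-residue score matrix and traces back the aligned pairs. It is not the CAB-align implementation: the function name `dp_align`, the linear gap penalty, and the toy scores are assumptions for illustration, and CAB-align itself derives its scores from contact areas and iterates the procedure.

```python
def dp_align(score, gap=-1.0):
    """Global alignment by dynamic programming over a score matrix.

    score[i][j] is a (hypothetical) similarity between residue i of protein A
    and residue j of protein B; `gap` is a linear gap penalty.
    Returns (total_score, list of aligned (i, j) index pairs).
    """
    n = len(score)
    m = len(score[0]) if n else 0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap

    # Fill: each cell takes the best of match, gap in B, gap in A.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[i - 1][j - 1],
                best[i - 1][j] + gap,
                best[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner to recover aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


if __name__ == "__main__":
    toy_scores = [[2.0, -1.0, -1.0],
                  [-1.0, -1.0, 2.0]]
    print(dp_align(toy_scores))
```

In an iterative scheme, the aligned pairs returned here would be used to recompute the score matrix before the next pass; that loop is omitted because the abstract does not specify it.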
Affiliation(s)
- Genki Terashi
- School of Pharmacy, Kitasato University, Tokyo, Japan
|