1
|
Challenges and limitations in computational prediction of protein misfolding in neurodegenerative diseases. Front Comput Neurosci 2024; 17:1323182. [PMID: 38250244 PMCID: PMC10796696 DOI: 10.3389/fncom.2023.1323182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Accepted: 12/19/2023] [Indexed: 01/23/2024] Open
|
2
|
In Silico Structural Analysis Exploring Conformational Folding of Protein Variants in Alzheimer's Disease. Int J Mol Sci 2023; 24:13543. [PMID: 37686347 PMCID: PMC10487466 DOI: 10.3390/ijms241713543] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 08/26/2023] [Accepted: 08/30/2023] [Indexed: 09/10/2023] Open
Abstract
Accurate protein structure prediction using computational methods remains a challenge in molecular biology. Recent advances in AI-powered algorithms provide a transformative effect in solving this problem. Even though AlphaFold's performance has improved since its release, there are still limitations that apply to its efficacy. In this study, a selection of proteins related to the pathology of Alzheimer's disease was modeled, with Presenilin-1 (PSN1) and its mutated variants in the foreground. Their structural predictions were evaluated using the ColabFold implementation of AlphaFold, which utilizes MMseqs2 for the creation of multiple sequence alignments (MSAs). A higher number of recycles than the one used in the AlphaFold DB was selected, and no templates were used. In addition, prediction by RoseTTAFold was also applied to address how structures from the two deep learning frameworks match reality. The resulting conformations were compared with the corresponding experimental structures, providing potential insights into the predictive ability of this approach in this particular group of proteins. Furthermore, a comprehensive examination was performed on features such as predicted regions of disorder and the potential effect of mutations on PSN1. Our findings consist of highly accurate superpositions with little or no deviation from experimentally determined domain-level models.
Collapse
|
3
|
Ribonuclease T2 represents a distinct circularly permutated version of the BECR RNases. Protein Sci 2023; 32:e4531. [PMID: 36477982 PMCID: PMC9793965 DOI: 10.1002/pro.4531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Revised: 11/07/2022] [Accepted: 11/30/2022] [Indexed: 12/13/2022]
Abstract
Detection of homologous relationships among proteins and understanding their mechanisms of diversification are major topics in the fields of protein science, bioinformatics, and phylogenetics. Recent developments in sequence/profile-based and structural similarity-based methods have greatly facilitated the unification and classification of many protein families into superfamilies or folds, yet many proteins remain unclassified in current protein databases. As one of the three earliest identified RNases in biology, ribonuclease T2, also known as RNase I in Escherichia coli, RNase Rh in fungi, or S-RNase in plant, is thought to be an ancient RNase family due to its widespread distribution and distinct structure. In this study, we present evidence that RNase T2 represents a circularly permutated version of the BECR (Barnase-EndoU-Colicin E5/D-RelE) fold RNases. This subtle relationship cannot be detected by traditional methods such as sequence/profile-based comparisons, structure-similarity searches, and circular permutation detections. However, we were able to identify the structural similarity using rational reconstruction of a theoretical RNase T2 ancestor via a reverse circular permutation process, followed by structural modeling using AlphaFold2, and structural comparisons. This relationship is further supported by the fact that RNase T2 and other typical BECR RNases, namely Colicin D, RNase A, and BrnT, share similar catalytic site configurations, all involving an analogous set of conserved residues on the α0 helix and the β4 strand of the BECR fold. This study revealed a hidden root of RNase T2 in bacterial toxin systems and demonstrated that reconstruction and modeling of ancestral topology is an effective strategy to identify remote relationship between proteins.
Collapse
|
4
|
Artificial protein sequences enable recognition of vicinal and distant protein functional relationships. Proteins 2020; 88:1688-1700. [PMID: 32725917 DOI: 10.1002/prot.25986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2020] [Revised: 06/29/2020] [Accepted: 07/26/2020] [Indexed: 11/07/2022]
Abstract
High divergence in protein sequences makes the detection of distant protein relationships through homology-based approaches challenging. Grouping protein sequences into families, through similarities in either sequence or 3-D structure, facilitates in the improved recognition of protein relationships. In addition, strategically designed protein-like sequences have been shown to bridge distant structural domain families by serving as artificial linkers. In this study, we have augmented a search database of known protein domain families with such designed sequences, with the intention of providing functional clues to domain families of unknown structure. When assessed using representative query sequences from each family, we obtain a success rate of 94% in protein domain families of known structure. Further, we demonstrate that the augmented search space enabled fold recognition for 582 families with no structural information available a priori. Additionally, we were able to provide reliable functional relationships for 610 orphan families. We discuss the application of our method in predicting functional roles through select examples for DUF4922, DUF5131, and DUF5085. Our approach also detects new associations between families that were previously not known to be related, as demonstrated through new sub-groups of the RNA polymerase domain among three distinct RNA viruses. Taken together, designed sequences-augmented search databases direct the detection of meaningful relationships between distant protein families. In turn, they enable fold recognition and offer reliable pointers to potential functional sites that may be probed further through direct mutagenesis studies.
Collapse
|
5
|
Sequence and Structure Properties Uncover the Natural Classification of Protein Complexes Formed by Intrinsically Disordered Proteins via Mutual Synergistic Folding. Int J Mol Sci 2019; 20:ijms20215460. [PMID: 31683980 PMCID: PMC6862064 DOI: 10.3390/ijms20215460] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2019] [Revised: 10/28/2019] [Accepted: 10/30/2019] [Indexed: 12/17/2022] Open
Abstract
Intrinsically disordered proteins mediate crucial biological functions through their interactions with other proteins. Mutual synergistic folding (MSF) occurs when all interacting proteins are disordered, folding into a stable structure in the course of the complex formation. In these cases, the folding and binding processes occur in parallel, lending the resulting structures uniquely heterogeneous features. Currently there are no dedicated classification approaches that take into account the particular biological and biophysical properties of MSF complexes. Here, we present a scalable clustering-based classification scheme, built on redundancy-filtered features that describe the sequence and structure properties of the complexes and the role of the interaction, which is directly responsible for structure formation. Using this approach, we define six major types of MSF complexes, corresponding to biologically meaningful groups. Hence, the presented method also shows that differences in binding strength, subcellular localization, and regulation are encoded in the sequence and structural properties of proteins. While current protein structure classification methods can also handle complex structures, we show that the developed scheme is fundamentally different, and since it takes into account defining features of MSF complexes, it serves as a better representation of structures arising through this specific interaction mode.
Collapse
|
6
|
Protein tertiary structure prediction using hidden Markov model based on lattice. J Bioinform Comput Biol 2019; 17:1950007. [PMID: 31057069 DOI: 10.1142/s0219720019500070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The prediction of protein structure from its amino acid sequence is one of the most prominent problems in computational biology. The biological function of a protein depends on its tertiary structure which is determined by its amino acid sequence via the process of protein folding. We propose a novel fold recognition method for protein tertiary structure prediction based on a hidden Markov model and 3D coordinates of amino acid residues. The method introduces states based on the basis vectors in Bravais cubic lattices to learn the path of amino acids of the proteins of each fold. Three hidden Markov models are considered based on simple cubic, body-centered cubic (BCC) and face-centered cubic (FCC) lattices. A 10-fold cross validation was performed on a set of 42 fold SCOP dataset. The proposed composite methodology is compared to fold recognition methods which have HMM as base of their algorithms having approaches on only amino acid sequence or secondary structure. The accuracy of proposed model based on face-centered cubic lattices is quite better in comparison with SAM, 3-HMM optimized and Markov chain optimized in overall experiment. The huge data of 3D space help the model to have greater performance in comparison to methods which use only primary structures or only secondary structures.
Collapse
|
7
|
Scalable remote homology detection and fold recognition in massive protein networks. Proteins 2019; 87:478-491. [PMID: 30714638 DOI: 10.1002/prot.25669] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2018] [Revised: 12/19/2018] [Accepted: 01/31/2019] [Indexed: 11/10/2022]
Abstract
The global connectivities in very large protein similarity networks contain traces of evolution among the proteins for detecting protein remote evolutionary relations or structural similarities. To investigate how well a protein network captures the evolutionary information, a key limitation is the intensive computation of pairwise sequence similarities needed to construct very large protein networks. In this article, we introduce label propagation on low-rank kernel approximation (LP-LOKA) for searching massively large protein networks. LP-LOKA propagates initial protein similarities in a low-rank graph by Nyström approximation without computing all pairwise similarities. With scalable parallel implementations based on distributed-memory using message-passing interface and Apache-Hadoop/Spark on cloud, LP-LOKA can search protein networks with one million proteins or more. In the experiments on Swiss-Prot/ADDA/CASP data, LP-LOKA significantly improved protein ranking over the widely used HMM-HMM or profile-sequence alignment methods utilizing large protein networks. It was observed that the larger the protein similarity network, the better the performance, especially on relatively small protein superfamilies and folds. The results suggest that computing massively large protein network is necessary to meet the growing need of annotating proteins from newly sequenced species and LP-LOKA is both scalable and accurate for searching massively large protein networks.
Collapse
|
8
|
The role of negative selection in protein evolution revealed through the energetics of the native state ensemble. Proteins 2016; 84:435-47. [PMID: 26800099 DOI: 10.1002/prot.24989] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2015] [Revised: 12/15/2015] [Accepted: 12/19/2015] [Indexed: 12/14/2022]
Abstract
Knowing the determinants of conformational specificity is essential for understanding protein structure, stability, and fold evolution. To address this issue, a novel statistical measure of energetic compatibility between sequence and structure was developed using an experimentally validated model of the energetics of the native state ensemble. This approach successfully matched sequences from a diverse subset of the human proteome to their respective folds. Unexpectedly, significant energetic compatibility between ostensibly unrelated sequences and structures was also observed. Interrogation of these matches revealed a general framework for understanding the origins of conformational specificity within a proteome: specificity is a complex function of both the ability of a sequence to adopt folds other than the native, and ability of a fold to accommodate sequences other than the native. The regional variation in energetic compatibility indicates that the compatibility is dominated by incompatibility of sequence for alternative fold segments, suggesting that evolution of protein sequences has involved substantial negative selection, with certain segments serving as "gatekeepers" that presumably prevent alternative structures. Beyond these global trends, a size dependence exists in the degree to which the energetic compatibility is determined from negative selection, with smaller proteins displaying more negative selection. This partially explains how short sequences can adopt unique folds, despite the higher probability in shorter proteins for small numbers of mutations to increase compatibility with other folds. In providing evolutionary ground rules for the thermodynamic relationship between sequence and fold, this framework imparts valuable insight for rational design of unique folds or fold switches.
Collapse
|
9
|
Template based protein structure modeling by global optimization in CASP11. Proteins 2015; 84 Suppl 1:221-32. [PMID: 26329522 DOI: 10.1002/prot.24917] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2015] [Revised: 08/04/2015] [Accepted: 08/21/2015] [Indexed: 11/11/2022]
Abstract
For the template-based modeling (TBM) of CASP11 targets, we have developed three new protein modeling protocols (nns for server prediction and LEE and LEER for human prediction) by improving upon our previous CASP protocols (CASP7 through CASP10). We applied the powerful global optimization method of conformational space annealing to three stages of optimization, including multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain remodeling. For more successful fold recognition, a new alignment method called CRFalign was developed. It can incorporate sensitive positional and environmental dependence in alignment scores as well as strong nonlinear correlations among various features. Modifications and adjustments were made to the form of the energy function and weight parameters pertaining to the chain building procedure. For the side-chain remodeling step, residue-type dependence was introduced to the cutoff value that determines the entry of a rotamer to the side-chain modeling library. The improved performance of the nns server method is attributed to successful fold recognition achieved by combining several methods including CRFalign and to the current modeling formulation that can incorporate native-like structural aspects present in multiple templates. The LEE protocol is identical to the nns one except that CASP11-released server models are used as templates. The success of LEE in utilizing CASP11 server models indicates that proper template screening and template clustering assisted by appropriate cluster ranking promises a new direction to enhance protein 3D modeling. Proteins 2016; 84(Suppl 1):221-232. © 2015 Wiley Periodicals, Inc.
Collapse
|
10
|
Use of a structural alphabet to find compatible folds for amino acid sequences. Protein Sci 2014; 24:145-53. [PMID: 25297700 DOI: 10.1002/pro.2581] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2014] [Accepted: 10/06/2014] [Indexed: 01/01/2023]
Abstract
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa.
Collapse
|
11
|
Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014; 35:2040-6. [PMID: 25212657 DOI: 10.1002/jcc.23718] [Citation(s) in RCA: 110] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2014] [Revised: 07/12/2014] [Accepted: 08/09/2014] [Indexed: 11/09/2022]
Abstract
Because a nearly constant distance between two neighbouring Cα atoms, local backbone structure of proteins can be represented accurately by the angle between C(αi-1)-C(αi)-C(αi+1) (θ) and a dihedral angle rotated about the C(αi)-C(αi+1) bond (τ). θ and τ angles, as the representative of structural properties of three to four amino-acid residues, offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. Predicted angles based on an independent test have a mean absolute error of 9° for θ and 34° for τ with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9Å from their corresponding native structures. Predicted θ and τ angles are expected to be complementary to predicted ϕ and ψ angles and secondary structures for using in model validation and template-based as well as template-free structure prediction. The deep neural network learning technique is available as an on-line server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org.
Collapse
|
12
|
A homology/ab initio hybrid algorithm for sampling near-native protein conformations. J Comput Chem 2013; 34:1925-36. [PMID: 23728619 DOI: 10.1002/jcc.23339] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2012] [Revised: 03/09/2013] [Accepted: 04/21/2013] [Indexed: 12/19/2022]
Abstract
One of the major challenges for protein tertiary structure prediction strategies is the quality of conformational sampling algorithms, which can effectively and readily search the protein fold space to generate near-native conformations. In an effort to advance the field by making the best use of available homology as well as fold recognition approaches along with ab initio folding methods, we have developed Bhageerath-H Strgen, a homology/ab initio hybrid algorithm for protein conformational sampling. The methodology is tested on the benchmark CASP9 dataset of 116 targets. In 93% of the cases, a structure with TM-score ≥ 0.5 is generated in the pool of decoys. Further, the performance of Bhageerath-H Strgen was seen to be efficient in comparison with different decoy generation methods. The algorithm is web enabled as Bhageerath-H Strgen web tool which is made freely accessible for protein decoy generation (http://www.scfbio-iitd.res.in/software/Bhageerath-HStrgen1.jsp).
Collapse
|
13
|
Abstract
The depth of each atom/residue in a protein structure is a key attribution that has been widely used in protein structure modeling and function annotation. However, the accurate calculation of depth is time consuming. Here, we propose to use the Euclidean distance transform (EDT) to calculate the depth, which conveniently converts the protein structure to a 3D gray-scale image with each pixel labeling the minimum distance of the pixel to the surface of the molecule (i.e. the depth). We tested the proposed EDT method on a set of 261 non-redundant protein structures. The data show that the EDT method is 2.6 times faster than the widely used method by Chakravarty and Varadarajan. The depth value by EDT method is also highly accurate, which is almost identical to the depth calculated by exhaustive search (Pearson's correlation coefficient≈1). We believe the EDT-based depth calculation program can be used as an efficient tool to assist the studies of protein fold recognition and structure-based function annotation.
Collapse
|
14
|
HORIBALFRE program: Higher Order Residue Interactions Based ALgorithm for Fold REcognition. Bioinformation 2011; 7:352-9. [PMID: 22355236 PMCID: PMC3280490 DOI: 10.6026/97320630007352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2011] [Accepted: 11/24/2011] [Indexed: 11/23/2022] Open
Abstract
Understanding the functional and structural implication of a protein encoded in novel genes using function association or fold recognition approaches remains to be a challenging task in the current era of genomes, metagenomes and personal genomes. In an attempt to enhance potential-based fold-recognition methods in recognizing remote homology between proteins, we propose a new approach "Higher Order Residue Interaction Based ALgorithm for Fold REcognition (HORIBALFRE)". Higher order residue interactions refer to a class of interactions in protein structures mediated by C(α) or C(β) atoms within a pre-defined distance cut-off. Higher order residue interactions (pairwise, triplet and quadruplet interactions) play a vital role in attaining the stable conformation of a protein structure. In HORIBALFRE, we incorporated the potential contributions from two body (pairwise) interactions, three body (triplet interactions) and four-body (quadruple interaction) interactions, to implement a new fold recognition algorithm. Core of HORIBALFRE algorithm includes the potentials generated from a library of protein structure derived from manually curated CAMPASS database of structure based sequence alignment. We used Fischer's dataset, with 68 templates and 56 target sequences, derived from SCOP database and performed one-against-all sequence alignment using TCoffee. Various potentials were derived using custom scripts and these potentials were incorporated in the HORIBALFRE algorithm. In this manuscript, we report outline of a novel fold recognition algorithm and initial results. Our results show that inclusion of quadruplet class of higher order residue interaction improves fold recognition.
Collapse
|
15
|
Profile conditional random fields for modeling protein families with structural information. Biophysics (Nagoya-shi) 2009; 5:37-44. [PMID: 27857577 PMCID: PMC5036637 DOI: 10.2142/biophysics.5.37] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2009] [Accepted: 05/12/2009] [Indexed: 12/01/2022] Open
Abstract
A statistical model of protein families, called profile conditional random fields (CRFs), is proposed. This model may be regarded as an integration of the profile hidden Markov model (HMM) and the Finkelstein-Reva (FR) theory of protein folding. While the model structure of the profile CRF is almost identical to the profile HMM, it can incorporate arbitrary correlations in the sequences to be aligned to the model. In addition, like in the FR theory, the profile CRF can incorporate long-range pair-wise interactions between model states via mean-field-like approximations. We give the detailed formulation of the model, self-consistent approximations for treating long-range interactions, and algorithms for computing partition functions and marginal probabilities. We also outline the methods for the global optimization of model parameters as well as a Bayesian framework for parameter learning and selection of optimal alignments.
Collapse
|
16
|
HsdR subunit of the type I restriction-modification enzyme EcoR124I: biophysical characterisation and structural modelling. J Mol Biol 2008; 376:438-452. [PMID: 18164032 PMCID: PMC2878639 DOI: 10.1016/j.jmb.2007.11.024] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2007] [Revised: 11/08/2007] [Accepted: 11/09/2007] [Indexed: 01/19/2023]
Abstract
Type I restriction-modification (RM) systems are large, multifunctional enzymes composed of three different subunits. HsdS and HsdM form a complex in which HsdS recognizes the target DNA sequence, and HsdM carries out methylation of adenosine residues. The HsdR subunit, when associated with the HsdS-HsdM complex, translocates DNA in an ATP-dependent process and cleaves unmethylated DNA at a distance of several thousand base-pairs from the recognition site. The molecular mechanism by which these enzymes translocate the DNA is not fully understood, in part because of the absence of crystal structures. To date, crystal structures have been determined for the individual HsdS and HsdM subunits and models have been built for the HsdM-HsdS complex with the DNA. However, no structure is available for the HsdR subunit. In this work, the gene coding for the HsdR subunit of EcoR124I was re-sequenced, which showed that there was an error in the published sequence. This changed the position of the stop codon and altered the last 17 amino acid residues of the protein sequence. An improved purification procedure was developed to enable HsdR to be purified efficiently for biophysical and structural analysis. Analytical ultracentrifugation shows that HsdR is monomeric in solution, and the frictional ratio of 1.21 indicates that the subunit is globular and fairly compact. Small angle neutron-scattering of the HsdR subunit indicates a radius of gyration of 3.4 nm and a maximum dimension of 10 nm. We constructed a model of the HsdR using protein fold-recognition and homology modelling to model individual domains, and small-angle neutron scattering data as restraints to combine them into a single molecule. The model reveals an ellipsoidal shape of the enzymatic core comprising the N-terminal and central domains, and suggests conformational heterogeneity of the C-terminal region implicated in binding of HsdR to the HsdS-HsdM complex.
Collapse
|
17
|
Archaea recruited D-Tyr-tRNATyr deacylase for editing in Thr-tRNA synthetase. RNA (NEW YORK, N.Y.) 2004; 10:1845-1851. [PMID: 15525705 PMCID: PMC1370672 DOI: 10.1261/rna.7115404] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/24/2004] [Accepted: 09/06/2004] [Indexed: 05/24/2023]
Abstract
Aminoacyl-tRNA synthetases (AARSs) are key players in the maintenance of the genetic code through correct pairing of amino acids with their cognate tRNA molecules. To this end, some AARSs, as well as seeking to recognize the correct amino acid during synthesis of aminoacyl-tRNA, enhance specificity through recognition of mischarged aminoacyl-tRNA molecules in a separate editing reaction. Recently, an editing domain, of uncertain provenance, idiosyncratic to some archaeal ThrRSs has been characterized. Here, sequence analyses and molecular modeling are reported that clearly show a relationship of the archaea-specific ThrRS editing domains with d-Tyr-tRNATyr deacylases (DTDs). The model enables the identification of the catalytic site and other substrate binding residues, as well as the proposal of a likely catalytic mechanism. Interestingly, typical DTD sequences, common in bacteria and eukaryotes, are entirely absent in archaea, consistent with an evolutionary scheme in which DTD was co-opted to serve as a ThrRS editing domain in archaea soon after their divergence from eukaryotes. A group of present-day archaebacteria contain a ThrRS obtained from a bacterium by horizontal gene transfer. In some of these cases a vestigial version of the original archaeal ThrRS, of potentially novel function, is maintained.
Collapse
|
18
|
Abstract
The ability to separate correct models of protein structures from less correct models is of the greatest importance for protein structure prediction methods. Several studies have examined the ability of different types of energy function to detect the native, or native-like, protein structure from a large set of decoys. In contrast to earlier studies, we examine here the ability to detect models that only show limited structural similarity to the native structure. These correct models are defined by the existence of a fragment that shows significant similarity between this model and the native structure. It has been shown that the existence of such fragments is useful for comparing the performance between different fold recognition methods and that this performance correlates well with performance in fold recognition. We have developed ProQ, a neural-network-based method to predict the quality of a protein model that extracts structural features, such as frequency of atom-atom contacts, and predicts the quality of a model, as measured either by LGscore or MaxSub. We show that ProQ performs at least as well as other measures when identifying the native structure and is better at the detection of correct models. This performance is maintained over several different test sets. ProQ can also be combined with the Pcons fold recognition predictor (Pmodeller) to increase its performance, with the main advantage being the elimination of a few high-scoring incorrect models. Pmodeller was successful in CASP5 and results from the latest LiveBench, LiveBench-6, indicating that Pmodeller has a higher specificity than Pcons alone.
Collapse
|
19
|
Homology modeling and molecular dynamics simulations of the N-terminal domain of wheat high molecular weight glutenin subunit 10. Protein Sci 2003; 12:34-43. [PMID: 12493826 PMCID: PMC2312395 DOI: 10.1110/ps.0229803] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
High molecular weight glutenin subunits (HMW-GS) are of a particular interest because of their biomechanical properties, which are important in many food systems such as breadmaking. Using fold-recognition techniques, we identified a fold compatible with the N-terminal domain of HMW-GS Dy10. This fold corresponds to the one adopted by proteins belonging to the cereal inhibitor family. Starting from three known protein structures of this family as templates, we built three models for the N-terminal domain of HMW-GS Dy10. We analyzed these models, and we propose a number of hypotheses regarding the N-terminal domain properties that can be tested experimentally. In particular, we discuss two possible ways of interaction between the N-terminal domains of the y-type HMW glutenin subunits. The first way consists in the creation of interchain disulfide bridges. According to our models, we propose two plausible scenarios: (1) the existence of an intrachain disulfide bridge between cysteines 22 and 44, leaving the three other cysteines free of engaging in intermolecular bonds; and (2) the creation of two intrachain disulfide bridges (involving cysteines 22-44 and cysteines 10-55), leaving a single cysteine (45) for creating an intermolecular disulfide bridge. We discuss these scenarios in relation to contradictory experimental results. The second way, although less likely, is nevertheless worth considering. There might exist a possibility for the N-terminal domain of Dy10, Nt-Dy10, to create oligomers, because homologous cereal inhibitor proteins are known to exist as monomers, homodimers, and heterooligomers. We also discuss, in relation to the function of the cereal inhibitor proteins, the possibility that this N-terminal domain has retained similar inhibitory functions.
Collapse
|
20
|
Abstract
The prfA gene product of Gram-positive bacteria is unusual in being implicated in several cellular processes; cell wall synthesis, chromosome segregation, and DNA recombination and repair. However, no homology of PrfA with other proteins has been evident. Here we report a structural relationship between PrfA and the restriction enzyme PvuII, and thereby produce models that predict that PrfA binds DNA. Indeed, wild-type Bacillus stearothermophilus PrfA, but not a catalytic site mutant, nicked one strand of supercoiled plasmid templates leaving 5'-phosphate and 3'-hydroxyl termini. This activity, much lower on linear or relaxed circular double-stranded DNA or on single-stranded DNA, is consistent with a role for this protein in chromosome segregation, DNA recombination, or DNA repair.
Collapse
|
21
|
Abstract
Fold recognition predicts protein three-dimensional structure by establishing relationships between a protein sequence and known protein structures. Most methods explicitly use information derived from the secondary and tertiary structure of the templates. Here we show that rigorous application of a sequence search method (PSI-BLAST) with no reference to secondary or tertiary structure information is able to perform as well as traditional fold recognition methods. Since the method, SENSER, does not require knowledge of the three-dimensional structure, it can be used to infer relationships that are not tractable by methods dependent on structural templates.
Collapse
|
22
|
Abstract
During recent years many protein fold recognition methods have been developed, based on different algorithms and using various kinds of information. To examine the performance of these methods several evaluation experiments have been conducted. These include blind tests in CASP/CAFASP, large scale benchmarks, and long-term, continuous assessment with newly solved protein structures. These studies confirm the expectation that for different targets different methods produce the best predictions, and the final prediction accuracy could be improved if the available methods were combined in a perfect manner. In this article a neural-network-based consensus predictor, Pcons, is presented that attempts this task. Pcons attempts to select the best model out of those produced by six prediction servers, each using different methods. Pcons translates the confidence scores reported by each server into uniformly scaled values corresponding to the expected accuracy of each model. The translated scores as well as the similarity between models produced by different servers is used in the final selection. According to the analysis based on two unrelated sets of newly solved proteins, Pcons outperforms any single server by generating approximately 8%-10% more correct predictions. Furthermore, the specificity of Pcons is significantly higher than for any individual server. From analyzing different input data to Pcons it can be shown that the improvement is mainly attributable to measurement of the similarity between the different models. Pcons is freely accessible for the academic community through the protein structure-prediction metaserver at http://bioinfo.pl/meta/.
Collapse
|
23
|
Abstract
PYRIN domains were identified recently as putative protein-protein interaction domains at the N-termini of several proteins thought to function in apoptotic and inflammatory signaling pathways. The approximately 95 residue PYRIN domains have no statistically significant sequence homology to proteins with known three-dimensional structure. Using secondary structure prediction and potential-based fold recognition methods, however, the PYRIN domain is predicted to be a member of the six-helix bundle death domain-fold superfamily that includes death domains (DDs), death effector domains (DEDs), and caspase recruitment domains (CARDs). Members of the death domain-fold superfamily are well established mediators of protein-protein interactions found in many proteins involved in apoptosis and inflammation, indicating further that the PYRIN domains serve a similar function. An homology model of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1, a member of the Apaf-1/Ced-4 family of proteins, was constructed using the three-dimensional structures of the FADD and p75 neurotrophin receptor DDs, and of the Apaf-1 and caspase-9 CARDs, as templates. Validation of the model using a variety of computational techniques indicates that the fold prediction is consistent with the sequence. Comparison of a circular dichroism spectrum of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1 with spectra of several proteins known to adopt the death domain-fold provides experimental support for the structure prediction.
Collapse
|
24
|
Modeling of the structural features of integral-membrane proteins reverse-environment prediction of integral membrane protein structure (REPIMPS). Protein Sci 2001; 10:1529-38. [PMID: 11468350 PMCID: PMC2374085 DOI: 10.1110/ps.6301] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
Abstract
The Profiles-3D application, an inverse-folding methodology appropriate for water-soluble proteins, has been modified to allow the determination of structural properties of integral-membrane proteins (IMPs) and for testing the validity of solved and model structures of IMPs. The modification, known as reverse-environment prediction of integral membrane protein structure (REPIMPS), takes into account the fact that exposed areas of side chains for many residues in IMPs are in contact with lipid and not the aqueous phase. This (1) allows lipid-exposed residues to be classified into the correct physicochemical environment class, (2) significantly improves compatibility scores for IMPs whose structures have been solved, and (3) reduces the possibility of rejecting a three-dimensional structure for an IMP because the presence of lipid was not included. Validation tests of REPIMPS showed that it (1) can locate the transmembrane domain of IMPs with single transmembrane helices more frequently than a range of other methodologies, (2) can rotationally orient transmembrane helices with respect to the lipid environment and surrounding helices in IMPs with multiple transmembrane helices, and (3) has the potential to accurately locate transmembrane domains in IMPs with multiple transmembrane helices. We conclude that correcting for the presence of the lipid environment surrounding the transmembrane segments of IMPs is an essential step for reasonable modeling and verification of the three-dimensional structures of these proteins.
Collapse
|