1
Bahrami A, Najafi A, Hashemi M, Miraie-Ashtiani SR. PSSP: Protein splice site prediction algorithm using Bayesian approach. J Bioinform Comput Biol 2020; 17:1950034. [PMID: 32019415 DOI: 10.1142/s0219720019500343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
This study aimed to introduce an algorithm to identify the intein motifs and blocks involved in protein splicing, and to explore the methods underlying the detection of protein motifs. Inteins are mobile protein splicing elements capable of self-splicing post-translationally. They exist in viruses and bacteriophages; notwithstanding this broad phylogenetic distribution, all inteins share common structural features. A method was developed to predict inteins in a raw sequence, using a ranking and scoring scheme based on amino acid θ value tables. This method aided in the identification and assessment of patterns characterizing the intein sequences. New conserved intein properties are revealed, and the known ones are described and localized. We computed the θ value of each amino acid at block A positions +1 to +13, block B positions l+13 to l+26 and block G positions -7 to +1 for the three categories. The consensus amino acids thus found are listed at the end of each row. We give statistics for the distances between the blocks (block A to B, block B to F, and block F to G), with averages of 66.1, 294, and 10.2 amino acids, respectively. The actual blocks A, B, and G of the one intein found in the vacuolar membrane ATPase subunit, a precursor protein, are ranked 1. The results indicate that all of the block sequences found in nine proteins are ranked at the top of the list. The intein sequence is used to search the databases for intein-like proteins. Understanding the functional, structural, and dynamical aspects of inteins is important for intein engineering and the betterment of the intein database.
Affiliation(s)
- Abolfazl Bahrami
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Islamic Republic of Iran
- Ali Najafi
- Molecular Biology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Mohammadreza Hashemi
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Islamic Republic of Iran
- Seyed Reza Miraie-Ashtiani
- Department of Animal Science, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Islamic Republic of Iran
2
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contacts prediction plays an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contacts prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contacts prediction methods. Results: Protein inter-residue contacts prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show the typical application of protein inter-residue contacts. Then, we present a comprehensive review of the three main categories for protein inter-residue contacts prediction: correlated mutations methods, machine-learning methods and fusion methods. In addition, we analyze the constraints for each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss the performances of these methods in detail. Conclusion: Correlated mutations methods achieve better performances for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods could take advantage of the machine-learning and correlated mutations methods. Employing a more effective fusion strategy could help to further improve the performances of fusion methods.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
- Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
- Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
3
Murray JM, Maher S, Mota T, Suzuki K, Kelleher AD, Center RJ, Purcell D. Differentiating founder and chronic HIV envelope sequences. PLoS One 2017; 12:e0171572. [PMID: 28187204 PMCID: PMC5302377 DOI: 10.1371/journal.pone.0171572] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/21/2016] [Accepted: 01/23/2017] [Indexed: 11/27/2022] Open
Abstract
Significant progress has been made in characterizing broadly neutralizing antibodies against the HIV envelope glycoprotein Env, but an effective vaccine has proven elusive. Vaccine development would be facilitated if common features of early founder virus required for transmission could be identified. Here we employ a combination of bioinformatic and operations research methods to determine the most prevalent features that distinguish 78 subtype B and 55 subtype C founder Env sequences from an equal number of chronic sequences. There were a number of equivalent optimal networks (based on the fewest covarying amino acid (AA) pairs or a measure of maximal covariance) that separated founders from chronics: 13 pairs for subtype B and 75 for subtype C. Every subtype B optimal solution contained the founder pairs 178–346 Asn-Val, 232–236 Thr-Ser, 240–340 Lys-Lys, 279–315 Asp-Lys, 291–792 Ala-Ile, 322–347 Asp-Thr, 535–620 Leu-Asp, 742–837 Arg-Phe, and 750–836 Asp-Ile; the most common optimal pairs for subtype C were 644–781 Lys-Ala (74 of 75 networks), 133–287 Ala-Gln (73/75) and 307–337 Ile-Gln (73/75). No pair was present in all optimal subtype C solutions highlighting the difficulty in targeting transmission with a single vaccine strain. Relative to the size of its domain (0.35% of Env), the α4β7 binding site occurred most frequently among optimal pairs, especially for subtype C: 4.2% of optimal pairs (1.2% for subtype B). Early sequences from 5 subtype B pre-seroconverters each exhibited at least one clone containing an optimal feature 553–624 (Ser-Asn), 724–747 (Arg-Arg), or 46–293 (Arg-Glu).
Affiliation(s)
- John M. Murray
- School of Mathematics and Statistics, UNSW Sydney, Sydney, New South Wales, Australia
- Stephen Maher
- School of Mathematics and Statistics, UNSW Sydney, Sydney, New South Wales, Australia
- Zuse Institute Berlin, Berlin, Germany
- Talia Mota
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
- Kazuo Suzuki
- The Kirby Institute, UNSW Sydney, Sydney, New South Wales, Australia
- Rob J. Center
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
- Damian Purcell
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
4
Wozniak PP, Vriend G, Kotulska M. Correlated mutations select misfolded from properly folded proteins. Bioinformatics 2017; 33:1497-1504. [PMID: 28203707 DOI: 10.1093/bioinformatics/btx013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/12/2016] [Accepted: 01/11/2017] [Indexed: 11/14/2022] Open
Affiliation(s)
- P P Wozniak
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
- G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
- M Kotulska
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
5
Bywater RP. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data. PLoS One 2016; 11:e0150769. [PMID: 26963911 PMCID: PMC4786192 DOI: 10.1371/journal.pone.0150769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/26/2015] [Accepted: 02/17/2016] [Indexed: 11/18/2022] Open
Abstract
Proteins have many functions and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and they are compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms show convergence onto the same result there is every reason to take all algorithms into account such that a consensus result can be arrived at. In this work it is shown that the application of different algorithms in protein structure prediction leads to results that do not converge as such but rather they collude in a striking and useful way that has never been considered before.
6
Counts CJ, Ho PS, Donlin MJ, Tavis JE, Chen C. A Functional Interplay between Human Immunodeficiency Virus Type 1 Protease Residues 77 and 93 Involved in Differential Regulation of Precursor Autoprocessing and Mature Protease Activity. PLoS One 2015; 10:e0123561. [PMID: 25893662 PMCID: PMC4404164 DOI: 10.1371/journal.pone.0123561] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/25/2014] [Accepted: 03/04/2015] [Indexed: 11/18/2022] Open
Abstract
HIV-1 protease (PR) is a viral enzyme vital to the production of infectious virions. It is initially synthesized as part of the Gag-Pol polyprotein precursor in the infected cell. The free mature PR is liberated as a result of precursor autoprocessing upon virion release. We previously described a model system to examine autoprocessing in transfected mammalian cells. Here, we report that a covariance analysis of miniprecursor (p6*-PR) sequences derived from drug-naïve patients identified a series of amino acid pairs that vary together across independent viral isolates. These covariance pairs were used to build the first topology map of the miniprecursor that suggests high levels of interaction between the p6* peptide and the mature PR. Additionally, several PR-PR covariance pairs are located far from each other (>12 Å Cα to Cα) relative to their positions in the mature PR structure. Biochemical characterization of one such covariance pair (77-93) revealed that each residue shows a distinct preference for one of three alkyl amino acids (V, I, and L) and that a polar or charged amino acid at either of these two positions abolishes precursor autoprocessing. The most commonly observed 77V is preferred by the most commonly observed 93I, but the 77I variant is preferred by other 93 variants (L, V, or M) in supporting precursor autoprocessing. Furthermore, the 77I93V covariant enhanced precursor autoprocessing and Gag polyprotein processing but decreased the mature PR activity. Therefore, both covariance and biochemical analyses support a functional association between residues 77 and 93, which are spatially distant from each other in the mature PR structure. Our data also suggest that these covariance pairs differentially regulate precursor autoprocessing and the mature protease activity.
Affiliation(s)
- Christopher J. Counts
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado, United States of America
- P. Shing Ho
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado, United States of America
- Maureen J. Donlin
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, Missouri, United States of America
- Department of Biochemistry and Molecular Biology, Saint Louis University School of Medicine, St. Louis, Missouri, United States of America
- Saint Louis University Liver Center, Saint Louis University School of Medicine, St. Louis, Missouri, United States of America
- John E. Tavis
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, Missouri, United States of America
- Saint Louis University Liver Center, Saint Louis University School of Medicine, St. Louis, Missouri, United States of America
- Chaoping Chen
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado, United States of America
7
Janda JO, Popal A, Bauer J, Busch M, Klocke M, Spitzer W, Keller J, Merkl R. H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments. BMC Bioinformatics 2014; 15:118. [PMID: 24766829 PMCID: PMC4021312 DOI: 10.1186/1471-2105-15-118] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/13/2014] [Accepted: 04/17/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The identification of functionally important residue positions is an important task of computational biology. Methods of correlation analysis allow for the identification of pairs of residue positions, whose occupancy is mutually dependent due to constraints imposed by protein structure or function. A common measure assessing these dependencies is the mutual information, which is based on Shannon's information theory that utilizes probabilities only. Consequently, such approaches do not consider the similarity of residue pairs, which may degrade the algorithm's performance. One typical algorithm is H2r, which characterizes each individual residue position k by the conn(k)-value, which is the number of significantly correlated pairs it belongs to. RESULTS To improve specificity of H2r, we developed a revised algorithm, named H2rs, which is based on the von Neumann entropy (vNE). To compute the corresponding mutual information, a matrix A is required, which assesses the similarity of residue pairs. We determined A by deducing substitution frequencies from contacting residue pairs observed in the homologs of 35 809 proteins, whose structure is known. In analogy to H2r, the enhanced algorithm computes a normalized conn(k)-value. Within the framework of H2rs, only statistically significant vNE values were considered. To decide on significance, the algorithm calculates a p-value by performing a randomization test for each individual pair of residue positions. The analysis of a large in silico testbed demonstrated that specificity and precision were higher for H2rs than for H2r and two other methods of correlation analysis. The gain in prediction quality is further confirmed by a detailed assessment of five well-studied enzymes. The outcome of H2rs and of a method that predicts contacting residue positions (PSICOV) overlapped only marginally. H2rs can be downloaded from http://www-bioinf.uni-regensburg.de. 
CONCLUSIONS Considering substitution frequencies for residue pairs by means of the von Neumann entropy and a p-value improved the success rate in identifying important residue positions. The integration of proven statistical concepts and normalization allows for an easier comparison of results obtained with different proteins. Comparing the outcome of the local method H2rs and of the global method PSICOV indicates that such methods supplement each other and have different scopes of application.
Affiliation(s)
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany.
8
Gniewek P, Kolinski A, Kloczkowski A, Gront D. BioShell-Threading: versatile Monte Carlo package for protein 3D threading. BMC Bioinformatics 2014; 15:22. [PMID: 24444459 PMCID: PMC3937128 DOI: 10.1186/1471-2105-15-22] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 11/25/2012] [Accepted: 11/18/2013] [Indexed: 11/26/2022] Open
Abstract
Background The comparative modeling approach to protein structure prediction inherently relies on a template structure. Before building a model such a template protein has to be found and aligned with the query sequence. Any error made at this stage may dramatically affect the quality of the result. There is a need, therefore, to develop accurate and sensitive alignment protocols. Results BioShell threading software is a versatile tool for aligning protein structures, protein sequences or sequence profiles and query sequences to template structures. The software is also capable of sub-optimal alignment generation. It can be executed as an application from the UNIX command line, or as a set of Java classes called from a script or a Java application. The implemented Monte Carlo search engine greatly facilitates the development and benchmarking of new alignment scoring schemes even when the functions exhibit non-deterministic polynomial-time complexity. Conclusions Numerical experiments indicate that the new threading application offers template detection abilities and provides much better alignments than other methods. The package along with documentation and examples is available at: http://bioshell.pl/threading3d.
Affiliation(s)
- Dominik Gront
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland.
9
Murray JM, Moenne-Loccoz R, Velay A, Habersetzer F, Doffoël M, Gut JP, Fofana I, Zeisel MB, Stoll-Keller F, Baumert TF, Schvoerer E. Genotype 1 hepatitis C virus envelope features that determine antiviral response assessed through optimal covariance networks. PLoS One 2013; 8:e67254. [PMID: 23840641 PMCID: PMC3688619 DOI: 10.1371/journal.pone.0067254] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/19/2013] [Accepted: 05/14/2013] [Indexed: 01/25/2023] Open
Abstract
The poor response to the combined antiviral therapy of pegylated alfa-interferon and ribavirin for hepatitis C virus (HCV) infection may be linked to mutations in the viral envelope gene E1E2 (env), which can result in escape from the immune response and higher efficacy of viral entry. Mutations that result in failure of therapy most likely require compensatory mutations to achieve sufficient change in envelope structure and function. Compensatory mutations were investigated by determining positions in the E1E2 gene where amino acids (aa) covaried across groups of individuals. We assessed networks of covarying positions in E1E2 sequences that differentiated sustained virological response (SVR) from non-response (NR) in 43 genotype 1a (17 SVR), and 49 genotype 1b (25 SVR) chronically HCV-infected individuals. Binary integer programming over covariance networks was used to extract aa combinations that differed between response groups. Genotype 1a E1E2 sequences exhibited higher degrees of covariance and clustered into 3 main groups while 1b sequences exhibited no clustering. Between 5 and 9 aa pairs were required to separate SVR from NR in each genotype. Amino acids in hypervariable region 1 were 6 times more likely than chance to occur in the optimal networks. The pair 531-626 (EI) appeared frequently in the optimal networks and was present in 6 of 9 NR in one of the 1a clusters. The most frequent pairs representing SVR were 431-481 (EE), 500-522 (QA) in 1a, and 407-434 (AQ) in 1b. Optimal networks based on covarying aa pairs in HCV envelope can indicate features that are associated with failure or success of antiviral therapy.
Affiliation(s)
- John M Murray
- School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, Australia.
10
Luo Q, Hamer R, Reinert G, Deane CM. Local network patterns in protein-protein interfaces. PLoS One 2013; 8:e57031. [PMID: 23520460 PMCID: PMC3592891 DOI: 10.1371/journal.pone.0057031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/23/2011] [Accepted: 01/21/2013] [Indexed: 11/25/2022] Open
Abstract
Protein-protein interfaces hold the key to understanding protein-protein interactions. In this paper we investigated local interaction network patterns beyond pair-wise contact sites by considering interfaces as contact networks among residues. A contact site was defined as any residue on the surface of one protein which was in contact with a residue on the surface of another protein. We labeled the sub-graphs of these contact networks by their amino acid types. The observed distributions of these labeled sub-graphs were compared with the corresponding background distributions and the results suggested that there were preferred chemical patterns of closely packed residues at the interface. These preferred patterns point to biological constraints on physical proximity between those residues on one protein which were involved in binding to residues which were close on the interacting partner. Interaction interfaces were far from random and contain information beyond pairs and triangles. To illustrate the possible application of the local network patterns observed, we introduced a signature method, called iScore, based on these local patterns to assess interface predictions. On our data sets iScore achieved 83.6% specificity with 82% sensitivity.
Affiliation(s)
- Qiang Luo
- Department of Management, College of Information Systems and Management, National University of Defense Technology, Changsha, Hunan, PR China.
11
Gong YN, Chen GW, Suchard MA. A novel empirical mutual information approach to identify co-evolving amino acid positions of influenza A viruses. Comput Biol Chem 2012; 39:20-8. [PMID: 22858722 DOI: 10.1016/j.compbiolchem.2012.06.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/18/2012] [Accepted: 06/22/2012] [Indexed: 11/30/2022]
Abstract
Mutual information (MI) is an approach commonly used to estimate the evolutionary correlation of 2 amino acid sites. Although several MI methods exist, prior to our contribution no systematic method had been developed to assess their performance, or to establish numerical thresholds to detect co-evolving amino acid sites. The current study performed a Markov chain Monte Carlo (MCMC) algorithm on influenza viral sequences to capture their evolutionary characteristics. A consensus maximum clade credibility (MCC) tree was estimated from the samples, together with their amino acid substitution statistics, from which we generated synthetic sequences of known dependent and independent paired amino acid sites. A pair-to-pair and influenza-specific amino acid substitution matrix (P2PFLU) incorporated into Bayesian Evolutionary Analysis Sampling Trees (BEAST) enumerated these synthetic sequences. The sequences inherited evolutionary features and co-varying characteristics from the real viral sequences, rendering these synthetic data ideal for exploring their co-evolving features. For the MI measure, we proposed a novel metric called the empirical MI (MI(Em)), which outperformed other MI measures in analysis of receiver operating characteristics (ROC). We implemented our approach on 1086 all-time PB2 sequences of influenza A H5N1 viruses, in which we found 97 sites exhibiting co-evolutionary substitution of one or more amino acid sites. In particular, PB2 451, along with eight other PB2 sites of various MI(Em) scores, was found to co-evolve with PB2 627, a known species-associated amino acid residue which plays a critical role in influenza virus replication.
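As a rough illustration of the plain mutual information that measures of this kind build on, the Python sketch below scores two alignment columns. It is not the authors' MI(Em) metric, whose definition the abstract does not give; the function name `mutual_information` and the toy columns are assumptions made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Plain Shannon mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length sequences of residue symbols, one symbol
    per aligned sequence. Hypothetical helper, not part of any cited package.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    counts_i = Counter(col_i)               # marginal counts at site i
    counts_j = Counter(col_j)               # marginal counts at site j
    counts_ij = Counter(zip(col_i, col_j))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), c in counts_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (counts_i[a] * counts_j[b]))
    return mi


# Toy columns: the first pair covaries perfectly, the second is independent.
covarying = mutual_information("AAGG", "KKEE")
independent = mutual_information("AAGG", "KEKE")
```

With these toy inputs the perfectly covarying pair carries one bit of information and the independent pair carries none, which is the contrast a detection threshold has to separate in real alignments.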
Affiliation(s)
- Yu-Nong Gong
- Graduate Institute of Electrical Engineering, Chang Gung University, Taoyuan, Taiwan
12
Cheng CP, Lee PF, Liu WC, Wu IC, Chin CY, Chang TT, Tseng VS. Analysis of precore/core covariances associated with viral kinetics and genotypes in hepatitis B e antigen-positive chronic hepatitis B patients. PLoS One 2012; 7:e32553. [PMID: 22384271 PMCID: PMC3288105 DOI: 10.1371/journal.pone.0032553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/19/2011] [Accepted: 02/01/2012] [Indexed: 12/17/2022] Open
Abstract
Hepatitis B virus (HBV) is one of the most common DNA viruses that can cause aggressive hepatitis, cirrhosis and hepatocellular carcinoma. Although many people are persistently infected with HBV, the kinetics in serum levels of viral loads and the host immune responses vary from person to person. HBV precore/core open reading frame (ORF) encoding proteins, hepatitis B e antigen (HBeAg) and core antigen (HBcAg), are two indicators of active viral replication. The aim of this study was to discover a variety of amino acid covariances in responses to viral kinetics, seroconversion and genotypes during the course of HBV infection. A one year follow-up study was conducted with a total number of 1,694 clones from 23 HBeAg-positive chronic hepatitis B patients. Serum alanine aminotransferase, HBV DNA and HBeAg levels were measured monthly as criteria for clustering patients into several different subgroups. Monthly derived multiple precore/core ORFs were directly sequenced and translated into amino acid sequences. For each subgroup, time-dependent covariances were identified from their time-varying sequences over the entire follow-up period. The fluctuating, wavering, HBeAg-nonseroconversion and genotype C subgroups showed greater degrees of covariances than the stationary, declining, HBeAg-seroconversion and genotype B. Referring to literature, mutation hotspots within our identified covariances were associated with the infection process. Remarkably, hotspots were predominant in genotype C. Moreover, covariances were also identified at early stage (spanning from baseline to a peak of serum HBV DNA) in order to determine the intersections with aforementioned time-dependent covariances. Preserved covariances, namely representative covariances, of each subgroup are visually presented using a tree-based structure. Our results suggested that identified covariances were strongly associated with viral kinetics, seroconversion and genotypes. 
Moreover, representative covariances may help clinicians prescribe a suitable treatment for patients even if they have no obvious symptoms at the early stage of HBV infection.
Affiliation(s)
- Chun-Pei Cheng
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Pei-Fen Lee
- Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan
- Wen-Chun Liu
- Department of Biotechnology, Ming Dao University, Changhua, Taiwan
- I-Chin Wu
- Department of Internal Medicine, National Cheng Kung University Hospital, Tainan, Taiwan
- Graduate Institute of Clinical Medicine, National Cheng Kung University, Tainan, Taiwan
- Infectious Disease and Signaling Research Center, National Cheng Kung University, Tainan, Taiwan
- Chu-Yu Chin
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Ting-Tsung Chang
- Department of Internal Medicine, National Cheng Kung University Hospital, Tainan, Taiwan
- Institute of Basic Medical Sciences, National Cheng Kung University, Tainan, Taiwan
- Infectious Disease and Signaling Research Center, National Cheng Kung University, Tainan, Taiwan
- Vincent S. Tseng
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan
13
Monastyrskyy B, Fidelis K, Tramontano A, Kryshtafovych A. Evaluation of residue-residue contact predictions in CASP9. Proteins 2011; 79 Suppl 10:119-25. [PMID: 21928322 DOI: 10.1002/prot.23160] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 05/06/2011] [Revised: 06/25/2011] [Accepted: 07/27/2011] [Indexed: 01/03/2023]
Abstract
This work presents the results of the assessment of the intramolecular residue-residue contact predictions submitted to CASP9. The methodology for the assessment does not differ from that used in previous CASPs, with two basic evaluation measures being the precision in recognizing contacts and the difference between the distribution of distances in the subset of predicted contact pairs versus all pairs of residues in the structure. The emphasis is placed on the prediction of long-range contacts (i.e., contacts between residues separated by at least 24 residues along sequence) in target proteins that cannot be easily modeled by homology. Although there is considerable activity in the field, the current analysis reports no discernable progress since CASP8.
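The precision measure over long-range contacts mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CASP assessors' code: the function name `long_range_precision`, the representation of contacts as pairs of residue indices, and the toy data are all hypothetical, while the separation cutoff of 24 residues is taken from the abstract.

```python
def long_range_precision(predicted, native, min_separation=24):
    """Fraction of predicted long-range contacts that are native contacts.

    predicted: iterable of (i, j) residue-index pairs submitted as contacts.
    native: iterable of (i, j) pairs that are true contacts in the structure.
    A pair counts as long-range when |i - j| >= min_separation.
    Returns None when no predicted pair is long-range.
    """
    # Store every pair as (low, high) so that (i, j) and (j, i) compare equal.
    native_pairs = {tuple(sorted(pair)) for pair in native}
    long_range = {
        tuple(sorted(pair))
        for pair in predicted
        if abs(pair[0] - pair[1]) >= min_separation
    }
    if not long_range:
        return None
    correct = sum(1 for pair in long_range if pair in native_pairs)
    return correct / len(long_range)


# Toy data: three of the four predictions are long-range, two of those are native.
predicted_contacts = [(1, 30), (5, 40), (2, 10), (50, 20)]
native_contacts = [(1, 30), (20, 50)]
precision = long_range_precision(predicted_contacts, native_contacts)
```

In this toy case the short-range pair (2, 10) is ignored and the precision is two out of three. How a native contact is defined (for example, a distance cutoff between particular atoms) is left outside the sketch because the abstract does not state it.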
Affiliation(s)
- Bohdan Monastyrskyy
- Genome Center, University of California-Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
15
Hamer R, Luo Q, Armitage JP, Reinert G, Deane CM. i-Patch: interprotein contact prediction using local network information. Proteins 2011; 78:2781-97. [PMID: 20635422 DOI: 10.1002/prot.22792] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]
Abstract
Biological processes are commonly controlled by precise protein-protein interactions. These connections rely on specific amino acids at the binding interfaces. Here we predict the binding residues of such interprotein complexes. We have developed a suite of methods, i-Patch, which predict the interprotein contact sites by considering the two proteins as a network, with residues as nodes and contacts as edges. i-Patch starts with two proteins, A and B, which are assumed to interact, but for which the structure of the complex is not available. However, we assume that for each protein, we have a reference structure and a multiple sequence alignment of homologues. i-Patch then uses the propensities of patches of residues to interact, to predict interprotein contact sites. i-Patch outperforms several other tested algorithms for prediction of interprotein contact sites. It gives 59% precision with 20% recall on a blind test set of 31 protein pairs. Combining the i-Patch scores with an existing correlated mutation algorithm, McBASC, using a logistic model gave little improvement. Results from a case study, on bacterial chemotaxis protein complexes, demonstrate that our predictions can identify contact residues, as well as suggesting unknown interfaces in multiprotein complexes.
Affiliation(s)
- Rebecca Hamer
- Oxford Centre for Integrative Systems Biology, Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
16
|
Tress ML, Valencia A. Predicted residue-residue contacts can help the scoring of 3D models. Proteins 2010; 78:1980-91. [PMID: 20408174 DOI: 10.1002/prot.22714] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/06/2022]
Abstract
During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue-residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close-to-native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue-residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue-residue contacts would probably have helped in the selection of 3D models in the free modeling regime and over the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT-TS scores than all but the best 3D prediction groups. Despite the well-known low accuracy of residue-residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies.
Affiliation(s)
- Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.
|
17
|
Liu Y, Bahar I. Toward understanding allosteric signaling mechanisms in the ATPase domain of molecular chaperones. Pacific Symposium on Biocomputing 2009:269-80. [PMID: 19908379 DOI: 10.1142/9789814295291_0029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/23/2022]
Abstract
The ATPase cycle of the heat shock protein 70 (HSP70) is largely dependent on the ability of its nucleotide binding domain (NBD), also called ATPase domain, to undergo structural changes between its open and closed conformations. We present here a combined study of the Hsp70 NBD sequence, structure and dynamic features to identify the residues that play a crucial role in mediating the allosteric signaling properties of the ATPase domain. Specifically, we identify the residues involved in the shortest-path communications of the domain modeled as a network of nodes (residues) and links (equilibrium interactions). By comparing the calculations on both closed and open conformation of Hsp70 NBD, we identified a subset of central residues located at the interface between the two lobes of the NBD near the nucleotide binding site, which form a putative communication pathway invariant to structural changes. Two pairs of residues forming contacts at the interface in the closed conformation of the NBD are observed to no longer interact in the open conformation, suggesting that these specific interactions may play a switch role in establishing the transition of the NBD between the two functional forms. Sequence co-evolution analysis and collective dynamics analysis with elastic network model further confirm the key roles of these residues in Hsp70 NBD dynamics and functions.
Affiliation(s)
- Ying Liu
- Department of Computational Biology, School of Medicine, University of Pittsburgh, 3064 BST3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
|
18
|
Frenkel-Morgenstern M, Tworowski D, Klipcan L, Safro M. Intra-protein compensatory mutations analysis highlights the tRNA recognition regions in aminoacyl-tRNA synthetases. J Biomol Struct Dyn 2009; 27:115-26. [PMID: 19583438 DOI: 10.1080/07391102.2009.10507302] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
The aminoacyl-tRNA synthetases (aaRSs) covalently attach amino acids to their corresponding nucleic acid adapter molecules, tRNAs. The interactions in the tRNA-aaRS complexes are mostly non-specific and largely electrostatic. Tracing the mutual adaptation of aaRSs and tRNAs throughout evolution offers a clearer view of how aaRS-tRNA systems preserve patterns of tRNA recognition and binding. In this study, we used compensatory mutation analysis to explore the adaptation of aaRSs in response to random mutations that can occur in the tRNA-recognition area. We showed that the frequency of compensatory mutations among residues that belong to the recognition region is 1.75-fold higher than that of the exposed residues. The highest frequencies of compensatory mutations are observed for pairs of charged residues in which one residue is located within the tRNA-recognition area, while the second is placed outside of the area and contributes to the formation of the aaRS electrostatic landscape. Given charged residues are compensated by buried charged residues in more than 60% of the analyzed mutations. The cytoplasmic and mitochondrial aaRSs preserve similar patterns of compensatory mutations in the tRNA recognition areas. Moreover, we found that mitochondrial aaRSs demonstrate a significant increase in the frequency of compensatory mutations in the area. Our findings shed light on the physical nature of compensatory mutations in aaRSs, which keep tRNA-recognition patterns unchanged.
|
19
|
Xu F, Du P, Shen H, Hu H, Wu Q, Xie J, Yu L. Correlated mutation analysis on the catalytic domains of serine/threonine protein kinases. PLoS One 2009; 4:e5913. [PMID: 19526051 PMCID: PMC2690836 DOI: 10.1371/journal.pone.0005913] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/01/2009] [Accepted: 05/11/2009] [Indexed: 01/15/2023]
Abstract
Background: Protein kinases (PKs) have emerged as the largest family of signaling proteins in eukaryotic cells and are involved in every aspect of cellular regulation. Great progress has been made in understanding the mechanisms by which PKs phosphorylate their substrates, but the detailed mechanisms by which PKs ensure their substrate specificity with their structurally conserved catalytic domains are still not adequately understood. Correlated mutation analysis based on large sets of diverse sequence data may provide new insights into this question. Methodology/Principal Findings: Statistical coupling, residue correlation and mutual information analyses along with clustering were applied to analyze the structure-based multiple sequence alignment of the catalytic domains of the Ser/Thr PK family. Two clusters of highly coupled sites were identified. Mapping these positions onto the 3D structure of the PK catalytic domain showed that these two groups of positions form two physically close networks. We named these two networks the θ-shaped and γ-shaped networks, respectively. Conclusions/Significance: The θ-shaped network links the active site cleft and the substrate binding regions, and might participate in PKs recognizing and interacting with their substrates. The γ-shaped network is mainly situated on one side of the substrate binding regions, linking the activation loop and the substrate binding regions. It might play a role in supporting the activation loop and substrate binding regions before catalysis, and participate in product release after phosphoryl transfer. Our results exhibit significant correlations with experimental observations, and can be used as a guide to further experimental and theoretical studies on the mechanisms of PKs interacting with their substrates.
Affiliation(s)
- Feng Xu
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- * E-mail: (FX); (LY)
- Pan Du
- Biomedical Informatics Center, Northwestern University, Chicago, Illinois, United States of America
- Hongbo Shen
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Hairong Hu
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Qi Wu
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Jun Xie
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Long Yu
- Institute of Biomedical Sciences, Fudan University, Shanghai, China
- * E-mail: (FX); (LY)
|
20
|
Samsonov SA, Teyra J, Anders G, Pisabarro MT. Analysis of the impact of solvent on contacts prediction in proteins. BMC Structural Biology 2009; 9:22. [PMID: 19368710 PMCID: PMC2676287 DOI: 10.1186/1472-6807-9-22] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 08/15/2008] [Accepted: 04/15/2009] [Indexed: 11/10/2022]
Abstract
Background: The correlated mutations concept is based on the assumption that interacting protein residues coevolve, so that a mutation in one of the interacting counterparts is compensated by a mutation in the other. Approaches based on this concept have been widely used for protein contact prediction since the 1990s. Previously, we have shown that water-mediated interactions play an important role in protein interfaces. We have observed that current "dry" correlated mutations approaches might not properly predict certain interactions in protein interfaces because they are water-mediated. Results: The goal of this study has been to analyze the impact of including solvent in the concept of correlated mutations. For this purpose we use linear combinations of the predictions obtained by the application of two different similarity matrices: a standard "dry" similarity matrix (DRY) and a "wet" similarity matrix (WET) derived from all water-mediated protein interfacial interactions in the PDB. We analyze two datasets containing 50 domains and 10 domain pairs from PFAM and compare the results obtained by using a combination of both matrices. We find that for both intra- and interdomain contact predictions, the introduction of a combination of a "wet" and a "dry" similarity matrix improves the predictions in comparison to the "dry" one alone. Conclusion: Our analysis, despite the complexity of its possible general applicability, suggests that the consideration of water may improve the contact predictions obtained by correlated mutations approaches.
|
21
|
Björkholm P, Daniluk P, Kryshtafovych A, Fidelis K, Andersson R, Hvidsten TR. Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts. Bioinformatics 2009; 25:1264-70. [PMID: 19289446 DOI: 10.1093/bioinformatics/btp149] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
Motivation: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. Results: We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 × L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature. Availability: http://predictioncenter.org/Services/FragHMMent/.
Affiliation(s)
- Patrik Björkholm
- The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
|
22
|
Abstract
Current treatment for chronic hepatitis C is expensive, is often accompanied by burdensome side effects, and, sadly, fails in almost half of cases. The ability to predict such failures prior to treatment could save a great deal of pain and expense for the patient with HCV. In this issue of the JCI, Aurora and colleagues describe the development of genetic markers predictive of treatment response based on a study of viral sequence variation (see the related article beginning on page 225). Genome-wide covariation analyses of pretreatment virus sequences from 94 patients showed distinct patterns of mutations strongly associated with the ultimate success or failure of treatment. Such analyses suggest markers predictive of response to therapy and may lead to new insights into the underlying biology of hepatitis C.
Affiliation(s)
- Thomas S Oh
- Center for the Study of Hepatitis C, The Rockefeller University, New York, NY 10065, USA
|
23
|
Ashkenazy H, Unger R, Kliger Y. Optimal data collection for correlated mutation analysis. Proteins 2009; 74:545-55. [PMID: 18655065 DOI: 10.1002/prot.22168] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
Abstract
The main objective of correlated mutation analysis (CMA) is to predict intraprotein residue-residue interactions from sequence alone. Despite considerable progress in algorithms and computer capabilities, the performance of CMA methods remains quite low. Here we examine whether, and to what extent, the quality of CMA methods depends on the sequences that are included in the multiple sequence alignment (MSA). The results revealed a strong correlation between the number of homologs in an MSA and CMA prediction strength. Furthermore, although many of the current methods include only orthologs in the MSA, we found that it is beneficial to include both orthologs and paralogs. Remarkably, even remote homologs contribute to the improved accuracy. Based on our findings we put forward an automated data collection procedure, with a minimal coverage of 50% between the query protein and its orthologs and paralogs. This procedure improves accuracy even in the absence of manual curation. In this era of massive sequencing and exploding sequence data, our results suggest that correlated mutation-based methods have not reached their inherent performance limitations and that the role of CMA in structural biology is far from being fulfilled.
|
24
|
Aurora R, Donlin MJ, Cannon NA, Tavis JE. Genome-wide hepatitis C virus amino acid covariance networks can predict response to antiviral therapy in humans. J Clin Invest 2008; 119:225-36. [PMID: 19104147 DOI: 10.1172/jci37085] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 08/08/2008] [Accepted: 10/22/2008] [Indexed: 12/17/2022]
Abstract
Hepatitis C virus (HCV) is a common RNA virus that causes hepatitis and liver cancer. Infection is treated with IFN-alpha and ribavirin, but this expensive and physically demanding therapy fails in half of patients. The genomic sequences of independent HCV isolates differ by approximately 10%, but the effects of this variation on the response to therapy are unknown. To address this question, we analyzed amino acid covariance within the full viral coding region of pretherapy HCV sequences from 94 participants in the Viral Resistance to Antiviral Therapy of Chronic Hepatitis C (Virahep-C) clinical study. Covarying positions were common and linked together into networks that differed by response to therapy. There were 3-fold more hydrophobic amino acid pairs in HCV from nonresponding patients, and these hydrophobic interactions were predicted to contribute to failure of therapy by stabilizing viral protein complexes. Using our analysis to detect patterns within the networks, we could predict the outcome of therapy with greater than 95% coverage and 100% accuracy, raising the possibility of a prognostic test to reduce therapeutic failures. Furthermore, the hub positions in the networks are attractive antiviral targets because of their genetic linkage with many other positions that we predict would suppress evolution of resistant variants. Finally, covariance network analysis could be applicable to any virus with sufficient genetic variation, including most human RNA viruses.
Affiliation(s)
- Rajeev Aurora
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO 63104, USA.
|
25
|
Miller CS, Eisenberg D. Using inferred residue contacts to distinguish between correct and incorrect protein models. Bioinformatics 2008; 24:1575-82. [PMID: 18511466 PMCID: PMC2638260 DOI: 10.1093/bioinformatics/btn248] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/17/2022]
Abstract
Motivation: The de novo prediction of 3D protein structure is enjoying a period of dramatic improvements. Often, a remaining difficulty is to select the model closest to the true structure from a group of low-energy candidates. To what extent can inter-residue contact predictions from multiple sequence alignments, information which is orthogonal to that used in most structure prediction algorithms, be used to identify those models most similar to the native protein structure? Results: We present a Bayesian inference procedure to identify residue pairs that are spatially proximal in a protein structure. The method takes as input a multiple sequence alignment, and outputs an accurate posterior probability of proximity for each residue pair. We exploit a recent metagenomic sequencing project to create large, diverse and informative multiple sequence alignments for a test set of 1656 known protein structures. The method infers spatially proximal residue pairs in this test set with good accuracy: top-ranked predictions achieve an average accuracy of 38% (for an average 21-fold improvement over random predictions) in cross-validation tests. Notably, the accuracy of predicted 3D models generated by a range of structure prediction algorithms strongly correlates with how well the models satisfy probable residue contacts inferred via our method. This correlation allows for confident rejection of incorrect structural models. Availability: An implementation of the method is freely available at http://www.doe-mbi.ucla.edu/services Contact: david@mbi.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher S Miller
- UCLA-DOE Institute for Genomics & Proteomics, Molecular Biology Institute, Box 951570, UCLA, Los Angeles, CA 90095, USA
|
26
|
Helles G. A comparative study of the reported performance of ab initio protein structure prediction algorithms. J R Soc Interface 2008; 5:387-96. [PMID: 18077243 DOI: 10.1098/rsif.2007.1278] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022]
Abstract
Protein structure prediction is one of the major challenges in bioinformatics today. Throughout the past five decades, many different algorithmic approaches have been attempted, and although progress has been made the problem remains unsolvable even for many small proteins. While the general objective is to predict the three-dimensional structure from primary sequence, our current knowledge and computational power are simply insufficient to solve a problem of such high complexity. Some prediction algorithms do, however, appear to perform better than others, although it is not always obvious which ones they are and it is perhaps even less obvious why that is. In this review, the reported performance results from 18 different recently published prediction algorithms are compared. Furthermore, the general algorithmic settings most likely responsible for the difference in the reported performance are identified, and the specific settings of each of the 18 prediction algorithms are also compared. The average normalized r.m.s.d. scores reported range from 11.17 to 3.48. With a performance measure including both r.m.s.d. scores and CPU time, the currently best-performing prediction algorithm is identified to be the I-TASSER algorithm. Two of the algorithmic settings--protein representation and fragment assembly--were found to have definite positive influence on the running time and the predicted structures, respectively. There thus appears to be a clear benefit from incorporating this knowledge in the design of new prediction algorithms.
Affiliation(s)
- Glennie Helles
- University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark.
|
27
|
Liu Y, Eyal E, Bahar I. Analysis of correlated mutations in HIV-1 protease using spectral clustering. Bioinformatics 2008; 24:1243-50. [PMID: 18375964 PMCID: PMC2373918 DOI: 10.1093/bioinformatics/btn110] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 01/03/2023]
Abstract
Motivation: The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. Results: HIV-1 protease sequences from patients subjected to different specific treatments (set 1), and from untreated patients (set 2), were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; the second included those residues differentiated in the various HIV-1 protease subtypes, referred to for short as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between the correlated substitutions resulting from neutral mutations and those induced by MDR upon appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids. Contact: bahar@ccbb.pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.
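As a rough illustration of the covariance step, the sketch below computes the mutual information between two columns of a multiple sequence alignment using the standard definition, MI = Σ p(a,b) log2[p(a,b) / (p(a) p(b))]. The toy alignment and the helper names `column` and `mutual_information` are assumptions made for the example; the abstract does not specify the authors' exact estimator, any noise correction, or the spectral clustering step, none of which are reproduced here.

```python
import math
from collections import Counter


def column(msa, k):
    """Return column k of an alignment given as equal-length strings."""
    return [seq[k] for seq in msa]


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (hypothetical): columns 0 and 1 covary perfectly.
msa = ["AKL", "AKL", "GEL", "GEV"]
mi_01 = mutual_information(column(msa, 0), column(msa, 1))
mi_02 = mutual_information(column(msa, 0), column(msa, 2))
```

Here columns 0 and 1 always change together, so they share one full bit of information, while column 2 varies almost independently of column 0 and scores much lower.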
Affiliation(s)
- Ying Liu
- Department of Computational Biology, School of Medicine, University of Pittsburgh, PA 15232, USA
|
28
|
Merkl R, Zwick M. H2r: identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics 2008; 9:151. [PMID: 18366663 PMCID: PMC2323388 DOI: 10.1186/1471-2105-9-151] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 08/07/2007] [Accepted: 03/18/2008] [Indexed: 11/15/2022]
Abstract
Background: A multiple sequence alignment (MSA) generated for a protein can be used to characterise residues by means of a statistical analysis of single columns. In addition to the examination of individual positions, the investigation of co-variation of amino acid frequencies offers insights into function and evolution of the protein and residues. Results: We introduce conn(k), a novel parameter for the characterisation of individual residues. For each residue k, conn(k) is the number of most extreme signals of co-evolution. These signals were deduced from a normalised mutual information (MI) value U(k, l) computed for all pairs of residues k, l. We demonstrate that conn(k) is a more robust indicator than an individual MI-value for the prediction of residues most plausibly important for the evolution of a protein. This proposition was inferred by means of statistical methods. It was further confirmed by the analysis of several proteins. A server, which computes conn(k)-values is available at . Conclusion: The algorithm H2r, which analyses MSAs and computes conn(k)-values, characterises a specific class of residues. In contrast to strictly conserved ones, these residues possess some flexibility in the composition of side chains. However, their allocation is sensibly balanced with several other positions, as indicated by conn(k).
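One way to read the conn(k) definition is sketched below. It assumes that the "most extreme signals" are simply the `top_n` residue pairs with the highest U(k, l) values, and that conn(k) counts how many of those pairs involve residue k. The cutoff, the function name `conn_values` and the scores are hypothetical; the abstract does not state how H2r selects the extreme pairs.

```python
from collections import Counter


def conn_values(pair_scores, top_n):
    """Count, for each residue, its appearances among the top-scoring pairs.

    pair_scores: dict mapping (k, l) residue pairs to a U(k, l) score.
    top_n: number of highest-scoring pairs treated as extreme signals
           (an assumed cutoff, not taken from the paper).
    """
    top_pairs = sorted(pair_scores, key=pair_scores.get, reverse=True)[:top_n]
    counts = Counter()
    for k, l in top_pairs:
        counts[k] += 1
        counts[l] += 1
    return counts


# Hypothetical normalised MI scores for five residue pairs.
scores = {(1, 2): 0.9, (1, 3): 0.8, (2, 3): 0.1, (1, 4): 0.7, (3, 4): 0.2}
conn = conn_values(scores, top_n=3)
```

In this toy case residue 1 takes part in all three top pairs, so it receives the highest count, which matches the idea that a residue coupled to many partners is flagged more robustly than one with a single high MI value.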
Affiliation(s)
- Rainer Merkl
- Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, D-93040 Regensburg, Germany.
|
29
|
Frenkel-Morgenstern M, Magid R, Eyal E, Pietrokovski S. Refining intra-protein contact prediction by graph analysis. BMC Bioinformatics 2007; 8 Suppl 5:S6. [PMID: 17570865 PMCID: PMC1892094 DOI: 10.1186/1471-2105-8-s5-s6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Background: Accurate prediction of intra-protein residue contacts from sequence information will allow the prediction of protein structures. Basic predictions of such specific contacts can be further refined by jointly analyzing predicted contacts, and by adding information on the relative positions of contacts in the protein primary sequence. Results: We introduce a method for graph analysis refinement of intra-protein contacts, termed GARP. Our previously presented intra-contact prediction method by means of pair-to-pair substitution matrix (P2PConPred) was used to test the GARP method. In our approach, the top contact predictions obtained by a basic prediction method were used as edges to create a weighted graph. The edges were scored by a mutual clustering coefficient that identifies highly connected graph regions, and by the density of edges between the sequence regions of the edge nodes. A test set of 57 proteins with known structures was used to determine contacts. GARP improves the accuracy of the P2PConPred basic prediction method in whole proteins from 12% to 18%. Conclusion: Using a simple approach we increased the contact prediction accuracy of a basic method by 1.5 times. Our graph approach is simple to implement, can be used with various basic prediction methods, and can provide input for further downstream analyses.
Affiliation(s)
- Rachel Magid
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Eran Eyal
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Shmuel Pietrokovski
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
|
30
|
Eyal E, Pietrokovski S, Bahar I. Rapid assessment of correlated amino acids from pair-to-pair (P2P) substitution matrices. Bioinformatics 2007; 23:1837-9. [PMID: 17496318 DOI: 10.1093/bioinformatics/btm256] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
Identification of correlated amino acids in proteins has been a topic of broad interest in view of its functional implications and importance in protein design. A new set of pair-to-pair (P2P) substitution matrices for amino acids was recently introduced as a useful tool for inferring information on such correlated sites. We present a website developed for automated application of these matrices for analysis of query sequences. The site offers options for graphical analysis of correlations, as well as visualization of correlated amino acids on representative, structurally characterized, members of the examined family of sequences. Availability: http://www.ccbb.pitt.edu/p2p.
Affiliation(s)
- Eran Eyal
- Department of Computational Biology, School of Medicine, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA.
|