1
|
Thangappan J, Madan B, Wu S, Lee SG. Measuring the Conformational Distance of GPCR-related Proteins Using a Joint-based Descriptor. Sci Rep 2017; 7:15205. [PMID: 29123217 PMCID: PMC5680341 DOI: 10.1038/s41598-017-15513-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2017] [Accepted: 10/27/2017] [Indexed: 01/19/2023] Open
Abstract
Joint-based descriptor is a new level of macroscopic descriptor for protein structure using joints of secondary structures as a basic element. Here, we propose how the joint-based descriptor can be applied to examine the conformational distances or differences of transmembrane (TM) proteins. Specifically, we performed three independent studies that measured the global and conformational distances between GPCR A family and its related structures. First, the conformational distances of GPCR A family and other 7TM proteins were evaluated. This provided the information on the distant and close families or superfamilies to GPCR A family and permitted the identification of conserved local conformations. Second, computational models of GPCR A family proteins were validated, which enabled us to estimate how much they reproduce the native conformation of GPCR A proteins at global and local conformational level. Finally, the conformational distances between active and inactive states of GPCR proteins were estimated, which identified the difference of local conformation. The proposed macroscopic joint-based approach is expected to allow us to investigate structural features, evolutionary relationships, computational models and conformational changes of TM proteins in a more simplistic manner.
Collapse
Affiliation(s)
- Jayaraman Thangappan
- Department of Chemical Engineering, Pusan National University, Busan, 609-735, Republic of Korea
| | - Bharat Madan
- Department of Chemical Engineering, Pusan National University, Busan, 609-735, Republic of Korea
| | - Sangwook Wu
- Department of Physics, Pukyong National University, Busan, 608-737, Republic of Korea.
| | - Sun-Gu Lee
- Department of Chemical Engineering, Pusan National University, Busan, 609-735, Republic of Korea.
| |
Collapse
|
2
|
Abstract
Profile ALIgNmEnt (PRALINE) is a versatile multiple sequence alignment toolkit. In its main alignment protocol, PRALINE follows the global progressive alignment algorithm. It provides various alignment optimization strategies to address the different situations that call for protein multiple sequence alignment: global profile preprocessing, homology-extended alignment, secondary structure-guided alignment, and transmembrane aware alignment. A number of combinations of these strategies are enabled as well. PRALINE is accessible via the online server http://www.ibi.vu.nl/programs/PRALINEwww/. The server facilitates extensive visualization possibilities aiding the interpretation of alignments generated, which can be written out in pdf format for publication purposes. PRALINE also allows the sequences in the alignment to be represented in a dendrogram to show their mutual relationships according to the alignment. The chapter ends with a discussion of various issues occurring in multiple sequence alignment.
Collapse
Affiliation(s)
- Punto Bawono
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
| | | |
Collapse
|
3
|
Wang Y, Sadreyev RI, Grishin NV. PROCAIN: protein profile comparison with assisting information. Nucleic Acids Res 2009; 37:3522-30. [PMID: 19357092 PMCID: PMC2699500 DOI: 10.1093/nar/gkp212] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Detection of remote sequence homology is essential for the accurate inference of protein structure, function and evolution. The most sensitive detection methods involve the comparison of evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison based on the combination of 'vertical' MSA context (substitution constraints at individual sequence positions) and 'horizontal' context (patterns of residue content at multiple positions). Based on a simple and tractable profile methodology and primitive measures for the similarity of horizontal MSA patterns, the method achieves the quality of homology detection comparable to a more complex advanced method employing hidden Markov models (HMMs) and secondary structure (SS) prediction. Adding SS information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure/function predictions is illustrated by the detection of subtle homology between evolutionary distant yet structurally similar protein domains. ProCAIn, relevant databases and tools can be downloaded from: http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.
Collapse
Affiliation(s)
- Yong Wang
- Biomedical Engineering Program, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA
| | | | | |
Collapse
|
4
|
Sadreyev RI, Grishin NV. Accurate statistical model of comparison between multiple sequence alignments. Nucleic Acids Res 2008; 36:2240-8. [PMID: 18285364 PMCID: PMC2367703 DOI: 10.1093/nar/gkn065] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open
Abstract
Comparison of multiple protein sequence alignments (MSA) reveals unexpected evolutionary relations between protein families and leads to exciting predictions of spatial structure and function. The power of MSA comparison critically depends on the quality of statistical model used to rank the similarities found in a database search, so that biologically relevant relationships are discriminated from spurious connections. Here, we develop an accurate statistical description of MSA comparison that does not originate from conventional models of single sequence comparison and captures essential features of protein families. As a final result, we compute E-values for the similarity between any two MSA using a mathematical function that depends on MSA lengths and sequence diversity. To develop these estimates of statistical significance, we first establish a procedure for generating realistic alignment decoys that reproduce natural patterns of sequence conservation dictated by protein secondary structure. Second, since similarity scores between these alignments do not follow the classic Gumbel extreme value distribution, we propose a novel distribution that yields statistically perfect agreement with the data. Third, we apply this random model to database searches and show that it surpasses conventional models in the accuracy of detecting remote protein similarities.
Collapse
|
5
|
Qi Y, Sadreyev RI, Wang Y, Kim BH, Grishin NV. A comprehensive system for evaluation of remote sequence similarity detection. BMC Bioinformatics 2007; 8:314. [PMID: 17725841 PMCID: PMC2031906 DOI: 10.1186/1471-2105-8-314] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2007] [Accepted: 08/28/2007] [Indexed: 11/25/2022] Open
Abstract
Background Accurate and sensitive performance evaluation is crucial for both effective development of better structure prediction methods based on sequence similarity, and for the comparative analysis of existing methods. Up to date, there has been no satisfactory comprehensive evaluation method that (i) is based on a large and statistically unbiased set of proteins with clearly defined relationships; and (ii) covers all performance aspects of sequence-based structure predictors, such as sensitivity and specificity, alignment accuracy and coverage, and structure template quality. Results With the aim of designing such a method, we (i) select a statistically balanced set of divergent protein domains from SCOP, and define similarity relationships for the majority of these domains by complementing the best of information available in SCOP with a rigorous SVM-based algorithm; and (ii) develop protocols for the assessment of similarity detection and alignment quality from several complementary perspectives. The evaluation of similarity detection is based on ROC-like curves and includes several complementary approaches to the definition of true/false positives. Reference-dependent approaches use the 'gold standard' of pre-defined domain relationships and structure-based alignments. Reference-independent approaches assess the quality of structural match predicted by the sequence alignment, with respect to the whole domain length (global mode) or to the aligned region only (local mode). Similarly, the evaluation of alignment quality includes several reference-dependent and -independent measures, in global and local modes. As an illustration, we use our benchmark to compare the performance of several methods for the detection of remote sequence similarities, and show that different aspects of evaluation reveal different properties of the evaluated methods, highlighting their advantages, weaknesses, and potential for further development. Conclusion The presented benchmark provides a new tool for a statistically unbiased assessment of methods for remote sequence similarity detection, from various complementary perspectives. This tool should be useful both for users choosing the best method for a given purpose, and for developers designing new, more powerful methods. The benchmark set, reference alignments, and evaluation codes can be downloaded from .
Collapse
Affiliation(s)
- Yuan Qi
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323, Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Ruslan I Sadreyev
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323, Harry Hines Blvd, Dallas, TX 75390-9050, USA
| | - Yong Wang
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323, Harry Hines Blvd, Dallas, TX 75390-9050, USA
| | - Bong-Hyun Kim
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323, Harry Hines Blvd, Dallas, TX 75390-9050, USA
| | - Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323, Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323, Harry Hines Blvd, Dallas, TX 75390-9050, USA
| |
Collapse
|
6
|
Ohlson T, Aggarwal V, Elofsson A, MacCallum RM. Improved alignment quality by combining evolutionary information, predicted secondary structure and self-organizing maps. BMC Bioinformatics 2006; 7:357. [PMID: 16869963 PMCID: PMC1562450 DOI: 10.1186/1471-2105-7-357] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2006] [Accepted: 07/25/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is not unlikely that prediction of other structural features also could provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment. RESULTS The addition of self-organizing map locations as inputs to a profile-profile scoring function improves the alignment quality of distantly related proteins slightly. The improvement is slightly smaller than that gained from the inclusion of predicted secondary structure. However, the information seems to be complementary as the two prediction schemes can be combined to improve the alignment quality by a further small but significant amount. CONCLUSION It has been observed in many studies that predicted secondary structure significantly improves the alignments. Here we have shown that the addition of self-organizing map locations can further improve the alignments as the self-organizing map locations seem to contain some information that is not captured by the predicted secondary structure.
Collapse
Affiliation(s)
- Tomas Ohlson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
| | - Varun Aggarwal
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Arne Elofsson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden
| | - Robert M MacCallum
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Division of Cell and Molecular Biology, Imperial College London, London, UK
| |
Collapse
|
7
|
Dunbrack RL. Sequence comparison and protein structure prediction. Curr Opin Struct Biol 2006; 16:374-84. [PMID: 16713709 DOI: 10.1016/j.sbi.2006.05.006] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2006] [Revised: 03/22/2006] [Accepted: 05/08/2006] [Indexed: 10/24/2022]
Abstract
Sequence comparison is a major step in the prediction of protein structure from existing templates in the Protein Data Bank. The identification of potentially remote homologues to be used as templates for modeling target sequences of unknown structure and their accurate alignment remain challenges, despite many years of study. The most recent advances have been in combining as many sources of information as possible--including amino acid variation in the form of profiles or hidden Markov models for both the target and template families, known and predicted secondary structures of the template and target, respectively, the combination of structure alignment for distant homologues and sequence alignment for close homologues to build better profiles, and the anchoring of certain regions of the alignment based on existing biological data. Newer technologies have been applied to the problem, including the use of support vector machines to tackle the fold classification problem for a target sequence and the alignment of hidden Markov models. Finally, using the consensus of many fold recognition methods, whether based on profile-profile alignments, threading or other approaches, continues to be one of the most successful strategies for both recognition and alignment of remote homologues. Although there is still room for improvement in identification and alignment methods, additional progress may come from model building and refinement methods that can compensate for large structural changes between remotely related targets and templates, as well as for regions of misalignment.
Collapse
Affiliation(s)
- Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA.
| |
Collapse
|
8
|
Ku CJ, Yona G. The distance-profile representation and its application to detection of distantly related protein families. BMC Bioinformatics 2005; 6:282. [PMID: 16316461 PMCID: PMC1345692 DOI: 10.1186/1471-2105-6-282] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2005] [Accepted: 11/29/2005] [Indexed: 11/11/2022] Open
Abstract
Background Detecting homology between remotely related protein families is an important problem in computational biology since the biological properties of uncharacterized proteins can often be inferred from those of homologous proteins. Many existing approaches address this problem by measuring the similarity between proteins through sequence or structural alignment. However, these methods do not exploit collective aspects of the protein space and the computed scores are often noisy and frequently fail to recognize distantly related protein families. Results We describe an algorithm that improves over the state of the art in homology detection by utilizing global information on the proximity of entities in the protein space. Our method relies on a vectorial representation of proteins and protein families and uses structure-specific association measures between proteins and template structures to form a high-dimensional feature vector for each query protein. These vectors are then processed and transformed to sparse feature vectors that are treated as statistical fingerprints of the query proteins. The new representation induces a new metric between proteins measured by the statistical difference between their corresponding probability distributions. Conclusion Using several performance measures we show that the new tool considerably improves the performance in recognizing distant homologies compared to existing approaches such as PSIBLAST and FUGUE.
Collapse
Affiliation(s)
- Chin-Jen Ku
- Department of Computer Science, Cornell University, Ithaca, NY, USA
| | - Golan Yona
- Department of Computer Science, Cornell University, Ithaca, NY, USA
| |
Collapse
|
9
|
Simossis VA, Heringa J. PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Res 2005; 33:W289-94. [PMID: 15980472 PMCID: PMC1160151 DOI: 10.1093/nar/gki390] [Citation(s) in RCA: 364] [Impact Index Per Article: 19.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2005] [Revised: 03/10/2005] [Accepted: 03/10/2005] [Indexed: 11/14/2022] Open
Abstract
PRofile ALIgNEment (PRALINE) is a fully customizable multiple sequence alignment application. In addition to a number of available alignment strategies, PRALINE can integrate information from database homology searches to generate a homology-extended multiple alignment. PRALINE also provides a choice of seven different secondary structure prediction programs that can be used individually or in combination as a consensus for integrating structural information into the alignment process. The program can be used through two separate interfaces: one has been designed to cater to more advanced needs of researchers in the field, and the other for standard construction of high confidence alignments. The web-based output is designed to facilitate the comprehensive visualization of the generated alignments by means of five default colour schemes based on: residue type, position conservation, position reliability, residue hydrophobicity and secondary structure, depending on the options set. A user can also define a custom colour scheme by selecting which colour will represent one or more amino acids in the alignment. All generated alignments are also made available in the PDF format for easy figure generation for publications. The grouping of sequences, on which the alignment is based, can also be visualized as a dendrogram. PRALINE is available at http://ibivu.cs.vu.nl/programs/pralinewww/.
Collapse
Affiliation(s)
- V. A. Simossis
- Bioinformatics Section, Faculty of Sciences, Vrije UniversiteitDe Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
| | - J. Heringa
- Bioinformatics Section, Faculty of Sciences, Vrije UniversiteitDe Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- Centre for Integrative Bioinformatics VU (IBIVU), Faculty of Sciences and Faculty of Earth & Life Sciences, Vrije UniversiteitDe Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
| |
Collapse
|
10
|
Simossis VA, Kleinjung J, Heringa J. Homology-extended sequence alignment. Nucleic Acids Res 2005; 33:816-24. [PMID: 15699183 PMCID: PMC549400 DOI: 10.1093/nar/gki233] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2004] [Revised: 01/05/2005] [Accepted: 01/20/2005] [Indexed: 11/15/2022] Open
Abstract
We present a profile-profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.
Collapse
Affiliation(s)
- V. A. Simossis
- Bioinformatics Section, Faculty of Sciences, Vrije UniversiteitDe Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
| | - J. Kleinjung
- Bioinformatics Section, Faculty of Sciences, Vrije UniversiteitDe Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
| | - J. Heringa
- Bioinformatics Section, Faculty of Sciences, Vrije UniversiteitDe Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- Centre for Integrative Bioinformatics VU (IBIVU), Faculty of Sciences and Faculty of Earth and Life Sciences, Vrije UniversiteitDe Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
| |
Collapse
|