Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Yu YK, Altschul SF. The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 2004;21:902-11. [PMID: 15509610 DOI: 10.1093/bioinformatics/bti070] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

For:	Yu YK, Altschul SF. The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 2004;21:902-11. [PMID: 15509610 DOI: 10.1093/bioinformatics/bti070] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Number

Cited by Other Article(s)

Paila U, Kondam R, Ranjan A. Genome bias influences amino acid choices: analysis of amino acid substitution and re-compilation of substitution matrices exclusive to an AT-biased genome. Nucleic Acids Res 2008;36:6664-75. [PMID: 18948281 PMCID: PMC2588515 DOI: 10.1093/nar/gkn635] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Przybylski D, Rost B. Powerful fusion: PSI-BLAST and consensus sequences. Bioinformatics 2008;24:1987-93. [PMID: 18678588 PMCID: PMC2577777 DOI: 10.1093/bioinformatics/btn384] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Abstract

Motivation: A typical PSI-BLAST search consists of iterative scanning and alignment of a large sequence database during which a scoring profile is progressively built and refined. Such a profile can also be stored and used to search against a different database of sequences. Using it to search against a database of consensus rather than native sequences is a simple add-on that boosts performance surprisingly well. The improvement comes at a price: we hypothesized that random alignment score statistics would differ between native and consensus sequences. Thus PSI-BLAST-based profile searches against consensus sequences might incorrectly estimate statistical significance of alignment scores. In addition, iterative searches against consensus databases may fail. Here, we addressed these challenges in an attempt to harness the full power of the combination of PSI-BLAST and consensus sequences.

Results: We studied alignment score statistics for various types of consensus sequences. In general, the score distribution parameters of profile-based consensus sequence alignments differed significantly from those derived for the native sequences. PSI-BLAST partially compensated for the parameter variation. We have identified a protocol for building specialized consensus sequences that significantly improved search sensitivity and preserved score distribution parameters. As a result, PSI-BLAST profiles can be used to search specialized consensus sequences without sacrificing estimates of statistical significance. We also provided results indicating that iterative PSI-BLAST searches against consensus sequences could work very well. Overall, we showed how a very popular and effective method could be used to identify significantly more relevant similarities among protein sequences.

Availability:http://www.rostlab.org/services/consensus/

Contact:dariusz@mit.edu

Collapse

Brick K, Pizzi E. A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins. BMC Bioinformatics 2008;9:236. [PMID: 18485187 PMCID: PMC2408606 DOI: 10.1186/1471-2105-9-236] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2008] [Accepted: 05/16/2008] [Indexed: 11/15/2022] Open

Abstract

Background

The most common substitution matrices currently used (BLOSUM and PAM) are based on protein sequences with average amino acid distributions, thus they do not represent a fully accurate substitution model for proteins characterized by a biased amino acid composition. This problem has been addressed recently by adjusting existing matrices, however, to date, no empirical approach has been taken to build matrices which offer a substitution model for comparing proteins sharing an amino acid compositional bias. Here, we present a novel procedure to construct series of symmetrical substitution matrices to align proteins from similarly biased Plasmodium proteomes.

Results

We generated substitution matrices by selecting from the BLOCKS database those multiple alignments with a compositional bias similar to that of P. falciparum and P. yoelii proteins. A novel 'fuzzy' clustering method was adopted to group sequences within these alignments, showing that this method retains more complete information on the amino acid substitutions when compared to hierarchical clustering. We assessed the performance against the BLOSUM62 series and showed that the usage of our matrices results in an improvement in the performance of BLAST database searches, greatly reducing the number of false positive hits. We then demonstrated applications of the use of novel matrices to improve the annotation of homologs between the two Plasmodium species and to classify members of the P. falciparum RIFIN/STEVOR family.

Conclusion

We confirmed that in the case of compositionally biased proteins, standard BLOSUM matrices are not suited for optimal alignments, and specific substitution matrices are required. In addition, we showed that the usage of these matrices leads to a reduction of false positive hits, facilitating the automatic annotation process.

Collapse

Coronado JE, Mneimneh S, Epstein SL, Qiu WG, Lipke PN. Conserved processes and lineage-specific proteins in fungal cell wall evolution. EUKARYOTIC CELL 2007;6:2269-77. [PMID: 17951517 PMCID: PMC2168262 DOI: 10.1128/ec.00044-07] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/09/2007] [Accepted: 10/03/2007] [Indexed: 11/20/2022]

Goonesekere NCW, Lee B. Context-specific amino acid substitution matrices and their use in the detection of protein homologs. Proteins 2007;71:910-9. [DOI: 10.1002/prot.21775] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]

Coronado JE, Attie O, Epstein SL, Qiu WG, Lipke PN. Composition-modified matrices improve identification of homologs of saccharomyces cerevisiae low-complexity glycoproteins. EUKARYOTIC CELL 2006;5:628-37. [PMID: 16607010 PMCID: PMC1459670 DOI: 10.1128/ec.5.4.628-637.2006] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Yu YK, Gertz EM, Agarwala R, Schäffer AA, Altschul SF. Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches. Nucleic Acids Res 2006;34:5966-73. [PMID: 17068079 PMCID: PMC1635310 DOI: 10.1093/nar/gkl731] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Bulka B, desJardins M, Freeland SJ. An interactive visualization tool to explore the biophysical properties of amino acids and their contribution to substitution matrices. BMC Bioinformatics 2006;7:329. [PMID: 16817972 PMCID: PMC1524819 DOI: 10.1186/1471-2105-7-329] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2005] [Accepted: 07/03/2006] [Indexed: 11/26/2022] Open

Li J, Wang W. Detailed assessment of homology detection using different substitution matrices. CHINESE SCIENCE BULLETIN-CHINESE 2006. [DOI: 10.1007/s11434-006-1538-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Dunbrack RL. Sequence comparison and protein structure prediction. Curr Opin Struct Biol 2006;16:374-84. [PMID: 16713709 DOI: 10.1016/j.sbi.2006.05.006] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2006] [Revised: 03/22/2006] [Accepted: 05/08/2006] [Indexed: 10/24/2022]

Reddy DA, Prasad BVLS, Mitra CK. Comparative analysis of core promoter region: information content from mono and dinucleotide substitution matrices. Comput Biol Chem 2006;30:58-62. [PMID: 16321573 DOI: 10.1016/j.compbiolchem.2005.10.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2005] [Revised: 10/04/2005] [Accepted: 10/04/2005] [Indexed: 10/25/2022]

Lu Y, Freeland S. On the evolution of the standard amino-acid alphabet. Genome Biol 2006;7:102. [PMID: 16515719 PMCID: PMC1431706 DOI: 10.1186/gb-2006-7-1-102] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open

Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, Schäffer AA, Yu YK. Protein database searches using compositionally adjusted substitution matrices. FEBS J 2005;272:5101-9. [PMID: 16218944 PMCID: PMC1343503 DOI: 10.1111/j.1742-4658.2005.04945.x] [Citation(s) in RCA: 740] [Impact Index Per Article: 38.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Sardiu ME, Alves G, Yu YK. Score statistics of global sequence alignment from the energy distribution of a modified directed polymer and directed percolation problem. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005;72:061917. [PMID: 16485984 DOI: 10.1103/physreve.72.061917] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/03/2005] [Revised: 10/21/2005] [Indexed: 05/06/2023]

Abstract

Sequence alignment is one of the most important bioinformatics tools for modern molecular biology. The statistical characterization of gapped alignment scores has been a long-standing problem in sequence alignment research. Using a variant of the directed path in random media model, we investigate the score statistics of global sequence alignment taking into account, in particular, the compositional bias of the sequences compared. Such statistics are used to distinguish accidental similarity due to compositional similarity from biologically significant similarity. To accommodate the compositional bias, we introduce an extra parameter p indicating the probability for positive matching scores to occur. When p is small, a high scoring alignment obviously cannot come from compositional similarity. When p is large, the highest scoring point within a global alignment tends to be close to the end of both sequences, in which case we say the system percolates. By applying finite-size scaling theory on percolating probability functions of various sizes (sequence lengths), the critical p at infinite size is obtained. For alignment of length t, the fact that the score fluctuation grows as chi(t)1/3 is confirmed upon investigating the scaling form of the alignment score. Using the Kolmogorov-Smirnov statistics test, we show that the random variable , if properly scaled, follows the Tracy-Widom distributions: Gaussian orthogonal ensemble for p slightly larger than pc and Gaussian unitary ensemble for larger p. Although these results deepen our understanding of the distribution of alignment scores, the use of these results in practical applications remains somewhat heuristic and needs to be further developed. Nevertheless, the possibility of characterizing score statistics for modest system size (sequence lengths), via proper reparametrization of alignment scores, is illustrated.

Collapse

Olsen R, Loomis WF. A collection of amino acid replacement matrices derived from clusters of orthologs. J Mol Evol 2005;61:659-65. [PMID: 16245010 DOI: 10.1007/s00239-005-0060-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2005] [Accepted: 06/04/2005] [Indexed: 12/01/2022]