Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Vries JK, Liu X, Bahar I. The relationship between n-gram patterns and protein secondary structure. Proteins 2007;68:830-8. [PMID: 17523186 DOI: 10.1002/prot.21480] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Number

Cited by Other Article(s)

Malamon JS. DNA N-gram analysis framework (DNAnamer): A generalized N-gram frequency analysis framework for the supervised classification of DNA sequences. Heliyon 2024;10:e36914. [PMID: 39281454 PMCID: PMC11399624 DOI: 10.1016/j.heliyon.2024.e36914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2024] [Revised: 08/22/2024] [Accepted: 08/23/2024] [Indexed: 09/18/2024] Open

Mizuno Y, Nakasone W, Nakamura M, Otaki JM. In Silico and In Vitro Evaluation of the Molecular Mimicry of the SARS-CoV-2 Spike Protein by Common Short Constituent Sequences (cSCSs) in the Human Proteome: Toward Safer Epitope Design for Vaccine Development. Vaccines (Basel) 2024;12:539. [PMID: 38793790 PMCID: PMC11125730 DOI: 10.3390/vaccines12050539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2024] [Revised: 05/12/2024] [Accepted: 05/12/2024] [Indexed: 05/26/2024] Open

Kondo R, Kasahara K, Takahashi T. Information quantity for secondary structure propensities of protein subsequences in the Protein Data Bank. Biophys Physicobiol 2022;19:1-12. [PMID: 35532457 PMCID: PMC8926306 DOI: 10.2142/biophysico.bppb-v19.0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2021] [Accepted: 02/02/2022] [Indexed: 12/05/2022] Open

Endo S, Motomura K, Tsuhako M, Kakazu Y, Nakamura M, M. Otaki J. Search for Human-Specific Proteins Based on Availability Scores of Short Constituent Sequences: Identification of a WRWSH Protein in Human Testis. Comput Biol Chem 2020. [DOI: 10.5772/intechopen.89653] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Konopka BM, Marciniak M, Dyrka W. Quantiprot - a Python package for quantitative analysis of protein sequences. BMC Bioinformatics 2017;18:339. [PMID: 28716000 PMCID: PMC5512976 DOI: 10.1186/s12859-017-1751-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2017] [Accepted: 07/05/2017] [Indexed: 11/17/2022] Open

Masso M, Vaisman II. Sequence and structure based models of HIV-1 protease and reverse transcriptase drug resistance. BMC Genomics 2013;14 Suppl 4:S3. [PMID: 24268064 PMCID: PMC3849442 DOI: 10.1186/1471-2164-14-s4-s3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Successful management of chronic human immunodeficiency virus type 1 (HIV-1) infection with a cocktail of antiretroviral medications can be negatively affected by the presence of drug resistant mutations in the viral targets. These targets include the HIV-1 protease (PR) and reverse transcriptase (RT) proteins, for which a number of inhibitors are available on the market and routinely prescribed. Protein mutational patterns are associated with varying degrees of resistance to their respective inhibitors, with extremes that can range from continued susceptibility to cross-resistance across all drugs.

RESULTS

Here we implement statistical learning algorithms to develop structure- and sequence-based models for systematically predicting the effects of mutations in the PR and RT proteins on resistance to each of eight and eleven inhibitors, respectively. Employing a four-body statistical potential, mutant proteins are represented as feature vectors whose components quantify relative environmental perturbations at amino acid residue positions in the respective target structures upon mutation. Two approaches are implemented in developing sequence-based models, based on use of either relative frequencies or counts of n-grams, to generate vectors for representing mutant proteins. To the best of our knowledge, this is the first reported study on structure- and sequence-based predictive models of HIV-1 PR and RT drug resistance developed by implementing a four-body statistical potential and n-grams, respectively, to generate mutant attribute vectors. Performance of the learning methods is evaluated on the basis of tenfold cross-validation, using previously assayed and publicly available in vitro data relating mutational patterns in the targets to quantified inhibitor susceptibility changes.

CONCLUSION

Overall performance results are competitive with those of a previously published study utilizing a sequence-based strategy, while our structure- and sequence-based models provide orthogonal and complementary prediction methodologies, respectively. In a novel application, we describe a technique for identifying every possible pair of RT inhibitors as either potentially effective together as part of a cocktail, or a combination that is to be avoided.

Collapse

Motomura K, Nakamura M, Otaki JM. A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package. Comput Struct Biotechnol J 2013;5:e201302010. [PMID: 24688703 PMCID: PMC3962227 DOI: 10.5936/csbj.201302010] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2012] [Revised: 02/07/2013] [Accepted: 02/08/2013] [Indexed: 11/23/2022] Open

Motomura K, Fujita T, Tsutsumi M, Kikuzato S, Nakamura M, Otaki JM. Word decoding of protein amino Acid sequences with availability analysis: a linguistic approach. PLoS One 2012;7:e50039. [PMID: 23185527 PMCID: PMC3503725 DOI: 10.1371/journal.pone.0050039] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2012] [Accepted: 10/15/2012] [Indexed: 11/19/2022] Open

Abstract

The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or "words". We first confirmed that the English language highly likely follows Zipf's law, a special case of power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and "compressed" English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. A distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., "key words") and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.

Collapse

Karaçali B. Hierarchical motif vectors for prediction of functional sites in amino acid sequences using quasi-supervised learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012;9:1432-1441. [PMID: 22585139 DOI: 10.1109/tcbb.2012.68] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

Ganapathiraju MK, Mitchell AD, Thahir M, Motwani K, Ananthasubramanian S. Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences. J Bioinform Comput Biol 2012;10:1250016. [PMID: 22817111 DOI: 10.1142/s0219720012500163] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Zhang KX, Ouellette BFF. GAIA: a gram-based interaction analysis tool--an approach for identifying interacting domains in yeast. BMC Bioinformatics 2009;10 Suppl 1:S60. [PMID: 19208164 PMCID: PMC2648738 DOI: 10.1186/1471-2105-10-s1-s60] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open

Abstract

Background

Protein-Protein Interactions (PPIs) play important roles in many biological functions. Protein domains, which are defined as independently folding structural blocks of proteins, physically interact with each other to perform these biological functions. Therefore, the identification of Domain-Domain Interactions (DDIs) is of great biological interests because it is generally accepted that PPIs are mediated by DDIs. As a result, much effort has been put on the prediction of domain pair interactions based on computational methods. Many DDI prediction tools using PPIs network and domain evolution information have been reported. However, tools that combine the primary sequences, domain annotations, and structural annotations of proteins have not been evaluated before.

Results

In this study, we report a novel approach called Gram-bAsed Interaction Analysis (GAIA). GAIA extracts peptide segments that are composed of fixed length of continuous amino acids, called n-grams (where n is the number of amino acids), from the annotated domain and DDI data set in Saccharomyces cerevisiae (budding yeast) and identifies a list of n-grams that may contribute to DDIs and PPIs based on the frequencies of their appearance. GAIA also reports the coordinate position of gram pairs on each interacting domain pair. We demonstrate that our approach improves on other DDI prediction approaches when tested against a gold-standard data set and achieves a true positive rate of 82% and a false positive rate of 21%. We also identify a list of 4-gram pairs that are significantly over-represented in the DDI data set and may mediate PPIs.

Conclusion

GAIA represents a novel and reliable way to predict DDIs that mediate PPIs. Our results, which show the localizations of interacting grams/hotspots, provide testable hypotheses for experimental validation. Complemented with other prediction methods, this study will allow us to elucidate the interactome of cells.

Collapse

Vries JK, Liu X. Subfamily specific conservation profiles for proteins based on n-gram patterns. BMC Bioinformatics 2008;9:72. [PMID: 18234090 PMCID: PMC2267698 DOI: 10.1186/1471-2105-9-72] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2007] [Accepted: 01/30/2008] [Indexed: 11/10/2022] Open