Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Osmanbeyoglu HU, Ganapathiraju MK. N-gram analysis of 970 microbial organisms reveals presence of biological language models. BMC Bioinformatics 2011;12:12. [PMID: 21219653 PMCID: PMC3027111 DOI: 10.1186/1471-2105-12-12] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/18/2010] [Accepted: 01/10/2011] [Indexed: 11/29/2022] Open

Cited by Other Article(s)

Jia R, Guo X, Liu H, Zhao F, Fan Z, Wang M, Sui J, Yin B, Wang Z, Wang Z. Analysis of Staged Features of Gastritis-Cancer Transformation and Identification of Potential Biomarkers in Gastric Cancer. J Inflamm Res 2022;15:6857-6868. [PMID: 36597437 PMCID: PMC9805741 DOI: 10.2147/jir.s390448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Accepted: 12/16/2022] [Indexed: 12/29/2022] Open

Abstract

Purpose

This work aims to elucidate the staged characteristics during gastritis-cancer transformation based on the transcriptome and use bioinformatics to identify potential biomarkers.

Patients and Methods

We collected blood samples from healthy controls and from patients with non-atrophic gastritis, atrophic gastritis, and gastric cancer, as well as tissue samples from patients with gastric cancer. RNA-seq was then performed. Differentially expressed genes, weighted gene co-expression network analysis and functional enrichment analysis were used to illustrate the staged characteristics of gastritis-cancer transformation. Genes with diagnostic potential were further identified in combination with ROC analysis. Additionally, for the gastric cancer stage, the gene expression of the collected tissue transcriptome was validated using The Cancer Genome Atlas and combined with survival analysis to identify potential biomarkers.

Results

The 279 overlapping genes among the differentially expressed genes of non-atrophic gastritis (NAG), atrophic gastritis (AG) and gastric cancer (CA) indicated that the expression characteristics of different stages were different. However, the 2243 overlapping genes among the differential genes between adjacent stages indicated a certain consistency in the expression characteristics of stage development. The core functions of different stages have strong stage specificity and have essentially no similarities. Twenty genes with diagnostic potential were obtained for each of AG and CA, and no gene could effectively differentiate NAG samples. Thirty-four potential biomarkers for gastric cancer were identified, of which 14 genes have not been reported, including ACTG2, C1QTNF2, NCAPH and SORCS1.

Conclusion

There may be a stable development mechanism in the process of gastritis-carcinoma transformation, resulting in the differences in the performance of each stage. The newly discovered staging features and potential biomarkers in this work can provide references for related research.

Affiliation(s)

Ruikang Jia: The Affiliated Hospital and the Medical College, Hebei University of Engineering, Handan, Hebei Province, People’s Republic of China; Key Laboratory of Chinese Medicine for Gastric Medicine, Hebei Province, Handan Pharmaceutical Co. LTD, Handan, People’s Republic of China
Xiaohui Guo: Handan Central Hospital, Handan, Hebei Province, People’s Republic of China
Huiyun Liu: Key Laboratory of Chinese Medicine for Gastric Medicine, Hebei Province, Handan Pharmaceutical Co. LTD, Handan, People’s Republic of China
Feiyue Zhao: Key Laboratory of Chinese Medicine for Gastric Medicine, Hebei Province, Handan Pharmaceutical Co. LTD, Handan, People’s Republic of China
Zhibin Fan: Key Laboratory of Chinese Medicine for Gastric Medicine, Hebei Province, Handan Pharmaceutical Co. LTD, Handan, People’s Republic of China
Menglei Wang: Key Laboratory of Chinese Medicine for Gastric Medicine, Hebei Province, Handan Pharmaceutical Co. LTD, Handan, People’s Republic of China
Jianliang Sui: The Affiliated Hospital and the Medical College, Hebei University of Engineering, Handan, Hebei Province, People’s Republic of China; Key Laboratory of Chinese Medicine for Gastric Medicine, Hebei Province, Handan Pharmaceutical Co. LTD, Handan, People’s Republic of China
Binghua Yin: Handan Central Hospital, Handan, Hebei Province, People’s Republic of China
Zhihong Wang: People’s Hospital of Huangzhou District, Huanggang City, People’s Republic of China
Zhen Wang: The Affiliated Hospital and the Medical College, Hebei University of Engineering, Handan, Hebei Province, People’s Republic of China; Key Laboratory of Metabolism and Molecular Medicine, Ministry of Education, and Department of Biochemistry and Molecular Biology, Fudan University Shanghai Medical College, Shanghai, People’s Republic of China. Correspondence: Zhen Wang, The Affiliated Hospital and the Medical College, Hebei University of Engineering, Handan, Hebei Province, People’s Republic of China, Tel +8619903200632, Email

van Bragt JJ, Brinkman P, de Vries R, Vijverberg SJ, Weersink EJ, Haarman EG, de Jongh FH, Kester S, Lucas A, in 't Veen JC, Sterk PJ, Bel EH, Maitland-van der Zee AH. Identification of recent exacerbations in COPD patients by electronic nose. ERJ Open Res 2020;6:00307-2020. [PMID: 33447611 PMCID: PMC7792783 DOI: 10.1183/23120541.00307-2020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/25/2020] [Accepted: 09/28/2020] [Indexed: 12/17/2022] Open

Abstract

Molecular profiling of exhaled breath by electronic nose (eNose) might be suitable as a noninvasive tool that can help in monitoring of clinically unstable COPD patients. However, supporting data are still lacking. Therefore, as a first step, this study aimed to determine the accuracy of exhaled breath analysis by eNose to identify COPD patients who recently exacerbated, defined as an exacerbation in the previous 3 months. Data for this exploratory, cross-sectional study were extracted from the multicentre BreathCloud cohort. Patients with a physician-reported diagnosis of COPD (n=364) on maintenance treatment were included in the analysis. Exacerbations were defined as a worsening of respiratory symptoms requiring treatment with oral corticosteroids, antibiotics or both. Data analysis involved eNose signal processing, ambient air correction and statistics based on principal component (PC) analysis followed by linear discriminant analysis (LDA). Before analysis, patients were randomly divided into a training (n=254) and validation (n=110) set. In the training set, LDA based on PCs 1-4 discriminated between patients with a recent exacerbation or no exacerbation with high accuracy (receiver operating characteristic (ROC)-area under the curve (AUC)=0.98, 95% CI 0.97-1.00). This high accuracy was confirmed in the validation set (AUC=0.98, 95% CI 0.94-1.00). Smoking, health status score, use of inhaled corticosteroids or vital capacity did not influence these results. Exhaled breath analysis by eNose can discriminate with high accuracy between COPD patients who experienced an exacerbation within 3 months prior to measurement and those who did not. This suggests that COPD patients who recently exacerbated have their own exhaled molecular fingerprint that could be valuable for monitoring purposes.
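Editorial note: the abstract above reports accuracy as the area under the ROC curve (AUC). As a minimal sketch of what that number measures, the Python below computes AUC as the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The function name `roc_auc` and the toy scores are illustrative assumptions; they are not taken from the cited study, which derived its scores from PCA followed by LDA on eNose signals.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical discriminant scores: label 1 = recent exacerbation, 0 = none.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of the 9 positive/negative pairs are ordered correctly
```

In a setup like the one described, such a function would be applied separately to the training and validation sets, so that the validation AUC reflects patients the classifier has not seen.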

Delibaş E, Arslan A, Şeker A, Diri B. A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up. J Mol Graph Model 2020;100:107693. [PMID: 32805559 DOI: 10.1016/j.jmgm.2020.107693] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/12/2020] [Revised: 06/15/2020] [Accepted: 07/06/2020] [Indexed: 11/17/2022]

Genetic evaluation of the Iberian lynx ex situ conservation programme. Heredity (Edinb) 2019;123:647-661. [PMID: 30952964 DOI: 10.1038/s41437-019-0217-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/13/2018] [Revised: 03/08/2019] [Accepted: 03/11/2019] [Indexed: 11/09/2022] Open

Cai S, Palazoglu A, Zhang L, Hu J. Process alarm prediction using deep learning and word embedding methods. ISA Trans 2019;85:274-283. [PMID: 30401489 DOI: 10.1016/j.isatra.2018.10.032] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/19/2018] [Revised: 09/21/2018] [Accepted: 10/19/2018] [Indexed: 06/08/2023]

Kleinman-Ruiz D, Martínez-Cruz B, Soriano L, Lucena-Perez M, Cruz F, Villanueva B, Fernández J, Godoy JA. Novel efficient genome-wide SNP panels for the conservation of the highly endangered Iberian lynx. BMC Genomics 2017;18:556. [PMID: 28732460 PMCID: PMC5522595 DOI: 10.1186/s12864-017-3946-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 04/26/2017] [Accepted: 07/13/2017] [Indexed: 12/21/2022] Open

Fan Y, Siklenka K, Arora SK, Ribeiro P, Kimmins S, Xia J. miRNet - dissecting miRNA-target interactions and functional associations through network-based visual analysis. Nucleic Acids Res 2016;44:W135-41. [PMID: 27105848 PMCID: PMC4987881 DOI: 10.1093/nar/gkw288] [Citation(s) in RCA: 307] [Impact Index Per Article: 38.4] [Received: 02/01/2016] [Revised: 04/01/2016] [Accepted: 04/08/2016] [Indexed: 01/01/2023] Open

Dynamic alarm prediction for critical alarms using a probabilistic model. Chin J Chem Eng 2016. [DOI: 10.1016/j.cjche.2016.04.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022]

Huang HH, Yu C. Clustering DNA sequences using the out-of-place measure with reduced n-grams. J Theor Biol 2016;406:61-72. [PMID: 27375217 DOI: 10.1016/j.jtbi.2016.06.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/10/2015] [Revised: 05/18/2016] [Accepted: 06/21/2016] [Indexed: 11/25/2022]

Frades I, Resjö S, Andreasson E. Comparison of phosphorylation patterns across eukaryotes by discriminative N-gram analysis. BMC Bioinformatics 2015. [PMID: 26224486 PMCID: PMC4520095 DOI: 10.1186/s12859-015-0657-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022] Open

Abstract

Background

How protein phosphorylation relates to kingdom/phylum divergence is largely unknown, and the amino acid residues surrounding the phosphorylation site have profound importance for protein kinase–substrate interactions. Standard motif analysis is not adequate for large-scale comparative analysis because each phosphopeptide is assigned to a unique motif and the analysis performs poorly with the unbalanced nature of the input datasets.

Results

First, the discriminative n-grams of five species from five different kingdoms/phyla were identified. A signature with 5540 discriminative n-grams that could be found in other species from the same kingdoms/phyla was created. Using a test data set, the ability of the signature to classify species into their corresponding kingdom/phylum was confirmed using classification methods. Lastly, ortholog proteins among proteins with n-grams were identified in order to determine to what degree the identity of the detected n-grams was a property of phosphosites rather than a consequence of species-specific or kingdom/phylum-specific protein inventory. The motifs were grouped in clusters of equal physico-chemical nature, and their distribution was similar between species in the same kingdom/phylum, while clear differences were found among species of different kingdoms/phyla. For example, the animal-specific top discriminative n-grams contained many basic amino acids and the plant-specific motifs were mainly acidic. Secondary structure prediction methods show that the discriminative n-grams in the majority of cases lack a regular secondary structure: on average they had 88 % random coil compared to 66 % in the phosphoproteins they were derived from.

Conclusions

The discriminative n-grams were able to classify organisms into their corresponding kingdom/phylum. They show different patterns among species of different kingdoms/phyla, and these regions can contribute to evolutionary divergence because they lie in disordered regions that can evolve rapidly. The differences found possibly reflect group-specific differences in the kinomes of the different groups of species.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0657-2) contains supplementary material, which is available to authorized users.

Maury JJP, Ng D, Bi X, Bardor M, Choo ABH. Multiple Reaction Monitoring Mass Spectrometry for the Discovery and Quantification of O-GlcNAc-Modified Proteins. Anal Chem 2013;86:395-402. [DOI: 10.1021/ac401821d] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/08/2023]

Zemková M, Trifonov EN, Zahradník D. One common structural feature of "words" in protein sequences and human texts. J Biomol Struct Dyn 2013;32:1085-91. [PMID: 23808620 DOI: 10.1080/07391102.2013.809317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]

Srinivasan SM, Vural S, King BR, Guda C. Mining for class-specific motifs in protein sequence classification. BMC Bioinformatics 2013;14:96. [PMID: 23496846 PMCID: PMC3610217 DOI: 10.1186/1471-2105-14-96] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 05/09/2012] [Accepted: 12/17/2012] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class.

RESULTS

We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function initially harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to n-grams that appear in fewer classes and less weight to those that appear in many. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. Our method fared well compared to an existing motif-finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and can thus be widely applied to detect class-specific motifs in many protein sequence classification tasks.

CONCLUSION

The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets, have a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
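Editorial note: the pipeline in this abstract (harvest 4- to 8-grams per class, then down-weight n-grams that occur in many classes) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' scoring function: the abstract does not give the dampening formula, so an IDF-style weight `log(1 + n_classes / class_freq)` is assumed here, the BLOSUM62 substitution-merging step is omitted, and all names (`harvest_ngrams`, `dampened_scores`, the toy sequences) are hypothetical.

```python
import math
from collections import Counter


def harvest_ngrams(sequence, n_min=4, n_max=8):
    """Count every contiguous n-gram of length n_min..n_max in one sequence."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(sequence) - n + 1):
            counts[sequence[i:i + n]] += 1
    return counts


def dampened_scores(class_sequences, n_min=4, n_max=8):
    """Return {class: {ngram: score}} with an ASSUMED IDF-style dampening.

    score = relative frequency within the class
            * log(1 + number of classes / number of classes containing the n-gram)
    """
    per_class = {}
    for label, seqs in class_sequences.items():
        total = Counter()
        for seq in seqs:
            total.update(harvest_ngrams(seq, n_min, n_max))
        per_class[label] = total

    n_classes = len(per_class)
    class_freq = Counter()  # in how many classes does each n-gram occur?
    for counts in per_class.values():
        class_freq.update(counts.keys())

    scores = {}
    for label, counts in per_class.items():
        size = sum(counts.values()) or 1  # guard against empty classes
        scores[label] = {
            gram: (c / size) * math.log(1 + n_classes / class_freq[gram])
            for gram, c in counts.items()
        }
    return scores


# Toy, made-up peptide fragments; only 4-grams are harvested to keep it small.
toy = {"nuclear": ["KKRKAAAA"], "other": ["DDEDAAAA"]}
toy_scores = dampened_scores(toy, n_min=4, n_max=4)
```

In this toy input, `KKRK` occurs only in the `nuclear` class while `AAAA` occurs in both, so `KKRK` receives the larger score even though both have the same within-class frequency. A selection threshold on these scores would then pick the discriminative n-grams per class.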

Motomura K, Fujita T, Tsutsumi M, Kikuzato S, Nakamura M, Otaki JM. Word decoding of protein amino acid sequences with availability analysis: a linguistic approach. PLoS One 2012;7:e50039. [PMID: 23185527 PMCID: PMC3503725 DOI: 10.1371/journal.pone.0050039] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 07/02/2012] [Accepted: 10/15/2012] [Indexed: 11/19/2022] Open

Abstract

The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or "words". We first confirmed that the English language very likely follows Zipf's law, a special case of a power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and "compressed" English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. The distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., "key words") and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
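Editorial note: the rank-frequency analysis mentioned in this abstract can be illustrated with a short Python sketch. It counts fixed-length n-grams, ranks them by frequency, and fits a straight line to log(frequency) against log(rank); under Zipf's law the slope is close to -1. Using a single fixed length `n` as a stand-in for the paper's SCSs is an assumption, as are the function names and the least-squares fit, none of which come from the cited article.

```python
import math
from collections import Counter


def rank_frequency(sequences, n=3):
    """Return [(rank, frequency), ...] for all n-grams, most frequent first."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(rank_freq):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(freq) for _, freq in rank_freq]
    if len(xs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Sanity check on an exact Zipf distribution: frequency proportional to 1/rank.
ideal = [(rank, 1000.0 / rank) for rank in range(1, 51)]
ideal_slope = loglog_slope(ideal)  # very close to -1
```

For real proteomes one would call `rank_frequency` on the protein sequences and fit the slope only over the linear range of the plot, since the abstract notes that the tails deviate from the power law.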

Ganapathiraju MK, Mitchell AD, Thahir M, Motwani K, Ananthasubramanian S. Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences. J Bioinform Comput Biol 2012;10:1250016. [PMID: 22817111 DOI: 10.1142/s0219720012500163] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]

King BR, Vural S, Pandey S, Barteau A, Guda C. ngLOC: software and web server for predicting protein subcellular localization in prokaryotes and eukaryotes. BMC Res Notes 2012;5:351. [PMID: 22780965 PMCID: PMC3532370 DOI: 10.1186/1756-0500-5-351] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 03/02/2012] [Accepted: 06/22/2012] [Indexed: 01/04/2023] Open