Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tetko IV, Facius A, Ruepp A, Mewes HW. Super paramagnetic clustering of protein sequences. BMC Bioinformatics 2005;6:82. [PMID: 15804359 PMCID: PMC1084344 DOI: 10.1186/1471-2105-6-82] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2004] [Accepted: 04/01/2005] [Indexed: 11/30/2022] Open

For:	Tetko IV, Facius A, Ruepp A, Mewes HW. Super paramagnetic clustering of protein sequences. BMC Bioinformatics 2005;6:82. [PMID: 15804359 PMCID: PMC1084344 DOI: 10.1186/1471-2105-6-82] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2004] [Accepted: 04/01/2005] [Indexed: 11/30/2022] Open

Number

Cited by Other Article(s)

Alhusain L, Hafez AM. Nonparametric approaches for population structure analysis. Hum Genomics 2018;12:25. [PMID: 29743099 PMCID: PMC5944014 DOI: 10.1186/s40246-018-0156-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2018] [Accepted: 04/24/2018] [Indexed: 12/28/2022] Open

Parejo M, Wragg D, Gauthier L, Vignal A, Neumann P, Neuditschko M. Using Whole-Genome Sequence Information to Foster Conservation Efforts for the European Dark Honey Bee, Apis mellifera mellifera. Front Ecol Evol 2016. [DOI: 10.3389/fevo.2016.00140] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Tetko IV, Maran U, Tropsha A. Public (Q)SAR Services, Integrated Modeling Environments, and Model Repositories on the Web: State of the Art and Perspectives for Future Development. Mol Inform 2016;36. [PMID: 27778468 DOI: 10.1002/minf.201600082] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2016] [Accepted: 10/03/2016] [Indexed: 01/08/2023]

Skuta C, Bartůněk P, Svozil D. InCHlib - interactive cluster heatmap for web applications. J Cheminform 2014;6:44. [PMID: 25264459 PMCID: PMC4173117 DOI: 10.1186/s13321-014-0044-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2014] [Accepted: 09/08/2014] [Indexed: 12/27/2022] Open

Abstract

BACKGROUND

Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram which is displayed on the side of the heatmap.

RESULTS

We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib is facilitated by the Python utility script inchlib_clust.

CONCLUSIONS

The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool which application domain is not limited to the life sciences only.

Collapse

Li L, Liu C, Wang F, Miao W, Zhang J, Kang Z, Chen Y, Peng L. Unraveling the hidden heterogeneities of breast cancer based on functional miRNA cluster. PLoS One 2014;9:e87601. [PMID: 24498150 PMCID: PMC3907466 DOI: 10.1371/journal.pone.0087601] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2013] [Accepted: 12/23/2013] [Indexed: 12/21/2022] Open

Abstract

It has become increasingly clear that the current taxonomy of clinical phenotypes is mixed with molecular heterogeneity, which potentially affects the treatment effect for involved patients. Defining the hidden molecular-distinct diseases using modern large-scale genomic approaches is therefore useful for refining clinical practice and improving intervention strategies. Given that microRNA expression profiling has provided a powerful way to dissect hidden genetic heterogeneity for complex diseases, the aim of the study was to develop a bioinformatics approach that identifies microRNA features leading to the hidden subtyping of complex clinical phenotypes. The basic strategy of the proposed method was to identify optimal miRNA clusters by iteratively partitioning the sample and feature space using the two-ways super-paramagnetic clustering technique. We evaluated the obtained optimal miRNA cluster by determining the consistency of co-expression and the chromosome location among the within-cluster microRNAs, and concluded that the optimal miRNA cluster could lead to a natural partition of disease samples. We applied the proposed method to a publicly available microarray dataset of breast cancer patients that have notoriously heterogeneous phenotypes. We obtained a feature subset of 13 microRNAs that could classify the 71 breast cancer patients into five subtypes with significantly different five-year overall survival rates (45%, 82.4%, 70.6%, 100% and 60% respectively; p = 0.008). By building a multivariate Cox proportional-hazards prediction model for the feature subset, we identified has-miR-146b as one of the most significant predictor (p = 0.045; hazard ratios = 0.39). The proposed algorithm is a promising computational strategy for dissecting hidden genetic heterogeneity for complex diseases, and will be of value for improving cancer diagnosis and treatment.

Collapse

Neuditschko M, Khatkar MS, Raadsma HW. NetView: a high-definition network-visualization approach to detect fine-scale population structures from genome-wide patterns of variation. PLoS One 2012;7:e48375. [PMID: 23152744 PMCID: PMC3485224 DOI: 10.1371/journal.pone.0048375] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2012] [Accepted: 09/25/2012] [Indexed: 02/06/2023] Open

Abstract

High-throughput sequencing and single nucleotide polymorphism (SNP) genotyping can be used to infer complex population structures. Fine-scale population structure analysis tracing individual ancestry remains one of the major challenges. Based on network theory and recent advances in SNP chip technology, we investigated an unsupervised network clustering method called Super Paramagnetic Clustering (Spc). When applied to whole-genome marker data it identifies the natural divisions of groups of individuals into population clusters without use of prior ancestry information. Furthermore, we optimised an analysis pipeline called NetView, a high-definition network visualization, starting with computation of genetic distance, followed clustering using Spc and finally visualization of clusters with Cytoscape. We compared NetView against commonly used methodologies including Principal Component Analyses (PCA) and a model-based algorithm, Admixture, on whole-genome-wide SNP data derived from three previously described data sets: simulated (2.5 million SNPs, 5 populations), human (1.4 million SNPs, 11 populations) and cattle (32,653 SNPs, 19 populations). We demonstrate that individuals can be effectively allocated to their correct population whilst simultaneously revealing fine-scale structure within the populations. Analyzing the human HapMap populations, we identified unexpected genetic relatedness among individuals, and population stratification within the Indian, African and Mexican samples. In the cattle data set, we correctly assigned all individuals to their respective breeds and detected fine-scale population sub-structures reflecting different sample origins and phenotypes. The NetView pipeline is computationally extremely efficient and can be easily applied on large-scale genome-wide data sets to assign individuals to particular populations and to reproduce fine-scale population structures without prior knowledge of individual ancestry. NetView can be used on any data from which a genetic relationship/distance between individuals can be calculated.

Collapse

Linard B, Nguyen NH, Prosdocimi F, Poch O, Thompson JD. EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data. Evol Bioinform Online 2011;8:61-77. [PMID: 22267905 PMCID: PMC3256995 DOI: 10.4137/ebo.s8814] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Open

Carotid plaque age is a feature of plaque stability inversely related to levels of plasma insulin. PLoS One 2011;6:e18248. [PMID: 21490968 PMCID: PMC3072386 DOI: 10.1371/journal.pone.0018248] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2010] [Accepted: 03/02/2011] [Indexed: 12/25/2022] Open

Abstract

Background

The stability of atherosclerotic plaques determines the risk for rupture, which may lead to thrombus formation and potentially severe clinical complications such as myocardial infarction and stroke. Although the rate of plaque formation may be important for plaque stability, this process is not well understood. We took advantage of the atmospheric ¹⁴C-declination curve (a result of the atomic bomb tests in the 1950s and 1960s) to determine the average biological age of carotid plaques.

Methodology/Principal Finding

The cores of carotid plaques were dissected from 29 well-characterized, symptomatic patients with carotid stenosis and analyzed for ¹⁴C content by accelerator mass spectrometry. The average plaque age (i.e. formation time) was 9.6±3.3 years. All but two plaques had formed within 5–15 years before surgery. Plaque age was not associated with the chronological ages of the patients but was inversely related to plasma insulin levels (p = 0.0014). Most plaques were echo-lucent rather than echo-rich (2.24±0.97, range 1–5). However, plaques in the lowest tercile of plaque age (most recently formed) were characterized by further instability with a higher content of lipids and macrophages (67.8±12.4 vs. 50.4±6.2, p = 0.00005; 57.6±26.1 vs. 39.8±25.7, p<0.0005, respectively), less collagen (45.3±6.1 vs. 51.1±9.8, p<0.05), and fewer smooth muscle cells (130±31 vs. 141±21, p<0.05) than plaques in the highest tercile. Microarray analysis of plaques in the lowest tercile also showed increased activity of genes involved in immune responses and oxidative phosphorylation.

Conclusions/Significance

Our results show, for the first time, that plaque age, as judge by relative incorporation of ¹⁴C, can improve our understanding of carotid plaque stability and therefore risk for clinical complications. Our results also suggest that levels of plasma insulin might be involved in determining carotid plaque age.

Collapse

Frech C, Chen N. Genome-wide comparative gene family classification. PLoS One 2010;5:e13409. [PMID: 20976221 PMCID: PMC2955529 DOI: 10.1371/journal.pone.0013409] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2010] [Accepted: 09/21/2010] [Indexed: 11/18/2022] Open

Multi-organ expression profiling uncovers a gene module in coronary artery disease involving transendothelial migration of leukocytes and LIM domain binding 2: the Stockholm Atherosclerosis Gene Expression (STAGE) study. PLoS Genet 2009;5:e1000754. [PMID: 19997623 PMCID: PMC2780352 DOI: 10.1371/journal.pgen.1000754] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2009] [Accepted: 11/04/2009] [Indexed: 02/07/2023] Open

Abstract

Environmental exposures filtered through the genetic make-up of each individual alter the transcriptional repertoire in organs central to metabolic homeostasis, thereby affecting arterial lipid accumulation, inflammation, and the development of coronary artery disease (CAD). The primary aim of the Stockholm Atherosclerosis Gene Expression (STAGE) study was to determine whether there are functionally associated genes (rather than individual genes) important for CAD development. To this end, two-way clustering was used on 278 transcriptional profiles of liver, skeletal muscle, and visceral fat (n = 66/tissue) and atherosclerotic and unaffected arterial wall (n = 40/tissue) isolated from CAD patients during coronary artery bypass surgery. The first step, across all mRNA signals (n = 15,042/12,621 RefSeqs/genes) in each tissue, resulted in a total of 60 tissue clusters (n = 3958 genes). In the second step (performed within tissue clusters), one atherosclerotic lesion (n = 49/48) and one visceral fat (n = 59) cluster segregated the patients into two groups that differed in the extent of coronary stenosis (P = 0.008 and P = 0.00015). The associations of these clusters with coronary atherosclerosis were validated by analyzing carotid atherosclerosis expression profiles. Remarkably, in one cluster (n = 55/54) relating to carotid stenosis (P = 0.04), 27 genes in the two clusters relating to coronary stenosis were confirmed (n = 16/17, P<10(-27 and-30)). Genes in the transendothelial migration of leukocytes (TEML) pathway were overrepresented in all three clusters, referred to as the atherosclerosis module (A-module). In a second validation step, using three independent cohorts, the A-module was found to be genetically enriched with CAD risk by 1.8-fold (P<0.004). The transcription co-factor LIM domain binding 2 (LDB2) was identified as a potential high-hierarchy regulator of the A-module, a notion supported by subnetwork analysis, by cellular and lesion expression of LDB2, and by the expression of 13 TEML genes in Ldb2-deficient arterial wall. Thus, the A-module appears to be important for atherosclerosis development and, together with LDB2, merits further attention in CAD research.

Collapse

Andreopoulos B, An A, Wang X, Schroeder M. A roadmap of clustering algorithms: finding a match for a biomedical application. Brief Bioinform 2009;10:297-314. [PMID: 19240124 DOI: 10.1093/bib/bbn058] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Surmeli D, Ratmann O, Mewes HW, Tetko IV. FunCat functional inference with belief propagation and feature integration. Comput Biol Chem 2008;32:375-7. [DOI: 10.1016/j.compbiolchem.2008.06.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2007] [Revised: 06/03/2008] [Accepted: 06/22/2008] [Indexed: 11/26/2022]

Tetko IV, Rodchenkov IV, Walter MC, Rattei T, Mewes HW. Beyond the 'best' match: machine learning annotation of protein sequences by integration of different sources of information. ACTA ACUST UNITED AC 2008;24:621-8. [PMID: 18174184 DOI: 10.1093/bioinformatics/btm633] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Rattei T, Tischler P, Arnold R, Hamberger F, Krebs J, Krumsiek J, Wachinger B, Stümpflen V, Mewes W. SIMAP--structuring the network of protein similarities. Nucleic Acids Res 2007;36:D289-92. [PMID: 18037617 PMCID: PMC2238827 DOI: 10.1093/nar/gkm963] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open

Zhang W, Li L, Li X, Jiang W, Huo J, Wang Y, Lin M, Rao S. Unravelling the hidden heterogeneities of diffuse large B-cell lymphoma based on coupled two-way clustering. BMC Genomics 2007;8:332. [PMID: 17888167 PMCID: PMC2082044 DOI: 10.1186/1471-2164-8-332] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2006] [Accepted: 09/22/2007] [Indexed: 01/14/2023] Open

Abstract

BACKGROUND

It becomes increasingly clear that our current taxonomy of clinical phenotypes is mixed with molecular heterogeneity. Of vital importance for refined clinical practice and improved intervention strategies is to define the hidden molecular distinct diseases using modern large-scale genomic approaches. Microarray omics technology has provided a powerful way to dissect hidden genetic heterogeneity of complex diseases. The aim of this study was thus to develop a bioinformatics approach to seek the transcriptional features leading to the hidden subtyping of a complex clinical phenotype. The basic strategy of the proposed method was to iteratively partition in two ways sample and feature space with super-paramagnetic clustering technique and to seek for hard and robust gene clusters that lead to a natural partition of disease samples and that have the highest functionally conceptual consensus evaluated with Gene Ontology.

RESULTS

We applied the proposed method to two publicly available microarray datasets of diffuse large B-cell lymphoma (DLBCL), a notoriously heterogeneous phenotype. A feature subset of 30 genes (38 probes) derived from analysis of the first dataset consisting of 4026 genes and 42 DLBCL samples identified three categories of patients with very different five-year overall survival rates (70.59%, 44.44% and 14.29% respectively; p = 0.0017). Analysis of the second dataset consisting of 7129 genes and 58 DLBCL samples revealed a feature subset of 13 genes (16 probes) that not only replicated the findings of the important DLBCL genes (e.g. JAW1 and BCL7A), but also identified three clinically similar subtypes (with 5-year overall survival rates of 63.13%, 34.92% and 15.38% respectively; p = 0.0009) to those identified in the first dataset. Finally, we built a multivariate Cox proportional-hazards prediction model for each feature subset and defined JAW1 as one of the most significant predictor (p = 0.005 and 0.014; hazard ratios = 0.02 and 0.03, respectively for two datasets) for both DLBCL cohorts under study.

CONCLUSION

Our results showed that the proposed algorithm is a promising computational strategy for peeling off the hidden genetic heterogeneity based on transcriptionally profiling disease samples, which may lead to an improved diagnosis and treatment of cancers.

Collapse

Kelil A, Wang S, Brzezinski R, Fleury A. CLUSS: clustering of protein sequences based on a new similarity measure. BMC Bioinformatics 2007;8:286. [PMID: 17683581 PMCID: PMC1976428 DOI: 10.1186/1471-2105-8-286] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2007] [Accepted: 08/04/2007] [Indexed: 11/14/2022] Open

Abstract

Background

The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions".

Results

To show the effectiveness of CLUSS, we performed an extensive clustering on COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins with similar functional activity.

Conclusion

We have developed an effective method and tool for clustering protein sequences to meet the needs of biologists in terms of phylogenetic analysis and prediction of biological functions. Compared to existing clustering methods, CLUSS more accurately highlights the functional characteristics of the clustered families. It provides biologists with a new and plausible instrument for the analysis of protein sequences, especially those that cause problems for the alignment-dependent algorithms.

Collapse

Cho YR, Hwang W, Ramanathan M, Zhang A. Semantic integration to identify overlapping functional modules in protein interaction networks. BMC Bioinformatics 2007;8:265. [PMID: 17650343 PMCID: PMC1971074 DOI: 10.1186/1471-2105-8-265] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2007] [Accepted: 07/24/2007] [Indexed: 12/05/2022] Open

Marttinen P, Corander J, Törönen P, Holm L. Bayesian search of functionally divergent protein subgroups and their function specific residues. Bioinformatics 2006;22:2466-74. [PMID: 16870932 DOI: 10.1093/bioinformatics/btl411] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Tetko IV. Computing chemistry on the web. Drug Discov Today 2005;10:1497-500. [PMID: 16257371 DOI: 10.1016/s1359-6446(05)03584-1] [Citation(s) in RCA: 161] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]