Reference Citation Analysis

For: Datta S, Datta S. Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes. BMC Bioinformatics 2006;7:397. [PMID: 16945146 PMCID: PMC1590054 DOI: 10.1186/1471-2105-7-397] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.8] [Received: 03/10/2006] [Accepted: 08/31/2006] [Indexed: 11/10/2022]

Cited by Other Article(s)

Smith ML, Barrett ME. Development and validation of the Upstream Social Interaction Risk Scale (U-SIRS-13): a scale to assess threats to social connectedness among older adults. Front Public Health 2024;12:1454847. [PMID: 39351036 PMCID: PMC11439676 DOI: 10.3389/fpubh.2024.1454847] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/25/2024] [Accepted: 09/02/2024] [Indexed: 10/04/2024]

Abstract

Background

Social interactions are essential to social connectedness among older adults. While many scales have been developed to measure various aspects of social connectedness, most are narrow in scope and therefore may not be sufficiently comprehensive, practical, or relevant for use with older adults across clinical and community settings. Efforts are needed to create more sensitive scales that can identify "upstream risk," which may facilitate timely referral and/or intervention.

Objective

The purposes of this study were to: (1) develop and validate a brief scale to measure threats to social connectedness among older adults in the context of their social interactions; and (2) offer practical scoring and implementation recommendations for utilization in research and practice contexts.

Methods

A sequential process was used to develop the initial instrument used in this study, which was then systematically reduced to create a brief 13-item scale. Relevant existing scales and measures were identified and compiled, then critically assessed by a combination of research and practice experts to optimize the pool of items that assess threats to social connectedness while reducing potential redundancies. A national sample of 4,082 adults aged 60 years and older then completed a web-based questionnaire containing the initial 36 items about social connection. Several data analysis methods were applied to assess the underlying dimensionality of the data and to construct measures of different factors related to risk, including item response theory (IRT) modeling, clustering techniques, and structural equation modeling (SEM).

Results

IRT modeling reduced the initial 36 items to create the 13-item Upstream Social Interaction Risk Scale (U-SIRS-13) with strong model fit. The dimensionality assessment using different clustering algorithms supported a 2-factor solution to classify risk. The SEM predicting highest risk items fit exceptionally well (RMSEA = 0.048; CFI = 0.954). For the 13-item scale, theta scores generated from IRT were strongly correlated with the summed count of items binarily identifying risk (r = 0.896, p < 0.001), thus supporting the use of practical scoring techniques for research and practice (Cronbach's alpha = 0.80).
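The scoring result above rests on two standard computations: Cronbach's alpha for internal consistency, and a Pearson correlation between IRT theta scores and the simple summed count of items flagging risk. A minimal standard-library sketch of both follows, assuming a respondents-by-items matrix of 0/1 indicators. The responses and theta values are invented for illustration (they are not study data), and fitting the IRT model that would produce real theta scores is not shown.

```python
import math
from statistics import mean, pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondent rows, each holding k item scores."""
    k = len(items[0])
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    # Variance of the respondents' total scores
    total_var = pvariance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 0/1 responses (1 = item flags risk): 5 respondents x 4 items
responses = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
# Hypothetical IRT theta estimates for the same respondents
theta = [0.5, -0.8, 1.3, -1.5, 0.6]

summed = [sum(row) for row in responses]  # practical "count of risk items" score
alpha = cronbach_alpha(responses)         # about 0.79 for this toy matrix
r = pearson_r(theta, summed)
```

A high correlation between `theta` and `summed` is what justifies replacing model-based scores with the simple count in day-to-day use.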

Conclusion

The U-SIRS-13 is a multidimensional scale with strong face, content, and construct validity. Findings support its practical utility to identify threats to social connectedness among older adults posed by limited physical opportunities for social interactions and lacking emotional fulfillment from social interactions.

Venn B, Leifeld T, Zhang P, Mühlhaus T. Temporal classification of short time series data. BMC Bioinformatics 2024;25:30. [PMID: 38233793 PMCID: PMC10792935 DOI: 10.1186/s12859-024-05636-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 01/03/2024] [Indexed: 01/19/2024]

Till SE, Lu Y, Reinholz AK, Boos AM, Krych AJ, Okoroha KR, Camp CL. Artificial Intelligence Can Define and Predict the "Optimal Observed Outcome" After Anterior Shoulder Instability Surgery: An Analysis of 200 Patients With 11-Year Mean Follow-Up. Arthrosc Sports Med Rehabil 2023;5:100773. [PMID: 37520500 PMCID: PMC10382895 DOI: 10.1016/j.asmr.2023.100773] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/20/2023] [Accepted: 06/14/2023] [Indexed: 08/01/2023]

Abstract

Purpose

The purpose of this study was to use unsupervised machine learning clustering to define the "optimal observed outcome" after surgery for anterior shoulder instability (ASI) and to identify predictors for achieving it.

Methods

Medical records, images, and operative reports were reviewed for patients <40 years old undergoing surgery for ASI. Four unsupervised machine learning clustering algorithms partitioned subjects into "optimal observed outcome" or "suboptimal outcome" based on combinations of actually observed outcomes. Demographic, clinical, and treatment variables were compared between groups using descriptive statistics and Kaplan-Meier survival curves. Variables were assessed for prognostic value through multivariate stepwise logistic regression.
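The abstract does not name the four clustering algorithms. Purely to illustrate how an unsupervised method can split patients into two outcome groups, the sketch below assumes plain k-means with k = 2 (Lloyd's algorithm with a deterministic farthest-point start) applied to a few outcome variables. The patient values are invented, and in practice variables on different scales would be standardized first.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    # Deterministic farthest-point start: the first point, then repeatedly
    # the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical patients: [pain score 0-1, recurrent instability 0/1, revision 0/1]
patients = [
    [0.1, 0, 0],
    [0.2, 0, 0],
    [0.0, 0, 0],
    [0.9, 1, 1],
    [0.8, 1, 0],
    [0.7, 1, 1],
]
labels, centroids = kmeans(patients, k=2)
```

The algorithm itself does not decide which cluster is "optimal"; that is read off the centroids (here, the cluster with lower pain, instability, and revision values).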

Results

Two hundred patients with a mean follow-up of 11 years were included. Of these, 128 (64%) obtained the "optimal observed outcome," characterized by decreased: postoperative pain (23% vs 52%; P < 0.001), recurrent instability (12% vs 41%; P < 0.001), revision surgery (10% vs 24%; P = 0.015), osteoarthritis (OA) (5% vs 19%; P = 0.005), and restricted motion (161° vs 168°; P = 0.001). Forty-one percent of patients had a "perfect outcome," defined as ideal performance across all outcomes. Time from initial instability to presentation (odds ratio [OR] = 0.96; 95% confidence interval [CI], 0.92-0.98; P = 0.006) and habitual/voluntary instability (OR = 0.17; 95% CI, 0.04-0.77; P = 0.020) were negative predictors of achieving the "optimal observed outcome." A predilection toward subluxations rather than dislocations before surgery (OR = 1.30; 95% CI, 1.02-1.65; P = 0.030) was a positive predictor. Type of surgery performed was not a significant predictor.

Conclusion

After surgery for ASI, 64% of patients achieved the "optimal observed outcome" defined as minimal postoperative pain, no recurrent instability or OA, low revision surgery rates, and increased range of motion, of whom only 41% achieved a "perfect outcome." Positive predictors were shorter time to presentation and predilection toward preoperative subluxations over dislocations.

Level of Evidence

Retrospective cohort, level IV.

Demir Karaman E, Işık Z. Multi-Omics Data Analysis Identifies Prognostic Biomarkers across Cancers. Med Sci (Basel) 2023;11:44. [PMID: 37489460 PMCID: PMC10366886 DOI: 10.3390/medsci11030044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 06/18/2023] [Accepted: 06/20/2023] [Indexed: 07/26/2023]

Smith RN, Rosales IA, Tomaszewski KT, Mahowald GT, Araujo-Medina M, Acheampong E, Bruce A, Rios A, Otsuka T, Tsuji T, Hotta K, Colvin R. Utility of Banff Human Organ Transplant Gene Panel in Human Kidney Transplant Biopsies. Transplantation 2023;107:1188-1199. [PMID: 36525551 PMCID: PMC10132999 DOI: 10.1097/tp.0000000000004389] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 12/23/2022]

Dovrou A, Bei E, Sfakianakis S, Marias K, Papanikolaou N, Zervakis M. Synergies of Radiomics and Transcriptomics in Lung Cancer Diagnosis: A Pilot Study. Diagnostics (Basel) 2023;13:738. [PMID: 36832225 PMCID: PMC9955510 DOI: 10.3390/diagnostics13040738] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/30/2023] [Revised: 02/10/2023] [Accepted: 02/10/2023] [Indexed: 02/17/2023]

Abstract

Radiotranscriptomics is an emerging field that aims to investigate the relationships between the radiomic features extracted from medical images and gene expression profiles that contribute to the diagnosis, treatment planning, and prognosis of cancer. This study proposes a methodological framework for the investigation of these associations, with application to non-small-cell lung cancer (NSCLC). Six publicly available NSCLC datasets with transcriptomics data were used to derive and validate a transcriptomic signature for its ability to differentiate between cancer and non-malignant lung tissue. A publicly available dataset of 24 NSCLC-diagnosed patients, with both transcriptomic and imaging data, was used for the joint radiotranscriptomic analysis. For each patient, 749 Computed Tomography (CT) radiomic features were extracted, and the corresponding transcriptomics data were provided through DNA microarrays. The radiomic features were clustered using the iterative K-means algorithm, resulting in 77 homogeneous clusters represented by meta-radiomic features. The most significant differentially expressed genes (DEGs) were selected by performing Significance Analysis of Microarrays (SAM) and 2-fold change. The interactions among the CT imaging features and the selected DEGs were investigated using SAM and a Spearman rank correlation test with a False Discovery Rate (FDR) of 5%, leading to the extraction of 73 DEGs significantly correlated with radiomic features. These genes were used to produce predictive models of the meta-radiomics features, defined as p-metaomics features, by performing Lasso regression. Of the 77 meta-radiomic features, 51 could be modeled in terms of the transcriptomic signature. These significant radiotranscriptomics relationships form a reliable basis to biologically justify the radiomics features extracted from anatomic imaging modalities. The biological value of these radiomic features was further justified via enrichment analysis on their transcriptomics-based regression models, revealing closely associated biological processes and pathways. Overall, the proposed methodological framework provides joint radiotranscriptomics markers and models to support the connection and complementarities between the transcriptome and the phenotype in cancer, as demonstrated in the case of NSCLC.
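The association step pairs a Spearman rank correlation test with a 5% false discovery rate. A minimal standard-library sketch of the two ingredients is below: Spearman's rho (the Pearson correlation of average ranks) and the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here; the p-values are made up, and computing a p-value for each rho (for example by permutation) is not shown.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share one 1-based rank
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= cutoff
    return rejected
```

Because the rule is step-up, a p-value that fails its own threshold can still be rejected when a larger p-value further down the sorted list passes; for example, `[0.02, 0.03, 0.5]` rejects the first two.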

Esnault C, Rollot M, Guilmin P, Zucker JD. Qluster: An easy-to-implement generic workflow for robust clustering of health data. Front Artif Intell 2023;5:1055294. [PMID: 36814808 PMCID: PMC9939832 DOI: 10.3389/frai.2022.1055294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 12/22/2022] [Indexed: 02/08/2023]

Abstract

The exploration of health data by clustering algorithms allows a better description of the populations of interest by seeking the sub-profiles that compose them. This reinforces medical knowledge, whether about a disease or a targeted population in real life. Nevertheless, unlike conventional biostatistical methods, for which numerous guidelines exist, the standardization of data science approaches in clinical research remains a little-discussed subject. This results in significant variability in the execution of data science projects, in terms of the algorithms used and the reliability and credibility of the designed approach. Taking the path of a parsimonious and judicious choice of both algorithms and implementations at each stage, this article proposes Qluster, a practical workflow for performing clustering tasks. The workflow strikes a compromise between (1) genericity of applications (e.g., usable on small or big data, on continuous, categorical or mixed variables, on high- or low-dimensional databases), (2) ease of implementation (need for few packages, few algorithms, few parameters, ...), and (3) robustness (e.g., use of proven algorithms and robust packages, evaluation of the stability of clusters, management of noise and multicollinearity). This workflow can be easily automated and/or routinely applied to a wide range of clustering projects. It can be useful both for data scientists with little experience in the field, to make data clustering easier and more robust, and for more experienced data scientists looking for a straightforward and reliable solution to routinely perform preliminary data mining. A synthesis of the literature on data clustering, as well as the scientific rationale supporting the proposed workflow, is also provided. Finally, a detailed application of the workflow to a concrete use case is provided, along with a practical discussion for data scientists. An implementation on the Dataiku platform is available upon request to the authors.

Isik Z, Leblebici A, Demir Karaman E, Karaca C, Ellidokuz H, Koc A, Ellidokuz EB, Basbinar Y. In silico identification of novel biomarkers for key players in transition from normal colon tissue to adenomatous polyps. PLoS One 2022;17:e0267973. [PMID: 35486660 PMCID: PMC9053805 DOI: 10.1371/journal.pone.0267973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2021] [Accepted: 04/19/2022] [Indexed: 11/18/2022]

Identifying large scale interaction atlases using probabilistic graphs and external knowledge. J Clin Transl Sci 2022;6:e27. [PMID: 35321220 PMCID: PMC8922291 DOI: 10.1017/cts.2022.18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 12/29/2021] [Accepted: 02/07/2022] [Indexed: 11/17/2022]

Using Simulated Pest Models and Biological Clustering Validation to Improve Zoning Methods in Site-Specific Pest Management. Appl Sci (Basel) 2022. [DOI: 10.3390/app12041900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]

Tenekeci S, Isik Z. Integrative Biological Network Analysis to Identify Shared Genes in Metabolic Disorders. IEEE/ACM Trans Comput Biol Bioinform 2022;19:522-530. [PMID: 32396100 DOI: 10.1109/tcbb.2020.2993301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

Nowak J, Eng RC, Matz T, Waack M, Persson S, Sampathkumar A, Nikoloski Z. A network-based framework for shape analysis enables accurate characterization of leaf epidermal cells. Nat Commun 2021;12:458. [PMID: 33469016 PMCID: PMC7815848 DOI: 10.1038/s41467-020-20730-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/27/2020] [Accepted: 12/17/2020] [Indexed: 01/29/2023]

Parraga-Alava J, Inostroza-Ponta M. Influence of the GO-based semantic similarity measures in multi-objective gene clustering algorithm performance. J Bioinform Comput Biol 2020;18:2050038. [PMID: 33148094 DOI: 10.1142/s0219720020500389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Kang K, Kim HH, Choi Y. Tiotropium is Predicted to be a Promising Drug for COVID-19 Through Transcriptome-Based Comprehensive Molecular Pathway Analysis. Viruses 2020;12:E776. [PMID: 32698440 PMCID: PMC7412475 DOI: 10.3390/v12070776] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/22/2020] [Revised: 07/10/2020] [Accepted: 07/17/2020] [Indexed: 12/12/2022]

Abstract

The coronavirus disease 2019 (COVID-19) outbreak caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) affects almost everyone in the world in many ways. We previously predicted antivirals (atazanavir, remdesivir and lopinavir/ritonavir) and non-antiviral drugs (tiotropium and rapamycin) that may inhibit the replication complex of SARS-CoV-2 using our molecular transformer-drug target interaction (MT-DTI) deep-learning-based drug-target affinity prediction model. In this study, we dissected molecular pathways upregulated in SARS-CoV-2-infected normal human bronchial epithelial (NHBE) cells by analyzing an RNA-seq data set with various bioinformatics approaches, such as gene ontology, protein-protein interaction-based network and gene set enrichment analyses. The results indicated that SARS-CoV-2 infection strongly activates TNF and NFκB-signaling pathways through significant upregulation of the TNF, IL1B, IL6, IL8, NFKB1, NFKB2 and RELB genes. In addition to these pathways, lung fibrosis, keratinization/cornification, rheumatoid arthritis, and negative regulation of interferon-gamma production pathways were also significantly upregulated. We observed that these pathologic features of SARS-CoV-2 infection are similar to those observed in patients with chronic obstructive pulmonary disease (COPD). Intriguingly, tiotropium, as predicted by MT-DTI, is currently used as a therapeutic intervention in COPD patients. Treatment with tiotropium has been shown to improve pulmonary function by alleviating airway inflammation. Accordingly, a literature search showed that tiotropium reduced the expression of IL1B, IL6, IL8, RELA, NFKB1 and TNF in vitro or in vivo, and many of these genes are known to be deregulated in COPD patients. These results suggest that COVID-19 is similar to an acute mode of COPD caused by SARS-CoV-2 infection, and therefore tiotropium may be effective for COVID-19 patients.

Dutta P, Saha S, Pai S, Kumar A. A Protein Interaction Information-based Generative Model for Enhancing Gene Clustering. Sci Rep 2020;10:665. [PMID: 31959782 PMCID: PMC6971242 DOI: 10.1038/s41598-020-57437-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/02/2019] [Accepted: 12/20/2019] [Indexed: 11/18/2022]

Lu Y, Phillips CA, Langston MA. A robustness metric for biological data clustering algorithms. BMC Bioinformatics 2019;20:503. [PMID: 31874625 PMCID: PMC6929270 DOI: 10.1186/s12859-019-3089-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/23/2019] [Accepted: 09/10/2019] [Indexed: 02/05/2023]

Alzheimer's disease clinical variants show distinct regional patterns of neurofibrillary tangle accumulation. Acta Neuropathol 2019;138:597-612. [PMID: 31250152 DOI: 10.1007/s00401-019-02036-6] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 03/27/2019] [Revised: 06/16/2019] [Accepted: 06/16/2019] [Indexed: 10/26/2022]

Abstract

The clinical spectrum of Alzheimer's disease (AD) extends well beyond the classic amnestic-predominant syndrome. Previous studies have suggested differential neurofibrillary tangle (NFT) burden between amnestic and logopenic primary progressive aphasia presentations of AD. In this study, we explored the regional distribution of NFT pathology and its relationship to AD presentation across five different clinical syndromes. We assessed NFT density throughout six selected neocortical and hippocampal regions using thioflavin-S fluorescent microscopy in a well-characterized clinicopathological cohort of pure AD cases enriched for atypical clinical presentations. Subjects underwent apolipoprotein E genotyping and neuropsychological testing. Main cognitive domains (executive, visuospatial, language, and memory function) were assessed using an established composite z score. Our results showed that regional NFT burden aligns with the clinical presentation and region-specific cognitive scores. Cortical, but not hippocampal, NFT burden was higher among atypical clinical variants relative to the amnestic syndrome. In analyses of specific clinical variants, logopenic primary progressive aphasia showed higher NFT density in the superior temporal gyrus (p = 0.0091), and corticobasal syndrome showed higher NFT density in the primary motor cortex (p = 0.0205), relative to the amnestic syndrome. Higher NFT burden in the angular gyrus and the CA1 sector of the hippocampus was independently associated with worsening visuospatial dysfunction. In addition, unbiased hierarchical clustering based on regional NFT densities identified three groups characterized by low overall NFT burden, high overall burden, and cortical-predominant burden, respectively, which differed in sex ratio, age, disease duration, and clinical presentation. In comparison, the typical, hippocampal-sparing, and limbic-predominant subtypes derived from a previously proposed algorithm did not show the same degree of clinical relevance in this sample. Overall, our results suggest domain-specific functional consequences of regional NFT accumulation. Mapping these consequences presents an opportunity to increase understanding of the neuropathological framework underlying atypical clinical manifestations.
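The abstract reports unbiased hierarchical clustering on regional NFT densities but does not give the distance or linkage. The sketch below assumes agglomerative clustering with average linkage on Euclidean distances, cut at three clusters to mirror the three reported groups. The density values are invented to produce a low, a high, and a cortical-predominant pattern; the naive pairwise search is fine for small cohorts but scales poorly.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    return sum(euclidean(data[i], data[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def agglomerative(data, n_clusters):
    """Bottom-up clustering: start from singletons, merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], data))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical NFT densities per case: [superior temporal, motor cortex, CA1]
cases = [
    [1, 1, 2], [2, 1, 1],        # low overall burden
    [20, 18, 22], [22, 20, 19],  # high overall burden
    [15, 14, 3], [16, 13, 2],    # cortical-predominant
]
groups = agglomerative(cases, n_clusters=3)
```

The groups returned are lists of case indices; comparing clinical variables across them is a separate step.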

Barido-Sottani J, Chapman SD, Kosman E, Mushegian AR. Measuring similarity between gene interaction profiles. BMC Bioinformatics 2019;20:435. [PMID: 31438841 PMCID: PMC6704681 DOI: 10.1186/s12859-019-3024-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/29/2018] [Accepted: 08/09/2019] [Indexed: 11/14/2022]

Abstract

Background

Gene and protein interaction data are often represented as interaction networks, where nodes stand for genes or gene products and each edge stands for a relationship between a pair of gene nodes. Commonly, that relationship within a pair is specified by high similarity between profiles (vectors) of experimentally defined interactions of each of the two genes with all other genes in the genome; only gene pairs that interact with similar sets of genes are linked by an edge in the network. The tight groups of genes/gene products that work together in a cell can be discovered by the analysis of those complex networks.

Results

We show that the choice of the similarity measure between pairs of gene vectors impacts the properties of networks and of gene modules detected within them. We re-analyzed well-studied data on yeast genetic interactions, constructed four genetic networks using four different similarity measures, and detected gene modules in each network using the same algorithm. The four networks induced different numbers of putative functional gene modules, and each similarity measure induced some unique modules. In an example of a putative functional connection suggested by comparing genetic interaction vectors, we predict a link between SUN-domain proteins and protein glycosylation in the endoplasmic reticulum.
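The abstract does not name the four similarity measures. The sketch below assumes three common ones (Pearson correlation, cosine similarity, and the Jaccard index) on binary interaction profiles, and links two genes when their similarity exceeds a threshold. For the invented pair of profiles shown, the measures give 0.0, 0.5, and 1/3, so with an arbitrary threshold of 0.3 the Pearson-based network has no edge while the other two do.

```python
import math


def pearson(x, y):
    """Pearson correlation (undefined for a constant profile)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def jaccard(x, y):
    """Shared interactions over interactions of either gene; shared absences are ignored."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either


def build_network(profiles, measure, threshold):
    """Edge set linking gene pairs whose profile similarity exceeds the threshold."""
    names = sorted(profiles)
    return {(g, h) for i, g in enumerate(names) for h in names[i + 1:]
            if measure(profiles[g], profiles[h]) > threshold}


# Hypothetical profiles: 1 = the gene interacts with that partner
profiles = {"geneA": [1, 1, 0, 0], "geneB": [1, 0, 1, 0]}
networks = {m.__name__: build_network(profiles, m, 0.3) for m in (pearson, cosine, jaccard)}
```

Module detection run on these networks would start from different edge sets, which is the mechanism behind the differing modules reported above.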

Conclusions

The discovery of molecular modules in genetic networks is sensitive to the way similarity between profiles of gene interactions in a cell is measured. In the absence of a formal way to choose the “best” measure, it is advisable to explore measures with different mathematical properties, which may identify different sets of connections between genes.

A Review of Computational Methods for Clustering Genes with Similar Biological Functions. Processes (Basel) 2019. [DOI: 10.3390/pr7090550] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/24/2022]

Kim J, Stanescu DE, Won KJ. CellBIC: bimodality-based top-down clustering of single-cell RNA sequencing data reveals hierarchical structure of the cell type. Nucleic Acids Res 2019;46:e124. [PMID: 30102368 PMCID: PMC6265269 DOI: 10.1093/nar/gky698] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/07/2018] [Accepted: 07/23/2018] [Indexed: 01/08/2023]

Yang M, Chen J, Xu L, Shi X, Zhou X, An R, Wang X. A Network Pharmacology Approach to Uncover the Molecular Mechanisms of Herbal Formula Ban-Xia-Xie-Xin-Tang. Evid Based Complement Alternat Med 2018;2018:4050714. [PMID: 30410554 PMCID: PMC6206573 DOI: 10.1155/2018/4050714] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 07/30/2018] [Accepted: 10/03/2018] [Indexed: 02/07/2023]

Biological networks integration based on dense module identification for gene prioritization from microarray data. Gene Rep 2018. [DOI: 10.1016/j.genrep.2018.07.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 02/02/2023]

Endovascular Biopsy and Endothelial Cell Gene Expression Analysis of Dialysis Arteriovenous Fistulas: A Feasibility Study. J Vasc Interv Radiol 2018;29:1403-1409.e2. [PMID: 30174159 DOI: 10.1016/j.jvir.2018.04.034] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/02/2018] [Revised: 04/10/2018] [Accepted: 04/22/2018] [Indexed: 02/07/2023]

Saelens W, Cannoodt R, Saeys Y. A comprehensive evaluation of module detection methods for gene expression data. Nat Commun 2018;9:1090. [PMID: 29545622 PMCID: PMC5854612 DOI: 10.1038/s41467-018-03424-4] [Citation(s) in RCA: 148] [Impact Index Per Article: 24.7] [Received: 07/26/2017] [Accepted: 02/12/2018] [Indexed: 12/19/2022]

Leale G, Baya AE, Milone DH, Granitto PM, Stegmayer G. Inferring Unknown Biological Function by Integration of GO Annotations and Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2018;15:168-180. [PMID: 27723603 DOI: 10.1109/tcbb.2016.2615960] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/06/2023]

Dutta P, Saha S. Fusion of expression values and protein interaction information using multi-objective optimization for improving gene clustering. Comput Biol Med 2017;89:31-43. [PMID: 28783536 DOI: 10.1016/j.compbiomed.2017.07.015] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/05/2017] [Revised: 07/28/2017] [Accepted: 07/28/2017] [Indexed: 11/29/2022]

Endovascular Biopsy: In Vivo Cerebral Aneurysm Endothelial Cell Sampling and Gene Expression Analysis. Transl Stroke Res 2017;9:20-33. [PMID: 28900857 DOI: 10.1007/s12975-017-0560-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 04/18/2017] [Revised: 07/31/2017] [Accepted: 08/01/2017] [Indexed: 10/18/2022]

Abstract

There are limited data describing endothelial cell (EC) gene expression in aneurysms versus arteries, partly because of the risks associated with surgical tissue collection. Endovascular biopsy (EB) is a lower-risk alternative to conventional surgical methods, though no such efforts had been attempted for aneurysms. We sought (1) to establish the feasibility of EB to isolate viable ECs by fluorescence-activated cell sorting (FACS), (2) to characterize the differences in gene expression by anatomic location and rupture status using single-cell qPCR, and (3) to demonstrate the utility of unsupervised clustering algorithms to identify cell subpopulations. EB was performed in 10 patients (5 ruptured, 5 non-ruptured). FACS was used to isolate the ECs, and single-cell qPCR was used to quantify the expression of 48 genes. Linear mixed models, exploratory multilevel component analysis (MCA), and self-organizing maps (SOMs) were applied to identify possible subpopulations of cells. ECs were collected from all aneurysms, and there were no adverse events. A total of 437 ECs were collected, 94 (22%) of which were aneurysmal cells and 319 (73%) of which demonstrated EC-specific gene expression. Ruptured aneurysm cells, relative to controls, yielded a median p value of 0.40, with five genes (10%) having p values < 0.05. These five genes (TIE1, ENG, VEGFA, MMP2, and VWF) demonstrated uniformly reduced expression relative to the remaining ECs. MCA and SOM analyses identified a population of outlying cells characterized by cell marker gene expression profiles different from those of endothelial cells. After removal of these cells, no cell clustering based on genetic co-expressivity was found to differentiate aneurysm cells from control cells. Endovascular sampling is a reliable method of cell collection for brain aneurysm gene analysis and may serve as a technique to further vascular molecular research. There is utility in combining mixed-model and clustering methods, even though no specific subpopulation was identified in this trial.

Ji G, Lin Q, Long Y, Ye C, Ye W, Wu X. PAcluster: Clustering polyadenylation site data using canonical correlation analysis. J Bioinform Comput Biol 2017;15:1750018. [PMID: 28874086 DOI: 10.1142/s0219720017500184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]

Khan A, Katanic D, Thakar J. Meta-analysis of cell-specific transcriptomic data using fuzzy c-means clustering discovers versatile viral responsive genes. BMC Bioinformatics 2017;18:295. [PMID: 28587632 PMCID: PMC5461682 DOI: 10.1186/s12859-017-1669-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/12/2016] [Accepted: 05/03/2017] [Indexed: 01/06/2023]

Abstract

BACKGROUND

Despite advances in gene-set enrichment analysis methods, inadequate definitions of gene-sets remain a major limitation in the discovery of novel biological processes from transcriptomic datasets. Typically, gene-sets are obtained from publicly available pathway databases, which contain generalized definitions frequently derived by manual curation. Recently, unsupervised clustering algorithms have been proposed to identify gene-sets from transcriptomic datasets deposited in the public domain. These data-driven gene-set definitions can be context-specific, revealing novel biological mechanisms. However, the previously proposed algorithms for identifying data-driven gene-sets are based on hard clustering, which does not allow overlap across clusters, even though such overlap is a characteristic widely observed across biological pathways.

RESULTS

We developed a pipeline using a fuzzy c-means (FCM) soft clustering approach to identify gene-sets that recapitulate the topological characteristics of biological pathways. Specifically, we applied our pipeline to derive gene-sets from transcriptomic data measuring the response of monocyte-derived dendritic cells and A549 epithelial cells to influenza infection. Our approach applies Ward's method to select initial conditions, optimizes the parameters of the FCM algorithm for human cell-specific transcriptomic data, and identifies robust gene-sets along with versatile viral responsive genes.

CONCLUSION

We validate our gene-sets and demonstrate that, by identifying genes associated with multiple gene-sets, the FCM clustering algorithm significantly improves the interpretation of transcriptomic data, facilitating the investigation of novel biological processes by leveraging transcriptomic data available in the public domain. We developed an interactive 'Fuzzy Inference of Gene-sets (FIGS)' package (GitHub: https://github.com/Thakar-Lab/FIGS) to facilitate use of the pipeline. Future extension of FIGS across different immune cell types will improve mechanistic investigations following high-throughput omics studies.
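The fuzzy c-means step summarized above can be illustrated with a minimal, standard-library Python sketch of the textbook FCM iteration: memberships u[k][i] = 1 / sum_j (d_ki / d_kj)^(2/(m-1)) alternate with membership-weighted centroid updates. This is not the FIGS implementation; the function names, the fuzzifier default m = 2, and the explicit `init_centers` argument (standing in for the Ward's-method initialization the authors describe) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm_memberships(points: Matrix, centers: Matrix, m: float = 2.0) -> Matrix:
    """U[k][i] = degree to which point k belongs to cluster i (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        d = [_dist(x, c) for c in centers]
        zero = [i for i, di in enumerate(d) if di == 0.0]
        if zero:
            # Point sits exactly on one or more centers: split membership among them.
            row = [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
        else:
            row = [1.0 / sum((d[i] / dj) ** exponent for dj in d) for i in range(len(centers))]
        memberships.append(row)
    return memberships


def fcm_centers(points: Matrix, memberships: Matrix, m: float = 2.0) -> Matrix:
    """Each center is the mean of all points weighted by membership**m."""
    centers = []
    for i in range(len(memberships[0])):
        w = [row[i] ** m for row in memberships]
        total = sum(w)
        centers.append([sum(wk * p[j] for wk, p in zip(w, points)) / total
                        for j in range(len(points[0]))])
    return centers


def fuzzy_c_means(points: Matrix, init_centers: Matrix, m: float = 2.0,
                  tol: float = 1e-6, max_iter: int = 100) -> Tuple[Matrix, Matrix]:
    """Alternate membership and center updates until the centers stop moving."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        new_centers = fcm_centers(points, fcm_memberships(points, centers, m), m)
        shift = max(_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return fcm_memberships(points, centers, m), centers


# Usage: two tight groups plus one point halfway between them.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [5.0, 5.5]]
U, C = fuzzy_c_means(points, init_centers=[[0.0, 0.0], [10.0, 10.0]])
for p, row in zip(points, U):
    print(p, [round(u, 2) for u in row])
```

Because each membership row sums to 1, the halfway point ends up with substantial weight in both clusters, which is the overlap property that hard clustering cannot express.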

Exploratory analysis of local gene groups in breast cancer guided by biological networks. HEALTH AND TECHNOLOGY 2017. [DOI: 10.1007/s12553-016-0155-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

A new cognitive filtering approach based on Freeman K3 Neural Networks. APPL INTELL 2016. [DOI: 10.1007/s10489-016-0772-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]

Cifola I, Lionetti M, Pinatel E, Todoerti K, Mangano E, Pietrelli A, Fabris S, Mosca L, Simeon V, Petrucci MT, Morabito F, Offidani M, Di Raimondo F, Falcone A, Caravita T, Battaglia C, De Bellis G, Palumbo A, Musto P, Neri A. Whole-exome sequencing of primary plasma cell leukemia discloses heterogeneous mutational patterns. Oncotarget 2016;6:17543-58. [PMID: 26046463 PMCID: PMC4627327 DOI: 10.18632/oncotarget.4028] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 04/29/2015] [Accepted: 05/11/2015] [Indexed: 02/04/2023]

Affiliation(s)

Ingrid Cifola Institute for Biomedical Technologies, National Research Council, Milan, Italy
Marta Lionetti Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy; Hematology, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
Eva Pinatel Institute for Biomedical Technologies, National Research Council, Milan, Italy
Katia Todoerti Laboratory of Pre-Clinical and Translational Research, IRCCS-CROB, Referral Cancer Center of Basilicata, Rionero in Vulture (PZ), Italy
Eleonora Mangano Institute for Biomedical Technologies, National Research Council, Milan, Italy
Alessandro Pietrelli Institute for Biomedical Technologies, National Research Council, Milan, Italy
Sonia Fabris Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy; Hematology, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
Laura Mosca Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy; Hematology, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
Vittorio Simeon Laboratory of Pre-Clinical and Translational Research, IRCCS-CROB, Referral Cancer Center of Basilicata, Rionero in Vulture (PZ), Italy
Maria Teresa Petrucci Hematology, Department of Cellular Biotechnologies and Hematology, La Sapienza University, Rome, Italy
Fortunato Morabito Hematology Unit, Azienda Ospedaliera di Cosenza, Cosenza, Italy
Massimo Offidani Hematologic Clinic, Azienda Ospedaliero-Universitaria Ospedali Riuniti di Ancona, Ancona, Italy
Francesco Di Raimondo Department of Biomedical Sciences, Division of Hematology, Ospedale Ferrarotto, University of Catania, Catania, Italy
Antonietta Falcone Hematology Unit, IRCCS "Casa Sollievo della Sofferenza" Hospital, San Giovanni Rotondo, Italy
Tommaso Caravita Department of Hematology, Ospedale S. Eugenio, Tor Vergata University, Rome, Italy
Cristina Battaglia Institute for Biomedical Technologies, National Research Council, Milan, Italy; Department of Medical Biotechnology and Translational Medicine, University of Milan, Milan, Italy
Gianluca De Bellis Institute for Biomedical Technologies, National Research Council, Milan, Italy
Antonio Palumbo Division of Hematology, University of Torino, A.O.U. San Giovanni Battista, Torino, Italy
Pellegrino Musto Scientific Direction, IRCCS-CROB, Referral Cancer Center of Basilicata, Rionero in Vulture (PZ), Italy
Antonino Neri Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy; Hematology, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy

Li H, Li C, Hu J, Fan X. A Resampling Based Clustering Algorithm for Replicated Gene Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015;12:1295-1303. [PMID: 26671802 DOI: 10.1109/tcbb.2015.2403320] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]

Knowledge-Based Analysis for Detecting Key Signaling Events from Time-Series Phosphoproteomics Data. PLoS Comput Biol 2015;11:e1004403. [PMID: 26252020 PMCID: PMC4529189 DOI: 10.1371/journal.pcbi.1004403] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 02/24/2015] [Accepted: 06/11/2015] [Indexed: 12/24/2022]

Abstract

Cell signaling underlies the transcriptional/epigenetic control of a vast majority of cell-fate decisions. A key goal in cell signaling studies is to identify the set of kinases that underlie key signaling events. In a typical phosphoproteomics study, phosphorylation sites (substrates) of active kinases are quantified proteome-wide. By analyzing the activities of phosphorylation sites over a time-course, the temporal dynamics of signaling cascades can be elucidated. Since many substrates of a given kinase have similar temporal kinetics, clustering phosphorylation sites into distinctive clusters can facilitate identification of their respective kinases. Here we present a knowledge-based CLUster Evaluation (CLUE) approach for identifying the most informative partitioning of a given temporal phosphoproteomics dataset. Our approach utilizes prior knowledge, namely annotated kinase-substrate relationships mined from the literature and curated databases, to first generate a biologically meaningful partitioning of the phosphorylation sites and then determine the key kinases associated with each cluster. We demonstrate the utility of the proposed approach on two time-series phosphoproteomics datasets and identify key kinases associated with human embryonic stem cell differentiation and the insulin signaling pathway. The proposed approach will be a valuable resource for the identification and characterization of signaling networks from phosphoproteomics data.

A key goal in cell signaling studies is to identify the set of kinases that underlie key signaling events. Mass spectrometry-based technologies have emerged as a powerful tool to profile proteome-wide phosphorylation events in vivo at single-amino-acid resolution with high precision. However, the development of algorithms to analyze and identify signaling events from high-throughput phosphoproteomics data is still in its infancy. Here we propose a knowledge-based CLUster Evaluation (CLUE) approach for identifying key signaling cascades from time-series phosphoproteomics data. Our approach utilizes known kinase-substrate annotations from curated phosphoproteomics databases to first determine the optimal clustering of the phosphorylation sites and then identify enriched kinase(s). We apply CLUE to time-series phosphoproteomics datasets and identify key kinases associated with human embryonic stem cell differentiation and the insulin signaling pathway.
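CLUE relates clusters of phosphorylation sites to annotated kinase-substrate relationships. One generic way to ask whether a single cluster contains more annotated substrates of a kinase than expected by chance is a one-sided hypergeometric over-representation test. The standard-library Python sketch below shows only that generic test, assuming sites are identified by string IDs; it is not the CLUE scoring procedure, and `hypergeom_sf` and `cluster_enrichment` are hypothetical helper names.

```python
from math import comb
from typing import Iterable


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def cluster_enrichment(cluster: Iterable[str], substrates: Iterable[str],
                       universe: Iterable[str]) -> float:
    """One-sided p-value that `cluster` holds more annotated substrates than chance."""
    universe = set(universe)
    cluster = set(cluster) & universe
    substrates = set(substrates) & universe
    overlap = len(cluster & substrates)
    return hypergeom_sf(overlap, len(universe), len(substrates), len(cluster))


# Usage: 20 phosphosites, 5 of which are annotated substrates of a hypothetical kinase.
sites = [f"site{i}" for i in range(20)]
kinase_substrates = sites[:5]
print(cluster_enrichment(sites[:5], kinase_substrates, sites))    # cluster holds all 5
print(cluster_enrichment(sites[10:12], kinase_substrates, sites))  # cluster holds none
```

In a full analysis such p-values would be computed for every kinase against every cluster and then corrected for multiple testing.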

Ye N, Yin H, Liu J, Dai X, Yin T. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature. BIOMED RESEARCH INTERNATIONAL 2015;2015:853734. [PMID: 26199946 PMCID: PMC4496643 DOI: 10.1155/2015/853734] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/30/2015] [Revised: 05/20/2015] [Accepted: 06/11/2015] [Indexed: 12/21/2022]

Berenstein AJ, Piñero J, Furlong LI, Chernomoretz A. Mining the modular structure of protein interaction networks. PLoS One 2015;10:e0122477. [PMID: 25856434 PMCID: PMC4391834 DOI: 10.1371/journal.pone.0122477] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/02/2014] [Accepted: 02/11/2015] [Indexed: 02/07/2023]

Abstract

BACKGROUND

Cluster-based descriptions of biological networks have received much attention in recent years fostered by accumulated evidence of the existence of meaningful correlations between topological network clusters and biological functional modules. Several well-performing clustering algorithms exist to infer topological network partitions. However, due to respective technical idiosyncrasies they might produce dissimilar modular decompositions of a given network. In this contribution, we aimed to analyze how alternative modular descriptions could condition the outcome of follow-up network biology analysis.

METHODOLOGY

We considered a human protein interaction network and two paradigmatic cluster recognition algorithms, namely the Clauset-Newman-Moore and the infomap procedures. We analyzed to what extent the two methodologies yielded different results in terms of granularity and biological congruency. In addition, taking into account Guimerà's cartographic role characterization of network nodes, we explored how the adoption of a given clustering methodology affected the ability to highlight relevant network meso-scale connectivity patterns.

RESULTS

As a case study, we considered a set of aging-related proteins and showed that only the high-resolution modular description provided by infomap could unveil statistically significant associations between them and inter/intra-modular cartographic features. Besides reporting novel biological insights that could be gained from the discovered associations, our contribution warns against possible technical concerns that might affect the tools used to mine for interaction patterns in network biology studies. In particular, our results suggested that partitions that are sub-optimal from the strict point of view of their modularity levels might still be worth analyzing when meso-scale features are to be explored in connection with external sources of biological knowledge.
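The Clauset-Newman-Moore procedure greedily merges communities to increase Newman-Girvan modularity, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of the nodes in c; infomap instead optimizes a different objective (the map equation). The standard-library Python sketch below only scores a given partition of a simple undirected graph, so it illustrates what "modularity level" measures rather than performing the greedy merging; the edge-list and dictionary input formats are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan modularity Q of a partition of a simple undirected graph."""
    m = 0
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        m += 1
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Usage: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
print(modularity(edges, split))   # 5/14, about 0.357
print(modularity(edges, merged))  # 0.0
```

Placing every node in one community always gives Q = 0, so positive values indicate more within-community edges than the degree-preserving null model predicts.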

Chang JS, Kim Y, Kim SH, Hwang S, Kim J, Chung IW, Kim YS, Jung HY. Differences in the internal structure of hallucinatory experiences between clinical and nonclinical populations. Psychiatry Res 2015;226:204-10. [PMID: 25619435 DOI: 10.1016/j.psychres.2014.12.051] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/02/2014] [Revised: 12/30/2014] [Accepted: 12/31/2014] [Indexed: 10/24/2022]

Omranian N, Mueller-Roeber B, Nikoloski Z. Segmentation of biological multivariate time-series data. Sci Rep 2015;5:8937. [PMID: 25758050 PMCID: PMC5390911 DOI: 10.1038/srep08937] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/30/2014] [Accepted: 02/06/2015] [Indexed: 11/15/2022]

Vavoulis DV, Francescatto M, Heutink P, Gough J. DGEclust: differential expression analysis of clustered count data. Genome Biol 2015;16:39. [PMID: 25853652 PMCID: PMC4365804 DOI: 10.1186/s13059-015-0604-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/11/2014] [Accepted: 02/03/2015] [Indexed: 11/10/2022]

Milone DH, Stegmayer G, López M, Kamenetzky L, Carrari F. Improving clustering with metabolic pathway data. BMC Bioinformatics 2014;15:101. [PMID: 24717120 PMCID: PMC4002909 DOI: 10.1186/1471-2105-15-101] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/30/2013] [Accepted: 03/25/2014] [Indexed: 11/10/2022]

Abstract

BACKGROUND

It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps in finding functionally related patterns and in proposing hypotheses about their behavior and the biological processes involved. In this workflow, however, the knowledge is used only as a second step, after the data have already been clustered according to their expression patterns. Thus, it could be very useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters.

RESULTS

A novel training algorithm for clustering is presented, which evaluates the internal biological connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically-inspired SOM (bSOM) training were compared on two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The bSOM results show important improvements in convergence and performance over standard SOM training, in particular from the application point of view.

CONCLUSIONS

Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
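For readers unfamiliar with the baseline that bSOM modifies, the standard-library Python sketch below shows standard online SOM training on a one-dimensional map: find the best-matching unit (BMU) by squared Euclidean distance, then pull every neuron toward the sample with a Gaussian neighborhood weight, while the learning rate and neighborhood width shrink linearly. The optional `extra_term(sample_index, neuron_index)` argument is hypothetical; it only marks where a bSOM-style biological term would be added to the distance, and its form is an assumption, not the published bSOM formula.

```python
import math
from typing import Callable, List, Optional, Sequence

Matrix = List[List[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data: Matrix, weights: Matrix, epochs: int = 50, lr0: float = 0.5,
              sigma0: float = 0.5,
              extra_term: Optional[Callable[[int, int], float]] = None) -> Matrix:
    """Online training of a 1-D SOM; neuron j sits at grid position j."""
    w = [list(v) for v in weights]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for s, x in enumerate(data):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 1e-3)  # shrinking neighborhood width

            def score(j: int) -> float:
                # Standard SOM uses only the distance; the hypothetical hook adds a term.
                extra = extra_term(s, j) if extra_term is not None else 0.0
                return _sq_dist(x, w[j]) + extra

            bmu = min(range(len(w)), key=score)
            for j in range(len(w)):
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                w[j] = [wj + lr * h * (xj - wj) for wj, xj in zip(w[j], x)]
            step += 1
    return w


def assign(data: Matrix, weights: Matrix) -> List[int]:
    """Index of the best-matching neuron for each sample."""
    return [min(range(len(weights)), key=lambda j: _sq_dist(x, weights[j])) for x in data]


# Usage: two groups of samples and a 2-neuron map.
data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
trained = train_som(data, weights=[[1.0, 1.0], [4.0, 4.0]])
print(assign(data, trained))
```

Keeping the extra term as a separate hook makes it easy to compare the standard objective with a modified one on the same data, which mirrors the SOM-versus-bSOM comparison described in the abstract.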

Sirinukunwattana K, Savage RS, Bari MF, Snead DRJ, Rajpoot NM. Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics. PLoS One 2013;8:e75748. [PMID: 24194826 PMCID: PMC3806770 DOI: 10.1371/journal.pone.0075748] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/01/2013] [Accepted: 08/19/2013] [Indexed: 11/29/2022]

Marx H, Lemeer S, Klaeger S, Rattei T, Kuster B. MScDB: a mass spectrometry-centric protein sequence database for proteomics. J Proteome Res 2013;12:2386-98. [PMID: 23627461 DOI: 10.1021/pr400215r] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]

Darkins R, Cooke EJ, Ghahramani Z, Kirk PDW, Wild DL, Savage RS. Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm. PLoS One 2013;8:e59795. [PMID: 23565168 PMCID: PMC3614914 DOI: 10.1371/journal.pone.0059795] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 09/25/2012] [Accepted: 02/19/2013] [Indexed: 11/19/2022]

The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives. Mach Learn 2013. [DOI: 10.1007/s10994-013-5334-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 10/27/2022]

Pestian J, Matykiewicz P, Holland-Bouley K, Standridge S, Spencer M, Glauser T. Selecting anti-epileptic drugs: a pediatric epileptologist's view, a computer's view. Acta Neurol Scand 2013;127:208-15. [PMID: 22998126 PMCID: PMC3574228 DOI: 10.1111/ane.12002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Accepted: 07/25/2012] [Indexed: 01/13/2023]

Verbanck M, Lê S, Pagès J. A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data. BMC Bioinformatics 2013;14:42. [PMID: 23387364 PMCID: PMC3635920 DOI: 10.1186/1471-2105-14-42] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 04/04/2012] [Accepted: 01/18/2013] [Indexed: 12/03/2022]

Mukhopadhyay A, Maulik U, Bandyopadhyay S. An Interactive Approach to Multiobjective Clustering of Gene Expression Patterns. IEEE Trans Biomed Eng 2013;60:35-41. [DOI: 10.1109/tbme.2012.2220765] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/09/2022]

Sîrbu A, Kerr G, Crane M, Ruskin HJ. RNA-Seq vs dual- and single-channel microarray data: sensitivity analysis for differential expression and clustering. PLoS One 2012;7:e50986. [PMID: 23251411 PMCID: PMC3518479 DOI: 10.1371/journal.pone.0050986] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 10/05/2011] [Accepted: 10/30/2012] [Indexed: 01/13/2023]

Abstract

With the fast development of high-throughput sequencing technologies, a new generation of genome-wide gene expression measurements is under way. This is based on mRNA sequencing (RNA-seq), which complements the already mature technology of microarrays and is expected to overcome some of the latter's disadvantages. These RNA-seq data pose new challenges, however, as their strengths and weaknesses have yet to be fully identified. Ideally, Next (or Second) Generation Sequencing measures can be integrated for more comprehensive gene expression investigation to facilitate analysis of whole regulatory networks. At present, however, the nature of these data is not very well understood. In this paper we study three alternative gene expression time series datasets for Drosophila melanogaster embryo development, in order to compare three measurement techniques: RNA-seq, single-channel and dual-channel microarrays. The aim is to study the state of the art for the three technologies, with a view to assessing overlapping features, data compatibility and integration potential, in the context of time series measurements. This involves using established tools for each of the three technologies, and technical and biological replicates (for RNA-seq and microarrays, respectively), due to the limited availability of biological RNA-seq replicates for time series data. The approach consists of a sensitivity analysis for differential expression and clustering. In general, the RNA-seq dataset displayed the highest sensitivity to differential expression. The single-channel data performed similarly for the differentially expressed genes common to the gene sets considered. Cluster analysis was used to identify different features of the gene space for the three datasets, with higher similarities found between the RNA-seq and single-channel microarray datasets.

Kirk P, Griffin JE, Savage RS, Ghahramani Z, Wild DL. Bayesian correlated clustering to integrate multiple datasets. Bioinformatics 2012;28:3290-7. [PMID: 23047558 PMCID: PMC3519452 DOI: 10.1093/bioinformatics/bts595] [Citation(s) in RCA: 158] [Impact Index Per Article: 13.2] [Indexed: 12/03/2022]

Abstract

Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets.

Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.

Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/.

Contact: D.L.Wild@warwick.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Gao C, Weisman D, Gou N, Ilyin V, Gu AZ. Analyzing high dimensional toxicogenomic data using consensus clustering. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2012;46:8413-8421. [PMID: 22703334 DOI: 10.1021/es3000454] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/01/2023]

Abstract

Rapid development of high-throughput toxicogenomics technologies has created new approaches to screening environmental samples for mechanistic toxicity assessment. However, challenges remain in the analysis, especially the clustering, of the resulting high-dimensional data. Because commonly accepted validation methods are lacking, it is difficult to compare clustering results between studies or to identify the key experimental or data features that affect the clustering results. We applied consensus clustering (CC), an approach that clusters the input data repeatedly through iterative resampling and identifies frequently occurring high-confidence clusters. We used CC to analyze a set of high-dimensional transcriptomics data with temporal resolution, which were generated using our E. coli whole-cell array system for a diverse range of toxicants at different dose concentrations. The CC analysis allowed us to evaluate the robustness and sensitivity of the clustering results against a number of conditions that represent common variations in high-throughput experiments, including noisy data, subsets of treatments, subsets of reporter genes, and subsets of time points. We demonstrated the value of utilizing rich time-series data and underscored the importance of careful selection of sampling times for a given experimental system. The results also indicated that temporal data compression using our proposed Transcriptional Effect Level Index (TELI) concept followed by CC largely conserved the cluster resolution. We also found that for our cellular stress response ensemble-based high-throughput transcriptomics assay platform, the size and composition of the reporter gene set are critical factors that affect the resulting coherency of clusters. Taken together, these results demonstrate that robust clustering approaches such as CC may be valuable in analyzing high-dimensional toxicogenomic data sets.
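The resampling idea behind consensus clustering can be sketched in standard-library Python: repeatedly subsample the items, cluster each subsample, and record for every pair how often the two items were placed together among the runs in which both were sampled. The base clusterer (plain Lloyd's k-means), the 80% subsampling fraction, and the default number of resamples are assumptions for illustration, not the settings used by Gao et al.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Matrix, k: int, rng: random.Random, max_iter: int = 100) -> List[int]:
    """Plain Lloyd's k-means with centers initialized from k distinct points."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep an empty cluster's previous center rather than dropping it.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def consensus_matrix(points: Matrix, k: int, n_resamples: int = 50,
                     frac: float = 0.8, seed: int = 0) -> Matrix:
    """M[i][j] = share of runs containing both i and j in which they co-cluster."""
    rng = random.Random(seed)
    n = len(points)
    size = max(k, int(frac * n))
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else float("nan")
             for j in range(n)] for i in range(n)]


# Usage: two well-separated groups of three one-dimensional profiles.
profiles = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
M = consensus_matrix(profiles, k=2)
for row in M:
    print([round(v, 2) for v in row])
```

Entries near 1 or 0 indicate stable pairings, while intermediate values flag items whose cluster membership is sensitive to resampling; that sensitivity is what the robustness checks described in the abstract probe.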