Reference Citation Analysis

For: Abrams ZB, Coombes CE, Li S, Coombes KR. Mercator: A Pipeline For Multi-Method, Unsupervised Visualization And Distance Generation. Bioinformatics 2021;37:2780-2781. [PMID: 33515233 PMCID: PMC8428582 DOI: 10.1093/bioinformatics/btab037] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/07/2020] [Revised: 01/12/2021] [Accepted: 01/22/2021] [Indexed: 11/13/2022]

Cited by Other Article(s)

Abrams ZB, Tally DG, Abruzzo LV, Coombes KR. RCytoGPS: An R Package for Reading and Visualizing Cytogenetics Data. Bioinformatics 2021;37:4589-4590. [PMID: 34601554 DOI: 10.1093/bioinformatics/btab683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Revised: 09/20/2021] [Accepted: 09/24/2021] [Indexed: 11/13/2022]

Coombes CE, Liu X, Abrams ZB, Coombes KR, Brock G. Simulation-derived best practices for clustering clinical data. J Biomed Inform 2021;118:103788. [PMID: 33862229 DOI: 10.1016/j.jbi.2021.103788] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/10/2020] [Revised: 03/23/2021] [Accepted: 04/11/2021] [Indexed: 11/18/2022]

Abstract

INTRODUCTION

Clustering analyses in clinical contexts hold promise for improving the understanding of patient phenotype and disease course in both chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type clinical data.

METHODS

We simulated clinical data representing problems typical of clinical trials, cohort studies, and electronic health record (EHR) data, including single-type datasets (binary, continuous, categorical) and 4 data mixtures. We tested 5 single distance metrics (Jaccard, Hamming, Gower, Manhattan, Euclidean) and 3 mixed distance metrics (DAISY, Supersom, and Mercator) with 3 clustering algorithms: hierarchical clustering (HC), k-medoids, and self-organizing maps (SOM). We validated the results quantitatively and visually using the Adjusted Rand Index (ARI) and silhouette width (SW). We applied our best methods to two real-world datasets: (1) 21 features collected on 247 patients with chronic lymphocytic leukemia, and (2) 40 features collected on 6000 patients admitted to an intensive care unit.
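The single-type metrics named in the Methods have standard textbook definitions. The following is a minimal plain-Python sketch of four of them (Gower's coefficient, which combines per-feature distances, is left out here), plus a helper that fills the symmetric dissimilarity matrix a clustering algorithm would consume. It is an illustration under stated assumptions, not the study's code: the function names are hypothetical, and conventions vary between libraries (for example, whether Hamming distance is a raw count or a proportion, and how two all-zero binary records are scored).

```python
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance between two continuous feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute per-feature differences (continuous features)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def hamming(x: Sequence, y: Sequence) -> float:
    """Proportion of features whose values disagree (binary or categorical).

    Assumption: normalized by the number of features, not a raw count.
    """
    return sum(a != b for a, b in zip(x, y)) / len(x)


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard distance for 0/1 vectors: 1 - |x AND y| / |x OR y|.

    Features absent in both records (0, 0) are ignored; two all-zero
    records are treated as identical (distance 0) by convention here.
    """
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def dissimilarity_matrix(rows: Sequence[Sequence], metric) -> List[List[float]]:
    """Symmetric n x n matrix of pairwise distances with a zero diagonal."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(rows[i], rows[j])
    return d
```

For example, `euclidean([0, 0], [3, 4])` is 5.0, and `jaccard([1, 1, 0, 0], [1, 0, 1, 0])` is 2/3: one shared presence out of three features present in either record, with the joint absence in the last position ignored.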

RESULTS

HC outperformed k-medoids and SOM, as measured by ARI, across data types. DAISY produced the highest mean ARI on mixed-type data for every mixture except unbalanced mixtures dominated by continuous data. Compared with the other methods, DAISY with HC uncovered superior, separable clusters in both real-world datasets.
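The two validation scores behind these comparisons have standard definitions. ARI compares a clustering with known labels (such as simulated ground truth) and corrects the Rand index for chance agreement, so 1.0 means identical partitions and values near 0 mean chance-level agreement. Silhouette width needs only a dissimilarity matrix and the cluster assignments, which makes it applicable when no true labels exist. The sketch below is a hypothetical plain-Python rendering of both formulas, not the study's implementation; it assumes a precomputed distance matrix given as nested lists.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Chance-corrected pair-counting agreement between two partitions."""
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both partitions (contingency cells).
    pairs_in_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    pairs_in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = pairs_in_truth * pairs_in_pred / comb(n, 2)
    maximum = (pairs_in_truth + pairs_in_pred) / 2
    if maximum == expected:
        # Only when both partitions are one cluster or both are all singletons.
        return 1.0
    return (pairs_in_cells - expected) / (maximum - expected)


def mean_silhouette_width(dist: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Average of s(i) = (b - a) / max(a, b) over all items.

    a = mean distance from item i to the rest of its own cluster;
    b = smallest mean distance from i to the members of another cluster.
    Items in singleton clusters contribute s(i) = 0 (Rousseeuw's convention).
    """
    n = len(labels)
    members = {c: [j for j in range(n) if labels[j] == c] for c in set(labels)}
    if len(members) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    total = 0.0
    for i in range(n):
        own = [j for j in members[labels[i]] if j != i]
        if not own:
            continue  # singleton cluster: contributes 0
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in idx) / len(idx)
            for c, idx in members.items()
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n
```

Because ARI ignores label names, the permuted labelling `[1, 1, 0, 0]` scores 1.0 against `[0, 0, 1, 1]`; a partial match such as `[0, 0, 1, 2]` versus `[0, 0, 1, 1]` scores 4/7.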

DISCUSSION

Selecting an appropriate mixed-type metric allows the investigator to obtain optimal separation of patient clusters and make maximum use of their data. Superior metrics for mixed-type data handle multiple data types by combining multiple type-focused distances. Better subclassification of disease opens avenues for targeted treatments, precision medicine, clinical decision support, and improved patient outcomes.
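One standard way to combine type-focused distances is Gower's coefficient. Each feature gets its own distance scaled to [0, 1] (range-scaled absolute difference for numeric values, simple mismatch for categorical values), and the per-pair result is the average over the features that can be compared. R's `cluster::daisy` function, which shares its name with the DAISY metric, uses Gower's coefficient when columns are of mixed type. The sketch below is a simplified, hypothetical Python version for illustration only. The feature-kind labels (`"numeric"`, `"nominal"`, `"asym_binary"`), the equal weighting, and the example patients are assumptions, and `daisy` itself offers additional options (such as ordinal variables) that are not reproduced here.

```python
from typing import List, Optional, Sequence, Union

Value = Optional[Union[float, int, str]]  # None marks a missing value


def numeric_ranges(rows: Sequence[Sequence[Value]], kinds: Sequence[str]) -> List[float]:
    """Observed range of each numeric column (0.0 for non-numeric columns)."""
    ranges = []
    for k, kind in enumerate(kinds):
        vals = [r[k] for r in rows if r[k] is not None] if kind == "numeric" else []
        ranges.append(float(max(vals) - min(vals)) if vals else 0.0)
    return ranges


def gower_distance(x: Sequence[Value], y: Sequence[Value],
                   kinds: Sequence[str], ranges: Sequence[float]) -> float:
    """Equal-weight average of per-feature, type-specific distances in [0, 1]."""
    total, used = 0.0, 0
    for a, b, kind, rng in zip(x, y, kinds, ranges):
        if a is None or b is None:
            continue  # missing in either record: skip this feature for the pair
        if kind == "numeric":
            d = abs(a - b) / rng if rng > 0 else 0.0  # range-scaled difference
        elif kind == "nominal":
            d = 0.0 if a == b else 1.0  # simple mismatch
        elif kind == "asym_binary":
            if not a and not b:
                continue  # joint absence is not counted as similarity
            d = 0.0 if a == b else 1.0
        else:
            raise ValueError(f"unknown feature kind: {kind!r}")
        total += d
        used += 1
    return total / used if used else 0.0  # no comparable features: 0.0 by convention
```

With hypothetical records `[60.0, "F", 1]`, `[40.0, "M", 0]`, and `[50.0, "F", 0]` (age, sex, smoker flag), the last two patients are 0.75 apart: the age term contributes 0.5, sex contributes 1, and the shared absence of the smoker flag is dropped, so the sum 1.5 is divided by 2 rather than 3.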