Dreger RM, Fuller J, Lemoine RL. Clustering Seven Data Sets by Means of Some or All of Seven Clustering Methods.
MULTIVARIATE BEHAVIORAL RESEARCH 1988; 23:203-230. [PMID: 26764946 DOI: 10.1207/s15327906mbr2302_5]
[Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Seven data sets, one artificially contrived, the others real data from different areas of psychology and sociology but mainly concerned with children, were subjected to clustering by seven different algorithms. These methods are for the most part rather well known: Holzinger and Harman's (1941) B-coefficient, Overall and Klett's (1972) Linear Typal Analysis, McQuitty and Koch's (1974a, b) elementary linkage analysis, Rohlf and his colleagues' (Rohlf, Kishpaugh, & Kirk, 1971) Numerical Taxonomy System, the Statistical Analysis System (SAS Institute, 1982) hierarchical clustering (Cluster) method, Cattell and Coulter's (1966) Taxonome, and one not very well known, Bolz's (1978) Type Analysis. Insofar as possible, all methods were run on all sets of data on the computer with appropriate adjustments to the respective programs. With the B-coefficient, both hand and machine calculations were carried out. Statistical and logical comparisons were made among the different methods used on the data sets. All methods had their strengths and weaknesses, some being more adequate with some data sets and others with others. Surprisingly, the B-coefficient, at least with smaller sets, compared favorably with other methods, even though it is scarcely known in the modern clustering literature.