Interpret the key results for Cluster Observations

Complete the following steps to interpret a cluster observations analysis. Key output includes the similarity and distance values, the dendrogram, and the final partition.

Step 1: Examine the similarity and distance levels

At each step in the amalgamation process, view the clusters that are formed and examine their similarity and distance levels. The higher the similarity level, the more similar the observations are in each cluster. The lower the distance level, the closer the observations are in each cluster.

Ideally, the clusters should have a relatively high similarity level and a relatively low distance level. However, you must balance that goal with having a reasonable and practical number of clusters.
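The link between the two levels can be sketched in a few lines of Python. This sketch assumes that the similarity level is defined as 100 * (1 - d / d_max), where d_max is the largest distance in the data. That definition is an assumption here, although it agrees with the amalgamation table in this topic to within rounding. The name similarity_level is illustrative only.

```python
def similarity_level(distance, max_distance):
    """Convert a distance level to a similarity level on a 0-100 scale.

    Assumed definition: 100 * (1 - d / d_max).
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    return 100.0 * (1.0 - distance / max_distance)


# Distance levels copied from steps 1, 16, and 19 of the amalgamation table.
# Assumption: the largest distance in the data equals the step 19 distance.
D_MAX = 4.78739
for d in (0.16275, 1.81904, 4.78739):
    print(f"distance {d:.5f} -> similarity {similarity_level(d, D_MAX):.4f}")
```

Under this definition, a small distance level always corresponds to a high similarity level, so the two columns of the table carry the same information on different scales.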

Amalgamation Steps

      Number of  Similarity  Distance  Clusters  New      Number of obs.
Step  clusters   level       level     joined    cluster  in new cluster
   1         19     96.6005   0.16275   13   16       13               2
   2         18     95.4642   0.21715   17   20       17               2
   3         17     95.2648   0.22669    6    9        6               2
   4         16     92.9178   0.33905   17   18       17               3
   5         15     90.5296   0.45339   11   15       11               2
   6         14     90.3124   0.46378   12   19       12               2
   7         13     88.2431   0.56285    2   14        2               2
   8         12     88.2431   0.56285    5    8        5               2
   9         11     85.9744   0.67146    6   10        6               3
  10         10     83.0639   0.81080    7   13        7               3
  11          9     83.0639   0.81080    1    3        1               2
  12          8     81.4039   0.89027    2   17        2               5
  13          7     79.8185   0.96617    6   11        6               5
  14          6     78.7534   1.01716    4   12        4               3
  15          5     66.2112   1.61760    2    5        2               7
  16          4     62.0036   1.81904    1    6        1               7
  17          3     41.0474   2.82229    1    4        1              10
  18          2     40.1718   2.86421    2    7        2              10
  19          1      0.0000   4.78739    1    2        1              20

Key Results: Similarity level, Distance level

In these results, the data contain a total of 20 observations. In step 1, two clusters (observations 13 and 16 in the worksheet) are joined to form a new cluster. This step leaves 19 clusters in the data, with a similarity level of 96.6005 and a distance level of 0.16275. Although the similarity level is high and the distance level is low, the number of clusters is too high to be useful. At each subsequent step, as new clusters are formed, the similarity level decreases or stays the same, and the distance level increases or stays the same. At the final step, all the observations are joined into a single cluster.

In Minitab, to view the similarity levels in the dendrogram, hold your pointer over a horizontal line in the tree diagram.

Step 2: Determine the final groupings for your data

Use the similarity level for the clusters that are joined at each step to help determine the final groupings for the data. Look for an abrupt change in the similarity level between steps. The step that precedes the abrupt change in similarity may provide a good cut-off point for the final partition. For the final partition, the clusters should have a reasonably high similarity level. You should also use your practical knowledge of the data to determine the final groupings that make the most sense for your application.

For example, the following amalgamation table shows that the similarity level decreases by approximately 3 or less at each step through step 14. The similarity then decreases by about 12.5 at step 15 and by more than 20 (from 62.0036 to 41.0474) between steps 16 and 17, when the number of clusters changes from 4 to 3. These results indicate that 4 clusters may be sufficient for the final partition. If this grouping makes intuitive sense, then it is probably a good choice.

Amalgamation Steps

      Number of  Similarity  Distance  Clusters  New      Number of obs.
Step  clusters   level       level     joined    cluster  in new cluster
   1         19     96.6005   0.16275   13   16       13               2
   2         18     95.4642   0.21715   17   20       17               2
   3         17     95.2648   0.22669    6    9        6               2
   4         16     92.9178   0.33905   17   18       17               3
   5         15     90.5296   0.45339   11   15       11               2
   6         14     90.3124   0.46378   12   19       12               2
   7         13     88.2431   0.56285    2   14        2               2
   8         12     88.2431   0.56285    5    8        5               2
   9         11     85.9744   0.67146    6   10        6               3
  10         10     83.0639   0.81080    7   13        7               3
  11          9     83.0639   0.81080    1    3        1               2
  12          8     81.4039   0.89027    2   17        2               5
  13          7     79.8185   0.96617    6   11        6               5
  14          6     78.7534   1.01716    4   12        4               3
  15          5     66.2112   1.61760    2    5        2               7
  16          4     62.0036   1.81904    1    6        1               7
  17          3     41.0474   2.82229    1    4        1              10
  18          2     40.1718   2.86421    2    7        2              10
  19          1      0.0000   4.78739    1    2        1              20

Key Results: Similarity level, Number of clusters
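The search for an abrupt change can be sketched as a step-to-step difference of the similarity column. The following Python sketch is illustrative only: the function names and the THRESHOLD value are hypothetical choices, not Minitab settings, and the similarity values are copied from the amalgamation table.

```python
# Similarity levels for steps 1-19, copied from the amalgamation table.
SIMILARITY = [
    96.6005, 95.4642, 95.2648, 92.9178, 90.5296, 90.3124, 88.2431,
    88.2431, 85.9744, 83.0639, 83.0639, 81.4039, 79.8185, 78.7534,
    66.2112, 62.0036, 41.0474, 40.1718, 0.0000,
]
N_OBS = 20


def similarity_drops(similarity):
    """Return (step, drop) pairs; drop is the decrease from the previous step."""
    return [
        (step, round(similarity[step - 2] - similarity[step - 1], 4))
        for step in range(2, len(similarity) + 1)
    ]


def abrupt_steps(similarity, threshold):
    """Return the steps whose drop in similarity exceeds the threshold."""
    return [step for step, drop in similarity_drops(similarity) if drop > threshold]


# THRESHOLD is a hypothetical value chosen for illustration.
THRESHOLD = 10.0
for step in abrupt_steps(SIMILARITY, THRESHOLD):
    clusters_before = N_OBS - (step - 1)
    print(f"abrupt drop at step {step}; stopping before it leaves "
          f"{clusters_before} clusters")
```

With this threshold, the sketch flags the steps that precede 6, 4, and 2 clusters as candidate cut-off points. Choosing among them still depends on practical knowledge of the data.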

The decision about the final grouping is also called cutting the dendrogram. Cutting the dendrogram is similar to drawing a horizontal line across the dendrogram to specify the final grouping. For example, to cut this dendrogram into four clusters, imagine drawing a horizontal line about halfway down the vertical axis, at a similarity level between approximately 41 and 62, so that the line excludes the join at step 17 (similarity level 41.0474) but includes the join at step 16 (similarity level 62.0036).
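A cut can also be expressed as a count: every join whose similarity level is at or above the cut is kept, and each kept join reduces the number of clusters by one. The Python sketch below assumes that the similarity levels never increase from one step to the next, as in the amalgamation table. The function name clusters_at_cut is hypothetical.

```python
# Similarity levels for steps 1-19, copied from the amalgamation table.
SIMILARITY = [
    96.6005, 95.4642, 95.2648, 92.9178, 90.5296, 90.3124, 88.2431,
    88.2431, 85.9744, 83.0639, 83.0639, 81.4039, 79.8185, 78.7534,
    66.2112, 62.0036, 41.0474, 40.1718, 0.0000,
]
N_OBS = 20


def clusters_at_cut(similarity, n_obs, cut_level):
    """Number of clusters left when only joins at or above cut_level are kept."""
    joins_kept = sum(1 for s in similarity if s >= cut_level)
    return n_obs - joins_kept


for cut in (80.0, 50.0, 41.0):
    n = clusters_at_cut(SIMILARITY, N_OBS, cut)
    print(f"cut at similarity {cut:.1f} -> {n} clusters")
```

Any cut level above 41.0474 and at or below 62.0036 gives the same four clusters, which is why the exact height of the line within that range does not matter.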

Step 3: Examine the final partition

After you determine the final groupings in step 2, rerun the analysis and specify the number of clusters (or the similarity level) for the final partition. Minitab displays the final partition table, which shows the characteristics of each cluster in the final partition. For example, the average distance from the centroid provides a measure of the variability of the observations within each cluster.

Examine the clusters in the final partition to determine whether the grouping seems logical for your application. If you are still unsure, you can repeat the analysis and compare dendrograms for different final groupings to decide which final grouping is the most logical for your data.

Note: For more information on these statistics, go to Final partition.
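The per-cluster statistics can be illustrated with a short Python sketch. It assumes Euclidean distance and takes the within-cluster sum of squares to be the sum of squared distances from each observation to the cluster centroid. The worksheet data are not shown in this topic, so the four points below are hypothetical, and cluster_summary is an illustrative name.

```python
import math


def cluster_summary(points):
    """Summarize one cluster given as a list of equal-length coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    # The centroid is the mean of each coordinate over the cluster.
    centroid = [sum(p[i] for p in points) / n for i in range(dims)]
    distances = [math.dist(p, centroid) for p in points]
    return {
        "n_obs": n,
        "centroid": centroid,
        "within_ss": sum(d * d for d in distances),
        "avg_distance": sum(distances) / n,
        "max_distance": max(distances),
    }


# Hypothetical cluster of four observations on two standardized variables.
example = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
for name, value in cluster_summary(example).items():
    print(name, value)
```

A cluster with a small average distance and a small maximum distance from its centroid is compact. A large maximum relative to the average suggests that at least one observation sits far from the rest of its cluster.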

Final Partition

          Number of     Within cluster  Average distance  Maximum distance
          observations  sum of squares  from centroid     from centroid
Cluster1             7         3.25713          0.612540           1.12081
Cluster2             7         2.72247          0.581390           0.95186
Cluster3             3         0.55977          0.398964           0.54907
Cluster4             3         0.37116          0.326533           0.48848

Cluster Centroids

Variable    Cluster1  Cluster2  Cluster3  Cluster4  Grand centroid
Gender       0.97468  -0.97468   0.97468  -0.97468      -0.0000000
Height      -1.00352   1.01283  -0.37277   0.35105       0.0000000
Weight      -0.90672   0.93927  -0.86797   0.79203      -0.0000000
Handedness   0.63808   0.63808  -1.48885  -1.48885       0.0000000

Distances Between Cluster Centroids

          Cluster1  Cluster2  Cluster3  Cluster4
Cluster1   0.00000   3.35759   2.21882   3.61171
Cluster2   3.35759   0.00000   3.67557   2.23236
Cluster3   2.21882   3.67557   0.00000   2.66074
Cluster4   3.61171   2.23236   2.66074   0.00000

Key Results: Final partition, dendrogram
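The distances between cluster centroids can be recomputed from the cluster centroids table. The Python sketch below assumes Euclidean distance, which agrees with the tabulated distances to within rounding. The centroid values are copied from the table in the order Gender, Height, Weight, Handedness, and the function name is illustrative.

```python
import math

# Centroids copied from the Cluster Centroids table
# (Gender, Height, Weight, Handedness).
CENTROIDS = {
    "Cluster1": (0.97468, -1.00352, -0.90672, 0.63808),
    "Cluster2": (-0.97468, 1.01283, 0.93927, 0.63808),
    "Cluster3": (0.97468, -0.37277, -0.86797, -1.48885),
    "Cluster4": (-0.97468, 0.35105, 0.79203, -1.48885),
}


def centroid_distances(centroids):
    """Euclidean distance between every ordered pair of cluster centroids."""
    return {
        (a, b): math.dist(pa, pb)
        for a, pa in centroids.items()
        for b, pb in centroids.items()
    }


distances = centroid_distances(CENTROIDS)
for a in CENTROIDS:
    print(a, "  ".join(f"{distances[(a, b)]:.5f}" for b in CENTROIDS))
```

In this table, Cluster1 is closest to Cluster3 and Cluster2 is closest to Cluster4, which is consistent with those pairs being the next to join at steps 17 and 18.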

This dendrogram was created using a final partition of 4 clusters. The 4 clusters are complete at a similarity level of approximately 62 (step 16) and remain separate until the next join, at a similarity level of approximately 41 (step 17). The positions that follow refer to the dendrogram from left to right, not to the cluster numbers in the final partition table. The first cluster (far left) is composed of 7 observations (the observations in rows 1, 3, 6, 9, 10, 11, and 15 of the worksheet). The second cluster, directly to the right, is composed of 3 observations (the observations in rows 4, 12, and 19). The third cluster is composed of 7 observations (the observations in rows 2, 14, 17, 20, 18, 5, and 8). The fourth cluster, on the far right, is composed of 3 observations (the observations in rows 7, 13, and 16). If you cut the dendrogram higher, then there would be fewer final clusters, but their similarity level would be lower. If you cut the dendrogram lower, then the similarity level would be higher, but there would be more final clusters.
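The membership of each final cluster can be checked by replaying the joins in the amalgamation table. The Python sketch below assumes that a merged cluster keeps the smaller of the two cluster numbers, which matches the New cluster column in every row of the table. The function name partition_after is hypothetical.

```python
# (cluster, cluster) pairs from the "Clusters joined" column, steps 1-19.
JOINS = [
    (13, 16), (17, 20), (6, 9), (17, 18), (11, 15), (12, 19), (2, 14),
    (5, 8), (6, 10), (7, 13), (1, 3), (2, 17), (6, 11), (4, 12),
    (2, 5), (1, 6), (1, 4), (2, 7), (1, 2),
]
N_OBS = 20


def partition_after(joins, n_obs, n_clusters):
    """Replay joins until n_clusters remain; return {cluster number: sorted rows}."""
    clusters = {i: [i] for i in range(1, n_obs + 1)}
    for a, b in joins[: n_obs - n_clusters]:
        keep, drop = min(a, b), max(a, b)
        # Assumption: the merged cluster keeps the smaller cluster number,
        # as in the "New cluster" column of the table.
        clusters[keep] = sorted(clusters[keep] + clusters.pop(drop))
    return clusters


for label, rows in sorted(partition_after(JOINS, N_OBS, 4).items()):
    print(f"cluster {label}: {len(rows)} observations, rows {rows}")
```

Replaying the first 16 joins leaves four clusters whose rows match the groups described above. Changing the last argument shows how the membership changes for a higher or lower cut.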