Interpret all statistics and graphs for Cluster K-Means

Find definitions and interpretation guidance for every statistic and graph that is provided with the cluster k-means analysis.

In This Topic

Number of observations
Within cluster sum of squares
Average distance from centroid
Maximum distance from centroid
Cluster centroid
Grand centroid
Distances between cluster centroids

Number of observations

The number of observations in each cluster in the final partition.

Interpretation

Examine the number of observations in each cluster when you interpret the measures of variability, such as the average distance and the within-cluster sum of squares. The variability of a cluster can depend on how many observations the cluster contains. For example, the within-cluster sum of squares becomes larger as more observations are added.

Examine clusters that have substantially fewer observations than other clusters. Clusters that have very few observations may contain outliers or unusual observations with unique characteristics.
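As a minimal sketch of this check, the following Python code counts the observations in each cluster and flags unusually small clusters. The labels are made-up values for illustration, and the cutoff of half the median cluster size is an arbitrary assumption, not a rule that Minitab applies.

```python
import numpy as np

# Hypothetical cluster labels for 10 observations (illustrative only).
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2])

# Count the observations assigned to each cluster.
cluster_ids, counts = np.unique(labels, return_counts=True)
counts_by_cluster = dict(zip(cluster_ids.tolist(), counts.tolist()))

# Flag clusters much smaller than the rest. The threshold (half the
# median cluster size) is an arbitrary choice for this sketch.
threshold = 0.5 * np.median(counts)
small = [c for c, n in counts_by_cluster.items() if n < threshold]

print(counts_by_cluster)
print("Clusters to examine:", small)
```

A flagged cluster is only a prompt to look at its observations more closely; a small cluster is not necessarily a problem.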

Within cluster sum of squares

The sum of the squared deviations between each observation in the cluster and the cluster centroid.

Interpretation

The within-cluster sum of squares is a measure of the variability of the observations within each cluster. In general, a cluster that has a small sum of squares is more compact than a cluster that has a large sum of squares. Clusters that have higher values exhibit greater variability of the observations within the cluster.

However, similar to sums of squares and mean squares in ANOVA, the within-cluster sum of squares is influenced by the number of observations. As the number of observations increases, the sum of squares becomes larger. Therefore, the within-cluster sum of squares is often not directly comparable across clusters with different numbers of observations. To compare the within-cluster variability of different clusters, use the average distance from centroid instead.
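The effect of cluster size can be illustrated with a short Python sketch. It assumes Euclidean distance and uses made-up data; the function name within_cluster_ss is hypothetical and is not part of Minitab. Both clusters have every observation exactly 1 unit from the centroid, yet the larger cluster has the larger sum of squares.

```python
import numpy as np

def within_cluster_ss(X, labels):
    """Return {cluster: sum of squared Euclidean distances to its centroid}."""
    result = {}
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        result[int(c)] = float(((members - centroid) ** 2).sum())
    return result

# Illustrative data: cluster 0 has four observations, cluster 1 has two.
# In both clusters, every observation is 1 unit from the centroid.
X = np.array([
    [0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [2.0, 0.0],  # cluster 0
    [10.0, 10.0], [12.0, 10.0],                      # cluster 1
])
labels = np.array([0, 0, 0, 0, 1, 1])

wcss = within_cluster_ss(X, labels)
print(wcss)
```

Because the spread is identical and only the cluster sizes differ, the difference in the sums of squares here reflects the number of observations alone.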

Average distance from centroid

For each cluster, the average of the distances from the observations in the cluster to the cluster centroid.

Interpretation

The average distance from observations to the cluster centroid is a measure of the variability of the observations within each cluster. In general, a cluster that has a smaller average distance is more compact than a cluster that has a larger average distance. Clusters that have higher values exhibit greater variability of the observations within the cluster.
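A minimal Python sketch of this statistic follows, assuming Euclidean distance. The data and the function name average_distance are hypothetical and chosen only for illustration.

```python
import numpy as np

def average_distance(X, labels):
    """Return {cluster: mean Euclidean distance from members to centroid}."""
    result = {}
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        distances = np.linalg.norm(members - centroid, axis=1)
        result[int(c)] = float(distances.mean())
    return result

# Illustrative data: cluster 0 is larger but tighter than cluster 1.
X = np.array([
    [0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [2.0, 0.0],  # cluster 0
    [10.0, 10.0], [14.0, 10.0],                      # cluster 1
])
labels = np.array([0, 0, 0, 0, 1, 1])

avg = average_distance(X, labels)
print(avg)
```

In this example, cluster 1 has the larger average distance even though it has fewer observations, so it is the less compact of the two.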

Maximum distance from centroid

For each cluster, the largest of the distances from the observations in the cluster to the cluster centroid.

Interpretation

The maximum distance from observations to the cluster centroid is a measure of the variability of the observations within each cluster. A maximum distance that is large, especially relative to the average distance, indicates that at least one observation in the cluster lies much farther from the cluster centroid than the other observations do.
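The comparison of the maximum distance with the average distance can be sketched in Python as follows. The sketch assumes Euclidean distance and uses a single made-up cluster in which one observation sits far from the others. The ratio of maximum to average is an informal aid for this example, not a statistic that Minitab reports.

```python
import numpy as np

# Illustrative cluster: four nearby observations and one distant one.
cluster = np.array([
    [0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0],
    [11.0, 11.0],
])

centroid = cluster.mean(axis=0)
distances = np.linalg.norm(cluster - centroid, axis=1)

max_distance = float(distances.max())
avg_distance = float(distances.mean())
farthest_index = int(distances.argmax())  # row of the farthest observation

# Informal ratio: values well above 1 suggest an observation far from
# the centroid relative to the rest of the cluster.
ratio = max_distance / avg_distance

print(max_distance, avg_distance, farthest_index, ratio)
```

Note that the distant observation also pulls the centroid toward itself, which increases the distances of the other observations as well.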

Cluster centroid

The middle of a cluster. A centroid is a vector that contains one number for each variable, where each number is the mean of a variable for the observations in that cluster. The centroid can be thought of as the multi-dimensional average of the cluster.

Interpretation

Use the cluster centroid as a general measure of cluster location and to help interpret each cluster. Each centroid can be seen as representing the "average observation" within a cluster across all the variables in the analysis.

Minitab calculates the distances between the centroids of the clusters that are included in the final partition. For each cluster, Minitab also calculates various distance measures between the cluster centroid and the observations within the cluster. For more information, see the topic for each distance measure.
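To make the definition of a centroid concrete, the following Python sketch computes the mean of each variable within each cluster. The two-variable data set and the cluster labels are made up for illustration.

```python
import numpy as np

# Illustrative data: 5 observations on 2 variables.
X = np.array([
    [1.0, 10.0], [3.0, 14.0],              # cluster 0
    [8.0, 2.0], [10.0, 4.0], [12.0, 6.0],  # cluster 1
])
labels = np.array([0, 0, 1, 1, 1])

# Each centroid holds one mean per variable, computed over the
# observations assigned to that cluster.
centroids = {int(c): X[labels == c].mean(axis=0) for c in np.unique(labels)}

for c, centroid in centroids.items():
    print(c, centroid)
```

Each resulting vector has one entry per variable, so a centroid can be read in the same units as the original data (or in standardized units, if the variables were standardized before clustering).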

Grand centroid

The grand centroid is a vector of variable means for all observations.
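As a brief Python sketch with made-up data, the grand centroid is the mean of each variable over all observations, ignoring cluster membership. It also equals the average of the cluster centroids weighted by cluster size, which the code verifies.

```python
import numpy as np

# Illustrative data: 5 observations on 2 variables, in 2 clusters.
X = np.array([
    [1.0, 10.0], [3.0, 14.0],              # cluster 0
    [8.0, 2.0], [10.0, 4.0], [12.0, 6.0],  # cluster 1
])
labels = np.array([0, 0, 1, 1, 1])

# Grand centroid: mean of each variable over every observation.
grand_centroid = X.mean(axis=0)

# Equivalent view: size-weighted average of the cluster centroids.
cluster_ids, counts = np.unique(labels, return_counts=True)
centroids = np.array([X[labels == c].mean(axis=0) for c in cluster_ids])
weighted = (centroids * counts[:, None]).sum(axis=0) / counts.sum()

print(grand_centroid, weighted)
```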

Distances between cluster centroids

The distances between cluster centroids measure how far apart the centroids of the clusters in the final partition are from one another.

Interpretation

Although the distance values are not very informative by themselves, you can compare the distances to see how different the clusters are from each other. A larger distance generally indicates a greater difference between the clusters.
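The comparison of centroid distances can be sketched in Python as a symmetric matrix with one row and one column per cluster. The sketch assumes Euclidean distance, and the three centroids are made-up values.

```python
import numpy as np

# Illustrative centroids for 3 clusters on 2 variables.
centroids = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
])

# Pairwise differences via broadcasting, shape (3, 3, 2), then the
# Euclidean norm over the variable axis gives a (3, 3) distance matrix.
diff = centroids[:, None, :] - centroids[None, :, :]
distance_matrix = np.sqrt((diff ** 2).sum(axis=-1))

print(distance_matrix)
```

Entry (i, j) is the distance between the centroids of clusters i and j. The diagonal is zero, and in this example clusters 0 and 2 are the most different pair.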