In Variables, enter the columns that contain the measurement data.
You must have two or more numeric columns, with each column representing a different measurement. You must delete rows with missing data from the worksheet before using this procedure. When you have a large data set with many missing values, it may be more convenient to subset your worksheet to exclude the rows with missing values, rather than deleting each row manually. For more information, go to Overview for Subset Worksheet.
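Excluding incomplete rows means keeping only the rows in which every column has a value. The Python sketch below illustrates that filter on plain lists. It is not how Minitab's Subset Worksheet works internally, and the convention that `None` or `float("nan")` marks a missing cell is an assumption made for illustration.

```python
import math


def complete_rows(rows):
    """Keep only rows in which every value is present (not None and not NaN)."""
    def present(value):
        return value is not None and not (isinstance(value, float) and math.isnan(value))

    return [row for row in rows if all(present(v) for v in row)]


# Hypothetical worksheet rows; None and float("nan") stand in for missing cells.
rows = [
    [150, 13.5, 50400200, 18],
    [98, None, 45665230, 12],
    [79, 12.0, float("nan"), 7],
    [122, 11.4, 42560000, 13],
]
print(complete_rows(rows))  # [[150, 13.5, 50400200, 18], [122, 11.4, 42560000, 13]]
```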
C1 | C2 | C3 | C4 | C5 |
---|---|---|---|---|
Clients | Rate of Return | Sales | Years | Initial |
150 | 13.5 | 50400200 | 18 | 1 |
98 | 11.7 | 45665230 | 12 | 2 |
79 | 12.0 | 19800800 | 7 | 0 |
122 | 11.4 | 42560000 | 13 | 0 |
143 | 12.4 | 47635980 | 15 | 0 |
49 | 9.8 | 22342600 | 6 | 3 |
Indicate the starting cluster designations. K-means procedures work best when you provide good starting points for clusters. Base the initial clustering on practical and/or engineering knowledge about the observations being clustered. For more information, go to How the cluster K-means process starts.
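As a rough illustration of how starting cluster designations can seed the clusters, the Python sketch below reads the Initial column of the example worksheet, assuming that 0 means "no starting cluster" and that each nonzero value is a starting cluster number. It averages the rows that share a number to form that cluster's starting centroid. This is a generic K-means seeding step under those assumptions, not Minitab's documented procedure, and the function name `initial_centroids` is hypothetical.

```python
def initial_centroids(rows, initial):
    """Average the rows that share each nonzero label; label 0 means 'not a seed'."""
    groups = {}
    for row, label in zip(rows, initial):
        if label != 0:
            groups.setdefault(label, []).append(row)
    # One centroid per starting cluster: the column-wise mean of its seed rows.
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in sorted(groups.items())
    }


# Measurement columns (Clients, Rate of Return, Sales, Years) and the Initial column.
rows = [
    [150, 13.5, 50400200, 18],
    [98, 11.7, 45665230, 12],
    [79, 12.0, 19800800, 7],
    [122, 11.4, 42560000, 13],
    [143, 12.4, 47635980, 15],
    [49, 9.8, 22342600, 6],
]
initial = [1, 2, 0, 0, 0, 3]
print(initial_centroids(rows, initial))
```

Here each starting cluster has a single seed row, so each centroid is just that row; if several rows shared a number, their column means would be used.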
Select Standardize variables to have Minitab weight all the variables equally. Standardizing is good practice in most cases, and is particularly important when the variables use different scales. Suppose variable A is measured in dollars on a scale from $0 to $10,000,000, and variable B is a ratio on a scale from 0.0 to 1.0. If the variables are not standardized, the cluster procedure places much more weight on variable A than on variable B because A's values are so much larger, which is probably not the desired result. In this case, the variables should be standardized.
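To make the scale effect concrete, the Python sketch below splits the squared Euclidean distance between the first two rows of the example worksheet (Clients, Rate of Return, Sales, Years) into per-variable terms. Euclidean distance is assumed here for illustration. On the raw values, the Sales term dwarfs the other three combined.

```python
def squared_terms(x, y):
    """Per-variable terms of the squared Euclidean distance between rows x and y."""
    return [(a - b) ** 2 for a, b in zip(x, y)]


# First two rows of the example worksheet: Clients, Rate of Return, Sales, Years.
x = [150, 13.5, 50400200, 18]
y = [98, 11.7, 45665230, 12]
terms = squared_terms(x, y)

# Fraction of the squared distance contributed by everything except Sales.
other_share = (terms[0] + terms[1] + terms[3]) / sum(terms)
print(f"{other_share:.1e}")  # → 1.2e-10
```

Without standardization, a difference of 52 clients or 6 years barely registers next to a sales difference in the millions.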
Minitab standardizes each variable by subtracting its mean and dividing by its standard deviation before calculating the distance matrix. When you standardize variables, the grand centroid (the centroid of all observations) is 0 for every variable.
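The standardization step and its effect on the grand centroid can be sketched with Python's standard `statistics` module. Using the sample standard deviation (`statistics.stdev`, divisor n - 1) is an assumption; whichever divisor is used, each standardized column has mean 0, up to floating-point rounding.

```python
from statistics import mean, stdev


def standardize(rows):
    """Center each column on its mean and scale it by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Measurement columns from the example worksheet: Clients, Rate of Return, Sales, Years.
rows = [
    [150, 13.5, 50400200, 18],
    [98, 11.7, 45665230, 12],
    [79, 12.0, 19800800, 7],
    [122, 11.4, 42560000, 13],
    [143, 12.4, 47635980, 15],
    [49, 9.8, 22342600, 6],
]
z = standardize(rows)

# Grand centroid: the mean of every standardized column across all observations.
grand_centroid = [mean(col) for col in zip(*z)]
print([round(c, 12) for c in grand_centroid])  # each entry is 0 up to rounding
```

After this step every column has mean 0 and standard deviation 1, so no single variable dominates the distances.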