Specify the data for your analysis, select the linkage and distance methods, indicate whether to standardize the variables, specify the final partition, and select the graph options.
In Variables or distance matrix, enter either the columns that contain measurement data or a stored distance matrix that contains the distances between all pairs of observations.
If you enter a stored distance matrix, Minitab cannot calculate statistics for the final partition.
For measurement data, you must have two or more numeric columns, and each column must represent a different measurement. Before you perform this analysis, delete from the worksheet any rows that have missing data. If you have many rows of data, you may prefer to subset your worksheet to exclude the rows that have missing values. For more information, go to Overview for Subset Worksheet.
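The row-exclusion step can be sketched in Python. This is a minimal sketch that assumes the worksheet has already been read into a list of rows, with a missing value stored as `None` or NaN. That encoding is an assumption for the illustration, because Minitab itself marks missing numeric values with `*`. The row values and the helper names `is_missing` and `complete_rows` are hypothetical.

```python
import math

# Hypothetical worksheet rows (one observation per row).
# Assumption: a missing value is stored as None or NaN.
rows = [
    [67, 155],
    [74, None],
    [68, 152],
    [70, float("nan")],
]


def is_missing(value):
    """Return True if a cell is empty (None) or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_rows(data):
    """Keep only the rows in which every measurement is present."""
    return [row for row in data if not any(is_missing(v) for v in row)]


clean = complete_rows(rows)
print(clean)  # [[67, 155], [68, 152]]
```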
You cannot enter a categorical variable for this analysis. If you have a categorical variable, you must either convert its text values to a numerical scale or perform a separate analysis for each level of the variable. For more information, go to Data considerations for Cluster Observations.
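Both options can be sketched in Python. The sketch assumes a hypothetical ordinal text variable whose levels have a natural order, so the analyst can supply the numeric scale explicitly. The mapping `SCALE` and the helpers `encode` and `split_by_level` are illustrative, not Minitab features. For unordered categories, any numbering imposes distances between levels that may have no real meaning, so the separate-analysis route may be the safer choice.

```python
# Hypothetical ordinal variable stored as text.
sizes = ["Small", "Large", "Medium", "Small"]

# The analyst supplies the order; the numbers should reflect a real ordering.
SCALE = {"Small": 1, "Medium": 2, "Large": 3}


def encode(values, scale):
    """Map each text value to its number on the supplied scale (KeyError if unmapped)."""
    return [scale[v] for v in values]


def split_by_level(rows, level_of):
    """Group rows by categorical level so that each group can be clustered separately."""
    groups = {}
    for row in rows:
        groups.setdefault(level_of(row), []).append(row)
    return groups


print(encode(sizes, SCALE))  # [1, 3, 2, 1]

# Alternative: one analysis per level. Hypothetical rows are (level, height, weight).
rows = [("A", 67, 155), ("B", 74, 193), ("A", 68, 152)]
groups = split_by_level(rows, lambda r: r[0])
for level, members in groups.items():
    # Drop the level itself; only the numeric columns go into each separate analysis.
    print(level, [r[1:] for r in members])
```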
For a stored distance matrix, the entry in row i and column j of distance matrix D is the distance between observations i and j. For information on creating and using stored matrices in Minitab, go to Overview for Matrices.
C1 Gender | C2 Height | C3 Weight | C4 Handedness |
---|---|---|---|
2 | 67 | 155 | 1 |
1 | 74 | 193 | 1 |
2 | 68 | 152 | 1 |
1 | 70 | 172 | 0 |
1 | 72 | 169 | 1 |
2 | 66 | 134 | 0 |
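To make the layout of a distance matrix concrete, the following Python sketch builds D from the Height and Weight columns of the example worksheet, using Euclidean distance. Restricting the input to those two columns and choosing Euclidean distance are assumptions made for the illustration; a stored matrix could be built from other distance measures. The helper names are hypothetical.

```python
import math

# Height (C2) and Weight (C3) from the example worksheet; index 0 is row 1.
heights = [67, 74, 68, 70, 72, 66]
weights = [155, 193, 152, 172, 169, 134]
observations = list(zip(heights, weights))


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(obs, dist=euclidean):
    """Return D, where D[i][j] is the distance between observations i and j."""
    n = len(obs)
    return [[dist(obs[i], obs[j]) for j in range(n)] for i in range(n)]


D = distance_matrix(observations)
# Rows 1 and 3 differ by 1 in Height and 3 in Weight: sqrt(1**2 + 3**2).
print(round(D[0][2], 3))  # 3.162
```

The matrix is symmetric with zeros on the diagonal. Because these variables are not standardized, differences in Weight dominate the distances.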
From Linkage method, select a method to specify how the distance between two clusters is defined. You might want to try several linkage methods to see which method provides the most useful results for your data.
For Cluster Observations, distance refers to the distance between observations, and linkage refers to the distance between the clusters of observations. For Cluster Variables, distance refers to the distance between variables, and linkage refers to the distance between the clusters of variables.
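The following Python sketch states common textbook definitions of four linkage methods for clusters of observations, each built on Euclidean distance between observations. It is an illustration only and has not been checked against the exact formulas Minitab uses; the function names are hypothetical.

```python
import math
from itertools import product


def single_linkage(c1, c2):
    """Distance between the closest pair of members, one from each cluster."""
    return min(math.dist(a, b) for a, b in product(c1, c2))


def complete_linkage(c1, c2):
    """Distance between the farthest pair of members."""
    return max(math.dist(a, b) for a, b in product(c1, c2))


def average_linkage(c1, c2):
    """Mean of all pairwise distances between members of the two clusters."""
    pairs = [math.dist(a, b) for a, b in product(c1, c2)]
    return sum(pairs) / len(pairs)


def centroid_linkage(c1, c2):
    """Distance between the two cluster means (centroids)."""
    def centroid(c):
        return [sum(coord) / len(c) for coord in zip(*c)]
    return math.dist(centroid(c1), centroid(c2))


# Two small one-dimensional clusters (illustrative values).
left = [(0.0,), (1.0,)]
right = [(4.0,), (6.0,)]
print(single_linkage(left, right))    # 3.0
print(complete_linkage(left, right))  # 6.0
print(average_linkage(left, right))   # 4.5
print(centroid_linkage(left, right))  # 4.5
```

Average and centroid linkage happen to agree for these points; in general they do not.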
If you selected Average, Centroid, Median, or Ward as the linkage method, you should usually use one of the squared distance measures.
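This help text does not say why squared distances suit these methods. One standard explanation, at least for centroid linkage, involves the Lance-Williams update that hierarchical clustering algorithms commonly use to compute the distance from a cluster k to a newly merged cluster i+j: the centroid-linkage update reproduces the distance between centroids exactly only when the distances are squared Euclidean. The Python sketch below checks this on made-up points. It illustrates the general technique and is not a description of Minitab's internals.

```python
import math


def sq_euclidean(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def lance_williams_centroid(d_ki, d_kj, d_ij, n_i, n_j):
    """Centroid-linkage update: distance from cluster k to the merged cluster i+j."""
    n = n_i + n_j
    return (n_i / n) * d_ki + (n_j / n) * d_kj - (n_i * n_j / n**2) * d_ij


# Made-up clusters of 2-D points.
ci = [(0.0, 0.0), (2.0, 0.0)]
cj = [(4.0, 2.0)]
ck = [(1.0, 5.0), (3.0, 5.0)]
gi, gj, gk = centroid(ci), centroid(cj), centroid(ck)
merged = centroid(ci + cj)

# With squared Euclidean distances, the update matches the direct computation.
updated_sq = lance_williams_centroid(sq_euclidean(gk, gi), sq_euclidean(gk, gj),
                                     sq_euclidean(gi, gj), len(ci), len(cj))
direct_sq = sq_euclidean(gk, merged)
print(round(updated_sq, 3), round(direct_sq, 3))  # 18.778 18.778

# With plain Euclidean distances, the same update misses the true centroid distance.
updated_plain = lance_williams_centroid(math.dist(gk, gi), math.dist(gk, gj),
                                        math.dist(gi, gj), len(ci), len(cj))
direct_plain = math.dist(gk, merged)
print(round(updated_plain, 3), round(direct_plain, 3))  # 3.8 4.333
```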
Select Standardize variables to have Minitab weight all the variables equally. Standardizing is good practice in most cases and is particularly important when the variables use different scales. Suppose variable A is measured in dollars on a scale from $0 to $10,000,000, and variable B is a ratio on a scale from 0.0 to 1.0. Without standardization, the cluster observations procedure places much more weight on variable A than on variable B simply because the values of A span a far larger range, which is probably not what you want. Standardizing the variables prevents this.
When you standardize the variables, Minitab makes all the means equal to 0 and all the variances equal to 1. To make only the variances equal, do not select the standardize option, but instead select either Pearson or Squared Pearson under Distance measure.
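A small Python sketch of both options, using the Height and Weight columns of the example worksheet. It assumes that the Pearson distance is the square root of the sum of squared differences, each divided by that variable's variance, and it uses the sample variance throughout; check Minitab's distance-measure documentation before relying on either detail. The helper names are hypothetical. Because the difference between two observations does not depend on the column mean, the Euclidean distance between standardized observations equals this Pearson distance on the raw data.

```python
import math
import statistics

# Height (C2) and Weight (C3) from the example worksheet.
heights = [67, 74, 68, 70, 72, 66]
weights = [155, 193, 152, 172, 169, 134]
columns = [heights, weights]


def standardize(column):
    """Shift to mean 0 and scale to sample variance 1."""
    mean, sd = statistics.mean(column), statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def pearson_distance(a, b, variances):
    """Assumed definition: sqrt of summed squared differences, each divided by its variable's variance."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))


z_columns = [standardize(c) for c in columns]
for z in z_columns:
    print(f"mean ~ 0: {abs(statistics.mean(z)) < 1e-9}, variance {statistics.variance(z):.6f}")

# Rows as observations: raw values and standardized values.
raw_rows = list(zip(*columns))
z_rows = list(zip(*z_columns))
variances = [statistics.variance(c) for c in columns]

# Distance between the first two observations, computed both ways.
d_pearson = pearson_distance(raw_rows[0], raw_rows[1], variances)
d_standardized = math.dist(z_rows[0], z_rows[1])
print(round(d_pearson, 6), round(d_standardized, 6))
```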
When you specify the final partition, whether by the number of clusters or by the similarity level, apply the criterion flexibly for the best results. For example, if you define the final partition by the number of clusters, also consider how the similarity level changes from step to step. A sharp drop in similarity at the step where two particular clusters are joined might prompt you to set the final partition at the step before that join. Conversely, if you define the final partition by the similarity level, you might find that the similarity level changes little over a range of cluster counts; for simplicity, you might then choose the step with the fewest clusters.
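Both heuristics can be sketched in Python. The similarity values below are made-up numbers for six observations (one value per joining step, so five steps), and the helper names are hypothetical; with real results you would take the similarity levels from the amalgamation steps that Minitab reports.

```python
def largest_drop(n_obs, similarities):
    """Find the joining step with the sharpest fall in similarity.

    similarities[s - 1] is the similarity level at step s (step 1 joins the first pair).
    Returns (step, clusters_to_keep): stopping just before that step keeps
    n_obs - step + 1 clusters.
    """
    if len(similarities) < 2:
        raise ValueError("need at least two steps to compare")
    best_step, best_drop = None, float("-inf")
    for step in range(2, len(similarities) + 1):
        drop = similarities[step - 2] - similarities[step - 1]
        if drop > best_drop:
            best_step, best_drop = step, drop
    return best_step, n_obs - best_step + 1


def fewest_clusters_at_or_above(n_obs, similarities, threshold):
    """Fewest clusters reachable while every join so far stays at or above threshold."""
    steps = 0
    for level in similarities:
        if level < threshold:
            break
        steps += 1
    return n_obs - steps


# Made-up similarity levels for 6 observations (5 joining steps).
levels = [98.0, 95.5, 93.0, 90.1, 60.2]
print(largest_drop(6, levels))                       # (5, 2)
print(fewest_clusters_at_or_above(6, levels, 90.0))  # 2
print(fewest_clusters_at_or_above(6, levels, 94.0))  # 4
```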
If you do not know what value to enter to specify the final partition, first perform the analysis using the default setting (1 cluster in the final partition). Minitab displays the results for all possible numbers of clusters. Use the results to determine a value to enter for the final partition. Then repeat the analysis and specify the final partition that you determined. For more information, go to Determine the final grouping of clusters.
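The default run can be imitated with a small Python sketch. It joins the six observations of the example worksheet one step at a time, records the number of clusters and a similarity level after each join, and then reads off the partition for a chosen number of clusters. Several choices are assumptions made for the illustration: single linkage, unstandardized Euclidean distance on Height and Weight only, and a similarity level of 100 * (1 - d / d_max), where d is the joining distance and d_max is the largest distance between any two observations. The helper names are hypothetical.

```python
import math
from itertools import combinations

# Height (C2) and Weight (C3) from the example worksheet; index 0 is row 1.
heights = [67, 74, 68, 70, 72, 66]
weights = [155, 193, 152, 172, 169, 134]
points = list(zip(heights, weights))


def single_linkage(points, c1, c2):
    """Distance between the closest members of two clusters (lists of indices)."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)


def amalgamate(points):
    """Join clusters until one remains.

    Returns one (clusters_left, similarity, groups) tuple per joining step.
    """
    d_max = max(math.dist(p, q) for p, q in combinations(points, 2))
    clusters = [[i] for i in range(len(points))]
    steps = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(points, clusters[pair[0]], clusters[pair[1]]),
        )
        d_ab = single_linkage(points, clusters[a], clusters[b])
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        # Assumed similarity scale: 100 at distance 0, 0 at the largest pairwise distance.
        similarity = 100 * (1 - d_ab / d_max)
        steps.append((len(clusters), similarity, sorted(clusters)))
    return steps


def final_partition(steps, k):
    """Return the groups recorded when exactly k clusters remain."""
    for clusters_left, _, groups in steps:
        if clusters_left == k:
            return groups
    raise ValueError("k must be between 1 and the number of observations minus 1")


steps = amalgamate(points)  # like the default run: every step down to 1 cluster
for clusters_left, similarity, _ in steps:
    print(f"{clusters_left} clusters, similarity {similarity:.2f}")
print(final_partition(steps, 2))  # [[0, 2, 3, 4, 5], [1]]
```

Observation indices are 0-based, so index 0 is the first row of the worksheet. Under these assumptions, stopping at two clusters separates the second observation (Height 74, Weight 193) from the rest.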
Select the dendrogram option to display a tree diagram (a dendrogram) that shows how clusters are formed at each step of the amalgamation procedure. The dendrogram lets you view the similarity (or distance) values for the clusters at each step.
To change the default display of the dendrogram, click Customize.