Interpret all statistics and graphs for Discriminant Analysis

Find definitions and interpretation guidance for every statistic and graph that is provided with discriminant analysis.

True group

The actual group to which an observation belongs. The true group is determined by the values in the grouping column of the worksheet.

Interpretation

To assess the classification of the observations into each group, compare the groups that the observations were put into with their true groups.

Summary of Classification


                      True Group
Put into Group        1      2      3
1                    59      5      0
2                     1     53      3
3                     0      2     57
Total N              60     60     60
N correct            59     53     57
Proportion        0.983  0.883  0.950

Column 2 of this Summary of classification table shows that 53 observations from Group 2 were correctly assigned to Group 2. However, 5 observations from Group 2 were instead put into Group 1, and 2 observations from Group 2 were put into Group 3. Therefore, 7 of the observations from Group 2 were incorrectly classified into other groups.

Summary of Misclassified Observations

Observation  True Group  Pred Group  Group  Squared Distance  Probability
4**                   1           2      1             3.524        0.438
                                         2             3.028        0.562
                                         3            25.579        0.000
65**                  2           1      1             2.764        0.677
                                         2             4.244        0.323
                                         3            29.419        0.000
71**                  2           1      1             3.357        0.592
                                         2             4.101        0.408
                                         3            27.097        0.000
78**                  2           1      1             2.327        0.775
                                         2             4.801        0.225
                                         3            29.695        0.000
79**                  2           1      1             1.528        0.891
                                         2             5.732        0.109
                                         3            32.524        0.000
100**                 2           1      1             5.016        0.878
                                         2             8.962        0.122
                                         3            38.213        0.000
107**                 2           3      1           39.0226        0.000
                                         2            7.3604        0.032
                                         3            0.5249        0.968
116**                 2           3      1            31.898        0.000
                                         2             7.913        0.285
                                         3             6.070        0.715
123**                 3           2      1            30.164        0.000
                                         2             5.662        0.823
                                         3             8.738        0.177
124**                 3           2      1            26.328        0.000
                                         2             4.054        0.918
                                         3             8.887        0.082
125**                 3           2      1            28.542        0.000
                                         2             3.059        0.521
                                         3             3.230        0.479

Row 1 of this Summary of Misclassified Observations table shows that observation 4 was predicted to belong to Group 2, but actually belongs to Group 1.
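
The Probability column can be related to the Squared Distance column. The following is a minimal sketch, assuming a linear discriminant function with equal prior probabilities for the groups (an assumption; the output above does not state the priors). Under that assumption, the probability for each group is proportional to exp(-d²/2), where d² is the squared distance. The function name posterior_from_squared_distances is hypothetical and is not part of Minitab.

```python
import math


def posterior_from_squared_distances(sq_distances):
    """Convert squared distances to probabilities, assuming equal priors."""
    # Each group's weight is exp(-d^2 / 2); normalize so the weights sum to 1.
    weights = [math.exp(-0.5 * d) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]


# Squared distances of observation 4 from groups 1, 2, and 3 (from the table).
obs4_sq_distances = [3.524, 3.028, 25.579]
probs = posterior_from_squared_distances(obs4_sq_distances)

# The predicted group is the one with the smallest squared distance.
pred_group = obs4_sq_distances.index(min(obs4_sq_distances)) + 1

print([round(p, 3) for p in probs], pred_group)
```

For observation 4, this sketch reproduces the probabilities 0.438, 0.562, and 0.000 in the table and predicts Group 2.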

Put into Group

The group to which an observation is predicted to belong, based on the discriminant analysis.

Interpretation

To assess the classification of the observations into each group, compare the groups that the observations were put into with their true groups. For example, row 2 of the following Summary of classification table shows that a total of 1 + 53 + 3 = 57 observations were put into Group 2. Of those 57 observations, 53 observations were correctly assigned to Group 2. However, 1 observation that was put into Group 2 was actually from Group 1, and 3 observations that were put into Group 2 were actually from Group 3. Therefore, 4 of the observations predicted to belong to Group 2 were actually from other groups.

Summary of Classification


                      True Group
Put into Group        1      2      3
1                    59      5      0
2                     1     53      3
3                     0      2     57
Total N              60     60     60
N correct            59     53     57
Proportion        0.983  0.883  0.950
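
The row and column readings of this table can be checked with a few lines of arithmetic. The following sketch copies the counts from the table above; the variable names are illustrative only.

```python
# Rows are "Put into Group" 1-3; columns are "True Group" 1-3 (from the table).
table = [
    [59, 5, 0],
    [1, 53, 3],
    [0, 2, 57],
]
groups = range(len(table))

# Row totals: how many observations were put into each group.
put_into_totals = [sum(row) for row in table]

# Column totals: how many observations truly belong to each group (Total N).
total_n = [sum(table[r][c] for r in groups) for c in groups]

# Diagonal entries: observations whose predicted group matches the true group.
n_correct = [table[g][g] for g in groups]

# Proportion of each true group that was correctly placed.
proportion = [round(n_correct[g] / total_n[g], 3) for g in groups]

print(put_into_totals)  # [64, 57, 59]
print(total_n)          # [60, 60, 60]
print(n_correct)        # [59, 53, 57]
print(proportion)       # [0.983, 0.883, 0.95]
```

The second row total, 57, is the number of observations put into Group 2. The second column total, 60, is the number of observations whose true group is Group 2.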

Total N

The total number of observations in each true group.

N correct

The number of observations correctly placed into each true group. Minitab displays the N correct for each true group and the total N correct for all the groups.

Interpretation

Use the N correct value to determine how many observations in each true group are predicted to belong to that group. For example, for Group 1, suppose the N correct value is 52 and the Total N value is 60. This indicates that 60 observations are identified as belonging to Group 1 based on the values in the grouping column of the worksheet. Of those 60 observations, 52 are predicted to belong to Group 1 based on the discriminant function used for the analysis. Therefore, the number of observations that are correctly placed into Group 1 is 52.

Proportion

The proportion of observations correctly placed in each true group.

Interpretation

Use the proportion of observations correctly placed in each group to evaluate how well your observations are classified. For example, the proportions in the Summary of classification table indicate the following:

  • 98.3% of the observations in group 1 are correctly placed.
  • 88.3% of the observations in group 2 are correctly placed.
  • 95% of the observations in group 3 are correctly placed.

Therefore, observations from group 2 are misclassified most often.

Summary of Classification


                      True Group
Put into Group        1      2      3
1                    59      5      0
2                     1     53      3
3                     0      2     57
Total N              60     60     60
N correct            59     53     57
Proportion        0.983  0.883  0.950

N

The number of non-missing values in the data set. N equals the total number of observations in all of the groups.

Proportion Correct

The proportion of correct classifications for all groups. This value equals the number of correctly placed observations (N Correct) divided by the total number of observations (N).
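
As a worked check of this definition, the following sketch uses the N correct and Total N values from the Summary of Classification example shown earlier. The variable names are illustrative only.

```python
# N correct and Total N for true groups 1-3, from the Summary of Classification example.
n_correct_by_group = [59, 53, 57]
total_n_by_group = [60, 60, 60]

n = sum(total_n_by_group)            # N: all observations in all groups
n_correct = sum(n_correct_by_group)  # N Correct: all correctly placed observations
proportion_correct = n_correct / n

print(n, n_correct, round(proportion_correct, 3))  # 180 169 0.939
```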

Squared Distance Between Groups

The squared distance from one group center (mean) to another group center (mean). An observation is classified into a group if the squared distance (also called the Mahalanobis distance) of the observation to the group center (mean) is the minimum.

Note

If you use the quadratic function, Minitab displays the Generalized Squared Distance table. For more information on how squared distances are calculated for each function, go to Distance and discriminant functions for Discriminant Analysis.

Interpretation

Although the distance values are not very informative by themselves, you can compare the distances to see how different the groups are. For example, the following results indicate that the greatest distance is between groups 1 and 3 (48.0911). The distance between groups 1 and 2 is 12.9853, and the distance between groups 2 and 3 is 11.3197.

Squared Distance Between Groups

         1        2        3
1   0.0000  12.9853  48.0911
2  12.9853   0.0000  11.3197
3  48.0911  11.3197   0.0000

Linear Discriminant Function for Groups

The linear discriminant function for groups indicates the linear equation associated with each group. The linear discriminant scores for each group correspond to the regression coefficients in multiple regression analysis.

Interpretation

Use the linear discriminant function for groups to determine how the predictor variables differentiate between the groups. For example, when you have three groups, Minitab estimates a function for discriminating between the following groups:
  • Group 1 and groups 2 and 3
  • Group 2 and groups 1 and 3
  • Group 3 and groups 1 and 2

The groups with the largest linear discriminant function, or regression coefficients, contribute most to the classification of observations. For example, in the following results, group 1 has the largest linear discriminant function (17.4) for test scores, which indicates that test scores for group 1 contribute more than those of group 2 or group 3 to the classification of group membership. Group 3 has the largest linear discriminant function for motivation, which indicates that the motivation scores of group 3 contribute more than those of group 1 or group 2 to the classification of group membership.

Linear Discriminant Function for Groups

                  1        2        3
Constant    -9707.5  -9269.0  -8921.1
Test Score     17.4     17.0     16.7
Motivation     -3.2     -3.7     -4.3
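
To illustrate how these functions can be applied, the following sketch evaluates each group's linear equation for one observation and picks the group with the largest score. It assumes the usual linear discriminant rule (largest score wins). The observation values and the function name discriminant_scores are hypothetical. The coefficients are copied from the table above, where they are rounded to one decimal place, so the scores are approximate and can misrank groups for observations near a boundary.

```python
# Coefficients copied from the Linear Discriminant Function for Groups table.
# They are rounded in the displayed output, so computed scores are approximate.
functions = {
    1: {"Constant": -9707.5, "Test Score": 17.4, "Motivation": -3.2},
    2: {"Constant": -9269.0, "Test Score": 17.0, "Motivation": -3.7},
    3: {"Constant": -8921.1, "Test Score": 16.7, "Motivation": -4.3},
}


def discriminant_scores(test_score, motivation):
    """Evaluate each group's linear equation for one observation."""
    return {
        group: f["Constant"]
        + f["Test Score"] * test_score
        + f["Motivation"] * motivation
        for group, f in functions.items()
    }


# Hypothetical observation (not taken from the worksheet).
scores = discriminant_scores(test_score=1127.4, motivation=53.6)

# Assumed rule: assign the observation to the group with the largest score.
predicted_group = max(scores, key=scores.get)

print({g: round(s, 2) for g, s in scores.items()}, predicted_group)
```

For this hypothetical observation, Group 1 has the largest score, so the sketch assigns the observation to Group 1.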

Pooled mean

The pooled mean is the weighted average of the means of each true group. To display the pooled mean, you must click Options and select Above plus mean, std. dev., and covariance summary when you perform the analysis.

Interpretation

Use the pooled mean to describe the center of all the observations in the data. For example, in the following results, the overall test score mean for all the groups is 1102.1.

Group Means



                             Means for Group
Variable    Pooled Mean        1        2        3
Test Score       1102.1   1127.4   1100.6   1078.3
Motivation       47.056   53.600   47.417   40.150
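
The pooled mean in this table can be reproduced from the group means. The following sketch assumes that each group mean is weighted by its group size, and that each group has 60 observations, as in the Summary of Classification example. The function name pooled_mean is hypothetical.

```python
# Group sizes (Total N) are assumed to be 60 per group, as in the
# Summary of Classification example. Group means are from the table above.
group_sizes = [60, 60, 60]
test_score_means = [1127.4, 1100.6, 1078.3]
motivation_means = [53.600, 47.417, 40.150]


def pooled_mean(means, sizes):
    """Average of the group means, weighted by group size."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


print(round(pooled_mean(test_score_means, group_sizes), 1))  # 1102.1
print(round(pooled_mean(motivation_means, group_sizes), 3))  # 47.056
```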

Means for Group

The sum of the values in each true group divided by the number of (non-missing) values in each true group. To display the means for groups, you must click Options and select Above plus mean, std. dev., and covariance summary when you perform the analysis.

Interpretation

Use group means to describe each true group with a single value that represents the center of the data. For example, in the following results, group 1 has the highest mean test score (1127.4), while group 3 has the lowest mean test score (1078.3). The mean test score for Group 2 is in the middle (1100.6).

Group Means



                             Means for Group
Variable    Pooled Mean        1        2        3
Test Score       1102.1   1127.4   1100.6   1078.3
Motivation       47.056   53.600   47.417   40.150

Pooled StDev

The pooled standard deviation is the square root of a weighted average of the variances of each true group. To display the pooled standard deviation, you must click Options and select Above plus mean, std. dev., and covariance summary when you perform the analysis.

Interpretation

Use the pooled standard deviation to determine how spread out the individual data points are about their true group mean. For example, in the following results, the pooled standard deviation for the test scores for all the groups is 8.109.

Group Standard Deviations



                              StDev for Group
Variable    Pooled StDev        1        2        3
Test Score         8.109    8.308    9.266    6.511
Motivation         2.994    2.409    3.243    3.251
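
The pooled values in this table can be reproduced from the group standard deviations. The following sketch assumes one common definition: the group variances are averaged with weights equal to each group's degrees of freedom (group size minus 1), and the square root is taken. The group sizes of 60 come from the Summary of Classification example, and the function name pooled_stdev is hypothetical.

```python
import math

# Group sizes are assumed to be 60 per group, as in the Summary of
# Classification example. Standard deviations are from the table above.
group_sizes = [60, 60, 60]
test_score_stdevs = [8.308, 9.266, 6.511]
motivation_stdevs = [2.409, 3.243, 3.251]


def pooled_stdev(stdevs, sizes):
    """Square root of the variances averaged with weights (n - 1)."""
    weighted_variance = sum((n - 1) * s ** 2 for s, n in zip(stdevs, sizes))
    degrees_of_freedom = sum(n - 1 for n in sizes)
    return math.sqrt(weighted_variance / degrees_of_freedom)


print(round(pooled_stdev(test_score_stdevs, group_sizes), 3))  # 8.109
print(round(pooled_stdev(motivation_stdevs, group_sizes), 3))  # 2.994
```

A simple average of the three test score standard deviations would give about 8.028 rather than 8.109, which is why the sketch pools the variances instead.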

StDev for Groups

The most common measure of dispersion, or how spread out the data are about the mean. The standard deviation of the groups is the standard deviation of each true group. To display the standard deviations for groups, you must click Options and select Above plus mean, std. dev., and covariance summary when you perform the analysis.

Interpretation

Use the standard deviation for the groups to determine how spread out the data are from the mean in each true group. For example, in the following results, the test scores for group 2 have the highest standard deviation (9.266). This indicates that the test scores for Group 2 have the greatest variability of the three groups. Group 3 has the lowest standard deviation (6.511) and the lowest variability of test scores of the three groups.

Group Standard Deviations



                              StDev for Group
Variable    Pooled StDev        1        2        3
Test Score         8.109    8.308    9.266    6.511
Motivation         2.994    2.409    3.243    3.251

Pooled Covariance Matrix

A weighted matrix of the relationship between all observations in all groups. The pooled covariance matrix is calculated by averaging the individual group covariance matrices element by element.

To display the pooled covariance matrix, you must click Options and select Above plus mean, std. dev., and covariance summary when you perform the analysis.

Covariance matrix

A nonstandardized matrix that indicates the relationship between each pair of variables. The covariance is similar to the correlation coefficient, which is the covariance divided by the product of the standard deviations of the variables.

To display the covariance matrix for each group, you must click Options and select Above plus mean, std. dev., and covariance summary when you perform the analysis.

Observation

Observation number for each observation. The observation number corresponds to the row of the classified observation in the Minitab worksheet. Minitab displays the symbols ** after the observation number if the observation was misclassified (that is, if the true group differs from the predicted group).

To see the predicted and true group for every observation in your data set, you must click Options and select Above plus complete classification summary when you perform the analysis.

Pred Group

The predicted group for each observation is the group membership that Minitab assigns to the observation based on the predicted squared distance. To see the predicted and true group for each observation in your data set, you must click Options and select Above plus complete classification summary when you perform the analysis.

Interpretation

Compare the predicted group and the true group for each observation to determine whether the observation was classified correctly. If the predicted group differs from the true group, then the observation was misclassified.

X-val Group

The predicted group using cross-validation (X-val) is the group membership that Minitab assigns to the observation based on the predicted squared distance using cross-validation. To see the predicted group using cross-validation for each observation, you must select Use cross validation in the main dialog box, and then click Options and select Above plus complete classification summary when you perform the analysis.

Interpretation

Compare the predicted group using cross-validation and the true group for each observation to determine whether the observation was classified correctly. If the predicted group using cross-validation differs from the true group, then the observation was misclassified.

Important

The predicted group using cross-validation omits an observation to create the discrimination rule and then sees how well the rule works for that specific observation. When you don't use cross-validation, you bias the discrimination rule by using that observation to create the rule.
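
The effect of omitting an observation can be illustrated with a deliberately simplified rule. The following sketch is not Minitab's discriminant function: it uses one variable and assigns each observation to the group with the nearest mean. The data values and the function names are hypothetical.

```python
def nearest_mean_group(x, training):
    """Return the group whose mean has the smallest squared distance to x."""
    means = {g: sum(values) / len(values) for g, values in training.items()}
    return min(means, key=lambda g: (x - means[g]) ** 2)


def count_misclassified(data, cross_validate):
    """Classify every observation and count how many are misclassified."""
    misclassified = 0
    for true_group, values in data.items():
        for i, x in enumerate(values):
            training = dict(data)
            if cross_validate:
                # Omit observation i before building the rule.
                training[true_group] = values[:i] + values[i + 1:]
            if nearest_mean_group(x, training) != true_group:
                misclassified += 1
    return misclassified


# Hypothetical one-variable data for two groups.
data = {1: [1.0, 2.0, 5.0], 2: [7.0, 8.0, 9.0]}

print(count_misclassified(data, cross_validate=False))  # 0
print(count_misclassified(data, cross_validate=True))   # 1
```

In this hypothetical data, the value 5.0 is classified correctly only when it helps to define its own group mean. With cross-validation, it is misclassified, which shows how including an observation in the rule can make the classification look better than it is.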

Squared Distance

The predicted squared distance values for each observation from each group. The squared distance value indicates how far away an observation is from each group mean. To see the squared distance for each observation in your data, you must click Options and select Above plus complete classification summary when you perform the analysis.

Note

If you use cross-validation when you perform the analysis, Minitab calculates the predicted squared distance for each observation both with cross-validation (X-val) and without cross-validation (Pred). For more information on how the squared distances are calculated, go to Distance and discriminant functions for Discriminant Analysis.