Data considerations for Discriminant Analysis

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

The predictor variables should be quantitative
You must have one or more numeric columns of measurement data, with one column for each predictor. Minitab uses the data to define the relationship between the predictors and the response. If you have a categorical predictor, you cannot use this analysis. Use logistic regression instead.
The predictors should not be highly correlated
Correlation among the predictors is called multicollinearity. If multicollinearity is severe, or if one or more of the predictors is essentially constant, Minitab cannot perform the discriminant analysis and displays a message.
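As a rough illustration of this guideline, the following Python sketch screens predictors before an analysis. It is not Minitab's check: the function names, the data, and both thresholds (`corr_limit`, `spread_limit`) are assumptions made for the example, and pairwise correlations cannot reveal multicollinearity that involves three or more predictors at once.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_predictors(columns, corr_limit=0.95, spread_limit=1e-8):
    """Return (near-constant predictors, highly correlated predictor pairs).

    `columns` maps a predictor name to its list of values. Both limits are
    illustrative assumptions, not thresholds taken from Minitab.
    """
    constant = [
        name for name, values in columns.items()
        if max(values) - min(values) <= spread_limit
    ]
    # Constant columns have no defined correlation, so leave them out.
    usable = [name for name in columns if name not in constant]
    correlated = [
        (a, b) for a, b in combinations(usable, 2)
        if abs(pearson(columns[a], columns[b])) >= corr_limit
    ]
    return constant, correlated


# Hypothetical data: width is close to 2 * length, and batch never varies.
data = {
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "width": [2.1, 3.9, 6.0, 8.1, 9.9],
    "weight": [5.0, 3.0, 6.0, 2.0, 4.0],
    "batch": [7.0, 7.0, 7.0, 7.0, 7.0],
}
constant, correlated = screen_predictors(data)
print("Essentially constant:", constant)
print("Highly correlated pairs:", correlated)
```

If a pair is flagged, one common remedy is to drop one of the two predictors before running the analysis.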
The response variable should indicate the group
You should have a single grouping column that contains identifiers for up to 20 groups. The group identifiers may be numeric, text, or date/time.
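A quick way to verify a grouping column against this guideline is to count its distinct identifiers. The sketch below assumes the column has already been read into a Python list; the name `summarize_groups` and the sample labels are hypothetical.

```python
from collections import Counter

MAX_GROUPS = 20  # upper limit stated in the guideline above


def summarize_groups(labels):
    """Count observations per group and check the number of distinct groups."""
    counts = Counter(labels)
    return counts, len(counts) <= MAX_GROUPS


# Hypothetical grouping column with text identifiers.
labels = ["low", "high", "medium", "low", "high", "low"]
counts, within_limit = summarize_groups(labels)
print(dict(counts), "within limit:", within_limit)
```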
The data for the predictor variables should be normally distributed within each group
Multivariate normality is a formal assumption for discriminant analysis. The linear discriminant function is reasonably robust to departures from normality, but the quadratic discriminant function is more sensitive to the normality assumption. If your predictors are not normally distributed, consider using logistic regression, which can provide more accurate results in these cases.
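One informal way to look for departures from normality is to compute the skewness and excess kurtosis of each predictor within each group; both are 0 for a normal distribution. The sketch below is only a univariate screen under that assumption, not a formal test of multivariate normality, and the function names and data are hypothetical. With small samples, these statistics are unreliable.

```python
from collections import defaultdict


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis of one sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def shape_by_group(groups, values):
    """Map each group identifier to (skewness, excess kurtosis) of one predictor."""
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    return {g: skewness_and_excess_kurtosis(v) for g, v in by_group.items()}


# Hypothetical data: group A is symmetric, group B has one extreme value.
groups = ["A"] * 5 + ["B"] * 5
values = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 1.0, 1.0, 1.0, 10.0]
shape = shape_by_group(groups, values)
for group, (skew, kurt) in shape.items():
    print(f"{group}: skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```

A skewness far from 0 in any group, as in group B here, suggests checking that predictor more closely with a probability plot.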
Enter prior probabilities for the analysis, when possible
Sometimes you know the probability of an observation belonging to a group before you perform a discriminant analysis. For example, if you are classifying the buyers of a particular car, you may already know that 60% of purchasers are male and 40% are female. If you know or can estimate prior probabilities, specify them for the analysis to increase the accuracy of your results.
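To see why prior probabilities matter, consider a simplified sketch with a single predictor and two groups that share a common standard deviation. The posterior probability of each group is proportional to its prior multiplied by the normal density of the observation. This is an illustration of the general idea, not Minitab's implementation; the predictor, the group means, and the standard deviation are made-up values.

```python
from math import exp


def posteriors(x, means, pooled_sd, priors):
    """Posterior group probabilities for one predictor.

    Assumes normal data with a common standard deviation, so the density's
    normalizing constant cancels and only the exponent is needed.
    """
    weights = {
        group: priors[group] * exp(-0.5 * ((x - means[group]) / pooled_sd) ** 2)
        for group in means
    }
    total = sum(weights.values())
    return {group: weight / total for group, weight in weights.items()}


# Hypothetical group means for some numeric predictor.
means = {"male": 10.0, "female": 12.0}
x = 11.0  # an observation exactly halfway between the two group means

equal = posteriors(x, means, pooled_sd=2.0, priors={"male": 0.5, "female": 0.5})
known = posteriors(x, means, pooled_sd=2.0, priors={"male": 0.6, "female": 0.4})
print("Equal priors:", equal)
print("60/40 priors:", known)
```

With equal priors, the observation halfway between the means is a toss-up. With the 60/40 priors, the same observation is assigned to the group with the larger prior, which is how known priors can change borderline classifications.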