Data considerations for Cross Tabulation and Chi-Square

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

The sample should be selected randomly

Random samples are used to make generalizations, or inferences, about a population. If your data are not collected randomly, your results might not represent the population.

Each observation should be independent from all other observations

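Independence of the observations is a critical assumption for the chi-square test of association. If the observations are dependent, for example when the same subject is counted more than once, the results of the test may not be valid.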

All the data must be categorized into mutually exclusive row and column categories

The chi-square test of association cannot be performed when the categories of a variable overlap. Each observation must therefore fall into exactly one row category and exactly one column category.
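As a minimal sketch of this requirement, the following Python snippet tallies raw observations into a cross tabulation in which each observation contributes to exactly one cell. The function name `cross_tabulate` and the example data are hypothetical and are not taken from Minitab.

```python
from collections import Counter

# Hypothetical raw data: one (row category, column category) pair per observation.
observations = [
    ("Machine A", "Pass"), ("Machine A", "Fail"), ("Machine A", "Pass"),
    ("Machine B", "Pass"), ("Machine B", "Pass"), ("Machine B", "Fail"),
    ("Machine B", "Fail"),
]


def cross_tabulate(pairs):
    """Count observations per (row, column) cell.

    Each pair is counted in exactly one cell, which is what
    "mutually exclusive categories" means in practice.
    """
    cell_counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[cell_counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


rows, cols, table = cross_tabulate(observations)

# Because no observation belongs to two cells, the cell counts
# add up to the number of observations.
assert sum(map(sum, table)) == len(observations)
```

If an observation could belong to two categories of the same variable, this bookkeeping would count it twice (or force an arbitrary choice), and the cell counts would no longer describe the sample.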

The expected counts must not be too small

Each sample should be large enough so that there is a reasonable chance of observing outcomes in every category. If the expected counts are too low, the p-value for the test may not be accurate. Minitab indicates in your results whether the expected counts are too low.

If the expected count for a category is too low, you may be able to combine that category with adjacent categories to achieve the minimum expected count. Combine categories only when necessary, because combining them loses information. Alternatively, you can use Fisher's exact test, which is accurate for all sample sizes. For more information, go to What is Fisher's exact test?
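To make the expected-count check concrete, here is a minimal Python sketch. It computes each expected count as (row total × column total) / grand total, flags cells that fall below a threshold, and computes a two-sided Fisher's exact p-value for a 2×2 table. The threshold of 5 is a common rule of thumb and is an assumption here; this page does not state the cutoff that Minitab uses. The function names and example counts are hypothetical, the Fisher's exact sketch handles only 2×2 tables, and the code is not intended to reproduce Minitab's output.

```python
from math import comb


def expected_counts(table):
    """Expected count per cell under independence:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def low_expected_cells(table, threshold=5.0):
    """Return (row, column) indices of cells whose expected count is
    below the threshold. The default of 5 is an assumed rule of thumb."""
    exp = expected_counts(table)
    return [(i, j)
            for i, row in enumerate(exp)
            for j, e in enumerate(row)
            if e < threshold]


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


# Hypothetical 2x2 table with small counts.
observed = [[3, 1], [1, 3]]

if low_expected_cells(observed):
    # Expected counts are small, so the chi-square p-value may be
    # inaccurate; use the exact test instead.
    p_value = fisher_exact_2x2(observed)
```

In this sketch the exact test is used only as a fallback when at least one expected count is flagged; combining adjacent categories, as described above, is the other option and is not shown here.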