A grouping variable (also called a group variable or a by variable) is a categorical variable that is used to divide output by a common property. This property is identified by group labels that are stored in one or more categorical variables in the worksheet. Consider the following example:
A quality engineer tests the strength of wire samples for paper clips and records two grouping variables: whether the wire is heat treated, and whether the wire is made of brass or steel.
In this example, if you consider only the variable 'Treated', you have two groups. Also, if you consider only the variable 'Material', you have two groups. If you consider both variables, you have four groups: treated brass, untreated brass, treated steel, and untreated steel.
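The group counts above can be checked with a minimal Python sketch. The rows and the "Yes"/"No" and "Brass"/"Steel" labels are hypothetical stand-ins for the worksheet columns 'Treated' and 'Material'; they are invented for illustration and are not data from the engineer's study.

```python
# Hypothetical worksheet rows: (Treated, Material).
# These values are invented for illustration only.
rows = [
    ("Yes", "Brass"),
    ("No", "Brass"),
    ("Yes", "Steel"),
    ("No", "Steel"),
    ("Yes", "Steel"),
    ("No", "Brass"),
]

# Distinct labels when only one grouping variable is considered.
treated_groups = sorted({treated for treated, _ in rows})
material_groups = sorted({material for _, material in rows})

# Distinct label pairs when both grouping variables are considered together.
combined_groups = sorted(set(rows))

print(len(treated_groups), treated_groups)
print(len(material_groups), material_groups)
print(len(combined_groups), combined_groups)
```

Each grouping variable on its own yields two groups, and the two variables together yield one group for every combination of labels that occurs in the data.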
The group information that you collect points to important differences in your data, and lets you use your graphs and analyses in useful ways. For example, with groups you can:
- Examine the effect of different groups. The quality engineer might create a boxplot of wire strength using the Treated grouping variable to visualize the difference in strength between treated and untreated wire.
- Examine observations from different groups individually. The engineer might want to calculate the mean strength for brass and steel wire separately.
- Include specific groups in your analysis or exclude them from it. The engineer might want to exclude heat-treated brass wire from an analysis because of inconsistencies in the sample. The engineer could subset the data to exclude these observations.
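As a rough sketch of the second and third uses in the list above, the following Python code calculates the mean strength for each material and then repeats the calculation after excluding heat-treated brass wire. The strength values, the tuple layout, and the helper `mean_by_group` are assumptions made for illustration; they do not come from the engineer's data or from any particular statistics package.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (Treated, Material, Strength).
# The strength values are invented for illustration only.
observations = [
    ("Yes", "Brass", 52.0),
    ("No", "Brass", 48.0),
    ("Yes", "Steel", 71.0),
    ("No", "Steel", 65.0),
    ("Yes", "Steel", 73.0),
    ("No", "Brass", 50.0),
]


def mean_by_group(rows, key):
    """Return the mean strength for each group label produced by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {label: mean(values) for label, values in groups.items()}


# Examine groups individually: mean strength for brass and steel.
mean_by_material = mean_by_group(observations, key=lambda r: r[1])

# Exclude a specific group: drop heat-treated brass wire, then recompute.
subset = [r for r in observations if not (r[0] == "Yes" and r[1] == "Brass")]
mean_by_material_subset = mean_by_group(subset, key=lambda r: r[1])

print(mean_by_material)
print(mean_by_material_subset)
```

Passing a different `key`, such as `lambda r: (r[0], r[1])`, would group by both variables at once and give one mean for each of the four combined groups.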