# Ways to use a grouping variable

## Divide output based on a common property

A grouping variable (also called group variable and by variable) is a categorical variable that is used to divide output by a common property. This property is identified by group labels that are stored in one or more categorical variables in the worksheet. Consider the following example:

A quality engineer tests the strength of wire samples for paper clips and records two grouping variables: whether or not the wire is heat treated, and if the wire is made of brass or steel.
C1 C2-T C3-T
Strength Treated Material
7.54 Yes Brass
6.27 Yes Steel
8.11 No Steel
3.28 Yes Brass

In this example, if you consider only the variable 'Treated', you have two groups. Also, if you consider only the variable 'Material', you have two groups. If you consider both variables, you have four groups: treated brass, untreated brass, treated steel, and untreated steel.

The group information that you collect points to important differences in your data, and lets you to employ your graphs and analyses in useful ways. For example, with groups you can:
• Examine the effect of different groups. The quality engineer might create a boxplot of wire strength using the Treated grouping variable to visualize the difference in strength between treated and untreated wire.
• Examine observations from different groups individually. The engineer might want to calculate the mean strength for brass and steel wire separately.
• Exclude or include specific groups from your analysis. The engineer might want to exclude heat treated brass wire from an analysis because of inconsistencies in the sample. They could subset the data to exclude these observation.

## Sort data in a worksheet

You can reorder the rows in a worksheet by the values in the grouping column.

## Merge data in two worksheets

You can align the rows in two worksheets by the values in each worksheet's grouping column. Consider the following worksheets, which share a common grouping variable, Factory.

Merging the columns would put the two sets of columns side by side, with observations from factory A sharing a row with observations from the factory D, and so on like this:

C1-T C2 C3-T C4
Factory: Worksheet 1 Accidents Factory: Worksheet 2 New Hires
A 4 D 0
B 5 C 10
C 0 A 25
D 1 B 35

But by specifying Factory as a By Column for each worksheet, when merged the individual rows of each worksheet are rearranged to agree based on the values in Factory:

C1-T C2 C3 C4
Factory: Worksheet 1 Accidents Factory: Worksheet 2 New Hires
A 4 A 25
B 5 B 35
C 0 C 10
D 1 D 0
By using this site you agree to the use of cookies for analytics and personalized content.  Read our policy