Graphing Y vs Categorical X

Summary

Provides a graphical means to assess and communicate how changes to process inputs affect the process output, either statically or dynamically. The type of graph you select dictates the type of data display. Some graphs better display the distribution of the data; some better display changes to the mean or variation; some better display the data over time (dynamically) to highlight patterns.

Answers the questions:
  • Does the mean of the process output change at different levels of a process input?
  • Does the variation of the process output change at different levels of a process input?
  • Do the dynamic patterns of the process output change at different levels of a process input?

When to Use | Purpose
Start of project | Assist in project selection by identifying process outputs that exhibit shifts, changes in variation, or changes in time-based patterns as a result of changes to one or more process inputs.
Mid-project | Investigate effects of input variables on the process output over time.
End of project | Graphically compare the pre-project process dynamic behavior to the post-improvement dynamic behavior.

Data

One numeric variable (continuous or discrete) with optional categorical variables.

How-To

  1. Verify the measurement system for the Y data is adequate.
  2. Establish a data collection strategy to determine the following:
    • The amount of data to collect at each level of an input variable
    • The best time interval, if data are being collected over time
  3. Enter data for each Y-variable into a single column.
  4. Place categorical input (X) variables in additional columns. These variables identify process segments with specific input conditions so you can compare results across segments. The graph displays the output at each level of the input, using a variety of display methods.
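
The column layout in steps 3 and 4 can be sketched outside Minitab as follows. This is a minimal illustration in Python using only the standard library, not a Minitab procedure. The column names `yield_per_fte` and `shift`, the helper `summarize_by_level`, and the data values are all hypothetical. The sketch stacks every Y value in one column, puts the categorical X in a second column, and computes the per-level mean and standard deviation that the graphs display visually.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical worksheet: one column for the output (Y) and one column
# for the categorical input (X). Row i of each column is one observation.
yield_per_fte = [12.0, 14.0, 13.0, 15.0, 9.0, 11.0, 10.0, 10.0]
shift = ["day", "day", "day", "day", "night", "night", "night", "night"]


def summarize_by_level(y_values, x_levels):
    """Return {level: (n, mean, sample standard deviation)} for each X level."""
    if len(y_values) != len(x_levels):
        raise ValueError("Y and X columns must have the same number of rows")
    groups = defaultdict(list)
    for y, level in zip(y_values, x_levels):
        groups[level].append(y)
    summary = {}
    for level, values in groups.items():
        # stdev needs at least two observations; report NaN otherwise.
        spread = stdev(values) if len(values) > 1 else float("nan")
        summary[level] = (len(values), mean(values), spread)
    return summary


summary = summarize_by_level(yield_per_fte, shift)
for level, (n, avg, sd) in summary.items():
    print(f"{level}: n={n}, mean={avg:.2f}, stdev={sd:.2f}")
```

Keeping Y in a single column and the levels in a parallel column means that adding a second categorical input only requires one more column of the same length.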

Guidelines

  • Choose from different graph types:
    • Histograms
    • Boxplots
    • Individual value plots
    • Time series plots
    • I charts
  • Each graph type is best suited for displaying a particular type of information. Select the graph based on how you want to view the data, the number of levels of the input variable, and the sizes of the samples at each level of the input variable.
  • Static graphs display data at a single point in time, regardless of whether the data were actually collected at different points in time.
    • Histograms:
      • Are most useful for displaying the distribution of the data, relative changes in central tendency, and relative amounts of variation
      • Are not well-suited for making comparisons across a large number of levels of the input variable
      • Require fairly large samples (typically 50 or more at each level)
      • Do not provide precise comparisons of mean or variation
      • Provide a static view of the data; they cannot display data dynamically
    • Boxplots:
      • Are most useful for displaying precise differences in means, relative differences in variation, and identifying outliers
      • Can be used with a large number of levels of the input variable
      • Require moderate size samples (typically 20 or more at each level)
      • Provide a static view of the data; they cannot display data dynamically
    • Individual value plots:
      • Are most useful for displaying precise differences in means, relative differences in variation, and identifying outliers
      • Can be used with a large number of levels of the input variable
      • Have no sample-size requirements; they can be used with small samples
      • Provide a static view of the data; they cannot display data dynamically
  • Dynamic graphs display data across time (time is the X-axis) regardless of whether the data were collected at equally spaced time intervals.
    • Time series plots:
      • Are most useful for displaying dynamic behavior of data, differences in means, differences in variation, and identifying outliers
      • Cannot be used with a large number of levels of the input variable
      • Have no sample size requirements; they can be used with small samples
      • Do not provide precise comparisons of mean or variation
      • Provide a dynamic view of the data; they cannot be used to view data statically
    • I charts:
      • Are most useful for displaying dynamic behavior of data, precise differences in means, precise differences in variation, and identifying outliers
      • Cannot be used with a large number of levels of the input variable
      • Require moderate sample sizes (typically 20 or more at each level of the input variable)
      • Provide a precise comparison of means and a precise comparison of variation
      • Provide a dynamic view of the data; they cannot be used to view the data statically
  • When using a time series plot or an I chart, you must collect the data at equally spaced intervals in time. If the time intervals are not equally spaced, the patterns you observe in the plot can be very misleading.
  • Categorical grouping variables are used with these graphs to show the effects of different levels of a factor. For example, if you want to examine hourly yield per FTE of a forms-processing operation (Y) to detect differences between shifts, you can view the data dynamically with a time series plot or view the data statically with an individual value plot (hourly data per shift would be small samples, eight observations per level).
  • Minitab allows up to three categorical grouping variables for the static graphs and the time series plot, but only one categorical variable for the I chart.
  • Pay attention to the sample-size requirements for each type of graph. Violations of these requirements may lead to graphical distortions and misleading conclusions.
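
The sample-size and time-spacing guidelines above can be checked before choosing a graph. The following is a minimal sketch in Python using only the standard library; it is not a Minitab feature. The names `MIN_PER_LEVEL`, `is_equally_spaced`, and `suitable_graphs`, and the example data, are hypothetical. The thresholds are the typical values quoted in these guidelines, not hard limits, and the sketch does not cover the guidance about the number of levels.

```python
from collections import Counter

# Typical minimum observations per level, taken from the guidelines above.
# A value of 1 means the guidelines state no sample-size requirement.
MIN_PER_LEVEL = {
    "histogram": 50,
    "boxplot": 20,
    "individual value plot": 1,
    "time series plot": 1,
    "I chart": 20,
}

DYNAMIC_GRAPHS = {"time series plot", "I chart"}


def is_equally_spaced(times, tolerance=0.0):
    """Return True if consecutive time stamps differ by a constant step."""
    if len(times) < 3:
        return True  # zero or one interval: nothing to compare
    steps = [later - earlier for earlier, later in zip(times, times[1:])]
    return all(abs(step - steps[0]) <= tolerance for step in steps)


def suitable_graphs(x_levels, times=None):
    """List the graph types whose guidelines the data satisfy.

    x_levels: the categorical X value for each observation (non-empty).
    times: optional numeric time stamps in collection order. Dynamic
        graphs are excluded when times are missing or unequally spaced.
    """
    smallest = min(Counter(x_levels).values())  # smallest sample at any level
    spaced = times is not None and is_equally_spaced(times)
    chosen = []
    for graph, minimum in MIN_PER_LEVEL.items():
        if smallest < minimum:
            continue
        if graph in DYNAMIC_GRAPHS and not spaced:
            continue
        chosen.append(graph)
    return chosen


# Hypothetical example: hourly data for three shifts, eight hours each.
shift_labels = ["first"] * 8 + ["second"] * 8 + ["third"] * 8
hours = list(range(24))
hourly_choice = suitable_graphs(shift_labels, hours)
print(hourly_choice)
```

For the hourly shift example, with eight observations per level, the sketch keeps only the individual value plot and the time series plot, which matches the recommendation given above for small samples.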