Use graphs to explore data and assess relationships between the variables. Also, use graphs to summarize data and to help interpret statistical results.

Use a boxplot to provide a static picture of the location and spread of the Y variable (the process output) by showing the minimum and maximum values, first quartile (25% of points are less than this value), third quartile (75% of points are less than this value), median (or mean), and potential outliers. If you also include a categorical X variable, you can look at the location and spread of the Y at each level (for example, factor setting) of the X variable.
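The values a boxplot displays can be sketched in a few lines of Python. This is an illustrative sketch, not Minitab's implementation: it assumes quartiles are located at position p × (n + 1) in the sorted data and that potential outliers are points more than 1.5 × IQR beyond the quartiles. Both are common conventions, but the text above does not specify the exact method. The function name `boxplot_summary` and the sample data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Return the statistics a boxplot typically displays."""
    data = sorted(values)
    # Assumption: the 'exclusive' method places quartiles at position p * (n + 1).
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    # Assumption: points beyond 1.5 * IQR from the box are flagged as potential outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3,
        "max": data[-1], "iqr": iqr, "outliers": outliers,
    }

sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 22, 40]  # made-up Y data
summary = boxplot_summary(sample)
print(summary)
```

For this sample, the box spans 15 to 21 with a median of 17, and the value 40 is flagged as a potential outlier.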

Answers the questions:

- What is the general location of the Y data?
- How wide is the spread of the Y data?
- Does the sample contain any unusual data points (outliers)?
- Does changing the level of an input variable (X) affect the location or the spread of the output Y?

When to Use | Purpose |
---|---|
Mid-project | The first rule in data analysis is to always plot your data before running any statistical tests. The boxplot is a logical choice for comparison tests where you are looking at what happens to the process output under various conditions, such as changes to a process input. |
Mid-project | Assess if an input (X) has an impact on the process mean or process variation and help eliminate noncritical X's from consideration. |
Mid-project | Identify levels (settings) of the process input that have the desired impact on the output mean or variation. |
Mid-project | Communicate the effects of process inputs on the process output to project stakeholders. |

Your data must be a numeric value for Y, with an optional discrete value for X (categories for comparison).

- The boxplot is very prone to misinterpretation when the sample size is small. When the sample size is less than 20, use a dotplot or individual value plot.
- The boxplot provides a good visual comparison even when the number of levels of an X variable is high. If the number of levels of an X variable is greater than five, the boxplot provides a better visual comparison than the dotplot (assuming more than 20 points per category).

- Choose one of two common data layouts that you can use with boxplots:
  - Choose a boxplot with groups (stacked data) when you enter one column for the Y variable and one for the X (categorical) variable (optional). Note: You can have up to four categorical variables. Minitab draws a separate box for each combination of levels of the categorical variables; however, the boxes all appear in the same graph window. This display is handy for making comparisons across levels of X variables.
  - Choose a boxplot with multiple Ys (unstacked data) when you enter the Y data into a separate column for each level of the X variable. Minitab draws a separate box for each Y. The boxes can be plotted either in separate graph windows or in the same graph window with a common scale.
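The difference between the stacked and unstacked layouts described above can be illustrated with a small sketch. The helper names `unstack` and `stack` and the data values are hypothetical and are not Minitab commands; the sketch only shows how the same observations are arranged in each layout.

```python
from collections import defaultdict

def unstack(y_values, x_levels):
    """Stacked layout (one Y column, one X column) -> one Y list per X level."""
    columns = defaultdict(list)
    for y, level in zip(y_values, x_levels):
        columns[level].append(y)
    return dict(columns)

def stack(columns):
    """Unstacked layout (one Y list per level) -> two parallel columns."""
    y_values, x_levels = [], []
    for level, ys in columns.items():
        y_values.extend(ys)
        x_levels.extend([level] * len(ys))
    return y_values, x_levels

# Stacked data: each row is one observation with its factor level.
y = [5.1, 4.8, 6.0, 6.3, 5.0]
x = ["A", "A", "B", "B", "A"]

unstacked = unstack(y, x)
print(unstacked)          # one column of Y values per level of X
print(stack(unstacked))   # back to a Y column and an X column
```

Stacking groups the rows by level, so the row order after a round trip can differ from the original order even though the observations are the same.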

For more information, go to Insert an analysis capture tool.

Use a contour plot to provide a topographical view of the predicted process output (usually modeled through a DOE) versus two of the process inputs.

Answers the questions:

- If two process inputs (factors) change simultaneously, what is the impact on the process output?
- How robust, or stable, is the optimum solution?
- What is the predicted value of the process output for a particular combination of settings of the two process inputs?
- What settings of the key inputs will result in the optimal process output?

When to Use | Purpose |
---|---|
Mid-project | Helps to visualize the effects of two process inputs (factors) on the process output. The contours on the graph represent values of the predicted process output at various settings of the two factors on the plot. |
Mid-project | Helps to assess the region around an optimal solution. If the region around the optimum is relatively flat, the optimum is robust to variation in the two factors. If the region is not relatively flat, any deviation in one or both of the two factors could have serious consequences on the process output. |
Mid-project | Used as a graphical aid when using regression, ANOVA, or DOE. |

Your data must be a continuous value for Y and two continuous Xs.

- Without a sufficient number of data points between the high and low factor settings, the plot may be seriously misleading because it will generate contours within the inference space even if no interior data points are provided.
- The contour plot is often a key tool used to identify optimum process conditions when evaluating a quadratic model derived from a DOE. Quadratic or higher-order models require interior points, thus the contour plots from these models are quite accurate.
- A contour plot is a useful tool to evaluate robustness of a solution for these higher-order models, provided the experiment has few unusual observations. To model the surface, the DOE model uses a quadratic equation, which tends to smooth the data abnormalities. If the data have a large number of unusual observations, the smoothed surface may not accurately depict the Y data. Always check the model for a high r-squared value to be sure that it explains most of the variation in the Y data.
- If you have discrete numeric data from which you can obtain every equally spaced value and you have measured at least 10 possible values, you can evaluate these data as if they are continuous.
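The R-squared check recommended in the guidelines above can be sketched as follows. This is a minimal illustration that assumes you already have the observed Y values and the values predicted by the fitted model; the function name `r_squared` and the numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of the variation in Y that the model explains."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total

observed = [10.0, 12.0, 14.0, 16.0]    # made-up measurements
predicted = [10.5, 11.5, 14.5, 15.5]   # made-up model predictions
r2 = r_squared(observed, predicted)
print(round(r2, 3))
```

Here the model leaves 1 unit of residual variation out of 20 units of total variation, so it explains 95% of the variation in Y.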

- Enter Y data in one column.
- Enter factor levels into additional columns, one for each factor.
- You can produce a contour plot in two ways:
  - Choose a contour plot, then specify the output as the z-variable, select one factor to be the x-variable, and a second factor to be the y-variable. If you are plotting data from a DOE, use the second method, described next, because it uses the model from the DOE to predict the process output.
  - Choose a factorial plot (DOE) or a response surface type surface plot (DOE), and then specify the output as the response, select one factor to be the x-variable, and a second factor to be the y-variable. This version of the contour plot uses the DOE model to create the plot; therefore, if there are additional factors in the model, you need to specify the levels at which to hold all other factors. Note that you cannot use this method to create a contour plot if you have a 2^k factorial design with center points or a general full factorial (GFF) design.

- For both methods described above, if you have more than two x-variables, you must specify at what values to "hold" the additional x-variables. You can set additional variables at their minimum values, maximum values, means, or other specified value.
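The idea of holding additional x-variables at fixed values while two factors vary can be sketched as follows. The model `predict`, its coefficients, and the factor names are invented for illustration only; they do not come from any real DOE or from Minitab.

```python
def predict(temp, time, pressure):
    """Hypothetical fitted model in coded units; coefficients are made up."""
    return 50 + 2.0 * temp + 1.5 * time - 0.5 * temp * time + 0.8 * pressure

def contour_grid(model, x_values, y_values, **held):
    """Evaluate the model over an x-by-y grid with all other factors held fixed.

    Rows correspond to y_values and columns to x_values.
    """
    return [[model(x, y, **held) for x in x_values] for y in y_values]

levels = [-1, 0, 1]  # coded settings for the two plotted factors
# Hold the third factor (pressure) at its center value while temp and time vary.
grid = contour_grid(predict, levels, levels, pressure=0)
for row in grid:
    print(row)
```

Changing the held value (for example, `pressure=1`) shifts every predicted value on the grid, which is why the hold settings matter when you compare plots.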

For more information, go to Insert an analysis capture tool.

Use a dotplot to provide a static picture of the location and spread of the Y variable (the process output). Each dot represents the actual location of a data point in the sample (however, if the sample is large, each dot can represent multiple points). If you also include a categorical x-variable, you can look at the location and spread of the Y at each level (for example, factor setting) of the x-variable.

Answers the questions:

- What is the general location of the Y data?
- How wide is the spread of the Y data?
- Are any unusual data points (outliers) present in the sample?
- If you change the level of an input variable (X), does it affect the location or spread of the output Y?

When to Use | Purpose |
---|---|
Mid-project | The first rule in data analysis is to always plot your data before running any statistical tests. The dotplot is a logical choice for any comparison tests in which you are looking at what happens to the process output under various conditions, such as changes to a process input. |
Mid-project | Assess if an input (X) has an impact on the process mean or process variation and help eliminate noncritical X's from consideration. |
Mid-project | Identify levels (settings) of the process input that have the desired impact on the output mean or variation. |
Mid-project | Communicate the effects of process inputs on the process output to project stakeholders. |

Your data must be numeric Y, with optional discrete values for X (categories for comparison).

- You can use the dotplot with up to three nested x-variables. For example, you could plot sales by store, day of the week, and time of day.
- The dotplot does not provide statistical evaluation of possible outliers. When you have sufficient data (a sample size greater than 20), use the boxplot.
- The dotplot does not provide a good visual comparison when the number of levels of an x-variable is high. If the number of levels of an x-variable is greater than five, the boxplot provides a better visual comparison than the dotplot (assuming more than 20 points per category).

There are two common data layouts you can use with dotplots:

- Choose a dotplot with groups (stacked data) when you enter one column for the y-variable and one for the x-variable (categorical, optional). Note: You can have up to four categorical variables. Minitab draws a separate dotplot for each combination of levels of the categorical variables, but the plots all appear in the same graph window. This arrangement is handy for making comparisons across levels of x-variables.
- Choose a dotplot with multiple Y's (unstacked data) when you enter the Y data into a separate column for each level of the x-variable. Minitab displays a dotplot for each Y. The plots can be displayed either in separate graph windows or in the same graph window with a common scale.

For more information, go to Insert an analysis capture tool.

Use a histogram to provide a static view of the data collected from a process. A histogram displays the basic distribution of the data, including where the central location is, the amount of variation, and whether it is skewed, normal, or symmetric. You can add normal curves (or fit 13 other distributions) to verify whether data are reasonably normal.

Answers the questions:

- What is the general shape and location of a sample of Y data?
- Does the sample contain any unusual data points (outliers)?
- Are the data reasonably normal?

When to Use | Purpose |
---|---|
Mid-project | The first rule in data analysis is to always look at a graph of the data before conducting any kind of statistical test. The histogram is a logical choice for any tests in which you are comparing a process output to a standard. Histograms also help you determine whether the data are reasonably normal, a common assumption in many statistical tests. |
Mid-project | Histograms are good tools for communicating the distribution of a process output at various points in a project to the project stakeholders. |

Your data must be numeric values for Y or X (continuous or discrete).

- Histograms of small samples can be misleading. The number of bars (bins) in the histogram is a function of the sample size. A small sample produces few bars, and each bar is wider, so data points that fall near the edge of a bar are sometimes combined into one bar. A larger sample would have more bars.
- Do not assess normality with a histogram when you have a small sample (n < 50), because data points may be combined as described above. For small samples, use a probability plot instead.
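The dependence of the number of bars on the sample size can be illustrated with a short sketch. The text above does not state which rule Minitab uses to choose the number of bins, so this sketch uses Sturges' rule purely as one common convention, not as Minitab's method. The function names and data are hypothetical.

```python
import math

def sturges_bin_count(n):
    """Assumed binning rule (Sturges): bins = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1

def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width bars.

    Assumes the values are not all identical (the bar width must be nonzero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bar rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

for n in (20, 50, 500):
    print(n, sturges_bin_count(n))

print(bin_counts([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
```

Under this rule a sample of 20 gets 6 bars while a sample of 500 gets 10, which shows why small samples yield coarse histograms.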

- Enter data for each Y-variable in separate columns to generate a separate histogram for each column of Y data. The data you use to create the histogram can be either continuous or discrete (numeric).
- You can also have up to two categorical, or By variables. If you have categorical variables, Minitab creates a histogram of the Y data for each combination of values of the categorical variables. You can place these graphs in panels in the same window or in separate windows.

For more information, go to Insert an analysis capture tool.

Use an individual value plot to provide a static picture of the location and spread of the Y-variable (the process output). Each dot represents the actual location of a data point in the sample (the points are offset symmetrically on the plot so you can see multiple points in the same location). If you also include a categorical X-variable, you can look at the location and spread of the Y at each level, or factor setting, of the X-variable.

Answers the questions:

- What is the general location of the Y data?
- How wide is the spread of the Y data?
- Does the sample have any unusual data points (outliers)?
- If you change the level of an input variable (X), is the location or spread of the output Y affected?

When to Use | Purpose |
---|---|
Mid-project | The first rule in data analysis is to always plot your data before running any statistical tests. The individual value plot is a logical choice for any comparison tests in which you are looking at what happens to the process output under various conditions, such as changes to a process input. |
Mid-project | Assess if an input (X) has an impact on the process mean or process variation and help eliminate noncritical X's from consideration. |
Mid-project | Identify levels (settings) of the process input that have the desired impact on the output mean or variation. |
Mid-project | Communicate the effects of process inputs on the process output to project stakeholders. |

Your data must be a numeric Y variable, with an optional discrete X variable (categories for comparison).

- You can use the individual value plot with up to three nested X-variables. For example, you can plot sales by store, day of the week, and time of day.
- The individual value plot does not provide statistical evaluation of possible outliers. When you have sufficient data (a sample size greater than 20), use the boxplot.
- The individual value plot does not provide a good visual comparison when the number of levels of an X-variable is high. If the number of levels of an X-variable is greater than 10, the boxplot provides a better visual comparison than the individual value plot (assuming more than 20 points per category).

You can display individual value plots in one of three common layouts.

- For one sample of data, choose an individual value plot with one Y (simple).
- For stacked data (one column for the Y-variable and one for the X-variable), choose an individual value plot with one Y (groups).
- For unstacked data (separate columns for each value of the X-variable), choose an individual value plot with multiple Ys (simple).

For more information, go to Insert an analysis capture tool.

Use an interactions plot to graphically display the average value of the output for multiple levels of two process inputs. The interactions plot displays the magnitude and direction of change in the output as you simultaneously change the levels of the two process inputs. You can also use it to plot standard deviations in a DOE to study the effects of two process inputs on process variation.

Answers the questions:

- If I change two process inputs (factors) at the same time, is the effect on the process mean the same as it would be if I only changed one of the inputs?
- If I change two process inputs (factors) at the same time, is the effect on the process variation the same as it would be if I only changed one of the inputs (only when plotting standard deviations in a DOE)?
- What combination of settings of two key inputs results in the optimal process output?

When to Use | Purpose |
---|---|
Mid-project | Fixing two inputs at two or more different settings (levels) helps to determine which combinations of inputs have significant influence on the mean of the output. |
Mid-project | Fixing two inputs at two or more different settings (levels) and recording the standard deviation of the output at each setting helps to determine which inputs have significant influence on the process variation. |
Mid-project | Verify that changes to inputs result in significant differences from the pre-project mean. |
Mid-project | Used as a graphical aid when using ANOVA or with a DOE. |
Mid-project | Good tool for communicating the effects of process inputs on the process output to project stakeholders. |

Your data must be values for continuous Y and two Xs (numeric or categorical factors). If factors are numeric, they must be controlled at specific levels.

- Any significant interaction takes precedence over the main effects of the two factors involved in the interaction. For example, if you have two factors (A and B) and the AB interaction is significant, you should evaluate the A and B settings using the interactions plot and not the main effects plot.
- The interactions plot is often a key tool in identifying optimum process conditions when the results of a DOE show statistically significant interactions.
- If you have discrete numeric data from which you can obtain every equally spaced value and you have measured at least 10 possible values, you can evaluate these data as if they are continuous.
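The quantities an interactions plot displays are the mean outputs at each combination of the two factor levels. The following sketch computes them for a small two-level example; the data, factor levels, and function name `cell_means` are hypothetical, and the sketch makes no claim about statistical significance.

```python
from collections import defaultdict
from statistics import mean

def cell_means(rows):
    """rows: iterable of (level_a, level_b, y). Returns mean Y per (A, B) cell."""
    cells = defaultdict(list)
    for a, b, y in rows:
        cells[(a, b)].append(y)
    return {key: mean(ys) for key, ys in cells.items()}

rows = [  # made-up replicated observations
    ("low", "low", 10), ("low", "low", 12),
    ("high", "low", 20), ("high", "low", 22),
    ("low", "high", 15), ("low", "high", 17),
    ("high", "high", 15), ("high", "high", 17),
]
means = cell_means(rows)

# Effect of changing A from low to high, evaluated at each level of B.
effect_a_at_b_low = means[("high", "low")] - means[("low", "low")]
effect_a_at_b_high = means[("high", "high")] - means[("low", "high")]
print(effect_a_at_b_low, effect_a_at_b_high)
```

In this example the effect of A is 10 when B is low but 0 when B is high. On an interactions plot the two lines would not be parallel, which is the visual sign of a possible AB interaction that should then be tested formally.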

You can use an interactions plot with experimental data with or without designed experiments (DOEs):

- With DOE, the Y data and factor data should already be in the worksheet. In this case, use factorial plots (DOE) to generate the plot.
- Without DOE, use an interactions plot (ANOVA) to generate the plot.

Enter the data as follows:

- Verify the measurement systems for the Y data and the inputs (factors) are adequate.
- Develop a data collection strategy (who should collect the data, as well as where and when; the preciseness of the data; how to record the data, and so on).
- Enter Y data in one column.
- Enter the factor levels into additional columns, one for each factor. If you have additional columns for the levels of additional factors (X's), Minitab creates and tiles the multiple plots.

For more information, go to Insert an analysis capture tool.

Use a main effects plot to graphically display the average value of the output for multiple levels of a given single input. A main effects plot displays the magnitude and direction of change in the output as you change the value of the input. You can also use it to plot standard deviations in a DOE to study the effects of an input on process variation.
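The points on a main effects plot are the mean of the output at each level of the input, and the same idea applies to standard deviations when you study process variation. A minimal sketch, with hypothetical data and a hypothetical helper `level_summary`:

```python
from collections import defaultdict
from statistics import mean, stdev

def level_summary(levels, y_values):
    """Return (mean, sample standard deviation) of Y for each level of one input."""
    groups = defaultdict(list)
    for level, y in zip(levels, y_values):
        groups[level].append(y)
    return {level: (mean(ys), stdev(ys)) for level, ys in groups.items()}

temperature = ["low", "low", "low", "high", "high", "high"]  # made-up factor settings
yield_pct = [70, 72, 74, 80, 85, 90]                         # made-up output values

summary = level_summary(temperature, yield_pct)
for level, (avg, sd) in summary.items():
    print(level, avg, sd)
```

Here the mean rises from 72 to 85 and the standard deviation rises from 2 to 5 when temperature changes from low to high, so this input appears to affect both location and variation in the made-up data.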

Answers the questions:

- If I change an input from one level to another, does the mean of the process stay the same or does it change?
- If I change an input from one level to another, does the variation of the process stay the same or does it change (only when plotting standard deviations in DOE)?
- What setting of the process input results in the optimal process output?

When to Use | Purpose |
---|---|
Mid-project | Fixing an input at two or more different settings (levels) helps to determine which inputs have significant influence on the mean of the output. |
Mid-project | Verify that changes to inputs result in significant differences from the pre-project mean. |
Mid-project | Fixing an input at two or more different settings (levels) and recording the standard deviation of the output at each setting helps to determine which inputs have significant influence on the process variation. |
Mid-project | Used as a graphical aid when using ANOVA or with a DOE. |
Mid-project | Good tool for communicating the effects of process inputs on the process output to project stakeholders. |

Your data must be values for continuous Y (output) and usually a single X (input) at 2 or more levels.

- Any significant interaction takes precedence over the main effects of the two factors involved in the interaction. For example, if you have two factors (A and B) and the AB interaction is significant, you should evaluate the A and B settings using the interactions plot and not the main effects plot.
- If you have discrete numeric data from which you can obtain every equally spaced value and you have measured at least 10 possible values, you can evaluate these data as if they are continuous.

You can use a main effects plot with experimental data with or without designed experiments (DOEs):

- With DOE, the Y data and factor data should already be in the worksheet. In this case, use factorial plots (DOE) to generate the plot.
- Without DOE, use a main effects plot (ANOVA) to generate the plot.

You should enter the data as follows:

- Verify the measurement systems for the Y data and the inputs (factors) are adequate.
- Develop a data collection strategy (who should collect the data, as well as where and when; how many data values are needed; the preciseness of the data; how to record the data, and so on).
- Enter Y data in one column.
- Enter factor levels into additional columns, one for each factor.
- If you have additional columns for the levels of additional factors (X's), Minitab creates and tiles the multiple main effects plots.

For more information, go to Insert an analysis capture tool.

Use a matrix plot to quickly evaluate, in a single graph, the relationships between all pairs of variables in a larger group of variables. Each pairwise combination appears in a separate panel.

Answers the question:

- Do any relationships exist between any pairs of variables in a large set?

When to Use | Purpose |
---|---|
Mid-project | Assess whether an input (X) has a strong linear relationship with an output (Y) to help eliminate non-critical X's from consideration. |
Mid-project | Evaluate two inputs to identify whether they duplicate information. For example, inputs of Degree Obtained and Years of School are likely to explain the same variation of the output, so one of them may be eliminated. This evaluation is used primarily in multiple regression with many variables. |

Your data must be two or more numeric variables (Xs, Ys, or any combination of these).

- The matrix plot is a collection of scatterplots. If you try to plot too many variables at once, the small size of each panel makes interpretation difficult. In this case, you can use individual scatterplots.
- The matrix plot does not include any measures of correlation. Therefore, you should use the matrix plot to look at a larger group of variables all at once, and identify the pairs that appear to have stronger relationships. Then, investigate these pairs one at a time using the fitted line plot or the correlation tool.
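Because the matrix plot itself does not report correlation, a follow-up calculation for the pairs that look related can be sketched as follows. This uses the standard Pearson correlation coefficient, which measures linear association only; the variable names and values are made up, and the helpers are hypothetical rather than part of Minitab.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_correlations(columns):
    """Correlation for every pair of columns, mirroring the panels of a matrix plot."""
    return {(a, b): pearson_r(columns[a], columns[b])
            for a, b in combinations(columns, 2)}

columns = {  # made-up data, one list per variable
    "temp": [1, 2, 3, 4, 5],
    "yield": [2, 4, 6, 8, 10],
    "cost": [5, 4, 3, 2, 1],
}
correlations = pairwise_correlations(columns)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Values near +1 or -1 indicate a strong linear relationship; two inputs that correlate strongly with each other may be duplicating information.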

- Collect your numeric data and enter them in Minitab, one column per variable.

For more information, go to Insert an analysis capture tool.

Use a multi-vari chart to visualize the effects that one or more inputs (factors) have on the mean and variation of a process, and to support subjective decisions about the process.

Answers the questions:

- If I systematically change the level (setting) of one or more inputs, what happens to the mean of the process?
- If I systematically change the level (setting) of one or more inputs, what happens to the variation of the process?

When to Use | Purpose |
---|---|
Mid-project | Helps to assess which inputs exert influence on either the mean or the variation of the process output. |
Mid-project | Good tool for communicating the effects of process inputs on the process output to project stakeholders. |

Your data must be a continuous value for Y, with one to four X variables called factors (numeric or categorical). If factors are numeric, they must be controlled at specific levels.

- For one factor, the multi-vari chart is the same as a main effects plot.
- For two factors, the multi-vari chart is basically the same as an interactions plot, although it displays the data in a slightly different manner.
- For three factors, the multi-vari chart is the best tool available for graphically exploring high-order interactions.
- While the multi-vari chart will handle four factors, it is not recommended because it becomes very difficult to interpret.
- While the multi-vari chart does not show statistical significance of the effects, it is still a valuable tool for visually spotting differences that occur when you change one or more input variables.
- You should run the same data through one of the other tools for analyzing data from designed experiments to ensure what you see is really worth investigating (in other words, it is statistically significant).
- You typically use a multi-vari chart with data from a designed experiment with no restriction on the number of levels for each factor. Do not use it when you have collected response data and recorded the values of one or more uncontrolled inputs; instead, use regression for these cases.

- Verify the measurement systems for the Y data and the input X (or inputs) are adequate.
- Develop a data collection strategy (who should collect the data, as well as where and when; how many data values are needed; the preciseness of the data; how to record the data, and so on).
- Enter the Y data in a single column.
- Enter factor levels into additional columns, one for each factor.
- All combinations of factor levels must have at least one data point.
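The requirement that every combination of factor levels has at least one data point can be checked before you create the chart. A minimal sketch, with hypothetical factor columns and a hypothetical helper `missing_combinations`:

```python
from itertools import product

def missing_combinations(*factor_columns):
    """Return the factor-level combinations that have no observations."""
    observed = set(zip(*factor_columns))
    level_sets = [sorted(set(column)) for column in factor_columns]
    return [combo for combo in product(*level_sets) if combo not in observed]

# One entry per observation; the columns line up row by row (made-up data).
machine = ["M1", "M1", "M2", "M2", "M1"]
shift = ["day", "night", "day", "day", "day"]

missing = missing_combinations(machine, shift)
print(missing)
```

In this example machine M2 was never run on the night shift, so that combination needs data before the chart requirement is met.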

For more information, go to Insert an analysis capture tool.

Use a scatterplot to provide a graphical means for assessing and communicating the relationship between two (or possibly three) variables.

Answers the questions:

- What is the nature of the relationship between two variables (usually a process output Y and a process input X; could also be two process inputs)?
- Is the relationship between the process output Y and a process input X the same for different levels (settings) of a second process input?

When to Use | Purpose |
---|---|
Start of project | Assists in developing alternative measurement systems in cases where a variable is difficult or expensive to measure; you can use highly correlated and logically linked alternative variables as substitute variables. |
Mid-project | The first rule of data analysis is to graph the data before running any statistical tests. Use scatterplots along with any statistical tool that tests for relationships between variables, such as regression. |
Mid-project | Assess if an input (X) has a strong relationship with an output (Y) to help eliminate noncritical X's from consideration. |
Mid-project | Evaluate two inputs to eliminate inputs that duplicate the same information (for example, inputs of Degree Obtained and Years of School are likely to explain the same variation of the output). This case is common in multiple regression with many variables. |
End of project | If the scatterplot was used earlier as part of the validation of the measurement system, reapply it to the improved process to validate the measurement system again. |

Your data must be two numeric variables (both can be continuous or discrete), with optional categorical variables.

- You can use categorical (grouping) variables with scatterplots to show the effects of different levels of a factor. For example, if you are plotting yield (Y) versus temperature (X), you could use different catalysts as a group variable (factor) and see whether the correlation between yield and temperature is the same or different for the different levels of catalyst.
- Minitab usually allows up to three categorical (grouping) variables for most plot characteristics.

- Enter each variable into a single column.
- Place optional categorical variables in additional columns. You can use these variables to change visual aspects of the plot (for example, symbol types or colors) based on the value of the categorical variable.

For more information, go to Insert an analysis capture tool.

Use a surface plot to provide a three-dimensional view of the predicted process output (usually modeled through a DOE) versus two of the process inputs.

Answers the questions:

- If I change two process inputs (factors) simultaneously, what is the impact on the process output?
- How robust, or stable, is the optimum solution?
- What settings of the key inputs result in the optimal process output?

When to Use | Purpose |
---|---|
Mid-project | The three-dimensional surface plot helps to visualize the effects of two process inputs (factors) on the process output. The height of the surface is the predicted process output at various settings of the two factors included in the plot. |
Mid-project | Surface plots help you locate an optimal solution and assess the region around the optimal solution. If the region around the optimum is relatively flat, the optimum is robust to variation in the two factors. If the region is not relatively flat, any deviation in one or both of the two factors could have serious consequences on the process output. |
Mid-project | Used as a graphical aid in regression, ANOVA, or DOE. |

Your data must be a continuous Y variable and two continuous X variables.

- Without a sufficient number of data points between the high and low settings of the factors, the plot may be seriously misleading because no interior data points exist to provide a basis for estimating the shape of the interior.
- The surface plot is often a key tool used to identify optimum process conditions when evaluating a quadratic model derived from a DOE. Quadratic or higher-order models require interior points, thus the surface plots from these models are quite accurate.
- A surface plot is a useful tool to evaluate robustness of an optimum solution for these higher-order models generated from a DOE, provided the experiment does not have many unusual observations. The DOE model uses a quadratic equation to model the surface. The quadratic equation tends to smooth out the abnormalities in the data. If the experiment has a large number of unusual observations, the smoothed-out surface may not accurately depict the Y data. Always check the model for a high r-squared value to be sure that it explains most of the variation in the Y data.
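The notion of a flat versus steep region around the optimum can be made concrete with a small calculation. The quadratic model `predict`, its coefficients, and the step size below are invented for illustration and do not come from a real experiment; the sketch only shows how much the predicted output drops when each factor drifts away from the optimum.

```python
def predict(x1, x2):
    """Hypothetical quadratic model in coded units; coefficients are made up."""
    return 90 - 2 * x1 ** 2 - 0.1 * x2 ** 2

def drop_per_factor(model, optimum, delta):
    """Largest drop in predicted output when one factor moves by +/- delta."""
    x1, x2 = optimum
    best = model(x1, x2)
    return {
        "x1": best - min(model(x1 - delta, x2), model(x1 + delta, x2)),
        "x2": best - min(model(x1, x2 - delta), model(x1, x2 + delta)),
    }

# Assumed optimum at the center of the design and an assumed drift of 0.5 coded units.
drops = drop_per_factor(predict, (0.0, 0.0), 0.5)
print(drops)
```

With these made-up coefficients the output falls by about 0.5 when x1 drifts but only about 0.025 when x2 drifts, so the surface is steep along x1 and nearly flat along x2. The optimum is therefore robust to variation in x2 but not in x1.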

- Enter Y data in one column.
- Enter factor levels into additional columns, one for each factor.
- There are two ways to produce a surface plot:
  - In Minitab, choose to create a 3D surface plot, specify the output as the Z-variable, select one factor to be the X-variable, and select a second factor to be the Y-variable. If you are plotting data from a DOE, use the second method, described next, because it uses the model from the DOE to predict the process output.
  - Choose to create a factorial type or response surface type of surface plot, specify the output as the response, select one factor to be the X-variable, and select a second factor to be the Y-variable. This version of the surface plot uses the DOE model to create the plot; therefore, if your model has additional factors, you must specify the levels at which to hold all other factors. Note that you cannot use this method to create a surface plot if you have a 2^k factorial design with center points or a general full factorial (GFF) design.

- For both methods described above, if you have more than two X-variables, you must specify at what values to hold the additional X-variables. You can set additional variables at their minimum values, maximum values, or means, or you can specify a value.

For more information, go to Insert an analysis capture tool.

Use a time series plot to provide a graphical way to assess and communicate the dynamic (time-based) behavior of a process input (or process output) and to evaluate the dynamic effects on the output as process inputs change.

Answers the questions:

- What is the dynamic behavior of a process variable (input or output)?
- Does the mean of the process output change at different levels of a process input?
- Does the variation of the process output change at different levels of a process input?
- Do the dynamic patterns of the process output change at different levels of a process input?

When to Use | Purpose |
---|---|
Start of project | Assist in project selection by identifying process outputs that exhibit shifts, changes in variation, or changes in time-based patterns in response to changes in process inputs. |
Mid-project | Graph the data before running any statistical tests. This is the first rule of data analysis. Time series plots are often used to investigate the effects of making controlled changes to a process input, with the data collected at specified time intervals. |
Mid-project | Assess whether an input (X) has a strong relationship with an output (Y) to help eliminate noncritical X's from consideration. |
End of project | Graphically compare the dynamic behavior of the pre-project process with the dynamic behavior of the post-improvement process. |

Your data must be one numeric variable (continuous or discrete) with optional categorical variables.

- The data for a time series plot must be collected at equally spaced intervals in time. If the time intervals are not equally spaced, the patterns you observe in the plot can be very misleading.
- You can use categorical (group) variables with time series plots to show the effects of different levels of a factor. For example, if you want to examine hourly yield per FTE of a forms processing operation (Y) to detect differences between shifts, you can use the shift as a group variable (factor) and evaluate changes in the mean, variation, or within-shift patterns between the three shifts.
- Minitab allows up to three categorical (group) variables.
- If you have one factor (categorical variable) and the sample size within each level of the factor is at least 20 observations, you may also use an I chart (or I-MR chart) to display the dynamic behavior of the process output. The I chart (or I-MR chart) includes a center line and control limits for each level, allowing you to directly compare means and variation between levels much more easily than in a time series plot.
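The requirement in the guidelines above that data be collected at equally spaced time intervals can be verified before plotting. A minimal sketch, assuming the collection times are available as `datetime` values; the helper name `is_equally_spaced` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def is_equally_spaced(timestamps):
    """True when every consecutive pair of timestamps is the same distance apart."""
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return len(gaps) <= 1

start = datetime(2024, 1, 1, 8, 0)  # arbitrary example start time
hourly = [start + timedelta(hours=h) for h in range(5)]
with_gap = hourly[:3] + [start + timedelta(hours=5)]  # one reading is two hours late

print(is_equally_spaced(hourly))    # True
print(is_equally_spaced(with_gap))  # False
```

If the check fails, the patterns in the time series plot may reflect the uneven sampling rather than the process itself.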

- Verify the measurement system for the Y data is adequate.
- Establish a data collection strategy to determine the best time interval for collecting data.
- Enter data for each Y-variable into a single column.
- Place optional categorical variables in additional columns. These categorical variables identify process segments with specific input conditions so you can make comparisons across segments. The resulting graph sequentially plots the data for each level (or combination of levels) of the X-variable.

For more information, go to Insert an analysis capture tool.