# DOE analysis

Use DOE analysis to investigate the effects of input variables (factors) on an output variable (response) at the same time.

## 2K factorial DOE

Use a 2K factorial DOE to provide a cost-effective methodology for conducting controlled experiments (DOEs) where all of the factors (process inputs) are held at one of two levels (settings) during each run of the experiment (plus optional center points).

A 2K factorial DOE has the following types:
• 2K full-factorial DOE: The experiment uses all possible combinations of factor settings with 8 runs for 3 factors, 16 runs for 4 factors, 32 runs for 5 factors, and so on. The goals of this type of experiment are usually focused on developing a full predictive model (Y = f(X)) describing how the process inputs jointly affect the process output and determining the optimal settings of the inputs.
• 2K fractional-factorial DOE: The experiment uses a fraction (one-half, one-fourth, and so on) of all possible combinations of factor settings, with a smaller number of runs than the 2K full-factorial DOE. The goals of this type of experiment can vary, from eliminating factors to developing a full predictive model (Y = f(X)) describing how the process inputs jointly affect the process output and determining the optimal settings of the inputs.
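
The run counts above follow from the 2^K combinations of two-level factors. The following is a minimal Python sketch of both design types, assuming coded units (-1 for the low setting, +1 for the high setting) and one common half-fraction construction in which the last factor is set to the product of the others. The function names are illustrative and are not part of Minitab.

```python
from itertools import product


def full_factorial(k):
    """Return all 2**k runs in coded units (-1 = low, +1 = high)."""
    return [list(run) for run in product([-1, 1], repeat=k)]


def half_fraction(k):
    """Return a one-half fraction of a 2**k design (2**(k-1) runs).

    Assumption: the last factor is generated as the product of the
    first k-1 factors, which is one common choice of generator.
    """
    runs = []
    for base in full_factorial(k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + [last])
    return runs


for k in (3, 4, 5):
    print(k, "factors:", len(full_factorial(k)), "full-factorial runs")

print("half fraction of 5 factors:", len(half_fraction(5)), "runs")
```

In practice, the run order of such a design would also be randomized before the experiment is run.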
###### Note

When you insert this analysis capture tool into the Roadmap, you can use it to record the data analysis from your experiment. Use the DOE Planning form to help you design the experiment.

A 2K factorial DOE helps answer the following questions:

• Which process inputs (factors) have the largest effects on the process output (which inputs are the key inputs)?
• Do any important interactions between factors exist?
• Is the current testing space near an optimal condition for the process output?
• If no, what direction do you need to move to get closer to the area where the optimal condition can be found?
• If yes, what settings of the key inputs will result in the optimal process output?
• What is the equation (Y = f(X)) relating the process output to the levels of the factors?
• If I change a factor from its low setting to its high setting, how much will the process output change?
• How much of the variation in the process output can be explained by varying the process inputs?

| When to Use | Purpose |
| --- | --- |
| Mid-project | Low resolution (III or IV) 2K fractional-factorial DOEs can be used as an early screening tool to perform a first-pass elimination of noncritical inputs, especially when you have many inputs (for example, more than five) and cost or time is a significant issue. |
| Mid-project | You can use 2K full-factorial DOEs (especially for 3 or 4 factors) and resolution V or higher 2K fractional-factorial DOEs (for 5 or more factors) to model 2-way interactions and determine the settings for the key variables that result in the optimal process output. |
| Mid-project | If all factors are numeric and no significant curvature is present, these designs can be used to determine the direction in which to continue experimenting (to locate an area closer to the optimal solution). |
| Mid-project | If all factors are continuous and significant curvature is present, you can expand the 2K full-factorial DOE and resolution V or higher 2K fractional-factorial DOEs to allow the fitting of a quadratic model (3-dimensional modeling using central-composite designs) to find optimal settings. |

### Data

Your data must be values for continuous Y and categorical X values or numeric X values tested at two discrete levels.

### Guidelines

• First, you should decide whether you want to run a full-factorial or fractional-factorial DOE.
• If the number of factors is less than 5, run the 2K full-factorial DOEs because they allow for modeling all 2-factor interactions with only 8 (3 factors) or 16 (4 factors) runs.
• If the number of factors is 5 or more, run the resolution V or higher 2K fractional-factorial DOEs because they reduce the number of runs while still allowing you to model all 2-factor interactions.
• Second, develop a sound data-collection strategy to ensure that your conclusions are based on truly representative data.
• Whenever possible, do the runs in the experiment in random order to prevent confusing a factor effect with the effect of an untested factor (sometimes called a lurking variable).
• All 2K factorial DOEs (full and fractional) rely on the assumption that the effects of the factors on the response are reasonably linear (can be modeled adequately with a straight line) in the inference space. You should include center points in your 2K factorial DOE whenever you doubt the linearity of the effects. The center points produce a test for curvature; in other words, they test the assumption of linearity. If the curvature is statistically significant, you must still decide, from a practical standpoint, whether the amount of curvature present is of concern.
• When adding center points to the DOE, the following procedures are often recommended:
• Use the current process factor settings as the center point to give the operators running the experiment a comfort level with familiar factor settings.
• Do not fully randomize the center points in the DOE. Instead, put one or two center points at the start of the experiment, one or two in the middle, and one or two at the end. This placement provides a check for trends during the experiment.
• The residuals of the final model must be independent, reasonably normal, and have reasonably equal variance. The residuals are usually analyzed by a histogram, normal probability plot, and plots of residuals versus fits and residuals versus order. You can create these plots simultaneously using the Four in one option. Note: Due to the small size of many DOEs you may find it difficult to check these assumptions.
• If you must evaluate any factor at more than two levels, you must use the general full-factorial DOE.
• Do not extrapolate beyond your inference space.
• You can expand the 2K full-factorial DOE and the resolution V or higher 2K fractional-factorial DOE easily and use it as the basis for a 3-dimensional DOE using central composite designs.
• Check for possible outliers in the table of unusual observations in Minitab's Session window.
• While this discussion focuses on designed experiments created by Minitab, you can use the "Analyze" portion of the factorial DOE to analyze any numeric experimental data (for example, 2 factors at each of 10 levels). To do this analysis, enter the Y and X data in Minitab and then use Stat > DOE > Factorial > Define Custom Factorial Design to define the factors. You can then analyze this newly defined custom design in the usual manner.
• If you have discrete numeric data from which you can obtain every equally spaced value, and you have measured at least 10 possible values, your data often are evaluated as though they are continuous.
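
The center-point guideline above can be illustrated with the textbook single-degree-of-freedom curvature check, which compares the average response at the factorial points with the average at the center points. This is a sketch under stated assumptions: the response values are hypothetical, the error estimate uses only the replicated center points, and the calculation is not claimed to match Minitab's output exactly.

```python
from math import sqrt
from statistics import mean, variance


def curvature_t(factorial_y, center_y):
    """Return (difference, t statistic) for a center-point curvature check.

    difference = mean of factorial runs - mean of center runs.
    The standard error uses the pure-error variance from the replicated
    center points, so the statistic has len(center_y) - 1 degrees of freedom.
    """
    n_f, n_c = len(factorial_y), len(center_y)
    diff = mean(factorial_y) - mean(center_y)
    pure_error_var = variance(center_y)
    std_err = sqrt(pure_error_var * (1 / n_f + 1 / n_c))
    return diff, diff / std_err


# Hypothetical responses: four factorial runs and five center-point runs.
factorial_y = [39.3, 40.0, 40.9, 41.5]
center_y = [40.3, 40.5, 40.7, 40.2, 40.6]

diff, t_stat = curvature_t(factorial_y, center_y)
print("difference:", round(diff, 3))
print("t statistic:", round(t_stat, 3))
```

A t statistic close to zero suggests little evidence of curvature. Whether a statistically significant value matters in practice remains a judgment call, as the guideline notes.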

### How-to

1. State your factors (typically less than eight factors) and their levels of interest (only two levels plus an optional center point allowed).
2. If you are using a 2K fractional-factorial DOE, determine your fraction (one-half, one-fourth, and so on) based on your budget and desired resolution.
3. Verify that the measurement systems for the Y data and the inputs (factors) are adequate.
4. Develop a data-collection strategy (who should collect the data, as well as where and when; how many data values are needed; the preciseness of the data; how to record the data, and so on).
5. Run the experiment and reduce to a final model by eliminating terms with high p-values (typically greater than 0.05). Note: Eliminate terms in order with the more complex terms evaluated and eliminated first. For example, eliminate all nonsignificant 3-factor interactions before evaluating 2-factor interactions.
6. Use either the response optimizer or the main effects and interactions plots to determine optimal settings of significant factors.
7. Generate the prediction equation.
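
To make the last steps concrete, here is a minimal sketch of estimating coefficients and generating a prediction equation for a balanced two-level design in coded units. It assumes a full-factorial design without center points, so the model columns are orthogonal and each coefficient is a simple signed average. The data and function names are hypothetical, and p-value-based term elimination is left to statistical software.

```python
def fit_two_level(design, y):
    """Fit constant, main effects, and 2-factor interactions.

    design: list of runs in coded units (-1 or +1 per factor).
    y: list of responses, one per run.
    Relies on the orthogonality of a balanced two-level full factorial.
    """
    n, k = len(y), len(design[0])
    coefs = {"const": sum(y) / n}
    for i in range(k):
        coefs[f"x{i + 1}"] = sum(r[i] * yi for r, yi in zip(design, y)) / n
    for i in range(k):
        for j in range(i + 1, k):
            name = f"x{i + 1}*x{j + 1}"
            coefs[name] = sum(r[i] * r[j] * yi for r, yi in zip(design, y)) / n
    return coefs


def predict(coefs, run):
    """Evaluate the fitted equation at one run given in coded units."""
    total = 0.0
    for name, b in coefs.items():
        term = 1.0
        if name != "const":
            for factor in name.split("*"):
                term *= run[int(factor[1:]) - 1]
        total += b * term
    return total


# Hypothetical 2x2 experiment.
design = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
y = [20, 30, 40, 52]

coefs = fit_two_level(design, y)
print(coefs)
print("prediction at (+1, +1):", predict(coefs, (1, 1)))
```

In coded units, the effect of moving a factor from its low setting to its high setting is twice its coefficient.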

## General full factorial DOE

Use a general full factorial DOE to provide a methodology for conducting controlled experiments (DOEs) where the factors (process inputs) can be held at any number of levels (settings). The goals of this type of experiment are usually focused on obtaining a model that definitively selects the vital process inputs, investigating interactions between the vital inputs, and making predictions about the process output.

###### Note

When you insert this analysis capture tool into the Roadmap, you can use it to record the data analysis from your experiment. Use the DOE Planning form to help you design the experiment.

A general full factorial DOE helps answer the following questions:

• Which process inputs (factors) have the largest effects on the process output (which inputs are the key inputs)?
• Do important interactions exist between factors?
• How much of the variation in the process output can be explained by varying the process inputs?
• What settings of the key inputs will result in the optimal process output?

| When to Use | Purpose |
| --- | --- |
| Mid-project | This type of experiment is the only one that can accommodate categorical factors (process inputs) that must be investigated at more than two levels. |
| Mid-project | Models main effects and all possible interactions between factors, which is beneficial for determining the settings for the key inputs resulting in the optimal process output. |

### Data

Your data must be values for continuous Y and categorical or numeric X values tested at two or more discrete levels.

### Guidelines

• First, you should develop a sound data collection strategy to ensure that your conclusions are based on truly representative data.
• The residuals of the final model must be independent, reasonably normal, and with reasonably equal variance. The residuals are usually analyzed by a histogram, normal probability plot, and plots of the residuals versus fits and residuals versus order. You can display these graphs at one time using the Four in one option.
• General full factorial (GFF) designs are not recommended for use in screening, or reducing, the number of potentially important inputs. The size of the experiment can be very large, thus the experiment can be very expensive. Also, for screening purposes, GFF designs provide much more information than you need. You should screen out all possible inputs using two levels, and then add inputs needing more than two levels to the screened design.
• If all factors can be evaluated at two levels plus an optional center point, the 2K factorial (fractional or full) DOE is generally preferred as it has the following benefits:
• Provides an efficient means for screening out unimportant factors.
• Provides an easy-to-use prediction equation.
• Provides analysis of saturated designs.
• May be expanded easily to a central composite design for fitting a quadratic model.
• In the event that 3-way interactions are deemed unlikely and unimportant, the 2K Fractional Factorial design with a minimum resolution V becomes the preferred design.
• When using GFF, strive to reduce the number of levels because the number of runs grows quickly. For example, four factors with 3, 4, 5, and 6 levels require 3 × 4 × 5 × 6 = 360 runs. If you can eliminate one level from each factor, the design needs only 2 × 3 × 4 × 5 = 120 runs.
• Check for possible outliers in the unusual observations table in the Session window output.
• While the discussion here focuses on designed experiments created by Minitab, you can use the Analyze portion of the GFF DOE to analyze any numeric experimental data (for example, two factors each at 10 levels). Do this by entering the Y and X data into Minitab and then define the factors. This newly defined custom design can be analyzed in the usual manner.
• If you have discrete numeric data from which you can obtain every equally spaced value and you have measured at least 10 possible values, you can evaluate these data as if they are continuous.
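
The run-count arithmetic in the guidelines above is the product of the number of levels of each factor. The following short sketch shows the calculation and enumerates a small design. The factor names and settings are hypothetical.

```python
from itertools import product
from math import prod


def run_count(levels_per_factor):
    """Number of runs in one replicate of a general full factorial."""
    return prod(levels_per_factor)


def general_full_factorial(levels):
    """Enumerate every combination of settings.

    levels: dict mapping a factor name to its list of settings.
    """
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


print(run_count([3, 4, 5, 6]))
print(run_count([2, 3, 4, 5]))

# Hypothetical factors: one categorical at three levels, one numeric at two.
runs = general_full_factorial({"supplier": ["A", "B", "C"], "temp": [150, 180]})
print(len(runs), "runs, first run:", runs[0])
```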

### How-to

1. State your factors (Minitab allows up to 15 factors) and their levels of interest (each factor may have up to 100 levels).
2. Verify that the measurement systems for the Y data and the inputs (factors) are adequate.
3. Develop a data collection strategy (who should collect the data, as well as where and when; the preciseness of the data; how to record the data, and so on).
4. Run the experiment and reduce to a final model by eliminating terms with high p-values (typically > 0.05). Note: Terms are eliminated in order with the more complex terms evaluated and eliminated first. For example, eliminate all nonsignificant 3-factor interactions before evaluating 2-factor interactions.
5. Use either the response optimizer or interactions and main effects plots to determine optimal settings of significant factors.

## Mixture DOE

Use a mixture DOE to provide a cost-effective methodology for the evaluation of factors whose sum total volume or quantity cannot change. For example, if you wish to add more fruit filling to an 8-ounce fruit bar, another ingredient must be reduced. Such adjustments are common in packaged food and chemical formulations. The goals of this type of experiment are usually focused on developing a full predictive model (Y = f(X)) describing how the ingredients in the mixture jointly affect the process output and determining the optimal amounts of each ingredient.

A mixture DOE helps answer the following questions:

• Which ingredients have the largest effects on the process output (which ingredients are the key ingredients)?
• Do important interactions exist between ingredients?
• Which process inputs have the largest effects on the process output (which inputs are the key inputs)?
• Do important interactions exist between process inputs?
• How much variation in the process output can be explained by varying the ingredients and process inputs?
• What is the optimal combination of ingredients?
• What are the optimal settings of the process inputs?

| When to Use | Purpose |
| --- | --- |
| Mid-project | If you believe the desired characteristics of the mixture are a function of only the ingredients, use a pure mixture DOE to evaluate which ingredients have the largest influence on the characteristics, build a predictive model using the key ingredients, and find the optimal quantities of the ingredients. |
| Mid-project | If you believe the desired characteristics of the mixture are a function of both the ingredients and the process, use a mixed model DOE (some factors are ingredients, some are process inputs) to evaluate which ingredients and process inputs have the largest influence on the characteristics. Then, build a predictive model using the key ingredients and key process inputs and find the optimal quantities of the ingredients along with the optimal settings of the process inputs. |

### Data

Your data must be a continuous value for Y and continuous Xs (for a pure mixture design).

### Guidelines

• First, you should develop a sound data collection strategy to ensure you are basing your conclusions on truly representative data.
• Whenever possible, you should do the runs in the experiment in random order to prevent confusing a factor effect with the effect of an untested factor (sometimes called a lurking variable).
• The residuals of the final model must be reasonably normal and with reasonably equal variance. The residuals are usually analyzed by a histogram, normal probability plot, residuals versus fits, and residuals versus order, which can be run at one time using the Four in one option.
• Minitab allows the use of mixed model designs (mixture-process experiments) in which you use a combination of traditional and mixture DOE approaches. For example, a 1-pound cake recipe has six ingredients as part of its mixture component and has the process variables temperature and time as part of its standard DOE. Note: The process variables (X's) can be discrete (such as fan on or off).
• Minitab also allows a mixture DOE analysis in which the relative proportions of the components as well as the total volume of the mixture are analyzed in the same design (mixture-amounts experiments). For example, using the cake example above, you can evaluate the results when you bake 1-pound, 2-pound, and 3-pound cakes.
• Check for possible outliers in the unusual observations table (Session window output).
• Do not extrapolate beyond your inference space.
• While the discussion here focuses on Minitab designed experiments, note that you can use the “analyze” portion of the mixture DOE to analyze any numeric experimental data (for example, two factors each at 10 levels). To do this, enter the Y and X data into Minitab, and then define the factors. You can then analyze this newly defined, custom design in the usual manner.
• If you have discrete numeric data from which you can obtain every equally spaced value and you have measured at least 10 possible values, you can evaluate these data as if they are continuous.

### How-to

1. State your factors (ingredients) and any constraints they may have (for example, fruit bars must have at least 30% peanuts but not more than 50%).
2. Verify that the measurement systems for the Y data and the ingredients are adequate.
3. Develop a data collection strategy (who should collect the data, as well as where and when; how many data values are needed; the preciseness of the data; how to record the data, and so on).
4. Run the experiment and reduce to a final model by eliminating interaction terms (such as binary blending terms) with high p-values (typically greater than 0.05). All linear terms must stay in the model, because they are part of the formulation; removing a linear term would mean you remove an ingredient from the mixture.
5. Use either the response optimizer or mixture contour plots to determine optimal settings of the ingredients.
6. Generate the prediction equation.
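
As an illustration of the linear and binary blending terms mentioned in step 4, the sketch below assumes a Scheffé quadratic mixture model fitted to a three-component simplex-lattice design (pure blends plus 50/50 binary blends). For that specific design the coefficients have a closed form, so no regression routine is needed. The ingredient names and responses are hypothetical, and the sketch ignores component constraints such as the 30% to 50% bounds in step 1.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(q, m):
    """All q-component blends whose proportions are multiples of 1/m
    and sum to 1."""
    return [tuple(Fraction(i, m) for i in combo)
            for combo in product(range(m + 1), repeat=q)
            if sum(combo) == m]


def scheffe_quadratic(pure, blends):
    """Closed-form coefficients for a {q, 2} simplex-lattice design.

    pure: {component: response at 100% of that component}
    blends: {(comp_i, comp_j): response at the 50/50 blend}
    Linear terms equal the pure-blend responses; each binary blending
    term is 4*y_ij - 2*(y_i + y_j).
    """
    linear = dict(pure)
    binary = {(i, j): 4 * y - 2 * (pure[i] + pure[j])
              for (i, j), y in blends.items()}
    return linear, binary


def predict(linear, binary, x):
    """Predicted response for proportions x (dict, must sum to 1)."""
    y = sum(b * x[i] for i, b in linear.items())
    y += sum(b * x[i] * x[j] for (i, j), b in binary.items())
    return y


print(len(simplex_lattice(3, 2)), "design points")

# Hypothetical responses for ingredients A, B, and C.
pure = {"A": 10, "B": 20, "C": 30}
blends = {("A", "B"): 18, ("A", "C"): 20, ("B", "C"): 22}
linear, binary = scheffe_quadratic(pure, blends)
print(linear, binary)
print(predict(linear, binary, {"A": 0.5, "B": 0.5, "C": 0.0}))
```

Note that the model has no constant term, and every linear term is kept because each one corresponds to an ingredient in the formulation.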

## Multiple response optimization

Use multiple response optimization to determine the optimal settings in an experiment with a single output, or with multiple competing outputs. It also provides a graphical tool for exploring what-if alternative solutions. A desirability function is created for each process output, with multiple outputs combined into an overall desirability using adjustable weights for each output.

Multiple response optimization helps answer the following questions:

• In a designed experiment investigating the effects of process inputs on a single process output, what settings of the key inputs result in the optimal process output?
• In a designed experiment investigating the effects of process inputs on two or more potentially competing process outputs, what settings of the key inputs provide the best compromise solution for all outputs?
• If I change one or more inputs from the optimal solution, what happens to the individual outputs and how does it affect the compromise?

| When to Use | Purpose |
| --- | --- |
| Mid-project | Very useful for determining the settings of key process inputs that result in the optimal value of a single process output. |
| Mid-project | Very useful for determining the settings of key process inputs that result in the best compromise solution for satisfying the goals relative to two or more process outputs. |
| Mid-project | Make adjustments to the initial optimal solution, determine the impact on the outputs and the compromise, and settle on a final optimal solution. |

### Data

Your data must be a 2K DOE, response surface DOE, or a mixture DOE solved for one or more outputs.

### Guidelines

• Ensure all output models have been reduced to their final state.
• If you want to improve the mean and reduce variation (a common case of potentially competing outputs) with a 2K full or fractional factorial DOE, you can run the DOE for the mean response and then analyze the DOE for variation.
• If your DOE is a 2K full or fractional factorial, be careful how you move factor settings using the interactive graph produced by the optimizer. The optimizer allows you to select any value of a numeric factor (not just the low or high settings used in the experiment). If you have not tested for curvature (with center points), selecting a value between the low and high settings of a factor is risky, because the model relies on linearity to calculate a predicted Y at the selected setting. If you added center points to the DOE and the curvature test is not statistically significant, then it is okay to select factor settings between the low and high settings used in the experiment.
• If you have discrete numeric data from which you can obtain every equally spaced value and you have measured at least 10 possible values, you can evaluate these data as if they are continuous.

### How-to

1. Design your DOE and collect data on all outputs of interest for each run.
2. Analyze (reduce to final model) the DOE for each output. Minitab remembers the final model run for each output.
3. For each output, select the goal to be achieved: maximize, minimize, or set on target.
• If you are maximizing an output, specify a target (a goal) and a lower bound (the minimum acceptable value).
• If you are minimizing an output, specify a target (a goal) and upper bound (the maximum acceptable value).
• If you are setting an output to a target, specify the target, a lower bound (minimum acceptable value), and an upper bound (maximum acceptable value).
4. For each output, establish weights and importance, which are used to fine tune the algorithm for selecting the best balanced solution. Weights determine how close you must be to the target to obtain the maximum benefit for a particular output. Importance values reflect the relative values of achieving the goals for each of the competing outputs. A high importance value means that it is more important to achieve the goals for a particular output.
5. Adjust settings of the inputs as desired, until you settle on a final solution.
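
The goals, bounds, weights, and importance values in steps 3 and 4 can be sketched with desirability functions. The example below assumes the widely used Derringer-Suich form, with individual desirabilities between 0 and 1 combined by an importance-weighted geometric mean. This page does not state the exact formulas Minitab uses, so treat the function names and formulas as illustrative assumptions.

```python
from math import prod


def d_maximize(y, lower, target, weight=1.0):
    """Desirability when larger is better: 0 at or below the lower bound,
    1 at or above the target."""
    if y <= lower:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - lower) / (target - lower)) ** weight


def d_minimize(y, target, upper, weight=1.0):
    """Desirability when smaller is better: 1 at or below the target,
    0 at or above the upper bound."""
    if y <= target:
        return 1.0
    if y >= upper:
        return 0.0
    return ((upper - y) / (upper - target)) ** weight


def d_target(y, lower, target, upper, weight=1.0):
    """Desirability when a target value is best: 1 at the target,
    0 at or outside either bound."""
    if y <= lower or y >= upper:
        return 0.0
    if y <= target:
        return ((y - lower) / (target - lower)) ** weight
    return ((upper - y) / (upper - target)) ** weight


def composite(desirabilities, importances):
    """Importance-weighted geometric mean of individual desirabilities."""
    total = sum(importances)
    return prod(d ** imp for d, imp in zip(desirabilities, importances)) ** (1 / total)


# Hypothetical predicted outputs at one candidate setting of the inputs.
d_yield = d_maximize(75, lower=50, target=100)
d_cost = d_minimize(5, target=0, upper=10)
print(d_yield, d_cost, composite([d_yield, d_cost], [1, 1]))
```

Because the combination is a geometric mean, a single output with zero desirability drives the overall desirability to zero, which reflects the idea that every output must at least reach its minimum acceptable value.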

## Response surface DOE

Use a response surface DOE to provide a cost-effective methodology for conducting controlled experiments (DOEs) in cases where there is believed to be curvature and all the factors are continuous and can be tested at (usually) three to five levels. The goals of this type of experiment are usually focused on developing a full predictive model (Y = f(X)) describing how the process inputs jointly affect the process output and determining the optimal settings of the inputs.

A response surface DOE helps answer the following questions:

• Which process inputs (factors) have the largest effects on the process output (which inputs are the key inputs)?
• Do important interactions exist between factors?
• Do important quadratic effects exist?
• How much of the variation in the process output can be explained by varying the process inputs?
• Is the current testing space near an optimal condition for the process output? If yes, what settings of the key inputs result in the optimal process output?
• What is the equation (Y = f(X)) relating the process output to the levels of the factors?

| When to Use | Purpose |
| --- | --- |
| Mid-project | If the number of process inputs to be investigated is small (typically less than seven), you can run these designs by adding new test runs to an existing 2-level full or fractional factorial design when the 2-level factorial design shows evidence of curvature. All factors must be continuous. |
| Mid-project | When all the factors are continuous and show significant curvature, these designs are used because they allow the fitting of quadratic terms to model the curvature, resulting in better interpolation between design points and an improved search for the optimal settings. |

### Data

Your data must be a continuous value for Y and continuous Xs tested at three to five discrete levels.

### Guidelines

• First, you should develop a sound data collection strategy so that your conclusions are based on truly representative data.
• Whenever possible, you should do the runs in the experiment in random order to prevent confusing a factor effect with the effect of an untested factor (sometimes called a lurking variable).
• The residuals of the final model must be independent, be reasonably normal, and have reasonably equal variance. The residuals are usually analyzed by a histogram, normal probability plot, residuals versus fits, and residuals versus order, which you can run at one time using the Four in one option. Note: Due to the small size of many DOEs, these assumptions may not be easily checked.
• Minitab supports both central composite designs and Box-Behnken designs. Central composite designs are often preferred over the Box-Behnken designs because they can usually be built on prior 2K full-factorial or fractional-factorial (resolution V or higher) DOEs.
• Check for possible outliers in the unusual observations table (Session window output).
• Do not extrapolate beyond your inference space.
• While the discussion here focuses on designed experiments created by Minitab, note that you can use the Analyze portion of the response surface DOE to analyze any numeric experimental data (for example, two factors each at 10 levels). To do this, enter the Y and X data into Minitab and define the factors. You can analyze this newly defined custom design in the usual manner.
• If you have discrete numeric data from which you can obtain every equally spaced value and you have measured at least 10 possible values, you can evaluate these data as if they are continuous.

### How-to

1. Generally, you have already performed a series of DOEs and simple point evaluations to arrive at the general area of optimization. This is sometimes called response surface methodology or path of steepest ascent.
2. Once you are in that region, add star points (the central composite design, or CCD, method) to a 2K full-factorial or 2K fractional-factorial (resolution V or higher) design.
3. Run the additional points and reduce to a final model by eliminating terms with high p-values (typically greater than 0.05).
4. Use the response optimizer or the surface and contour plots to determine settings of final factors.
5. Generate the prediction equation.
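
The star-point step can be sketched as follows. The example builds a central composite design in coded units by combining a 2K full-factorial core, two axial (star) points per factor, and replicated center points. It assumes the common rotatable choice of axial distance, alpha = (number of factorial runs)^(1/4), when no value is supplied. The function name and defaults are illustrative and not taken from Minitab.

```python
from itertools import product


def central_composite(k, n_center=1, alpha=None):
    """Return a central composite design for k factors in coded units.

    Factorial points sit at -1/+1, star points at -alpha/+alpha on each
    axis, and center points at 0. If alpha is None, the rotatable value
    (number of factorial runs) ** 0.25 is assumed.
    """
    factorial = [list(p) for p in product([-1.0, 1.0], repeat=k)]
    if alpha is None:
        alpha = len(factorial) ** 0.25
    star = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            star.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + star + center


design = central_composite(2, n_center=5)
print(len(design), "runs")
for run in design:
    print([round(x, 3) for x in run])
```

With alpha greater than 1, each factor is tested at five levels (-alpha, -1, 0, +1, +alpha). Setting alpha=1.0 gives a face-centered design with three levels per factor, which matches the three-to-five-level range described in the Data section.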