Model summary table for Fit Model and Discover Key Predictors with TreeNet^® Regression

Note

This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

Find definitions and interpretation guidance for the Model summary table.

Note

Minitab displays results for both the training data and for the validation results. The validation results indicate whether the model can adequately predict the response values for new observations, or properly summarize the relationships between the response and the predictor variables. Use the training results to evaluate overfitting of the model.

Root mean square error (RMSE)
Mean squared error (MSE)
Mean absolute deviation (MAD)
Mean absolute percent error (MAPE)

Total predictors

The number of total predictors available for the TreeNet® model. The total is the sum of the continuous and categorical predictors that you specify.

Important predictors

The number of important predictors in the TreeNet® model. Important predictors have importance scores greater than 0.0. You can use the Relative Variable Importance chart to display the order of relative variable importance. For instance, suppose 10 of 20 predictors are important in the model, the Relative Variable Importance chart displays the variables in importance order.

Number of trees grown

By default, Minitab grows 300 small CART® trees to produce the TreeNet® model. While this value works well for exploration of the data, consider whether to grow more trees to produce a final model. To change the number of trees grown, go to the Options subdialog box.

Optimal number of trees

The optimal number of trees corresponds to the highest R² value or the lowest MAD value.

When the optimal number of trees is close to the maximum number of trees that the model grows, consider an analysis with more trees. Thus, if you grow 300 trees and the optimal number comes back as 298, then re-build the model with more trees. If the optimal number continues to be close to the maximum number, continue to increase the number of trees.

R-squared

R² is the percentage of variation in the response that the model explains. Outliers have a greater effect on R² than on MAD and MAPE.

When you use a validation method, the table includes an R² statistic for the training data set and an R² statistic for the validation method. When the validation method is k-fold cross-validation, the validation uses each fold when the tree building excludes that fold. The R² statistic from the validation results is typically a better measure of how the model works for new data.

Interpretation

Use R² to determine how well the model fits your data. The higher the R² value, the better the model fits your data. R² is always between 0% and 100%.

You can graphically illustrate the meaning of different R² values. The first plot illustrates a simple regression model that explains 85.5% of the variation in the response. The second plot illustrates a model that explains 22.6% of the variation in the response. The more variation that is explained by the model, the closer the data points fall to the fitted values. Theoretically, if a model can explain 100% of the variation, the fitted values would always equal the observed values and all of the data points would fall on the line y = x.

A validation R² that is substantially less than the training R² indicates that the model might not predict the response values for new cases as well as the model fits the current data set.

Root mean square error (RMSE)

The root mean square error (RMSE) measures the accuracy of the model. Outliers have a greater effect on RMSE than on MAD and MAPE.

When you use a validation method, the table includes an RMSE statistic for the training data set and an RMSE statistic for the validation results. When the validation method is k-fold cross-validation, the validation uses each fold when the tree building excludes that fold. The validation RMSE statistic is typically a better measure of how the model works for new data.

Interpretation

Use to compare the fits of different models. Smaller values indicate a better fit. A validation RMSE that is substantially more than the training RMSE indicates that the model might not predict the response values for new cases as well as the model fits the current data set.

Mean squared error (MSE)

The mean square error (MSE) measures the accuracy of the model. Outliers have a greater effect on MSE than on MAD and MAPE.

When you use a validation method, the table includes an MSE statistic for the training data set and an MSE statistic for the validation results. When the validation method is k-fold cross-validation, the validation uses each fold when the model building excludes that fold. The validation MSE statistic is typically a better measure of how the model works for new data.

Interpretation

Use to compare the fits of different models. Smaller values indicate a better fit. A validation MSE that is substantially more than the training MSE indicates that the model might not predict the response values for new cases as well as the model fits the current data set.

Mean absolute deviation (MAD)

The mean absolute deviation (MAD) expresses accuracy in the same units as the data, which helps conceptualize the amount of error. Outliers have less of an effect on MAD than on R², RMSE, and MSE.

When you use a validation method, the table includes an MAD statistic for the training data set and an MAD statistic for the validation results. When the validation method is k-fold cross-validation, the validation uses each fold when the model building excludes that fold. The validation MAD statistic is typically a better measure of how the model works for new data.

Interpretation

Use to compare the fits of different models. Smaller values indicate a better fit. A validation MAD that is substantially more than the training MAD indicates that the model might not predict the response values for new cases as well as the model fits the current data set.

Mean absolute percent error (MAPE)

The mean absolute percent error (MAPE) expresses accuracy as a percentage of the error. Because the MAPE is a percentage, it can be easier to understand than the other accuracy measure statistics. For example, if the MAPE, on average, is 0.05, then the average ratio between the fitted error and the actual value across all cases is 5%. Outliers have less of an effect on MAPE than on R², RMSE, and MSE.

However, sometimes you may see a very large MAPE value even though the model appears to fit the data well. Examine the fitted vs actual response value plot to see if any data values are close to 0. Because MAPE divides the absolute error by the actual data, values close to 0 can greatly inflate the MAPE.

When you use a validation method, the table includes an MAPE statistic for the training data set and an MAPE statistic for the validation results. When the validation method is k-fold cross-validation, the validation uses each fold when the model building excludes that fold. The validation MAPE statistic is typically a better measure of how the model works for new data.

Interpretation

Use to compare the fits of different models. Smaller values indicate a better fit. A validation MAPE that is substantially more than the training MAPE indicates that the model might not predict the response values for new cases as well as the model fits the current data set.

Model summary table for Fit Model and Discover Key Predictors with TreeNet® Regression

Note

Note

In This Topic

Total predictors

Important predictors

Number of trees grown

Optimal number of trees

R-squared

Interpretation

Root mean square error (RMSE)

Interpretation

Mean squared error (MSE)

Interpretation

Mean absolute deviation (MAD)

Interpretation

Mean absolute percent error (MAPE)

Interpretation

Model summary table for Fit Model and Discover Key Predictors with TreeNet^® Regression