Minitab can use either least squared error or least absolute deviation as the criterion for splitting the nodes. The least squared error method minimizes the sum of the squared errors. The least absolute deviation method minimizes the sum of absolute values of errors.
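The difference between the two criteria can be illustrated with a minimal sketch. This is not Minitab's implementation: the function names (`sse`, `sad`, `best_split`) and the toy data are hypothetical, and the sketch assumes a single continuous predictor, with the node mean as the fitted value for squared error and the node median for absolute deviation.

```python
from statistics import mean, median

def sse(values):
    """Sum of squared errors around the node mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def sad(values):
    """Sum of absolute deviations around the node median."""
    if not values:
        return 0.0
    m = median(values)
    return sum(abs(v - m) for v in values)

def best_split(x, y, node_cost):
    """Return (threshold, cost) for the split x <= threshold with the lowest total cost."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [resp for _, resp in pairs[:i]]
        right = [resp for _, resp in pairs[i:]]
        cost = node_cost(left) + node_cost(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

# Hypothetical data: two clusters of responses plus one large outlier.
x = list(range(1, 12))
y = [0] * 5 + [10] * 5 + [50]

print(best_split(x, y, sse))  # threshold 10.5: squared error isolates the outlier
print(best_split(x, y, sad))  # threshold 5.5: absolute deviation separates the clusters
```

On this toy data the squared-error criterion is dominated by the single extreme response, while the absolute-deviation criterion is less sensitive to it, so the two criteria choose different splits.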
Minitab initially presents results either for the optimal tree or for the smallest tree that has a criterion value within a number of standard errors of the criterion value of the optimal tree. By default, the results are for the smallest tree with an R² value within 1 standard error of the maximum R² value or the smallest tree with an absolute deviation value within 1 standard error of the minimum value, depending on the choice for the Node splitting method.
For many datasets, the criterion initially improves as the number of terminal nodes increases. The criterion then reaches an optimal value and worsens afterwards. If the optimal value is for a tree where adding a node makes little difference in the criterion value, you can consider whether to use a smaller tree that performs almost as well as the optimal tree. Smaller trees are easier to interpret.
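The selection of a smaller tree within a number of standard errors of the optimal tree can be sketched as follows. This is an illustration under stated assumptions, not Minitab's code: the function name `smallest_tree_within_se`, the tuple layout, and the candidate values are all hypothetical, and the sketch assumes the tolerance is based on the standard error reported for the optimal tree.

```python
def smallest_tree_within_se(candidates, multiplier=1.0, maximize=True):
    """Pick the smallest tree whose criterion is within multiplier * SE of the optimum.

    candidates: list of (terminal_nodes, criterion_value, standard_error) tuples.
    maximize:   True for R-squared, False for absolute deviation.
    """
    pick = max if maximize else min
    _, best_value, best_se = pick(candidates, key=lambda c: c[1])
    tolerance = multiplier * best_se  # assumption: SE of the optimal tree
    if maximize:
        eligible = [c for c in candidates if c[1] >= best_value - tolerance]
    else:
        eligible = [c for c in candidates if c[1] <= best_value + tolerance]
    return min(eligible, key=lambda c: c[0])

# Hypothetical sequence of trees: (terminal nodes, validated R-squared, standard error).
trees = [
    (2, 0.50, 0.03),
    (4, 0.68, 0.02),
    (7, 0.74, 0.02),
    (11, 0.75, 0.02),
    (16, 0.73, 0.02),
]

print(smallest_tree_within_se(trees))                # the 7-node tree
print(smallest_tree_within_se(trees, multiplier=0))  # the optimal 11-node tree
```

With a multiplier of 1, the 7-node tree is chosen because its criterion is within one standard error of the 11-node optimum. A multiplier of 0 returns the optimal tree itself.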
Minitab can validate the performance of the tree with a test data set or with k-fold cross-validation. You can also choose not to validate the performance of the tree. When the analysis uses a test data set, this item shows the target proportions for the training and test data sets.
By default, Minitab uses k-fold cross-validation to validate the performance of the tree for data sets with 5,000 or fewer cases. For data sets with more than 5,000 cases, Minitab uses a test data set. When the analysis uses a validation method, the criterion for the selection of the optimal tree is from the validation method. Using the validation method to select the optimal tree helps prevent the tree from being overfit to the available data and gives a more realistic description of the tree's performance on new data.
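The default rule and the idea behind k-fold cross-validation can be sketched as follows. The function names, the `seed` parameter, and the way folds are assigned are hypothetical and chosen only for illustration. The number of folds `k` is left as an argument because the text above does not state a default.

```python
import random

def default_validation_method(n_cases, threshold=5000):
    """Mirror the default described above: k-fold up to the threshold, test set beyond it."""
    if n_cases <= threshold:
        return "k-fold cross-validation"
    return "test data set"

def k_fold_indices(n_cases, k, seed=0):
    """Yield (train, test) row-index lists so that each row is held out exactly once."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

print(default_validation_method(5000))  # k-fold cross-validation
print(default_validation_method(5001))  # test data set

for train, test in k_fold_indices(n_cases=10, k=5):
    print(sorted(test), len(train))
```

Each tree size is fit on the training rows and scored on the held-out rows of every fold, so the criterion used to select the tree comes from data that the fit did not see.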
By default, the analysis does not have a missing value penalty, and this row is not present. The missing value penalty penalizes a competitor based on the proportion of missing values for each node. Thus, a competitor with many missing values in a node is less likely to be selected as the primary splitter.
By default, the analysis does not have a high level category penalty, and this row is not present. The high level category penalty penalizes a competitor based on the number of categorical levels relative to the size of the node for each node. Thus, a competitor with many levels in a node is less likely to be selected as the primary splitter.
Indicates the column that is used to weight the response.
Because of the way that analyses for predictive analytics handle missing data for predictors, the number of rows used is often the same size as the full data set. Some data can be invalid and excluded from the analysis. For example, the analysis excludes rows with missing response values, missing weights, weights of 0, or negative weights.
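The exclusion rules listed above can be sketched as a simple row filter. The helper names `is_missing` and `rows_used` are hypothetical, and the sketch assumes that missing values are represented as `None` or NaN.

```python
import math

def is_missing(value):
    """Treat None and NaN as missing (an assumption about how the data are encoded)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def rows_used(responses, weights=None):
    """Count rows that remain after the exclusions described above."""
    used = 0
    for i, response in enumerate(responses):
        if is_missing(response):
            continue  # missing response value
        if weights is not None:
            weight = weights[i]
            if is_missing(weight) or weight <= 0:
                continue  # missing, zero, or negative weight
        used += 1
    return used

# Hypothetical data: six rows, some with invalid responses or weights.
responses = [3.1, None, 2.7, 4.0, 5.5, float("nan")]
weights = [1, 1, 0, -2, 2, 1]

print(rows_used(responses))           # 4: only the two missing responses are dropped
print(rows_used(responses, weights))  # 2: zero and negative weights are dropped as well
```

Missing predictor values do not appear in the filter, which is why the number of rows used is often the same as the size of the full data set.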
The number of missing response observations. This also includes missing values or zeros in the weight column.