The node statistics come from the data for individual nodes. When you use a validation method, the fit for a node is the same whether it is in the test data set or the training data set. The other statistics use the records for the node from the training or test data set.
These statistics appear in the table of the best or worst terminal nodes. In general, rows are in order by the size of the error, either MSE or MAD. When both values are less than 1, values within 1E-12 are ties. When either error value is greater than 1, values within 1E-12*(larger value) are ties. Minitab sorts ties by their weighted counts. If the weighted counts are also ties, then Minitab sorts ties by the node ID.
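The tie rule above can be sketched in a few lines of Python. This is only an illustration, not Minitab's implementation: the dictionary keys `error`, `weighted_count`, and `node_id` are hypothetical names, and the sketch assumes a "best nodes" table (smallest error first), larger weighted counts first among ties, and ascending node IDs as the final tiebreaker. The text above does not state those sort directions.

```python
from functools import cmp_to_key

TOL = 1e-12  # tolerance stated in the text


def is_tie(a, b):
    """Return True when two error values (MSE or MAD) count as tied."""
    larger = max(abs(a), abs(b))
    # Absolute tolerance when both values are below 1,
    # tolerance scaled by the larger value otherwise.
    scale = larger if larger > 1 else 1.0
    return abs(a - b) <= TOL * scale


def compare_nodes(a, b):
    """Comparator for a 'best nodes' table (assumed sort directions)."""
    if not is_tie(a["error"], b["error"]):
        return -1 if a["error"] < b["error"] else 1
    # Assumption: among tied errors, the larger weighted count comes first.
    if a["weighted_count"] != b["weighted_count"]:
        return -1 if a["weighted_count"] > b["weighted_count"] else 1
    # Assumption: remaining ties are ordered by ascending node ID.
    return a["node_id"] - b["node_id"]


nodes = [
    {"node_id": 1, "error": 0.5, "weighted_count": 10},
    {"node_id": 2, "error": 0.5 + 1e-13, "weighted_count": 20},
    {"node_id": 3, "error": 0.2, "weighted_count": 5},
]
ranked = sorted(nodes, key=cmp_to_key(compare_nodes))
ranked_ids = [n["node_id"] for n in ranked]
```

Here nodes 1 and 2 differ by less than 1E-12, so they are treated as tied on error and ordered by weighted count instead. Because a tolerance-based comparison is not strictly transitive, a chain of nearly equal errors could in principle sort inconsistently; the sketch ignores that edge case.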
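The fit depends on the criterion for the improvement of a node. When the criterion is least squares, the fit for the k^{th} node, \hat{y}_{k}, is the mean response of the records in the node, \bar{y}_{k}:
\hat{y}_{k} = \bar{y}_{k} = \frac{1}{n_{k}} \sum_{i=1}^{n_{k}} y_{i}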
When the criterion is least absolute deviation, then the fit is the median.
Term | Description |
---|---|
\hat{y}_{k} | fitted value for the k^{th} node |
y_{i} | i^{th} observed response value in the k^{th} node |
\bar{y}_{k} | mean response for the records in the k^{th} node |
n_{k} | count of records in the k^{th} node |
n_{k, t} | count of records in the k^{th} node for observations in either the training data set or the test data set |
y_{i, t} | i^{th} observed response value in the k^{th} node for either the training data set or the test data set |
\bar{y}_{k, t} | mean response for the records in the k^{th} node in either the training data set or the test data set |
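As a small illustration of how the fit follows from the node-splitting criterion, the sketch below computes the fit for one node from its response values. The function name `node_fit` and the criterion labels are hypothetical, and the sketch assumes unweighted records.

```python
from statistics import mean, median


def node_fit(responses, criterion="least_squares"):
    """Fit for one terminal node, given the responses of its records.

    Assumes unweighted records; names are illustrative only.
    """
    if criterion == "least_squares":
        # Least squares: the fit is the mean response in the node.
        return mean(responses)
    if criterion == "least_absolute_deviation":
        # Least absolute deviation: the fit is the median response.
        return median(responses)
    raise ValueError(f"unknown criterion: {criterion!r}")


responses = [2.0, 4.0, 9.0]
fit_ls = node_fit(responses)
fit_lad = node_fit(responses, "least_absolute_deviation")
```

For these three values the mean and the median differ, which shows that the same node can have a different fit under each criterion.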