A regression tree is a collection of splits. Each split improves the tree, and each split also has surrogate splits that provide their own improvement. The importance of a variable is the sum of its improvements over all nodes where the tree uses the variable either as the primary splitter or as a surrogate splitter when another variable has a missing value. The following formula gives the improvement at a single node:

\Delta I(t) = I(t) - p_{Left} I(t_{Left}) - p_{Right} I(t_{Right})

Here, t_{Left} and t_{Right} are the left and right child nodes that the split creates from node t.
The values of I(t), p_{Left}, and p_{Right} depend on the criterion for splitting the nodes. For more information, go to Node splitting methods in CART® Regression.
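
As an illustration, the improvement at a node and the resulting variable importance can be sketched in Python. This is a minimal sketch under stated assumptions, not the software's implementation: it assumes the improvement has the form I(t) - p_{Left} I(t_{Left}) - p_{Right} I(t_{Right}), that I(t) is the mean squared deviation of the responses in node t, and that p_{Left} and p_{Right} are the proportions of the node's records sent to each child. The actual definitions depend on the splitting criterion. The function names `impurity`, `improvement`, and `variable_importance` are hypothetical.

```python
from collections import defaultdict


def impurity(values):
    """Assumed I(t): mean squared deviation of the responses from the node mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def improvement(parent, left, right):
    """Improvement at one node: I(t) - p_Left * I(t_Left) - p_Right * I(t_Right)."""
    # Assumption: p_Left and p_Right are the shares of the node's records in each child.
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return impurity(parent) - p_left * impurity(left) - p_right * impurity(right)


def variable_importance(node_improvements):
    """Sum the improvements per variable.

    node_improvements is an iterable of (variable, improvement) pairs, one pair
    for each node where the variable acts as the primary or a surrogate splitter.
    """
    totals = defaultdict(float)
    for variable, gain in node_improvements:
        totals[variable] += gain
    return dict(totals)


# Made-up responses in one node, split into two children of three records each.
parent = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
left, right = parent[:3], parent[3:]
gain = improvement(parent, left, right)

# Made-up improvements for two variables across three nodes.
importance = variable_importance([("x1", gain), ("x2", 3.0), ("x1", 1.5)])
```

In this example the split separates the low responses from the high ones, so most of the parent node's impurity is removed and the improvement is large. A variable that appears at several nodes accumulates the improvement from each of them.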
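R^{2} is also known as the coefficient of determination. The following formula gives R^{2}:

R^{2} = 1 - \frac{\sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{N} (y_{i} - \bar{y})^{2}}
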
Term | Description |
---|---|
y_{i} | i^{th} observed response value |
\bar{y} | mean response |
\hat{y}_{i} | i^{th} fitted response |
N | number of records |
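
The terms in the table can be combined into a short calculation. The following Python sketch assumes the usual definition of R^{2} as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample values are hypothetical and serve only as an illustration.

```python
def r_squared(observed, fitted):
    """Coefficient of determination from observed and fitted responses."""
    n = len(observed)  # N, the number of records
    if n == 0 or n != len(fitted):
        raise ValueError("observed and fitted must be non-empty and equal in length")
    mean_response = sum(observed) / n
    # Residual sum of squares: squared gaps between observed and fitted values.
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # Total sum of squares: squared gaps between observed values and their mean.
    ss_total = sum((y - mean_response) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("R-squared is undefined when all observed responses are equal")
    return 1.0 - ss_residual / ss_total


# Made-up data: each fitted value is the mean of a terminal node with three records.
observed = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
fitted = [2.0, 2.0, 2.0, 11.0, 11.0, 11.0]
r2 = r_squared(observed, fitted)
```

For a regression tree, the fitted response of a record is typically the mean response of the terminal node that contains the record, which is why the fitted values in the example repeat within each group.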