A regression tree is a collection of splits, and each split improves the tree. Each split also has surrogate splits, which contribute improvements of their own. The importance of a variable is the sum of its improvements over every node where the tree uses the variable as the primary splitter, or as a surrogate splitter when another variable has a missing value. The following formula gives the improvement at a single node:

$$\Delta I(s, t) = I(t) - p_{\text{Left}}\, I(t_{\text{Left}}) - p_{\text{Right}}\, I(t_{\text{Right}})$$
The values of I(t), pLeft, and pRight depend on the criterion for splitting the nodes. For more information, go to Node splitting methods in CART® Regression.
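As a rough illustration, the improvement at one node can be sketched in Python. This sketch assumes a least-squares criterion in which I(t) is the mean squared deviation of the responses from the node mean, and pLeft and pRight are the proportions of the node's records sent to each child. The function names `node_impurity` and `split_improvement` are hypothetical and are not part of any Minitab interface.

```python
from statistics import fmean


def node_impurity(y):
    """Assumed impurity I(t): mean squared deviation from the node mean."""
    values = list(y)
    mean = fmean(values)
    return fmean([(v - mean) ** 2 for v in values])


def split_improvement(y_left, y_right):
    """Improvement I(t) - pLeft * I(tLeft) - pRight * I(tRight) for one split."""
    y_left, y_right = list(y_left), list(y_right)
    y_parent = y_left + y_right
    n = len(y_parent)
    p_left = len(y_left) / n    # share of the node's records going left
    p_right = len(y_right) / n  # share of the node's records going right
    return (
        node_impurity(y_parent)
        - p_left * node_impurity(y_left)
        - p_right * node_impurity(y_right)
    )


if __name__ == "__main__":
    # A split that separates low responses from high responses cleanly.
    print(split_improvement([1, 2, 3], [7, 8, 9]))
```

Under these assumptions, a variable's importance would be the sum of such improvements over every node where the variable acts as the primary or surrogate splitter.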
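R² is also known as the coefficient of determination. It compares the squared error of the fitted responses with the squared error around the mean response:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$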
Term | Description |
---|---|
$y_i$ | i-th observed response value |
$\bar{y}$ | mean response |
$\hat{y}_i$ | i-th fitted response |
N | number of records |
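
The terms in the table can be combined in a short Python sketch of R². It assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares around the mean response. The function name `r_squared` is hypothetical.

```python
def r_squared(observed, fitted):
    """R-squared from observed responses y_i and fitted responses (assumed definition)."""
    observed, fitted = list(observed), list(fitted)
    n = len(observed)  # N, the number of records
    if n == 0 or n != len(fitted):
        raise ValueError("observed and fitted must be non-empty and equal in length")
    y_bar = sum(observed) / n  # mean response
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # residual sum of squares
    sst = sum((y - y_bar) ** 2 for y in observed)              # total sum of squares
    if sst == 0:
        raise ValueError("R-squared is undefined when all observed responses are equal")
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Fitted values here are rough terminal-node predictions for two groups of records.
    print(r_squared([1, 2, 3, 7, 8, 9], [2, 2, 2, 8, 8, 8]))
```

A value near 1 means the fitted responses account for most of the variation in the observed responses. A value near 0 means the tree does little better than predicting the mean response for every record.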