This command is available with the Predictive Analytics Module.
TreeNet® models are an approach to classification and regression problems that is both more accurate and more resistant to overfitting than a single classification or regression tree. In broad terms, the process begins with a small regression tree as an initial model. That tree produces a residual for every row in the data, and these residuals become the response variable for the next regression tree. We build another small regression tree to predict the residuals from the first tree, then compute the resulting residuals again. We repeat this sequence until a validation method identifies the number of trees with the minimum prediction error. The resulting sequence of trees makes the TreeNet® Classification Model.
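The residual-fitting loop described above can be illustrated with a minimal sketch. This is not the TreeNet® implementation: it assumes a single numeric predictor, two-leaf trees (stumps), a constant as the initial model, squared-error residuals, and a fixed number of trees instead of a validation method. The names `fit_stump`, `boost`, `n_trees`, and `learn_rate` are hypothetical.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Fit a two-leaf regression tree (a stump) to the residuals by least squares."""
    best = None
    # Try every split point that leaves at least one row on each side.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees=20, learn_rate=0.5):
    """Return the per-row fits and the training mean squared error after each tree."""
    fits = [mean(ys)] * len(ys)  # assumption: the initial model is a constant
    errors = []
    for _ in range(n_trees):
        # The residuals of the current model are the response for the next tree.
        residuals = [y - f for y, f in zip(ys, fits)]
        tree = fit_stump(xs, residuals)
        # Add a shrunken copy of the new tree's predictions to the running fits.
        fits = [f + learn_rate * tree(x) for x, f in zip(xs, fits)]
        errors.append(mean([(y - f) ** 2 for y, f in zip(ys, fits)]))
    return fits, errors

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 0.9, 1.1, 3.8, 4.1, 4.0]
fits, errors = boost(xs, ys)
```

In this sketch the training error can only stay level or fall as trees are added, which is why a real analysis needs a separate validation method to choose where to stop.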
For the classification case, we can add some more mathematical detail for an analysis with a binary response and for an analysis with a multinomial response.
The response takes the following values: {-1, 1}.
The calculation uses two counts: the number of events and the number of nonevents.
| Input | Symbol | 
|---|---|
| learn rate |  | 
| sampling rate |  | 
| maximum number of terminal nodes per tree |  | 
| number of trees |  | 
The predictor values in the ith row of the training data are represented as a vector.
	 

| Term | Description | 
|---|---|
|  | number of events in terminal node m at tree j | 
|  | number of cases in terminal node m at tree j | 
|  | arithmetic mean of  for all cases in terminal node m at tree j |
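As a rough illustration of the binary case, the sketch below follows the usual gradient-boosting treatment of a binary response. It rests on assumptions that the text here does not confirm: that the starting fit is the log odds of the event computed from the event and nonevent counts, that a fit converts to an event probability through the logistic function, and that the residual for a row is its event indicator minus that probability. The function names are hypothetical, and the within-node update built from the terms in the table is not shown.

```python
import math

def initial_fit(y):
    """Assumed starting fit: log odds of the event from the two counts."""
    n_events = sum(1 for v in y if v == 1)      # response coded 1 for an event
    n_nonevents = sum(1 for v in y if v == -1)  # response coded -1 for a nonevent
    return math.log(n_events / n_nonevents)

def event_probability(fit):
    """Assumed link between a fit and an event probability (logistic function)."""
    return 1.0 / (1.0 + math.exp(-fit))

def generalized_residuals(y, fits):
    """Assumed residual: event indicator minus the current event probability."""
    return [(1.0 if v == 1 else 0.0) - event_probability(f)
            for v, f in zip(y, fits)]

y = [1, -1, -1, 1, -1, -1, -1, 1]  # 3 events, 5 nonevents
f0 = initial_fit(y)
fits = [f0] * len(y)
residuals = generalized_residuals(y, fits)
```

Under these assumptions the starting probability equals the observed event proportion, so the residuals sum to zero before the first tree is grown.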


For each level k of the response, the calculation uses the number of cases where the response value is k, and N is the number of rows in the training data.
| Input | Symbol | 
|---|---|
| learn rate |  | 
| sampling rate |  | 
| maximum number of terminal nodes per tree |  | 
| number of trees |  | 
The calculation of the probabilities from the fits accounts for the dependent nature of these trees. Otherwise, the process is substantially the same as for the binary case.
The calculations run over the number of trees in the analysis and the number of levels of the response variable:
The predictor values in the ith row of the training data set are represented as a vector.
	 

The fit used at this step is the fit for the ith row at tree j–1 for the kth level of the response variable.
  
where
| Term | Description | 
|---|---|
|  | number of cases for outcome k in terminal node m at tree j | 
|  | number of cases in terminal node m at tree j | 
|  | arithmetic mean of  for all cases in terminal node m at tree j |
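The multinomial case can be sketched in the same hedged way. The assumptions here come from the standard multiclass form of gradient boosting, not from this page: the starting fit for level k is the log of the proportion of rows with response value k, the fits for one row convert to probabilities through a softmax so that the probabilities across levels sum to 1, and the residual for level k is the indicator for that level minus its probability. All function names are hypothetical.

```python
import math

def initial_fits(y, levels):
    """Assumed starting fits: log of each level's share of the N rows."""
    n_rows = len(y)
    return [math.log(sum(1 for v in y if v == k) / n_rows) for k in levels]

def probabilities(fits):
    """Assumed conversion of one row's fits to probabilities (softmax)."""
    top = max(fits)  # subtract the maximum for numerical stability
    weights = [math.exp(f - top) for f in fits]
    total = sum(weights)
    return [w / total for w in weights]

def generalized_residuals(y_i, fits_i, levels):
    """Assumed residuals for one row: level indicator minus level probability."""
    p = probabilities(fits_i)
    return [(1.0 if y_i == k else 0.0) - p_k for k, p_k in zip(levels, p)]

levels = ["a", "b", "c"]
y = ["a", "b", "b", "c", "c", "c"]
f0 = initial_fits(y, levels)
p0 = probabilities(f0)
r = generalized_residuals("b", f0, levels)
```

Because the probabilities for a row must sum to 1, the fits for the different levels cannot be converted independently, which is one way to read the dependence between the trees for each level.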
