A team of researchers collects and publishes detailed information about
factors that affect heart disease. Variables include age, sex, cholesterol
levels, maximum heart rate, and more. This example is based on a public data
set; the original data are from archive.ics.uci.edu.
The researchers want to create a classification tree that identifies the
important predictors of whether a patient has heart disease.
In Response event, select Yes to indicate that heart disease has been identified in the patient.
In Continuous predictors, enter Age, Rest Blood Pressure, Cholesterol, Max Heart Rate, and Old Peak.
In Categorical predictors, enter Sex, Chest Pain Type, Fasting Blood Sugar, Rest ECG, Exercise Angina, Slope, Major Vessels, and Thal.
Click OK.
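The dialog entries above amount to a small model specification. As a minimal sketch, the Python dataclass below records the same settings so that they can be checked in code, for example to confirm that no predictor is entered as both continuous and categorical. The class and field names are hypothetical and are not part of Minitab, and the response column name `Heart Disease` is an assumption taken from the output title at the end of this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CartClassificationSpec:
    """Hypothetical record of the CART Classification dialog settings (not a Minitab API)."""
    response: str
    response_event: str
    continuous: tuple[str, ...]
    categorical: tuple[str, ...]

    def __post_init__(self) -> None:
        # A predictor must be entered as either continuous or categorical, not both.
        overlap = set(self.continuous) & set(self.categorical)
        if overlap:
            raise ValueError(f"predictors listed twice: {sorted(overlap)}")

    @property
    def predictors(self) -> tuple[str, ...]:
        # All predictors, continuous first, then categorical.
        return self.continuous + self.categorical


spec = CartClassificationSpec(
    response="Heart Disease",  # assumed response column name
    response_event="Yes",
    continuous=("Age", "Rest Blood Pressure", "Cholesterol",
                "Max Heart Rate", "Old Peak"),
    categorical=("Sex", "Chest Pain Type", "Fasting Blood Sugar", "Rest ECG",
                 "Exercise Angina", "Slope", "Major Vessels", "Thal"),
)
print(len(spec.predictors))
```

With these settings the specification holds 13 predictors, which matches the list in that output title.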
Interpret the results
By default, Minitab displays the smallest tree whose misclassification cost is within 1 standard error of the minimum misclassification cost. In this example, that tree has 4 terminal nodes.
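Minitab applies this 1-standard-error rule for you. The sketch below only illustrates the selection logic, assuming that each candidate tree in the pruning sequence has a cross-validated misclassification cost and a standard error, and that the tolerance is the standard error of the minimum-cost tree (the usual CART convention). The `Candidate` type, the `one_se_tree` function, and all numbers are hypothetical, chosen to mimic the pattern in this example rather than copied from its output.

```python
from __future__ import annotations

from typing import NamedTuple


class Candidate(NamedTuple):
    terminal_nodes: int
    cost: float       # cross-validated misclassification cost
    std_error: float  # standard error of that cost


def one_se_tree(candidates: list[Candidate]) -> Candidate:
    """Return the smallest tree whose cost is within 1 SE of the minimum-cost tree."""
    best = min(candidates, key=lambda c: c.cost)
    threshold = best.cost + best.std_error
    eligible = [c for c in candidates if c.cost <= threshold]
    return min(eligible, key=lambda c: c.terminal_nodes)


# Illustrative values only; these are NOT the values from this example's output.
sequence = [
    Candidate(2, 0.30, 0.03),
    Candidate(4, 0.24, 0.03),
    Candidate(7, 0.22, 0.03),
    Candidate(10, 0.23, 0.03),
]
print(one_se_tree(sequence).terminal_nodes)
```

With these illustrative values, the 7-node tree has the lowest cost, but the 4-node tree is within one standard error of it and is smaller, so the rule selects the 4-node tree.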
Before the researchers examine the tree, they look at the plot of the cross-validated misclassification cost against the number of terminal nodes. In this plot, the misclassification cost continues to decrease after the 4-node tree. In a case like this, the researchers choose to explore some
of the other simple trees that have lower misclassification costs.
Select an alternative tree
In the output, click Select Alternative Tree.
In the plot, select the 7-node tree that has the lowest misclassification cost and the best ROC value.
Click Create Tree.
Interpret the results
In the tree diagram, blue items represent the event level and red items represent the nonevent level. In this output, the event level is "Yes", which indicates that a patient has heart disease. The nonevent level is "No", which indicates that a patient does not have heart disease.
At the root node, there are 139 cases of the Yes event and 164 cases of the No event. The root node is split on the variable THAL. When THAL = Normal, cases go to the left child node (Node 2). When THAL = Fixed or Reversible, cases go to the right child node (Node 5).
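The root split is a rule on a single categorical predictor. A minimal sketch of that rule, assuming the Thal levels are stored as the strings `Normal`, `Fixed`, and `Reversible` (the actual worksheet may code them differently). The function name `root_child` is hypothetical, and the sketch simply rejects any other value instead of reproducing how Minitab handles missing or unexpected levels.

```python
def root_child(thal: str) -> int:
    """Return the child node (numbered as in the text) for a case at the root node."""
    if thal == "Normal":
        return 2  # left child node
    if thal in ("Fixed", "Reversible"):
        return 5  # right child node
    # Assumption: any other value is treated as an error in this sketch.
    raise ValueError(f"unexpected Thal level: {thal!r}")


for level in ("Normal", "Fixed", "Reversible"):
    print(level, "->", root_child(level))
```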
Node 2: There are 167 cases where THAL is Normal. Of these 167 cases, 38 (22.8%) are Yes and 129 (77.2%) are No.
Node 5: There are 136 cases where THAL is Fixed or Reversible. Of these 136 cases, 101 (74.3%) are Yes and 35 (25.7%) are No.
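The node percentages follow directly from the counts. This short check uses only the counts quoted above: it recomputes each percentage and confirms that the two child nodes together account for every case in the root node. The helper name `summarize` is hypothetical.

```python
from __future__ import annotations


def summarize(yes: int, no: int) -> tuple[int, float, float]:
    """Return the node size and the Yes/No percentages, rounded to one decimal."""
    total = yes + no
    return total, round(100 * yes / total, 1), round(100 * no / total, 1)


# (Yes, No) counts from the node descriptions in the text.
root = (139, 164)
node2 = (38, 129)   # THAL = Normal
node5 = (101, 35)   # THAL = Fixed or Reversible

# Each case at the root goes to exactly one child, so the counts must add up.
assert all(a + b == r for a, b, r in zip(node2, node5, root))

for name, counts in [("Node 1", root), ("Node 2", node2), ("Node 5", node5)]:
    total, pct_yes, pct_no = summarize(*counts)
    print(f"{name}: n={total}, Yes={pct_yes}%, No={pct_no}%")
```

For Node 2 this prints n=167, Yes=22.8%, No=77.2%, matching the text.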
The next splitter for both the left and the right child nodes is Chest Pain Type, where pain is rated as 1, 2, 3, or 4.
Explore the other nodes to see which variables are most interesting. Nodes that are mostly blue contain a high proportion of the event level (Yes). Nodes that are mostly red contain a high proportion of the nonevent level (No).
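Reading the diagram amounts to asking which level dominates each node. As a sketch, the hypothetical function below labels a node by its majority level and the diagram's color convention (blue for Yes, red for No). The `strong` cutoff of 70% is an assumption for illustration only, since the text does not define what counts as a strong proportion.

```python
def describe_node(yes: int, no: int, strong: float = 0.7) -> str:
    """Label a node by its majority level (blue = Yes event, red = No nonevent).

    `strong` is a hypothetical cutoff, not a Minitab setting. Ties are labeled
    Yes here, which is an arbitrary choice.
    """
    p_yes = yes / (yes + no)
    if p_yes >= 0.5:
        level, color, share = "Yes", "blue", p_yes
    else:
        level, color, share = "No", "red", 1 - p_yes
    strength = "strong" if share >= strong else "weak"
    return f"mostly {color} ({level}, {share:.1%}, {strength})"


# Counts from the text.
print(describe_node(139, 164))  # root node
print(describe_node(38, 129))   # Node 2
print(describe_node(101, 35))   # Node 5
```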
Figure: 7 Node CART® Classification: Heart Disease versus Age, Rest Blood Pressure, Cholesterol, Max Heart Rate, Old Peak, Sex, Fasting Blood Sugar, Exercise Angina, Rest ECG, Slope, Thal, Chest Pain Type, Major Vessels