
How To Display The Path Of A Decision Tree For Test Samples?

I'm using DecisionTreeClassifier from scikit-learn to classify some multiclass data. I found many posts describing how to display the decision tree path.

Solution 1:

To get the path that a particular sample takes through a decision tree, you can use decision_path. It returns a sparse indicator matrix with one row per sample and one column per node; a non-zero entry means the sample passes through that node.
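As a minimal sketch of what decision_path gives you for a single sample (the variable names indicator, node_ids and leaf_id are just illustrative), the visited node ids can be read straight from the sparse matrix:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

sample = iris.data[129:130]             # one sample, kept 2-D by slicing
indicator = clf.decision_path(sample)   # sparse matrix, shape (1, n_nodes)

# In CSR form, the column indices stored for row 0 are the ids of the
# nodes this sample passes through (the root is node 0).
node_ids = indicator.indices[indicator.indptr[0]:indicator.indptr[1]]

# apply() returns the id of the leaf each sample ends up in.
leaf_id = clf.apply(sample)[0]

print('visited nodes:', node_ids)
print('leaf:', leaf_id)
```

The last visited node should be the same leaf that apply reports, which is a handy sanity check.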

Those decision paths can then be used to color/label the tree generated via pydot. This requires overwriting the color and the label (which results in a bit of ugly code).

Notes

  • decision_path can take samples from the training set or new values
  • you can go wild with the colors and change the color according to the number of samples or whatever other visualization might be needed
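To illustrate both notes, here is a rough sketch that traces a few made-up measurements (new_samples is hypothetical data, not part of the iris set) and turns the per-node visit counts into fill colors. The helper visit_color is my own invention for this example, not something from scikit-learn or pydotplus:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

# Hypothetical new measurements (not taken from the training set).
new_samples = np.array([[5.0, 3.4, 1.5, 0.2],
                        [6.7, 3.0, 5.2, 2.3],
                        [5.9, 2.8, 4.3, 1.3]])

indicator = clf.decision_path(new_samples)   # shape (3, n_nodes)

# Summing each column counts how many samples passed through each node.
visits = np.asarray(indicator.sum(axis=0)).ravel()


def visit_color(count, max_count):
    """Return '#rrggbb': white for unvisited nodes, darker green for more visits."""
    if count == 0:
        return '#ffffff'
    level = int(255 - 155 * count / max_count)   # 255 (pale) down to 100 (dark)
    return '#{0:02x}ff{0:02x}'.format(level)


colors = [visit_color(c, visits.max()) for c in visits]
print(list(zip(range(len(visits)), visits, colors)))
```

The resulting strings should be usable as the fill color of the matching graph node in place of a fixed color name, since Graphviz accepts colors in '#rrggbb' form.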

Example

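In the example below, visited nodes are colored green and all other nodes are white. The "samples" count in each node label is reset so that it shows how many of the provided samples passed through that node.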


import pydotplus
from sklearn.datasets import load_iris
from sklearn import tree

clf = tree.DecisionTreeClassifier(random_state=42)
iris = load_iris()

clf = clf.fit(iris.data, iris.target)

dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)

# empty all nodes, i.e. set color to white and number of samples to zero
for node in graph.get_node_list():
    if node.get_attributes().get('label') is None:
        continue
    if 'samples = ' in node.get_attributes()['label']:
        labels = node.get_attributes()['label'].split('<br/>')
        for i, label in enumerate(labels):
            if label.startswith('samples = '):
                labels[i] = 'samples = 0'
        node.set('label', '<br/>'.join(labels))
        node.set_fillcolor('white')

# pick the sample(s) to trace; slicing keeps the array 2-D, as decision_path expects
samples = iris.data[129:130]
decision_paths = clf.decision_path(samples)

for decision_path in decision_paths:
    for n, node_value in enumerate(decision_path.toarray()[0]):
        if node_value == 0:
            continue
        node = graph.get_node(str(n))[0]
        node.set_fillcolor('green')
        labels = node.get_attributes()['label'].split('<br/>')
        for i, label in enumerate(labels):
            if label.startswith('samples = '):
                labels[i] = 'samples = {}'.format(int(label.split('=')[1]) + 1)

        node.set('label', '<br/>'.join(labels))

# write_png calls the Graphviz "dot" executable, which must be installed
filename = 'tree.png'
graph.write_png(filename)
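If an image is not needed, the same path can also be printed as text. This is only a sketch along the lines of the tree-structure example in the scikit-learn docs; it reads the split feature and threshold of each visited node from clf.tree_ and assumes a single sample:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)

sample = iris.data[129:130]
# With a single row, all stored column indices belong to that sample.
node_ids = clf.decision_path(sample).indices
leaf_id = clf.apply(sample)[0]

feature = clf.tree_.feature      # index of the feature tested at each node
threshold = clf.tree_.threshold  # split threshold at each node

lines = []
for node_id in node_ids:
    if node_id == leaf_id:
        predicted = iris.target_names[clf.predict(sample)[0]]
        lines.append('node {}: leaf, predicted class = {}'.format(node_id, predicted))
        continue
    f = feature[node_id]
    value = sample[0, f]
    # Samples with value <= threshold go to the left child, others to the right.
    sign = '<=' if value <= threshold[node_id] else '>'
    lines.append('node {}: {} = {} {} {:.3f}'.format(
        node_id, iris.feature_names[f], value, sign, threshold[node_id]))

print('\n'.join(lines))
```

Each printed line corresponds to one green node in the picture, from the root down to the leaf.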
