
🌲🌳 Decision Tree 🌲🌳


The Decision Tree algorithm comes under supervised learning; it is used for both regression and classification.

Important Terminology related to Decision tree:

  1. Root Node: It represents the entire population or sample, which further gets divided into two or more homogeneous sets.
  2. Splitting: The process of dividing a node into two or more sub-nodes.
  3. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
  4. Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
  5. Pruning: Removing sub-nodes of a decision node is called pruning. You can think of it as the opposite of splitting.
  6. Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
  7. Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.

Before starting with decision trees, let's understand these three topics:
a) Entropy
b) Information Gain
c) Gini impurity

I will explain all of these in simple words, don't worry!

Entropy:- Entropy helps us measure the impurity (or purity) of a split.
> Entropy controls how a decision tree decides to split the data.
> Suppose we have 3 input features f1, f2, f3. Out of these three features, which one should we select first to start the tree?
> Selecting the best feature saves time and memory, improves model performance, and reaches the leaf nodes earlier.
> For binary classification (with log base 2), entropy ranges from 0 (a pure node) to 1 (an evenly mixed node).
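To make this concrete, here is a minimal sketch of computing the entropy of a node from its class labels (the function name `entropy` is my own, not from the post):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels.

    0.0 for a pure node; 1.0 for a 50/50 binary split.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # class probabilities
    return float(-(p * np.log2(p)).sum())

print(entropy([0, 0, 0, 0]))  # pure node -> 0.0
print(entropy([0, 0, 1, 1]))  # evenly mixed binary node -> 1.0
```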


Information Gain:- Information gain is the amount of information gained about a random variable (here, the class label) from observing another random variable (the feature). In a decision tree, it is the parent node's entropy minus the weighted average entropy of the child nodes after a split.
  • The attribute with the highest information gain is tested/split on first.
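A short sketch of that definition, reusing the entropy function from above (the helper names are mine, for illustration):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, children):
    """IG = H(parent) - weighted average of H(child) over all child nodes."""
    n = len(parent)
    weighted_child_entropy = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted_child_entropy

parent = [0, 0, 1, 1]
# a perfect split separates the two classes completely
print(information_gain(parent, [[0, 0], [1, 1]]))  # -> 1.0
```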


Gini impurity:-
Gini impurity serves the same purpose as entropy, with a few small differences.
> Both are used for selecting the best feature for the best split (so the question is: which one should we choose?)
> Both work the same way; the main difference is that for binary classification entropy ranges from 0 to 1 while Gini impurity ranges from 0 to 0.5.
> Entropy takes more computation time because it involves logarithms.
> Gini impurity takes less computation time than entropy because it only needs the squares of the class probabilities.
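A minimal sketch of Gini impurity (1 minus the sum of squared class probabilities; the function name is my own):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2).

    0.0 for a pure node; 0.5 is the worst case for two classes.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

print(gini([0, 0, 0, 0]))  # pure node -> 0.0
print(gini([0, 0, 1, 1]))  # evenly mixed binary node -> 0.5
```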

So it is usually better to choose Gini impurity.
It will save computation time!!
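As a sketch of how this choice looks in practice (assuming scikit-learn is installed), you can switch between the two measures with the `criterion` parameter of `DecisionTreeClassifier`, where `"gini"` is the default:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# "gini" is the default criterion; "entropy" is also supported
for criterion in ["gini", "entropy"]:
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X, y)
    print(criterion, clf.score(X, y))
```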

For the full code, please check my GitHub.





