
🌲🌳 Decision Tree 🌲🌳


The Decision Tree algorithm comes under supervised learning; it is used for both regression and classification.

Important Terminology related to Decision tree:

  1. Root Node: It represents the entire population or sample, which further gets divided into two or more homogeneous sets.
  2. Splitting: The process of dividing a node into two or more sub-nodes.
  3. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
  4. Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
  5. Pruning: Removing sub-nodes of a decision node is called pruning. You can think of it as the opposite of splitting.
  6. Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
  7. Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.

Before starting with decision trees, let's understand these three topics:
a) Entropy
b) Information Gain
c) Gini impurity

I will explain all of these in simple words, don't worry!

Entropy:- Entropy helps us measure the impurity (or purity) of a split.
> Entropy controls how a decision tree decides to split the data.
> Suppose we have 3 input features f1, f2, f3. Out of these three features, which one should we select first to start the tree?
> Selecting the best feature saves time and memory, improves model performance, and reaches the leaf nodes earlier.
> For binary classification (with log base 2), entropy ranges from 0 (a pure node) to 1 (an evenly mixed node).
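To make this concrete, here is a minimal sketch of computing the entropy of a node from its class labels (the function name `entropy` is my own, not from the post):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels.

    0.0 for a pure node; 1.0 for a 50/50 binary split.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # class probabilities
    return float(-(p * np.log2(p)).sum())

print(entropy([0, 0, 0, 0]))  # pure node -> 0.0
print(entropy([0, 0, 1, 1]))  # evenly mixed binary node -> 1.0
```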


Information Gain:- Information gain is the amount of information gained about a random variable (here, the class label) from observing another random variable (the feature). In a decision tree, it is the parent node's entropy minus the weighted average entropy of the child nodes after a split.
  • The attribute with the highest information gain is tested/split on first.
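A short sketch of that definition, reusing the entropy function from above (the helper names are mine, for illustration):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, children):
    """IG = H(parent) - weighted average of H(child) over all child nodes."""
    n = len(parent)
    weighted_child_entropy = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted_child_entropy

parent = [0, 0, 1, 1]
# a perfect split separates the two classes completely
print(information_gain(parent, [[0, 0], [1, 1]]))  # -> 1.0
```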


Gini impurity:-
Gini impurity serves the same purpose as entropy, with a few small differences.
> Both are used for selecting the best feature for the best split (so the question is: which one should we choose?)
> Both work the same way; the main difference is that for binary classification entropy ranges from 0 to 1 while Gini impurity ranges from 0 to 0.5.
> Entropy takes more computation time because it involves logarithms.
> Gini impurity takes less computation time than entropy because it only needs the squares of the class probabilities.
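A minimal sketch of Gini impurity (1 minus the sum of squared class probabilities; the function name is my own):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2).

    0.0 for a pure node; 0.5 is the worst case for two classes.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

print(gini([0, 0, 0, 0]))  # pure node -> 0.0
print(gini([0, 0, 1, 1]))  # evenly mixed binary node -> 0.5
```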

So it is usually better to choose Gini impurity.
It will save computation time!!
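As a sketch of how this choice looks in practice (assuming scikit-learn is installed), you can switch between the two measures with the `criterion` parameter of `DecisionTreeClassifier`, where `"gini"` is the default:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# "gini" is the default criterion; "entropy" is also supported
for criterion in ["gini", "entropy"]:
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X, y)
    print(criterion, clf.score(X, y))
```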

For the full code, please check my GitHub.





