K-Nearest Neighbour
The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.
In short, KNN assumes that similar things exist close to each other.
The KNN algorithm uses ‘feature similarity’ to predict the values of any new data points.
I am going to explain KNN with a simple example:
Suppose we have a table with columns S.No, Height, Weight, and Age. For S.No 5 the weight is missing, so we need to predict that person's weight based on their Height and Age.
(Graph: the five data points plotted by Height and Age, with point 5 circled.)
In the graph above there are 5 points; 4 of them have a known weight (output) and one does not. The one I want to predict, point 5, is circled. Now let's see how KNN helps us.
Hint: by looking at the graph, we can see that point 5 is nearest to points 2 and 4.
YES.
For S.No 2 and 4, the weights are 63 and 78.
So now we have an idea: the weight of the 5th person should be between 63 and 78.
And what exactly is person 5's weight? KNN takes the average of the nearest neighbours: (63 + 78) / 2 = 70.5.
THE WEIGHT OF THE 5th PERSON IS 70.5. FINALLY WE GOT IT.
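Here is a minimal Python sketch of that calculation. Note that only the weights 63 and 78 come from the example above; the heights, ages, and the other two weights are hypothetical values I made up for illustration.

```python
import numpy as np

# Known records: (height_cm, age) -> weight_kg.
# The heights, ages, and the weights 59 and 74 are hypothetical,
# invented for illustration; only 63 and 78 come from the example.
X_known = np.array([
    [167, 51],   # S.No 1
    [182, 29],   # S.No 2 -> weight 63
    [176, 54],   # S.No 3
    [173, 34],   # S.No 4 -> weight 78
])
y_known = np.array([59.0, 63.0, 74.0, 78.0])

# S.No 5: the person whose weight we want to predict
x_new = np.array([177, 32])

# Euclidean distance from the new point to every known point
distances = np.linalg.norm(X_known - x_new, axis=1)

# Average the weights of the k = 2 nearest neighbours
k = 2
nearest = np.argsort(distances)[:k]
print(y_known[nearest].mean())  # 70.5
```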
Question: how do we calculate the distance from one point to another?
Answer: to measure the distance between the unknown data point and the known data points, the techniques below help. The most common methods are Euclidean and Manhattan distance; these are used when we have continuous variables.
Euclidean Distance: Euclidean distance is calculated as the square root of the sum of the squared differences between a new point (x) and an existing point (y).
Manhattan Distance: this is the distance between real vectors, calculated as the sum of their absolute differences.
For categorical variables we use Hamming distance.
Hamming Distance: if the value (x) and the value (y) are the same, the distance D is equal to 0; otherwise D = 1. For vectors, the per-feature distances are summed.
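To make these three metrics concrete, here is a small Python sketch (the function names are my own):

```python
import numpy as np

def euclidean(x, y):
    # Square root of the sum of squared differences
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # Sum of absolute differences
    return np.sum(np.abs(x - y))

def hamming(x, y):
    # Count of positions where the categorical values differ
    return np.sum(x != y)

a = np.array([182, 29])
b = np.array([173, 34])
print(euclidean(a, b))  # ~10.30
print(manhattan(a, b))  # 14

c = np.array(["red", "small"])
d = np.array(["red", "large"])
print(hamming(c, d))    # 1
```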
KNN Algorithm:
Step 1: Calculate the distance between the unknown data point and all known data points, and store the distances in an array.
Step 2: Sort the array in ascending order of distance.
Step 3: Based on the k value selected by the user, take the first k rows from the sorted array.
Step 4: Take a vote among the selected records and pick the label that wins the majority (for regression, average their values, as in the weight example above).
Step 5: Assign the winning label to the unknown data point. A sketch of these steps appears below.
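Putting the five steps together, here is a minimal from-scratch KNN classifier in Python. This is a sketch under my own naming (knn_predict, etc.), not the exact code from my GitHub:

```python
import numpy as np
from collections import Counter

def knn_predict(X_known, y_known, x_new, k=3):
    # Step 1: distance from the unknown point to every known point
    distances = np.linalg.norm(X_known - x_new, axis=1)
    # Steps 2 + 3: sort by distance and keep the first k rows
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among the k nearest labels
    votes = Counter(y_known[nearest])
    # Step 5: return the winning label
    return votes.most_common(1)[0][0]

X = np.array([[1, 1], [1, 2], [4, 4], [5, 4]])
y = np.array(["A", "A", "B", "B"])
print(knn_predict(X, y, np.array([1.5, 1.5]), k=3))  # "A"
```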
How to Select K value in KNN?
If we choose a small k-value (e.g. k = 1), the model overfits the training data, which leads to a high error on the validation data.
If we choose a very high k-value, the model underfits, and its performance is poor on both the training and testing datasets.
The best k-value changes from dataset to dataset, i.e. it doesn't have a default value.
The best technique for selecting the k-value is to plot an elbow curve: compute the validation error for a range of k values and pick the k where the error stops dropping sharply. It will definitely help you.
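As an illustration, here is one way to draw that elbow plot with scikit-learn. The Iris dataset is just a stand-in for your own data:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

errors = []
k_values = range(1, 21)
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Validation error rate for this k
    errors.append(1 - model.score(X_val, y_val))

plt.plot(k_values, errors, marker="o")
plt.xlabel("k")
plt.ylabel("validation error")
plt.title("Elbow plot for choosing k")
plt.show()
```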
When to use KNN:
a) KNN works well with small datasets.
b) It won't work well with large datasets, because every prediction has to compute the distance to all of the training points.
c) KNN needs scaling. YES!! Because KNN is distance-based, a feature with a much larger range than the others will dominate the distance calculation and drown out the remaining features. The model will still run without scaling, but the predictions suffer, so it is better to scale the data.
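A quick sketch of scaling before KNN, using scikit-learn's StandardScaler inside a pipeline (MinMaxScaler is another common choice):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale features to zero mean / unit variance before measuring distances,
# so no single feature dominates just because of its units.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.predict(X[:1]))
```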
Pros:
- Easy to use.
- Quick to get running: there is no real training phase, since the stored data itself is the model.
- Does not make assumptions about the data.
Cons:
- Accuracy depends on the quality of the data.
- Must find an optimal k value (number of nearest neighbors).
- Poor at classifying data points near a decision boundary, where they could plausibly be classified either way.
finally, love your Neighbour....😉
FOR THE FULL CODE, PLEASE CHECK MY GITHUB.