
LSTM




We will now walk through each stage of the LSTM with the help of the diagram above.


Stage 1: Memory View

The memory view is responsible for remembering and forgetting information based on the context of the input. (If that doesn't make sense yet, don't worry; it will by the end of this stage.)

In the diagram above, the memory view is the top horizontal line. Its key points are Ct-1 (the old memory coming in), a multiplication (X), an addition (+), and Ct (the updated memory going out).

The input to this line is the old memory. The multiplication (X) forgets useless information from the old memory, and the addition (+) merges what survives with the new memory.

Why multiplication? If we multiply the old memory element-wise by a vector of 0s, the old memory becomes 0; if we multiply it by a vector of 1s, the old memory passes through unchanged. (So what are these 0s and 1s?)

In the memory view, the "X" is exactly this multiplication: if we want to keep the old memory for the next step, we multiply by 1; if we want to erase it, we multiply by 0, since anything multiplied by 0 is 0. The "+" then merges the (scaled) old memory with the new memory and passes the result on as output.
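The multiply-then-add behaviour of the memory view can be sketched in a few lines of NumPy (the vectors here are hypothetical toy values, purely for illustration):

```python
import numpy as np

# A toy "old memory" vector (hypothetical values, just for illustration).
old_memory = np.array([0.5, -1.2, 3.0])

# Multiplying element-wise by a vector of zeros erases the old memory...
forgotten = old_memory * np.zeros(3)   # every element becomes 0

# ...while multiplying by a vector of ones passes it through unchanged.
kept = old_memory * np.ones(3)         # identical to old_memory

# "+" then merges the (scaled) old memory with a new memory vector.
new_memory = np.array([0.1, 0.2, -0.3])
merged = kept + new_memory
```

In a real LSTM the valve values fall between 0 and 1 rather than being exactly 0 or 1, so each element of the old memory is partially kept.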


Forget gate:

It is controlled by a simple one-layer neural network.

The inputs to this network are:

ht-1: the output of the previous LSTM block.

Xt: the input to the current LSTM block.

Ct-1: the memory of the previous block.

b: a bias term.

The network uses a sigmoid activation function, so its output, the forget valve, lies between 0 and 1.

This forget valve is then applied to the old memory Ct-1 by element-wise multiplication (you already know from the memory view how this multiplication forgets or keeps information).
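As a minimal sketch, the forget gate above is a one-layer network with a sigmoid activation; the weight names and vector sizes here are assumptions chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: hidden state of 4 units, input of 3 features.
hidden, inp = 4, 3
rng = np.random.default_rng(0)

W_f = rng.normal(size=(hidden, hidden + inp))  # forget-gate weights (assumed name/shape)
b_f = np.zeros(hidden)                         # bias term

h_prev = rng.normal(size=hidden)  # h_{t-1}: output of the previous LSTM block
x_t = rng.normal(size=inp)        # x_t: input to the current LSTM block
c_prev = rng.normal(size=hidden)  # C_{t-1}: memory of the previous block

# One-layer network + sigmoid -> forget valve, each element in (0, 1).
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

# Apply the valve to the old memory by element-wise multiplication.
filtered_memory = f_t * c_prev
```

Because the sigmoid output is strictly between 0 and 1, each element of the old memory is scaled somewhere between "fully erased" and "fully kept".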



That completes the first stage; now let's move forward, following the diagram above.

The second valve is called the new memory valve. Again, it is a simple one-layer neural network that takes the same inputs as the forget valve. This valve controls how much the new memory should influence the old memory.

There is also the new memory itself, produced by another simple one-layer neural network. All the networks above used sigmoid as the activation function; this one uses tanh instead, so its output lies between -1 and 1.

The output of this tanh network is multiplied element-wise by the new memory valve, and the result is added to the filtered old memory to form the new memory.
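The whole update can be sketched as follows (sizes and weight names are assumptions; the forget valve and old memory are stand-in values here, standing in for the outputs of the previous stage):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inp = 4, 3
rng = np.random.default_rng(1)

# Same inputs as the forget valve: [h_{t-1}, x_t] concatenated.
concat = np.concatenate([rng.normal(size=hidden), rng.normal(size=inp)])

W_i, b_i = rng.normal(size=(hidden, hidden + inp)), np.zeros(hidden)  # new-memory valve
W_c, b_c = rng.normal(size=(hidden, hidden + inp)), np.zeros(hidden)  # new-memory network

i_t = sigmoid(W_i @ concat + b_i)       # valve: how much new memory gets through, in (0, 1)
c_tilde = np.tanh(W_c @ concat + b_c)   # candidate new memory, in (-1, 1)

# Stand-ins for the previous stage's outputs:
f_t = sigmoid(rng.normal(size=hidden))  # forget valve
c_prev = rng.normal(size=hidden)        # old memory C_{t-1}

# Element-wise multiply valve and candidate, then add to the filtered old memory.
c_t = f_t * c_prev + i_t * c_tilde      # new memory C_t
```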



Now our aim is to produce the LSTM output.

In the image above, the shaded part is done; what remains is to produce the LSTM output.
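The post cuts off here, so as a hedged sketch of the standard LSTM output step (names and sizes are assumptions): one more sigmoid valve, the output valve, decides how much of the new memory, squashed through tanh, becomes the block's output h_t.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inp = 4, 3
rng = np.random.default_rng(2)

concat = np.concatenate([rng.normal(size=hidden), rng.normal(size=inp)])  # [h_{t-1}, x_t]
W_o, b_o = rng.normal(size=(hidden, hidden + inp)), np.zeros(hidden)      # output valve

o_t = sigmoid(W_o @ concat + b_o)  # output valve, in (0, 1)
c_t = rng.normal(size=hidden)      # new memory from the previous stage (stand-in)

# The output is the valve applied to the tanh-squashed new memory.
h_t = o_t * np.tanh(c_t)
```

This h_t is both the block's output and the h_{t-1} input to the next LSTM block.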




