
LSTM




We will now walk through each stage of the LSTM with the help of the diagram above.


Stage 1: Memory View

The memory view is responsible for remembering and forgetting information based on the context of the input. (Don't worry if that's not clear yet; it will be shortly.)

In the diagram above, the memory view is the top line, with four key points: Ct-1, X, +, and Ct.

The input Ct-1 is the old memory, X is a multiplication that forgets useless information from the old memory, and "+" is an addition that merges everything together.

When we multiply the old memory by "0", the old memory becomes "0"; if we multiply it by a vector of "1"s, the old memory passes through unchanged. (What are these 0s and 1s?)

In the memory view, the "X" means multiplication: if we need the old memory for the next step, we multiply it by "1", and if we want to remove the old memory for the next step, we multiply it by "0", since anything multiplied by "0" is "0". The "+" in the memory view merges the old memory with the new memory and gives the output.
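The multiply-and-forget, add-and-merge behaviour of the memory line can be sketched with plain NumPy arrays (all values here are made up for illustration):

```python
import numpy as np

# Hypothetical values: C_{t-1} is the old memory carried along the top line.
old_memory = np.array([0.5, -1.2, 0.8])

# A "1" in the valve keeps that entry of the old memory; a "0" erases it.
forget_valve = np.array([1.0, 0.0, 1.0])
new_information = np.array([0.1, 0.3, -0.2])

kept = old_memory * forget_valve        # "X": element-wise multiply forgets
new_memory = kept + new_information     # "+": merge old and new memory
print(new_memory)                       # approximately [0.6, 0.3, 0.6]
```

Note how the middle entry of the old memory (-1.2) is wiped out by the 0 in the valve, while the other two entries survive and have new information added on top.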


Forget gate:

It is controlled by a simple one-layer neural network.

The inputs for this network are:

ht-1: the output of the previous LSTM block.

Xt: the input to the current LSTM block.

Ct-1: the memory of the previous block.

b: a bias term.

This network uses a sigmoid function as its activation, and its output is the forget valve.

The forget valve is then applied to the old memory Ct-1 by element-wise multiplication (you already know how that multiplication works).
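As a sketch (the shapes, weights, and input values below are all hypothetical), the forget gate concatenates ht-1 and Xt, passes them through a single weight layer plus a bias, and squashes the result with a sigmoid so every entry of the valve lies between 0 and 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 3, 2          # hypothetical sizes
rng = np.random.default_rng(42)
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)             # the bias term

h_prev = np.array([0.1, -0.4, 0.2])     # h_{t-1}: previous block's output
x_t = np.array([0.7, 0.05])             # x_t: current block's input
C_prev = np.array([0.5, -1.2, 0.8])     # C_{t-1}: previous block's memory

v = np.concatenate([h_prev, x_t])       # stack the two inputs
f_t = sigmoid(W_f @ v + b_f)            # forget valve, each entry in (0, 1)
C_after_forget = f_t * C_prev           # element-wise multiply with old memory
```

A real forget valve is rarely exactly 0 or 1; the sigmoid produces values in between, so the gate partially fades old memory rather than switching it on or off.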



That completes the first stage; now let's move forward through the diagram.

The second valve is called the new memory valve. Again, it is a simple one-layer neural network that takes the same inputs as the forget valve. This valve controls how much the new memory should influence the old memory.

And there is one more network that produces the new memory itself. It is also a simple neural network, but while all the valves above use sigmoid as the activation function, this one uses tanh.

After applying the tanh activation, the output of this network is element-wise multiplied by the new memory valve and then added to the old memory to form the new memory.
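Continuing the same hypothetical sketch: the new memory valve (sigmoid) and the candidate new memory (tanh) are computed from the same stacked inputs, and the cell update combines the forget path and the new-memory path:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 3, 2          # hypothetical sizes
rng = np.random.default_rng(0)
# Hypothetical weights for the forget gate, new memory valve, and new memory.
W_f, W_i, W_c = (rng.standard_normal((hidden_size, hidden_size + input_size))
                 for _ in range(3))
b_f = b_i = b_c = np.zeros(hidden_size)

h_prev = np.array([0.1, -0.4, 0.2])     # h_{t-1}
x_t = np.array([0.7, 0.05])             # x_t
C_prev = np.array([0.5, -1.2, 0.8])     # C_{t-1}
v = np.concatenate([h_prev, x_t])

f_t = sigmoid(W_f @ v + b_f)            # forget valve (previous stage)
i_t = sigmoid(W_i @ v + b_i)            # new memory valve
C_tilde = np.tanh(W_c @ v + b_c)        # candidate new memory, in (-1, 1)

# Forget some of the old memory, add a gated amount of the new memory.
C_t = f_t * C_prev + i_t * C_tilde
```

The tanh keeps candidate memory values between -1 and 1, while the sigmoid valve i_t decides how much of each candidate entry actually enters the cell.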



Now our aim is to produce the LSTM output.




If you look at the image above, the shaded part is done; all that remains is to produce the LSTM output.
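In the standard LSTM formulation, the output is produced by a third sigmoid valve, the output valve, applied to a tanh of the new cell state. As a sketch with the same made-up shapes as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 3, 2          # hypothetical sizes
rng = np.random.default_rng(1)
W_o = rng.standard_normal((hidden_size, hidden_size + input_size))
b_o = np.zeros(hidden_size)

h_prev = np.array([0.1, -0.4, 0.2])     # h_{t-1}
x_t = np.array([0.7, 0.05])             # x_t
C_t = np.array([0.4, 0.2, -0.6])        # new cell state from the previous stage

v = np.concatenate([h_prev, x_t])
o_t = sigmoid(W_o @ v + b_o)            # output valve
h_t = o_t * np.tanh(C_t)                # the LSTM block's output
```

This h_t becomes the ht-1 input for the next LSTM block, and C_t flows along the memory line as the next block's Ct-1.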




