
LSTM




We will now walk through each stage of the LSTM with the help of the diagram above.


Stage 1: Memory View

The memory view is responsible for remembering and forgetting information based on the context of the input. (If that doesn't make sense yet, don't worry; it will by the end of this stage.)

In the diagram above, the memory view is the top horizontal line. Its key points are Ct-1 (the old memory coming in), a multiplication (X), an addition (+), and Ct (the updated memory going out).

The input to this line is the old memory. The multiplication (X) forgets useless information from the old memory, and the addition (+) merges what survives with the new memory.

Why multiplication? If we multiply the old memory element-wise by a vector of 0s, the old memory becomes 0; if we multiply it by a vector of 1s, the old memory passes through unchanged. (So what are these 0s and 1s?)

In the memory view, the "X" is exactly this multiplication: if we want to keep the old memory for the next step, we multiply by 1; if we want to erase it, we multiply by 0, since anything multiplied by 0 is 0. The "+" then merges the (scaled) old memory with the new memory and passes the result on as output.
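The multiply-then-add behaviour of the memory view can be sketched in a few lines of NumPy (the vectors here are hypothetical toy values, purely for illustration):

```python
import numpy as np

# A toy "old memory" vector (hypothetical values, just for illustration).
old_memory = np.array([0.5, -1.2, 3.0])

# Multiplying element-wise by a vector of zeros erases the old memory...
forgotten = old_memory * np.zeros(3)   # every element becomes 0

# ...while multiplying by a vector of ones passes it through unchanged.
kept = old_memory * np.ones(3)         # identical to old_memory

# "+" then merges the (scaled) old memory with a new memory vector.
new_memory = np.array([0.1, 0.2, -0.3])
merged = kept + new_memory
```

In a real LSTM the valve values fall between 0 and 1 rather than being exactly 0 or 1, so each element of the old memory is partially kept.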


Forget gate:

It is controlled by a simple one-layer neural network.

The inputs to this network are:

ht-1: the output of the previous LSTM block.

Xt: the input to the current LSTM block.

Ct-1: the memory of the previous block.

b: a bias term.

The network uses a sigmoid activation function, so its output, the forget valve, lies between 0 and 1.

This forget valve is then applied to the old memory Ct-1 by element-wise multiplication (you already know from the memory view how this multiplication forgets or keeps information).
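As a minimal sketch, the forget gate above is a one-layer network with a sigmoid activation; the weight names and vector sizes here are assumptions chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: hidden state of 4 units, input of 3 features.
hidden, inp = 4, 3
rng = np.random.default_rng(0)

W_f = rng.normal(size=(hidden, hidden + inp))  # forget-gate weights (assumed name/shape)
b_f = np.zeros(hidden)                         # bias term

h_prev = rng.normal(size=hidden)  # h_{t-1}: output of the previous LSTM block
x_t = rng.normal(size=inp)        # x_t: input to the current LSTM block
c_prev = rng.normal(size=hidden)  # C_{t-1}: memory of the previous block

# One-layer network + sigmoid -> forget valve, each element in (0, 1).
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

# Apply the valve to the old memory by element-wise multiplication.
filtered_memory = f_t * c_prev
```

Because the sigmoid output is strictly between 0 and 1, each element of the old memory is scaled somewhere between "fully erased" and "fully kept".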



That completes the first stage; now let's move forward, following the diagram above.

The second valve is called the new memory valve. Again, it is a simple one-layer neural network that takes the same inputs as the forget valve. This valve controls how much the new memory should influence the old memory.

There is also the new memory itself, produced by another simple one-layer neural network. All the networks above used sigmoid as the activation function; this one uses tanh instead, so its output lies between -1 and 1.

The output of this tanh network is multiplied element-wise by the new memory valve, and the result is added to the filtered old memory to form the new memory.
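The whole update can be sketched as follows (sizes and weight names are assumptions; the forget valve and old memory are stand-in values here, standing in for the outputs of the previous stage):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inp = 4, 3
rng = np.random.default_rng(1)

# Same inputs as the forget valve: [h_{t-1}, x_t] concatenated.
concat = np.concatenate([rng.normal(size=hidden), rng.normal(size=inp)])

W_i, b_i = rng.normal(size=(hidden, hidden + inp)), np.zeros(hidden)  # new-memory valve
W_c, b_c = rng.normal(size=(hidden, hidden + inp)), np.zeros(hidden)  # new-memory network

i_t = sigmoid(W_i @ concat + b_i)       # valve: how much new memory gets through, in (0, 1)
c_tilde = np.tanh(W_c @ concat + b_c)   # candidate new memory, in (-1, 1)

# Stand-ins for the previous stage's outputs:
f_t = sigmoid(rng.normal(size=hidden))  # forget valve
c_prev = rng.normal(size=hidden)        # old memory C_{t-1}

# Element-wise multiply valve and candidate, then add to the filtered old memory.
c_t = f_t * c_prev + i_t * c_tilde      # new memory C_t
```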



Now our aim is to produce the LSTM output.

In the image above, the shaded part is done; what remains is to produce the LSTM output.
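The post cuts off here, so as a hedged sketch of the standard LSTM output step (names and sizes are assumptions): one more sigmoid valve, the output valve, decides how much of the new memory, squashed through tanh, becomes the block's output h_t.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inp = 4, 3
rng = np.random.default_rng(2)

concat = np.concatenate([rng.normal(size=hidden), rng.normal(size=inp)])  # [h_{t-1}, x_t]
W_o, b_o = rng.normal(size=(hidden, hidden + inp)), np.zeros(hidden)      # output valve

o_t = sigmoid(W_o @ concat + b_o)  # output valve, in (0, 1)
c_t = rng.normal(size=hidden)      # new memory from the previous stage (stand-in)

# The output is the valve applied to the tanh-squashed new memory.
h_t = o_t * np.tanh(c_t)
```

This h_t is both the block's output and the h_{t-1} input to the next LSTM block.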




