# Implement a deep learning framework: Part 4 – Implement RNN, LSTM and Language Models

In this part, we continue to add basic components to the framework by implementing RNN/LSTM-related operations. RNNs and LSTMs are neural network models widely used in NLP and ML tasks such as POS tagging and speech recognition.

### Embedding

Firstly, we need to support embedding operations in the model. In the previous posts, we implemented the standard Variable for the linear model and the MLP model. However, the standard variable cannot be used as an embedding variable efficiently.

The reason is that looking up an embedding is a sparse operation: we only need a small part of the entire variable. For example, looking up indices 1 and 3 requires only two rows of the entire variable.
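To make the sparsity concrete, here is a small sketch in plain NumPy (not the framework's own Variable class): a lookup of word ids 1 and 3 reads only two rows of the embedding matrix, and a sparse update would only have to touch those same two rows.

```python
import numpy as np

# a toy embedding matrix: 5 words, 4 dimensions per word
embeddings = np.arange(20, dtype=float).reshape(5, 4)

# looking up word ids 1 and 3 touches only two rows;
# the other three rows are neither read nor updated
word_ids = [1, 3]
looked_up = embeddings[word_ids]

print(looked_up.shape)  # (2, 4)
```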

To support embedding lookup, we implement the embedding operation in pytensor.ops.embedding_ops. The embedding variable is implemented as a list of standard variables in the Parameter class. To look up the embedding for a specific word, we simply pick the corresponding variable from the list.

The related code is as follows.

```python
class Parameter:

    def get_embedding(self, vocab_size, word_dim):

        # return the current embedding if it was created already
        if self.embeddings is not None:
            return self.embeddings

        # initialize the embedding
        self.embeddings = []

        # the embedding is implemented as a list of variables
        # so that each row can be updated independently (sparse update)
        for i in range(vocab_size):
            embedding = Variable([np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), word_dim)])
            self.embeddings.append(embedding)

        return self.embeddings
```

```python
class Embedding(Operation):

    def forward(self, input_variables):
        """
        get the embedding

        :param input_variables: input variable is a LongVariable containing word ids
        :return: embedding
        """
        super(Embedding, self).forward(input_variables)

        # embedding only takes 1 input variable
        assert(len(input_variables) == 1)

        word_id = input_variables[0].value[0]

        assert(word_id < self.vocab_size)
        output_variable = self.embedding_variables[word_id]

        if self.trainable:
            # (body elided in the original post: the selected embedding is
            # registered here so that its gradient gets applied during update)
            pass

        return output_variable
```



### RNN Operation

Equipped with the embedding operation, we can then continue to add the RNN operation. In our implementation, an RNN operation consists of multiple RNNCell operations. When we run RNN over a long sentence, each RNNCell will receive a word and update its internal state.

The RNNCell operation computes its forward state with the following equations:

$$Z_t = H_{t-1} \cdot W + I_t \cdot U$$

$$H_t = \tanh(Z_t)$$

where $H_t$ is the hidden state at timestep $t$ and $I_t$ is the input at timestep $t$.
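The two equations can be sketched in NumPy as follows (the function and variable names here are illustrative, not the framework's actual attribute names):

```python
import numpy as np

def rnn_cell_forward(h_prev, x_t, W, U):
    """One RNN step: Z_t = H_{t-1} W + I_t U, then H_t = tanh(Z_t)."""
    z_t = h_prev @ W + x_t @ U
    h_t = np.tanh(z_t)
    return h_t, z_t

# toy sizes: input dim 3, hidden dim 2
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2))   # hidden-to-hidden weights
U = rng.standard_normal((3, 2))   # input-to-hidden weights
h = np.zeros((1, 2))              # initial hidden state
x = rng.standard_normal((1, 3))   # one input step
h, z = rnn_cell_forward(h, x, W, U)
```

Running the cell over a sentence just means calling this once per word, feeding the returned hidden state back in.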

The backward pass propagates the gradient from $H_t$ to $Z_t$ with the chain rule as follows.

$$\frac{\partial \mathcal{L}}{\partial Z_t} = \frac{\partial \mathcal{L}}{\partial H_t} \cdot \frac{\partial tanh(Z_t)}{\partial Z_t}$$

$$\frac{\partial \mathcal{L}}{\partial Z_t} = \frac{\partial \mathcal{L}}{\partial H_t} \cdot (1-H_t^2)$$

The gradient is then backpropagated into each of the remaining variables.

$$\frac{\partial \mathcal{L}}{\partial H_{t-1}} = \frac{\partial \mathcal{L}}{\partial Z_t} \cdot W^\intercal$$

$$\frac{\partial \mathcal{L}}{\partial W} =H_{t-1}^\intercal \cdot \frac{\partial \mathcal{L}}{\partial Z_t}$$

$$\frac{\partial \mathcal{L}}{\partial I_{t}} = \frac{\partial \mathcal{L}}{\partial Z_t} \cdot U^\intercal$$

$$\frac{\partial \mathcal{L}}{\partial U} =I_{t}^\intercal \cdot \frac{\partial \mathcal{L}}{\partial Z_t}$$

Finally, $\frac{\partial \mathcal{L}}{\partial I_{t}}$ is propagated into each embedding variable.

A common pitfall here is that $\frac{\partial \mathcal{L}}{\partial W}$ and $\frac{\partial \mathcal{L}}{\partial U}$ must accumulate their gradients across timesteps rather than being overwritten by the newest gradient, because every timestep shares the same $W$ and $U$. It took me a lot of time to debug this…
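A sketch of the corresponding backward step, with the accumulation just mentioned made explicit via `+=` (again with illustrative names, not the framework's actual API):

```python
import numpy as np

def rnn_cell_backward(dL_dh, h_t, h_prev, x_t, W, U, dW, dU):
    """Backward through one RNN step.

    dL_dh is dL/dH_t. dW and dU are accumulated in place (+=),
    not overwritten, because every timestep shares the same W and U.
    """
    dL_dz = dL_dh * (1.0 - h_t ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dL_dh_prev = dL_dz @ W.T           # gradient into H_{t-1}
    dL_dx = dL_dz @ U.T                # gradient into I_t
    dW += h_prev.T @ dL_dz             # accumulate, do not overwrite
    dU += x_t.T @ dL_dz
    return dL_dh_prev, dL_dx

# toy check for one step with nonzero previous state
rng = np.random.default_rng(1)
W, U = rng.standard_normal((2, 2)), rng.standard_normal((3, 2))
dW, dU = np.zeros_like(W), np.zeros_like(U)
h_prev = np.tanh(rng.standard_normal((1, 2)))
h_t = np.tanh(rng.standard_normal((1, 2)))
x_t = rng.standard_normal((1, 3))
dh_prev, dx = rnn_cell_backward(np.ones((1, 2)), h_t, h_prev, x_t, W, U, dW, dU)
```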

### LSTM Operation

An RNN can also be extended into an LSTM, which alleviates the gradient vanishing and exploding problems. Like the RNN, the LSTM is composed of cells (LSTMCell). The difference between an LSTMCell and an RNNCell is that the LSTMCell must remember two variables: the hidden state and the cell state.

The LSTMCell operation will update its forward states with the following equations.

$$f_t =\sigma(H_{t-1} \cdot W_{fh} + I_t \cdot W_{fi})$$

$$i_t =\sigma(H_{t-1} \cdot W_{ih} + I_t \cdot W_{ii})$$

$$o_t = \sigma(H_{t-1} \cdot W_{oh} + I_t \cdot W_{oi})$$

$$c_t = \tanh(H_{t-1} \cdot W_{ch} + I_t \cdot W_{ci})$$

$$Cell_t = f_t*Cell_{t-1} + i_t*c_t$$

$$H_t = o_t*\tanh(Cell_t)$$

where $f_t$, $i_t$ and $o_t$ are the forget gate, input gate, and output gate respectively, and $c_t$ is the candidate cell state.
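The gate equations can be sketched as follows. This is a NumPy sketch, not the framework's code; it stacks the four weight matrices per side into one matrix so a single product computes all gates, which is algebraically equivalent to the four separate products above (biases are omitted, as in the equations).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_forward(h_prev, cell_prev, x_t, Wh, Wi):
    """One LSTM step. Wh/Wi stack the four hidden/input weight
    matrices (forget, input, output, candidate) column-wise."""
    H = h_prev.shape[1]
    z = h_prev @ Wh + x_t @ Wi            # shape (1, 4H)
    f_t = sigmoid(z[:, 0*H:1*H])          # forget gate
    i_t = sigmoid(z[:, 1*H:2*H])          # input gate
    o_t = sigmoid(z[:, 2*H:3*H])          # output gate
    c_t = np.tanh(z[:, 3*H:4*H])          # candidate cell state
    cell_t = f_t * cell_prev + i_t * c_t  # Cell_t = f*Cell_{t-1} + i*c
    h_t = o_t * np.tanh(cell_t)           # H_t = o * tanh(Cell_t)
    return h_t, cell_t

# toy sizes: input dim 3, hidden dim 2 (so 4H = 8 stacked columns)
rng = np.random.default_rng(2)
Wh = rng.standard_normal((2, 8))
Wi = rng.standard_normal((3, 8))
h, cell = np.zeros((1, 2)), np.zeros((1, 2))
h, cell = lstm_cell_forward(h, cell, rng.standard_normal((1, 3)), Wh, Wi)
```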

The backward operation of the LSTM cell is too long to show here; the corresponding code can be found in the pytensor repository.

### Penn Treebank language model

Finally, we can use these components to create a language model. We will train a language model on sentences from the Penn Treebank; the dataset is obtained from the word2vec script on Mikolov's website.

An RNN language model implementation is shown below. The LSTM version can be implemented in the same style by replacing the RNN operation with the LSTM operation.

```python
class RNNLM:

    def __init__(self, vocab_size, input_size, hidden_size):

        # embedding size
        self.vocab_size = vocab_size
        self.word_dim = input_size

        # network size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = vocab_size

        # num steps
        self.max_num_steps = 100
        self.num_steps = 0

        # graph
        self.graph = Graph('RNN')

        # word embedding
        embed_argument = {'vocab_size': self.vocab_size, 'embed_size': self.input_size}
        self.word_embedding = self.graph.get_operation('Embedding', embed_argument)

        # rnn
        rnn_argument = {'input_size': self.input_size, 'hidden_size': self.hidden_size, 'max_num_steps': self.max_num_steps}
        self.rnn = self.graph.get_operation('RNN', rnn_argument)

        # affines
        affine_argument = {'input_size': self.hidden_size, 'hidden_size': self.output_size}
        self.affines = [self.graph.get_operation('Affine', affine_argument, "Affine") for i in range(self.max_num_steps)]

        # softmax
        self.softmaxLosses = [self.graph.get_operation('SoftmaxLoss') for i in range(self.max_num_steps)]

    def forward(self, word_lst):

        # get num steps
        self.num_steps = min(len(word_lst), self.max_num_steps)

        # create embeddings
        embedding_variables = []
        for word_id in word_lst:
            embedding_variables.append(self.word_embedding.forward([LongVariable([word_id])]))

        # run RNN
        rnn_variables = self.rnn.forward(embedding_variables)

        # softmax variables
        softmax_variables = []

        for i in range(self.num_steps):
            output_variable = self.affines[i].forward(rnn_variables[i])
            softmax_variable = self.softmaxLosses[i].forward(output_variable)
            softmax_variables.append(softmax_variable)

        return softmax_variables

    def loss(self, target_ids):

        ce_loss = 0.0

        for i in range(self.num_steps):
            cur_ce_loss = self.softmaxLosses[i].loss(LongVariable([target_ids[i]]))
            ce_loss += cur_ce_loss

        return ce_loss
```
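The summed cross-entropy returned by `loss` can be turned into the usual language-model metric, perplexity. This is a standard calculation, not framework code: perplexity is the exponential of the average per-word cross-entropy (in nats).

```python
import math

def perplexity(total_ce_loss, num_steps):
    """Perplexity = exp(average per-word cross-entropy in nats)."""
    return math.exp(total_ce_loss / num_steps)

# e.g. a summed loss of 92.1 nats over a 20-word sentence
ppl = perplexity(92.1, 20)
```

A lower perplexity means the model assigns higher probability to the held-out words.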



In the next post of this series, I hope to implement a seq2seq model on top of the RNN operations introduced here.

xinjianl

### Comments

1. Ban says:

Hello! Only thanks to a friend did I find your site with such interesting tutorials. Can you write an article on what you need to know to build your own deep learning framework?

1. xinjianl says:

Hi Ban,

It is a good suggestion, as some other people also seem interested in the same question.
I am not sure when I will have time to write about it, but I will try writing one when I have time 🙂

Thanks!
Xinjian

1. Ban says:

I hope you will soon have free time. In the meantime, can you give some general advice on developing a framework?

1. xinjianl says:

No problem. You might also find some hints in the comments I posted on other articles if you are interested 🙂

1. xinjianl says:

I think finishing any of those large courses can give you enough hints about what you need to learn to create your own framework. However, I guess none of them can teach you enough to actually do it.
Reading code after finishing one course is probably a good way to start. For example, the tinyflow you mentioned is probably a good starting point.

1. Ban says:

thank you very much!
I’m looking forward to an article on creating a deep learning framework.
all the best!

1. xinjianl says:

Thanks 🙂

2. Ban says:

Hello!
Excuse me for troubling you.
Reading the comments under the posts, I had a question: how can I implement a framework for recognizing objects using R-CNN (or another technique)? What would you recommend? And is it possible to add this feature to pytensor?

1. xinjianl says:

Hi Ban,
I guess you can modify this framework to implement your R-CNN.
The basic structure does not need to change a lot; you still need the backward/forward interface and tensor objects.
However, there are lots of things that are not included yet. For example, you would need to implement a CNN layer (ideally with GPU support) and region proposals.
I would recommend you finish one of those large CV courses; then you will probably understand what you need to implement.

1. Ban says:

Hello!
Thank you so much for the answer. Little by little, I started working on this. I’m waiting for an article on writing a framework.
thanks again!

1. xinjianl says:

You are welcome 🙂