In this part, we continue to add basic components to the framework by implementing RNN/LSTM-related operations. RNNs and LSTMs are neural network models that are widely used in NLP and ML tasks such as POS tagging and speech recognition.
Embedding
First, it is necessary to support embedding operations for the model. In the previous posts, we implemented the standard Variable for the linear model and the MLP model. However, the standard variable cannot be used as an embedding variable efficiently.
The reason is that an embedding lookup is a sparse operation: it only needs a small part of the entire variable. In the following example, we only need two rows from the entire variable, obtained by looking up indices 1 and 3.
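As a rough illustration in plain NumPy (not the pytensor Variable type), a lookup reads only the requested rows, and only those rows would need to be updated during training:

import numpy as np

# a toy embedding variable: 5 words, each with a 3-dimensional embedding
embedding_matrix = np.random.uniform(-0.1, 0.1, (5, 3))

# looking up indices 1 and 3 touches only those two rows;
# the remaining rows are irrelevant for this step
word_ids = [1, 3]
looked_up = embedding_matrix[word_ids]
print(looked_up.shape)   # (2, 3)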
To support embedding lookup, we implement the embedding operation in pytensor.ops.embedding_ops. The embedding variable is implemented as a list of standard variables inside the Parameter class. To look up the embedding for a specific word, we simply pick the corresponding variable from the list.
The related code is as follows.
class Parameter:

    def get_embedding(self, vocab_size, word_dim):
        # get current embedding if it is created already
        if self.embeddings != None:
            return self.embeddings

        # initialize the embedding
        # embedding is implemented as a list of variables
        # this is for efficient update
        self.embeddings = []
        for i in range(vocab_size):
            embedding = Variable([np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), word_dim)])
            self.embeddings.append(embedding)

        return self.embeddings


class Embedding(Operation):

    def forward(self, input_variables):
        """
        get the embedding
        :param input_variables: input variable is a LongVariable containing word ids
        :return: embedding
        """
        super(Embedding, self).forward(input_variables)

        # embedding only takes 1 input variable
        assert(len(input_variables) == 1)

        word_id = input_variables[0].value[0]
        assert(word_id < self.vocab_size)

        output_variable = self.embedding_variables[word_id]

        if self.trainable:
            self.graph.parameter.add_temp_variable(output_variable)

        return output_variable
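As a small usage sketch: the Graph and LongVariable calls below mirror the RNNLM code later in this post, but the exact arguments should be treated as assumptions rather than the definitive pytensor API.

# build a graph with an embedding operation for a toy vocabulary
graph = Graph('embedding_demo')
embed_argument = {'vocab_size': 10, 'embed_size': 4}
word_embedding = graph.get_operation('Embedding', embed_argument)

# look up the embedding for word id 3; the result is a standard Variable,
# so the backward pass later touches only this single embedding
embedding_variable = word_embedding.forward([LongVariable([3])])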
RNN Operation
Equipped with the embedding operation, we can continue to add the RNN operation. In our implementation, an RNN operation consists of multiple RNNCell operations. When we run the RNN over a sentence, each RNNCell receives one word and updates its internal state.
The RNNCell operation computes its forward state with the following equations.
$$ Z_t = H_{t-1} \cdot W + I_t \cdot U $$
$$ H_t = \tanh(Z_t) $$
where $H_t$ is the hidden state at time step $t$ and $I_t$ is the input at time step $t$.
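In NumPy terms, one forward step looks roughly like this (a minimal sketch with hypothetical names; the actual RNNCell operates on pytensor variables):

import numpy as np

def rnn_cell_forward(h_prev, x_t, W, U):
    # h_prev: previous hidden state H_{t-1}, shape (1, hidden_size)
    # x_t:    current input I_t,             shape (1, input_size)
    # W:      hidden-to-hidden weights,      shape (hidden_size, hidden_size)
    # U:      input-to-hidden weights,       shape (input_size, hidden_size)
    z_t = h_prev.dot(W) + x_t.dot(U)   # Z_t = H_{t-1} W + I_t U
    h_t = np.tanh(z_t)                 # H_t = tanh(Z_t)
    return h_t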
The backward pass propagates the gradient from $H_t$ into $Z_t$ with the chain rule as follows.
$$ \frac{\partial \mathcal{L}}{\partial Z_t} = \frac{\partial \mathcal{L}}{\partial H_t} \cdot \frac{\partial \tanh(Z_t)}{\partial Z_t} $$
$$ \frac{\partial \mathcal{L}}{\partial Z_t} = \frac{\partial \mathcal{L}}{\partial H_t} \cdot (1-H_t^2) $$
Then the gradient is backpropagated into each of the remaining variables.
$$ \frac{\partial \mathcal{L}}{\partial H_{t-1}} = \frac{\partial \mathcal{L}}{\partial Z_t} \cdot W^\intercal $$
$$ \frac{\partial \mathcal{L}}{\partial W} =H_{t-1}^\intercal \cdot \frac{\partial \mathcal{L}}{\partial Z_t}$$
$$ \frac{\partial \mathcal{L}}{\partial I_{t}} = \frac{\partial \mathcal{L}}{\partial Z_t} \cdot U^\intercal $$
$$ \frac{\partial \mathcal{L}}{\partial U} =I_{t}^\intercal \cdot \frac{\partial \mathcal{L}}{\partial Z_t}$$
Finally, $ \frac{\partial \mathcal{L}}{\partial I_{t}} $ is propagated into the corresponding embedding variable.
One common pitfall here is that $ \frac{\partial \mathcal{L}}{\partial W} $ and $ \frac{\partial \mathcal{L}}{\partial U} $ must accumulate their gradients across time steps rather than be overwritten by the gradient of each new step, because $W$ and $U$ are shared by every RNNCell. It took me a lot of time to debug this…
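A NumPy sketch of one backward step (again with hypothetical names) makes the accumulation explicit: dW and dU are added to, never reassigned, because the same weights are reused at every time step.

def rnn_cell_backward(dh_t, h_t, h_prev, x_t, W, U, dW, dU):
    # dh_t is dL/dH_t flowing in from the next time step and the loss
    dz_t = dh_t * (1.0 - h_t ** 2)    # dL/dZ_t = dL/dH_t * (1 - H_t^2)
    dh_prev = dz_t.dot(W.T)           # dL/dH_{t-1}, passed to the previous step
    dx_t = dz_t.dot(U.T)              # dL/dI_t, later routed to the embedding row
    dW += h_prev.T.dot(dz_t)          # accumulate: W is shared across all time steps
    dU += x_t.T.dot(dz_t)             # accumulate: U is shared across all time steps
    return dh_prev, dx_t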
LSTM Operation
The RNN can be extended into an LSTM model, which alleviates the vanishing and exploding gradient problems. Like the RNN, the LSTM is composed of cells (LSTMCell). The difference between the LSTMCell and the RNNCell is that the LSTMCell needs to remember two variables: the hidden state and the cell state.
The LSTMCell operation updates its forward states with the following equations.
$$ f_t =\sigma(H_{t-1} \cdot W_{fh} + I_t \cdot W_{fi}) $$
$$ i_t =\sigma(H_{t-1} \cdot W_{ih} + I_t \cdot W_{ii}) $$
$$ o_t = \sigma(H_{t-1} \cdot W_{oh} + I_t \cdot W_{oi}) $$
$$ c_t = \tanh(H_{t-1} \cdot W_{ch} + I_t \cdot W_{ci}) $$
$$ Cell_t = f_t*Cell_{t-1} + i_t*c_t $$
$$ H_t = o_t*\tanh(Cell_t) $$
where $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate, respectively.
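As with the RNNCell, one forward step can be sketched in NumPy (bias terms are omitted, matching the equations above; the names are illustrative, not the actual pytensor LSTMCell interface):

def lstm_cell_forward(h_prev, cell_prev, x_t,
                      W_fh, W_fi, W_ih, W_ii, W_oh, W_oi, W_ch, W_ci):
    # one LSTM step following the equations above
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    f_t = sigmoid(h_prev.dot(W_fh) + x_t.dot(W_fi))   # forget gate
    i_t = sigmoid(h_prev.dot(W_ih) + x_t.dot(W_ii))   # input gate
    o_t = sigmoid(h_prev.dot(W_oh) + x_t.dot(W_oi))   # output gate
    c_t = np.tanh(h_prev.dot(W_ch) + x_t.dot(W_ci))   # candidate cell state
    cell_t = f_t * cell_prev + i_t * c_t              # new cell state
    h_t = o_t * np.tanh(cell_t)                       # new hidden state
    return h_t, cell_t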
The backward pass of the LSTM cell is too long to write out; the corresponding code can be seen here.
Penn Treebank language model
Finally, we can use those components to create a language model. We will train a language model on sentences from the Penn Treebank. The dataset is obtained from the word2vec script on Mikolov’s website.
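One minimal way to turn the raw text into word ids (assuming a whitespace-tokenized file; the helper name and the exact preprocessing in pytensor may differ) is:

def load_corpus(path):
    # read a whitespace-tokenized corpus and map each word to an integer id
    word2id = {}
    sentences = []
    with open(path) as f:
        for line in f:
            words = line.split()
            for word in words:
                if word not in word2id:
                    word2id[word] = len(word2id)
            sentences.append([word2id[word] for word in words])
    return word2id, sentences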
We show an RNN language model implementation below. An LSTM language model can be implemented in the same style by replacing the RNN with an LSTM.
class RNNLM:

    def __init__(self, vocab_size, input_size, hidden_size):
        # embedding size
        self.vocab_size = vocab_size
        self.word_dim = input_size

        # network size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = vocab_size

        # num steps
        self.max_num_steps = 100
        self.num_steps = 0

        # graph
        self.graph = Graph('RNN')

        # word embedding
        embed_argument = {'vocab_size': self.vocab_size, 'embed_size': self.input_size}
        self.word_embedding = self.graph.get_operation('Embedding', embed_argument)

        # rnn
        rnn_argument = {'input_size': self.input_size, 'hidden_size': self.hidden_size, 'max_num_steps': self.max_num_steps}
        self.rnn = self.graph.get_operation('RNN', rnn_argument)

        # affines
        affine_argument = {'input_size': self.hidden_size, 'hidden_size': self.output_size}
        self.affines = [self.graph.get_operation('Affine', affine_argument, "Affine") for i in range(self.max_num_steps)]

        # softmax
        self.softmaxLosses = [self.graph.get_operation('SoftmaxLoss') for i in range(self.max_num_steps)]

    def forward(self, word_lst):
        # get num steps
        self.num_steps = min(len(word_lst), self.max_num_steps)

        # create embeddings
        embedding_variables = []
        for word_id in word_lst:
            embedding_variables.append(self.word_embedding.forward([LongVariable([word_id])]))

        # run RNN
        rnn_variables = self.rnn.forward(embedding_variables)

        # softmax variables
        softmax_variables = []
        for i in range(self.num_steps):
            output_variable = self.affines[i].forward(rnn_variables[i])
            softmax_variable = self.softmaxLosses[i].forward(output_variable)
            softmax_variables.append(softmax_variable)

        return softmax_variables

    def loss(self, target_ids):
        ce_loss = 0.0
        for i in range(self.num_steps):
            cur_ce_loss = self.softmaxLosses[i].loss(LongVariable([target_ids[i]]))
            ce_loss += cur_ce_loss
        return ce_loss
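A training loop built on top of this class could look roughly like the sketch below. The backward and optimizer interfaces are not shown in this post, so SGD, optimizer.step(), graph.backward(), and the file name are assumptions for illustration only.

# hypothetical training loop; SGD, lr, and graph.backward() are assumed names
word2id, sentences = load_corpus('ptb.train.txt')   # helper sketched above; file name assumed
model = RNNLM(vocab_size=len(word2id), input_size=100, hidden_size=100)
optimizer = SGD(model.graph.parameter, lr=0.1)

for epoch in range(5):
    for sentence in sentences:
        input_ids, target_ids = sentence[:-1], sentence[1:]   # predict word t+1 from words 0..t
        model.forward(input_ids)
        loss = model.loss(target_ids)
        model.graph.backward()   # backpropagate through softmax, affine, RNN, and embedding
        optimizer.step()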
In the next post of this series, I hope to implement a seq2seq model using the RNN from this post.
Hello! Only thanks to a friend did I find your site with such interesting tutorials. Can you write an article on what you need to know to build your own deep learning framework?
Hi Ban,
Thanks for your comments!
It is a good suggestion, as some other people also seem interested in the same question.
I am not sure when I will have time to write about it, but I will try writing one when I have time :)
Thanks!
Xinjian
Thanks for the answer!
I hope you will soon have free time. In the meantime, can you give general advice on developing a framework?
No problem. You might also find some hints in the comments I posted on other articles if you are interested :)
Hi,
Thanks.
You say there that CS231n will be almost enough; is that so?
What do you think of these resources?
https://github.com/pjreddie/uwimg
https://course.fast.ai/part2
https://github.com/tqchen/tinyflow
https://end-to-end-machine-learning.teachable.com/p/write-a-neural-network-framework/?
I think finishing any of those large courses can give you enough hints about what you need to learn to create your own framework. However, I guess none of them can teach you enough to do it.
Reading code after finishing one course is probably a good way to start. For example, the tinyflow you mentioned is probably a good starting point.
Thank you very much!
I'm looking forward to an article on creating a deep learning framework.
All the best!
Thanks :)
Hello!
Excuse me for troubling you.
Reading the comments under the posts, I had a question. How can I implement a framework for recognizing objects using R-CNN (or another technique)? What would you recommend? And is it possible to add this feature to pytensor?
Hi Ban,
I guess you can modify this framework to implement your R-CNN.
The basic structure does not need to change much; you still need the forward/backward interface and tensor objects.
However, there are many things that are not included yet. For example, you would need to implement a CNN layer (ideally with GPU support) and region proposals.
I would recommend finishing one of the large CV courses; then you will probably understand what you need to implement.
Hello!
Thank you so much for the answer. Little by little, I started working on this. I’m waiting for an article on writing a framework.
Thanks again!
You are welcome :)