Implement a deep learning framework Part 2 – Implement Graph, Optimizer and Linear Model

In this post, we build on the basic components implemented in the previous post and define Graph, Parameter and Optimizer.


Almost all deep learning frameworks have the concept of a computational graph. We use the same terminology to describe the architecture of the computation. The architecture of a typical graph is depicted in the following figure.


Another important class is Parameter. The Parameter class manages the set of trainable variables. When we want to define a new trainable variable in the graph, we retrieve it from the Parameter class instead of instantiating the variable directly. The relationship between graph and parameter is shown in the following figure.

This separation offers two advantages:

  • It gives the optimizer a convenient way to update variables, because the Parameter class holds all the trainable variables.
  • It allows a variable to be shared between different parts of the graph. This is crucial for RNN and LSTM, where we want to apply the same weight variables at different time steps.

import numpy as np


class Parameter:
    """
    Parameter is a structure to manage all trainable variables in the graph.

    Each trainable variable should be initialized using Parameter.
    """

    def __init__(self):

        # a dictionary mapping names to variables
        self.variable_dict = dict()

    def get_variable(self, name, shape):
        """
        retrieve a variable by its name, creating it on first use

        :param name: name of the variable
        :param shape: desired shape
        :return: the variable registered under that name
        """

        if name in self.variable_dict:
            # if the variable exists in the dictionary,
            # retrieve it directly
            return self.variable_dict[name]
        else:
            # if not created yet, initialize a new variable for it
            # (standard normal values scaled by 1/sqrt(shape[0]))
            value = np.random.standard_normal(shape) / np.sqrt(shape[0])
            variable = Variable(value, name=name)

            # register the variable
            self.variable_dict[name] = variable

            return variable

    def clear_grads(self):
        """
        clear gradients of all variables
        """

        for k, v in self.variable_dict.items():
            # assumes Variable (from the previous post) provides clear_grad()
            v.clear_grad()
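To illustrate the sharing advantage mentioned above, here is a minimal sketch. The `Variable` class below is a hypothetical stand-in for the one from the previous post (only the attributes used in this post are modelled), `Parameter` is a condensed copy of the class above, and the name `'rnn_weight'` is made up for the example.

```python
import numpy as np


class Variable:
    """Hypothetical stand-in for the Variable class from the previous post."""

    def __init__(self, value, name=None, trainable=True):
        self.value = value
        self.grad = np.zeros_like(value)
        self.name = name
        self.trainable = trainable


class Parameter:
    """Condensed copy of the registry defined in this post."""

    def __init__(self):
        self.variable_dict = dict()

    def get_variable(self, name, shape):
        # create the variable on first request, reuse it afterwards
        if name not in self.variable_dict:
            value = np.random.standard_normal(shape) / np.sqrt(shape[0])
            self.variable_dict[name] = Variable(value, name=name)
        return self.variable_dict[name]


parameter = Parameter()

# two "time steps" of a recurrent layer request the weight by the same name
w_step0 = parameter.get_variable('rnn_weight', [4, 4])
w_step1 = parameter.get_variable('rnn_weight', [4, 4])

# both names refer to one object, so gradients from both steps accumulate
w_step0.grad += 1.0
w_step1.grad += 1.0

print(w_step0 is w_step1)            # True
print(len(parameter.variable_dict))  # 1
print(w_step0.grad[0, 0])            # 2.0
```

Because the second call returns the registered object instead of creating a new one, the optimizer sees a single variable whose gradient already contains the contributions from every place it was used.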


Finally, we define the optimizer class, which updates the variables held by the parameter.

The optimizer iterates over all trainable variables in the parameter and updates each variable's value based on its gradient. In this article, we will implement the stochastic gradient descent optimizer.

class SGD:
    def __init__(self, parameter, lr=0.001):
        self.parameter = parameter
        self.lr = lr

    def update(self):

        for param_name in self.parameter.variable_dict.keys():
            # look up the variable registered under this name
            param = self.parameter.variable_dict[param_name]

            # gradient descent step: value <- value - lr * grad
            if param.trainable:
                param.value -= self.lr * param.grad

        # clear all gradients
        self.parameter.clear_grads()
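As a quick check of the update rule (value ← value − lr · grad), the following sketch runs one SGD step on the toy function f(w) = 0.5 · ||w||², whose gradient is w itself. The `Variable` and `Parameter` classes here are simplified, hypothetical stand-ins written only for this example, and the variable names `'w'` and `'frozen'` are made up.

```python
import numpy as np


class Variable:
    """Hypothetical stand-in for the Variable class from the previous post."""

    def __init__(self, value, name=None, trainable=True):
        self.value = np.asarray(value, dtype=float)
        self.grad = np.zeros_like(self.value)
        self.name = name
        self.trainable = trainable


class Parameter:
    """Minimal registry: just enough for the optimizer to iterate over."""

    def __init__(self):
        self.variable_dict = dict()

    def clear_grads(self):
        for v in self.variable_dict.values():
            v.grad = np.zeros_like(v.value)


class SGD:
    def __init__(self, parameter, lr=0.001):
        self.parameter = parameter
        self.lr = lr

    def update(self):
        for param in self.parameter.variable_dict.values():
            if param.trainable:
                param.value -= self.lr * param.grad
        self.parameter.clear_grads()


parameter = Parameter()
w = Variable([1.0, -2.0], name='w')
frozen = Variable([5.0], name='frozen', trainable=False)
parameter.variable_dict['w'] = w
parameter.variable_dict['frozen'] = frozen

# gradient of f(w) = 0.5 * ||w||^2 is w itself
w.grad = w.value.copy()
frozen.grad = np.array([1.0])

SGD(parameter, lr=0.5).update()

print(w.value.tolist())       # [0.5, -1.0]
print(frozen.value.tolist())  # [5.0]  (not trainable, so unchanged)
print(w.grad.tolist())        # [0.0, 0.0]  (cleared after the step)
```

Clearing the gradients inside `update` means the caller does not have to remember to reset them before the next backward pass.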

Linear Regression Model

Equipped with all components defined previously, we can now implement a linear regression model as an example. A typical model should define the forward, backward and loss function.

A linear regression model can be implemented as follows.

class LinearModel:

    def __init__(self, input_size, output_size):
        """
        a simple linear model: y = w*x

        :param input_size: number of input features
        :param output_size: number of outputs
        """

        # initialize size
        self.input_size = input_size
        self.output_size = output_size

        # initialize parameters
        self.parameter = Parameter()
        self.W = self.parameter.get_variable('weight', [self.input_size, self.output_size])

        # ops and loss
        self.matmul = Matmul()
        self.loss_ops = SoftmaxLoss()

    def forward(self, input_variable):
        output_variable = self.matmul.forward([input_variable, self.W])

        return output_variable

    def loss(self, target_variable):
        loss_val = self.loss_ops.loss(target_variable)
        return loss_val

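    def backward(self):
        # back-propagate from the loss through the matmul operation
        # (assumes each operation from the previous post exposes backward())
        self.loss_ops.backward()
        self.matmul.backward()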

The model computes $y$ using the following equation.

$$ y = w \cdot x $$

Here $w$ is the only trainable variable in the model and is retrieved from the Parameter class. When the model receives an input variable, it runs the matmul operation forward to get the output value. It then computes the loss from the output and the target variable using the softmax cross-entropy loss (SoftmaxLoss in the code). Finally, the loss is back-propagated through the entire graph.
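The forward, loss and backward steps can be sketched in plain NumPy, independent of the framework classes. This is an illustration under assumptions, not the framework's implementation: the function names are made up for the example, the loss is the mean softmax cross-entropy over a batch, and the input is laid out as one sample per row so that the forward pass is `x @ w`.

```python
import numpy as np


def forward(x, w):
    """Linear map: one row of class scores per input row."""
    return x @ w


def softmax_cross_entropy(scores, targets):
    """Mean cross-entropy of softmax(scores) against integer class labels."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def backward(x, w, targets):
    """Gradient of the mean loss with respect to w."""
    scores = forward(x, w)
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # d(loss)/d(scores) = softmax(scores) - one_hot(targets)
    probs[np.arange(len(targets)), targets] -= 1.0
    # chain rule through the matmul, averaged over the batch
    return x.T @ probs / len(targets)


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))      # 5 samples, 3 features
w = rng.standard_normal((3, 4))      # 3 features, 4 classes
targets = np.array([0, 1, 2, 3, 1])  # one class index per sample

loss = softmax_cross_entropy(forward(x, w), targets)
grad = backward(x, w, targets)
print(loss, grad.shape)
```

The gradient has the same shape as `w`, which is what lets an optimizer apply it directly as `w -= lr * grad`.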

To test the model, we use the handwritten digit recognition dataset that ships with the scikit-learn package. The dataset contains about 1800 gray-scale images of 8×8 pixels, and each image corresponds to a single digit.
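For reference, the dataset can be loaded with scikit-learn's `load_digits`. The train/test split below is an assumption made for illustration; the split used for the results in this post is not shown here, and `test_size` and `random_state` are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
images = digits.images  # shape (1797, 8, 8): one 8x8 gray-scale image per sample
X = digits.data         # shape (1797, 64): the same images, flattened
y = digits.target       # integer labels from 0 to 9

print(images.shape, X.shape, y.shape)
print(sorted(set(y.tolist())))

# hold out part of the data for testing (assumed split, see note above)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)
```

With 64 input features and 10 classes, the model in this post would be constructed with `input_size=64` and `output_size=10`.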

Here is an example image from the dataset, which we want to recognize as 0.

We use cross-entropy as the loss function for this model, and it converges quickly with the SGD optimizer.

=== Epoch  0  Summary ===
test accuracy  0.8933333333333333
=== Epoch  1  Summary ===
test accuracy  0.9088888888888889

=== Epoch  39 Summary ===
test accuracy  0.9688888888888889

The code for this linear model is available here

