Implement a deep learning framework: Part 1 – Implement Variable and Operation

Recently, deep learning frameworks have attracted a lot of interest. They offer an easy way to define both static graphs (e.g. tensorflow, CNTK) and dynamic graphs (e.g. pytorch, dynet), and they save users a lot of time by performing automatic differentiation for them.

However, these sophisticated frameworks wrap their logic so deeply that it is hard to grasp what is happening inside them. In addition, prototyping a new native operation for them can be an arduous task.

As a result, I decided to create a new deep learning framework, for two purposes: (1) to build a lightweight deep learning framework that deepens my own understanding, and (2) to make it easier to prototype new operations. In this series of blog posts, I will describe my ongoing project pytensor, a deep learning framework built with pure numpy. The following figure shows a typical architecture in the framework.

In this first article, I will describe the basic modules of the framework: the variable, the operation, and the loss.

Modules of the framework

Before implementing the pieces of the framework, we should define its main modules and important concepts.

Tensor

The tensor is the lowest-level concept in this framework: it is simply a numpy array. In the diagrams of this series, a tensor is drawn as a green square.
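Concretely, any numpy array can play the role of a tensor here; for example:

import numpy as np

# a 2x3 tensor is simply a 2x3 numpy array
t = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(t.shape)   # (2, 3)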

Variable

Variable is the basic class in the computation graph. It is used to pass values through the graph, to feed inputs into the graph, and to update gradients during training.

The difference between a Variable and a numpy array is that a Variable holds two numpy arrays: one for the forward value and the other for the backward gradient, as in the following code.

import numpy as np


class Variable:
    """
    Variable is the basic structure in the computation graph.
    It holds a value for the forward computation and a grad
    for the backward computation.
    """

    def __init__(self, value, name='Variable', trainable=True):
        """
        :param value: numpy value
        :param name: name for the variable
        :param trainable: whether the variable can be trained or not
        """

        # value for forward computation
        self.value = np.array(value)

        # value for backward computation
        self.grad = np.zeros(self.value.shape)

        self.name = name

        self.trainable = trainable

The value of each variable will be set during the forward computation, and the grad will be updated during the backward computation.

There are three ways to create variables:

  • Variables can be instantiated directly by users as inputs or targets (a short example follows this list).
  • Variables can be retrieved using Parameter (it will be defined later). This is for managing trainable variables easily.
  • Variables can be created as the result of an operation (as the output variable).
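As a small illustration of the first case, a user-created variable just wraps a numpy array. Here is a minimal sketch using the Variable class above (the names are arbitrary):

import numpy as np

# an input variable: we do not want gradient descent to change it
x = Variable(np.array([1.0, 2.0, 3.0]), name='x', trainable=False)

# a target variable, typically consumed by a loss function later on
t = Variable(np.array([0.0, 1.0, 0.0]), name='target', trainable=False)

print(x.value)   # [1. 2. 3.]
print(x.grad)    # [0. 0. 0.]  (gradients start at zero)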

Operation

To make variables pass through the graph, we define an interface that every operation should implement.

class Operation:
    """
    An interface that every operation should implement
    """

    def forward(self, input_variables):
        """
        forward computation

        :param input_variables: input variables
        :return: output variable
        """
        raise NotImplementedError

    def backward(self):
        """
        back-propagate the gradient from the output variable into the input variables

        :return:
        """
        raise NotImplementedError

The Operation should define two methods: forward and backward. In the forward method, it carries out its internal computation and generates a new output variable. In the backward method, we assume that the gradient has already been back-propagated into the output variable; the operation should then continue to back-propagate the gradient into its input variables.

In the following diagram, forward updates the green tensor (value) on the left side of each variable, then backward updates the tensor (grad) on the right side.

We show an implementation of a typical add operation here.

class Add(Operation):

    def __init__(self, name='add'):
        self.name = name

    def forward(self, input_variables):
        """
        Add all variables in input_variables

        :param input_variables: list of input variables
        :return: output variable holding the element-wise sum
        """

        # remember the inputs so backward can reach them later
        self.input_variables = input_variables

        # value for the output variable
        value = np.zeros_like(self.input_variables[0].value)

        for input_variable in self.input_variables:
            value += input_variable.value

        self.output_variable = Variable(value)

        return self.output_variable

    def backward(self):
        """
        back-propagate the output gradient into each input variable

        :return:
        """

        for input_variable in self.input_variables:
            input_variable.grad += self.output_variable.grad

The forward method basically does the following math.
$$ V_{out} = \sum_{i}{V_i} $$

According to the chain rule, the gradient of each input variable is equal to the output gradient, so we just add the output variable’s gradient to each input variable’s gradient.

$$ \frac{\partial \mathcal{L}}{\partial V_i} = \frac{\partial \mathcal{L}}{\partial V_{out}} \cdot \frac{\partial V_{out}}{\partial V_i} = \frac{\partial \mathcal{L}}{\partial V_{out}} $$
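As a quick sanity check of this rule, the following sketch runs the Add operation above forward and backward; the output gradient is set by hand, as if a downstream loss had already back-propagated into the output variable:

import numpy as np

a = Variable(np.array([1.0, 2.0]), name='a')
b = Variable(np.array([3.0, 4.0]), name='b')

add = Add()
out = add.forward([a, b])
print(out.value)    # [4. 6.]

# pretend a downstream loss has already back-propagated this gradient
out.grad = np.array([0.5, -1.0])
add.backward()

print(a.grad)       # [ 0.5 -1. ]  -- identical to out.grad, as derived above
print(b.grad)       # [ 0.5 -1. ]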

Loss

The Loss class is a special type of operation. In addition to the forward and backward methods, it should implement a loss method, which takes the target variable as input and returns a scalar loss value. When backward is called, it computes the gradient with respect to this target, and the gradient can then be back-propagated through the rest of the graph.

class Loss(Operation):

    def forward(self, input_variables):
        raise NotImplementedError

    def backward(self):
        raise NotImplementedError

    def loss(self, target):
        raise NotImplementedError

Using this interface, we can define a simple square error loss function.
$$ \mathcal{L}_{square} = \frac{1}{2}\sum_{i}{(y_i - t_i)^2} $$

where $y_i$ denotes the i-th output and $t_i$ denotes the i-th target.

class SquareErrorLoss(Loss):

    def __init__(self, name="SquareErrorLoss"):
        self.name = name

    def forward(self, input_variable):
        self.input_variable = input_variable

    def loss(self, target):
        self.target = target

        # 0.5 * sum((y - t)^2), matching the formula above
        loss_val = 0.5 * np.sum((self.input_variable.value - self.target.value) ** 2)
        return loss_val

    def backward(self):
        # update grad: d(loss)/dy_i = y_i - t_i
        self.input_variable.grad = self.input_variable.value - self.target.value

        # continue back-propagation through the graph; this assumes the input
        # variable can pass its gradient back to the operation that produced it
        self.input_variable.backward()
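To see how these pieces fit together, here is a minimal end-to-end sketch that fits a trainable offset with plain gradient descent. It only uses the classes defined in this post; because the simplified Variable shown here does not define a backward() method yet, the sketch wires the output variable's backward to the Add operation that produced it by hand:

import numpy as np

x = Variable(np.array([1.0, 2.0]), name='x', trainable=False)
b = Variable(np.array([0.0, 0.0]), name='b')                      # trainable offset
target = Variable(np.array([3.0, 5.0]), name='target', trainable=False)

add = Add()
loss_op = SquareErrorLoss()

for step in range(100):
    # forward pass: out = x + b
    out = add.forward([x, b])
    loss_op.forward(out)
    loss_val = loss_op.loss(target)

    # backward pass: point the output variable's backward at the Add operation
    # by hand, so SquareErrorLoss.backward can keep propagating the gradient
    out.backward = add.backward
    loss_op.backward()

    # vanilla gradient descent step on the trainable variable
    b.value -= 0.1 * b.grad
    b.grad = np.zeros_like(b.value)

print(loss_val)   # close to 0
print(b.value)    # close to [2. 3.]

The last two lines of the loop are exactly what an optimizer would automate: apply the gradient to every trainable variable and reset it for the next step.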