Tensors, The Building Block For Deep Learning

Designing a tensor class in C++ that can be used for a deep neural network library.

Neural networks require data in order to learn anything about the world. This data consists of a set of inputs and outputs that we will use to train the system. The inputs could be anything from images, to raw text, to audio waveforms, to plain old numeric values representing the data. The outputs are commonly numeric values, class labels, or a generated sample of the input. We want a common data structure that can handle encoding any of these input or output formats, so that we have a simple interface to train our neural networks. Most neural network libraries have settled on a data structure called a Tensor to feed through the networks. In our previous examples, we have been using vectors of floating point values to store our data. This works fine for simple examples, but can be cumbersome to manage when we get to larger datasets with higher dimensionality.

To describe exactly what a tensor is, and how it stores data for inputs to a neural network, let's take a concrete example of creating a system that learns to match dogs with future owners. People have preferences of types of dogs they like. Some like fluffy dogs, some like dogs that will alert you when there is an intruder, some like big dogs, some like small dogs... You get the picture.

When a dog comes into a shelter, the shelter could fill out some characteristics about the dog and put it into a spreadsheet. Each characteristic could be a column in the spreadsheet, with each dog being a row. If we just looked at one dog in the set, it may look like the following. The first element 5.0 could be the age of the doggo, the second element 2.0 could the how much fur the dog has on a scale of 1-10, 3.4 could be how much the dog barks, 70.0 the weight, and 2.5 out of 10.0 for the energy.

We could easily represent this single dog in code as a vector of floats.

Then since we have a bunch of dogs in the spreadsheet, why not concatenate them all into a matrix? This will be easy to represent in code as well, we can just hold each vector of floats in another vector, giving us a vector of vectors of floats.

In this case we have 4 rows each containing 5 pieces of information about each dog, giving us a 4x5 matrix.

There are many shelters though, each with their own spreadsheet of dogs. So in order to represent this, we need another vector, to hold each matrix, where each matrix is a single shelter. As you can see, this starts to get more complicated each time we need to add a new container.

This is where tensors come in. Tensors can represent data of arbitrary dimensions and size. For example, our dog shelter data could simply be a tensor of 3 dimensions. This tensor has data about 3 different shelters, all with 4 dogs, each with 5 characteristics, giving us a 3x4x5 tensor. Often we feed data to neural networks in mini batches, where each "batch" is a set of vectors, or a matrix, so it is nice to store the entire dataset in a tensor you can index.

Though the example above is a tensor of 3 dimensions, they can have an arbitrary number of dimensions. For example, if we are storing a dataset of images, it is common to have a 4 dimensional tensor to store the data, say 4 x 32 x 32 x 3. The first dimension for the batch size (4), the second dimension for image width (32), the third dimension for image width (32), and the fourth dimension for image channels ie. red, green, blue (3). In this case we define the "shape" of the tensor as 4x32x32x3. Photo Creds: My dog Finn

Instead of separately handling cases where the data is 1 dimensional as
vector<float>
2 dimensional as
vector<vector<float>>
3 dimensional as
vector<vector<vector<float>>>
4 dimensional as
vector<vector<vector<vector<float>>>>
etc ...

We are going to write a C++ class that can handle all of these cases, and call it Tensor. Let's start with the codebase from this post, which is already set up with a simple feed forward neural network and unit tests. If you do not have the code, feel free to grab it from github.

In order to make this code a little more flexible and extensible we will add our tensor class. Start by creating our header and cpp files.

#!/bin/bash
touch include/neural/math/tensor.h
touch src/tensor.cpp
touch tests/tensor_test.cpp

The tensor class is going to store all of the data in a single compact vector of floats, and keep track of its dimensions with a vector of sizes. Let's fill out the header file with some constructors and member variables to see what this looks like.

Here we have a vector of floats called m_data that will store all of our values in our tensor. We will organize it so that we can do some simple math to get elements in arbitrary dimensions. We also store the shape as m_shape, a vector of sizes (long integers), in order to know how to organize our data in a flat 1 dimensional vector. There are a few helper functions here to aid our unit tests.

Before we dive into the implementation, let's get our unit tests set up by editing tests/tensor_test.cpp.

In the first test, we pass in a tensor shape to the constructor, and make sure it allocates enough memory to hold onto all of our values. In the second test, we pass in a shape and the actual values we want to store.

In order to calculate the amount of memory we need to allocate in our flat 1D vector, to store a tensor of arbitrary dimensions and shape, we simply multiply all the dimensions together. In our example a 3x4x5 tensor, will simply have 3*4*5 = 60 entries.

Let's implement the cpp file to make this happen.

Everything is simply copying and storing data here, except for the p_CalcSize function which does the multiplication to calculate the vector size as we talked about before. Compiling and running our new tests and we see that we are off to the races. Accessing Data

It may seem odd that we are storing our data in a single dimensional vector m_data when we are trying to represent multi-dimensional data. Let's dive deeper into how this works.

In order to test out the lower level implementation, let's define an accessor function called "At" in our Tensor class that takes in a vector of indices, and returns the value stored in that location.

Next let's make stub implementation of the function that just return 0.0 for now, so that our code compiles.

To test this function, we are going to make a dummy dataset of 2x2 color images that we are going to store in a Tensor. Although this is probably a dataset we would never use in the wild, it is good for testing and visualization purposes. This tensor is going to hold a batch of 4 images, each with a width and height of 2, and 3 channels for the r,g,b colors of the pixels, leaving us with a 4x2x2x3 tensor.

I have tried to organize the code in a way so we can visually see how the multi-dimensional data is laid out in the single dimensional vector. As we can see, the first pixel of the first image is stored in elements 0, 1, and 2 of the vector. Pretty simple. The next three values (3,4,5) are the next pixel, in row 0, column 1 of the image. Values 6,7, and 8 would be rgb values the pixel in row 1, column 0 of the first image, etc. This continues throughout the vector with the same pattern as we add more images until we complete our batch size of 4. Now how do we go from a multidimensional index describing the value we want in the tensor, to the index in our m_data vector?

The key is using something we will call "strides". We can compute the distance between items at each dimension by striding over the vector given the dimension sizes. For our example above, of a 4x3x3x2 tensor, the stride sizes will be 12, 6, 3, and 1 respectively.

If you notice in the vector of data we passed in to create our tensor, it takes exactly 12 steps to get to the next image in the dataset, 6 steps to get to the next row in the image, 3 steps to get to the next pixel in the row, and 1 step to get to the next channel value of that pixel. Hence these "sets of steps" are called "strides". Now how do we compute these strides for a tensor of arbitrary size and dimensions? It is pretty intuitive if you followed the example above. You start at the first dimension, and find it's stride by multiplying the remaining dimensions together. So in the example above, if we want to get to the {1,0,0,0} index, or the second image in the dataset, we are going to need to move 3*3*2 = 12 steps to get there.

Now let's say we wanted to get to the start of the third image in the dataset, or {2,0,0,0}. It is the same logic, just * 2, or 24 steps. What about the first pixel in the second row of the third image? ie {2,1,0,0}. Well then it is 2 * (3*3*2) to get to the third image, then add 1 * (2 * 3) to get to the next row in that image. (2 * (3 * 3 * 2)) + (1 * (2 * 3)) = 42 steps.

Let's define a function to do this for us. In our file include/neural/math/tensor.h header in the private section let's add a couple definitions to the header: "p_ComputeStrideSizes" and "p_DataOffsetFromIdx"

First let's implement p_ComputeStrideSizes() given a shape:

Since the stride sizes will stay the same while the shape of the tensor stays the same, we will precompute these on construction and store them in a vector called m_strideSizes. Add this vector to the member variables of your Tensor class after m_data.

Now let's update our constructors to use our new function to calculate the strides.

Now that we have the stride sizes, we can use them in our p_DataOffsetFromIdx function to get the actual offsets into our data vector. Implement the p_DataOffsetFromIdx function we have defined earlier as follows:

Now that we have precomputed our stride sizes it is pretty simple to use them. First we check the input to make sure it is a valid index. After this, we can just walk through the indices and multiply them by the stride sizes, while we sum up a total. Pretty simple right? We can use this new function to make our At() implementation very simple:

Before we compile you may have notices a new convenient function called ShapeStr(), as well as the use of stringstream. First add the sstream header to the top of your cpp file.

Now let's define our public function to turn shape vectors into a string in the header:

And implement it in the cpp:

Let's run the tests again and see if we succeeded...

cmake ..
make
./tests Great! Now we can construct our tensors with data and access the elements, but we should also be able to set data at these indices. Let's define SetAt(idx) as follows:

We will use our same p_DataOffsetFromIdx from before to figure out our data offset, but "set" the value instead of "get" it this time.

In the full implementations I have added more unit tests for SetAt(), but for now we'll skip them. I think we can feel pretty confident in our implementation since it uses the same logic as At() to compute the offsets.

Pointers and Initializations

Next we are going to add some static functions for constructing tensors with different types of data. First let's create two typedefs for tensor pointers. The tensors we use in neural networks often hold a lot of data, so we do not want to be copying these objects everywhere. It will be much more efficient to have pointers to them. At the top of our header add these typedefs, as well as the include.

In most of our neural network code we will write later we will be using the TTensorPtr typedef, which is a shared pointer to a const tensor. This is because in most cases we don't want functions modifying our tensors as we pass them in, but we do want to share the data between modules without copying it.

We also add our TMutableTensorPtr typedef in cases where we are going to need to modify the data. It is nice to explicitly call this datatype "Mutable" so that people calling functions with a MutablePtr can expect the data might be modified.

Next let's add the definitions for some common initializations of tensors:

Hopefully these definitions are pretty self explanatory. The New() functions are just wrappers around our other constructors that return Ptrs. Often we will initialize the parameters of our neural networks with a random tensor (more on this in a later post). It is nice to create tensors with all of the same Constant value such as 1 or 0 (this is often used for a bias term). And there are times when we are handed a non-mutable tensor, and will need to modify and return the values, hence ToMutable(). SetAll(float a_val) is a convenience function that the other methods will use.

Now let's implement these functions:

Most of these functions should be pretty easy to understand, all they are doing is initializing the tensor data with different values. For the Random() function, we will need to add some new includes, specifically chrono and random.

If you want to, go ahead and write some simple tests to make sure they work, and that no bugs get introduced into our code later. I have left them out of here for the sake of the posts length.

Math

Up til now we have just been creating and initializing our data structure. Now it's time to get math-y, after all neural networks are mainly composed of simple math functions on tensors.

Let's start by multiplying the tensor by a constant. This operation is straight forward, you just multiply each element by the constant. Add this function to your header:

We can do the same for dividing, adding, and subtracting a scalar from a tensor.

Using Tensors instead of Matrices

Scalar operations on tensors are useful for initialization and manipulating the values, but it is more common that we will be performing operations on tensors with other tensors. One of the most common operations with tensors will be matrix multiplication, assuming a 2 dimensional tensor.

We already implemented matrix multiplication earlier when implementing the linear layer in a simple feed forward neural network here.

For a refresher, let's look at our simple 2x3 * 3x2 example. Assuming "i" represents an index into a row, and "j" represents an index into a column, to get the (i, j) element when multiplying two matrices, you walk down the row i, and column j, and take the product (multiply) each pair of elements, then sum up all the products, to get the new value at (i,j).

You may remember we already implemented this functionality in our MatrixMath class from before. Let's modify our implementation before to work with our Tensor class instead of TMatrix (our typedef of vector>). Since this change will cascade throughout the rest of the codebase if we simply changed the interface for MatrixMath to take a TTensorPtr instead of a TMatrix, let's define a new class called TensorMath and port over the implementations one by one.

Create the new file include/neural/math/tensor_math.h, and add our first function Multiply().

The implementation will pretty much the same as MatrixMath::Multiply, except it will check to make sure the tensors that are passed in have the correct size (ie. they are matrices). The rest of the changes are for using our Tensor class instead of a vector of vectors.

Let's also copy over MatrixMathTest to TensorMathTest to make sure we didn't mess anything up.

We are well on our way! Let's port over the rest of our functions from the MatrixMath class so that eventually we can get rid of our TMatrix typedef and use tensors as our data structure everywhere.

Next let's port over Transpose().

And then port over our TestTranspose.

Next up is the AddCol() function.

This function needed a little love seeing as it was pretty simple to add a new column to a vector of vectors. Never fear, it isn't too bad. All we have to do is compute the offsets into the single dimensional data vector by multiplying the row by the number of columns, and go to the end. Then if we are at the end of the vector we push back, or if we are in the middle we just insert.

Let's move over our test to make sure it still works.

Great, let's port over our last function RemoveCol().

This has the same logic for calculating offsets as AddCol(), but instead starts erasing indices from the back, so that the vector data shifts to do mess with the offset. The test will just make sure the last column gets removed and shape is updated.

Now we have all the functionality ported over to work with 2D tensors instead of our TMatrix struct. Ideally these functions would work with n-D tensors, but for now we just want to get our code working with tensors at all. We will add the more robust functionality when we need it.

If you run all your tests you should see that you now have 17 passing! But don't get too excited, now we have to rip out TMatrix and MatrixMath... Let's start with getting rid of TMatrix and watch our code compilation explode. Let's remove the typedefs.h header that defines TMatrix.

rm include/neural/typedefs.h
rm include/math/matrix_math.h
rm src/matrix_math.cpp
cd build
cmake ..
make

Unfortunately, this will not be as easy as a simple find and replace, since the constructors and accessors are different between or Tensor class and TMatrix vector of vectors. Don't worry, it won't be that painful, and we can be confident in using and extending our Tensor data structure going forward.

The easiest way to do this refactor is going to be trying to compile without our TMatrix and fix the resulting error messages. The layer interface will be a good place to start, because it defines how data flows through the layers of our network. Change include/neural/layers/layer.h to include our new tensor.h header, and use TTensorPtr's as inputs and outputs instead of TMatrix.

This leaves us with:

If you compile again, you should get errors for the LinearLayer class which inherits from our Layer base class. Let's update the interfaces use TTensorPtr here as well.

Notice our weights are now also tensors. We want m_weights to be a TMutableTensorPtr, because we will be updating them as we learn the data. Other than that we have simply changed TMatrix to TTensorPtr.

In the linear_layer.cpp file, we can also start by replacing all the occurrences of TMatrix with TTensorPtr. You will notice that the constructor is adding a row to the weights matrix if there is a bias term. This seems like functionality that is missing from our TensorMath class, since we already have the AddCol() function defined there. Ideally our new LinearLayer constructor should be changed to look like this:

Let's go ahead and add this function to our TensorMath class.

Since we are assuming this only works for matrices and the data is stored in row major order, the implementation is pretty simple. We simply add the values to the end of our data and increase the shape size.

To make sure it works you can add the following test.

With that out of the way, the only other difference in the implementation will be changing MatrixMath:: to TensorMath::, and updating the getters/setters to work with TTensorPtr's instead of vector>. The full implementation is as follows.

Next up is our ReLULayer, which will simply be updating the data structure and the getter/setter interfaces as well.

Finally we can update our tools/feedforward_neural_net/main.cpp tool to use TTensorPtr instead of TMatrix.

Running our tool for 99 epochs over our simple dataset we see that it still works as intended! Predicting 0.9 for target = 1, 0.01 for target = 0 and -0.99 for target = -1. Our tests are not yet compiling, but I'll leave that as an exercise for the reader. If you have been following along it shouldn't be too hard to update these interfaces to work with TTensorPtrs with the same data. If you want the full source & tests you can find it on github.

We now have a pretty extensible data structure to use for inputs and outputs of our neural networks! This is great. Now we can move onto real world datasets and problems such as image recognition, text classification, and speech recognition, without having to change our interface much. In the next post we will dive into how to create a data loader for the MNIST dataset so that we can identify handwritten digits with a feed forward neural network.