Writing The Code For Linear Regression With A Single Neuron
A C++ implementation of learning a linear model given a simple dataset
Hopefully the last post gave you a good sense of how a single neuron can learn from data. In this post, we are going to write some code to make it concrete. By the end, you will be able to experiment with different learning rates and training data to see how a simple linear system can model data with one input.
This code will be the starting point for a C++ framework for neural networks. We will start simple and expand the interface to look more like PyTorch's, but written in C++. It may not have the performance or optimizations of modern neural network libraries, but following along will give you invaluable knowledge of what is going on under the hood of most of them.
Why C++, you may ask? Isn't Python the hot language for machine learning? Yes, Python is very good for rapid prototyping, and there are a lot of good libraries out there for neural networks. In fact, most of these libraries are built on a C or C++ backend with a Python interface on top. I think of this series of posts as if we were dropped on a desert island with only g++ and the standard library: how could we write a neural network library from scratch? Also, there will not be much C++-specific code, so feel free to follow along in any programming language you choose! Porting example code to another language is actually a good learning exercise, because you are not simply copying and pasting but thinking through each step.
Let's start with our main function, defining our dataset, and setting up the training loop. Create a file called main.cpp, and add the following:
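A minimal sketch of what this starting point could look like is below. The dataset values (x=2 with target 3, x=4 with target 5) and the helper names FormatExample and RunEpochs are our own assumptions, chosen so the numbers line up with the predictions and errors quoted later in the post; the loop is factored into a function so main can stay short.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical dataset (an assumption): each row is {input x, target y}.
const std::vector<std::vector<float>> kDataset = {
    {2.0f, 3.0f},
    {4.0f, 5.0f},
};

// Formats a single training example as "x=... y=...".
std::string FormatExample(float x, float y) {
    std::ostringstream ss;
    ss << "x=" << x << " y=" << y;
    return ss.str();
}

// Loops over the dataset numEpochs times and prints every example.
// Returns how many examples were visited so the loop is easy to check.
int RunEpochs(const std::vector<std::vector<float>>& dataset, int numEpochs) {
    int visited = 0;
    for (int epoch = 0; epoch < numEpochs; ++epoch) {
        std::cout << "Epoch " << epoch << std::endl;
        for (const std::vector<float>& example : dataset) {
            assert(example.size() == 2);  // {input x, target y}
            std::cout << FormatExample(example[0], example[1]) << std::endl;
            ++visited;
        }
    }
    return visited;
}
```

Calling RunEpochs(kDataset, numEpochs) from main prints each example once per epoch.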
Each training example is a vector of floats: the first value is the input x, and the second is the target output y. We define the number of passes we want to make over the data as numEpochs, and for now simply print out each training example in the inner loop.
Let's compile and run this example, just to make sure our environment is set up properly. Add this simple build script to get us going, and call it build.sh:
With build.sh and main.cpp in the same directory, make build.sh executable and run it.
chmod u+x build.sh
./build.sh
This will make an executable called linear_regression that you can run. So far all it does is print out the dataset, but this is a good start. Running the executable, you should see output like this:
Great. Let's start on our first module, the LinearLayer module. If you read the last two posts, you should be familiar with the math for this module. Here is a refresher of what it looks like:
Let's start by calculating the linear output y = w_0 x + w_1 b. Add this new class above your main function:
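A minimal sketch of such a class follows. The constructor taking the two initial weights and the method name Forward are our assumptions, not necessarily what the original code uses; the bias input m_bias is fixed at 1.0 as the post describes.

```cpp
#include <cassert>
#include <vector>

// Sketch of a single-input linear layer: y = w_0 * x + w_1 * b, with b = 1.0.
class LinearLayer {
public:
    // Expects exactly two weights: {w_0 (scales x), w_1 (scales the bias)}.
    explicit LinearLayer(const std::vector<float>& weights)
        : m_weights(weights), m_bias(1.0f) {
        assert(m_weights.size() == 2);
    }

    // Computes the linear output for a single input value.
    float Forward(float input) const {
        return m_weights[0] * input + m_weights[1] * m_bias;
    }

private:
    std::vector<float> m_weights;  // the two hard-coded weights
    float m_bias;                  // bias input, always 1.0
};
```

With initial weights {-0.5, 2.5}, Forward(2.0f) returns 1.5 and Forward(4.0f) returns 0.5, the same predictions the post reports for its initial weights.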
We are using a vector of floats for the two weights, and a float for the bias input, which will always be 1.0. Eventually this module should support an arbitrary number of weights, but for now there will just be the two that we hard-code. The "m_" prefix stands for "member variable" and helps us distinguish a class's internal state from local variables and function parameters.
With this simple class we can now start getting predictions in our main loop. Define your layer above the main loop:
Then add the prediction in the inner loop:
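Putting these two steps together, a sketch of the prediction loop could look like the following. It repeats a minimal LinearLayer so the block compiles on its own; PredictEpochs is a hypothetical helper, and the dataset {(2, 3), (4, 5)} and initial weights {-0.5, 2.5} are assumed values.

```cpp
#include <cassert>
#include <iostream>
#include <vector>

// Minimal linear layer: y = w_0 * x + w_1 * b, with b fixed at 1.0.
class LinearLayer {
public:
    explicit LinearLayer(const std::vector<float>& weights)
        : m_weights(weights), m_bias(1.0f) {
        assert(m_weights.size() == 2);
    }
    float Forward(float input) const {
        return m_weights[0] * input + m_weights[1] * m_bias;
    }

private:
    std::vector<float> m_weights;
    float m_bias;
};

// Runs the layer over every example for numEpochs epochs, printing each
// prediction. Nothing is learned yet, so every epoch repeats the same values.
std::vector<float> PredictEpochs(const LinearLayer& layer,
                                 const std::vector<std::vector<float>>& dataset,
                                 int numEpochs) {
    std::vector<float> predictions;
    for (int epoch = 0; epoch < numEpochs; ++epoch) {
        for (const std::vector<float>& example : dataset) {
            float x = example[0];
            float prediction = layer.Forward(x);
            std::cout << "Epoch " << epoch << " x=" << x
                      << " prediction=" << prediction << std::endl;
            predictions.push_back(prediction);
        }
    }
    return predictions;
}
```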
Compile and run, and we should see the predictions being made:
The predictions will be the same for each epoch since there is no learning process implemented yet, but that's ok; they line up with what we expect from the initial weight and bias. The input x=2 gives an output of y=1.5, and the input x=4 gives an output of y=0.5.
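Because the layer computes y = w_0 x + w_1 b with b = 1, these two predictions are enough to recover the initial weights. Subtracting the first equation from the second:

```latex
\begin{aligned}
2w_0 + w_1 &= 1.5 \\
4w_0 + w_1 &= 0.5 \\
2w_0 &= 0.5 - 1.5 = -1.0 \quad\Rightarrow\quad w_0 = -0.5 \\
w_1 &= 1.5 - 2w_0 = 2.5
\end{aligned}
```

So the example starts from w_0 = -0.5 and a bias weight of w_1 = 2.5.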
Before we make these predictions better, let's define our Error module, which will calculate the mean squared error. Below the LinearLayer class and above your main function, add this module:
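A minimal sketch of the Error module is below. The method names Calculate and MeanSquaredError are our own assumptions; MeanSquaredError anticipates taking the mean of the squared errors over the whole dataset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of an error module for a single output.
class Error {
public:
    // Squared error between the layer's output and the target output.
    float Calculate(float output, float target) const {
        float diff = output - target;
        return diff * diff;
    }

    // Mean of the per-example squared errors over a whole dataset.
    float MeanSquaredError(const std::vector<float>& outputs,
                           const std::vector<float>& targets) const {
        assert(!outputs.empty() && outputs.size() == targets.size());
        float total = 0.0f;
        for (std::size_t i = 0; i < outputs.size(); ++i) {
            total += Calculate(outputs[i], targets[i]);
        }
        return total / static_cast<float>(outputs.size());
    }
};
```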
All this module does right now is calculate the squared error between the output of the linear module and the target output. We will take the mean over the entire dataset to get the total mean squared error.
Define an instance of this module in the main function below your linear layer:
Then use it in the inner loop to calculate error:
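Combining the two modules, one forward pass over the data could be sketched as follows. Both classes are repeated in minimal form so the block compiles on its own; ForwardPassErrors is a hypothetical helper, and the dataset {(2, 3), (4, 5)} and initial weights {-0.5, 2.5} are assumed values.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

class LinearLayer {
public:
    explicit LinearLayer(const std::vector<float>& weights)
        : m_weights(weights), m_bias(1.0f) {
        assert(m_weights.size() == 2);
    }
    // y = w_0 * x + w_1 * b
    float Forward(float input) const {
        return m_weights[0] * input + m_weights[1] * m_bias;
    }

private:
    std::vector<float> m_weights;
    float m_bias;
};

class Error {
public:
    // (output - target)^2
    float Calculate(float output, float target) const {
        float diff = output - target;
        return diff * diff;
    }
};

// One forward pass over the dataset: predict, then measure the squared error.
std::vector<float> ForwardPassErrors(const LinearLayer& layer, const Error& error,
                                     const std::vector<std::vector<float>>& dataset) {
    std::vector<float> errors;
    for (std::size_t i = 0; i < dataset.size(); ++i) {
        float x = dataset[i][0];
        float y = dataset[i][1];
        float prediction = layer.Forward(x);
        float e = error.Calculate(prediction, y);
        std::cout << "x_" << i << ": prediction=" << prediction
                  << " error=" << e << std::endl;
        errors.push_back(e);
    }
    return errors;
}
```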
Now we can see the error for each of the examples: 2.25 for x_0 and 20.25 for x_1.
This is all the code for our forward pass! Now we have to add the equations for the derivatives and the backward pass. Let's start with the derivative of the error function with respect to its input.
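Writing ŷ for the output of the linear layer and y for the target (our notation), the per-example squared error and its derivative with respect to ŷ are:

```latex
E = (\hat{y} - y)^2, \qquad \frac{\partial E}{\partial \hat{y}} = 2(\hat{y} - y)
```

Assuming a target of y = 3 for x = 2 (one target consistent with the 2.25 error reported above), the derivative is 2(1.5 - 3) = -3; the negative sign says that increasing the output would lower the error.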
This is the first derivative we need in our backward pass: it tells us how the error changes as the layer's output changes, and therefore in which direction to move the output to reduce the error. For more intuition on the derivatives we calculate in this post, refer back to the last post. Add a simple function for this to our error module.
Then add the implementation below.
Pretty straightforward. We should have all these variables ready to go in our training loop, so let's add the derivative calculation below where we calculate the error.
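One way to sketch this, with the method declared inside the class and implemented below it as described above. The names Derivative and ErrorGrads are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Error {
public:
    float Calculate(float output, float target) const;
    float Derivative(float output, float target) const;
};

// (output - target)^2
float Error::Calculate(float output, float target) const {
    float diff = output - target;
    return diff * diff;
}

// dE/d(output) = 2 * (output - target): its sign says which way the output
// should move to lower the error.
float Error::Derivative(float output, float target) const {
    return 2.0f * (output - target);
}

// Computes the error derivative for every example, as the training loop
// would right after computing the error.
std::vector<float> ErrorGrads(const Error& error,
                              const std::vector<float>& outputs,
                              const std::vector<float>& targets) {
    assert(outputs.size() == targets.size());
    std::vector<float> grads;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        grads.push_back(error.Derivative(outputs[i], targets[i]));
    }
    return grads;
}
```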
Now on to the derivatives for the linear layer. First we need to calculate the derivative of the linear function with respect to each of the weights. This is an easy one because each weight is simply multiplied by an input, and the derivative of a linear function is the slope of the line (i.e., the value each weight is multiplied by). So the derivative of the linear function with respect to w_0 is x, and the derivative with respect to w_1 is b.
We are interested in how each weight affects the total error, though, not just the output of this module. To get this, we follow the chain rule and multiply the derivative for each weight by the derivative calculated in the error module.
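Written out with the chain rule, using ŷ for the layer's output, y for the target and the per-example error E = (ŷ - y)^2 (our notation):

```latex
\begin{aligned}
\hat{y} &= w_0 x + w_1 b, \qquad
\frac{\partial \hat{y}}{\partial w_0} = x, \qquad
\frac{\partial \hat{y}}{\partial w_1} = b \\
\frac{\partial E}{\partial w_0} &= \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_0} = 2(\hat{y} - y)\,x \\
\frac{\partial E}{\partial w_1} &= \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_1} = 2(\hat{y} - y)\,b
\end{aligned}
```

With b = 1, the gradient for the bias weight is simply the error derivative itself.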
Let's combine all of this into one function called Backward and add it to our LinearLayer class.
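A sketch of LinearLayer with a Backward method, assuming Forward caches its input in a member (m_input, our addition) so that Backward can use it. Backward takes the error module's derivative and returns one gradient per weight:

```cpp
#include <cassert>
#include <vector>

class LinearLayer {
public:
    explicit LinearLayer(const std::vector<float>& weights)
        : m_weights(weights), m_bias(1.0f), m_input(0.0f) {
        assert(m_weights.size() == 2);
    }

    // y = w_0 * x + w_1 * b; remembers x for the backward pass.
    float Forward(float input) {
        m_input = input;
        return m_weights[0] * input + m_weights[1] * m_bias;
    }

    // Chain rule: dE/dw_0 = dE/dy * x and dE/dw_1 = dE/dy * b.
    std::vector<float> Backward(float errorGrad) const {
        return {errorGrad * m_input, errorGrad * m_bias};
    }

private:
    std::vector<float> m_weights;
    float m_bias;   // bias input, always 1.0
    float m_input;  // input from the most recent Forward call
};
```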
We will use this gradient to update our weights at the end of each epoch. We want to keep track of the average gradient over the training data, as well as the average error, to see how well we are doing. Each epoch, after we apply the gradient to the weights, we should see the average error go down. Let's add the code to compute the averages and see everything in action.
Our final training loop will look like this:
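A self-contained sketch of the whole training logic, under our assumptions: the names Forward, Calculate, Derivative and GetMutableWeights are hypothetical, Backward, CalcAverage and CalcAverageGrad follow the post's naming, and Train wraps the epoch loop so it can be called from main. Each epoch collects per-example errors and gradients, averages them, and takes one gradient-descent step.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

class LinearLayer {
public:
    explicit LinearLayer(const std::vector<float>& weights)
        : m_weights(weights), m_bias(1.0f), m_input(0.0f) {
        assert(m_weights.size() == 2);
    }
    // y = w_0 * x + w_1 * b, caching x for Backward.
    float Forward(float input) {
        m_input = input;
        return m_weights[0] * input + m_weights[1] * m_bias;
    }
    // Gradient of the error with respect to each weight (chain rule).
    std::vector<float> Backward(float errorGrad) const {
        return {errorGrad * m_input, errorGrad * m_bias};
    }
    std::vector<float>& GetMutableWeights() { return m_weights; }

private:
    std::vector<float> m_weights;
    float m_bias;
    float m_input;
};

class Error {
public:
    float Calculate(float output, float target) const {
        float diff = output - target;
        return diff * diff;
    }
    float Derivative(float output, float target) const {
        return 2.0f * (output - target);
    }
};

float CalcAverage(const std::vector<float>& values) {
    assert(!values.empty());
    float sum = 0.0f;
    for (float v : values) sum += v;
    return sum / static_cast<float>(values.size());
}

std::vector<float> CalcAverageGrad(const std::vector<std::vector<float>>& grads) {
    assert(!grads.empty());
    std::vector<float> avg(grads[0].size(), 0.0f);
    for (const std::vector<float>& grad : grads) {
        for (std::size_t i = 0; i < avg.size(); ++i) avg[i] += grad[i];
    }
    for (float& a : avg) a /= static_cast<float>(grads.size());
    return avg;
}

// Full-batch gradient descent; returns the average error of each epoch.
std::vector<float> Train(const std::vector<std::vector<float>>& dataset,
                         LinearLayer& layer, float learningRate, int numEpochs) {
    Error error;
    std::vector<float> history;
    for (int epoch = 0; epoch < numEpochs; ++epoch) {
        std::vector<float> errors;              // per-example errors
        std::vector<std::vector<float>> grads;  // per-example weight gradients
        for (const std::vector<float>& example : dataset) {
            float x = example[0];
            float y = example[1];
            float prediction = layer.Forward(x);
            errors.push_back(error.Calculate(prediction, y));
            grads.push_back(layer.Backward(error.Derivative(prediction, y)));
        }
        float avgError = CalcAverage(errors);
        std::vector<float> avgGrad = CalcAverageGrad(grads);
        std::vector<float>& weights = layer.GetMutableWeights();
        for (std::size_t i = 0; i < weights.size(); ++i) {
            weights[i] -= learningRate * avgGrad[i];  // gradient-descent step
        }
        std::cout << "Epoch " << epoch << " average error: " << avgError << std::endl;
        history.push_back(avgError);
    }
    return history;
}
```

For example, main could construct LinearLayer layer({-0.5f, 2.5f}) and call Train with the assumed dataset {(2, 3), (4, 5)} and a learning rate of 0.01; the printed average error starts at 11.25 and shrinks every epoch.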
There are a few new methods and variables here. At the start of each epoch, we define vectors to hold on to the errors and gradients as we iterate through the dataset.
At the end of each epoch, we calculate the average error and the average gradient from these vectors.
I've added two convenience functions for the averaging, CalcAverage and CalcAverageGrad, above the main function:
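A sketch of what these two helpers might look like; the signatures are our guess, taking the per-epoch vector of errors and the vector of per-example gradients:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average of a list of per-example errors.
float CalcAverage(const std::vector<float>& values) {
    assert(!values.empty());
    float sum = 0.0f;
    for (float value : values) {
        sum += value;
    }
    return sum / static_cast<float>(values.size());
}

// Element-wise average of per-example gradients (one entry per weight).
std::vector<float> CalcAverageGrad(const std::vector<std::vector<float>>& grads) {
    assert(!grads.empty());
    std::vector<float> average(grads[0].size(), 0.0f);
    for (const std::vector<float>& grad : grads) {
        assert(grad.size() == average.size());
        for (std::size_t i = 0; i < average.size(); ++i) {
            average[i] += grad[i];
        }
    }
    for (float& value : average) {
        value /= static_cast<float>(grads.size());
    }
    return average;
}
```

With the assumed dataset {(2, 3), (4, 5)} and initial weights {-0.5, 2.5}, the first epoch's per-example gradients are {-6, -3} and {-36, -9}, which average to {-21, -6}.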
After this, we need to scale the average gradient by a learning rate and apply it to the weights. Define the learning rate at the top of our main function.
Add a method to get the weights from our layer:
Then simply subtract the gradient multiplied by the learning rate from the current weights to get the new set of weights.
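A sketch of the update step, assuming the weight accessor returns a reference so the weights can be changed in place. The names GetMutableWeights, ApplyGradient and kLearningRate, and the 0.01 value, are our own choices; the LinearLayer here keeps only what the update needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class LinearLayer {
public:
    explicit LinearLayer(const std::vector<float>& weights) : m_weights(weights) {
        assert(m_weights.size() == 2);
    }
    // Returns a reference so the training loop can update the weights in place.
    std::vector<float>& GetMutableWeights() { return m_weights; }

private:
    std::vector<float> m_weights;
};

// Gradient-descent step: w_i <- w_i - learningRate * dE/dw_i.
void ApplyGradient(LinearLayer& layer, const std::vector<float>& avgGrad,
                   float learningRate) {
    std::vector<float>& weights = layer.GetMutableWeights();
    assert(weights.size() == avgGrad.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        weights[i] -= learningRate * avgGrad[i];
    }
}

// In the post this lives at the top of main; 0.01 is only an example value.
const float kLearningRate = 0.01f;
```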
Not too bad! In fewer than 200 lines of C++ code, we have a system that can learn to fit a line. In fact, this code is already fairly modular, and without adding much more we will be able to extend it to handle neural networks with more neurons, as well as deeper neural networks.
If you build and run the code, you will see the average error dropping with each epoch through the data.
Feel free to play with the learning rate at the top and see how fast the error drops. If you make the learning rate too large, each update overshoots the minimum and the error explodes off toward infinity.
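To see this effect in isolation, here is a compact, class-free sketch of the same full-batch update on the assumed dataset {(2, 3), (4, 5)}, starting from weights {-0.5, 2.5}; FinalAverageError is a hypothetical helper. For this particular dataset the average error is a quadratic whose curvature is roughly 21.8 along its steepest direction, so gradient descent is stable only for learning rates below about 2/21.8, or roughly 0.09: a rate of 0.05 converges, while 0.1 overshoots further on every step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trains y = w0 * x + w1 * 1 with full-batch gradient descent and returns the
// average squared error at the final weights.
float FinalAverageError(float learningRate, int numEpochs) {
    assert(learningRate > 0.0f && numEpochs >= 0);
    const std::vector<float> xs = {2.0f, 4.0f};  // assumed inputs
    const std::vector<float> ys = {3.0f, 5.0f};  // assumed targets
    float w0 = -0.5f, w1 = 2.5f;                 // assumed initial weights
    const float n = static_cast<float>(xs.size());

    for (int epoch = 0; epoch < numEpochs; ++epoch) {
        float grad0 = 0.0f, grad1 = 0.0f;
        for (std::size_t i = 0; i < xs.size(); ++i) {
            float errorGrad = 2.0f * (w0 * xs[i] + w1 - ys[i]);  // dE/dy
            grad0 += errorGrad * xs[i];                          // dE/dw0
            grad1 += errorGrad;                                  // dE/dw1 (b = 1)
        }
        w0 -= learningRate * grad0 / n;
        w1 -= learningRate * grad1 / n;
    }

    float total = 0.0f;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        float diff = w0 * xs[i] + w1 - ys[i];
        total += diff * diff;
    }
    return total / n;
}
```

After 100 epochs, a learning rate of 0.05 brings the average error well below 1, while 0.1 leaves it far above the starting value of 11.25, and it keeps growing with every additional epoch.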
This code should be a good starting point for a neural network library. I have put it all into one main.cpp file, but as the codebase grows we will organize it a little better and extend it to work with neural networks of arbitrary size and depth. If you want access to the whole implementation, it is on GitHub here: https://github.com/curiousinspiration/Linear-Regression-With-Single-Neuron.git. Hopefully you are excited to see what larger neural networks can do, because we are going to dive into it in the next post!