What Is A Neural Network?
A high level overview of the inspiration behind neural networks and how they work.
Today, state-of-the-art artificial intelligence algorithms commonly incorporate neural networks to help them make decisions, but what is a neural network? How do neural networks process information? In this post we will dive into the inspiration behind artificial neural networks and give a high level overview of how they work.
When studying neural networks, it is important to start with the basics. We are going to break down the individual components needed for a neural network, so that hopefully there is nothing under the hood you do not understand. Although some of the examples at the beginning may seem trivial, and you may question whether they are "intelligent," don't worry: they lay the groundwork for more complex systems and tasks.
Let’s start with a simple model of a neuron. Then, we'll work our way up to more complicated neural networks. We know from biology that a single neuron in our brain looks roughly like the following.
It has dendrites, which receive signals; a cell body, which processes them; and an axon, which carries them to the axon terminals, where synapses pass them on to the next neuron.
The brain has roughly 86 billion of these little neurons, all connected and firing in parallel, processing inputs and making decisions about things we see, hear, taste, and smell.
Usually a neuron is in a resting state, passing no signal to its neighboring neurons. When we receive inputs from our environment, such as photons bouncing off the world and into our eyes, audio waves traveling into our ears, or that tasty ice cream hitting our tongue, these inputs spark small amounts of energy that activate our neurons.
Neurons are connected to one another in a many-to-many fashion, meaning the input of one neuron comes from many neurons, and the output goes to many other neurons. If the total energy from all the synapses that feed into a neuron passes a certain threshold, then this neuron also activates. This causes a chain reaction with all the neurons it is connected to, and so on.
Neurons are either in an active state or an inactive state depending on the input energy and the time between activations.
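The all-or-nothing threshold behavior described above can be sketched in a few lines of code. The signal values and the threshold below are made up for illustration; real neurons are far more complicated.

```python
# A minimal sketch of threshold activation: a neuron fires (is "active")
# only when the summed energy from its incoming synapses passes a threshold.
# The signal values and threshold are illustrative, not biological constants.

def fires(input_signals, threshold=1.0):
    """Return True if the total incoming signal passes the threshold."""
    return sum(input_signals) >= threshold

# Signals arriving from three upstream neurons:
print(fires([0.2, 0.3, 0.1]))  # weak total signal -> False
print(fires([0.5, 0.4, 0.3]))  # strong total signal -> True
```

Notice the output is binary: the neuron either fires or it does not, no matter how far past the threshold the sum lands.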
This process encodes our thoughts and provokes our actions. It is still somewhat of a mystery how and why this works, but we have managed to loosely model artificial neural networks on this framework, and they seem to work fairly well.
Artificial neural networks try to model neurons as described above. Below is a diagram of a small artificial neural network.
Although this is a very small neural network, it can actually make some sophisticated computations, and is a good jumping off point for explaining how artificial neural networks work.
Looking at the diagram, there are some inputs on the left (x_0, x_1, and b_0), which are connected to neurons in the middle (h_0, h_1, h_2, h_3) via a set of weights (w_0-w_12), which are connected to outputs on the right (o_0, o_1) via another set of weights (w_13-w_22).
The inputs on the left could be anything from pixels in an image, to values from an audio waveform, to characters in a sentence. The outputs on the right could be labels for the image, the word that was spoken in the audio waveform, or the sentiment of the sentence.
The weights in-between the neurons are supposed to represent the amount of signal sent from one neuron to another. Each neuron receives the sum of all its inputs multiplied by the weights that connect to it. This weighted sum is passed through an activation function to see if the neuron should "fire" or not. In mathematical terms, to compute whether the hidden neuron h_0 is going to fire, we use the following equation:

h_0 = σ(x_0·w_0 + x_1·w_1 + b_0·w_2)

Where σ (sigma) represents the sigmoid activation function:

σ(x) = 1 / (1 + e^(-x))
The summation inside the sigmoid function to calculate h_0 is pretty straightforward. We just multiply the weights by their inputs and then add up the results. If this sum is a larger number, more signal passes through, whereas if the sum is smaller, less signal passes through.
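The calculation for h_0 can be written out directly. The input and weight values below are made up for illustration, and the wiring assumes w_0, w_1, and w_2 are the three weights connecting x_0, x_1, and b_0 to h_0:

```python
import math

def sigmoid(z):
    """The sigmoid activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values; in a trained network the weights would be learned.
x_0, x_1, b_0 = 0.5, -1.0, 1.0
w_0, w_1, w_2 = 0.8, 0.2, -0.4

# Weighted sum of the inputs feeding into h_0:
weighted_sum = x_0 * w_0 + x_1 * w_1 + b_0 * w_2   # 0.4 - 0.2 - 0.4 = -0.2

# Pass the sum through the activation function:
h_0 = sigmoid(weighted_sum)
print(h_0)  # ~0.45
```

Every hidden neuron in the diagram is computed the same way, just with its own set of weights.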
The sigmoid function is a little more complicated, and is the gatekeeper of how much signal passes through. Although it looks complicated, we can graph it to see how it works.
One nice trick when working with functions you are unsure about is to let Google graph it for you.
The sigmoid function is pretty cool. It asymptotes at y=0.0 and y=1.0, so its output can never leave this range, and it is defined for every value of x. Notice it reaches y=0.99 around x=4.6 and y=0.01 around x=-4.6.
As the weighted sum gets large, the output of this function will be close to 1.0, and, if it is small, the output will be close to 0.0. This isn't binary like our brain, whose neurons are either active or inactive, but it is pretty close. It is also a convenient function mathematically when we get to how neurons "learn" later.
We equip each hidden unit with this nonlinear activation, that squashes the input between 0 and 1, and sends it along to the next neurons in the network. If we have learned the proper weights between the neurons, then the network can make some intelligent predictions, similar to how the biological chain of neurons in our head does.
The output of this artificial neural network is just a set of numbers, which can be interpreted multiple ways. Two common interpretations are called regression and classification. These two interpretations get you a long way in making predictions about the world.
Regression is the method of predicting a scalar value from a set of input variables. For example, if you had many facts about a home as input, you could predict its price (a scalar value) as the output.
Classification is taking the input data and classifying it into buckets. The output would indicate the bucket that the input belongs in. An example of classification could be deciding whether an image contains a cat or a dog. The neural network could output 0 if it thinks the image is a dog and 1 if it thinks it is a cat.
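Because the network's output neuron produces a value between 0 and 1 (thanks to the sigmoid), interpreting it as a cat/dog label just means thresholding it. This is a sketch of that interpretation step, following the 0 = dog, 1 = cat convention from the text; the output values are made up:

```python
def classify(network_output, threshold=0.5):
    """Interpret a single sigmoid output as a label.

    Follows the convention above: outputs near 1 mean "cat",
    outputs near 0 mean "dog".
    """
    return "cat" if network_output > threshold else "dog"

print(classify(0.92))  # confident cat
print(classify(0.13))  # confident dog
```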
In order to classify images, and most kinds of data for that matter, we would need a lot more inputs than the two in our diagram above (plus the bias term b_0, which we will get to later). A neural network suited for image classification might have 784 inputs, one for each pixel in a 28x28 image, two hidden layers, each with hundreds of neurons, and 10 outputs, one for each image class.
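The layer sizes above are enough to sketch the forward pass of such a network. The hidden layer sizes (300 and 100) and the random weights below are arbitrary choices for illustration; a real network would have learned its weights from data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 784 pixel inputs, two hidden layers
# (300 and 100 neurons are arbitrary picks), and 10 output classes.
sizes = [784, 300, 100, 10]

# Random, untrained weights and biases, one set per layer transition.
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Propagate an input through every layer: weighted sum, then sigmoid."""
    for W, b in zip(weights, biases):
        x = sigmoid(x @ W + b)
    return x

fake_image = rng.random(784)   # stand-in for 28x28 pixel intensities
output = forward(fake_image)
print(output.shape)  # (10,) -- one sigmoid value per class
```

With random weights the outputs are meaningless, but the plumbing is identical to a trained network: learning only changes the numbers inside `weights` and `biases`.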
A great video explaining neural networks, using a network that classifies handwritten digits, is on the YouTube channel 3Blue1Brown and can be found here.
Before we can get to more complicated tasks like regression over many input variables, or image classification, we need to dive deeper into how each individual neuron works and how neurons can learn simple functions on their own. Follow me to the next post, which will go over how a single neuron can learn when given information about the world.