Why Machine Learning And Neural Networks Matter
A high-level overview of why it is hard for computers to do the simplest things.
If you are interested in Artificial Intelligence, Machine Learning, or Deep Learning, you have probably heard the term "Neural Network" or "Artificial Neural Network". Why the sudden increase in attention? What is so great about neural networks? Not only does the name inspire curiosity about how we could model the human brain, but in the past few years (2012-2018) there have been many breakthroughs in artificial intelligence due to advances in neural networks. Access to large datasets, faster computing, and Deep Learning algorithms (such as neural networks) have together made it possible to achieve human-like performance in fields such as speech recognition, natural language processing, and image classification.
Many of the tasks behind these Deep Learning breakthroughs might seem like they shouldn't be that hard for a computer. In fact, many of us could do the things in the diagram above when we were young children. When I was five, I knew whether that picture over there was of an airplane, a car, or a bird. I could understand when my parents called me for dinner. I could read simple sentences. Why has it taken so long for computers to be able to do such simple tasks?
One of the reasons is that computers struggle with ambiguity. Computer programs are good at breaking things down into "true" or "false", 1s and 0s. A classic example in speech recognition is the phrase "wreck a nice beach". It is not a common phrase, but if you sound it out phonetically it has almost exactly the same sounds as "recognize speech". To a computer the two are hard to distinguish, because they sound so similar. As humans, we spend our lives having conversations with one another, so we understand context and know that "recognize speech" is more likely in some cases, and "wreck a nice beach" is more likely in others.
Natural language processing (fancy jargon for understanding what sentences mean) is another example of this. Many words are ambiguous and do not fit into neat buckets of 1s and 0s or conditional logic. Take, for example, the word "fire". On the surface it is a pretty common word, but if you sit down and think about it, it can be used in many different contexts to mean different things:
"They went to the shooting range to fire a gun."
"They started a fire in the woods."
"The announcer played the fight song to fire up the crowd."
If you wanted to find all the articles about "forest fires" on the internet, it should be obvious now that simply searching for the word "fire" is not enough. We need to teach computers context so they can tell which meaning of the word we are looking for.
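To make the keyword problem concrete, here is a minimal Python sketch of a naive search over the three example sentences above. The `keyword_search` function and the tiny `sentences` list are made up for illustration; a real search engine is far more sophisticated, but the basic weakness is the same.

```python
# Hypothetical mini "internet": the three example sentences from above.
sentences = [
    "They went to the shooting range to fire a gun.",
    "They started a fire in the woods.",
    "The announcer played the fight song to fire up the crowd.",
]


def keyword_search(docs, keyword):
    """Return every document that contains the keyword as a whole word."""
    matches = []
    for doc in docs:
        # Lowercase each word and strip surrounding punctuation before comparing.
        words = [w.strip('.,!?"').lower() for w in doc.split()]
        if keyword.lower() in words:
            matches.append(doc)
    return matches


matches = keyword_search(sentences, "fire")
for sentence in matches:
    print(sentence)
```

All three sentences come back, even though only one of them describes a fire in the woods. The search has no notion of which meaning of "fire" each sentence uses.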
It may not be obvious how to write a computer program to solve the problems above. It is hard to lay out a set of rules for a computer to make these decisions. Think about teaching a computer to distinguish images of cats and dogs. Ok let's see... Dogs have fur, paws, ears, walk on all fours, they bark, they walk on leashes ... The list goes on and on. Not only are a lot of these qualities shared with cats, but a computer only sees a grid of pixels when it displays an image. How do you turn these pixels into a description for the computer? It would take a lot of work just to lay out rules for "there is fur in this picture", which is only one of the characteristics I listed above.
Photo creds: http://cs231n.github.io/classification/
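Here is a rough Python sketch of what "a grid of pixels" means and why hand-written rules on top of it are brittle. The 4x4 images and the `looks_bright_in_center` rule are invented for this example; real photos have thousands of pixels per side and usually three color channels.

```python
# A made-up 4x4 grayscale image: each number is one pixel's brightness,
# from 0 (black) to 255 (white). This is all the computer "sees".
image = [
    [12,  40,  38, 10],
    [35, 200, 210, 42],
    [30, 190, 220, 37],
    [11,  33,  36,  9],
]

# The same bright blob, moved to the top-left corner.
shifted = [
    [200, 210, 38, 10],
    [190, 220, 37, 42],
    [30,   35, 36, 37],
    [11,   33, 36,  9],
]


def looks_bright_in_center(pixels, threshold=128):
    """A hand-written rule: is the average of the 2x2 center above a threshold?"""
    center = [pixels[r][c] for r in (1, 2) for c in (1, 2)]
    return sum(center) / len(center) > threshold


print(looks_bright_in_center(image))    # True
print(looks_bright_in_center(shifted))  # False
```

A person would say both images show the same thing in a different spot, yet the rule only fires on the first one. Writing rules that survive every shift, rotation, and lighting change for something as complex as "fur" quickly gets out of hand.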
For many years researchers focused on breaking the world down into rules that a computer can understand, but this can turn into a never-ending list of rules and of conditions under which each rule applies. This is where machine learning comes in. Instead of manually writing down the rules, let's let the machine discover these patterns on its own. After all, maybe it will pick up on things we never thought to write down.
An artificial neural network is simply one tool in the machine learning toolbox that can pick up on these patterns. There are others, such as logistic regression, support vector machines, decision trees, k-nearest neighbors classification, random forests, etc. All of these algorithms learn from the data they are given, rather than from rules people write. While many of these algorithms have applications in the real world, we will focus mainly on neural networks in this series of posts.
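To give a feel for "learning from data rather than rules", here is a small sketch of k-nearest neighbors classification, one of the algorithms listed above, in plain Python. The training data (weight and height measurements for a few cats and dogs) and the `knn_predict` function are assumptions made up for this example, and real problems rarely come with such tidy features.

```python
import math
from collections import Counter

# Hypothetical labeled examples: (weight in kg, height in cm) -> label.
training_data = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((3.5, 24.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
    ((25.0, 58.0), "dog"),
]


def knn_predict(examples, point, k=3):
    """Label a new point by majority vote among its k closest examples."""
    # Sort the labeled examples by straight-line distance to the new point.
    by_distance = sorted(examples, key=lambda ex: math.dist(ex[0], point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict(training_data, (4.5, 24.0)))   # cat
print(knn_predict(training_data, (27.0, 57.0)))  # dog
```

Notice that nowhere in the code is there a rule like "dogs are heavier than 10 kg". The decision comes entirely from the labeled examples, so giving the algorithm different data changes its answers without changing a line of code.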
In the next post we will dive deeper into what a neural network is, and why it is useful.