# Six most influential ideas in the history of AI

AI, or artificial intelligence, is one of the more exciting fields of modern research. Neuron shares his pick of the most influential ideas that shaped the industry.

When working with today's unbelievably complex technology, it is important to remember that inventors always stand on the shoulders of giants, the great minds that lived and worked before them. Today, I'll show you my pick of the six most influential techniques that have shaped the field of applied **statistical analysis and artificial intelligence**.

## 1. Method of Least Squares, 1805

In the late 18th century, Adrien-Marie Legendre, a great mathematician of his time, was obsessed with predicting the future locations of comets. People would attribute all sorts of good and bad omens to seeing a comet and simply called it an act of God. Legendre, however, was sure that it was possible to predict when the next comet would be seen in the sky.

He couldn't deduce the laws of comets' movements without collecting real-world observations. Once he started aggregating these data, he came up with a new method:

- start with guessing the future location of a comet
- observe its actual location
- remake the guess to reduce the sum of the squared errors.

This was the seed of linear regression. Today, linear regression is a simple yet invaluable tool that is taught in schools and used in a variety of fields, such as marketing, chemistry, biology and finance. It is still widely used because of its simplicity and interpretability, and it remains **one of the most common ways to identify a trendline**.
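To make the idea concrete, here is a minimal sketch of a least-squares trendline in plain Python. It uses the standard closed-form solution for a straight line rather than Legendre's own procedure, and the function name `fit_trendline` and the observations are made up for illustration.

```python
def fit_trendline(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical observations that roughly follow y = 2x + 1
x_obs = [0, 1, 2, 3, 4]
y_obs = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_trendline(x_obs, y_obs)
print(f"trendline: y = {slope:.2f} * x + {intercept:.2f}")
```

Any other line through these points would give a larger sum of squared errors, which is exactly the quantity the method sets out to reduce.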

## 2. Gradient Descent, 1909

Legendre’s method of manually trying to reduce the error was time-consuming. Peter Debye, a Nobel prize winner from the Netherlands, formalized a solution for this process a century later.

Let’s imagine that Legendre had one parameter to worry about - we'll call it **X**. The **Y** axis represents the error value for each value of **X**. Legendre was searching for the value of **X** that results in the lowest error. In this graphical representation, we can see that the error **Y** is lowest at **X = 1.1**.

*Positive and negative gradients of a function*

Peter Debye noticed that the slope to the left of the minimum is negative, while it’s positive on the other side. Thus, if you know the value of the slope at any given **X** value, you know in which direction to move **X** to bring **Y** towards its minimum.
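Here is a small sketch of that idea in Python. The error curve is an assumption: a simple parabola chosen so that its minimum sits at X = 1.1, matching the example above. The names `error`, `slope_at` and `gradient_descent`, the learning rate and the step count are all illustrative choices.

```python
def error(x):
    """Hypothetical error curve with its minimum at x = 1.1."""
    return (x - 1.1) ** 2 + 0.5


def slope_at(x):
    """Derivative of the error curve: negative left of 1.1, positive right of it."""
    return 2 * (x - 1.1)


def gradient_descent(x_start, learning_rate=0.1, steps=100):
    x = x_start
    for _ in range(steps):
        # Move against the slope: right when it is negative, left when positive
        x -= learning_rate * slope_at(x)
    return x


x_from_left = gradient_descent(x_start=-3.0)
x_from_right = gradient_descent(x_start=5.0)
print(round(x_from_left, 4), round(x_from_right, 4))
```

Whichever side you start from, the sign of the slope pushes X towards the same low point of the curve.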

Today, this technique is used to train virtually all **neural networks**, and a related idea underlies gradient-boosted **decision-tree based models**.

## 3. Linear Regression, 1922

*Linear Regression on one year of Bitcoin price in USD*

By combining the method of least squares and gradient descent you get linear regression. In the 1950s and 1960s, a group of experimental economists implemented versions of these ideas on early computers. The logic was implemented on physical punch cards - truly handmade software programs. It took several days to prepare these punch cards and up to 24 hours to run one regression analysis through the computer.
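As a rough illustration of that combination, the sketch below fits a line by repeatedly nudging the slope and intercept against the gradient of the mean squared error. The data, the function name `fit_by_gradient_descent`, the learning rate and the number of steps are assumptions picked for this toy example, not anything taken from the historical implementations.

```python
def fit_by_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = slope * x + intercept by gradient descent on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Signed prediction errors for the current guess
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step both parameters downhill
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical observations that roughly follow y = 2x + 1
x_obs = [0, 1, 2, 3, 4]
y_obs = [1.1, 2.9, 5.2, 7.1, 8.8]

gd_slope, gd_intercept = fit_by_gradient_descent(x_obs, y_obs)
print(f"trendline: y = {gd_slope:.2f} * x + {gd_intercept:.2f}")
```

For a plain straight-line fit a direct formula gives the same answer in one step. The iterative version is shown because the same loop carries over to models that have no such formula.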

If you've always wondered what happens under the hood when you plot a trendline in Excel, R or Python, you can play with this simulator.

## 4. The Perceptron, 1958

Enter Frank Rosenblatt - the guy who dissected rat brains during the day and searched for signs of extraterrestrial life at night. In 1958, he hit the front page of The New York Times with a machine that mimics a neuron, under the headline "New Navy Device Learns By Doing".

If you showed Rosenblatt's machine 50 sets of two images, one with a mark on the left and the other with a mark on the right, it could make the distinction without being pre-programmed. The public got carried away with the possibilities of a true learning machine. And for good reason: based on Rosenblatt's statements, The New York Times reported the perceptron to be "**the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.**"

About a decade after the initial hype, Marvin Minsky and Seymour Papert destroyed the idea. At the time, Minsky and Papert ran the AI lab at MIT. They wrote a book, *Perceptrons* (1969), proving that the perceptron could only solve linear problems. They also debunked claims about the multi-layer perceptron. Sadly, Frank Rosenblatt died in a boat accident two years later.

Just a year after the Minsky and Papert book was released, a Finnish master's student discovered the theory to solve non-linear problems with multi-layered perceptrons. Nevertheless, because of the mainstream criticism of the perceptron, funding for AI dried up for more than a decade. This became known as the first AI winter.

The core of Minsky and Papert’s critique was the XOR problem. XOR follows the same logic as OR with one exception - when both inputs are true (1 & 1), it returns false (0).

In the OR logic, it’s possible to separate the true combinations from the false one with a single straight line. But as you can see, you can’t separate the XOR cases with one linear function.
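The difference is easy to reproduce with a few lines of Python. The sketch below trains a single perceptron with the classic error-correction rule, once on OR and once on XOR. It is a simplified stand-in, not Rosenblatt's actual machine, and the names `train_perceptron`, `accuracy`, `OR_DATA` and `XOR_DATA` are made up for this example.

```python
def predict(weights, x1, x2):
    w1, w2, bias = weights
    # Fire (1) only when the weighted sum crosses the threshold
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    w1, w2, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = predict((w1, w2, bias), x1, x2)
            # Error-correction rule: adjust weights only on a wrong answer
            delta = learning_rate * (target - output)
            w1 += delta * x1
            w2 += delta * x2
            bias += delta
    return w1, w2, bias


def accuracy(samples, weights):
    correct = sum(predict(weights, x1, x2) == target for (x1, x2), target in samples)
    return correct / len(samples)


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

or_accuracy = accuracy(OR_DATA, train_perceptron(OR_DATA))
xor_accuracy = accuracy(XOR_DATA, train_perceptron(XOR_DATA))
print("OR accuracy:", or_accuracy)
print("XOR accuracy:", xor_accuracy)
```

The perceptron gets all four OR cases right, while on XOR it always misses at least one case no matter how long it trains, because no single straight line separates the two classes.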

So by this point, it was already known that perceptrons could only solve linear problems, while multi-layered perceptrons could solve complex non-linear problems but were too expensive to train, required prohibitive amounts of data, and required complex feature-engineering steps (in the context of stock prices the word *features* can be replaced by *signals*, *indicators* or *patterns*).

## 5. Artificial Neural Networks, 1986

By 1986, several experiments had shown that **neural networks** could solve complex nonlinear problems. By then, computers were 10,000 times faster than when the theory was developed. This is how Rumelhart et al. introduced their legendary paper:

> We describe a new learning procedure, back-propagation, for networks of neuron-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure.
>
> Nature 323, 533 - 536 (09 October 1986)

Backpropagation was the new, revolutionary method of automatically discovering predictive features in data. You no longer had to examine the data and find predictive patterns yourself, such as the best indicator of a fast-growing stock - the computer could do it for you if you could provide enough data.
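To give a feel for the mechanics, here is a minimal sketch of backpropagation on the XOR problem in plain Python. It is not the setup from the 1986 paper: the network size (three hidden units), the sigmoid activations, the squared-error loss, the learning rate and the random seed are all assumptions picked for this toy example, and `train_xor` is a made-up name.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden_units=3, epochs=5000, learning_rate=0.5, seed=0):
    """Train a tiny one-hidden-layer network on XOR; return the loss per epoch."""
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    # Each hidden unit holds [weight for x1, weight for x2, bias]
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden_units)]
    # One output weight per hidden unit, plus a bias as the last entry
    w_out = [rng.uniform(-1, 1) for _ in range(hidden_units + 1)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for (x1, x2), target in data:
            # Forward pass: inputs -> hidden units -> output
            hidden = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_hidden]
            output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1])
            total += (output - target) ** 2
            # Backward pass: push the output error back through the network
            # (the constant factor 2 of the squared error is folded into the learning rate)
            delta_out = (output - target) * output * (1 - output)
            delta_hidden = [
                delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                for j in range(hidden_units)
            ]
            # Gradient descent step on every weight
            for j in range(hidden_units):
                w_out[j] -= learning_rate * delta_out * hidden[j]
                w_hidden[j][0] -= learning_rate * delta_hidden[j] * x1
                w_hidden[j][1] -= learning_rate * delta_hidden[j] * x2
                w_hidden[j][2] -= learning_rate * delta_hidden[j]
            w_out[-1] -= learning_rate * delta_out
        losses.append(total / len(data))
    return losses


losses = train_xor()
print(f"loss after first epoch: {losses[0]:.4f}, after last epoch: {losses[-1]:.4f}")
```

The hidden units are never told what to represent. They drift towards whatever intermediate features reduce the output error, which is the "useful new features" point from the quoted abstract. On a network this small, training can occasionally stall in a poor solution, so the final loss depends on the random starting weights.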

Even though Geoffrey Hinton, one of the authors of that paper, has said that backpropagation is not a panacea and that we should work on other computer intelligence techniques, it proved to be extremely effective. The problem with this approach was the time it took to train such a model - at the time, you would need to spend an ungodly amount of electricity on supercomputers to train one.

## 6. Deep Neural Networks, 2012

*AlexNet, the 2012 paper that ended the AI winter*

In 2012, Geoffrey Hinton's student, Alex Krizhevsky, was finishing up his PhD in artificial intelligence when he happened to stumble upon an NVIDIA GPU. What if we rewrite the code for training neural networks for the GPU? Then we would be able to compute way faster and finish training the neural network in time for the competition! It took around **5-6 days** and two GTX 580 3GB GPUs to train this beast of a neural network.

Their “large, deep convolutional neural network” ended up winning the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge). For those who aren’t familiar, this competition can be thought of as the annual Olympics of computer vision, where teams from across the world compete to see who has the best computer vision model for tasks such as classification, localization, detection, and more. In 2012, a **deep neural network** won the challenge for the first time, with a top 5 test error rate of 15.4% (top 5 error is the rate at which, given an image, the correct label is not among the model's top 5 predictions). The next best entry achieved an error of 26.2%, so this was an astounding improvement that pretty much shocked the computer vision community.
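The top 5 error metric is simple to compute yourself. The sketch below is a generic illustration, not the official ILSVRC evaluation code. The function name `top_k_error` and the tiny made-up score tables are assumptions, and it uses k = 2 because the toy example has only three labels.

```python
def top_k_error(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is not among the k highest-scoring labels."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Labels sorted from highest to lowest score, truncated to the top k
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(true_labels)


# Hypothetical model outputs for two images (label -> score)
scores = [
    {"cat": 0.6, "dog": 0.3, "fox": 0.1},
    {"cat": 0.5, "dog": 0.1, "fox": 0.4},
]
labels = ["dog", "dog"]

top2 = top_k_error(scores, labels, k=2)
top1 = top_k_error(scores, labels, k=1)
print("top-2 error:", top2)
print("top-1 error:", top1)
```

Allowing several guesses per image makes the metric more forgiving than plain accuracy, which matters when an image contains several objects or the labels are easy to confuse.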

And today, we're only starting to scratch the surface of **what is possible with AI**. There are countless industries where artificial intelligence will be able to assist humans in sifting through data and empower us to make better decisions.