A Brief History of Deep Learning
By Kai-Fu Lee
In previous posts, I discussed developments in machine learning and AI that were demonstrated by a computer beating a world-class player in the complicated, ancient game of Go. That discussion led me to explore the world of deep learning and its consequences for the future of AI. In this and the next few posts, I will delve into deep learning to see where it came from and where I believe it's going.
Machine learning—the umbrella term for the field that includes deep learning—is a history-altering technology, but one that is lucky to have survived a tumultuous half-century of research.
Ever since its inception, artificial intelligence has undergone a number of boom-and-bust cycles. Periods of great promise have been followed by “AI winters,” when a disappointing lack of practical results led to major cuts in funding. Understanding what makes the arrival of deep learning different requires a quick recap of how we got here.
Back in the mid-1950s, the pioneers of artificial intelligence set themselves an impossibly lofty but well-defined mission: to recreate human intelligence in a machine. That striking combination of the clarity of the goal and the complexity of the task would draw in some of the greatest minds in the emerging field of computer science: Marvin Minsky, John McCarthy, and Herbert Simon.
As a wide-eyed computer science undergrad at Columbia University in the early 1980s, all of this seized my imagination.
I was born in Taiwan in the early 1960s but moved to Tennessee at the age of eleven and finished middle and high school there. After four years at Columbia in New York, I knew that I wanted to dig deeper into AI. When applying for computer science Ph.D. programs in 1983, I even wrote this somewhat grandiose description of the field in my statement of purpose: “Artificial intelligence is the elucidation of the human learning process, the quantification of the human thinking process, the explication of human behavior, and the understanding of what makes intelligence possible. It is men’s final step to understand themselves, and I hope to take part in this new, but promising science.”
That essay helped me get into the top-ranked computer science department of Carnegie Mellon University, a hotbed for cutting-edge AI research. It also displayed my naïveté about the field, both overestimating our power to understand ourselves and underestimating the power of AI to produce superhuman intelligence in narrow spheres.
By the time I began my Ph.D., the field of artificial intelligence had forked into two camps: the “rule-based” approach and the “neural networks” approach. Researchers in the rule-based camp (also sometimes called “symbolic systems” or “expert systems”) attempted to teach computers to think by encoding a series of logical rules: If X, then Y. This approach worked well for simple and well-defined games (“toy problems”) but fell apart when the universe of possible choices or moves expanded. To make the software more applicable to real-world problems, the rule-based camp tried interviewing experts in the problems being tackled and then coding their wisdom into the program’s decision-making. (Hence the “expert systems” moniker.)
The “neural networks” camp, however, took a different approach. Instead of trying to teach the computer the rules that had been mastered by a human brain, these practitioners tried to reconstruct the human brain itself. Given that the tangled webs of neurons in animal brains were the only thing capable of intelligence as we knew it, these researchers figured they’d go straight to the source. This approach mimics the brain’s underlying architecture, constructing layers of artificial neurons that can receive and transmit information in a structure akin to our networks of biological neurons. Unlike the rule-based approach, builders of neural networks generally do not give the networks rules to follow in making decisions. They simply feed lots and lots of examples of a given phenomenon—pictures, chess games, sounds—into the neural networks and let the networks themselves identify patterns within the data. In other words, the less human interference, the better.
Differences between the two approaches can be seen in how they might tackle a simple problem: identifying whether there is a cat in a picture. The rule-based approach would attempt to lay down “if-then” rules to help the program make a decision: “If there are two triangular shapes on top of a circular shape, then there is probably a cat in the picture.” The neural network approach would instead feed the program millions of sample photos labeled “cat” or “no cat,” letting the program figure out for itself which features in those millions of images were most closely correlated with the “cat” label.
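The contrast can be sketched in a few lines of code. This is a toy illustration, not how real vision systems work: the feature names, the made-up training vectors, and the use of a single perceptron (the simplest artificial neuron, rather than a full neural network) are all my own assumptions for the sake of showing the two philosophies side by side. The rule-based function encodes a human-written rule; the perceptron is handed only labeled examples and adjusts its own weights.

```python
# Rule-based approach: a human expert hand-codes an "if-then" test on
# hypothetical, pre-extracted image features (the "triangular ears on a
# round face" heuristic from the text). The feature names are invented.
def rule_based_is_cat(features):
    return features["triangle_shapes"] >= 2 and features["round_shape"]

# Neural-network approach: a single artificial neuron (perceptron) that
# learns its own weights from labeled examples instead of being told rules.
def train_perceptron(examples, epochs=20, lr=0.1):
    n = len(examples[0][0])
    w = [0.0] * n   # one weight per input feature, learned from data
    b = 0.0         # bias term, also learned
    for _ in range(epochs):
        for x, label in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = label - pred  # push weights toward the correct answer
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy labeled data: each vector is a crude stand-in for image features,
# label 1 = "cat", 0 = "no cat" (entirely fabricated for illustration).
examples = [
    ([1, 1, 0], 1),
    ([1, 0, 1], 1),
    ([0, 1, 1], 0),
    ([0, 0, 1], 0),
]
w, b = train_perceptron(examples)
```

Notice that nowhere in the perceptron code does a human specify what a cat looks like; the “knowledge” ends up encoded in the learned weights `w` and bias `b`, which is exactly the less-human-interference philosophy of the neural networks camp.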
I continue this look at the history of the different approaches to machine learning in my next post. In the meantime, I welcome your comments. I mentioned above how naïve I was in overestimating our power to understand ourselves and underestimating the power of AI to produce superhuman intelligence in narrow spheres. How about you? Have you had similar thoughts about AI, where it's going and how it compares to our own intelligence? Thank you for sharing.