It has become increasingly common to see introductions to deep learning in which the subject is presented as a subtopic of Machine Learning, which is in turn described as a particular field of Artificial Intelligence.
While this presentation undoubtedly has some didactic value, in particular for explaining which components of a software algorithm the machine is supposed to learn, from a historical perspective it is quite confusing and does little to clarify the mutual relations between the different fields and their evolution.
First of all, it is important to understand that Artificial Intelligence has not always been as popular as it is at present but, like most scientific topics, has traversed periods of mixed fortunes, alternating waves of enthusiasm and disaffection. In particular, there were a couple of exceptionally gloomy periods, known as AI winters, during the years 1974–1980 and 1987–1993, the first following the famous Lighthill report, and the second a consequence of the failure of the “fifth-generation computers” project.
As is natural, during favourable periods many research fields tend to rally under the highly visible banner of AI, while in gloomier days they tend to mark out and confine their territory, drawing distinctions and entrenching themselves in specific, highly specialized topics and clearly identifiable methodologies.
The situation at the beginning of the century
Let’s have a look at the state of the discipline at the beginning of the century.
AI is still slowly recovering from its second winter, also suffering from the competition of the emerging (and separate) field of Machine Learning (more on this later). Traditional AI is dominated by its historical topics: knowledge representation, expert systems and (constraint) logic programming. Neural Networks are indeed a part of AI, but play an extremely marginal role. We are still in an epoch of shallow networks, with most of the attention focused on recurrent neural networks and the then-recent LSTM models; networks are slow, difficult to train, and not particularly effective. The general perception is that Neural Networks are a dying topic, with no prospects. In addition, due to their biological inspiration, they are regarded with particular suspicion by the majority of AI researchers, in view of the ongoing and somewhat surreal discussion about “strong” versus “weak” AI.
The lively, emerging field is Machine Learning. Here the trendy topic is Support Vector Machines (SVMs), followed, at a great distance, by Bayesian models. Machine Learning is not interested in presenting itself as a subfield of AI; on the contrary, it tries to emphasize its distinctive methodologies and its more solid, scientific background. It is instructive to note that in the book “Pattern Recognition and Machine Learning” by Bishop, one of the pillars of the discipline, there is not a single mention of Artificial Intelligence (except for the names of a few conferences in the references). In this case too, Neural Networks are a subfield of ML (Bishop’s book devotes a full chapter to these models): they share with ML terminology, methodologies and relevant techniques. However, again, their role is absolutely marginal, and shallow networks are systematically outperformed by other techniques.
The two communities of AI and ML ignored (read: cordially detested) each other; moreover, they both barely tolerated people working on Neural Networks, partly for the reasons already mentioned, and partly for their guiltless habit of “keeping a foot in both worlds”.
A myriad of specialized topics
In addition to the above areas of research, further specialization in particular domains of application – vision, natural language processing, optical character recognition, speech comprehension, data mining, decision theory, robotics, … – contributed to fragmenting AI into a myriad of sub-fields that had essentially nothing to say to each other, and no interest in exchanging knowledge.
The case of Natural Language Processing (NLP) is paradigmatic. During the sixties, NLP was one of the main areas of application of AI, aggressively supported by the government, which was mostly interested in the potential military applications of this line of research. However, results were modest and in 1964 the National Research Council created a commission – the Automatic Language Processing Advisory Committee (ALPAC) – to investigate the problem. The famous report, delivered in 1966, was extremely negative about the prospects of the field.
While the ALPAC report caused the end of direct NRC financial support, more money arrived through the Defense Advanced Research Projects Agency (DARPA, previously known as ARPA). Another famous failure was the Speech Understanding Research program at Carnegie Mellon University (CMU), which generated progressive frustration at DARPA (1971–74) and finally resulted in the cancellation of an annual grant of three million dollars (1974).
After the first AI winter, the paths of AI and NLP start to diverge, giving rise to distinct communities with little or no communication between them. In the ’80s, NLP is mostly dominated by symbolic methods, but during the period between 1990 and 2010 there is a progressive adoption of machine learning algorithms, giving rise to so-called Statistical NLP. With the advent of the web at the beginning of the new century, increasing amounts of raw (unannotated) language data finally become available, favouring the development of statistical approaches and particularly stimulating research on unsupervised and semi-supervised learning algorithms.
The advent of Deep Learning
What changed everything was the advent of Deep Learning.
Research on deep neural networks had been going on for years; however, the starting date of the Deep Learning “revolution” is usually fixed at 2012, when a number of unexpected and remarkable results refocused the interest of many different communities on Neural Networks. An emblematic event is the ImageNet competition of October 2012, where the so-called “AlexNet”, a deep convolutional neural network by Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, won the challenge by a significant margin over shallow machine learning methods. According to many authors, the 2012 ImageNet victory marks the beginning of the new era of “deep learning”.
Since then, Deep Learning techniques have rapidly swallowed up many of the fields that had previously departed from AI, imposing themselves as a unifying and comprehensive framework.
For example, since around 2015 NLP has essentially abandoned statistical methods in favour of deep neural networks. This shift entailed substantial changes in the design of NLP systems, now usually characterized by end-to-end learning of high-level tasks, in contrast to the typical pipeline of statistical techniques. For instance, in Neural Machine Translation (NMT) the network is directly trained to learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word alignment and language modeling that were used in statistical machine translation.
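To give a concrete flavour of this end-to-end style, the following toy sketch (in PyTorch) shows a minimal sequence-to-sequence encoder-decoder trained with a single loss on the translated output; the vocabulary sizes, dimensions and random token ids are purely illustrative assumptions, and the model is not the architecture of any particular system mentioned above.

```python
# Minimal, hypothetical sketch of the end-to-end encoder-decoder idea behind NMT.
# All sizes and the random "sentences" are illustrative placeholders.
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The encoder summarizes the source sentence into a hidden state...
        _, h = self.encoder(self.src_emb(src_ids))
        # ...which conditions the decoder; the whole mapping is learned end to end,
        # with no separate word-alignment or language-modeling components.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        return self.out(dec_out)  # per-position scores over the target vocabulary

model = TinySeq2Seq()
src = torch.randint(0, 1000, (2, 7))   # batch of 2 "source sentences" of length 7
tgt = torch.randint(0, 1000, (2, 5))   # corresponding target sequences of length 5
logits = model(src, tgt)               # shape: (2, 5, 1000)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tgt.reshape(-1))
loss.backward()                        # a single end-to-end gradient step would follow
```

The point of the sketch is only that one differentiable model maps source tokens to target tokens, so the whole translation task is optimized with a single objective, in contrast with the multi-stage pipelines of statistical machine translation.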
Computer vision is another field where Deep Learning algorithms have completely revolutionized the state of the art. Object detection, semantic segmentation, face recognition, image denoising, super-resolution and 3D shape reconstruction are just some of the many research areas where deep neural networks have replaced traditional techniques. In many cases, Deep Learning has also introduced entirely new challenges, as in the case of “panoptic” segmentation, or the keypoint detection task for pose estimation.
Robotics too is currently dominated by Deep Reinforcement Learning (DRL) algorithms. Here the use of Deep Neural Networks makes it possible to address the scalability problem of most traditional RL techniques, avoiding an explicit enumeration of the state space and using Neural Networks as function approximators for the relevant quantities (such as value functions or policies).
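As a concrete illustration of what “function approximation” means here, the toy sketch below (a DQN-like value update in PyTorch; the state dimension, hyperparameters and the fabricated transition are illustrative assumptions, not a full DRL agent) replaces a tabular Q-function with a small network trained on the temporal-difference error.

```python
# Minimal, hypothetical sketch of a neural network used as a Q-function approximator.
import torch
import torch.nn as nn

state_dim, n_actions = 4, 2          # e.g. a small control task
gamma = 0.99                         # discount factor

# Instead of a table indexed by every possible state, Q(s, a) is a small network
# mapping a continuous state vector to one value per action.
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# One fabricated transition (s, a, r, s') standing in for experience from an environment.
s = torch.randn(1, state_dim)
a = torch.tensor([0])
r = torch.tensor([1.0])
s_next = torch.randn(1, state_dim)

# Temporal-difference target and a single gradient step on the Bellman error.
with torch.no_grad():
    target = r + gamma * q_net(s_next).max(dim=1).values
pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(pred, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the network generalizes across nearby states, the agent does not need to visit or store every state explicitly, which is precisely what makes the approach scale to the large, continuous state spaces typical of robotics.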
In conclusion, presenting DL as a subfield of ML, which is in turn a subfield of AI, is somewhat reductive and does not fully reflect the complexity and breadth of the phenomenon. Deep Learning, being based on neural networks, is indeed a historical component of AI (unlike ML). On the other hand, its techniques are certainly closer to ML than to other fields of AI. In any case, DL is the real novelty of the renewed AI, and its beating heart. When you read news about new AI achievements in newspapers or other media, in the large majority of cases those results have been obtained by deploying Deep Learning techniques. The diffusion of DL is pervasive, and for the moment the trend shows no signs of slowing down.