No Generalization without Understanding and Explanation

John McCarthy, one of the founders of artificial intelligence (AI) and the one widely credited with coining the term, stated on several occasions that if we insist on building AI systems by empirical methods (e.g., neural networks or evolutionary models), we might succeed in building “some kind of an AI,” but even the designers of such systems will not understand how they work (see, for example, [1]). In hindsight, this was a remarkable prediction: the deep neural networks (DNNs) that currently dominate AI are utterly unexplainable, and their unexplainability is paradigmatic. Distributed connectionist architectures contain no concepts or human-understandable features, only microfeatures that are conceptually and cognitively hollow, not to mention semantically meaningless. In contrast with this bottom-up, data-driven approach, McCarthy spent his career advocating logical approaches, which are essentially top-down, model-driven (or theory-driven) approaches. McCarthy’s observation about the unexplainability of empirical, data-driven approaches is rooted in a long philosophical tradition linking explainability, understanding, and generalization (see [4], [5], and [6] for discussions of the relationship between understanding and explanation in science). I will briefly discuss these concepts below, before coming back to suggest what, in my opinion, is wrong with insisting on reducing AI to a data-driven machine learning paradigm.

To consider the relationship between generalization, explanation, and understanding, let us take an example. Suppose we have two intelligent agents, AG1 and AG2, and suppose we ask both to evaluate the expression “3 * (5 + 2).” Suppose also that AG1 and AG2 are two different “kinds” of robots. AG1 was designed by McCarthy and his colleagues, and thus it follows the rationalist approach: its intelligence was built in a top-down, model-driven (or theory-driven) manner. AG2’s intelligence, by contrast, was arrived at through a bottom-up, data-driven (i.e., machine learning) approach (see Figure 1). Presumably, then, AG2 “learned” how to compute the value of arithmetic expressions by processing many training examples, and one can assume that AG2 has “encoded” these patterns in the weights of some neural network. To answer the query, all AG2 has to do is find something “similar” to 3 * (5 + 2) among the massive amount of data it was trained on. For simplicity, you can think of the memory of AG2 (with its presumably billions of parameters/weights) as a “fuzzy” hashtable: once something “similar” is detected, AG2 is ready to reply with an answer (essentially, it performs a look-up and returns the answer of the most “similar” training example). AG1, on the other hand, has no such history, but it has a model of how addition and multiplication work (perhaps as symbolic functions), and it can thus call the relevant modules to “compute” the expression according to the formal specifications of the addition and multiplication functions.
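To make the contrast concrete, here is a minimal sketch in Python, assuming a toy setup of my own (the names ag1_evaluate, ag2_evaluate, and the tiny “training set” are hypothetical illustrations, not anything from the article or from an actual learned system). AG1 evaluates the expression by recursively applying the formal rules for addition and multiplication; AG2 is caricatured as the “fuzzy hashtable” described above, returning the stored answer of whichever memorized expression looks most similar to the query.

```python
# Illustrative sketch only: AG1 as a model-driven evaluator, AG2 as a lookup over
# memorized examples. All names and data here are hypothetical.

import ast
import difflib

def ag1_evaluate(expression: str) -> int:
    """AG1: compute the value by recursively applying the rules for + and *."""
    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Constant):       # a literal number
            return node.value
        if isinstance(node, ast.BinOp):          # an operator applied to two subexpressions
            left, right = walk(node.left), walk(node.right)
            if isinstance(node.op, ast.Add):
                return left + right
            if isinstance(node.op, ast.Mult):
                return left * right
        raise ValueError("expression outside AG1's model")
    return walk(ast.parse(expression, mode="eval").body)

# A toy stand-in for the patterns AG2 has "encoded" during training.
TRAINING_DATA = {
    "3 * (5 + 1)": 18,
    "2 * (5 + 2)": 14,
    "3 * (4 + 2)": 18,
}

def ag2_evaluate(expression: str) -> int:
    """AG2: find the most 'similar' memorized expression and return its stored answer."""
    closest = max(TRAINING_DATA,
                  key=lambda seen: difflib.SequenceMatcher(None, seen, expression).ratio())
    return TRAINING_DATA[closest]

query = "3 * (5 + 2)"
print("AG1 (model-driven):", ag1_evaluate(query))   # 21, derived from the rules
print("AG2 (lookup-based):", ag2_evaluate(query))   # whatever its nearest memory says
```

The design difference is the whole point of the example: AG1’s answer follows from the definitions of the operations, so it applies to any well-formed expression, including ones never seen before, while AG2’s answer is only as good as the proximity of its memorized examples to the query.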
