Alison Gopnik on Deep Learning - Dictionary of Arguments
Brockman I 224
A. Bottom-up deep learning: In the 1980s, computer scientists devised an ingenious way to get computers to detect patterns in data: connectionist, or neural-network, architecture (the “neural” part was, and still is, metaphorical). The approach fell into the doldrums in the 1990s but has recently been revived with powerful “deep-learning” methods, like those developed at Google’s DeepMind. E.g., give the program a bunch of Internet images labeled “cat,” etc.; the program can then use that information to label new images correctly.
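The labeling step described above can be sketched in miniature. This is not a deep network: it is a hypothetical nearest-neighbour rule over invented 2-D "image features", meant only to show how labeled examples let a program label a new one; real deep-learning systems also learn the features themselves.

```python
# Illustrative sketch only: toy labeled "image features" and a
# 1-nearest-neighbour rule that labels a new example by its closest
# labeled neighbour. Data and feature values are invented.

def nearest_label(example, labelled):
    """Return the label of the labelled point closest to `example`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labelled, key=lambda item: dist(item[0], example))[1]

training = [((0.9, 0.8), "cat"), ((0.8, 0.9), "cat"),
            ((0.1, 0.2), "not-cat"), ((0.2, 0.1), "not-cat")]

print(nearest_label((0.85, 0.75), training))  # -> cat
```

The point of the sketch is only the supervised setup: labels on old examples drive labels on new ones.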
Unsupervised learning: can detect patterns in data with no labels at all; these programs simply look for clusters of features (as in factor analysis).
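The cluster-finding idea can be illustrated with a crude 1-D two-means procedure, an assumption of this sketch rather than anything named in the text: no labels are given, and the grouping emerges from proximity alone.

```python
# Illustrative sketch only: group unlabeled 1-D points into two clusters
# around two moving centres (a crude 2-means). Data are invented.

def two_means(points, iters=10):
    """Split 1-D points into two clusters around two moving centres."""
    c1, c2 = min(points), max(points)  # initialise centres at the extremes
    for _ in range(iters):
        # assign each point to its nearer centre
        a = [p for p in points if abs(p - c1) <= abs(p - c2)]
        b = [p for p in points if abs(p - c1) > abs(p - c2)]
        # move each centre to the mean of its cluster
        c1 = sum(a) / len(a)
        c2 = sum(b) / len(b)
    return sorted(a), sorted(b)

print(two_means([0.1, 0.2, 0.15, 5.0, 5.2, 4.9]))
# -> ([0.1, 0.15, 0.2], [4.9, 5.0, 5.2])
```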
Reinforcement learning: In the 1950s, B. F. Skinner, building on the work of John Watson, famously programmed pigeons to perform elaborate actions (…) by giving them a particular schedule of rewards and punishments. The essential idea was that actions that were rewarded would be repeated and those that were punished would not, until the desired behavior was achieved. Even in Skinner’s day, this simple process, repeated over and over, could lead to complex behavior. >Conditioning.
E.g., researchers at Google’s DeepMind used a combination of deep learning and reinforcement learning to teach a computer to play Atari video games. The computer knew nothing about how the games worked.
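The Skinner-style idea, rewarded actions become more likely, can be sketched as a tiny two-action learner. Everything here (the action names, the learning rate, the reward values) is invented for illustration; DeepMind's Atari agent combined this reward-update idea with deep networks, which this sketch omits.

```python
# Illustrative sketch only: keep a value estimate per action and nudge it
# toward the reward actually received, so rewarded actions "stick".
import random

def train(reward_of, actions, steps=1000, lr=0.1, seed=0):
    """Return the action with the highest learned value estimate."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}
    for _ in range(steps):
        a = rng.choice(actions)                     # explore at random
        value[a] += lr * (reward_of[a] - value[a])  # reward shifts the estimate
    return max(value, key=value.get)                # the behaviour that remains

# The rewarded action wins out, as in Skinner's schedules:
print(train({"lever": 1.0, "peck": 0.0}, ["lever", "peck"]))  # -> lever
```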
Brockman I 225
These bottom-up systems can generalize to new examples; they can label a
Brockman I 226
new image as a cat fairly accurately overall. But they do so in ways quite different from how humans generalize. Some images that look to us almost identical to a cat image won’t be labeled as cats by the program at all, while others that look to us like random blurs will be.
B. Top-down Bayesian Models: The early attempts to use this approach faced two kinds of problems.
1st, most patterns of evidence might in principle be explained by many different hypotheses: e.g., it’s possible that my journal email message is genuine; it just doesn’t seem likely.
2nd, where do the concepts that the generative models use come from in the first place? Plato and Chomsky said you were born with them. But how can we explain how we learn the latest concepts of science?
Solution: Bayesian models combine generative models and hypothesis testing. (>Bayesianism). A Bayesian model lets you calculate how likely it is that a particular hypothesis is true, given the data. And by making small but systematic tweaks to the models we already have, and testing them against the data, we can sometimes make new concepts and models from old ones.
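The Bayesian calculation described above, how likely a hypothesis is given the data, can be written out directly. The numbers below are invented for illustration, applied to the text's "is my journal email genuine?" example with a single spam alternative.

```python
# Illustrative sketch only: Bayes' rule with one hypothesis (genuine)
# and one alternative (spam). Prior and likelihoods are invented.

def posterior(prior, likelihood, alt_likelihood):
    """P(H | data) = P(H) P(data | H) / P(data)."""
    evidence = prior * likelihood + (1 - prior) * alt_likelihood
    return prior * likelihood / evidence

# Genuine journal invitations are rare (prior 1%), and this message's
# wording is ten times more typical of spam than of a genuine invitation.
p = posterior(prior=0.01, likelihood=0.05, alt_likelihood=0.5)
print(round(p, 4))  # -> 0.001
```

So the hypothesis is possible but, given the data, very unlikely, which is exactly the distinction the model is meant to capture.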
Brockman I 227
VsBayesianism: The Bayesian techniques can help you choose which of two hypotheses is more likely, but there are almost always an enormous number of possible hypotheses, and no system can efficiently consider them all. How do you decide which hypotheses are worth testing in the first place?
Top-Down method: E.g., Brenden Lake at New York University and colleagues used top-down methods to solve a problem that is easy for people but extremely difficult for computers: recognizing unfamiliar handwritten characters.
Bottom-up method: this method gives the computer thousands of examples (…) and lets it pull out the salient features.
Top-down method: Lake et al. gave the program a general model of how you draw a character: A stroke goes either right or left; after you finish one, you start another; and so on. When the program saw a particular character, it could infer the sequence of strokes that were most likely to have led to it (…). Then it could judge whether a new character was likely to result from that sequence or from a different one, and it could produce a similar set of strokes itself. The program worked much better than a deep-learning program applied to exactly the same data, and it closely mirrored the performance of human beings.
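The inference step described above can be sketched as scoring candidate stroke sequences against an observed character. The stroke vocabulary, the candidate "programs", and the match probabilities are all invented for illustration; Lake et al.'s actual generative model is far richer.

```python
# Illustrative sketch only: a generative model assigns each candidate
# stroke sequence a likelihood of having produced the observed character,
# and the most likely sequence wins. All values are invented.

def sequence_likelihood(observed, program, p_match=0.9, p_mismatch=0.1):
    """Probability that the stroke `program` produced `observed`."""
    prob = 1.0
    for got, expected in zip(observed, program):
        prob *= p_match if got == expected else p_mismatch
    return prob

observed = ["down", "right", "down"]
programs = {"A-like": ["down", "right", "down"],
            "T-like": ["right", "down", "down"]}
best = max(programs, key=lambda k: sequence_likelihood(observed, programs[k]))
print(best)  # -> A-like
```

Because the model is generative, the same stroke program can also be run forward to produce new, similar characters, which is how such a system learns from a single example.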
Brockman I 228
Bottom-up: here, the program doesn’t need much knowledge to begin with, but it needs a great deal of data, and it can generalize only in a limited way.
Top-down: here, the program can learn from just a few examples and make much broader and more varied generalizations, but you need to build much more into it to begin with.
Learning in Children/Gopnik: (…) the truly remarkable thing about human children is that they somehow combine the best features of each approach and then go way beyond them. Over the past fifteen years, developmentalists have been exploring the way children learn structure from data. Four-year-olds can learn by taking just one or two examples of data, as a top-down system does, and generalizing to very different concepts. But they can also learn new concepts and models from the data itself, as a bottom-up system does. Young children rapidly learn abstract intuitive theories of biology, physics, and psychology in much the way adult scientists do, even with relatively little data.
Gopnik, Alison. “AIs versus Four-Year-Olds”, in: Brockman, John (ed.) 2019. Possible Minds: Twenty-Five Ways of Looking at AI. New York: Penguin Press.
_____________
Explanation of symbols: Roman numerals indicate the source, arabic numerals indicate the page number. ((s)…): Comment by the sender of the contribution. The note [Author1]Vs[Author2] or [Author]Vs[term] is an addition from the Dictionary of Arguments. If a German edition is specified, the page numbers refer to this edition.