Marvin Minsky on Reinforcement Learning - Dictionary of Arguments
Reinforcement Learning/Minsky: In the course of solving some problem, certain agents must have aroused certain other agents. So let's take reward to mean that if agent A has been involved in arousing agent B, the effect of reward is, somehow, to make it easier for A to arouse B in the future and also, perhaps, to make it harder for A to arouse other agents.
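As a rough illustration only, the reward rule described above can be sketched as an update on connection strengths between agents. The class name `AgentNet`, the numeric weights, and the `gain` and `decay` parameters are assumptions made for this sketch; Minsky's text specifies no particular numbers or data structure.

```python
class AgentNet:
    """Hypothetical toy model: weights[a][b] is how easily agent a arouses agent b."""

    def __init__(self, agents, initial=1.0):
        # Every agent starts with the same connection strength to every other agent.
        self.weights = {
            a: {b: initial for b in agents if b != a} for a in agents
        }

    def reward(self, a, b, gain=0.5, decay=0.1):
        """Make it easier for a to arouse b, and harder for a to arouse others."""
        for other in self.weights[a]:
            if other == b:
                self.weights[a][other] += gain
            else:
                # Weaken competing connections, but never below zero.
                self.weights[a][other] = max(0.0, self.weights[a][other] - decay)


net = AgentNet(["A", "B", "C"])
net.reward("A", "B")  # A was involved in arousing B during a success
```

After the call, the connection from A to B is stronger than the one from A to C, while connections leaving B and C are unchanged.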
Problem: [such a machine] quickly learned to solve easy problems but never could learn to solve hard problems like building towers or playing chess.
Solution: (…) in order to solve complicated problems, any machine of limited size must be able to reuse its agents in different ways in different contexts.
Reinforcement: Problem: In the course of solving a hard problem, one will usually try several bad moves before finding a good one (…).
To avoid learning those bad moves, we could design a machine to reinforce only what happened in the last few moments before success. But such a machine would be able to learn only to solve problems whose solutions require just a few steps. Alternatively, we could design the reward to work over longer spans of time; however, that would not only reward the bad decisions along with the good but would also erase other things that it had previously learned to do. We cannot learn to solve hard problems by indiscriminately reinforcing agents or their connections.
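The trade-off between short and long reward spans can be made concrete with a small sketch. The function `reinforce`, the `window` parameter, and the move labels are hypothetical and chosen only for illustration; they do not come from Minsky's text.

```python
def reinforce(history, window, gain=1.0):
    """Strengthen only the last `window` moves that preceded a success."""
    weights = {}
    if window <= 0:
        return weights
    for move in history[-window:]:
        weights[move] = weights.get(move, 0.0) + gain
    return weights


# A hypothetical episode: two bad moves were tried before two good ones.
history = ["bad-1", "bad-2", "good-1", "good-2"]

short_span = reinforce(history, window=1)  # rewards only the final move
long_span = reinforce(history, window=4)   # rewards everything, bad moves included
```

With a short span, the earlier good move is never credited, so multi-step solutions cannot be learned. With a long span, the bad moves receive exactly the same credit as the good ones.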
Solution: [distinguishing local and global schemes]: The global scheme requires some way to distinguish not only which agents' activities have helped to solve a problem, but also which agents helped with which subproblems. For example, in the course of building a tower, you might find it useful to push a certain block aside to make room for another one. Then you'd want to remember that pushing can help in building a tower — but if you were to conclude that pushing is a generally useful thing to do, you'd never get another tower built.
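One possible way to picture the tower example is to record credit per subproblem rather than per agent. This is a minimal sketch under that assumption; the names `reward_in_context`, `usefulness`, and the subproblem labels are invented here and are not Minsky's terminology.

```python
from collections import defaultdict

# Credit is keyed by (subproblem, agent), not by agent alone.
credit = defaultdict(float)


def reward_in_context(credit, subproblem, agent, gain=1.0):
    """Record that `agent` helped with this particular subproblem."""
    credit[(subproblem, agent)] += gain


def usefulness(credit, subproblem, agent):
    """Look up how useful `agent` has been for `subproblem` (0.0 if never rewarded)."""
    return credit.get((subproblem, agent), 0.0)


# Pushing a block aside helped to make room while building a tower.
reward_in_context(credit, "make-room", "push")
```

Here pushing gains credit for making room, but a later query about stacking a block finds no credit for pushing, so the machine does not conclude that pushing is generally useful.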
>Goals/Minsky, >Memory/Minsky, >Intentions/Minsky.

Explanation of symbols: Roman numerals indicate the source, Arabic numerals indicate the page number. The corresponding books are listed below. ((s)…): Comment by the sender of the contribution. Translations: Dictionary of Arguments. The notes [Concept/Author], [Author1]Vs[Author2], and [Author]Vs[term], as well as the labels "problem:"/"solution:", "old:"/"new:", and "thesis:", are additions from the Dictionary of Arguments. If a German edition is specified, the page numbers refer to that edition.
The Society of Mind, New York 1985
Semantic Information Processing, Cambridge, MA 2003