How does RL relate to Neuro-Dynamic Programming?

Article from www.avaye.com


How does RL relate to Neuro-Dynamic Programming?

To a first approximation, Reinforcement Learning and Neuro-Dynamic Programming are synonomous. The name "reinforcement learning" came from psychology (although psychologists rarely use exactly this term) and dates back to the eary days of cybernetics. For example, Marvin Minsky used this term in his 1954 thesis, and Barto and Sutton revived it in the early 1980's. The name "neuro-dynamic programming" was coined by Bertsekas and Tsitsiklis in 1996 to capture the idea of the field as a combination of neural networks and dynamic programming.

In fact, neither name is very descriptive of the subject, and I recommend you use neither when you want to be technically precise. Names such as this are useful when referring to a general body of research, but not for carefully distinguishing ideas from one another. In that sense, there is no point in trying to draw a careful distinction between the referents of these two names.

The problem with "reinforcement learning" is that it is dated. Much of the field does not concern learning at all, but just planning from complete knowledge of the problem (a model of the environment). Almost the same methods are used for planning and learning, and it seems off-point to emphasize learning in the name of the field. "Reinforcement" does not seem particularly pertinent either.

The problem with "neuro-dynamic programming" is similar in that neither neural networks nor dynamic programming are critical to the field. Many of the methods, such as "Monte Carlo", or "rollout" methods, are completely unrelated to dynamic programming, and neural networks are just one choice among many for method of function approximation. Moreover, one could argue that the component names, "neural networks" and "dynamic programming", are each not very descriptive of their respective methods.


< Back

All content copyrighted by Avaye.com