Are RL methods stable with function approximation?

The situation is somewhat complicated and still in flux. Stability guarantees depend on the specific algorithm, on the function approximator, and on how the two are used together. This is what was known as of August 2001:

  • For arbitrary nonlinear parameterized function approximation (FA), any temporal-difference (TD) learning method (including Q-learning and Sarsa) can become unstable (parameters and estimates going to infinity). [Tsitsiklis & Van Roy 1996]
  • TD(lambda) with linear FA converges near the best linear solution when trained on-policy... [Tsitsiklis & Van Roy 1997]
  • ...but may become unstable when trained off-policy (updating states with a different distribution than that seen when following the policy). [Baird 1995]
  • It follows that Q-learning with linear FA, being an off-policy method, can also be unstable. [Baird 1995]
  • Sarsa(lambda) with linear FA, on the other hand, is guaranteed stable, although only the weakest of error bounds has been shown. [Gordon 2001]
  • New linear TD algorithms for the off-policy case have been shown convergent near the best solution. [Precup, Sutton & Dasgupta 2001]
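
The contrast between on-policy and off-policy training with linear FA can be seen in a very small example. The sketch below is not taken from the papers cited above; it is a two-state construction of the kind often used in textbooks, and the function name run_td0 and all parameter values are assumptions chosen for illustration. A single weight w gives V(s1) = 1*w and V(s2) = 2*w, state s1 always moves to s2 with reward 0, and s2 always terminates with reward 0, so the true value of both states is 0. Updating both states as they are visited drives w to 0, while updating only s1 (a different distribution than the one seen when following the policy) makes w grow without bound whenever gamma > 0.5.

```python
def run_td0(update_both_states, gamma=0.9, alpha=0.1, steps=200, w0=1.0):
    """TD(0) with linear FA and a single weight w (illustrative sketch).

    Features: phi(s1) = 1, phi(s2) = 2, so V(s1) = w and V(s2) = 2 * w.
    Dynamics: s1 -> s2 with reward 0, s2 -> terminal with reward 0.
    """
    w = w0
    for _ in range(steps):
        # Update at s1: the TD target is 0 + gamma * V(s2) = gamma * 2 * w.
        td_error_s1 = gamma * 2.0 * w - 1.0 * w
        w += alpha * td_error_s1 * 1.0  # gradient of V(s1) w.r.t. w is 1

        if update_both_states:
            # Update at s2: the next state is terminal, so the target is 0.
            td_error_s2 = 0.0 - 2.0 * w
            w += alpha * td_error_s2 * 2.0  # gradient of V(s2) w.r.t. w is 2
    return w


# On-policy style: every state is updated as often as it is visited.
print("both states updated:", run_td0(update_both_states=True))

# Off-policy style: only s1 is ever updated, so nothing corrects V(s2).
print("only s1 updated:   ", run_td0(update_both_states=False))
```

With these settings each s1 update multiplies w by 1 + alpha*(2*gamma - 1) = 1.08, while each s2 update multiplies it by 1 - 4*alpha = 0.6. When both are applied the product is below 1 and w shrinks toward the correct value of 0; when only the s1 update is applied, w grows geometrically.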

Since then, a new result by Perkins and Precup has appeared at NIPS 2002. It may at last have resolved the question positively by proving that Sarsa with linear function approximation converges under an appropriate exploration regime.

All content copyrighted by Avaye.com