Are RL methods stable with function approximation?
The situation is a bit complicated and in flux at present. Stability guarantees depend on the specific algorithm, on the function approximator, and on how the two are used together. This is what we knew as of August 2001:

• For some nonlinear parameterized function approximators (FA), any temporal-difference (TD) learning method (including Q-learning and Sarsa) can become unstable, with the parameters and value estimates going to infinity. [Tsitsiklis & Van Roy 1996]

• TD(lambda) with linear FA converges near the best linear solution when trained on-policy (a minimal sketch of this update follows the list)... [Tsitsiklis & Van Roy 1997]

• ...but may become unstable when trained off-policy, that is, when updating states under a different distribution than the one seen when following the policy. [Baird 1995]

• From which it follows that Q-learning with linear FA can also be unstable. [Baird 1995]

• Sarsa(lambda), on the other hand, is guaranteed stable, although only the weakest of error bounds has been shown. [Gordon 2001]

• New linear TD algorithms for the off-policy case have
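The following is a minimal sketch of the on-policy linear setting referred to above: semi-gradient TD(lambda) with accumulating eligibility traces. The env_step and features helpers and the parameter values are hypothetical placeholders, not part of any particular library; the point is only that updates are applied to states visited while following the policy being evaluated. Applying the same update rule under a different state distribution (the off-policy case, as in Baird's counterexample) can drive the parameters to infinity.

import numpy as np

def linear_td_lambda(env_step, features, theta,
                     alpha=0.01, gamma=0.99, lam=0.9, n_steps=10000):
    # Semi-gradient TD(lambda) with linear function approximation.
    # env_step(s) -> (reward, next_state, done) is assumed to sample
    # transitions while FOLLOWING the policy being evaluated (on-policy).
    # features(s) -> 1-D numpy array phi(s); theta is the weight vector.
    s = 0                                     # assumed integer start state
    z = np.zeros_like(theta)                  # eligibility trace
    for _ in range(n_steps):
        phi = features(s)
        reward, s_next, done = env_step(s)
        v = theta @ phi                       # current estimate V(s)
        v_next = 0.0 if done else theta @ features(s_next)
        delta = reward + gamma * v_next - v   # TD error
        z = gamma * lam * z + phi             # accumulating trace
        theta = theta + alpha * delta * z     # linear semi-gradient update
        if done:                              # start a new episode
            z = np.zeros_like(theta)
            s = 0
        else:
            s = s_next
    return theta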