Most RL work assumes the action space is discrete; what about continuous actions?
It is true that most RL work has considered discrete action spaces, but this was usually done for convenience, not as an essential limitation of the ideas, and there are exceptions. Nevertheless, it is often not obvious how to extend RL methods to continuous, or even large discrete, action spaces. The key problem is that RL methods typically involve a max or sum over elements of the action space, which is not feasible if the space is large or infinite. The natural approach is to replace the enumeration of actions with a sample of them, and average (just as we replace the enumeration of possible next states with a sample of them, and average). This requires either a very special structure for the action-value function, one whose maximum over actions can be found without enumerating them, or else a stored representation of the best known policy. Actor-critic methods, in which the actor is exactly such a stored policy, are one approach.
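The two ideas in the previous paragraph can be sketched in Python on a toy problem. `sampled_greedy_action` replaces the max over a continuous action interval with a max over a random sample of candidate actions, and `gaussian_actor_critic` keeps an explicit stored policy (a Gaussian "actor") together with a value estimate (a "critic"), updating the actor's mean with a score-function (likelihood-ratio) step. This is a minimal sketch under stated assumptions, not the method of any work cited here: the function names, the single-state (bandit) setting, the fixed standard deviation, the step sizes, and the quadratic reward functions are all hypothetical and chosen only for illustration.

```python
import math
import random


def sampled_greedy_action(q, low, high, n_samples=1000, rng=None):
    """Approximate argmax_a q(a) over the interval [low, high].

    Enumerating every action is impossible for a continuous interval, so we
    draw n_samples candidate actions uniformly at random and return the one
    with the highest estimated value.
    """
    rng = rng if rng is not None else random.Random()
    candidates = [rng.uniform(low, high) for _ in range(n_samples)]
    return max(candidates, key=q)


def gaussian_actor_critic(reward_fn, steps=3000, sigma=0.5,
                          alpha_actor=0.02, alpha_critic=0.1, seed=0):
    """One-state actor-critic with a Gaussian policy over a scalar action.

    Actor: the stored policy N(mu, sigma^2); only mu is learned, and sigma
    is held fixed (a simplifying assumption of this sketch).
    Critic: a scalar value estimate v of the single state.
    """
    rng = random.Random(seed)
    mu, v = 0.0, 0.0
    for _ in range(steps):
        a = rng.gauss(mu, sigma)        # sample an action instead of enumerating
        r = reward_fn(a)
        delta = r - v                   # TD error; one-step episodes, so no bootstrap term
        v += alpha_critic * delta       # critic update
        # Actor update: d/dmu log N(a; mu, sigma^2) = (a - mu) / sigma^2
        mu += alpha_actor * delta * (a - mu) / sigma ** 2
    return mu


def toy_q(a):
    """Hypothetical action-value function peaked at a = 0.7."""
    return -(a - 0.7) ** 2


def toy_reward(a):
    """Hypothetical reward peaked at a = 3.0."""
    return -(a - 3.0) ** 2


if __name__ == "__main__":
    print(sampled_greedy_action(toy_q, -1.0, 1.0, rng=random.Random(0)))
    print(gaussian_actor_critic(toy_reward))
```

Run as a script, the first value printed should lie close to 0.7 and the second close to 3.0. In the actor-critic sketch neither update requires a max over the continuous action space: the actor proposes actions directly and the critic only has to estimate a value.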
With no attempt to be exhaustive, some of the earlier RL research with continuous actions includes:
- Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.
- Millington, P. J. (1991). Associative Reinforcement Learning for Optimal Control. M.S. thesis, Massachusetts Institute of Technology. Technical Report CSDL-T-1070.
- Baird, L. C., & Klopf, A. H. (1993). Reinforcement learning with high-dimensional, continuous actions. Technical Report WL-TR-93-1147. Wright-Patterson Air Force Base, Ohio: Wright Laboratory.
- Santamaria, J. C., Sutton, R. S., & Ram, A. (1998). Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior, 6(2), 163-218.
See also:
I would be glad to include new or other work in this list as well. Please send me pointers!