This article proposes a phased reinforcement learning algorithm for controlling complex systems. Its key element is a shaping function defined on a novel position-direction space. The shaping function is constructed autonomously once the goal is reached and then constrains the exploration area so that the policy can be optimized. The efficiency of the proposed shaping function is demonstrated on a complex control problem: positioning a 2-link planar underactuated manipulator.
- Human exploration-exploitation strategy
- Promising zone
- Reinforcement learning
- Shaping function