TY - JOUR
T1 - Reinforcement Q-Learning Control With Reward Shaping Function for Swing Phase Control in a Semi-active Prosthetic Knee
AU - Hutabarat, Yonatan
AU - Ekkachai, Kittipong
AU - Hayashibe, Mitsuhiro
AU - Kongprawechnon, Waree
N1 - Funding Information:
This work was partly funded by the G-7 Scholarship Foundation and was also supported by the Data Sciences Program II (DSP II) of Tohoku University.
Publisher Copyright:
Copyright © 2020 Hutabarat, Ekkachai, Hayashibe and Kongprawechnon.
PY - 2020/11/26
Y1 - 2020/11/26
N2 - In this study, we investigated a control algorithm for a semi-active prosthetic knee based on reinforcement learning (RL). Model-free reinforcement Q-learning control with a reward shaping function was proposed as the voltage controller of a magnetorheological damper-based prosthetic knee. The reward function was designed as a function of a performance index that accounts for the trajectory of the subject-specific knee angle. We compared our proposed reward function to a conventional single reward function under the same random initialization of the Q-matrix. We trained this control algorithm to adapt to several walking-speed datasets under one control policy and subsequently compared its performance with that of other control algorithms. The results showed that our proposed reward function outperformed the conventional single reward function in terms of the normalized root mean squared error and also exhibited a faster convergence trend. Furthermore, our control strategy converged within our desired performance index and could adapt to several walking speeds. Our proposed control structure also achieved overall better performance than user-adaptive control, and at some walking speeds it outperformed the neural network predictive control reported in existing studies.
AB - In this study, we investigated a control algorithm for a semi-active prosthetic knee based on reinforcement learning (RL). Model-free reinforcement Q-learning control with a reward shaping function was proposed as the voltage controller of a magnetorheological damper-based prosthetic knee. The reward function was designed as a function of a performance index that accounts for the trajectory of the subject-specific knee angle. We compared our proposed reward function to a conventional single reward function under the same random initialization of the Q-matrix. We trained this control algorithm to adapt to several walking-speed datasets under one control policy and subsequently compared its performance with that of other control algorithms. The results showed that our proposed reward function outperformed the conventional single reward function in terms of the normalized root mean squared error and also exhibited a faster convergence trend. Furthermore, our control strategy converged within our desired performance index and could adapt to several walking speeds. Our proposed control structure also achieved overall better performance than user-adaptive control, and at some walking speeds it outperformed the neural network predictive control reported in existing studies.
KW - magnetorheological damper
KW - Q-learning
KW - reinforcement learning
KW - reward shaping
KW - semi-active prosthetic knee
UR - http://www.scopus.com/inward/record.url?scp=85097363323&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097363323&partnerID=8YFLogxK
U2 - 10.3389/fnbot.2020.565702
DO - 10.3389/fnbot.2020.565702
M3 - Article
AN - SCOPUS:85097363323
SN - 1662-5218
VL - 14
JO - Frontiers in Neurorobotics
JF - Frontiers in Neurorobotics
M1 - 565702
ER -