TY - JOUR

T1 - Spiking Neural Network Discovers Energy-Efficient Hexapod Motion in Deep Reinforcement Learning

AU - Naya, Katsumi

AU - Kutsuzawa, Kyo

AU - Owaki, Dai

AU - Hayashibe, Mitsuhiro

N1 - Publisher Copyright:
© 2013 IEEE.

PY - 2021

Y1 - 2021

N2 - In Deep Reinforcement Learning (DRL) for robotics application, it is important to find energy-efficient motions. For this purpose, a standard method is to set an action penalty in the reward to find the optimal motion considering the energy expenditure. This method is widely used for the simplicity of implementation. However, since the reward is a linear sum, if the penalty is too large, the system will fall into local minima and no moving solution can be obtained. In contrast, if the penalty is too small, the effect may not be sufficient. Therefore, it is necessary to adjust the amount of the penalty so that the agent always moves dynamically, and the energy-saving effect is sufficient. Nevertheless, since adjusting the hyperparameters is computationally expensive, we need a learning method that is robust to the penalty setting problem. We investigated on the Spiking Neural Network (SNN), which has been attracting attention for its computational efficiency and neuromorphic architecture. We conducted gait experiments using a hexapod agent while varying the energy penalty settings in the simulation environment. By applying SNN to the conventional state-of-the-art DRL algorithms, we examined whether the agent could explore for an optimal gait with a larger penalty variation and obtain an energy-efficient gait verified with Cost of Transport (CoT), a metric of energy efficiency for gait. Soft Actor-Critic (SAC)+SNN resulted in a CoT of 1.64, Twin Delayed Deep Deterministic policy gradient (TD3)+SNN resulted in a CoT of 2.21, and Deep Deterministic policy gradient (DDPG)+SNN resulted in a CoT of 2.08 (1.91 for normal SAC, 2.38 for TD3, and 2.40 for DDPG). DRL combined with SNN succeeded in learning more energy efficient gait with lower CoT.

AB - In Deep Reinforcement Learning (DRL) for robotics application, it is important to find energy-efficient motions. For this purpose, a standard method is to set an action penalty in the reward to find the optimal motion considering the energy expenditure. This method is widely used for the simplicity of implementation. However, since the reward is a linear sum, if the penalty is too large, the system will fall into local minima and no moving solution can be obtained. In contrast, if the penalty is too small, the effect may not be sufficient. Therefore, it is necessary to adjust the amount of the penalty so that the agent always moves dynamically, and the energy-saving effect is sufficient. Nevertheless, since adjusting the hyperparameters is computationally expensive, we need a learning method that is robust to the penalty setting problem. We investigated on the Spiking Neural Network (SNN), which has been attracting attention for its computational efficiency and neuromorphic architecture. We conducted gait experiments using a hexapod agent while varying the energy penalty settings in the simulation environment. By applying SNN to the conventional state-of-the-art DRL algorithms, we examined whether the agent could explore for an optimal gait with a larger penalty variation and obtain an energy-efficient gait verified with Cost of Transport (CoT), a metric of energy efficiency for gait. Soft Actor-Critic (SAC)+SNN resulted in a CoT of 1.64, Twin Delayed Deep Deterministic policy gradient (TD3)+SNN resulted in a CoT of 2.21, and Deep Deterministic policy gradient (DDPG)+SNN resulted in a CoT of 2.08 (1.91 for normal SAC, 2.38 for TD3, and 2.40 for DDPG). DRL combined with SNN succeeded in learning more energy efficient gait with lower CoT.

KW - Spiking neural network

KW - deep reinforcement learning

KW - energy efficiency

KW - hexapod gait

KW - spatiooral backpropagation

UR - http://www.scopus.com/inward/record.url?scp=85119595470&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85119595470&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2021.3126311

DO - 10.1109/ACCESS.2021.3126311

M3 - Article

AN - SCOPUS:85119595470

SN - 2169-3536

VL - 9

SP - 150345

EP - 150354

JO - IEEE Access

JF - IEEE Access

ER -