TY - GEN
T1 - PPMC Training Algorithm
T2 - 2nd International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2020
AU - Blum, Tamir
AU - Jones, William
AU - Yoshida, Kazuya
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/2
Y1 - 2020/2
N2 - In the pursuit of a fully autonomous learning agent able to interact, move, and be useful in the real world, two fundamental problems are path planning and motion control, and user-agent interaction. We address these through reinforcement learning using our Path Planning and Motion Controller (PPMC) Training Algorithm, which uses a combination of observable goals and randomization of goals during training, with a customized reward function, to teach a simulated quadruped agent to respond to user commands and to travel to designated areas. In this regard, we identified two critical components of path planning and motion control: the first is region-enabled travel, or the ability to travel towards any location within a prescribed area; the second is multi-point travel, or the ability to travel to multiple points in succession. An important open-ended question is how many tasks should be handled by a single policy and whether a single policy can even learn to manage several tasks. We demonstrate that it is possible to contain both a mapless path planner and a motion controller on a single neural network, which could prove promising in future work due to their interlinked and synergistic nature. Using control-group policies, various test cases, and both ACKTR and PPO, we empirically validate that our algorithm teaches the agent to respond to user commands as well as to perform path planning and motion control.
AB - In the pursuit of a fully autonomous learning agent able to interact, move, and be useful in the real world, two fundamental problems are path planning and motion control, and user-agent interaction. We address these through reinforcement learning using our Path Planning and Motion Controller (PPMC) Training Algorithm, which uses a combination of observable goals and randomization of goals during training, with a customized reward function, to teach a simulated quadruped agent to respond to user commands and to travel to designated areas. In this regard, we identified two critical components of path planning and motion control: the first is region-enabled travel, or the ability to travel towards any location within a prescribed area; the second is multi-point travel, or the ability to travel to multiple points in succession. An important open-ended question is how many tasks should be handled by a single policy and whether a single policy can even learn to manage several tasks. We demonstrate that it is possible to contain both a mapless path planner and a motion controller on a single neural network, which could prove promising in future work due to their interlinked and synergistic nature. Using control-group policies, various test cases, and both ACKTR and PPO, we empirically validate that our algorithm teaches the agent to respond to user commands as well as to perform path planning and motion control.
KW - ACKTR
KW - Autonomous Systems
KW - Control and Decision Systems
KW - Human Commanded Systems
KW - Machine Learning
KW - Path Planning
KW - PPO
KW - Reinforcement Learning
KW - Robotics
KW - Teleoperations
KW - Training Algorithm
UR - http://www.scopus.com/inward/record.url?scp=85084052779&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084052779&partnerID=8YFLogxK
U2 - 10.1109/ICAIIC48513.2020.9065237
DO - 10.1109/ICAIIC48513.2020.9065237
M3 - Conference contribution
AN - SCOPUS:85084052779
T3 - 2020 International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2020
SP - 193
EP - 198
BT - 2020 International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 February 2020 through 21 February 2020
ER -