This is the graduation project of Dor Bitton and Yuval Goshen, two Computer Engineering B.Sc. students at the Technion Institute, Haifa.
The goal of the project is to solve a long-term planning problem. We built an environment in which a robot navigates a goal-conditioned maze from an arbitrary start point to an arbitrary goal. The agent must control the robot's joint motors to reach the goal. The task is challenging because the robot has to plan and navigate through the maze using only motor-level control, which gives the problem a long planning horizon.
We used Deep Reinforcement Learning algorithms to solve the problem, but we divided it into two sub-problems that are solved independently by hierarchical agents.
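To make the division of labour concrete, here is a minimal sketch of how a high-level agent (the navigator) and a low-level agent (the stepper) could alternate. It is not the project's code: `run_hierarchy`, the toy policies, the point-like environment, and every parameter value (`steps_per_subgoal`, `tol`, the step sizes) are assumptions made only for illustration.

```python
import numpy as np

def run_hierarchy(navigator, stepper, env_step, state, goal,
                  steps_per_subgoal=10, max_subgoals=50, tol=0.5):
    """Alternate between a high-level and a low-level policy.

    navigator(state, goal) -> subgoal   (hypothetical high-level policy)
    stepper(state, subgoal) -> action   (hypothetical low-level policy)
    env_step(state, action) -> state    (hypothetical environment transition)
    Returns the final state and the number of sub-goals that were issued.
    """
    for n_subgoals in range(max_subgoals):
        if np.linalg.norm(goal - state) <= tol:
            return state, n_subgoals
        subgoal = navigator(state, goal)
        # The low-level policy gets a fixed budget of steps per sub-goal.
        for _ in range(steps_per_subgoal):
            state = env_step(state, stepper(state, subgoal))
    return state, max_subgoals

def _move_towards(source, target, max_dist):
    """Return the displacement from source towards target, capped at max_dist."""
    delta = target - source
    dist = np.linalg.norm(delta)
    return delta if dist <= max_dist else delta / dist * max_dist

# Toy stand-ins (assumptions): the "robot" is a 2D point with no maze walls.
toy_navigator = lambda s, g: s + _move_towards(s, g, 1.0)   # nearby sub-goal
toy_stepper = lambda s, sg: _move_towards(s, sg, 0.2)       # small motion step
toy_env_step = lambda s, a: s + a

final_state, n_used = run_hierarchy(
    toy_navigator, toy_stepper, toy_env_step,
    state=np.array([0.0, 0.0]), goal=np.array([5.0, 0.0]))
```

In this sketch the high-level policy only ever reasons about a handful of sub-goals, while the low-level policy only ever sees a nearby target, which is the reason the split shortens the planning horizon.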
- The stepper is trained in a separate environment to walk to a nearby subgoal (up to twice its body size)
- No obstacles; it only learns the task of "walking"
- Dense reward that is independent of the robot type: a function of the distance from the goal plus an indicator that the goal was achieved
- Trained with the DDPG algorithm
- The navigator is trained to generate sub-goals for the stepper, which makes the horizon much shorter for the navigation part of the task
- This is still different from solving a point-robot maze, because the next state depends on the provided goal and on the stepper, which is not perfect
- The robot state is not fully observable to the navigator
- We tried the following algorithms:
  - TD3
  - RRT planner on the maze map, treating the robot as a point robot
  - RRT planner on the maze map, with walls extended by the robot size
  - TD3-MP, similar to DDPG-MP, with demonstrations planned by RRT
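The "extended walls" idea from the list above can be sketched as follows: if every wall is grown by the robot's size, a planner that treats the robot as a point still produces paths with enough clearance. This is only an illustration under assumptions: the maze is taken to be a boolean occupancy grid, the robot radius is given in grid cells, and walls are grown into a square neighbourhood. The function name `inflate_walls` is hypothetical and not taken from the project.

```python
import numpy as np

def inflate_walls(grid, radius):
    """Grow every wall cell by `radius` cells in all directions.

    grid:   2D array-like, True (or 1) where there is a wall.
    radius: robot radius in grid cells (assumed unit), non-negative int.
    Returns a boolean array of the same shape.
    """
    grid = np.asarray(grid, dtype=bool)
    rows, cols = grid.shape
    # Pad with free space so shifted views never wrap around the edges.
    padded = np.pad(grid, radius, mode="constant", constant_values=False)
    inflated = np.zeros_like(grid)
    # OR together all shifted copies of the map within the square window.
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            inflated |= padded[dr:dr + rows, dc:dc + cols]
    return inflated

# Toy 5x5 map (assumption) with a single wall cell in the centre.
maze = np.zeros((5, 5), dtype=bool)
maze[2, 2] = True
inflated = inflate_walls(maze, radius=1)
```

A point-robot planner such as RRT could then be run on `inflated` instead of `maze`; cells marked True are treated as collisions.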
Dor Bitton - LinkedIn
Yuval Goshen - LinkedIn
Our work is mainly based on the following papers: