
# Goat Planner: Robot Control with Multimodal Perception

Goat Planner is a system for planning and executing high-level tasks with a robot. It is primarily designed for Rove, the robot of the CAPRA student club. The system uses a fine-tuned version of LLaMA3 to generate behavior trees for robot task planning and execution, together with multiple AI pipelines for perception, mapping, and voice processing. This work builds on recent advances in LLM-based robotics research (e.g., [1, 2, 3]) and multimodal perception (e.g., [4]).

Demonstration: GoatBrain Video

## Overview

Goat Planner bridges the gap between natural language commands and robot execution by:

- Translating natural language instructions into structured behavior trees
- Providing real-time execution monitoring and control
- Supporting both standalone operation and ROS2 integration
- Offering multiple interfaces (CLI, Web UI) with voice capabilities
- Integrating with 3D mapping and perception systems
**Web UI.** The Goat Planner web interface provides an intuitive dashboard for controlling robot behavior. Users can interact with the robot through natural language commands, view the available objects, and visualize the current behavior tree.

**Spatial querying.** Spatial querying lets the robot understand and reason about its environment with the help of Shepherd. In the demonstration, the robot is asked to show a picture of nature, a query that does not correspond to any real label.

**Rove simulation.** The simulation runs Rove in the AWS environment using Gazebo.

## Features

- Language Model Integration: Uses a fine-tuned Ollama model for task planning and behavior tree generation
- Behavior Tree Generation: Automatically generates plans compatible with BehaviorTree.CPP (see the sketch after this list)
- ROS2 Integration: Distributed node architecture for robotic control
- Real-time Pipeline: Continuous monitoring and execution of generated plans
- Web Interface: React-based frontend for user interaction and plan visualization
- Voice Capabilities: Integrated speech-to-text and text-to-speech functionality
- 3D Perception: Support for multimodal 3D mapping and scene understanding
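To make the behavior-tree output concrete, here is a minimal sketch of a BehaviorTree.CPP-style XML tree built in Python. The node names (`MoveTo`, `Grasp`) and the overall structure are illustrative assumptions, not the planner's exact output format.

```python
# Illustrative sketch of a BehaviorTree.CPP 4 XML tree for the command
# "pick up the red cup". Node names (MoveTo, Grasp) and their attributes
# are assumptions for illustration, not the planner's exact output.
import xml.etree.ElementTree as ET

root = ET.Element("root", BTCPP_format="4")
tree = ET.SubElement(root, "BehaviorTree", ID="MainTree")
seq = ET.SubElement(tree, "Sequence", name="pick_up_red_cup")
ET.SubElement(seq, "MoveTo", target="red_cup")
ET.SubElement(seq, "Grasp", object="red_cup")

ET.indent(root)  # requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

Since BehaviorTree.CPP loads trees from XML of this shape, the planner's job reduces to emitting a valid document like the one printed above.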

## TODO

- Improve Shepherd object detection speed
- Fine-tune the Goat Planner model to improve tree generation
- Add querying from the web interface
- Add voice detection (segmentation, wake word, etc.)
- Add unit tests to verify the system works as expected
- Add more robust logging
- Perform LLM BLIP querying
- Monitor behavior tree execution

## Installation

### Standalone Installation

```bash
git clone https://github.com/SimonR99/goat-planner.git
cd goat-planner
pip install -r requirements.txt
pip install -e .
```

### Web Interface Setup

For the frontend:

```bash
cd src/goat_planner/frontend
npm install
npm start
```

For the backend:

```bash
cd src/goat_planner/flask_server
python main.py
```
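Once the backend is up, a quick smoke test from Python looks like the sketch below. The port and the `/api/chat` endpoint are assumptions for illustration; check `src/goat_planner/flask_server/main.py` for the actual routes and payload shape.

```python
# Minimal smoke test for the Flask backend.
# NOTE: the port (5000) and the /api/chat endpoint are assumptions for
# illustration; see src/goat_planner/flask_server/main.py for the real routes.
import requests

BASE_URL = "http://localhost:5000"  # default Flask port (assumed)

def send_command(text: str) -> dict:
    """Send a natural-language command and return the server's JSON reply."""
    response = requests.post(f"{BASE_URL}/api/chat", json={"message": text})
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(send_command("Go to the kitchen and pick up the red cup"))
```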

### CLI Interface

Run the chat client from the terminal; the `--tts` flag enables text-to-speech output:

```bash
python -m goat_planner.cli_chat --tts
```

### ROS2 Integration

Clone the repository into your ROS2 workspace and build the package:

```bash
colcon build --packages-select goat_behavior
```

Launch the ROS2 services:

```bash
source install/setup.bash
ros2 launch goat_behavior goat_behavior.launch.py
```
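To exercise the stack once it is launched, a minimal `rclpy` publisher such as the sketch below can send a plan request. The topic name and the `std_msgs/String` message type are assumptions for illustration; inspect the `goat_behavior` package for the real interfaces.

```python
# Minimal rclpy publisher to exercise the stack after launch.
# NOTE: the /goat_behavior/plan_request topic and the std_msgs/String type
# are assumptions for illustration; check goat_behavior for real interfaces.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class PlanRequestPublisher(Node):
    def __init__(self):
        super().__init__("plan_request_publisher")
        self.pub = self.create_publisher(String, "/goat_behavior/plan_request", 10)

    def send(self, text: str):
        msg = String()
        msg.data = text
        self.pub.publish(msg)
        self.get_logger().info(f"Sent plan request: {text}")

def main():
    rclpy.init()
    node = PlanRequestPublisher()
    node.send("Patrol the corridor and report any obstacles")
    rclpy.spin_once(node, timeout_sec=1.0)  # let the message go out
    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```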

## References

[1] Jeong, H., Lee, H., Kim, C., & Shin, S. (2024). A Survey of Robot Intelligence with Large Language Models. Applied Sciences.

[2] Song, C. H., Wu, J., Washington, C., Sadler, B. M., Chao, W. L., & Su, Y. (2024). LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models.

[3] Lykov, A., & Tsetserukou, D. (2024). LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model.

[4] Jatavallabhula, K. M., Kuwajerwala, A., Gu, Q., Omama, M., Chen, T., Maalouf, A., Li, S., Iyer, G., Saryazdi, S., Keetha, N., Tewari, A., Tenenbaum, J. B., de Melo, C. M., Krishna, M., Paull, L., Shkurti, F., & Torralba, A. (2023). ConceptFusion: Open-set Multimodal 3D Mapping. Robotics: Science and Systems (RSS).