This repository contains the code for the paper: "Active Vision Might Be All You Need: Exploring Active Vision in Bimanual Robotic Manipulation". You can visit the Project Page and check out the ArXiv Paper.
AV-ALOHA builds upon the ALOHA 2 system and introduces active vision for bimanual robotic manipulation. This repository includes:
- Teleoperation and data collection
- Training models with LeRobot
- Evaluation on both simulated and real-world AV-ALOHA setups
For the VR teleoperation and stereo camera passthrough functionality, refer to the Unity App Repo.
Note: The code is under active development, and a more organized codebase will be available in future updates.
AV-ALOHA extends ALOHA 2 by adding another ViperX 300 S robot arm. To install the additional arm, we used two 840mm 2020 extrusions with 4 L-brackets. The ZED Mini serves as the active vision camera, attached using custom 3D-printed parts available in `assets/3D_printed_parts`.
- Install ROS Noetic and follow the ALOHA Setup Instructions for software and hardware setup, excluding their repo.
- Bind the active vision robot arm to `/dev/ttyDXL_puppet_middle`.
- Clone this repository:

  ```bash
  cd ~/interbotix_ws/src
  git clone https://github.com/Soltanilara/av-aloha
  cd av-aloha  # enter the cloned repo before pulling submodules
  git submodule init
  git submodule update

  # build ROS packages
  cd ~/interbotix_ws
  catkin_make
  ```
- Set up the Conda environment:

  ```bash
  conda create -y -n lerobot python=3.10
  conda activate lerobot
  conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
  ```

- Install the ZED Python API by following these instructions.
- Install additional dependencies:

  ```bash
  pip install -e gym_guided_vision
  pip install -e lerobot
  pip install -r requirements.txt
  ```
- Create a Firebase project and set up a Firestore database at the Firebase Console.
- In your Firestore database, set the rules as follows:

  ```
  rules_version = '2';
  service cloud.firestore {
    match /databases/{database}/documents {
      match /<your_password_for_webrtc>/{document=**} {
        allow read, write: if true;
      }
    }
  }
  ```

- In Project Settings -> Service Accounts, generate a new private key and name it `serviceAccountKey.json`. Place this file in the `data_collection_scripts` directory.
- Create a file named `signalingSettings.json` in `data_collection_scripts` and paste the following (both files are read by the data collection scripts, as sketched below):

  ```json
  {
    "robotID": "<robot id for your robot (e.g. robot_1)>",
    "password": "<your password same as in firestore rules>",
    "turn_server_url": "<turn url>",
    "turn_server_username": "<turn username>",
    "turn_server_password": "<turn password>"
  }
  ```
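Both JSON files are consumed by the data collection scripts to negotiate the WebRTC connection with the VR headset. Below is a minimal sketch of how they can be loaded and used to reach the Firestore signaling collection, assuming the standard `firebase_admin` client; the exact document layout used by the scripts is an assumption, shown for illustration only.

```python
import json

import firebase_admin
from firebase_admin import credentials, firestore

# Both files live in data_collection_scripts/ as described above.
with open("signalingSettings.json") as f:
    settings = json.load(f)

# Authenticate with the service-account key generated in the Firebase console.
cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

# The Firestore rules only allow reads/writes under the password collection,
# so a signaling document for this robot could live at a path like this
# (hypothetical layout, for illustration only).
doc_ref = db.collection(settings["password"]).document(settings["robotID"])
print("signaling document path:", doc_ref.path)
```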
To record and replay simulated episodes:

```bash
# in data_collection_scripts/
python record_sim_episodes.py --task_name sim_insert_peg --episode_idx 0
python replay_sim_episode.py --task_name sim_insert_peg --num_arms <2 or 3>
```
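To collect a batch of simulated demonstrations, the recording script can simply be invoked in a loop; a minimal sketch, assuming the script name and flags shown above:

```python
import subprocess

# Record ten peg-insertion demonstrations back to back
# (run from data_collection_scripts/).
for idx in range(10):
    subprocess.run(
        [
            "python", "record_sim_episodes.py",
            "--task_name", "sim_insert_peg",
            "--episode_idx", str(idx),
        ],
        check=True,  # abort the batch if any recording fails
    )
```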
To collect real-robot episodes:

- In one terminal, launch the robot:

  ```bash
  # in data_collection_scripts/
  source launch_robot.sh
  ```

- In another terminal, activate the environment and start recording:

  ```bash
  # in data_collection_scripts/
  source activate.sh
  python record_episodes.py --task_name occluded_insertion --episode_idx 0
  ```
To visualize a recorded episode:

```bash
# in data_collection_scripts/
python visualize_episodes.py --hdf5_path path/to/your/hdf5
```
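The recorded HDF5 files can also be inspected directly with `h5py`; a minimal sketch that just walks the file and prints every dataset (the path is hypothetical, and the exact layout depends on the recording script):

```python
import h5py

# Hypothetical path; point this at an episode written by record_episodes.py.
path = "data/occluded_insertion/episode_0.hdf5"

with h5py.File(path, "r") as f:
    # Print every dataset with its shape and dtype. ALOHA-style recordings
    # typically store /action, /observations/qpos and camera streams under
    # /observations/images/<camera_name>.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name:45s} {obj.shape} {obj.dtype}")

    f.visititems(show)
```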
To push a recorded dataset to the Hugging Face Hub:

```bash
# in repo root
huggingface-cli login

python lerobot/lerobot/scripts/push_dataset_to_hub.py \
    --raw-dir path/to/your/dataset \
    --repo-id <hf_id>/<dataset_name> \
    --raw-format aloha_hdf5
```
To browse the pushed dataset:

```bash
# in repo root
python lerobot/lerobot/scripts/visualize_dataset.py \
    --repo-id <hf_id>/<dataset_name> \
    --episode-index 0
```
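A dataset pushed to the Hub can also be loaded programmatically through the bundled LeRobot code; a minimal sketch, assuming the `LeRobotDataset` API in the `lerobot` submodule (the repo id is hypothetical; substitute the `<hf_id>/<dataset_name>` you pushed above):

```python
import torch
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Hypothetical repo id; use your own <hf_id>/<dataset_name>.
dataset = LeRobotDataset("iantc104/sim_insert_peg_3arms")
print(f"{dataset.num_episodes} episodes, {len(dataset)} frames")

# Each item is a dict of tensors (proprioception, actions, camera frames, ...).
loader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=True)
batch = next(iter(loader))
print({k: tuple(v.shape) for k, v in batch.items() if isinstance(v, torch.Tensor)})
```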
Ensure the config names are set correctly by modifying the files in `lerobot/lerobot/configs`. Start training with:

```bash
# in repo root
python lerobot/lerobot/scripts/train.py \
    hydra.run.dir=outputs/train/sim_sew_needle_3arms_zed_static_wrist_act \
    hydra.job.name=sim_sew_needle_3arms_zed_static_wrist_act \
    device=cuda \
    env=sim_sew_needle_3arms \
    policy=zed_static_wrist_act \
    wandb.enable=true
```
Evaluate a trained policy with the LeRobot evaluation script:

```bash
# in repo root
python lerobot/lerobot/scripts/eval.py \
    -p outputs/train/sim_hook_package_2arms_wrist_act/checkpoints \
    --out-dir outputs/eval/sim_hook_package_2arms_wrist_act \
    eval.n_episodes=50 \
    eval.batch_size=10 \
    --save-video
```
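A checkpoint saved under `checkpoints/<step>/pretrained_model` can also be loaded directly in Python; a minimal sketch, assuming an ACT-based policy and the policy classes bundled in the `lerobot` submodule (the checkpoint path is hypothetical):

```python
from lerobot.common.policies.act.modeling_act import ACTPolicy

# Hypothetical checkpoint directory from a training run like the one above.
ckpt_dir = "outputs/train/sim_hook_package_2arms_wrist_act/checkpoints/014000/pretrained_model"
policy = ACTPolicy.from_pretrained(ckpt_dir)
policy.eval()

# Quick sanity check that the weights loaded.
n_params = sum(p.numel() for p in policy.parameters())
print(f"loaded ACT policy with {n_params / 1e6:.1f}M parameters")
```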
- Save your model to Hugging Face:

  ```bash
  # in eval_scripts/
  python save_policy.py \
      --repo_id iantc104/sim_slot_insertion_3arms_zed_wrist_act \
      --checkpoint_dir outputs/train/sim_slot_insertion_3arms_zed_wrist_act/checkpoints/014000/pretrained_model
  ```

- Evaluate using the script in `eval_scripts`:

  ```bash
  # in eval_scripts/
  python eval.py \
      --policy iantc104/sim_slot_insertion_3arms_zed_wrist_act \
      --episode_len 300 \
      --num_episodes 50 \
      --sim_env gym_guided_vision/SlotInsertion-3Arms-v0
  ```
- For the real-world setup, run the same evaluation script without the `--sim_env` flag:

  ```bash
  # in eval_scripts/
  python eval.py \
      --policy iantc104/real_occluded_key_insertion_3arms_zed_act \
      --episode_len 700 \
      --num_episodes 50
  ```
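The simulation environments installed from `gym_guided_vision` can also be driven directly in Python; a minimal sketch, assuming the package registers Gymnasium environments under the ids used in the eval command above:

```python
import gymnasium as gym
import gym_guided_vision  # noqa: F401  (importing registers the AV-ALOHA environments)

# Environment id taken from the --sim_env flag above.
env = gym.make("gym_guided_vision/SlotInsertion-3Arms-v0")
obs, info = env.reset(seed=0)

for _ in range(20):
    action = env.action_space.sample()  # random actions, just to exercise the API
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```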
If you find this work useful, please cite:

```bibtex
@misc{chuang2024activevisionneedexploring,
      title={Active Vision Might Be All You Need: Exploring Active Vision in Bimanual Robotic Manipulation},
      author={Ian Chuang and Andrew Lee and Dechen Gao and Iman Soltani},
      year={2024},
      eprint={2409.17435},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2409.17435},
}
```