Vedant Raval*, Enyu Zhao*, Hejia Zhang, Stefanos Nikolaidis, Daniel Seita
University of Southern California
This repository is a Python implementation of the paper "GPT-Fabric: Folding and Smoothing Fabric by Leveraging Pre-Trained Foundation Models", submitted to IROS 2024. It contains the code used to run the GPT-Fabric simulation experiments for fabric folding. The code for fabric smoothing can be found in the GPT-Fabric-Smoothing repository.
Important Update: According to OpenAI, some of the model checkpoints we used in our experiments (for instance, gpt-4-vision-preview) are being deprecated. We encourage using more recent OpenAI models instead.
- Installation
- Pre-requisites
- GPT-Fabric, zero-shot
- GPT-Fabric, in-context
- License
- Acknowledgements
- Contact
## Installation

This simulation environment is based on SoftGym. You can follow the instructions in SoftGym to set up the simulator.
- Clone this repository.
- Follow the SoftGym instructions to create a conda environment and install PyFlex. A nice blog written by Daniel Seita on using Docker may help you get started with SoftGym.
- Install the following packages in the created conda environment:
  - pytorch and torchvision: `pip install torchvision` or `conda install torchvision -c pytorch`
  - einops: `pip install einops`
  - tqdm: `pip install tqdm`
  - yaml: `pip install PyYaml`
- Before you use the code, make sure the conda environment is activated (`conda activate softgym`) and set up the paths appropriately:

  ```
  export PYFLEXROOT=${PWD}/PyFlex
  export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
  export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH
  ```

  The provided script `prepare_1.0.sh` includes the commands above; a minimal import check is sketched after this list.
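Once the environment and paths are set, one quick sanity check is to confirm that the PyFlex bindings import cleanly. This is only an illustrative snippet (it assumes the `softgym` environment is active and the exports above have been run); it is not part of the repository:

```python
# Minimal, illustrative import check for the PyFlex bindings.
try:
    import pyflex  # built by following the SoftGym/PyFlex instructions above
    print("PyFlex bindings imported successfully.")
except ImportError as err:
    print(f"PyFlex import failed; re-check PYFLEXROOT/PYTHONPATH/LD_LIBRARY_PATH: {err}")
```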
## Pre-requisites

To get started with GPT-Fabric for folding, you will need initial cloth configurations for evaluating the system, as well as demonstration sub-goals for the folding tasks, which the LLM uses to generate instructions.
- To be consistent with prior work, we use the initial evaluation configurations for square- and rectangular-shaped fabric used by Foldsformer. You can find these in `cached configs/`.
- You can also generate configurations yourself by running

  ```
  python generate_configs.py --num_cached 100 --cloth_type square
  ```

  where `--num_cached` specifies the number of configurations to generate and `--cloth_type` specifies the cloth type (square | rectangle | random). These generated initial configurations will be saved in `cached configs/` (see the inspection sketch below).
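  As a rough illustration only (not the repository's actual loading code), and assuming the cached configurations are stored as a single pickle file whose name matches the `--cached` argument, such a file could be inspected along these lines:

  ```python
  import pickle

  # Hypothetical inspection of a cached-configuration file; the filename and the
  # exact contents are assumptions, so adapt this to whatever generate_configs.py
  # actually writes into `cached configs/`.
  with open("cached configs/square.pkl", "rb") as f:
      cached = pickle.load(f)

  print(type(cached))   # e.g., a tuple/list holding configs and initial states
  print(len(cached))    # number of top-level entries
  ```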
- For the demonstration sub-goals used by GPT in the zero-shot setting, we use the same demonstration sub-goal images provided by Foldsformer. You can find these in `data/demo`.
- In order to evaluate the folds achieved by GPT-Fabric, we need to compare the results with folds obtained by an expert system. This expert system can be found in the `Demonstrator` directory and can be run using

  ```
  python generate_demonstrations.py --gui --task DoubleTriangle --img_size 128 --cached square
  python generate_demonstrations.py --gui --task DoubleStraight --img_size 128 --cached rectangle
  python generate_demonstrations.py --gui --task AllCornersInward --img_size 128 --cached square
  python generate_demonstrations.py --gui --task CornersEdgesInward --img_size 128 --cached square
  ```

  where `--task` specifies the task name, `--img_size` specifies the image size captured by the camera in the simulator, and `--cached` specifies the filename of the cached configurations. You can remove `--gui` to run headless. These generated demonstrations will be saved in `data/demonstrations`.
- Note that since the same folding task could be achieved in various ways for the same cloth configuration, we consider all the different possible final cloth configurations corresponding to such a successful fold as per the expert-driven heuristic (aka the `Demonstrator`).
- For each cloth configuration, `0.png` is the top-down image corresponding to the initial state. `{step#}-{fold#}.png` is the top-down image corresponding to the given step number `{step#}` for the specific way of achieving the successful fold, represented as `{fold#}`. The final cloth configuration will be saved as a pickle file named `info-{fold#}.pkl`. To compute the mean particle position error (in mm) for evaluation, we consider the distances of all the possible final cloth configurations from the final cloth configuration achieved by GPT-Fabric and take the minimum of those, as illustrated in the sketch below.
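  As a rough illustration of this evaluation step (not the repository's actual evaluation code; the pickle contents are assumed to be an `(N, 3)` particle-position array), the minimum-over-folds error could be computed along these lines:

  ```python
  import glob
  import pickle
  import numpy as np

  def min_mean_particle_error(achieved_pos, demo_dir):
      """Illustrative sketch: compare achieved particle positions (assumed (N, 3), in
      meters) against every expert final configuration info-{fold#}.pkl and keep the
      minimum mean per-particle distance, reported in millimeters."""
      errors = []
      for path in glob.glob(f"{demo_dir}/info-*.pkl"):
          with open(path, "rb") as f:
              expert_pos = np.asarray(pickle.load(f))  # assumed: (N, 3) particle array
          dists = np.linalg.norm(achieved_pos - expert_pos, axis=1)  # per-particle distance
          errors.append(dists.mean() * 1000.0)  # meters -> millimeters
      return min(errors)
  ```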
## GPT-Fabric, zero-shot

- To reproduce the results obtained by GPT-Fabric (GPT-4, zero-shot):

  ```
  python eval.py --task DoubleTriangle --img_size 128 --gpt_model gpt-4-1106-preview --cached square --total_runs 5 --eval_type zero-shot
  python eval.py --task DoubleStraight --img_size 128 --gpt_model gpt-4-1106-preview --cached rectangle --total_runs 5 --eval_type zero-shot
  python eval.py --task AllCornersInward --img_size 128 --gpt_model gpt-4-1106-preview --cached square --total_runs 5 --eval_type zero-shot
  python eval.py --task CornersEdgesInward --img_size 128 --gpt_model gpt-4-1106-preview --cached square --total_runs 5 --eval_type zero-shot
  ```
- To reproduce the results obtained by GPT-Fabric (GPT-3.5, zero-shot):

  ```
  python eval.py --task DoubleTriangle --img_size 128 --gpt_model gpt-3.5-turbo-0125 --cached square --total_runs 5 --eval_type zero-shot
  python eval.py --task DoubleStraight --img_size 128 --gpt_model gpt-3.5-turbo-0125 --cached rectangle --total_runs 5 --eval_type zero-shot
  python eval.py --task AllCornersInward --img_size 128 --gpt_model gpt-3.5-turbo-0125 --cached square --total_runs 5 --eval_type zero-shot
  python eval.py --task CornersEdgesInward --img_size 128 --gpt_model gpt-3.5-turbo-0125 --cached square --total_runs 5 --eval_type zero-shot
  ```
- The above script runs GPT-Fabric fabric folding for the specified `--task` on each initial cloth configuration in the saved `--cached` configurations. The GPT model to use is specified by `--gpt_model`, and setting `--eval_type` to zero-shot runs the zero-shot version of GPT-Fabric for fabric folding. In our reported results, we ran the system on each initial cloth configuration for a total of `--total_runs` runs; we use five runs to account for the randomness in the LLM's responses. If you just wish to test how the system performs without reproducing our results, you can simply set `--total_runs` to 1. Log files are generated for each test run in the `logs` directory, and the evaluation results for each test run are saved in `eval result/`. For convenience, we organised the saved directories based on the date of the program execution, configuration type, etc.; the results can be organised into a different directory structure with trivial changes. The mean particle position errors for all the cloth configurations across all runs are saved as a 2D numpy array in `position errors/` (see the loading sketch below).
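  A minimal, illustrative way to load and summarise one of these saved arrays (the filename is hypothetical; use whatever path your run produced under `position errors/`):

  ```python
  import numpy as np

  # Load the 2D array of mean particle position errors; rows/columns are assumed to
  # index the runs and the cached configurations respectively.
  errors = np.load("position errors/DoubleTriangle.npy")  # hypothetical filename

  print("shape:", errors.shape)
  print("overall mean error (mm): %.2f" % errors.mean())
  print("per-run mean (mm):", errors.mean(axis=1))
  ```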
- In order to save videos of the generated simulations, you can run the script with `--save_vid`:

  ```
  python eval.py --task DoubleTriangle --img_size 128 --gpt_model gpt-4-1106-preview --cached square --total_runs 5 --eval_type zero-shot --save_vid True
  ```

  The directory for the saved simulation videos can be changed via `--save_video_dir`.
## GPT-Fabric, in-context

- In order to perform in-context learning for GPT-4V before generating instructions, we need to generate some expert demonstrations consisting of the demonstration sub-goal images:

  ```
  python generate_configs.py --num_cached 100 --cloth_type square
  python generate_configs.py --num_cached 100 --cloth_type rectangle
  ```

  The above script will generate new configurations, `square100` and `rectangle100`, different from the evaluation configurations `square` and `rectangle`.

  ```
  python training-examples/generate_demonstrations.py --gui --task DoubleTriangle --img_size 224 --cached square100
  python training-examples/generate_demonstrations.py --gui --task DoubleStraight --img_size 224 --cached rectangle100
  python training-examples/generate_demonstrations.py --gui --task AllCornersInward --img_size 224 --cached square100
  python training-examples/generate_demonstrations.py --gui --task CornersEdgesInward --img_size 224 --cached square100
  ```

  The above script will generate expert demonstrations corresponding to the previously generated cloth configurations in `data/gpt-4v-incontext-demonstrations/`. These demonstrations are hypothesized to improve the instructions GPT-4V generates when analyzing the sub-goal images (a sketch of how such images can be passed to the model is shown below).
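  As a hedged illustration of the mechanism only (this is not the repository's prompt-construction code, and the image path below is hypothetical), sub-goal images are typically passed to a vision-capable model by base64-encoding them into the chat messages, e.g. with the OpenAI Python SDK:

  ```python
  import base64
  from openai import OpenAI

  client = OpenAI()  # expects OPENAI_API_KEY to be set in the environment

  def encode_image(path):
      """Read an image file and return it as a base64 string."""
      with open(path, "rb") as f:
          return base64.b64encode(f.read()).decode("utf-8")

  # Hypothetical path to one demonstration sub-goal image.
  demo_b64 = encode_image("data/gpt-4v-incontext-demonstrations/DoubleTriangle/0.png")

  response = client.chat.completions.create(
      model="gpt-4o",  # a current vision-capable model; older previews are deprecated
      messages=[{
          "role": "user",
          "content": [
              {"type": "text", "text": "Here is a demonstration sub-goal image for the fold."},
              {"type": "image_url",
               "image_url": {"url": f"data:image/png;base64,{demo_b64}"}},
          ],
      }],
  )
  print(response.choices[0].message.content)
  ```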
- In order to perform in-context learning for GPT-4 or GPT-3.5 before generating actions, we need some expert demonstrations represented in natural language. This information can be found in `utils/gpt-demonstrations`. Note that the indices of the actions taken for each config (represented as integer keys) in these JSON files start from 1, and the information corresponding to the 0-index action is simply a placeholder (see the loading sketch below).
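  A minimal, illustrative way to iterate over such a demonstrations JSON while skipping the placeholder entry (the filename and exact schema here are assumptions, not the repository's documented format):

  ```python
  import json

  # Hypothetical path; point this at the relevant file under utils/gpt-demonstrations.
  with open("utils/gpt-demonstrations/DoubleTriangle/demonstrations.json") as f:
      demos = json.load(f)

  for config_id, actions in demos.items():
      for step in sorted(actions, key=int):
          if int(step) == 0:
              continue  # the 0-index entry is only a placeholder
          print(config_id, step, actions[step])
  ```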
- The above-mentioned demonstration JSON files, along with visualizations of the corresponding actions, can be generated via (note that these will overwrite the existing JSONs):

  ```
  python training-examples/generate_gpt_demonstrations.py --gui --task DoubleTriangle --img_size 128 --cached square100
  python training-examples/generate_gpt_demonstrations.py --gui --task DoubleStraight --img_size 128 --cached rectangle100
  python training-examples/generate_gpt_demonstrations.py --gui --task AllCornersInward --img_size 128 --cached square100
  python training-examples/generate_gpt_demonstrations.py --gui --task CornersEdgesInward --img_size 128 --cached square100
  ```
- Having obtained a set of expert demonstrations for the two GPT models to choose from, we randomly select some of them (see `gpt_v_demonstrations` in `gpt_utils.py`) to perform the in-context learning (a toy selection sketch follows).
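  Conceptually, the selection is just a random sample from the pool of available demonstrations; a toy sketch (with a hypothetical pool, not the exact data structure used in `gpt_utils.py`) might look like:

  ```python
  import random

  # Hypothetical pool of demonstration identifiers; the real pool lives in
  # gpt_v_demonstrations inside gpt_utils.py and may be structured differently.
  demo_pool = ["demo_config_3", "demo_config_17", "demo_config_42", "demo_config_88"]

  random.seed(0)                            # optional: make the selection reproducible
  selected = random.sample(demo_pool, k=2)  # pick a small number of in-context examples
  print(selected)
  ```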
- To reproduce the results obtained by GPT-Fabric (GPT-4, in-context):

  ```
  python eval.py --task DoubleTriangle --img_size 128 --gpt_model gpt-4-1106-preview --cached square --total_runs 5 --eval_type in-context
  python eval.py --task DoubleStraight --img_size 128 --gpt_model gpt-4-1106-preview --cached rectangle --total_runs 5 --eval_type in-context
  python eval.py --task AllCornersInward --img_size 128 --gpt_model gpt-4-1106-preview --cached square --total_runs 5 --eval_type in-context
  python eval.py --task CornersEdgesInward --img_size 128 --gpt_model gpt-4-1106-preview --cached square --total_runs 5 --eval_type in-context
  ```
- To reproduce the results obtained by GPT-Fabric (GPT-3.5, in-context):

  ```
  python eval.py --task DoubleTriangle --img_size 128 --gpt_model gpt-3.5-turbo-0125 --cached square --total_runs 5 --eval_type in-context
  python eval.py --task DoubleStraight --img_size 128 --gpt_model gpt-3.5-turbo-0125 --cached rectangle --total_runs 5 --eval_type in-context
  python eval.py --task AllCornersInward --img_size 128 --gpt_model gpt-3.5-turbo-0125 --cached square --total_runs 5 --eval_type in-context
  python eval.py --task CornersEdgesInward --img_size 128 --gpt_model gpt-3.5-turbo-0125 --cached square --total_runs 5 --eval_type in-context
  ```
## Acknowledgements

A lot of this code has been adapted from the repository used for Foldsformer. Feel free to check that out!
## Contact

For any additional questions, feel free to email [email protected]