CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images
Sookwan Han, Hanbyul Joo
International Conference on Computer Vision (ICCV), 2023
Initial code release, including the code for training and evaluation.
To set up the necessary environments for running CHORUS, please refer to the instructions provided here.
A demo Colab notebook is coming soon! (ETA: October 2023)
CHORUS is trained on a generated dataset of human-object interaction images. Here, we provide an example of running the entire dataset generation pipeline for the surfboard category. To create this dataset of images, please activate the appropriate environment beforehand using the following command:
conda activate chorus_gen
CHORUS initially produces multiple HOI prompts for the given category (surfboard) using ChatGPT. You can find example prompts for the surfboard category under the prompts/demo directory. If you wish to generate prompts for other categories or create your own, follow the steps outlined below.
- The OpenAI API relies on API keys for authentication. To generate prompts on your own, you must have your own API key. If you don't have one already, please refer to this link.
- After successfully configuring your API key, execute the following command to generate plausible HOI prompts for the specified surfboard category:
python scripts/generation/generate_prompts.py --categories 'surfboard'
By default, the results will be saved under the prompts/demo directory.
Please note that the OpenAI API does not support random seeding, as mentioned here; hence, the results of prompt generation are not exactly reproducible. For reference, we also provide the prompts used in our paper under the prompts/chorus directory.
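For illustration, here is a minimal sketch of what a single prompt-generation call might look like with the OpenAI Chat Completions API (openai-python < 1.0, the interface available in 2023). The prompt template and model name below are assumptions for illustration only; the actual logic lives in scripts/generation/generate_prompts.py.

```python
# Hedged sketch only: the real prompt template and parsing are in
# scripts/generation/generate_prompts.py.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # your own API key

category = "surfboard"
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # assumed model choice
    messages=[{
        "role": "user",
        "content": f"List 10 short, diverse image captions describing a person "
                   f"interacting with a {category}.",
    }],
)
print(response["choices"][0]["message"]["content"])
```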
To generate the dataset of images from the HOI prompts for the surfboard category, execute the following command:
sh scripts/demo_surfboard_gen.sh $NUM_BATCH_PER_AUGPROMPT $BATCH_SIZE # Default: 20, 6
Please note the following details:
- The generation process typically requires around 9-10 hours when running on a single RTX 3090 GPU.
- If you wish to reduce the generation time (i.e., the number of generated images), consider lowering the $NUM_BATCH_PER_AUGPROMPT argument (default: 20).
- If you encounter a CUDA Out Of Memory (OOM) error, reduce the batch size by adjusting the $BATCH_SIZE argument (default: 6).
- To resume the generation process, simply rerun the command. The program will automatically skip existing samples.
- The generated images will be saved under the results/images directory by default.
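As a rough illustration of what a single generation step in the script above does, here is a minimal text-to-image sketch using Hugging Face diffusers. The checkpoint name, prompt, and output paths are assumptions; the actual settings (augmented prompts, sampler, guidance scale) are defined in the generation scripts.

```python
# Minimal sketch, assuming a Stable Diffusion v1.5 checkpoint; the actual
# generation settings are defined in scripts/demo_surfboard_gen.sh.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a person riding a surfboard on a wave"  # one generated HOI prompt
batch_size = 6  # corresponds to $BATCH_SIZE; lower this if you hit CUDA OOM

images = pipe(prompt, num_images_per_prompt=batch_size).images
for i, image in enumerate(images):
    image.save(f"results/images/surfboard_{i:04d}.png")  # assumed naming scheme
```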
Once the dataset generation is complete, CHORUS aggregates the information from images for 3D HOI reasoning. To execute the aggregation pipeline, please activate the appropriate environment beforehand using the following command:
conda activate chorus_aggr
With the generated dataset in place, you can execute the complete aggregation pipeline for the surfboard category by running the following command:
sh scripts/demo_surfboard_aggr.sh
Please note the following details:
- To resume the aggregation process, simply rerun the command. The program will automatically skip existing samples during the process.
- After the command completes successfully, you can find the visualizations under the results_demo directory!
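To give a feel for what "aggregation" means here, below is a purely conceptual sketch of accumulating per-image object evidence into a voxel grid defined in a human-centric canonical frame. The grid resolution, extent, weighting, and all function names are hypothetical; the actual pipeline run by scripts/demo_surfboard_aggr.sh differs in detail.

```python
# Conceptual sketch only; not the actual CHORUS aggregation code.
import numpy as np

GRID = 64       # hypothetical voxel resolution
EXTENT = 2.0    # assumed half-extent of the canonical volume (meters)
occupancy = np.zeros((GRID, GRID, GRID), dtype=np.float32)
counts = np.zeros_like(occupancy)

def world_to_canonical(points_w, R, t):
    """Map world-space points into the canonical human frame, given the
    estimated human rotation R (3x3) and root translation t (3,)."""
    return (points_w - t) @ R  # row-vector form of R^T (p - t)

def accumulate(points_w, weights, R, t):
    """Splat weighted object-evidence points into the canonical voxel grid."""
    pts = world_to_canonical(points_w, R, t)
    idx = np.floor((pts + EXTENT) / (2 * EXTENT) * GRID).astype(int)
    valid = np.all((idx >= 0) & (idx < GRID), axis=1)
    for (i, j, k), w in zip(idx[valid], weights[valid]):
        occupancy[i, j, k] += w
        counts[i, j, k] += 1.0

# After looping over all generated images, normalize to a per-voxel score.
score = occupancy / np.maximum(counts, 1.0)
```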
For quantitative evaluation, we utilize the extended COCO-EFT dataset as our test dataset. To set up the test dataset, please follow the steps below.
- Download the COCO dataset and the COCO-EFT dataset by running the following command:
sh scripts/download_coco_eft.sh
By default, the COCO dataset will be downloaded to the imports/COCO directory and the COCO-EFT dataset to the imports/eft directory.
- After downloading the datasets, preprocess and extend them by executing the following command:
python scripts/evaluation/extend_eft.py
This script prepares the dataset for evaluation.
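For intuition, the following sketch shows how COCO images containing both a person and a target object category could be paired with their object annotations using pycocotools. The annotation file path, split, and the matching against EFT SMPL fits are assumptions; the actual preprocessing is performed by scripts/evaluation/extend_eft.py.

```python
# Illustrative sketch; extend_eft.py performs the actual preprocessing.
from pycocotools.coco import COCO

ann_file = "imports/COCO/annotations/instances_val2014.json"  # assumed split/path
coco = COCO(ann_file)

surfboard_ids = coco.getCatIds(catNms=["surfboard"])
person_ids = coco.getCatIds(catNms=["person"])

# Images containing both a person and a surfboard
img_ids = set(coco.getImgIds(catIds=surfboard_ids)) & set(coco.getImgIds(catIds=person_ids))
for img_id in sorted(img_ids):
    obj_anns = coco.loadAnns(
        coco.getAnnIds(imgIds=img_id, catIds=surfboard_ids, iscrowd=False)
    )
    # ...pair each object annotation with the EFT SMPL fit of a nearby person
```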
To replicate the results in the paper, you have two options.
Option 1: Download the pretrained results using the following script:
sh scripts/download_pretrained_chorus.sh
This option is recommended if you have limited storage space or want to quickly access the pretrained results.
Option 2: Fully reproduce the results by following the steps below. Note that this process requires at least 5TB of storage to save the generated results.
- Activate the chorus_gen environment:
conda activate chorus_gen
- Generate the dataset for all categories used in the quantitative evaluation by running:
sh scripts/run_quant_gen.sh
Please note that this step may require significant storage space.
- Activate the chorus_aggr environment:
conda activate chorus_aggr
- Aggregate the information from the images by running:
sh scripts/run_quant_aggr.sh
Please note that this step may require significant storage space.
We perform quantitative evaluations for COCO categories using the proposed Projective Average Precision (PAP) metrics. To compute PAP for the reproduced results, run the following command:
python scripts/evaluation/evaluate_pap.py --aggr_setting_names 'quant:full'
This command will calculate PAP for each category based on the reproduced results. To report the mean PAP (mPAP) averaged over all categories, execute the following command:
python scripts/evaluation/report_pap.py
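As a purely conceptual illustration of the kind of score involved, the snippet below computes an average-precision value between a projected 2D probability map and a ground-truth 2D object mask. The arrays here are random placeholders; the exact PAP definition, projection, and thresholds are those of the paper and scripts/evaluation/evaluate_pap.py.

```python
# Conceptual illustration only; the actual PAP computation is in evaluate_pap.py.
import numpy as np
from sklearn.metrics import average_precision_score

proj_prob = np.random.rand(128, 128)                     # placeholder projected likelihood
gt_mask = (np.random.rand(128, 128) > 0.9).astype(int)   # placeholder GT object mask

ap = average_precision_score(gt_mask.ravel(), proj_prob.ravel())
print(f"AP for this sample: {ap:.3f}")
```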
If you find our work helpful or use our code, please consider citing:
@inproceedings{han2023chorus,
title = {Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images},
author = {Han, Sookwan and Joo, Hanbyul},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
year = {2023}
}
- Our codebase builds heavily on several open-source projects. Thanks for open-sourcing!
- We thank Byungjun Kim for valuable insights & comments!
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. However, please note that our code depends on other libraries (e.g., SMPL), which each have their own respective licenses that must also be followed.