Zynade/open-world-3D-det

An open world 3D object detector without human 3D annotations.

Introduction

We present an open world 3D object detector that does not require 3D annotations.

Documentation

Preliminaries

We provide our conda environment to make our results easy to reproduce. To install the required dependencies, run

conda env create --file environment.yml
conda activate zs3d

Datasets

Store your datasets in the datasets/ directory. To benchmark on nuScenes, extract the nuScenes minival data as datasets/nuScenes-mini/. If you already have the data downloaded in another directory, you do not need to create a copy; you can create a symlink instead:

cd datasets
ln -s <path/to/dataset> nuScenes-mini

GroundingSAM

grounding_sam.py implements an inference pipeline for GroundingSAM. The pipeline is encapsulated in the GroundingSAM class, and inference is exposed by calling an instance of the class directly.

Example:

gsam = GroundingSAM()
# Returns per-instance bounding boxes, confidence scores, category labels,
# and segmentation masks for the requested categories.
box, score, label, mask = gsam("path/to/image", ["categories", "to", "detect"])

Additionally, grounding_sam.py provides plot_masks_and_boxes(), a helper for visualizing object bounding boxes, instance segmentation masks, and category labels.
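For intuition, a standalone visualization along these lines can be put together with matplotlib. This is only a sketch, not the repo's plot_masks_and_boxes() implementation (see grounding_sam.py for the actual signature), and it assumes (x0, y0, x1, y1) pixel boxes and binary (H, W) masks:

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
from PIL import Image

def show_detections(image_path, boxes, labels, masks):
    """Overlay boxes (x0, y0, x1, y1), category labels, and binary masks."""
    img = np.array(Image.open(image_path))
    fig, ax = plt.subplots()
    ax.imshow(img)
    for (x0, y0, x1, y1), label, mask in zip(boxes, labels, masks):
        # Draw the box outline and its category label.
        ax.add_patch(patches.Rectangle((x0, y0), x1 - x0, y1 - y0,
                                       fill=False, edgecolor="red"))
        ax.text(x0, y0, label, color="red")
        # Shade the mask region with a translucent fill.
        overlay = np.zeros((*mask.shape, 4))
        overlay[mask > 0] = (1.0, 0.0, 0.0, 0.35)
        ax.imshow(overlay)
    ax.axis("off")
    plt.show()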

Benchmarking on nuScenes

infer_nuscenes.py runs GroundingSAM on each of the validation sequences in nuScenes mini. We store the model predictions in a JSON format similar to the COCO format for 2D object detection, defined as follows:

[
    {
        "sample_data_token" : str,      // nuScenes sample data token
        "category"          : str,      // category label
        "mask"              : RLE,      // binary mask encoded in RLE format, similar to COCO
        "score"             : float     // detector confidence
    },
    ...
]

The sample data token uniquely identifies a sample, i.e., a keyframe in a nuScenes sequence along with the appropriate sensor modality. It serves as an analog to the image_id used in the COCO prediction format.
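With the nuScenes devkit, a sample data token can be resolved back to its source image. A minimal sketch (the token value is a placeholder; adjust dataroot to your environment):

from nuscenes.nuscenes import NuScenes

# Load the mini split of nuScenes.
nusc = NuScenes(version="v1.0-mini", dataroot="datasets/nuScenes-mini", verbose=False)

# Resolve a prediction's sample_data_token to its camera channel and image file.
sd = nusc.get("sample_data", "<sample_data_token>")
print(sd["channel"], sd["filename"])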

Instead of storing category_id, we store the raw category string owing to the open-world problem setting.

By default, the predictions are stored in outputs/nuScenes-mini/mask_results_preds.json.
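Since the masks follow COCO's RLE convention, they can be decoded with pycocotools. A minimal sketch, assuming compressed RLE entries (this snippet is an illustration, not part of the repo):

import json
from pycocotools import mask as mask_utils

with open("outputs/nuScenes-mini/mask_results_preds.json") as f:
    preds = json.load(f)

for p in preds:
    rle = dict(p["mask"])
    if isinstance(rle["counts"], str):
        rle["counts"] = rle["counts"].encode("utf-8")  # pycocotools expects bytes
    binary_mask = mask_utils.decode(rle)  # (H, W) uint8 array
    print(p["sample_data_token"], p["category"], p["score"], binary_mask.shape)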

Running GroundingSAM inference on nuScenes minival takes 12 minutes on one NVIDIA A100 GPU.

To backproject LiDAR points onto the masks produced by GroundingSAM and create 3D bounding boxes, run:

python load_lidar_nuscenes.py

Change the variables at the top of load_lidar_nuscenes.py according to your environment.
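The core operation in this step is projecting LiDAR points into the camera image and keeping those that land inside a predicted mask. The sketch below illustrates the idea using the nuScenes devkit; it is not the repo's implementation, and points_in_mask and its arguments are illustrative:

import numpy as np
from pyquaternion import Quaternion
from nuscenes.utils.data_classes import LidarPointCloud
from nuscenes.utils.geometry_utils import view_points

def points_in_mask(nusc, lidar_token, cam_token, mask):
    """Return camera-frame LiDAR points whose projections fall inside mask."""
    lidar_sd = nusc.get("sample_data", lidar_token)
    cam_sd = nusc.get("sample_data", cam_token)
    pc = LidarPointCloud.from_file(nusc.get_sample_data_path(lidar_token))

    # LiDAR frame -> ego frame at the LiDAR timestamp.
    cs = nusc.get("calibrated_sensor", lidar_sd["calibrated_sensor_token"])
    pc.rotate(Quaternion(cs["rotation"]).rotation_matrix)
    pc.translate(np.array(cs["translation"]))

    # Ego frame -> global frame.
    pose = nusc.get("ego_pose", lidar_sd["ego_pose_token"])
    pc.rotate(Quaternion(pose["rotation"]).rotation_matrix)
    pc.translate(np.array(pose["translation"]))

    # Global frame -> ego frame at the camera timestamp.
    pose = nusc.get("ego_pose", cam_sd["ego_pose_token"])
    pc.translate(-np.array(pose["translation"]))
    pc.rotate(Quaternion(pose["rotation"]).rotation_matrix.T)

    # Ego frame -> camera frame.
    cs = nusc.get("calibrated_sensor", cam_sd["calibrated_sensor_token"])
    pc.translate(-np.array(cs["translation"]))
    pc.rotate(Quaternion(cs["rotation"]).rotation_matrix.T)

    # Project into the image plane and keep in-bounds points inside the mask.
    depths = pc.points[2, :]
    uv = view_points(pc.points[:3, :], np.array(cs["camera_intrinsic"]), normalize=True)
    u, v = uv[0].astype(int), uv[1].astype(int)

    h, w = mask.shape
    valid = (depths > 1.0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    keep = valid.copy()
    keep[valid] = mask[v[valid], u[valid]] > 0
    return pc.points[:3, keep]

A 3D box can then be fit to the surviving points per detection.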

To evaluate the predictions, run the following, adjusting --dataroot (and the other flags) to match your environment:

python eval_custom.py outputs/nuScenes-mini/predictions_naive.json --eval_set mini_val --version v1.0-trainval --dataroot /data2/mehark/nuScenes/nuScenes/ --verbose 10

Acknowledgement

Contributors

Atharv Goel | contact

Mehar Khurana | contact

Prakhar Gupta | contact
