Skip to content

Latest commit

 

History

History
48 lines (43 loc) · 2.21 KB

format.md

File metadata and controls

48 lines (43 loc) · 2.21 KB

Talk2BEV-Base Format

Talk2BEV consists 3 parts: crops, cam_imgs, and scene. The folder structure should look like this:

- Talk2BEV/
    - TOKEN_NUM
        - cam_imgs/
            - 1_cimg.npy: perspective 1
            - 2_cimg.npy: perspective 2
            - ...
        - crops/
            - 1_matched_imgs.npy: object 1 crop
            - 2_matched_imgs.npy: object 2 crop
            - ...
        - scene/
            - answer_blip2_gt.json: ground truth scene object with captions using blip2
            - answer_blip2_pred.json: predicted truth scene object with captions using blip2
            - answer_minigpt4_gt.json: ground truth scene object with captions using minigpt4
            - answer_minigpt4_pred.json: predicted scene object with captions using minigpt4
            - answer_instructblip2_gt.json: ground truth scene object with captions using instructblip2
            - answer_instructblip2_gt.json: predicted truth scene object with captions using instructblip2
        - bev_gt.png: ground truth bev
        - bev_pred.png: predicted bev

The TOKEN is the NuScenes scene token ID, and NUM is the number of the scene. The folder crops contains the crop images of the objects. The folder cam_imgs contains the perspective images. The folder scene contains the ground truth and predicted scene objects. The files bev_gt.png and bev_pred.png are the ground truth and predicted BEV images. The BEV images are RGB images, with Blue (0, 0, 255) as background

Scene-Object

This is how a scene is encoded within an object:

[
    {
        "object_id": 1, # ID of this object
        "bev_centroid": [5, -5], # BEV centroid of this object",
        "bev_area": 10, # BEV area of this object in pixels
        "matched_coords": [[-5, -5], [6, 6],.. [12, 10]], # matched coordinates of this object
        "matched_cam": "CAM_FRONT" # matched camera for this object
        "matched_point": [800, 900], # matched point in the matched camera
        "annotation": {...}, # nuscenes annotation for this object - containing token ID, category etc.
        "MODEL_crop_lights_1": "th
        "MODEL_crop_lights_1": "thi",
    },
    ...
]