This is the latest version of our dataset, built upon GTA-V for expressive human pose and shape estimation. It features multi-person scenes with SMPL-X annotations. In addition to color image sequences, 3D bounding boxes and cropped point clouds (generated from synthetic depth images) are also provided.
A small sample of GTA-Human II can be downloaded from here. To download the full dataset, please see below.
The full set is currently hosted on OpenXLab. We recommend downloading files using the CLI tool:
openxlab dataset download --dataset-repo OpenXDLab/GTA-Human --source-path /gta-human_v2_release --target-path /home/user/
You can selectively download files that you need, for example:
openxlab dataset download --dataset-repo OpenXDLab/GTA-Human --source-path /gta-human_v2_release/images_part_1.7z --target-path /home/user/gta-human_v2_release/
The dataset is also hosted on Hugging Face. Hugging Face uses git-lfs to manage large files, so please make sure you have git-lfs installed. Then, follow the instructions below:
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/caizhongang/GTA-Human # do not pull any large files yet
cd GTA-Human
You may pull all files in GTA-Human II:
git lfs pull --include "gta-human_v2_release/*"
Similarly, you can also selectively download files that you need, for example:
git lfs pull --include "gta-human_v2_release/images_part_1.7z"
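Alternatively, if you prefer a Python workflow, individual files can also be fetched with the huggingface_hub library. This is a minimal sketch, not part of the official instructions: the repo ID is taken from the clone URL above, and you should check the dataset page for exact file paths.
from huggingface_hub import hf_hub_download
# download a single archive from the dataset repo (repo ID assumed from the clone URL above)
local_path = hf_hub_download(
    repo_id='caizhongang/GTA-Human',
    repo_type='dataset',
    filename='gta-human_v2_release/images_part_1.7z',
)
print(local_path)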
Please download the .7z files and place them in the same directory. Note that you may not need the point clouds if you are working on image- or video-based methods.
gta-human_v2_release/
├── images_part_1.7z
├── images_part_2.7z
├── images_part_3.7z
├── images_part_4.7z
├── images_part_5.7z
├── point_clouds_1.7z
├── point_clouds_2.7z
├── point_clouds_3.7z
├── point_clouds_4.7z
└── annotations.7z
Then decompress them:
7z x "*.7z"
The file structure should look like this:
gta-human_v2_release/
├── images/
│ └── seq_xxxxxxx/
│ ├── 00000000.jpeg
│ ├── 00000001.jpeg
│ └── ...
│
├── point_clouds/
│ └── seq_xxxxxxx/
│ ├── bbox_aaaaaa_0000.ply # (bbox of person ID aaaaaa at frame 0)
│ ├── bbox_aaaaaa_0001.ply # (bbox of person ID aaaaaa at frame 1)
│ ├── ...
│ ├── bbox_bbbbbb_0000.ply # (bbox of person ID bbbbbb at frame 0)
│ ├── bbox_bbbbbb_0001.ply # (bbox of person ID bbbbbb at frame 1)
│ ├── ...
│ ├── pcd_aaaaaa_0000.pcd # (point cloud of person ID aaaaaa at frame 0)
│ ├── pcd_aaaaaa_0001.pcd # (point cloud of person ID aaaaaa at frame 1)
│ ├── ...
│ ├── pcd_bbbbbb_0000.pcd # (point cloud of person ID bbbbbb at frame 0)
│ ├── pcd_bbbbbb_0001.pcd # (point cloud of person ID bbbbbb at frame 1)
│ └── ...
│
└── annotations/
└── seq_xxxxxxx/
├── aaaaaa.npz
├── bbbbbb.npz
└── ...
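As a quick sanity check after decompression, you can enumerate the sequences and the per-person annotation files. This is a minimal sketch; the root path is an assumption, so substitute your own.
import glob
import os

root = '/home/user/gta-human_v2_release'  # assumed extraction root

seq_dirs = sorted(glob.glob(os.path.join(root, 'images', 'seq_*')))
print(f'{len(seq_dirs)} sequences found')

# each .npz under annotations/<seq_name>/ corresponds to one person ID
seq_name = os.path.basename(seq_dirs[0])
annot_files = sorted(os.listdir(os.path.join(root, 'annotations', seq_name)))
person_ids = [os.path.splitext(f)[0] for f in annot_files]
print(seq_name, 'person IDs:', person_ids)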
To read the images:
import cv2
color_bgr = cv2.imread('/path/to/xxxxxxx.jpeg')
color_rgb = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2RGB) # cv2 loads BGR; convert if your pipeline expects RGB
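To read a whole sequence in frame order, a minimal sketch (it relies on the zero-padded file names shown above, so lexicographic sorting matches temporal order):
import glob
import cv2

frame_paths = sorted(glob.glob('/path/to/images/seq_xxxxxxx/*.jpeg'))  # zero-padded names sort in frame order
frames = [cv2.imread(p) for p in frame_paths]  # list of BGR arrays, one per frame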
To read the 3D bounding boxes and cropped point clouds:
import open3d as o3d
import numpy as np
point_cloud_o3d = o3d.io.read_point_cloud('/path/to/xxxxxx.pcd') # geometry::PointCloud with n points.
point_cloud = np.array(point_cloud_o3d.points) # np.ndarray of (n, 3)
bounding_box = o3d.io.read_line_set('/path/to/xxxxxx.ply') # geometry::LineSet with 12 lines
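If you need the raw box geometry, the corner coordinates and edge indices can be pulled out of the LineSet with standard open3d accessors. A minimal sketch (the corner ordering is not documented here, so do not rely on a particular convention):
corners = np.asarray(bounding_box.points)  # (8, 3) box corner coordinates
edges = np.asarray(bounding_box.lines)     # (12, 2) pairs of corner indices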
Notes:
- The point clouds are cropped from the scene point cloud (generated from a synthetic depth image) using the 3D bounding boxes. The original depth images and scene point clouds are very large and are hence excluded from the dataset release.
- Only subjects with valid SMPL-X annotation have their point clouds released.
- We truncate points more than 10 m away from the camera, as the typical maximum range of commercial depth sensors does not exceed 10 m. Consequently, subjects more than 10 m away have no bounding boxes or point clouds recorded.
To read the annotations:
import numpy as np
annot = dict(np.load('/path/to/xxxxxxx.npz'))
for key in annot:
    # unwrap 0-d arrays (scalars saved in the .npz) into Python values
    if isinstance(annot[key], np.ndarray) and annot[key].ndim == 0:
        annot[key] = annot[key].item()
Each .npz consists of the following:
{
'is_male': bool,
'ped_action': str,
'fov': float,
'keypoints_2d': np.array of shape (n, 100, 3),
'keypoints_3d': np.array of shape (n, 100, 4),
'occ': np.array of shape (n, 100),
'self_occ': np.array of shape (n, 100),
'num_frames': int,
'weather': str,
'daytime': tuple,
'location_tag': str,
'bbox_xywh': np.array of shape (n, 4),
'is_valid_smplx': bool,
'betas': np.array of shape (n, 10),
'body_pose': np.array of shape (n, 69),
'global_orient': np.array of shape (n, 3),
'transl': np.array of shape (n, 3),
'left_hand_pose': np.array of shape (n, 24),
'right_hand_pose': np.array of shape (n, 24),
}
Notes:
- is_valid_smplx indicates if the subject's annotation has valid SMPL-X parameters.
  - Valid SMPL-X annotations are those with sufficient movement and high-quality fitting.
  - If invalid, SMPL-X parameters are not provided, but other annotations are still available.
  - 3D bounding boxes and cropped point clouds are only available for subjects with valid SMPL-X.
  - There are 35,352 valid sequences and 13,168 invalid sequences.
- fov has a constant value of 50.0.
- keypoints_3d are 3D keypoints provided by the game's API, in (x, y, z, 1.0) format.
- keypoints_2d are the 3D keypoints projected onto the image plane, in (u, v, 1.0) format. The definition of the 100 keypoints can be found in MMHuman3D.
- occ indicates if a keypoint is occluded.
- self_occ indicates if a keypoint is occluded by the person's own body parts.
- daytime uses an (hour, minute, second) convention.
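Putting the notes above together, a typical access pattern looks like this. This is a minimal sketch: it assumes annot was loaded as shown earlier, and that occ/self_occ use nonzero values to mark occluded keypoints, which you should verify on the sample data.
# SMPL-X parameters exist only when the fitting is valid
if annot['is_valid_smplx']:
    betas = annot['betas']                  # (n, 10) shape parameters
    body_pose = annot['body_pose']          # (n, 69) body pose
    global_orient = annot['global_orient']  # (n, 3) root orientation
    transl = annot['transl']                # (n, 3) root translation

# 2D keypoints: drop the trailing constant 1.0 -> (n, 100, 2)
kp2d = annot['keypoints_2d'][..., :2]

# visibility mask: keypoints occluded neither by the scene nor by the body itself
visible = (annot['occ'] == 0) & (annot['self_occ'] == 0)  # (n, 100)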
We provide a pyrender-based visualization tool to overlay 3D SMPL-X annotations on 2D images; the small sample of GTA-Human II linked above is handy for trying it out.
python visualizer_2d.py <--root_dir> <--seq_name> <--body_model_path> <--save_path>
- root_dir (str): root directory in which data is stored.
- seq_name (str): sequence name, in the format 'seq_xxxxxxxx'.
- body_model_path (str): directory in which SMPL body models are stored.
- save_path (str): path to save the visualization video.
Example:
python visualizer_2d.py --root_dir /home/user/gta-human_v2_release --seq_name seq_00087011 --body_model_path /home/user/body_models/ --save_path /home/user/seq_00087011_2dvisual.mp4
We also provide a visualization tool for 3D bounding boxes and cropped point clouds.
python visualizer_3d.py <--root_dir> <--seq_name> <--save_path> [--virtual_cam] [--visualize_smplx] [--body_model_path]
- root_dir (str): root directory in which data is stored.
- seq_name (str): sequence name, in the format 'seq_xxxxxxxx'.
- save_path (str): path to save the visualization video.
- virtual_cam (str, optional): path to load virtual camera pose config. Defaults to assets/virtual_cam.json.
- visualize_smplx (flag, optional): whether to visualize SMPL-X 3D mesh model.
- body_model_path (str, optional): directory in which SMPL-X body models are stored.
Example:
python visualizer_3d.py --root_dir /home/user/gta-human_v2_release --seq_name seq_00087011 --save_path /home/user/seq_00087011_3dvisual.mp4
Note that the SMPL-X body model path should have the following structure:
body_models/
└── smplx/
└── SMPLX_NEUTRAL.npz
The body models may be downloaded from the official website.