We follow PyTorch3D cameras. The camera extrinsic matrix (`R`, `T`) is defined as the world-to-camera (world-to-view) transformation and uses right matrix multiplication, whereas the intrinsic matrix uses left matrix multiplication. Our interface also provides an `opencv` convention that defines the camera the same way as an OpenCV camera, which may be helpful if you are more familiar with it.
-
Slice cameras:
In mmhuman3d, the recommended way to initialize a camera is by passing the `K`, `R`, `T` matrices directly. You can slice the cameras by index, and you can also concatenate cameras along the batch dimension.

```python
import torch
from mmhuman3d.core.cameras import PerspectiveCameras

K = torch.eye(4, 4)[None]
R = torch.eye(3, 3)[None]
T = torch.zeros(100, 3)
# The batch sizes of K, R, T should either be equal or be 1.
# The final batch size will be the largest one.
cam = PerspectiveCameras(K=K, R=R, T=T)
assert cam.R.shape == (100, 3, 3)
assert cam.K.shape == (100, 4, 4)
assert cam.T.shape == (100, 3)
# Slicing a camera batch slices K, R, T accordingly.
assert (cam[:10].K == cam.K[:10]).all()
```
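The batch rule in the comment above (each batch size is either 1 or equal to the others, and the final size is the largest) can be sketched with NumPy. `broadcast_camera_batch` is a hypothetical helper, not the mmhuman3d implementation; it only illustrates how batch-1 inputs would be expanded to the largest batch.

```python
import numpy as np


def broadcast_camera_batch(K, R, T):
    """Expand K (B, 4, 4), R (B, 3, 3), T (B, 3) to a common batch size.

    Hypothetical helper: every batch size must be either 1 or equal to
    the largest batch size among the three inputs.
    """
    sizes = [K.shape[0], R.shape[0], T.shape[0]]
    batch = max(sizes)
    for size in sizes:
        if size not in (1, batch):
            raise ValueError(f'Incompatible batch sizes: {sizes}')
    # np.broadcast_to returns read-only views; copy to get writable arrays.
    K = np.broadcast_to(K, (batch,) + K.shape[1:]).copy()
    R = np.broadcast_to(R, (batch,) + R.shape[1:]).copy()
    T = np.broadcast_to(T, (batch,) + T.shape[1:]).copy()
    return K, R, T


K, R, T = broadcast_camera_batch(
    np.eye(4)[None], np.eye(3)[None], np.zeros((100, 3)))
print(K.shape, R.shape, T.shape)  # (100, 4, 4) (100, 3, 3) (100, 3)
# Slicing the batch dimension selects the first 10 cameras.
print(K[:10].shape)  # (10, 4, 4)
```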
-
Build cameras:
Cameras are registered with `mmcv.Registry`, so they can be built from a config dict with `build_cameras`. In mmhuman3d, the recommended way to initialize a camera is by passing the `K`, `R`, `T` matrices directly, but you also have the option to pass `focal_length` and `principal_point` as the input. Take the commonly used `PerspectiveCameras` as an example. If `K`, `R`, `T` are not specified, `K` will be the default matrix computed by `compute_default_projection_matrix` with the default `focal_length` and `principal_point`, `R` will be an identity matrix, and `T` will be zeros. You can also override these defaults by passing the parameters of `compute_default_projection_matrix`.

```python
import torch
from mmhuman3d.core.cameras import PerspectiveCameras, build_cameras

# Initialize a perspective camera with given K, R, T matrices.
# It is recommended that the batch sizes of K, R, T either be equal or be 1.
K = torch.eye(4, 4)[None]
R = torch.eye(3, 3)[None]
T = torch.zeros(10, 3)

height, width = 1000, 1000
cam1 = build_cameras(
    dict(
        type='PerspectiveCameras',
        K=K,
        R=R,
        T=T,
        in_ndc=True,
        image_size=(height, width),
        convention='opencv',
    ))
# This is the same as:
cam2 = PerspectiveCameras(
    K=K,
    R=R,
    T=T,
    in_ndc=True,
    image_size=1000,  # A single number represents a square image.
    convention='opencv',
)
assert cam1.K.shape == cam2.K.shape == (10, 4, 4)
assert cam1.R.shape == cam2.R.shape == (10, 3, 3)
assert cam1.T.shape == cam2.T.shape == (10, 3)

# Initialize a perspective camera with a specific `image_size`,
# `principal_point` and `focal_length`.
# `in_ndc=False` means the intrinsic matrix `K` is defined in screen space,
# so `focal_length` and `principal_point` are given in pixels.
# Here the principal point is (500, 500) pixels and the focal length is
# 1000 pixels.
cam = build_cameras(
    dict(
        type='PerspectiveCameras',
        in_ndc=False,
        image_size=(1000, 1000),
        principal_point=(500, 500),
        focal_length=1000,
        convention='opencv',
    ))
assert (cam.K[0] == torch.Tensor([[1000., 0., 500., 0.],
                                  [0., 1000., 500., 0.],
                                  [0., 0., 0., 1.],
                                  [0., 0., 1., 0.]])).all()

# Initialize a weak perspective camera with given K, R, T.
# WeakPerspectiveCameras only supports `in_ndc=True`.
cam = build_cameras(
    dict(
        type='WeakPerspectiveCameras',
        K=K,
        R=R,
        T=T,
        image_size=(1000, 1000),
    ))

# If no `K`, `R`, `T` is provided, initialize an `in_ndc` perspective
# camera with the default matrices.
cam = build_cameras(
    dict(
        type='PerspectiveCameras',
        in_ndc=True,
        image_size=(1000, 1000),
    ))
# Then convert it to screen space. This operation requires `image_size`.
cam.to_screen_()
```
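Building a camera from a config dict follows the usual mmcv registry pattern: classes are registered under a name, the builder pops `type` from the config, looks the class up, and passes the remaining keys as keyword arguments. The sketch below is a minimal, hypothetical stand-in for that pattern. It is not `mmcv.Registry` itself, and `ToyPerspectiveCamera` is made up for illustration.

```python
class Registry:
    """Minimal name -> class registry illustrating the mmcv.Registry idea."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, name=None):
        def _register(cls):
            key = name or cls.__name__
            if key in self._module_dict:
                raise KeyError(f'{key} is already registered in {self.name}')
            self._module_dict[key] = cls
            return cls
        return _register

    def build(self, cfg):
        args = dict(cfg)  # copy so the caller's config is not modified
        obj_type = args.pop('type')
        if obj_type not in self._module_dict:
            raise KeyError(f'{obj_type} is not registered in {self.name}')
        return self._module_dict[obj_type](**args)


CAMERAS = Registry('camera')


@CAMERAS.register_module()
class ToyPerspectiveCamera:
    """Hypothetical camera class used only to demonstrate the registry."""

    def __init__(self, focal_length=1.0, principal_point=(0.0, 0.0),
                 in_ndc=True):
        self.focal_length = focal_length
        self.principal_point = principal_point
        self.in_ndc = in_ndc


cfg = dict(type='ToyPerspectiveCamera', focal_length=1000,
           principal_point=(500, 500), in_ndc=False)
cam = CAMERAS.build(cfg)
print(cam.focal_length, cam.principal_point, cam.in_ndc)  # 1000 (500, 500) False
```

Copying the config before popping `type` keeps the same dict reusable for building several cameras.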
-
Perspective:
Format of the intrinsic matrix, where (`fx`, `fy`) is `focal_length` and (`px`, `py`) is `principal_point`:

```
K = [
    [fx, 0, px, 0],
    [0, fy, py, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
```

For details, refer to PyTorch3D.
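As a worked example of the perspective `K` above, the NumPy sketch below multiplies homogeneous camera-space points by `K` from the left and divides by the last component. The helper name `project_perspective` and the point values are illustrative only; the sketch ignores the extrinsics and any axis-convention flips.

```python
import numpy as np


def project_perspective(K, points):
    """Project (N, 3) camera-space points with a 4x4 perspective K.

    K is applied by left multiplication to homogeneous column vectors
    [x, y, z, 1], so the rows of `points` are transposed first.
    """
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    proj = (K @ homo.T).T  # (N, 4): [fx*x + px*z, fy*y + py*z, 1, z]
    return proj / proj[:, 3:4]  # divide by z


fx = fy = 1000.0
px = py = 500.0
K = np.array([
    [fx, 0, px, 0],
    [0, fy, py, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])
out = project_perspective(K, np.array([[0.1, 0.2, 2.0]]))
# x = 1000 * 0.1 / 2 + 500 = 550, y = 1000 * 0.2 / 2 + 500 = 600,
# third entry = 1 / z = 0.5, last entry = 1.
```

The third row `[0, 0, 0, 1]` together with the fourth row `[0, 0, 1, 0]` is what turns the third output component into `1 / z` after the division.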
-
WeakPerspective:
Format of the intrinsic matrix:

```
K = [
    [sx*r, 0, 0, tx*sx*r],
    [0, sy, 0, ty*sy],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
```

`WeakPerspectiveCameras` is in fact an orthographic camera, mainly used for SMPL(X) projection. For details, refer to mmhuman3d cameras. The intrinsic matrix can be converted from the camera parameters predicted along with SMPL by:

```python
from mmhuman3d.core.cameras import WeakPerspectiveCameras

K = WeakPerspectiveCameras.convert_orig_cam_to_matrix(orig_cam)
```

Here `orig_cam` is an array/tensor of shape (frame, 4) consisting of [scale_x, scale_y, transl_x, transl_y]. See VIBE for details.
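To see why the weak perspective camera is orthographic, the sketch below builds `K` exactly as laid out above from `[sx, sy, tx, ty]` and applies it to homogeneous points: x and y are scaled and shifted, while z passes through unchanged and never divides x or y. The factor `r` is not defined in this section, so the sketch treats it as a free parameter defaulting to 1 (an assumption). `weak_perspective_K` is a hypothetical name, not `convert_orig_cam_to_matrix`.

```python
import numpy as np


def weak_perspective_K(sx, sy, tx, ty, r=1.0):
    """Build the 4x4 weak perspective K laid out in the text.

    `r` is left as a free factor (assumed to be 1.0 by default).
    """
    return np.array([
        [sx * r, 0, 0, tx * sx * r],
        [0, sy, 0, ty * sy],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ])


K = weak_perspective_K(sx=0.5, sy=0.5, tx=0.1, ty=-0.2)
points = np.array([[0.3, 0.4, 5.0],
                   [0.3, 0.4, 50.0]])  # same x, y at two different depths
homo = np.concatenate([points, np.ones((2, 1))], axis=1)
proj = (K @ homo.T).T
# x' = sx * r * (x + tx) = 0.2 and y' = sy * (y + ty) = 0.1 for both rows:
# the depth does not change the projected position.
xy = proj[:, :2]
```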
-
FoVPerspective:
Format of the intrinsic matrix:

```
K = [
    [s1, 0, w1, 0],
    [0, s2, h1, 0],
    [0, 0, f1, f2],
    [0, 0, 1, 0],
]
```
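For reference, the sketch below fills in the entries of this `K` the way we understand PyTorch3D's `FoVPerspectiveCameras` to do for a symmetric frustum: with `tan_half = tan(fov / 2)`, `s1 = 1 / (tan_half * aspect_ratio)`, `s2 = 1 / tan_half`, `w1 = h1 = 0`, `f1 = zfar / (zfar - znear)` and `f2 = -zfar * znear / (zfar - znear)`. Treat it as an assumption-laden sketch (the function names and the `aspect_ratio` handling are ours) and consult PyTorch3D for the authoritative formula.

```python
import numpy as np


def fov_perspective_K(fov_deg, aspect_ratio, znear, zfar):
    """4x4 FoV perspective matrix for a symmetric frustum (sketch)."""
    tan_half = np.tan(np.radians(fov_deg) / 2)
    s1 = 1.0 / (tan_half * aspect_ratio)
    s2 = 1.0 / tan_half
    w1 = h1 = 0.0  # a symmetric frustum has no off-center shift
    f1 = zfar / (zfar - znear)
    f2 = -(zfar * znear) / (zfar - znear)
    return np.array([
        [s1, 0, w1, 0],
        [0, s2, h1, 0],
        [0, 0, f1, f2],
        [0, 0, 1, 0],
    ])


def to_ndc(K, point):
    """Apply K to [x, y, z, 1] and divide by the last component."""
    x, y, z, w = K @ np.append(point, 1.0)
    return np.array([x, y, z]) / w


K = fov_perspective_K(fov_deg=90.0, aspect_ratio=1.0, znear=1.0, zfar=100.0)
# A point on the near plane maps to depth 0, one on the far plane to depth 1.
near = to_ndc(K, [1.0, 0.0, 1.0])  # x lies on the edge of a 90 degree FoV
far = to_ndc(K, [0.0, 0.0, 100.0])
```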
`s1`, `s2`, `w1`, `h1`, `f1`, `f2` are defined by the FoV parameters (`fov`, `znear`, `zfar`, etc.). For details, refer to PyTorch3D.
-
Orthographic:
Format of the intrinsic matrix:

```
K = [
    [fx, 0, 0, px],
    [0, fy, 0, py],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
```

For details, refer to PyTorch3D.
-
FoVOrthographic:
Format of the intrinsic matrix:

```
K = [
    [scale_x, 0, 0, -mid_x],
    [0, scale_y, 0, -mid_y],
    [0, 0, -scale_z, -mid_z],
    [0, 0, 0, 1],
]
```

`scale_x`, `scale_y`, `scale_z`, `mid_x`, `mid_y`, `mid_z` are defined by the FoV parameters (`min_x`, `min_y`, `max_x`, `max_y`, `znear`, `zfar`, etc.). For related information, refer to PyTorch3D.
-
Convert between different cameras:
We denote the intrinsic matrix as `K`, the rotation matrix as `R` and the translation vector as `T`. Different camera conventions have different axis directions, and some use left matrix multiplication while others use right matrix multiplication. Intrinsic and extrinsic matrices should follow the same multiplication convention, but some conventions, like PyTorch3D, use right matrix multiplication during computation but take a left-multiplication `K` when initializing the cameras (mainly for easier understanding). Conversion between NDC (normalized device coordinates) and screen space also affects the intrinsic matrix; this is independent of the camera convention but should be taken into account as well. If you want to use an existing convention, choose from `['opengl', 'opencv', 'pytorch3d', 'pyrender', 'open3d']`. For example, to convert an OpenCV-calibrated camera to a PyTorch3D camera defined in NDC for rendering, you can do:

```python
import torch
from mmhuman3d.core.conventions.cameras import convert_cameras

K = torch.eye(4, 4)[None]
R = torch.eye(3, 3)[None]
T = torch.zeros(10, 3)
height, width = 1080, 1920
K, R, T = convert_cameras(
    K=K,
    R=R,
    T=T,
    in_ndc_src=False,
    in_ndc_dst=True,
    resolution_src=(height, width),
    convention_src='opencv',
    convention_dst='pytorch3d')
```
Input `K` can be `None` or an array/tensor of shape (batch_size, 3, 3) or (batch_size, 4, 4). Input `R` can be `None` or an array/tensor of shape (batch_size, 3, 3). Input `T` can be `None` or an array/tensor of shape (batch_size, 3). If the original `K` is `None`, it will remain `None`. If the original `R` is `None`, it will be set to an identity matrix. If the original `T` is `None`, it will be set to zeros. Please refer to PyTorch3D for more information about cameras in NDC and screen space.
-
Define your new camera convention:
If you want to use a new convention, define it in `CAMERA_CONVENTION_FACTORY` by specifying, in order, the axes that point right, up, and off the screen. E.g., the first camera below is PyRender, whose convention should be '+x+y+z' (the '+' can be omitted). The second is OpenCV, whose convention should be '+x-y-z'. The third is PyTorch3D, whose convention should be '-xyz'.
```
OpenGL(PyRender)        OpenCV                  PyTorch3D

      y                      z                          y
      |                     /                           |
      |                    /                            |
      |_______x           /________x          x________ |
     /                    |                            /
    /                     |                           /
 z /                    y |                        z /
```
Conversion functions are also defined in `conventions.cameras`.
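The convention strings above can be read as a signed permutation matrix: row i says which camera axis, with which sign, points along the i-th reference direction (right, up, off screen). The sketch below uses hypothetical helpers `axis_matrix` and `convert_points` (not the functions in `conventions.cameras`) to parse such strings and convert camera-space points between two conventions. It only handles axis directions, not intrinsics or NDC scaling.

```python
import re

import numpy as np


def axis_matrix(convention):
    """Parse e.g. '+x-y-z' or '-xyz' into a 3x3 signed permutation matrix."""
    tokens = re.findall(r'([+-]?)([xyz])', convention)
    axes = [axis for _, axis in tokens]
    consumed = sum(len(sign) + 1 for sign, _ in tokens)
    if sorted(axes) != ['x', 'y', 'z'] or consumed != len(convention):
        raise ValueError(f'Invalid convention string: {convention!r}')
    M = np.zeros((3, 3))
    for row, (sign, axis) in enumerate(tokens):
        M[row, 'xyz'.index(axis)] = -1.0 if sign == '-' else 1.0
    return M


def convert_points(points, src, dst):
    """Convert (N, 3) camera-space points from convention `src` to `dst`.

    Both conventions describe the same reference directions, so
    M_src @ p_src == M_dst @ p_dst, i.e. p_dst = M_dst.T @ M_src @ p_src.
    """
    M = axis_matrix(dst).T @ axis_matrix(src)
    return points @ M.T


p_opencv = np.array([[1.0, 2.0, 3.0]])
# OpenCV ('+x-y-z') to PyRender/OpenGL ('+x+y+z'): y and z flip sign.
p_gl = convert_points(p_opencv, '+x-y-z', '+x+y+z')  # -> [[1, -2, -3]]
```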
-
NDC & screen:
```python
from mmhuman3d.core.conventions.cameras import (convert_ndc_to_screen,
                                                convert_screen_to_ndc)

K = convert_ndc_to_screen(K, resolution=(1080, 1920), is_perspective=True)
K = convert_screen_to_ndc(K, resolution=(1080, 1920), is_perspective=True)
```
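The idea behind these two functions can be sketched for the focal length and principal point alone. The sketch assumes that NDC spans [-1, 1] along the shorter image side (as in PyTorch3D) and ignores axis sign flips, which belong to the camera convention; the exact scaling used by `convert_ndc_to_screen`/`convert_screen_to_ndc` may differ, and the helper names are hypothetical.

```python
import numpy as np


def screen_to_ndc(focal, principal, resolution):
    """Map pixel focal length / principal point (x, y) to NDC units.

    Assumes NDC spans [-1, 1] along the shorter side and ignores axis flips.
    `resolution` is (height, width), as in the calls above.
    """
    height, width = resolution
    half = min(height, width) / 2.0
    center = np.array([width / 2.0, height / 2.0])
    return np.asarray(focal) / half, (np.asarray(principal) - center) / half


def ndc_to_screen(focal, principal, resolution):
    """Inverse of `screen_to_ndc` under the same assumptions."""
    height, width = resolution
    half = min(height, width) / 2.0
    center = np.array([width / 2.0, height / 2.0])
    return np.asarray(focal) * half, np.asarray(principal) * half + center


focal_ndc, principal_ndc = screen_to_ndc(
    focal=(1000.0, 1000.0), principal=(960.0, 540.0), resolution=(1080, 1920))
# half = 540, so the focal length becomes 1000 / 540 in NDC units and a
# principal point at the image center becomes (0, 0).
focal_px, principal_px = ndc_to_screen(focal_ndc, principal_ndc, (1080, 1920))
```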
-
3x3 & 4x4 intrinsic matrix:

```python
from mmhuman3d.core.conventions.cameras import (convert_K_3x3_to_4x4,
                                                convert_K_4x4_to_3x3)

K = convert_K_3x3_to_4x4(K, is_perspective=True)
K = convert_K_4x4_to_3x3(K, is_perspective=True)
```
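Based on the 4x4 layouts listed earlier for perspective and orthographic cameras, one plausible mapping from a standard 3x3 matrix `[[fx, 0, px], [0, fy, py], [0, 0, 1]]` is sketched below. The helpers `k3_to_k4`/`k4_to_k3` are hypothetical and may not place every entry exactly as `convert_K_3x3_to_4x4` does.

```python
import numpy as np


def k3_to_k4(K3, is_perspective=True):
    """Lift a 3x3 intrinsic matrix into the 4x4 layouts shown earlier."""
    fx, fy = K3[0, 0], K3[1, 1]
    px, py = K3[0, 2], K3[1, 2]
    if is_perspective:
        return np.array([
            [fx, 0, px, 0],
            [0, fy, py, 0],
            [0, 0, 0, 1],
            [0, 0, 1, 0],
        ])
    return np.array([
        [fx, 0, 0, px],
        [0, fy, 0, py],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
    ])


def k4_to_k3(K4, is_perspective=True):
    """Recover fx, fy, px, py from a 4x4 matrix and rebuild the 3x3 form."""
    col = 2 if is_perspective else 3  # column holding the principal point
    fx, fy = K4[0, 0], K4[1, 1]
    px, py = K4[0, col], K4[1, col]
    return np.array([
        [fx, 0, px],
        [0, fy, py],
        [0, 0, 1],
    ], dtype=float)


K3 = np.array([[1000.0, 0, 960.0], [0, 1000.0, 540.0], [0, 0, 1]])
K4 = k3_to_k4(K3)
round_trip = k4_to_k3(K4)
```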
-
world & view:
Convert between world and view coordinates.

```python
from mmhuman3d.core.conventions.cameras import convert_world_view

R, T = convert_world_view(R, T)
```
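Under the PyTorch3D row-vector convention, `X_view = X_world @ R + T`, so the inverse transform has rotation `R.T` and translation `-T @ R.T`. The NumPy sketch below (hypothetical `invert_extrinsics`) shows this algebra for a single camera; we assume, but have not confirmed, that this is what `convert_world_view` computes.

```python
import numpy as np


def invert_extrinsics(R, T):
    """Invert X_out = X_in @ R + T (row vectors, R orthonormal)."""
    R_inv = R.T
    T_inv = -T @ R.T
    return R_inv, T_inv


theta = np.radians(30.0)
R = np.array([
    [np.cos(theta), 0.0, np.sin(theta)],
    [0.0, 1.0, 0.0],
    [-np.sin(theta), 0.0, np.cos(theta)],
])  # rotation about the y axis
T = np.array([0.5, -1.0, 3.0])

points_world = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
points_view = points_world @ R + T
R_inv, T_inv = invert_extrinsics(R, T)
recovered = points_view @ R_inv + T_inv  # back to world coordinates
```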
-
weakperspective & perspective:
Convert between weak perspective and perspective cameras. `zmean`, the mean depth used for the approximation, is needed. `WeakPerspectiveCameras` is defined in NDC, so you should pass `resolution` if the perspective camera is not in NDC.

```python
from mmhuman3d.core.conventions.cameras import (
    convert_perspective_to_weakperspective,
    convert_weakperspective_to_perspective)

K = convert_perspective_to_weakperspective(
    K, zmean, in_ndc=False, resolution=resolution, convention='opencv')
K = convert_weakperspective_to_perspective(
    K, zmean, in_ndc=False, resolution=resolution, convention='pytorch3d')
```
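The reason a mean depth is needed: a weak perspective camera replaces each point's own depth `z` with one shared depth, so the perspective projection `f * x / z + p` is approximated by `(f / zmean) * x + p`. The sketch below (made-up numbers, screen space, no axis flips) shows that the approximation is close when the depth variation is small relative to `zmean`. It is not the formula used inside the mmhuman3d converters.

```python
import numpy as np

f, p = 1000.0, 500.0  # focal length and principal point in pixels
points = np.array([
    [0.2, 10.0],   # (x, z) pairs with depths close to the mean depth
    [-0.3, 10.2],
    [0.1, 9.8],
])
x, z = points[:, 0], points[:, 1]
zmean = z.mean()

u_perspective = f * x / z + p
u_weak = (f / zmean) * x + p  # one shared scale f / zmean for all points
max_error = np.abs(u_perspective - u_weak).max()  # about 0.6 pixels here
```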
-
Project 3D coordinates to screen:
```python
points_xydepth = cameras.transform_points_screen(points)
points_xy = points_xydepth[..., :2]
```
-
Compute depth of points:
You can simply convert the points to view coordinates and take the z value as the depth. An example can be found in `DepthRenderer`.

```python
points_depth = cameras.compute_depth_of_points(points)
```
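The same idea in plain NumPy: transform world points into view coordinates with the row-vector convention `X_view = X_world @ R + T` and read off the z column. `depth_of_points` is a hypothetical helper for illustration, not `compute_depth_of_points` itself.

```python
import numpy as np


def depth_of_points(points_world, R, T):
    """Return the view-space z value of each (N, 3) world point."""
    points_view = points_world @ R + T
    return points_view[:, 2]


R = np.eye(3)
T = np.array([0.0, 0.0, 5.0])  # shifts every point 5 units along view z
points = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
depth = depth_of_points(points, R, T)  # -> [5.0, 6.0]
```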
-
Compute normal of meshes:
Use PyTorch3D to compute the normals of meshes. An example can be found in `NormalRenderer`.

```python
normals = cameras.compute_normal_of_meshes(meshes)
```
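Geometrically, a mesh normal comes from the cross product of two triangle edges. The NumPy sketch below computes per-face unit normals for a triangle list; it only illustrates the geometry and is not the PyTorch3D or mmhuman3d code path.

```python
import numpy as np


def face_normals(verts, faces):
    """Unit normal of each triangle; direction follows the right-hand rule.

    verts: (V, 3) float array, faces: (F, 3) integer array of vertex indices.
    """
    v0, v1, v2 = (verts[faces[:, i]] for i in range(3))
    normals = np.cross(v1 - v0, v2 - v0)
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)


verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [0, 2, 1]])  # same triangle, opposite winding
normals = face_normals(verts, faces)  # -> [[0, 0, 1], [0, 0, -1]]
```

Reversing the vertex order flips the normal, which is why consistent face winding matters when rendering normals.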
-
Get camera plane normal:
Get the normalized normal tensor that points out of the camera plane from the camera center.

```python
normals = cameras.get_camera_plane_normals()
```
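One way to obtain such a direction, assuming the row-vector convention `X_view = X_world @ R + T` and that the plane normal is the camera's +z viewing axis, is to express that axis in world coordinates: solving `d @ R = [0, 0, 1]` gives `d = R[:, 2]`. Whether `get_camera_plane_normals` uses +z or -z depends on the camera convention, so treat the hypothetical helper below as a sketch.

```python
import numpy as np


def camera_plane_normal(R):
    """World direction of the view +z axis, for X_view = X_world @ R + T."""
    d = R[:, 2]  # solves d @ R = [0, 0, 1] because R is orthonormal
    return d / np.linalg.norm(d)


theta = np.radians(90.0)
R = np.array([
    [np.cos(theta), 0.0, np.sin(theta)],
    [0.0, 1.0, 0.0],
    [-np.sin(theta), 0.0, np.cos(theta)],
])
normal = camera_plane_normal(R)
# Moving along `normal` in world space moves along +z in view space.
check = normal @ R  # -> approximately [0, 0, 1]
```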