- Data configuration
  - Args
    - name: The dataset name. Currently supports `SceneFlow`, `KITTI2012`, `KITTI2015`, `DrivingStereo`, `Middlebury` and `ETH3D`.
    - root: The path of your dataset.
    - train_list: The path of your train list.
    - val_list: The path of your val list.
    - test_list: The path of your test list.
    - train_batch_size: The batch size of training. (default: 1)
    - val_batch_size: The batch size of validation. (default: 1)
    - num_workers: The number of workers to collect data for each GPU. (default: 1)
    - pin_memory: If `True`, copy data to pinned memory before returning it.
    - train_transform: The data augmentation method for training.
    - shuffle: If `True`, shuffle the data when training. (default: `True`)
    - return_right_disp: If `True`, return the right disparity map. (default: `True`)
    - return_occ_mask: If `True`, return the occlusion mask. (default: `False`) Only `FlyingThings3DSubset` is supported now.
    - image_reader: The image reader. (default: `PIL`) Only change this if you know what you are doing.
    - disp_reader: The disparity reader. (default: `PIL`) Only change this if you know what you are doing.
    - for batch_uniform:
      - batch_uniform: If `True`, randomly change the image size according to `random_type`. (default: `False`)
      - random_type: The type of random size. Supports `choice` and `range`.
      - when `random_type` is `range`:
        - h_range: Min scale and max scale for height (e.g. [0.5, 2.0]). Only takes effect when `batch_uniform` is `True` and `random_type` is `range`.
        - w_range: Min scale and max scale for width (e.g. [0.5, 2.0]). Only takes effect when `batch_uniform` is `True` and `random_type` is `range`.
      - when `random_type` is `choice`:
        - h_range: A list of candidate heights (e.g. [256, 288, 320]). Only takes effect when `batch_uniform` is `True` and `random_type` is `choice`.
        - w_range: A list of candidate widths (e.g. [480, 512, 544]). Only takes effect when `batch_uniform` is `True` and `random_type` is `choice`.
    - transform:
      - type: `RandomCrop`. Randomly crop the image. args: `size` (h, w): The size of the crop.
      - type: `RandomHorizontalFlip`. Randomly flip the image horizontally. args: `p`: The probability of flipping.
      - type: `GetValidDisp`. Get the valid disparity map. args: `max_disp`: The maximum disparity.
      - type: `TransposeImage`. Transpose the image to (C, H, W).
      - type: `ToTensor`. Convert the image to a tensor.
      - type: `NormalizeImage`. Normalize the image. args: `mean`: The mean of the image, usually [ 0.485, 0.456, 0.406 ]. `std`: The standard deviation of the image, usually [ 0.229, 0.224, 0.225 ].
      - type: `CropOrPad`. Crop or pad the image. args: `size` (h, w): The size of the image.
      - type: `DivisiblePad`. Pad the image to make the height and width divisible by the given number. args: `by`: The number the height and width must be divisible by.
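
The two `random_type` modes of `batch_uniform` described above interpret `h_range` and `w_range` differently. The sketch below illustrates that difference only; `sample_batch_size` and its `base_size` argument are hypothetical names, and the assumption that the `range` scales multiply a base size is ours, not something the project confirms.

```python
import random


def sample_batch_size(base_size, random_type, h_range, w_range, rng=random):
    """Pick one (height, width) to share across every sample of a batch.

    Hypothetical helper for illustration; not the project's implementation.
    """
    base_h, base_w = base_size
    if random_type == "range":
        # h_range / w_range are [min_scale, max_scale] factors (assumed to
        # multiply the base size).
        h = int(base_h * rng.uniform(h_range[0], h_range[1]))
        w = int(base_w * rng.uniform(w_range[0], w_range[1]))
    elif random_type == "choice":
        # h_range / w_range are explicit candidate sizes in pixels.
        h = rng.choice(h_range)
        w = rng.choice(w_range)
    else:
        raise ValueError(f"unsupported random_type: {random_type!r}")
    return h, w


rng = random.Random(0)  # seeded only to make the demo repeatable
print(sample_batch_size((256, 512), "range", [0.5, 2.0], [0.5, 2.0], rng))
print(sample_batch_size((256, 512), "choice", [256, 288, 320], [480, 512, 544], rng))
```
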
- Loss function
  - Args
    - log_prefix: The prefix of the loss log. (required)
    - loss_term_weight: The loss weight. (default: 1.0)
    - type: The loss function type. Supports `Smooth_l1_Loss` and `Weighted_Smooth_l1_Loss`. (required)
    - others: Please refer to the loss function you choose.

The `log_prefix` must match the output of the module's `training_disp`. Please refer to `disp_processor/gwcnet.py` for details.
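
To make the `log_prefix` requirement concrete, here is a minimal sketch of how a loss entry could be paired with a `training_disp` output by key. The dictionary layout (`disp_ests`, `disp_gt`), the `LOSS_FUNCTIONS` registry and `total_loss` are assumptions for illustration; check `disp_processor/gwcnet.py` for the real structure.

```python
def smooth_l1(pred, target):
    """Mean smooth-L1 loss over flat lists (beta = 1)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(target)


# Hypothetical registry mapping the `type` field to a callable.
LOSS_FUNCTIONS = {"Smooth_l1_Loss": smooth_l1}

# Hypothetical stand-in for a module's `training_disp` output, keyed by the
# name that `log_prefix` has to match.
training_disp = {
    "disp": {"disp_ests": [10.0, 20.5, 33.0], "disp_gt": [10.0, 20.0, 30.0]},
}

loss_cfg = [
    {"log_prefix": "disp", "loss_term_weight": 1.0, "type": "Smooth_l1_Loss"},
]


def total_loss(training_disp, loss_cfg):
    """Sum the weighted loss terms, failing loudly on an unmatched prefix."""
    total = 0.0
    for cfg in loss_cfg:
        prefix = cfg["log_prefix"]
        if prefix not in training_disp:
            raise KeyError(f"log_prefix {prefix!r} has no matching training_disp output")
        entry = training_disp[prefix]
        loss_fn = LOSS_FUNCTIONS[cfg["type"]]
        weight = cfg.get("loss_term_weight", 1.0)
        total += weight * loss_fn(entry["disp_ests"], entry["disp_gt"])
    return total


print(total_loss(training_disp, loss_cfg))
```

A mismatched `log_prefix` raises a `KeyError` in this sketch, which is the kind of failure the note above warns about.
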
- Optimizer
  - Args
    - solver: Optimizer type, for example `SGD`, `Adam`, `AdamW`, etc.
    - others: Please refer to `torch.optim`.
- Learning rate scheduler
  - Args
    - scheduler: Learning rate scheduler, for example `MultiStepLR`.
    - others: Please refer to `torch.optim.lr_scheduler`.
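
As an illustration of what a `MultiStepLR` schedule does with `gamma` and `milestones`, the sketch below evaluates the closed form of the decay in plain Python, without importing PyTorch. `multistep_lr` is a hypothetical helper, and whether a scheduler step corresponds to an epoch or an iteration in this project is an assumption you should verify.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, step):
    """Learning rate after `step` scheduler steps under MultiStepLR-style decay.

    The rate is multiplied by `gamma` once for every milestone that has
    already been reached.
    """
    return base_lr * gamma ** bisect_right(sorted(milestones), step)


# Values taken from an lr of 0.001, gamma of 0.1 and a single milestone at 9.
for step in (0, 8, 9, 14):
    print(step, multistep_lr(0.001, [9], 0.1, step))
```
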
- Model to be trained
  - Args
    - model: Model name. Please refer to Model Library for the supported values.
    - base_config: The base configuration of the model, i.e. the parameters shared by backbone, cost_processor and disp_processor.
    - backbone: The backbone of the model.
      - type: The name of the backbone.
      - others: Please refer to the backbone you choose.
    - cost_processor: The cost processor of the model.
      - type: The name of the cost processor.
      - others: Please refer to the cost processor you choose.
    - disp_processor: The disparity processor of the model.
      - type: The name of the disparity processor.
      - others: Please refer to the disparity processor you choose.
    - others: Please refer to the model you choose.

Note: Only the model name is required. If you define your own model, you can ignore the `backbone`, `cost_processor` and `disp_processor` settings.
- Trainer configuration
  - Args
    - save_name: `str`. The name of the experiment. (required)
    - total_epoch: `int`. The total number of training epochs. (required)
    - restore_hint: An `int` value indicates the epoch number of the checkpoint to restore; a `str` value indicates the path to the checkpoint to restore.
    - resume: `bool`. If `True`, resume training from the latest checkpoint.
    - optimizer_reset: `bool`. If `True`, reset the optimizer.
    - scheduler_reset: `bool`. If `True`, reset the scheduler.
    - warmup_reset: `bool`. If `True`, reset the warmup.
    - log_iter: `int`. The interval of logging.
    - save_every: `int`. The interval of saving checkpoints.
    - val_every: `int`. The interval of validation.
    - amp: `bool`. If `True`, use automatic mixed precision training.
    - sync_bn: `bool`. If `True`, use synchronized batch normalization.
    - fix_bn: `bool`. If `True`, fix the batch normalization.
    - init_parameters: `bool`. If `True`, initialize the parameters.
    - optimizer_cfg: (required)
      - solver: `str`. The name of the optimizer. (required)
      - lr: `float`. The learning rate. (required)
      - others: Please refer to `torch.optim`.
    - scheduler_cfg:
      - scheduler: `str`. The name of the scheduler.
      - gamma: `float`. The decay rate of the learning rate.
      - milestones: `list`. The milestones at which the learning rate decays.
      - others: Please refer to `torch.optim.lr_scheduler`.
      - warmup:
        - warmup_steps: `int`. The number of warmup steps.
    - clip_grad_cfg:
      - type: `norm` or `value`.
      - max_norm: `float`. The maximum norm of the gradient. (only for the `norm` type)
      - max_value: `float`. The maximum value of the gradient. (only for the `norm` type)
      - clip_value: `float`. The clip value of the gradient. (only for the `value` type)
    - evaluator_cfg: (required)
      - metric:
        - `d1_all`: Percentage of stereo disparity outliers in the first frame.
        - `epe`: End-point error, i.e. the mean absolute difference between the predicted and ground-truth disparities.
        - `thres_1`: Percentage of pixels whose error exceeds the 1-pixel threshold.
        - `thres_2`: Percentage of pixels whose error exceeds the 2-pixel threshold.
        - `thres_3`: Percentage of pixels whose error exceeds the 3-pixel threshold.
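
The metrics listed above can be sketched in a few lines of plain Python over flat lists of valid pixels. `disparity_metrics` is a hypothetical helper, and the outlier rule used for `d1_all` (error above 3 px and above 5% of the ground-truth disparity) follows the usual KITTI 2015 convention, which we assume but have not verified against this project's evaluator.

```python
def disparity_metrics(pred, gt):
    """Compute epe, thres_1/2/3 and d1_all over flat lists of valid pixels."""
    errors = [abs(p - g) for p, g in zip(pred, gt)]
    n = len(errors)
    metrics = {"epe": sum(errors) / n}
    for t in (1, 2, 3):
        # Percentage of pixels whose absolute error exceeds t pixels.
        metrics[f"thres_{t}"] = 100.0 * sum(e > t for e in errors) / n
    # KITTI-style outlier (assumed): error above 3 px AND above 5% of the
    # ground-truth disparity.
    outliers = sum(e > 3 and e > 0.05 * g for e, g in zip(errors, gt))
    metrics["d1_all"] = 100.0 * outliers / n
    return metrics


pred = [10.0, 21.5, 30.0, 104.0]
gt = [10.0, 20.0, 35.0, 100.0]
print(disparity_metrics(pred, gt))
```

In this toy input the last pixel is off by 4 px, so it counts toward `thres_3`, but it is within 5% of its ground truth of 100, so it does not count toward `d1_all`.
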
Note:
- The output directory, which contains the log, checkpoint and summary files, is determined by the `dataset_name`, `model` and `save_name` settings, for example `output/${dataset_name}/${model}/${save_name}`.
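
As a quick illustration of the `output/${dataset_name}/${model}/${save_name}` pattern, the sketch below joins the three settings into a path. `output_dir` is a hypothetical helper, and treating the dataset's `name` value as `dataset_name` is an assumption.

```python
import posixpath


def output_dir(dataset_name, model, save_name, root="output"):
    """Compose output/${dataset_name}/${model}/${save_name}."""
    return posixpath.join(root, dataset_name, model, save_name)


print(output_dir("SceneFlow", "PSMNet", "PSMNet_SceneFlow"))
# prints: output/SceneFlow/PSMNet/PSMNet_SceneFlow
```
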
```yaml
data_cfg:
  name: SceneFlow
  root: data/SceneFlow
  train_list: datasets/SceneFlow/sceneflow_cleanpass_train.txt
  val_list: datasets/SceneFlow/sceneflow_cleanpass_test.txt
  test_list: datasets/SceneFlow/sceneflow_cleanpass_test.txt
  num_workers: 8
  train_batch_size: 4
  val_batch_size: 1
  pin_memory: true
  shuffle: true

  batch_uniform: false
  # random_type: range
  # w_range: [ 0.5, 2.0 ]
  # h_range: [ 0.5, 2.0 ]
  # random_type: choice
  # h_range: [ 256, 288, 320, 352 ]
  # w_range: [ 480, 512, 544, 576 ]

  transform:
    train:
      - type: RandomCrop
        size: [ 256, 512 ]
      #- type: RandomHorizontalFlip
      #  prob: 0.5
      - type: GetValidDisp
        max_disp: 192
      - type: TransposeImage
      - type: ToTensor
      - type: NormalizeImage
        mean: [ 0.485, 0.456, 0.406 ]
        std: [ 0.229, 0.224, 0.225 ]
    test:
      - type: StereoPad
        size: [ 544, 960 ]
      - type: GetValidDisp
        max_disp: 192
      - type: TransposeImage
      - type: ToTensor
      - type: NormalizeImage
        mean: [ 0.485, 0.456, 0.406 ]
        std: [ 0.229, 0.224, 0.225 ]

model_cfg:
  model: PSMNet

  base_config:
    max_disp: 192

  # Backbone
  backbone_cfg:
    type: PSMNet

  # VolumeCostProcessor
  cost_processor_cfg:
    type: PSMCostProcessor

  # DispProcessor
  disp_processor_cfg:
    type: PSMDispProcessor

loss_cfg:
  - log_prefix: disp
    loss_term_weight: 1
    type: Weighted_Smooth_l1_Loss
    weights: [ 0.5, 0.7, 1.0 ]

trainer_cfg:
  save_name: PSMNet_SceneFlow
  total_epoch: 15
  restore_hint: 0
  log_iter: 50 # iter
  save_every: 1 # epoch
  val_every: 1 # epoch
  amp: true
  sync_bn: true
  fix_bn: false
  init_parameters: false

  optimizer_cfg:
    solver: RMSprop
    lr: 0.001

  scheduler_cfg:
    scheduler: MultiStepLR
    gamma: 0.1
    milestones: [ 9 ]
    warmup:
      warmup_steps: 500

  evaluator_cfg:
    metric:
      - d1_all
      - epe
      - thres_1
      - thres_2
      - thres_3

  clip_grad_cfg:
    type: norm
    max_norm: 35
    norm_type: 2
```