Configuration item


  • Data configuration
  • Args
    • name: The dataset name. SceneFlow, KITTI2012, KITTI2015, DrivingStereo, Middlebury and ETH3D are currently supported.
    • root: The path of your dataset.
    • train_list: The path of your train list.
    • val_list: The path of your val list.
    • test_list: The path of your test list.
    • train_batch_size: The batch size of training. (default: 1)
    • val_batch_size: The batch size of validation. (default: 1)
    • num_workers: The number of workers to collect data for each GPU. (default: 1)
    • pin_memory: If True, copy data to pinned memory before returning it.
    • train_transform: The data augmentation method for training.
    • shuffle: If True, shuffle the data when training. (default: True)
    • return_right_disp: If True, return the right disparity map. (default: True)
    • return_occ_mask: If True, return the occlusion mask. (default: False) Currently supported only for FlyingThings3DSubset.
    • image_reader: The image reader. (default: PIL) Change this only if you know what you are doing.
    • disp_reader: The disparity reader. (default: PIL) Change this only if you know what you are doing.
    • for batch_uniform:
      • batch_uniform: If True, randomly change the image size according to random_type. (default: False)
      • random_type: How the random size is chosen. Supported values: choice and range.
      • when random_type is range:
        • h_range: Min and max scale for height (e.g. [0.5, 2.0]). Only takes effect when batch_uniform is True and random_type is range.
        • w_range: Min and max scale for width (e.g. [0.5, 2.0]). Only takes effect when batch_uniform is True and random_type is range.
      • when random_type is choice:
        • h_range: A list of candidate heights (e.g. [256, 288, 320]). Only takes effect when batch_uniform is True and random_type is choice.
        • w_range: A list of candidate widths (e.g. [480, 512, 544]). Only takes effect when batch_uniform is True and random_type is choice.
    • transform:
      • type: RandomCrop: Randomly crop the image. args: size (h,w): The size of the crop.
      • type: RandomHorizontalFlip: Randomly flip the image horizontally. args: p: The probability of flipping.
      • type: GetValidDisp: Get the valid disparity map. args: max_disp: The maximum disparity.
      • type: TransposeImage: Transpose the image to (C, H, W).
      • type: ToTensor: Convert the image to tensor.
      • type: NormalizeImage: Normalize the image. args: mean: The mean of the image, usually: [ 0.485, 0.456, 0.406 ]. std: The standard deviation of the image, usually: [ 0.229, 0.224, 0.225 ].
      • type: CropOrPad: Crop or pad the image. args: size (h,w): The size of the image.
      • type: DivisiblePad: Pad the image to make the height and width divisible by the given number. args: by: The number to be divisible.
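
To make the DivisiblePad arithmetic concrete, here is a minimal sketch of how the padded size could be computed. The helper names divisible_size and pad_amounts are hypothetical and not part of the project. Which image borders receive the padding is not stated here, so the sketch only computes sizes.

```python
from typing import Tuple


def divisible_size(height: int, width: int, by: int) -> Tuple[int, int]:
    """Return the smallest (height, width) >= the input that is divisible by `by`."""
    if by <= 0:
        raise ValueError("by must be a positive integer")
    # Ceiling division, then scale back up to the next multiple of `by`.
    padded_h = -(-height // by) * by
    padded_w = -(-width // by) * by
    return padded_h, padded_w


def pad_amounts(height: int, width: int, by: int) -> Tuple[int, int]:
    """Return how many rows and columns must be added to reach the divisible size."""
    padded_h, padded_w = divisible_size(height, width, by)
    return padded_h - height, padded_w - width
```

For example, a 540 x 960 image with by: 32 would be padded to 544 x 960, that is, 4 extra rows and no extra columns.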


  • Loss function
  • Args
    • log_prefix: The prefix of the loss log. (required)
    • loss_term_weight: The loss weight. (default: 1.0)
    • type: The loss function type. Smooth_l1_Loss and Weighted_Smooth_l1_Loss are supported. (required)
    • others: Please refer to the loss function you choose.

The log_prefix must match the output of the module's training_disp. Please refer to disp_processor/ for details.
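
As an illustration of how loss_term_weight and a per-output weights list might combine, the sketch below computes a weighted smooth L1 loss over several disparity predictions in plain Python. It assumes that weights holds one entry per prediction and that the standard smooth L1 form with beta = 1 is used. The function names are hypothetical, and the project's own implementation (which works on tensors and may mask invalid pixels) may differ.

```python
from typing import Sequence


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean smooth L1 loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


def weighted_smooth_l1(
    preds: Sequence[Sequence[float]],
    target: Sequence[float],
    weights: Sequence[float],
    loss_term_weight: float = 1.0,
) -> float:
    """Weighted sum of per-prediction smooth L1 losses (assumed behaviour)."""
    if len(preds) != len(weights):
        raise ValueError("expected one weight per prediction")
    combined = sum(w * smooth_l1(p, target) for w, p in zip(weights, preds))
    return loss_term_weight * combined
```

Under these assumptions, weights such as [ 0.5, 0.7, 1.0 ] would give the last prediction the largest influence on the total loss.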


  • Optimizer
  • Args
    • solver: Optimizer type, e.g. SGD, Adam, AdamW.
    • others: Please refer to torch.optim.


  • Learning rate scheduler
  • Args
    • scheduler : Learning rate scheduler, example: MultiStepLR.
    • others : Please refer to torch.optim.lr_scheduler.
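
For MultiStepLR, the learning rate is multiplied by gamma each time a milestone is reached. The sketch below reproduces that closed-form rule in plain Python, assuming the scheduler is stepped once per epoch. multistep_lr is a hypothetical helper for illustration, not a project or PyTorch function.

```python
from bisect import bisect_right
from typing import Sequence


def multistep_lr(base_lr: float, gamma: float, milestones: Sequence[int], epoch: int) -> float:
    """Learning rate at `epoch`: base_lr decayed by gamma once per milestone already reached."""
    # bisect_right counts the milestones that are <= epoch.
    reached = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** reached
```

With a base rate of 0.001, gamma: 0.1 and milestones: [ 9 ], the rate stays at 0.001 for epochs 0 to 8 and drops to 0.0001 from epoch 9 onward.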


  • Model to be trained
  • Args
    • model : Model name, please refer to Model Library for the supported values.
    • base_config : The base configuration of the model, which is the parameters shared by backbone, cost_processor and disp_processor.
    • backbone : The backbone of the model.
      • type : The name of backbone.
      • others : Please refer to the backbone you choose.
    • cost_processor : The cost processor of the model.
      • type : The name of cost processor.
      • others : Please refer to the cost processor you choose.
    • disp_processor : The disparity processor of the model.
      • type : The name of disparity processor.
      • others : Please refer to the disparity processor you choose.
    • others : Please refer to the model you choose.

Note: Only model name is required. If you define your own model, you can ignore the backbone, cost_processor and disp_processor settings.


  • Args
    • save_name: str The name of the experiment. (required)
    • total_epoch: int The total epoch of training. (required)
    • restore_hint: An int value gives the epoch number of the checkpoint to restore; a str value gives the path to the checkpoint to restore.
    • resume: bool values. If True, resume training from the latest checkpoint.
    • optimizer_reset: bool values. If True, reset the optimizer.
    • scheduler_reset: bool values. If True, reset the scheduler.
    • warmup_reset: bool values. If True, reset the warmup.
    • log_iter: int values. The interval of logging.
    • save_every: int values. The interval of saving checkpoint.
    • val_every: int values. The interval of validation.
    • amp: bool values. If True, use automatic mixed precision training.
    • sync_bn: bool values. If True, use synchronized batch normalization.
    • fix_bn: bool values. If True, fix the batch normalization.
    • init_parameters: bool values. If True, initialize the parameters.
    • optimizer_cfg: (required)
      • solver: str values. The name of optimizer. (required)
      • lr: float values. The learning rate. (required)
      • others: Please refer to torch.optim.
    • scheduler_cfg:
      • scheduler: str values. The name of scheduler.
      • gamma: float values. The decay rate of learning rate.
      • milestones: list values. The milestones of learning rate.
      • others: Please refer to torch.optim.lr_scheduler.
      • warmup:
        • warmup_steps: int values. The number of warmup steps.
    • clip_grad_cfg:
      • type: norm or value
      • max_norm: float values. The maximum norm of gradient. (only for norm type)
      • max_value: float values. The maximum value of gradient. (only for norm type)
      • clip_value: float values. The clip value of gradient. (only for value type)
    • evaluator_cfg: (required)
      • metric:
        • d1_all: Percentage of stereo disparity outliers in the first frame.
        • epe: End-point error, the mean absolute difference between the predicted and ground-truth disparities.
        • thres_1: Percentage of pixels whose disparity error exceeds 1 pixel.
        • thres_2: Percentage of pixels whose disparity error exceeds 2 pixels.
        • thres_3: Percentage of pixels whose disparity error exceeds 3 pixels.
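
The metrics above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's evaluator: pixels with a ground-truth disparity of 0 or less are treated as invalid, thresholds are strict (error > k), and d1_all follows the KITTI 2015 convention in which a pixel is an outlier when its error exceeds both 3 pixels and 5% of the ground-truth disparity. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _valid_errors(pred: Sequence[float], gt: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair absolute errors with ground truth, skipping pixels without ground truth."""
    errors = [(abs(p - g), g) for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return errors


def epe(pred: Sequence[float], gt: Sequence[float]) -> float:
    """End-point error: mean absolute disparity error over valid pixels."""
    errors = _valid_errors(pred, gt)
    return sum(e for e, _ in errors) / len(errors)


def thres(pred: Sequence[float], gt: Sequence[float], k: float) -> float:
    """Percentage of valid pixels whose error exceeds k pixels."""
    errors = _valid_errors(pred, gt)
    return 100.0 * sum(e > k for e, _ in errors) / len(errors)


def d1_all(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Percentage of valid pixels with error > 3 px and > 5% of the ground truth."""
    errors = _valid_errors(pred, gt)
    return 100.0 * sum(e > 3.0 and e > 0.05 * g for e, g in errors) / len(errors)
```

Note that d1_all can be lower than thres_3 for the same prediction, because large ground-truth disparities tolerate larger absolute errors under the 5% rule.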


  • The output directory, which contains the log, checkpoint and summary files, is determined by the dataset_name, model and save_name settings: output/${dataset_name}/${model}/${save_name}.
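
As a small illustration of the path pattern, the sketch below assembles the output directory from the three settings. output_dir is a hypothetical helper, and it assumes that dataset_name is the name given in the data configuration.

```python
from pathlib import PurePosixPath


def output_dir(dataset_name: str, model: str, save_name: str, root: str = "output") -> str:
    """Build output/<dataset_name>/<model>/<save_name> with forward slashes."""
    return str(PurePosixPath(root) / dataset_name / model / save_name)
```

With name: SceneFlow, model: PSMNet and save_name: PSMNet_SceneFlow, this yields output/SceneFlow/PSMNet/PSMNet_SceneFlow.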


  name: SceneFlow
  root: data/SceneFlow
  train_list: datasets/SceneFlow/sceneflow_cleanpass_train.txt
  val_list: datasets/SceneFlow/sceneflow_cleanpass_test.txt
  test_list: datasets/SceneFlow/sceneflow_cleanpass_test.txt
  num_workers: 8
  train_batch_size: 4
  val_batch_size: 1
  pin_memory: true
  shuffle: true

  batch_uniform: false
  #  random_type: range
  #  w_range: [ 0.5, 2.0 ]
  #  h_range: [ 0.5, 2.0 ]
  #  random_type: choice
  #  h_range: [ 256, 288, 320, 352 ]
  #  w_range: [ 480, 512, 544, 576 ]

      - type: RandomCrop
        size: [ 256, 512 ]
      #- type: RandomHorizontalFlip
      #  prob: 0.5
      - type: GetValidDisp
        max_disp: 192
      - type: TransposeImage
      - type: ToTensor
      - type: NormalizeImage
        mean: [ 0.485, 0.456, 0.406 ]
        std: [ 0.229, 0.224, 0.225 ]
      - type: StereoPad
        size: [ 544, 960 ]
      - type: GetValidDisp
        max_disp: 192
      - type: TransposeImage
      - type: ToTensor
      - type: NormalizeImage
        mean: [ 0.485, 0.456, 0.406 ]
        std: [ 0.229, 0.224, 0.225 ]

  model: PSMNet

  base_config:
    max_disp: 192

  # Backbone
  backbone:
    type: PSMNet

  # VolumeCostProcessor
  cost_processor:
    type: PSMCostProcessor

  # DispProcessor
  disp_processor:
    type: PSMDispProcessor

  - log_prefix: disp
    loss_term_weight: 1
    type: Weighted_Smooth_l1_Loss
    weights: [ 0.5, 0.7, 1.0 ]

  save_name: PSMNet_SceneFlow
  total_epoch: 15
  restore_hint: 0
  log_iter: 50 # iter
  save_every: 1 # epoch
  val_every: 1 # epoch
  amp: true
  sync_bn: true
  fix_bn: false
  init_parameters: false

  optimizer_cfg:
    solver: RMSprop
    lr: 0.001

  scheduler_cfg:
    scheduler: MultiStepLR
    gamma: 0.1
    milestones: [ 9 ]
    warmup:
      warmup_steps: 500

  evaluator_cfg:
    metric:
      - d1_all
      - epe
      - thres_1
      - thres_2
      - thres_3

  clip_grad_cfg:
    type: norm
    max_norm: 35
    norm_type: 2