WolframRhodium edited this page Dec 5, 2022 · 29 revisions

DPIR (Plug-and-Play Image Restoration with Deep Denoiser Prior) is a neural network for denoising and deblocking. See also https://github.com/HolyWu/vs-dpir.

DPIR requires a strength parameter.

Link:

Includes these models:

  • Denoise models, default sigma is 5.0
    • drunet_gray: GRAY denoise
    • drunet_color: RGB denoise
  • Deblocking models, default sigma is 50.0
    • drunet_deblocking_grayscale: GRAY deblocking
    • drunet_deblocking_color: RGB deblocking

Requirements & Parameters

  1. block_w and block_h (tile size) must be multiples of 8.
  2. All DPIR models require a strength parameter (sigma). It must be passed as a GRAYS clip whose pixel values are sigma multiplied by the normalization factor 1.0/255; see the examples below for details.
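The two requirements above can be sketched in plain Python. This is a minimal illustration, not part of vs-mlrt: the helper names `round_up_to_multiple` and `normalized_sigma` are hypothetical, and the example tile sizes are assumptions chosen for a 640x360 input split into 2x2 tiles.

```python
def round_up_to_multiple(value: int, multiple: int = 8) -> int:
    """Round value up to the next multiple (tile sizes must be multiples of 8)."""
    return (value + multiple - 1) // multiple * multiple


def normalized_sigma(sigma: float) -> float:
    """Scale a strength given on the 0-255 scale to the value stored in the GRAYS clip."""
    return sigma / 255.0


# Hypothetical example: half-size tiles for a 640x360 input.
block_w = round_up_to_multiple(640 // 2)  # 320 is already a multiple of 8
block_h = round_up_to_multiple(360 // 2)  # 180 is rounded up to 184
sigma_value = normalized_sigma(5.0)       # pixel value for the default denoise sigma
```

Rounding up rather than down keeps the tiles large enough to cover the frame; the wrapper shown later takes care of this kind of bookkeeping when you use its tiles parameter.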

vsmlrt.py Wrapper Usage

To simplify usage, we provide a Python wrapper module, vsmlrt, that offers a more Pythonic interface:

import vapoursynth as vs
from vapoursynth import core
from vsmlrt import DPIR, DPIRModel, Backend

src = core.std.BlankClip(format=vs.RGBS) # or vs.GRAYS for gray only models

# backend could be:
#  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
#  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort cpu backend.
#  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#     - use device_id to select device
#     - set cudnn_benchmark=False to reduce script reload latency when debugging, but with slight throughput performance penalty.
#  - GPU Backend.TRT(fp16=True, device_id=0, num_streams=1): TensorRT runtime, the fastest NV GPU runtime.
# DPIR is a large model, so a GPU backend is highly recommended (TRT gives the best performance).
# If the model runs out of GPU memory, increase the tiles parameter.
flt = DPIR(src, strength=5, model=DPIRModel.drunet_color, tiles=2, backend=Backend.ORT_CUDA())

To vary the strength across the frame, pass a GRAYS or GRAY8 clip as the strength parameter instead of a number. The clip must have the same dimensions as the input clip, and each of its pixels stores the DPIR strength for the corresponding input pixel.
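One way to build such a clip is to remap an existing mask onto a strength range with std.Expr. The sketch below is only an illustration under stated assumptions: `mask` is a hypothetical GRAY8 clip with the same dimensions as the input, the helper names are made up for this example, and the pixel values are assumed to be on the same scale as the scalar strength argument.

```python
def strength_expr(low: float, high: float) -> str:
    """Build a std.Expr (RPN) string that maps an 8-bit mask value x in [0, 255]
    linearly onto a strength in [low, high]."""
    low, high = float(low), float(high)
    return f"x 255 / {high - low} * {low} +"


def build_strength_clip(mask, low: float, high: float):
    """Turn a hypothetical GRAY8 mask into a GRAYS per-pixel strength clip."""
    import vapoursynth as vs  # imported lazily so strength_expr works standalone
    return vs.core.std.Expr([mask], [strength_expr(low, high)], format=vs.GRAYS)


# Usage (assumes `src` and a same-sized GRAY8 `mask` already exist):
# strength = build_strength_clip(mask, low=2.0, high=10.0)
# flt = DPIR(src, strength=strength, model=DPIRModel.drunet_color,
#            backend=Backend.ORT_CUDA())
```

With this mapping, black areas of the mask receive the low strength and white areas the high strength, so detail can be protected where the mask is dark.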

Raw Model Usage

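import vapoursynth as vs
from vapoursynth import core

src = core.std.BlankClip(width=640, height=360, format=vs.GRAYS)
sigma = 2.0
# The second input is the sigma plane, normalized by 1.0/255.
flt = core.ov.Model([src, core.std.BlankClip(src, color=sigma/255.0)], "drunet_gray.onnx")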

Notes

DPIR is a large network and is extremely slow on CPU (e.g. for 360p input, you might see about 0.05 fps/cpu).

Benchmarking

Measurements: FPS / Device Memory (MB)

Device memory:

  • GPU: device memory including context
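Each benchmark cell on this page has the form "FPS / Device Memory (MB)". As a small reading aid, the sketch below parses such cells and compares two of them; the helper names are hypothetical, and the two sample cells are the RTX 3090 color results for vs-mlrt v6 trt in FP32 and FP16.

```python
from typing import Tuple


def parse_cell(cell: str) -> Tuple[float, int]:
    """Split a 'FPS / MB' benchmark cell into (fps, device_memory_mb)."""
    fps, mem = cell.split("/")
    return float(fps), int(mem)


def speedup(baseline: str, candidate: str) -> float:
    """Throughput ratio of candidate over baseline."""
    return parse_cell(candidate)[0] / parse_cell(baseline)[0]


fp32 = "2.75 / 4187"  # RTX 3090, color, [1] trt, FP32
fp16 = "8.65 / 3619"  # RTX 3090, color, [1] trt, FP16

ratio = speedup(fp32, fp16)                                # FP16 over FP32 throughput
mem_saved = parse_cell(fp32)[1] - parse_cell(fp16)[1]      # MB saved by FP16
```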

RTX 3090

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35
  3. vs-mlrt v8 (driver 511.79)

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [2] cuda | [2] trt | [3] ort-cuda | [3] trt | [3] trt (no tf32) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gray | 2.46 / 5947 | 2.95 / 4157 | 2.34 / 12015 | 2.43 / 4300 | 2.92 / 5759 | 3.26 / 4243 | 3.07 / 4261 |
| color | 2.30 / 5979 | 2.75 / 4187 | 2.13 / 12031 | 2.12 / 4384 | 2.86 / 5790 | 3.25 / 4330 | 3.02 / 4291 |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] cuda | [2] trt | [3] ort-cuda | [3] trt | [3] trt (2 streams) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gray | 3.67 / 3777 | 9.60 / 3585 | 10.6 / 5430 | 3.47 / 11751 | 7.18 / 4015 | 4.65 / 5759 | 10.9 / 2397 | 11.6 / 3895 |
| color | 3.26 / 3817 | 8.65 / 3619 | 10.5 / 5492 | 3.02 / 11765 | 5.67 / 4277 | 4.41 / 3628 | 9.85 / 2440 | 11.5 / 3975 |

RTX 2080 Ti

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35
  3. vs-mlrt v8 (driver 511.79)

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [2] cuda | [2] trt | [3] ort-cuda | [3] trt |
| --- | --- | --- | --- | --- | --- | --- |
| gray | 1.68 / 5277 | 1.84 / 4004 | 1.67 / 6916 | 1.87 / 4163 | 1.60 / 5190 | 1.91 / 3659 |
| color | 1.53 / 5309 | 1.66 / 4034 | 1.56 / 6942 | 1.71 / 4183 | 1.57 / 5222 | 1.78 / 3691 |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] cuda | [2] trt | [3] ort-cuda | [3] trt | [3] trt (2 streams) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gray | 3.04 / 3619 | 6.18 / 2780 | 6.77 / 4531 | 3.07 / 6730 | 5.98 / 3249 | 3.10 / 3276 | 7.22 / 2101 | 7.89 / 3529 |
| color | 2.70 / 3659 | 5.64 / 2598 | 6.72 / 4274 | 2.65 / 6744 | 4.78 / 3261 | 2.93 / 3571 | 6.38 / 2323 | 7.64 / 3874 |

Tesla V100

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] cuda | [2] trt |
| --- | --- | --- | --- | --- | --- |
| gray | 2.45 / 5188 | 2.59 / 3979 | 2.59 / 6829 | 2.27 / 11552 | 2.45 / 3959 |
| color | 2.39 / 5220 | 2.51 / 4011 | 2.56 / 6893 | 2.12 / 11558 | 2.26 / 3979 |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] cuda | [2] trt |
| --- | --- | --- | --- | --- | --- |
| gray | 5.20 / 3018 | 8.09 / 2831 | 8.50 / 4617 | 5.09 / 11289 | 6.93 / 3461 |
| color | 4.95 / 3058 | 7.54 / 2863 | 8.47 / 4687 | 4.29 / 11302 | 5.60 / 3473 |

Tesla A10

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23, with GPU clocks locked at maximum frequency.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] cuda | [2] trt |
| --- | --- | --- | --- | --- | --- |
| gray | 2.34 / 5791 | 2.75 / 4015 | 2.78 / 6641 | 2.20 / 11837 | 2.67 / 4189 |
| color | 2.29 / 5823 | 2.73 / 4075 | 2.78 / 6747 | 2.12 / 11853 | 2.54 / 4209 |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] cuda | [2] trt |
| --- | --- | --- | --- | --- | --- |
| gray | 3.73 / 3621 | 6.67 / 3437 | 6.33 / 5285 | 3.72 / 11853 | 6.17 / 4079 |
| color | 3.65 / 3661 | 6.26 / 3423 | 6.32 / 5277 | 3.45 / 11597 | 5.25 / 4103 |

Tesla A10G

Software: VapourSynth R58, Windows Server 2022, Graphics Driver 511.65, with GPU clocks locked at maximum frequency.

Input size: 1920x1080

Backends

  1. vs-mlrt v8

Performance

FP32

| Model | [1] trt |
| --- | --- |
| gray | 2.75 / 4285 |
| color | 2.70 / 4317 |

FP16

| Model | [1] trt |
| --- | --- |
| gray | 7.00 / 2336 |
| color | 6.80 / 2368 |

Tesla A100 (PCIe, 40 GB)

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23.

Input size: 1920x1080

Backends

  1. vs-mlrt v6
  2. vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35

Performance

FP32

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] cuda | [2] trt |
| --- | --- | --- | --- | --- | --- |
| gray | 7.12 / 5853 | 9.68 / 4111 | 10.3 / 6737 | 6.43 / 11973 | 8.56 / 4261 |
| color | 6.95 / 5885 | 9.31 / 4143 | 10.2 / 6801 | 5.62 / 11979 | 7.21 / 4281 |

FP16

| Model | [1] ort-cuda | [1] trt | [1] trt (2 streams) | [2] cuda | [2] trt |
| --- | --- | --- | --- | --- | --- |
| gray | 10.1 / 3683 | 18.9 / 3015 | 20.5 / 4603 | 9.67 / 11709 | 14.6 / 3679 |
| color | 9.55 / 3723 | 17.7 / 3041 | 20.3 / 4657 | 7.65 / 11713 | 10.5 / 3691 |

Tesla A100 (SXM4, 80 GB)

Software: VapourSynth R57-A4, Windows Server 2022, Graphics Driver 516.94.

Input size: 1920x1080

Backends

  1. vs-mlrt v9

Performance

FP16

| Model | [1] trt | [1] trt (2 streams) |
| --- | --- | --- |
| color | 20.5 / 2022 | 24.3 / 3325 |