DALI 2022 roadmap #3774
Hi @JanuszL, I was interested in a more seamless integration between DALI and torchvision for better end-to-end model training and inference time. Relevant PRs:
In particular, we don't necessarily need to integrate everything, including the data loader, but at the very least I think the accelerated image decoding and specialized preprocessing kernels would be of huge value, gated behind another vision backend: https://github.com/pytorch/vision#image-backend. The integration would probably look similar to the one made with torchaudio; I'm guessing facebookresearch/mmf (multimodal) would also be similar.
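The "gated behind another vision backend" idea can be sketched as a small decode-backend registry, loosely in the spirit of torchvision's `set_image_backend`. All names below are hypothetical stand-ins, not a real torchvision or DALI API:

```python
# Minimal sketch of a pluggable image-decoding backend registry.
# All names here are hypothetical, not an actual torchvision/DALI API.

_BACKENDS = {}
_active = "cpu"

def register_backend(name, decode_fn):
    """Register a decode function under a backend name."""
    _BACKENDS[name] = decode_fn

def set_image_backend(name):
    """Select which registered backend decode() dispatches to."""
    global _active
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend: {name}")
    _active = name

def decode(raw_bytes):
    # Dispatch to whichever backend is currently selected.
    return _BACKENDS[_active](raw_bytes)

# A trivial CPU "decoder" and a stand-in for a DALI-accelerated one;
# each just reports which backend ran and how many bytes it saw.
register_backend("cpu", lambda b: ("cpu", len(b)))
register_backend("dali", lambda b: ("dali", len(b)))

set_image_backend("dali")
backend, nbytes = decode(b"\xff\xd8\xff")  # fake JPEG header bytes
```

Real integration would of course dispatch to DALI's decoding pipeline instead of a lambda, but the gating mechanism could stay this simple.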
Hi @msaroufim, Thank you for your feedback regarding our 2022 roadmap.
Hi @JanuszL,
Your answer and guidance would be appreciated!
Hi @songyuc, This warning is there mostly because we don't have full test coverage for Python 3.10, although it should work fine.
Hi @csheaff, Thank you for reaching out. We don't have a short-term plan to support DICOM. As I understand it, the usual workflow is a conversion from DICOM to NumPy (which includes offline preprocessing, like normalization), and then the NumPy files are used for training. The conversion is done only once, while the NumPy files are reused multiple times. That is why this item is low on our priority list.
Thanks for the response @JanuszL. I've just discovered DALI and I'm fairly new to MLOps, so perhaps my ideas are off here. If you have better suggestions on workflow, I'm happy to hear them. My priority is low end-to-end latency in medical imaging applications. Triton is great for inference, but I'm looking for ways to handle data loading and pre-processing on the GPU as well, in tandem with an inference model served by Triton. It's true that there can be a heavy amount of pre-processing and metadata extraction with DICOMs. The metadata extraction for purposes other than the main processing pipeline will likely always be there; perhaps this means it doesn't make sense for DALI to handle DICOMs directly. As for the normalization, my understanding is that DALI would be able to handle such tasks. Perhaps I'm mistaken.
Hi @csheaff,
DALI is mostly useful for online augmentation and, in general, for things you need to do at each iteration of your training/inference process. In the case of DICOM, the conversion to NumPy can be done only once as part of offline preparation; doing it every iteration wouldn't yield any value and would be wasteful from a resource point of view.
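The offline-conversion workflow described above can be sketched in plain NumPy (`convert_offline` is a hypothetical helper, and the pixel array is faked rather than read from a real DICOM file):

```python
# Sketch: one-time offline DICOM -> NumPy conversion with normalization,
# so per-iteration training code only loads the prepared .npy files.
# Pixel data is fabricated here; reading real DICOMs (e.g. with pydicom)
# is out of scope for this sketch.
import os
import tempfile

import numpy as np

def convert_offline(pixels: np.ndarray, out_path: str) -> None:
    # Normalize once, offline; every training iteration reuses the result.
    norm = (pixels - pixels.mean()) / (pixels.std() + 1e-8)
    np.save(out_path, norm.astype(np.float32))

tmpdir = tempfile.mkdtemp()
fake_scan = np.random.randint(0, 4096, size=(64, 64)).astype(np.float32)
out = os.path.join(tmpdir, "scan0.npy")
convert_offline(fake_scan, out)

# Each training iteration: a cheap load of the already-normalized array.
arr = np.load(out)
```

The loaded `.npy` files can then be fed to a DALI pipeline (or any loader) without repeating the normalization work each epoch.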
It looks like DALI at least partially supports DICOM by way of nvJPEG2000 now, with a bit of a hack, e.g.: https://www.kaggle.com/competitions/rsna-breast-cancer-detection/discussion/371534 (notebook here: https://www.kaggle.com/code/tivfrvqhs5/decode-jpeg2000-dicom-with-dali?scriptVersionId=113466193). Kaggle isn't releasing a decoded dataset for the code competition, so folks are having to decode on each run (training about 10 minutes, inference about 7 hrs!). The DALI speedup is likely to be a huge win, but as noted it only covers the dcmfile.file_meta.TransferSyntaxUID .90 standard, and .70 makes up about 1/2 of the other images. Here's the breakdown: 1.2.840.10008.1.2.4.70 29519. Any thoughts on how we might be able to get .70 in there as well? Are there fundamental limitations as to why it can't be supported?
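For anyone auditing their own dataset the same way, a transfer-syntax tally like the breakdown above takes only a few lines (the UID list here is fabricated for illustration; with pydicom you would collect `dcm.file_meta.TransferSyntaxUID` per file instead):

```python
# Sketch: tallying DICOM TransferSyntaxUID values to see which decode
# paths a dataset needs (JPEG 2000 ".90" vs JPEG Lossless ".70").
from collections import Counter

# Fabricated UIDs; in practice, read one per DICOM file with pydicom.
uids = [
    "1.2.840.10008.1.2.4.70",  # JPEG Lossless
    "1.2.840.10008.1.2.4.90",  # JPEG 2000 Image Compression (lossless)
    "1.2.840.10008.1.2.4.90",
]
breakdown = Counter(uids)
gpu_decodable = breakdown["1.2.840.10008.1.2.4.90"]  # the nvJPEG2000 path
```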
Hi @blazespinnaker, I'm glad to see that the community made that work. I think that it should be possible to use the
Looks like .70 is a rarely used JPEG Lossless standard: https://crnl.readthedocs.io/jpeg_formats/index.html. If I had to guess, I'd say the question is whether nvJPEG can support it. Maybe if we can get the pydicom folks to help support a pipeline to nvJPEG / nvJPEG2000, this can be done.
I just checked with the nvJPEG team and this format is not supported yet. In this case, DALI should fall back to the CPU libjpeg-turbo decoder. I'm sorry, but I don't think we can do much now.
Hmm, I think libjpeg might be a better fallback for 1.2.840.10008.1.2.4.70? I believe libjpeg-turbo does not yet support lossless JPEG.
Hi @blazespinnaker, Currently we fully rely on libjpeg-turbo for JPEG decoding; if it fails, DALI cannot decode the image.
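The fallback behavior discussed in this thread (try the GPU decoder first, fall back to a CPU decoder when the format is unsupported) can be sketched generically; both decoders below are hypothetical stand-ins, not DALI's actual API:

```python
# Sketch of a decode-with-fallback pattern: attempt a GPU path, and
# fall back to a CPU decoder when the format is unsupported there.
class UnsupportedFormat(Exception):
    """Raised when a decoder cannot handle the input format."""

def gpu_decode(data: bytes):
    # Pretend the GPU path only handles one known magic prefix.
    if not data.startswith(b"\x89GPU"):
        raise UnsupportedFormat("format not supported on GPU")
    return ("gpu", data[4:])

def cpu_decode(data: bytes):
    # Stand-in for a libjpeg-turbo-style CPU decoder.
    return ("cpu", data)

def decode(data: bytes):
    try:
        return gpu_decode(data)
    except UnsupportedFormat:
        return cpu_decode(data)
```

Of course, as the reply above notes, this only helps when the CPU decoder actually supports the format; if both paths fail, the image cannot be decoded.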
Please check #4578 for the 2023 roadmap.
The following represents a high-level overview of our 2022 plan. Be aware that this roadmap may change at any time, and the order below does not reflect any type of priority.
We strongly encourage you to comment on our roadmap and provide feedback in this issue.
Some of the items mentioned below are a continuation of the 2021 effort (#2978).
Improving Usability:
Extending input format support:
- Extending support of formats and containers with variable frame rate videos: #3615, #3668, #4184, #4296, #4302, #4351, #4354, #4327, #4424
- Image decoding operators with support for higher dynamic ranges: #4223
Performance:
- fn.experimental.audio_resample (GPU #3911)

New transformations:
We are constantly extending the set of operations supported by DALI. Currently, this section lists the most notable additions to our areas of interest that we plan to do this year. This list is not exhaustive and we plan on expanding the set of operators as the needs or requests arise.
- fn.experimental.remap operator: #4379
- fn.experimental.remap optimizations: #4419
- Remap kernel implementation with NPP: #4365
- Utils and prerequisites for NppRemapKernel implementation: #4374
- Use numpy instead of naive loops in remap test: #4425
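As a rough illustration of what a remap operation computes (each output pixel is sampled from input coordinates given by per-pixel maps, as in fn.experimental.remap or NPP's Remap kernel), here is a minimal nearest-neighbor sketch in NumPy; the real operators also support interpolation modes:

```python
# Nearest-neighbor remap sketch: out[i, j] = img[map_y[i, j], map_x[i, j]].
# This is only an illustration of the operation's semantics, not DALI code.
import numpy as np

def remap_nearest(img, map_x, map_y):
    h, w = img.shape[:2]
    # Round the sampling coordinates and clamp them to the image bounds.
    xs = np.clip(np.rint(map_x).astype(int), 0, w - 1)
    ys = np.clip(np.rint(map_y).astype(int), 0, h - 1)
    return img[ys, xs]

img = np.arange(16).reshape(4, 4)
# Identity maps reproduce the input; reversing map_x mirrors horizontally.
yy, xx = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
flipped = remap_nearest(img, xx[:, ::-1], yy)
```

Arbitrary warps (lens distortion correction, fisheye undistortion, etc.) are just different choices of the two coordinate maps.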