Ensemble model with shared memory #5418
-
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
Describe alternatives you've considered
Additional context
Thanks!
Replies: 1 comment 3 replies
-
If you're asking whether an ensemble shares memory between models, the ensemble scheduler passes pointers to the tensors between models to avoid copies. However, there is a copy at the end of the ensemble for the final output.

The backends may also make copies during execution, unrelated to the ensemble. This can happen because of the dynamic batcher (if enabled, it copies while gathering and scattering inputs and outputs), the pinned memory manager (if used to improve performance), and models moving tensors between host and device memory. There are also other backend-specific situations; for example, moving data between models served by different backends can introduce copies.

CC: @GuanLuo @Tabrizian
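For context, the pointer-passing described above is driven by the ensemble's model configuration: each `output_map`/`input_map` pair names an intermediate tensor that the scheduler hands from one step to the next without copying. A minimal sketch of such a `config.pbtxt` follows; the model names (`preprocess`, `classifier`), tensor names, and shapes are hypothetical, not taken from this discussion.

```
name: "my_ensemble"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_INPUT", data_type: TYPE_UINT8, dims: [ -1 ] }
]
output [
  { name: "SCORES", data_type: TYPE_FP32, dims: [ 1000 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      # "preprocessed_image" is an intermediate tensor: the scheduler
      # passes a pointer to it to the next step rather than copying it.
      output_map { key: "OUTPUT" value: "preprocessed_image" }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_image" }
      output_map { key: "OUTPUT" value: "SCORES" }
    }
  ]
}
```

Under this sketch, only `SCORES` is copied out at the end of the ensemble; whether additional copies occur inside each step depends on the backends involved, as noted above.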