GPU setup for running SLEAP on a large dataset of videos #1942
Replies: 1 comment
-
Hi @hemanyaradadia, sorry about the delay! That's a large dataset indeed!

Looking at the specs, it's hard to say how they'll play out in a real setting, but the A4500 might squeeze out a bit more per-card performance from its tensor cores. On the other hand, the A40 has a lot more memory, so you'll be able to push the batch size higher to increase throughput. As you mentioned, another advantage is that if you do virtual parallelization (GPU fractionalization) to run multiple models in parallel, the A40's extra memory and bandwidth will support that better and get you some gains from data parallelism on the same device.

Given that you have many small videos, one thing I'd strongly recommend is using the Python API to do the inference yourself rather than the `sleap-track` CLI, so you pay the model-loading cost once instead of once per video (see the sketch at the end of this reply).

In all cases, though, you'll likely be bottlenecked by I/O, so definitely consider running as many tracking processes in parallel as your I/O path will permit. For example, if your data sits on network storage, make sure you're saturating (but not exceeding) your network card's bandwidth or your storage system's read rates.

Let us know if you have any questions and keep us posted on what solution you land on!

Talmo
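For reference, here is a minimal sketch of what that Python API loop could look like. The model folders, glob pattern, and batch size are placeholders for your own setup; `sleap.load_model`, `sleap.load_video`, and `Predictor.predict` follow the documented SLEAP API, the TensorFlow memory-growth setting is just one way to let several such processes share a single GPU, and the exact save/export call may differ slightly between SLEAP versions.

```python
import glob

import tensorflow as tf
import sleap

# Let TensorFlow allocate GPU memory on demand instead of grabbing it all up
# front, so several inference processes can share one card (fractionalization).
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# Load the trained model(s) once and reuse the predictor for every video.
# Replace these folders with your own exported model directories (placeholders).
predictor = sleap.load_model(
    ["models/centroid_model", "models/centered_instance_model"],
    batch_size=16,  # raise this on the A40's larger memory
)

for video_path in sorted(glob.glob("videos/*.mp4")):
    video = sleap.load_video(video_path)
    predictions = predictor.predict(video)  # returns a sleap.Labels object
    predictions.save(f"{video_path}.predictions.slp")
```

To parallelize, launch several copies of a script like this, each on a disjoint chunk of the video list (and pinned to a device with `CUDA_VISIBLE_DEVICES` if you end up with multiple GPUs), and keep adding processes until your disk or network reads plateau.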
-
Hello everyone,
I am working on projects involving massive datasets, e.g., >10k short videos (several seconds each) as well as 5-week-long high-resolution recordings, and I plan to use SLEAP for pose estimation on these videos. I am looking for suggestions on a computing infrastructure suitable for such large-scale projects.
What would be the best GPU setup for running inference on many videos as fast as possible with an already-trained model? For example, a single powerful GPU like the A40, or several less powerful ones like the A4500? Does virtual parallelization improve performance on a single GPU?