Adding parallelism using ProcessPoolExecutor and concurrent.futures #314
base: master
Conversation
Hi, many thanks for your contribution to the examples!
This function seems too specific to a particular task.
Looks like you init a VAD each time a new file gets processed. For long files this overhead will be negligible, but I believe we should do our best to provide the best example we can, because sometimes, e.g. in a multi-threaded application, it may lead to painful memory leaks. You see, if you are using PyTorch (not ONNX), it actually creates a reference to an underlying object. In this particular case it hardly matters, but in general I guess a proper way is to init a separate model instance per process.
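For reference, a minimal sketch of the per-process pattern: each worker loads its own model once via the `ProcessPoolExecutor` initializer, so nothing model-related has to be pickled or passed between processes (the `torch.hub` entry point follows the project README; the file list and worker count below are placeholders):

```python
# Minimal sketch: one model instance per worker process, created once in the
# initializer instead of once per file.
from concurrent.futures import ProcessPoolExecutor

import torch

_model = None
_utils = None


def _init_worker():
    # Runs once in every child process.
    global _model, _utils
    _model, _utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')


def _process_file(path):
    get_speech_timestamps, _, read_audio, _, _ = _utils
    wav = read_audio(path, sampling_rate=16000)
    return path, get_speech_timestamps(wav, _model, sampling_rate=16000)


if __name__ == '__main__':
    wav_paths = ['first.wav', 'second.wav', 'third.wav']  # placeholder inputs
    with ProcessPoolExecutor(max_workers=4, initializer=_init_worker) as ex:
        for path, speech_timestamps in ex.map(_process_file, wav_paths):
            print(path, speech_timestamps)
```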
Hello @snakers4, I spent a few hours working on pickling the model and found a small workaround; please take a look at the new commit. I cleaned up the code a bit as well.
Hi,
Looks like this is not used.
Looks like you nevertheless initialize the model in the main process and then pass it as a reference to the process pool executor processes. Why not just init a separate instance of the model in each child process, i.e. in
You do not really need to pass any parameters to the init. You can write some plain logs to make sure that several instances of the model were created in parallel.
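To make the "plain logs" suggestion concrete, a hypothetical variant of such a worker initializer could log the process id, so it is easy to see that every child builds its own instance:

```python
# Hypothetical worker initializer with logging; each worker should log its own
# pid exactly once, confirming that separate model instances were created.
import logging
import os

import torch

logging.basicConfig(level=logging.INFO, format='[pid %(process)d] %(message)s')

_model = None
_utils = None


def _init_worker():
    global _model, _utils
    logging.info('loading a fresh VAD instance in worker %d', os.getpid())
    _model, _utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
    logging.info('VAD instance ready in worker %d', os.getpid())
```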
This comment was marked as off-topic.
For a real-time system (the use case the VAD is intended for), a latency of about 500 ms is not very good when processing one 30 ms chunk takes about 1 ms. Also, batching works fine when you process a lot of files locally, but it becomes problematic when you receive chunks asynchronously. Of course you can keep the state for each batch element and pass it back and forth for the next chunk in a messaging-like system, but we decided not to maintain public examples of such complexity.
Well, I still don't see why provide an example with batch size = 1. If someone has 200 files, those can be processed by a single executor with a batch size of 200, not by 200 executors in parallel.
Oh, that's the whole processing time for 50 wavs of ~10 s each. Per chunk it was obviously 0.5 s / n_chunks ≈ 5 ms.
Well, it's not terribly complicated; I implemented it (save/restore) for our own use in a couple of hours. Arguably, to make the best use of your GPU resources (i.e. execution units, RAM, scheduling, etc.), you need to batch up local tasks, not multiplex them. With proper batching I can multiplex a small number of functionally different models (i.e. TTS, STT, VAD, etc.) for, say, 50 channels on a single GPU today in real time, but I cannot multiplex 150 copies of the same 3 models. Well, I can, but on 10x the number of GPUs. ;-) So giving only examples with batch size 1 promotes the most inefficient way to use the model, IMHO.
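Since the thread never shows what the save/restore bookkeeping looks like, here is a rough outline of the idea; `save_state` and `load_state` are hypothetical stubs (the public silero-vad API does not expose the model's recurrent state, and the internal attributes vary between versions), so this is a sketch of the multiplexing logic, not a working implementation:

```python
# Rough outline of per-channel state save/restore so one VAD instance can
# serve many streaming channels; save_state/load_state are hypothetical stubs.


def save_state(model):
    # Hypothetical: copy whatever tensors hold the model's recurrent state.
    raise NotImplementedError


def load_state(model, state):
    # Hypothetical counterpart of save_state().
    raise NotImplementedError


class ChannelMultiplexer:
    """Shares one VAD instance across many streaming channels."""

    def __init__(self, model, sampling_rate=16000):
        self.model = model
        self.sampling_rate = sampling_rate
        self.states = {}  # channel_id -> saved recurrent state

    def process_chunk(self, channel_id, chunk):
        # Restore this channel's state before the forward pass,
        # or start clean for a brand-new channel.
        if channel_id in self.states:
            load_state(self.model, self.states[channel_id])
        else:
            self.model.reset_states()
        speech_prob = self.model(chunk, self.sampling_rate).item()
        # Save the state back so another channel can reuse the same model.
        self.states[channel_id] = save_state(self.model)
        return speech_prob
```

A fully batched version would additionally stack the chunks of several channels into one tensor before a single forward pass, which is the multiplexing described above.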
@snakers4 P.S. Sorry for hiding that comment, it's a bit late here; I wanted to give it a fresh read and re-post in the morning, but you have beaten me to it. 😂
Do not use Python for concurrency; the GIL will generate CPU spikes.
adding parallelism using ProcessPoolExecutor and concurrent.futures