memory optimizations for pyannote.audio.core.inference.Inference.aggregate() #1713

benniekiss · 2024-05-17T22:30:46Z

While diarizing long audio recordings (>6 hours), I noticed very high memory usage, upwards of 30GB.
I tracked the spike to pyannote.audio.core.inference.Inference.aggregate(), which was initializing several very large tensors.

With this PR, RAM usage is reduced by 10-15 GB for long audio files in my tests. I have not tested extensively, but I do not believe this affects accuracy or speed.

I do have one question about one of the commits.

Currently, frames is recreated only so that it has the same start as chunks,
but as I understand it, there are no cases where chunks.start and frames.start
would be anything other than 0.0.

Is this a correct assumption? Otherwise, frames should be reinitialized.

With these changes, the whole speaker diarization pipeline no longer peaks above 20 GB of RAM for a 9-hour recording. The peak is now determined by both Inference.aggregate and scipy.cluster.hierarchy.linkage in the AgglomerativeClustering pipeline.
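For intuition on why hierarchical clustering gets memory-hungry on long recordings: a condensed pairwise distance matrix over n observations holds n(n-1)/2 entries, so memory grows quadratically with n. The back-of-the-envelope sketch below assumes float64 distances; the function name and the embedding count are hypothetical, not measurements from this PR.

```python
def condensed_distance_bytes(num_observations: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed for a condensed pairwise distance matrix.

    Assumes one entry per unordered pair of observations and
    float64 (8-byte) distances by default.
    """
    num_pairs = num_observations * (num_observations - 1) // 2
    return num_pairs * bytes_per_entry


# Hypothetical number of embeddings for a long recording (an assumption,
# not a figure taken from the pipeline).
hypothetical_num_embeddings = 50_000
estimate_gb = condensed_distance_bytes(hypothetical_num_embeddings) / 1e9
print(f"condensed distance matrix: about {estimate_gb:.1f} GB")
```

Doubling the number of observations roughly quadruples this estimate, which is why the clustering step can dominate once the recording is long enough.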

benniekiss · 2024-05-18T12:55:52Z

Rebased the changes onto the most recent develop, and fixed an incorrect git authorship config on my end.

pyannote/audio/core/inference.py

Since we are overwriting scores.data with an augmented scores.data, just operate on the array in place. This results in ~8 GB of memory savings on a 9-hour recording.
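As an illustration of the copy versus in-place difference (a minimal sketch on toy data, not the code from this PR): NumPy's np.nan_to_num accepts copy=False, which overwrites the input array instead of allocating a second array of the same size. The helper names and the toy array are hypothetical.

```python
import numpy as np


def replace_nan_with_copy(data: np.ndarray) -> np.ndarray:
    # copy=True (the default) allocates a new array as large as `data`.
    return np.nan_to_num(data, copy=True, nan=0.0)


def replace_nan_in_place(data: np.ndarray) -> np.ndarray:
    # copy=False overwrites `data` itself, so no second array is allocated.
    return np.nan_to_num(data, copy=False, nan=0.0)


# Toy stand-in for scores.data; the real array is far larger.
toy_scores = np.array([[0.2, np.nan], [np.nan, 0.7]], dtype=np.float32)

copied = replace_nan_with_copy(toy_scores)
copy_shares_memory = np.shares_memory(copied, toy_scores)

in_place = replace_nan_in_place(toy_scores)
in_place_shares_memory = np.shares_memory(in_place, toy_scores)

print(copy_shares_memory, in_place_shares_memory)
```

With a multi-gigabyte array, the copying variant briefly needs room for both the original and the result, while the in-place variant does not.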

Storing all calculated masks in a tensor can consume a lot of RAM, about 4 GB for ~9 hours of audio, so we defer calculating each mask until the loop. Because the mask is now calculated in the loop, np.nan_to_num(score) can only be called after the mask for that chunk has been calculated.
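The ordering constraint described above can be sketched as follows. This is a simplified overlap-add average on toy data, assuming chunks are spaced by a fixed number of frames; it is not the actual Inference.aggregate implementation, and the function name and parameters are hypothetical.

```python
import numpy as np


def aggregate_overlapping(chunks: np.ndarray, step: int, num_frames: int) -> np.ndarray:
    """Average overlapping chunks, ignoring NaN entries.

    chunks has shape (num_chunks, frames_per_chunk, num_classes).
    Note: NaNs in `chunks` are replaced in place, so the input is modified.
    """
    num_chunks, frames_per_chunk, num_classes = chunks.shape
    total = np.zeros((num_frames, num_classes), dtype=np.float32)
    count = np.zeros((num_frames, num_classes), dtype=np.float32)

    for index in range(num_chunks):
        score = chunks[index]  # a view into `chunks`, not a copy
        # Compute the mask for this chunk only, instead of storing
        # masks for every chunk up front.
        mask = ~np.isnan(score)
        # Only now is it safe to overwrite NaNs: doing this before the
        # mask is computed would erase the information the mask needs.
        np.nan_to_num(score, copy=False, nan=0.0)

        start = index * step
        total[start:start + frames_per_chunk] += score * mask
        count[start:start + frames_per_chunk] += mask

    # Avoid division by zero for frames that no chunk covers.
    return total / np.maximum(count, 1e-12)


# Three toy chunks of 4 frames each, shifted by 2 frames, one class.
toy_chunks = np.stack(
    [np.full((4, 1), 1.0), np.full((4, 1), 3.0), np.full((4, 1), 5.0)]
).astype(np.float32)
toy_chunks[1, 0, 0] = np.nan  # one missing score in the second chunk

result = aggregate_overlapping(toy_chunks, step=2, num_frames=8)
print(result.ravel())
```

Only one chunk-sized mask is alive at any time, at the cost of recomputing np.isnan once per chunk inside the loop.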

benniekiss · 2024-05-23T13:06:16Z

Rebased and added back the frames section.

hbredin · 2024-05-28T12:34:47Z

Merged! 🎉 Thanks a lot for your contribution. Will be part of next release.

benniekiss · 2024-05-28T13:52:27Z

Awesome! I really appreciate your work. pyannote has become an invaluable tool, so I'm glad I can give back in my small way.

hbredin · 2024-05-28T15:24:12Z

I'd love to know more about how pyannote impacts your work.
Feel free to drop me an email!

benniekiss force-pushed the inference_aggregate_memory_opt branch 2 times, most recently from efd21e0 to 34f4f7e Compare May 18, 2024 12:51

hbredin reviewed May 19, 2024

pyannote/audio/core/inference.py (outdated)

benniekiss added 2 commits May 23, 2024 09:01

do not copy array

18fc446

Since we are overwriting scores.data with an augmented scores.data, just operate on the array in place. This results in ~8 GB of memory savings on a 9-hour recording.

benniekiss force-pushed the inference_aggregate_memory_opt branch from 64a3351 to 0b4dbd3 Compare May 23, 2024 13:04

hbredin added 2 commits May 24, 2024 12:33

Merge branch 'develop' into inference_aggregate_memory_opt

624f936

doc: update changelog

4906ab8

hbredin merged commit f1951a6 into pyannote:develop May 28, 2024
4 checks passed
