memory optimizations for pyannote.audio.core.inference.Inference.aggregate() #1713

Merged (4 commits) on May 28, 2024

benniekiss
Contributor

While diarizing long audio recordings (>6 hours), I noticed very high memory usage, upwards of 30GB.
I tracked the spike to pyannote.audio.core.inference.Inference.aggregate(), which was initializing several very large tensors.
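
One way a spike like this can be located is with Python's tracemalloc module, which also sees NumPy array allocations. The sketch below is not necessarily how the spike was found here; peak_bytes and allocate_like_aggregate are hypothetical helpers, and the array shapes are arbitrary.

```python
import tracemalloc

import numpy as np


def peak_bytes(fn, *args, **kwargs):
    """Run fn and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def allocate_like_aggregate(num_chunks, num_frames, num_classes):
    # Hypothetical stand-in for a function that builds a large score array
    # plus a same-shaped mask array; not the pyannote implementation.
    scores = np.ones((num_chunks, num_frames, num_classes), dtype=np.float32)
    masks = 1 - np.isnan(scores)
    return scores.shape, masks.shape


_, peak = peak_bytes(allocate_like_aggregate, 100, 200, 3)
print(f"peak traced memory: {peak / 1e6:.2f} MB")
```

Wrapping individual calls this way narrows down which step is responsible for the peak.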

With this PR, RAM usage is reduced by 10-15 GB for long audio files in my tests. I have not tested extensively, but I do not believe this impacts accuracy or speed.

I did have one question related to one of the commits:

currently, frames is recreated only so that it has the same start as chunks,
but from my understanding, there are no cases where chunks.start and frames.start
would be anything other than 0.0.

Is this a correct assumption? If not, frames should be reinitialized.

Now, the whole speaker diarization pipeline does not peak past 20 GB of RAM for a 9-hour recording. The remaining peak comes from both Inference.aggregate and scipy.cluster.hierarchy.linkage in the AgglomerativeClustering pipeline.
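
For the linkage side, a rough estimate may help: when scipy.cluster.hierarchy.linkage is given observation vectors, it works from a condensed pairwise-distance matrix with n(n-1)/2 float64 entries, so memory grows quadratically with the number of items being clustered. The sketch below only does that arithmetic; condensed_distance_bytes is a hypothetical helper, and the observation counts are illustrative, not measurements from this PR.

```python
def condensed_distance_bytes(num_observations: int, itemsize: int = 8) -> int:
    """Bytes needed for a condensed pairwise-distance matrix.

    itemsize defaults to 8 because the distances are stored as float64.
    """
    num_pairs = num_observations * (num_observations - 1) // 2
    return num_pairs * itemsize


# Illustrative observation counts (assumptions, not taken from the PR).
for n in (1_000, 10_000, 50_000):
    print(n, f"{condensed_distance_bytes(n) / 1e9:.2f} GB")
```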

@benniekiss benniekiss force-pushed the inference_aggregate_memory_opt branch 2 times, most recently from efd21e0 to 34f4f7e Compare May 18, 2024 12:51
@benniekiss
Contributor Author

Rebased the changes onto the most recent develop, then fixed an incorrect git authorship config on my end.

Since we are overwriting scores.data with an augmented scores.data, just operate on the array in place.

* This results in ~8 GB of memory savings on a 9-hour recording.
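
The difference between the copying and in-place forms of np.nan_to_num can be sketched as follows. This is a minimal illustration, not the code from the commit; the array is a small hypothetical stand-in for scores.data.

```python
import numpy as np


def make_scores():
    # Hypothetical stand-in for scores.data:
    # (num_chunks, num_frames, num_classes), with some missing frames.
    data = np.ones((4, 5, 2), dtype=np.float32)
    data[1, 2:, :] = np.nan
    return data


# copy=True (the default) allocates a second array as large as the input.
data = make_scores()
copied = np.nan_to_num(data, nan=0.0)
print(np.shares_memory(copied, data))  # the result is a separate buffer

# copy=False overwrites the NaNs in the existing buffer instead.
data = make_scores()
in_place = np.nan_to_num(data, copy=False, nan=0.0)
print(np.shares_memory(in_place, data))  # same buffer, no second allocation
```

For an array of N bytes, the copying form briefly needs about 2N bytes, while the in-place form stays at N.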

Storing ALL calculated masks in a tensor can consume a lot of RAM,
about 4 GB for ~9 hours of audio,
so we defer calculating each mask until the loop.

Because the mask is now calculated in the loop, we have to wait until after
calculating the mask to call np.nan_to_num(score); otherwise the NaNs the mask depends on would already be gone.
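
The ordering described in these commit messages can be sketched with a simplified overlap-add average. This is an assumption-laden illustration, not the actual Inference.aggregate code: aggregate_sketch, starts, and total_frames are hypothetical names, and windowing and warm-up handling are left out.

```python
import numpy as np


def aggregate_sketch(chunks, starts, total_frames):
    """Average overlapping per-chunk scores, ignoring NaN frames.

    chunks: (num_chunks, frames_per_chunk, num_classes) float array, may hold NaN.
            NaNs are zeroed in place, so the input is modified.
    starts: index of the first output frame covered by each chunk.
    """
    _, frames_per_chunk, num_classes = chunks.shape
    total = np.zeros((total_frames, num_classes), dtype=np.float32)
    count = np.zeros((total_frames, num_classes), dtype=np.float32)

    for chunk, start in zip(chunks, starts):
        # Build the mask for this chunk only, while the NaNs are still present.
        mask = (~np.isnan(chunk)).astype(np.float32)
        # Only now replace the NaNs; chunk is a view, so this is in place.
        np.nan_to_num(chunk, copy=False, nan=0.0)
        stop = start + frames_per_chunk
        total[start:stop] += chunk * mask
        count[start:stop] += mask

    # Guard against division by zero for frames no chunk covered.
    return total / np.maximum(count, 1e-12)
```

Only one chunk-sized mask is alive at a time, instead of a mask tensor as large as all the scores.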
@benniekiss benniekiss force-pushed the inference_aggregate_memory_opt branch from 64a3351 to 0b4dbd3 Compare May 23, 2024 13:04
@benniekiss
Contributor Author

rebased and added back the frames section.

@hbredin hbredin merged commit f1951a6 into pyannote:develop May 28, 2024
4 checks passed
@hbredin
Member

hbredin commented May 28, 2024

Merged! 🎉 Thanks a lot for your contribution. Will be part of next release.

@benniekiss
Contributor Author

Awesome! I really appreciate your work. pyannote has become an invaluable tool, so I'm glad I can give back in my small way.

@hbredin
Member

hbredin commented May 28, 2024

I'd love to know more about how pyannote impacts your work.
Feel free to drop me an email!
