memory optimizations for pyannote.audio.core.inference.Inference.aggregate() #1713
While diarizing long audio recordings (>6 hours), I noticed very high memory usage, upwards of 30 GB. I tracked the spike to `pyannote.audio.core.inference.Inference.aggregate()`, which was initializing several very large tensors.

With this PR, RAM usage is reduced by 10-15 GB for long audio files in my tests. I have not tested extensively, but I do not believe this impacts accuracy or speed.
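For readers unfamiliar with this kind of aggregation, here is a minimal sketch of the general idea: averaging overlapping sliding-window chunks into preallocated output buffers with in-place updates, instead of materializing extra full-size intermediate tensors. This is not the code in `Inference.aggregate()` and not the exact change in this PR; the function name `aggregate_overlap_add`, its parameters, and the use of NumPy with `float32` buffers are all assumptions made for illustration.

```python
import numpy as np


def aggregate_overlap_add(chunks, step_frames, dtype=np.float32):
    """Average overlapping chunk scores (hypothetical helper, not pyannote's code).

    chunks: array of shape (num_chunks, frames_per_chunk, num_classes)
    step_frames: hop between consecutive chunks, in frames
    """
    num_chunks, frames_per_chunk, num_classes = chunks.shape
    num_frames = step_frames * (num_chunks - 1) + frames_per_chunk

    # Only two output-sized buffers are allocated; no padded per-chunk copies.
    total = np.zeros((num_frames, num_classes), dtype=dtype)
    count = np.zeros((num_frames, 1), dtype=dtype)

    for i in range(num_chunks):
        start = i * step_frames
        stop = start + frames_per_chunk
        # In-place accumulation into views of the output buffers.
        total[start:stop] += chunks[i]
        count[start:stop] += 1

    # Guard against frames that no chunk covers, then normalize in place.
    np.maximum(count, 1, out=count)
    total /= count
    return total


# Three chunks of 4 frames with a hop of 2 frames, 2 classes each.
example = np.stack([np.full((4, 2), float(i)) for i in range(3)])
averaged = aggregate_overlap_add(example, step_frames=2)
```

With this layout, peak memory is roughly the input chunks plus two output-sized buffers, which is the kind of saving that matters when the output spans several hours of frames.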
I did have one question related to one of the commits,
Is this a correct assumption? Otherwise, frames should be reinitialized.
Now, the whole speaker diarization pipeline does not peak past 20 GB of RAM for a 9-hour recording. This is constrained by both `Inference.aggregate` and `scipy.cluster.hierarchy.linkage` in the AgglomerativeClustering pipeline.
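As a rough illustration of why `scipy.cluster.hierarchy.linkage` can be a memory bottleneck: it works on a condensed pairwise distance matrix with n(n-1)/2 entries, stored as 64-bit floats, so memory grows quadratically with the number of points being clustered. The sketch below only does that arithmetic. The helper name `condensed_distance_bytes` and the figure of 50,000 embeddings are assumptions for illustration, not measurements from this PR.

```python
def condensed_distance_bytes(num_points, bytes_per_value=8):
    """Bytes needed for a condensed pairwise distance matrix (hypothetical helper).

    A condensed matrix stores one value per unordered pair of points,
    i.e. num_points * (num_points - 1) / 2 values.
    """
    pairs = num_points * (num_points - 1) // 2
    return pairs * bytes_per_value


# Assumed example size: 50,000 embeddings (not a number taken from the PR).
estimate = condensed_distance_bytes(50_000)
print(f"{estimate / 1e9:.1f} GB")
```

Under that assumed size the distance matrix alone is on the order of 10 GB, so doubling the number of embeddings would roughly quadruple it.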