Commit

improve(pipeline): optimize memory usage in Inference.aggregate
benniekiss authored May 28, 2024
1 parent d327195 commit f1951a6
Showing 2 changed files with 5 additions and 5 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -15,6 +15,7 @@
 
 - improve(io): when available, default to using `soundfile` backend
 - improve(pipeline): do not extract embeddings when `max_speakers` is set to 1
+- improve(pipeline): optimize memory usage of most pipelines ([#1713](https://github.com/pyannote/pyannote-audio/pull/1713) by [@benniekiss](https://github.com/benniekiss/))
 
 ## Version 3.2.0 (2024-05-08)
 
9 changes: 4 additions & 5 deletions pyannote/audio/core/inference.py
@@ -559,9 +559,6 @@ def aggregate(
             step=frames.step,
         )
 
-        masks = 1 - np.isnan(scores)
-        scores.data = np.nan_to_num(scores.data, copy=True, nan=0.0)
-
         # Hamming window used for overlap-add aggregation
         hamming_window = (
             np.hamming(num_frames_per_chunk).reshape(-1, 1)
@@ -613,11 +610,13 @@ def aggregate(
         )
 
         # loop on the scores of sliding chunks
-        for (chunk, score), (_, mask) in zip(scores, masks):
+        for chunk, score in scores:
             # chunk ~ Segment
             # score ~ (num_frames_per_chunk, num_classes)-shaped np.ndarray
             # mask ~ (num_frames_per_chunk, num_classes)-shaped np.ndarray
-
+            mask = 1 - np.isnan(score)
+            np.nan_to_num(score, copy=False, nan=0.0)
+
             start_frame = frames.closest_frame(chunk.start + 0.5 * frames.duration)
 
             aggregated_output[start_frame : start_frame + num_frames_per_chunk] += (
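
The memory saving in this change can be illustrated with a minimal sketch. It assumes plain NumPy arrays in place of pyannote's sliding-window scores, and it replaces the Hamming-weighted overlap-add with a simple masked mean across chunks. The function names `masked_mean_upfront` and `masked_mean_per_chunk` are hypothetical and not part of the library.

```python
import numpy as np


def masked_mean_upfront(chunks: np.ndarray) -> np.ndarray:
    """Before: build a full-size mask and a NaN-free copy ahead of the loop."""
    masks = 1 - np.isnan(chunks)  # extra array with as many elements as `chunks`
    cleaned = np.nan_to_num(chunks, copy=True, nan=0.0)  # second full-size copy
    total = np.zeros(chunks.shape[1:])
    count = np.zeros(chunks.shape[1:])
    for score, mask in zip(cleaned, masks):
        total += score * mask
        count += mask
    return total / np.maximum(count, 1)


def masked_mean_per_chunk(chunks: np.ndarray) -> np.ndarray:
    """After: compute the mask per chunk and zero out NaNs in place."""
    total = np.zeros(chunks.shape[1:])
    count = np.zeros(chunks.shape[1:])
    for score in chunks:  # `score` is a view into `chunks`
        mask = 1 - np.isnan(score)  # chunk-sized temporary only
        np.nan_to_num(score, copy=False, nan=0.0)  # overwrites NaNs in `chunks`
        total += score * mask
        count += mask
    return total / np.maximum(count, 1)


# Toy input: 2 chunks, 1 frame, 2 classes, with one missing score.
chunks = np.array([[[1.0, np.nan]], [[3.0, 4.0]]])
before = masked_mean_upfront(chunks.copy())
after = masked_mean_per_chunk(chunks.copy())
print(np.allclose(before, after))
```

Both variants give the same result, but the per-chunk variant overwrites the NaNs in its input, so a caller that still needs the original scores would have to pass a copy.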
