Fixing blocks read with overlap for files shorter than block size #446
Summary
The `blocks` method, which reads an audio file in blocks of frames with an optional overlap, has a bug in the edge case where the file has fewer frames than the specified block size and the specified overlap is not 0. In this case, the output contains the full file followed by an overlap-sized number of arbitrary values.
Explanation
When the method is called without a provided `out` array, it creates an empty array to store the output using the `np.empty` method, with a shape corresponding to the block size argument (and the number of channels, but that is not relevant to this problem). The initial values in an array created by `np.empty` are not deterministic.
The file is then read into this `out` array up to the block size or the end of the file, whichever comes first. A `block` whose length is the number of frames in the file plus the overlap size is then copied from the `out` array and returned.
Example
Let's take an example of a mono file with `5` frames, a block size of `10` and an overlap of `1`.
1. An `out` array of length `10` is produced: `[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]`
2. The file is read into it: `[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]` (assuming all frames in the mono file were `1`)
3. The number of frames returned is `5+1=6`, so the output block we get is `[1, 1, 1, 1, 1, 0]`
4. The last value is `0` in this example, but it can be any arbitrary value produced by `np.empty`
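
The steps above can be reproduced with a small pure-Python model. This is only a sketch of the logic as described here, not the library's actual code: the function name `read_single_block_buggy` and the `UNINIT` marker (standing in for whatever `np.empty` happens to leave in memory) are hypothetical and exist purely for illustration.

```python
# Simplified pure-Python model of the behaviour described above.
# This is NOT the library's code; all names here are hypothetical.

UNINIT = "?"  # stands in for the arbitrary values np.empty may leave behind


def read_single_block_buggy(frames, blocksize, overlap):
    """Model the only block read when the file is shorter than blocksize."""
    # Buffer sized to the full block size; its contents are undefined.
    out = [UNINIT] * blocksize

    # Read up to blocksize frames, or until the file ends.
    n_read = min(blocksize, len(frames))
    out[:n_read] = frames[:n_read]

    # Bug: the returned slice also covers `overlap` slots never written to.
    return out[: n_read + overlap]


block = read_single_block_buggy(frames=[1, 1, 1, 1, 1], blocksize=10, overlap=1)
print(block)  # [1, 1, 1, 1, 1, '?']
```

The trailing `'?'` marks the slot that, in the real array, would hold an unpredictable value.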
The `test_block_longer_than_file_with_overlap_mono` test was added to cover this case. It failed as expected before the fix was added.
What happens when the block size is smaller than the file length but there is an overlap?
When the file spans multiple blocks, the error doesn't occur: at the end of each block read, the overlapping frames are copied to the beginning of the `out` array and the next frames from the file are read in after them. As a result, the final block still contains the number of frames left in the file plus the overlap frames, but here the overlap frames are the ones copied to the beginning of the output in the previous iteration, so this case isn't affected by the bug.
The fix
When creating the initial `out` array (in case it was not provided as an argument), we just need to create it with a length equal to the block size or the number of frames in the file, whichever is smaller. If the block size is smaller than the number of frames in the file, it works as before and, as discussed above, that case is not affected by the bug. If the block size is greater, the length of the output array equals the number of frames in the file; in that case there is only one block anyway, so an overlap does not make sense.
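
To illustrate the sizing rule, here is a rough pure-Python model of block reading with overlap. It is a sketch under stated assumptions, not the actual patch: `blocks_model`, its `apply_fix` switch, and the `UNINIT` marker (a stand-in for uninitialized `np.empty` contents) are all hypothetical names. It only mirrors the logic described in this PR, so the buggy and fixed allocations can be compared side by side.

```python
# Simplified pure-Python model of block reading with overlap.
# NOT the library's code; names and structure are hypothetical.

UNINIT = "?"  # stands in for whatever np.empty leaves in memory


def blocks_model(frames, blocksize, overlap, apply_fix=True):
    """Yield blocks of `frames`, repeating `overlap` frames between blocks."""
    # The fix: never allocate more slots than the file has frames.
    size = min(blocksize, len(frames)) if apply_fix else blocksize
    out = [UNINIT] * size

    carried = []             # overlap frames kept from the previous block
    pos = 0                  # index of the next frame to read
    remaining = len(frames)  # frames not yet read
    while remaining > 0:
        # Put the previous block's overlap frames at the start of the buffer.
        offset = len(carried)
        out[:offset] = carried

        # Read new frames in after the overlap frames.
        to_read = min(blocksize - offset, remaining)
        out[offset:offset + to_read] = frames[pos:pos + to_read]
        pos += to_read

        if overlap:
            carried = out[-overlap:]

        # A short final block returns only `remaining + overlap` entries.
        if blocksize > remaining + overlap:
            yield out[:remaining + overlap]
        else:
            yield list(out)
        remaining -= to_read


short_file = [1, 1, 1, 1, 1]
buggy = list(blocks_model(short_file, blocksize=10, overlap=1, apply_fix=False))
fixed = list(blocks_model(short_file, blocksize=10, overlap=1))
multi = list(blocks_model([1, 2, 3, 4, 5, 6], blocksize=4, overlap=1))

print(buggy)  # [[1, 1, 1, 1, 1, '?']]
print(fixed)  # [[1, 1, 1, 1, 1]]
print(multi)  # [[1, 2, 3, 4], [4, 5, 6]]
```

With the smaller allocation, slicing past the end of the buffer simply returns the frames that exist, while the multi-block case behaves the same as before.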