Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug in split where chunking would be skipped when the chunk size #3800

Merged
merged 2 commits into from
Aug 16, 2022

Conversation

resistor
Copy link
Contributor

happened to be an exact divisor of the buffer size used to read the
input stream.

The issue here was that file was being split byte-wise in chunks of 1G.
The input stream was being read in chunks of 8KB, which evenly divides
the chunk size. Because the check to allocate the next output chunk was
done at the bottom of the loop previously, it would never occur because
the current input chunk was fully consumed at that point. By moving the
check to the top of the loop (but still late enough that we know we have
bytes to write) we resolve this issue.

This scenario is unfortunately hard to write a test for, since we don't
explicitly control the input chunk size.

Fixes #3790

@resistor
Copy link
Contributor Author

Ping?

@sylvestre
Copy link
Contributor

Sorry I saw your patch but didn't take the time to reply.
When I see the initial bug, I wonder why we can't just use the same test case as test

@resistor
Copy link
Contributor Author

@sylvestre The original test case requires a multi-gigabyte file. I can probably construct and example with a smaller file, but it has to be an exact multiple of the input stream chunk size, which I don't know to be consistent across platforms.

@sylvestre
Copy link
Contributor

Having the test only on Linux would be good enough :)

Thanks

@resistor
Copy link
Contributor Author

Having the test only on Linux would be good enough :)

What about macOS? That's what I'm developing/testing on.

@sylvestre
Copy link
Contributor

Sure, that works too :)

happened to be an exact divisor of the buffer size used to read the
input stream.

The issue here was that file was being split byte-wise in chunks of 1G.
The input stream was being read in chunks of 8KB, which evenly divides
the chunk size. Because the check to allocate the next output chunk was
done at the bottom of the loop previously, it would never occur because
the current input chunk was fully consumed at that point. By moving the
check to the top of the loop (but still late enough that we know we have
bytes to write) we resolve this issue.

This scenario is unfortunately hard to write a test for, since we don't
explicitly control the input chunk size.

Fixes uutils#3790
@resistor
Copy link
Contributor Author

Added test case.

@sylvestre
Copy link
Contributor

Much better, thanks :)

@sylvestre sylvestre merged commit 9fad6fd into uutils:main Aug 16, 2022
@jfinkels
Copy link
Collaborator

Cool, nice catch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

split command not working right
3 participants