-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid corrupting data when calling seekend
#149
Conversation
@Marlin-Na Does this solve your issue? @quinnj @bicycle1885 This is ready for review. The current behavior of For example currently: julia> stream = GzipCompressorStream(open("foo.txt.gz"; write=true))
TranscodingStreams.TranscodingStream{GzipCompressor, IOStream}(<mode=idle>)
julia> write(stream, collect(1:10000))
80000
julia> seekend(stream)
TranscodingStreams.TranscodingStream{GzipCompressor, IOStream}(<mode=write>)
julia> write(stream, collect(10001:20000))
80000
julia> close(stream)
julia> reinterpret(Int,transcode(GzipDecompressor,read("foo.txt.gz")))
ERROR: zlib error: invalid block type (code: -3) With this PR the error happens while writing, but no data is lost julia> stream = GzipCompressorStream(open("foo.txt.gz"; write=true))
TranscodingStreams.TranscodingStream{GzipCompressor, IOStream}(<mode=idle>)
julia> write(stream, collect(1:10000))
80000
julia> seekend(stream)
ERROR: ArgumentError: not in read mode
Stacktrace:
[1] seekend(stream::TranscodingStreams.TranscodingStream{GzipCompressor, IOStream})
@ TranscodingStreams ~/juliadev/TranscodingStreams/src/stream.jl:308
[2] top-level scope
@ REPL[49]:1
julia> write(stream, collect(10001:20000))
80000
julia> close(stream)
julia> reinterpret(Int,transcode(GzipDecompressor,read("foo.txt.gz")))
20000-element reinterpret(Int64, ::Vector{UInt8}):
1
2
3
4
5
6
⋮
19996
19997
19998
19999
20000 |
seekend
and seekstart
seekend
if mode == :read | ||
while !eof(stream) | ||
emptybuffer!(buffer1) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea to read the entire file here. Normally, when a user uses seek
, it's specifically to avoid processing a whole file.
I understand that it may not be possible to seek into a compressed file and get the position of the decompressed stream, but I believe in that case, seekend
should simply not be implemented, else we risk frustrating users when an operation they believe to be constant time suddenly takes ages.
We can then provide the functionality of reading the whole file under a function with another name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that it is much better for Base.seekend(stream::TranscodingStream)
not to be implemented than to silently do the wrong thing. I'm not sure how to go about removing this method though. I created a PR JuliaIO/CodecZlib.jl#68 to remove its use in CodecZlib.jl's tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can split this PR into three parts, one for each change, to make reviewing easier.
Fixes #109
I changed the default
seekstart
to error when applied to a stream in:write
mode, and reset the stream if it is in:read
mode. It also callsseekstart
on the wrapped IO.seekend(s)
should be equivalent toskip(s, big number)
in which case it should currently error if the stream is not in:read
mode.Also changed
skip
to return the stream to be more consistent.