Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closing a ZstdCompressionWriter or ZstdDecompressionWriter should not close underlying file #76

Closed
rtkbkish opened this issue Feb 26, 2019 · 15 comments

Comments

@rtkbkish
Copy link

The changes to support ZstdCompressionWriter and ZstdDecompressionWriter without a context mananger has broken use cases where they are being used in a context manager.

Upon a close, the writers should not close their underlying file objects. It should be up to the caller to close the file objects.

We are passing in StringIO objects to the writers and they get closed upon exit--throwing out the buffer. This technique works with other compression libraries.

@indygreg
Copy link
Owner

The behavior of stream wrappers and close() isn't very well-defined by the Python docs and I had trouble understanding what the expected behavior should be. In fact, there is a TODO item to audit the behavior and ensure consistency.

I'm pretty sure I found at least one example (io.BufferedReader) where close() does in fact close() the underlying stream. Maybe that behavior only makes sense on readers instead of writers?

I understand your pain here. I'd prefer to change things once. And that change should match the behavior of other stream wrappers in Python. So I'll need to do some homework.

Specifically:

  1. Does a stream wrapper call close() on the underlying stream when close() is called?
  2. Does a stream wrapper call close() on the underlying stream when __exit__ is called?
  3. If 2 is yes, does behavior change if an exception was raised or does the main/underlying stream always get closed during __exit__?
  4. Does it matter if the stream wrapper is a reader or writer?

@indygreg
Copy link
Owner

I should mention that a workaround would be to use writer.flush(zstd.FLUSH_FRAME). This will ensure a full zstd frame is flushed to the underlying stream without closing the stream. close() at this point shouldn't perform any I/O other than to close the underlying stream.

@rtkbkish
Copy link
Author

https://github.com/python/cpython/blob/master/Lib/gzip.py

    def close(self):
        fileobj = self.fileobj
        if fileobj is None:
            return
        self.fileobj = None
        try:
            if self.mode == WRITE:
                fileobj.write(self.compress.flush())
                write32u(fileobj, self.crc)
                # self.size may exceed 2 GiB, or even 4 GiB
                write32u(fileobj, self.size & 0xffffffff)
            elif self.mode == READ:
                self._buffer.close()
        finally:
            myfileobj = self.myfileobj
            if myfileobj:
                self.myfileobj = None
                myfileobj.close()

fileobj is the output object
myfileobj is None if passed in an output object, or it equals fileobj if it opened the file itself.

fileobj is not closed
myfileobj, if it exists, is closed

@indygreg
Copy link
Owner

I've filed https://bugs.python.org/issue36129 to seek clarification from upstream.

@ghannum
Copy link

ghannum commented Mar 5, 2019

I'm having a similar issue with zstd.ZstdDecompressor().stream_reader not closing the inner file when an outer TextIOWrapper closes. Can this also be addressed?

@indygreg
Copy link
Owner

indygreg commented Mar 5, 2019

Thanks for the report, @ghannum. While I think I know what you are saying, would you mind providing a use case showing the behavior and commenting exactly where you think things are misbehaving?

I'm wondering if we're getting into trouble because we don't involve close() when destroying the stream wrappers. i.e. we're missing a proper __del__ implementation (https://docs.python.org/3/library/io.html#io.IOBase.__del__).

@ghannum
Copy link

ghannum commented Mar 5, 2019

   fh = open("test.zst", 'rb')
   f = io.TextIOWrapper(zstd.ZstdDecompressor().stream_reader(fh)) 
   f.close()

   print(fh.closed)

False

@ghannum
Copy link

ghannum commented Mar 22, 2019

@indygreg any update on this?

@indygreg
Copy link
Owner

The reply to https://bugs.python.org/issue36129 says that flush() and close() should cascade. However, the compression *File types do not do this. It sounds like a closefd argument can be used to control this behavior. I'm waiting for clarification that this is the case. If closefd should apply to any file object, then we should add a closefd argument to our stream wrapping types and have it default to True. The default would be inconsistent with the compression *File types in the Python standard library. But it would be consistent with the ideal that close() cascades by default. I do not want to cargo cult standard library bugs into python-zstandard if it can be avoided.

@ghannum
Copy link

ghannum commented Mar 25, 2019

Thanks for the response. It is an interesting question I hadn't thought of.

Looking at the Python standard libraries, it seems this sort of control is assumed through the parameter type - ie. If you pass gzip.open a steam, it will not cascade a 'close' event (just borrowing the steam). But if you pass it a filename, it will handle all of the file-associated actions, even cascading the close through a hidden TextIOWrapper if requested (creating and owning the file handle and text wrapper).

If the goal is to maintain the Python conventions, then perhaps a similar zstd.open function would be appropriate?

@indygreg
Copy link
Owner

I do plan to provide an open() function at some time. I thought about doing it for the last release. But I wanted the stream_reader() and stream_writer() APIs to be a bit more stable first. Now that the zstandard module is backed by a .py file instead of an extension, it should be trivial to implement higher-level functionality like open(). (Another feature will likely be a function to open a tar file with zstd compression.)

@ghannum
Copy link

ghannum commented Mar 25, 2019

Awesome. I'm looking forward to it!

jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: upon closing the file writer is also closing the underlying file,
indygreg/python-zstandard#76
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
jappja pushed a commit to jappja/pghoard that referenced this issue May 21, 2019
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
@dholth
Copy link

dholth commented Apr 9, 2020

Is this the same bug? The nested zip file is corrupt when extracted.

(oh, it may be that ZipFile simply doesn't close the underlying file when it is a stream and not a filename)

compression = zstandard.ZstdCompressor(level=10)
z1 = ZipFile('a.zip', 'w')
writer = z1.open('nested.zip.zst', 'w')
compressed = compression.stream_writer(writer)
z2 = zipfile.ZipFile(compressed, mode="w", compression=zipfile.ZIP_STORED)
z2.open(...).write(...)

indygreg added a commit that referenced this issue Dec 26, 2020
… stream

The existing behavior of automatically calling close() on the underlying
stream is not desired in all cases. See #76 for discussion.

This commit adds an argument to ZstdCompressionWriter to control whether
the inner stream is closed when the outer stream is closed. This will give
consumers control over how to behave.

We plan to implement this feature for all file-like types. After we're
done, we'll revisit the default behavior.
indygreg added a commit that referenced this issue Dec 26, 2020
…se the inner stream

This is related to #76. So previous commit for more context.
indygreg added a commit that referenced this issue Dec 26, 2020
…am is closed

See #76 and previous commits for more context.
indygreg added a commit that referenced this issue Dec 26, 2020
This completes our implementation of closefd support on all our IO
classes to enable the consumer to have complete control over whether the
inner stream should be closed on our close. See prior commits and #76 for
more context.
@indygreg
Copy link
Owner

I just pushed some commits to add closefd arguments to the stream_reader() and stream_writer() methods for constructing IO objects. The writers default to closefd=True and the readers to closefd=False. I may change these defaults before the next release so behavior is consistent.

indygreg added a commit that referenced this issue Dec 26, 2020
This completes our implementation of closefd support on all our IO
classes to enable the consumer to have complete control over whether the
inner stream should be closed on our close. See prior commits and #76 for
more context.
indygreg added a commit that referenced this issue Dec 27, 2020
… stream

The existing behavior of automatically calling close() on the underlying
stream is not desired in all cases. See #76 for discussion.

This commit adds an argument to ZstdCompressionWriter to control whether
the inner stream is closed when the outer stream is closed. This will give
consumers control over how to behave.

We plan to implement this feature for all file-like types. After we're
done, we'll revisit the default behavior.
indygreg added a commit that referenced this issue Dec 27, 2020
…se the inner stream

This is related to #76. So previous commit for more context.
indygreg added a commit that referenced this issue Dec 27, 2020
…am is closed

See #76 and previous commits for more context.
indygreg added a commit that referenced this issue Dec 27, 2020
This completes our implementation of closefd support on all our IO
classes to enable the consumer to have complete control over whether the
inner stream should be closed on our close. See prior commits and #76 for
more context.
@indygreg
Copy link
Owner

I'm about to close this issue via a pushed commit. The state of the API will be as follows:

  • All stream_reader() and stream_writer() methods that construct a file-like object accept a closefd boolean argument that controls whether to close() the passed stream when the instance itself is closed.
  • closefd defaults to True in all cases, meaning close() cascades by default.
  • __exit__ will always call self.close().

While the issue summary requests that closefd=False, this goes against the suggestion from upstream that close() should cascade. I'm inclined to go with upstream's suggestion here. And since there is a closefd argument to explicitly control behavior, if you want different behavior you can simply change that argument.

indygreg added a commit that referenced this issue Dec 28, 2020
This makes it consistent with stream_writer and inline with what Python
developers recommend. See discussion in #76.
indygreg added a commit that referenced this issue Dec 29, 2020
According to conversation in #76 and linked from there, flush() should
propagate. This commit makes that change for ZstdCompressionWriter.flush().
indygreg added a commit that referenced this issue Dec 29, 2020
This makes behavior consistent with ZstdCompressionWriter. See #76 for
discussion on this topic.
rdunklau pushed a commit to Aiven-Open/rohmu that referenced this issue Apr 27, 2022
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
rdunklau pushed a commit to Aiven-Open/rohmu that referenced this issue Apr 27, 2022
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
rdunklau pushed a commit to Aiven-Open/rohmu that referenced this issue Apr 27, 2022
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
juha-aiven pushed a commit to Aiven-Open/rohmu that referenced this issue Apr 27, 2022
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
juha-aiven pushed a commit to Aiven-Open/rohmu that referenced this issue Apr 27, 2022
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
alexole pushed a commit to Aiven-Open/rohmu that referenced this issue Apr 27, 2022
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
rdunklau pushed a commit to Aiven-Open/rohmu that referenced this issue Apr 27, 2022
File interface is currently broken in python-zstandard: indygreg/python-zstandard#76
As a workaround using internal file wrapper.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants