Memory leak copy_stream #63

Closed
mikeheddes opened this issue Nov 11, 2018 · 3 comments

Comments

@mikeheddes

mikeheddes commented Nov 11, 2018

This looks similar to #35 and #40. I'm using zstandard 0.10.2 with Python 3.6.0.

import io
import pickle

import zstandard as zstd

def write_pkl(data, file):
    cctx = zstd.ZstdCompressor()
    stream = io.BytesIO()
    with open(file, 'wb') as f:
        pickle.dump(data, stream)
        stream.seek(0)
        cctx.copy_stream(stream, f)

The input is a list of ~270,000 tuples containing NumPy arrays and single numbers. With a list this large no bytes are written, but smaller lists work fine: a list of ~200,000 entries worked for me.

Memory consumption goes from ~1.5 GB while creating the list to 35+ GB when write_pkl is executed.

@mikeheddes
Author

mikeheddes commented Nov 11, 2018

I looked a bit further and saw that stream.seek(0) is never reached, so I think something is going wrong in pickle.dump. I will close this issue because I don't think it's related to zstandard.

@indygreg
Owner

Using io.BytesIO() here buffers the entire pickled output in memory, then copies that buffered output to the zstd compressor.

Try using a zstd.ZstdCompressor().stream_writer() and feed an instance of that directly to pickle.dump(). That may not work though, as stream_writer() doesn't implement the full io.RawIOBase interface. That's on my TODO list.
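
The buffering described above can be seen with the standard library alone. This is a rough sketch, not code from the thread: `CountingSink` is a hypothetical stand-in for a streaming writer such as `stream_writer()`, the tuple data is made up, and it assumes a CPython version whose pickler flushes protocol 4 frames as it goes (3.7 or later, to my knowledge). On older interpreters the sink may receive the pickle in one large write.

```python
import io
import pickle


class CountingSink:
    """Hypothetical write-only sink: records chunk sizes, keeps no data."""

    def __init__(self):
        self.total = 0          # bytes received overall
        self.largest_chunk = 0  # biggest single write() call
        self.calls = 0          # number of write() calls

    def write(self, chunk):
        # pickle may pass bytes or a memoryview; nbytes handles both.
        n = memoryview(chunk).nbytes
        self.total += n
        self.largest_chunk = max(self.largest_chunk, n)
        self.calls += 1
        return n


# Made-up data, loosely shaped like the list of tuples in the report.
data = [(i, float(i)) for i in range(200_000)]

# Approach 1: pickle into BytesIO. The whole pickle sits in memory at once.
buffered = io.BytesIO()
pickle.dump(data, buffered, protocol=4)
buffered_size = buffered.getbuffer().nbytes

# Approach 2: pickle into a streaming sink. Data arrives in smaller chunks
# that a compressor could consume and discard as they come.
sink = CountingSink()
pickle.dump(data, sink, protocol=4)

print("held by BytesIO:", buffered_size)
print("streamed total:", sink.total, "in", sink.calls, "writes")
print("largest single write:", sink.largest_chunk)
```

Both approaches produce the same number of bytes, but only the BytesIO version has to hold all of them at the same time. A compressor that accepts `write()` calls sits where `CountingSink` does here.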

@mikeheddes
Author

Thank you for your suggestion, it works.

import pickle

import zstandard as zstd

def write_pkl(data, file):
    with open(file, 'wb') as f:
        cctx = zstd.ZstdCompressor()
        with cctx.stream_writer(f) as compressor:
            pickle.dump(data, compressor)
