[Python] MemoryMappedFile / memory_map feather table read creates a copy? #44957

Closed
kcajf opened this issue Dec 6, 2024 · 3 comments

Comments

@kcajf

kcajf commented Dec 6, 2024

Describe the bug, including details regarding any error messages, version, and platform.

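import pyarrow as pa
import pyarrow.feather

def load_my_data():
    path = "mydata.arrow"

    # Memory-map the file, then read it as a Feather (Arrow IPC) table.
    with pa.memory_map(path) as f:
        t = pa.feather.read_table(f)
    return t

t = load_my_data()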

I have the above script. My Arrow file is uncompressed, so I was hoping to load it without copying, by memory-mapping it.
The above doesn't seem to do that: I see no mention of my file path in /proc/{pid}/smaps. Moreover, if I build a large list of load_my_data() results, my resident memory usage goes up and up until I OOM.

Is this expected?
Thanks
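For anyone who wants to repeat the /proc/{pid}/smaps check described above, here is a minimal sketch. It is Linux-only, and the names is_file_mapped and demo are made up for illustration; the demo maps a scratch file with the standard library's mmap module rather than with pyarrow, so it only shows how the lookup itself might be done.

```python
import mmap
import os
import tempfile


def is_file_mapped(path, pid="self"):
    """Return True if `path` backs a mapping listed in /proc/<pid>/smaps (Linux only)."""
    real = os.path.realpath(path)
    with open(f"/proc/{pid}/smaps") as smaps:
        for line in smaps:
            # Mapping header lines have six fields, the last one being the path.
            # Counter lines such as "Rss: 4 kB" are shorter or end differently.
            fields = line.rstrip("\n").split(None, 5)
            if len(fields) == 6 and fields[5] == real:
                return True
    return False


def demo():
    """Map a scratch file with the standard library and look it up before and during."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * mmap.PAGESIZE)
        tmp.flush()
        before = is_file_mapped(tmp.name)
        with mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ):
            during = is_file_mapped(tmp.name)
    return before, during
```

Calling is_file_mapped("mydata.arrow") while a loaded table is still alive should, under the same assumptions, tell you whether the process currently has that file mapped.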

Component(s)

Python

@kcajf
Author

kcajf commented Dec 6, 2024

This is my fault. It turns out the files are actually compressed (I thought the default compression=None in pyarrow.feather.write_feather meant "no compression", not lz4).

It would be useful to have a flag that makes the reader raise an error if the file can't be memory-mapped.

@kou kou changed the title MemoryMappedFile / memory_map feather table read creates a copy? [Python] MemoryMappedFile / memory_map feather table read creates a copy? Dec 7, 2024
@kou
Member

kou commented Dec 7, 2024

The file can be memory-mapped in your case (the compressed case).
The read just copies the data, because it has to decompress it.

Can we close this?

@kcajf
Author

kcajf commented Dec 9, 2024

Oh, I see, fair enough. Thanks. Happy to close!

@kcajf kcajf closed this as completed Dec 9, 2024