[Python] MemoryMappedFile / memory_map feather table read creates a copy? #44957

kcajf · 2024-12-06T13:45:11Z

Describe the bug, including details regarding any error messages, version, and platform.

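import pyarrow as pa
import pyarrow.feather

def load_my_data():
    path = "mydata.arrow"

    # Memory-map the file, then read it as a Feather (Arrow IPC) table.
    with pa.memory_map(path) as f:
        t = pyarrow.feather.read_table(f)
    return t

t = load_my_data()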

I have the above script. My arrow file is uncompressed. I was hoping to load it without copying, by memory-mapping it.
The above doesn't seem to do that: my file path does not appear in /proc/{pid}/smaps. Moreover, if I build a large list of load_my_data() results, my resident memory usage keeps growing until the process runs out of memory.
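For anyone repeating the smaps check described above, here is a minimal sketch of doing it from Python. The helper names `mapped_paths` and `is_file_mapped` are hypothetical (not part of pyarrow), and the check only works on Linux, where /proc/{pid}/smaps exists.

```python
import os


def mapped_paths(smaps_text):
    """Return the set of file paths found on mapping header lines of smaps text."""
    paths = set()
    for line in smaps_text.splitlines():
        # Header lines look like: "start-end perms offset dev inode path".
        fields = line.strip().split(None, 5)
        if len(fields) == 6 and "-" in fields[0] and fields[5].startswith("/"):
            paths.add(fields[5])
    return paths


def is_file_mapped(path, pid="self"):
    """Check whether `path` backs any mapping of the given process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return os.path.realpath(path) in mapped_paths(f.read())
```

Note that a mapping only exists while something still references it, so the check has to run while the table (or the memory-mapped file) is alive.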

Is this expected?
Thanks

Component(s)

Python


kcajf · 2024-12-06T13:57:50Z

This is my fault. It turns out the files are actually compressed (I thought the default compression=None in pyarrow.feather.write_feather meant "no compression", not lz4).

It would be useful to have a flag that makes the reader raise an error if the file can't be memory-mapped.

kou · 2024-12-07T22:35:57Z

The file can be memory-mapped in your case (the compressed case).
It just copies on read (for decompression).

Can we close this?

kcajf · 2024-12-09T08:31:14Z

Oh, I see, fair enough. Thanks. Happy to close!

kcajf added the Type: bug label Dec 6, 2024

github-actions bot added the Component: Python label Dec 6, 2024

kou changed the title ~~MemoryMappedFile / memory_map feather table read creates a copy?~~ [Python] MemoryMappedFile / memory_map feather table read creates a copy? Dec 7, 2024

kcajf closed this as completed Dec 9, 2024
