Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance issues with MemoryFileSystem.rm #1725

Merged
merged 1 commit into from
Oct 15, 2024

Conversation

maresb
Copy link
Contributor

@maresb maresb commented Oct 14, 2024

Closes #1724.

The idea is that the nominal case is deleting an existing file, and verifying that a path corresponds to a file is an extremely fast key-existence check on the underlying store (dict). Thus by checking first if it's a file we escape the expensive exists check, which must scan all keys to check if the path is a prefix directory for some file.

Now the time to write a single file is roughly constant (instead of linear in the number of files), while the time to remove all files scales linearly with the number of files (instead of quadratically).

Before:

image

After:

image

Before:

image

$$10^{1.945\, \log_{10}(N) - 6.5} = N^{1.945} / 3\times 10^6.$$

After:

image

$$10^{0.94\, \log_{10}(N) - 5} = N^{0.945} / \times 10^5.$$

Closes fsspec#1724.

The idea is that the nominal case is deleting an existing file, and verifying that a
path corresponds to a file is an extremely fast key-existence check on the underlying
store (dict). Thus by checking first if it's a file we escape the expensive exists check,
which must scan all keys to check if the path is a prefix directory for some file.
@martindurant martindurant merged commit 03e89cc into fsspec:master Oct 15, 2024
11 checks passed
@martindurant
Copy link
Member

A consequence of MemoryFS almost exclusively being used in testing for at most 100s of files.

@maresb maresb deleted the fix-memoryfilesystem-rm branch October 15, 2024 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MemoryFileSystem time complexity of rm renders it unusable with large number of files
2 participants