remote: loading dirs is extremely slow because of PathInfo #3635
Actually, same with dumping. PathInfo is extremely slow when we need lots of it.
Using it to load dirs takes around 50 s compared to 0.4 s with a simple `replace()`. Path objects are great, but not when you need lots of them. Related to #3635
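A minimal sketch of keeping both directions (dump and load) on plain strings instead of path objects. It assumes a `.dir`-style JSON list of `{"md5": ..., "relpath": ...}` entries stored with POSIX `/` separators, and it assumes the old "simple `replace()`" was a separator swap; `dump_dir_info` and `load_dir_info` are hypothetical helpers, not DVC's API.

```python
import json
import os

# Hypothetical helpers: relpaths are kept as plain strings and only the
# separator is swapped, so no path object is created per entry.


def dump_dir_info(entries):
    """Serialize entries, normalizing native separators to "/"."""
    posix_entries = [
        {"md5": e["md5"], "relpath": e["relpath"].replace(os.sep, "/")}
        for e in entries
    ]
    return json.dumps(posix_entries, sort_keys=True)


def load_dir_info(text):
    """Deserialize entries, converting "/" back to the native separator."""
    return [
        {"md5": e["md5"], "relpath": e["relpath"].replace("/", os.sep)}
        for e in json.loads(text)
    ]


# Placeholder checksums; only the relpath handling matters here.
entries = [
    {"md5": "abc123", "relpath": os.path.join("train", "a.jpg")},
    {"md5": "def456", "relpath": os.path.join("val", "b.jpg")},
]
text = dump_dir_info(entries)
print(text)
```

On POSIX `os.sep` is already `/`, so both `replace()` calls are no-ops there; the cost per entry stays a single string operation either way.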
Another confirmation: #3634 (comment)
The issue is that in lines 393 to 397 (at commit 52369bd) …
lines 45 to 47 (at commit 52369bd) …
For *nix systems, …
Since we don't actually need the relative path until we are printing/logging status for modified/pushed/pulled files, one easy optimization is to defer that computation: we can just put … But this does not cover the case where we eventually have to print/log (and compute relpath for) millions of modified files. To get better behavior in that case, we could implement our own versions of `relpath`/`abspath` … For an S3 remote with 2M files, deferring the relpath call drops the … @efiop, thoughts?
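A minimal sketch of deferring the relpath call, assuming each status record can hold its absolute path and root as plain strings. `FileStatus` is a hypothetical class, not DVC code; it uses `functools.cached_property` (Python 3.8+) so `os.path.relpath` runs only for records that are actually printed, and at most once per record.

```python
import os
from functools import cached_property


class FileStatus:
    """Hypothetical status record that stores only the absolute path.

    The relative path is computed on first access and cached on the
    instance, so entries that are never logged never pay for relpath.
    """

    def __init__(self, abs_path, root):
        self.abs_path = abs_path
        self.root = root

    @cached_property
    def rel_path(self):
        return os.path.relpath(self.abs_path, self.root)


root = os.path.abspath("data")
statuses = [
    FileStatus(os.path.join(root, f"img_{i}.jpg"), root) for i in range(100_000)
]

# Only the entries we log compute relpath; the rest never do.
for status in statuses[:3]:
    print("modified:", status.rel_path)
```

As the comment above notes, this does not help when millions of entries do get printed; it only removes the cost for the ones that are not.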
@pmrowla Very interesting idea about the `relpath`/`abspath` re-implementation! But I'm a bit unsure if it is really worth creating such a general hack for it, when we could do a local hack, something like this:
At least it won't lose the hack context this way. Plus, I feel like path_info might bite us not only in the relpath department but in others too, where we won't be able to provide a good general hack in the future, so it is better to keep it local for now.
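One possible shape for such a local hack, sketched under the assumption that every path in a single directory walk is absolute, normalized, and located under the same root. `fast_relpaths` is a hypothetical helper, not the snippet from the comment above: it strips the known root prefix by slicing and falls back to `os.path.relpath` for anything that does not match.

```python
import os


def fast_relpaths(root, abs_paths):
    """Hypothetical local fast path for relative paths under one root.

    Paths that start with "root + separator" are sliced directly; any
    other path (including root itself) goes through os.path.relpath.
    """
    prefix = os.path.join(root, "")  # root with exactly one trailing separator
    cut = len(prefix)
    return [
        path[cut:] if path.startswith(prefix) else os.path.relpath(path, root)
        for path in abs_paths
    ]


root = os.path.abspath("dataset")
paths = [os.path.join(root, "train", f"img_{i}.jpg") for i in range(5)]
print(fast_relpaths(root, paths))
```

Keeping the fallback means the shortcut never returns a different answer than `os.path.relpath` for normalized inputs; it only skips the work when the prefix check already settles it.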
This https://github.com/iterative/dvc/blob/0.93.0/dvc/remote/base.py#L290 takes ~50 seconds on our ImageNet dataset, compared to ~0.4 seconds for the simple `replace()` that we used to have.