remote transfer slow for unversioned data #247

Closed
dberenbaum opened this issue Jan 4, 2023 · 0 comments

With ThreadPoolExecutor and the `cats-dogs` dataset:

default remote:

time dvc push -r s3-unversioned
2801 files pushed
dvc push -r s3-unversioned  41.37s user 7.50s system 10% cpu 7:56.26 total
time dvc pull -r s3-unversioned
A       cats-dogs/
1 file added and 2800 files fetched
dvc pull -r s3-unversioned  12.03s user 4.40s system 21% cpu 1:14.68 total

version_aware = true remote:

time dvc push -r s3-versioned
2800 files pushed
dvc push -r s3-versioned  21.65s user 3.40s system 12% cpu 3:13.01 total
time dvc pull -r s3-versioned
A       cats-dogs/
1 file added and 2800 files fetched
dvc pull -r s3-versioned  11.19s user 4.03s system 20% cpu 1:15.42 total

Not sure why the versioned remote push performs so much faster than the unversioned one on my machine after these changes. It may be due to the same listing performance problems noted in the gc issue iterative/dvc#5961 (comment), since we don't do a full remote listing for versioned remotes.

Originally posted by @pmrowla in #246 (comment)
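
To put a number on the gap, here is a minimal sketch that converts the wall-clock totals quoted above into seconds and compares the two remotes. The helper name `time_total_to_seconds` and the `timings` layout are made up for illustration; only the four timing strings come from the output above.

```python
def time_total_to_seconds(total: str) -> float:
    """Convert a shell `time` total such as '7:56.26' (m:ss.ff) to seconds."""
    seconds = 0.0
    # Each colon-separated field is worth 60x the one to its right.
    for part in total.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Wall-clock totals copied from the benchmark output in this issue.
timings = {
    "push": {"unversioned": "7:56.26", "versioned": "3:13.01"},
    "pull": {"unversioned": "1:14.68", "versioned": "1:15.42"},
}

# Ratio of unversioned to versioned wall-clock time per operation.
ratios = {}
for op, totals in timings.items():
    unversioned = time_total_to_seconds(totals["unversioned"])
    versioned = time_total_to_seconds(totals["versioned"])
    ratios[op] = unversioned / versioned
    print(
        f"{op}: unversioned {unversioned:.2f}s, "
        f"versioned {versioned:.2f}s, ratio {ratios[op]:.2f}x"
    )
```

On these numbers the unversioned push takes roughly 2.5 times as long as the versioned push, while the two pulls are within about 1% of each other, which fits the guess that the difference sits in a push-side step such as remote listing rather than in the transfer itself.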

@pmrowla pmrowla closed this as completed Jan 24, 2023