Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dagshub upload: variable batch size. #430

Merged
merged 3 commits into from
Feb 13, 2024
Merged

Conversation

yairl
Copy link
Contributor

@yairl yairl commented Feb 13, 2024

  1. Uploads are now grouped into the minimum of 100 files or 100MB-s each (not configurable).
  2. All uploaded files are discovered before upload begins, to make the global progress bar more useful.
  3. Minor restructuring to make the code cleaner.

Notes:

  • On my machine (Debian 12), the progress bar is printed many times and not just updated in place. This is annoying, wondering whether you'd like that changed as well.
  • In process prints such as "Upload finished successfully!" are confusing and polluting the screen when it's one part of a bigger upload.
  • With this change, you may want to change the file batch size from 100 to a larger number, or remove completely.

Resolves #429.

Copy link

dagshub bot commented Feb 13, 2024

@kbolashev kbolashev self-requested a review February 13, 2024 09:03
@yairl
Copy link
Contributor Author

yairl commented Feb 13, 2024

@kbolashev sorry for the "batchs" mistake.
I hate these :)

Copy link
Member

@kbolashev kbolashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for doing this, this is absolutely better now!!!

There's probably still things that can be done, for example not bounding batches by folders, but I honestly don't remember if our uploading API supports that.

I took a liberty of changing a couple variable names to be more descriptive, hope that's ok.

I also liked the suggestion about not cluttering the output that much on dir upload, so I hide those now when a directory is being uploaded. Pull the changes and see if you like it :)

@kbolashev kbolashev merged commit 570d20c into DagsHub:master Feb 13, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

upload failing on dataset with large files
2 participants