Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change progress bar backend to tqdm #2333

Merged
merged 68 commits into from
Aug 21, 2019
Merged

change progress bar backend to tqdm #2333

merged 68 commits into from
Aug 21, 2019

Conversation

casperdcl
Copy link
Contributor

@casperdcl casperdcl commented Jul 28, 2019

Substitute current progress with a thin wrapper around tqdm

details

most optional. Necessary starred (*)

  • * add progress.Tqdm(tqdm.tqdm)
  • * progress-aware logging
    • ensure logging/printing does not break progress
  • unit tests
    • add new unit tests
      • test thread safety
    • * update/fix old tests
  • logging level and progress disabling
  • intelligently suppress progress
    • not really required as tqdm is very efficient and there are now at most N_WORKERS bars which will be cleared upon completion
    • quick operations
    • many files
    • un-suppress for long operations (e.g. if pbar.n > LARGE_FILE_SIZE: pbar.write(str(pbar)))
  • nested progress
    • total files
    • * under which appears a per-file download thread bar
      • * which is cleared away correctly

review TODO list

  • * fix remaining 0% bar when dvc pulling
  • * increase description length to 25
    • or truncate description from right?
  • have a fix bar width?
  • * remove MD5 descriptions
  • * add units (file, md5, B) for all bars

removes

  • logger.ColorFormatter._progress_aware
  • progress.Progress
  • progress.ProgressCallback
  • progress.progress
  • progress.CLEARLINE_PATTERN
  • remote.azure.Callback
  • remote.http.ProgressBarCallback
  • remote.s3.Callback
  • remote.ssh.connection.percent_cb
  • remote.ssh.connection.create_cb
  • repo.checkout.get_progress_callback
  • tests.func.test_checkout.TestCheckoutShouldHaveSelfClearingProgressBar

adds

  • progress.Tqdm(tqdm.tqdm)
  • progress.TqdmThreadPoolExecutor(concurrent.futures.ThreadPoolExecutor)
  • logger.LoggerHandler.emit
  • remote.base.RemoteBASE.get_files_number

modifies

  • many things (refactoring)

depends on

@casperdcl casperdcl self-assigned this Jul 28, 2019
@casperdcl casperdcl changed the title Tqdm chane progress bar backend to tqdm Jul 28, 2019
@casperdcl casperdcl changed the title chane progress bar backend to tqdm change progress bar backend to tqdm Jul 28, 2019
@casperdcl casperdcl added the ui user interface / interaction label Jul 28, 2019
@casperdcl casperdcl requested review from efiop and shcheklein July 28, 2019 15:18
@casperdcl casperdcl added refactoring Factoring and re-factoring enhancement Enhances DVC labels Jul 28, 2019
@efiop efiop changed the title change progress bar backend to tqdm [WIP] change progress bar backend to tqdm Jul 28, 2019
Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, @casperdcl and welcome to the project! It looks way cleaner now) I've added a few comments below.

dvc/progress.py Outdated Show resolved Hide resolved
dvc/remote/base.py Outdated Show resolved Hide resolved
dvc/remote/s3.py Outdated Show resolved Hide resolved
dvc/repo/checkout.py Outdated Show resolved Hide resolved
@casperdcl
Copy link
Contributor Author

Ah just opened a new issue, at least in part due to tqdm/tqdm#795

@casperdcl
Copy link
Contributor Author

incidentally dvc pull fails now even on master with

WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files

@efiop
Copy link
Contributor

efiop commented Aug 19, 2019

incidentally dvc pull fails now even on master with

Looks like it is due to redirection settings being changed on dvc.org. I'm looking into it.

@efiop
Copy link
Contributor

efiop commented Aug 19, 2019

@casperdcl updated the url on master, please rebase. Pull should be working now.

@casperdcl
Copy link
Contributor Author

Right looks about done now. Had to release a new version of tqdm for this!

@casperdcl

This comment has been minimized.

@efiop
Copy link
Contributor

efiop commented Aug 19, 2019

@casperdcl Once again, sorry for that one, it is WIP #2407 . Will merge soon.

@efiop
Copy link
Contributor

efiop commented Aug 19, 2019

@casperdcl Should be fixed now. Sorry for that.

@casperdcl
Copy link
Contributor Author

will merge master later (it does fix the test)

dvc/remote/base.py Outdated Show resolved Hide resolved
dvc/remote/http.py Show resolved Hide resolved
dvc/remote/http.py Show resolved Hide resolved
dvc/remote/local/__init__.py Outdated Show resolved Hide resolved
return ret

return list(set(checksums) & set(self.all()))
pbar.update_desc("", 0) # clear path name description
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add .clear_desc()? This at least would make the comment not needed.

Copy link
Contributor Author

@casperdcl casperdcl Aug 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably need a better description though.. maybe pbar.update_desc("Cache lookup", 0)?

Copy link
Contributor

@efiop efiop Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperdcl Btw, it actually seems pretty weird here, is it clearing the progress bars from batch_exists? You have another similar line pbar.update_desc("Checkout", 0) # clear path name description up above. Maybe pbar.finish() would be better? Is this line used to just leave the trace of the progress bar? If that is the case, we could definitely just drop them, those operations are kinda internal ones, so there is no real need to leave them(those leftover progress bars) in the final output, after they've already served the purpose of notifiying user about what was going on.

Copy link
Contributor Author

@casperdcl casperdcl Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop I wrote this to match the old behavior precisely, where the bar was left behind without any description (the callback would've injected various descriptions). The same is sort of true of the "Checkout" case, where I've replaced a description-less bar + "Checkout finished!" message with simply a completed bar labelled "Checkout".

Basically the new version is either identical to or slightly less verbose than the original.

Happy to change if you prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperdcl I understand, and the idea for this PR was precisely that: to make a migration happen and then polish everything in the next PRs, but looks like making a few changes along the way is inevitable. If my understanding of those update_desc() described above is correct, then we could simply remove those lines and be done with this 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, we can indeed deal with it on top. Ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure but to be more explicit

old version:

>>> progress_callback.finish("")
[##############################] 100%

new version:

>>> pbar.update_desc("", 0)  # clear path name description
100%|██████████| 3/3 [00:01<00:00,  1.96it/s]

if removed:

...f17c72d: 100%|██████████| 3/3 [00:01<00:00,  1.96it/s]

Actually (unrelated) there's a slight logic error (the line needs to be moved up a little before the function returns).

Copy link
Contributor Author

@casperdcl casperdcl Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop I made a minor change just after you merged. It's now rebased on master as e701221 (if you want to cherry-pick or something).

Copy link
Contributor

@efiop efiop Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casperdcl Sure, please create a new PR for it.

tests/unit/test_progress.py Show resolved Hide resolved
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Let's proceed with the UI on top of this.

@efiop efiop merged commit 5aa9a2f into iterative:master Aug 21, 2019
@casperdcl casperdcl added the performance improvement over resource / time consuming tasks label Aug 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC performance improvement over resource / time consuming tasks refactoring Factoring and re-factoring ui user interface / interaction
Projects
None yet
4 participants