Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] pull: display a lot of jobs #3453

Closed
wants to merge 1 commit into from

Conversation

gurobokum
Copy link
Contributor

Disclaimer

I would like to change the implementation completely but shared for collaboration

How to try

There is a script for simulating behavior
Before
After

TLDR

I tried an approach with limiting amount of visible pbars but would like to change the solution to the single bar that contains information about all files.
The idea was show only N pbars, and when one is finished - add new from the background. It allows to avoid flickering, but it has problems:

  1. Pbars hierarchy (main/nested). At this moment main pbar is defined as pbar with 0 position, but the order is not guaranteed. Can be addressed by more smart main bar definition, relates progress: inconsistent nested format when no main bar #3452
  2. When files are small - there is flickering
  3. the solution interacts with a lot of internal Tqdm stuff

Close #2905


  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @JIoJIaJIu, it'd be good if you could create a separate PR and close this if you intend to change the implementation. 🙂

It could be useful later.

@shcheklein
Copy link
Member

It's definitely a good first step. The main intention is too show that something is happening and probably it's fine to limit the number of pbars if we can somehow show the total progress as well and make sure that it's clear from the UI that number of pbars is artificially limited.

I tried an approach with limiting amount of visible pbars but would like to change the solution to the single bar that contains information about all files.

how would it look like? do we show names, or only number of files?

can we take a look at how other tools like lftp, rclone show this? just to see if we like some ideas?

@gurobokum
Copy link
Contributor Author

@skshetry I have plans to change implementation, but would like to get your opinion about the current one. Maybe things change I will keep/improve this one

Copy link
Contributor

@casperdcl casperdcl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're trying to do here, but it's not really a full fix (it's just the temporary patch I was thinking of in #2905 (comment)).

There's a more meta comment which maybe needs to be addressed first: #2905 (comment)

@gurobokum
Copy link
Contributor Author

I see what you're trying to do here, but it's not really a full fix (it's just the temporary patch I was thinking of in #2905 (comment)).

There's a more meta comment which maybe needs to be addressed first: #2905 (comment)

Hi @casperdcl , you mentioned that it's not a full fix, but how do you see it should work at the latest stage?
I see that MAX_LINES should be calculated dynamically based on terminal rows, what else?
Maybe there should be completely different approach you see

from having more than about 5 threads?

As I got for network operations user gets benefit with multiple open connections, it would be great if you can share some info about it @efiop. Another point that it's just an interface - we provide option to point jobs parameter, that can be custom and is not limited to cpu count

@casperdcl
Copy link
Contributor

casperdcl commented Mar 17, 2020

@JIoJIaJIu there are a few small bugs I can see with this PR but it's not really worth getting into if we can avoid it altogether by say setting jobs = min(10, args.jobs).

Apart from that (in a general use case/if we decide there should be no upper bound on threads) there probably should be a PR on the upstream https://github.com/tqdm/tqdm repo...

Anyway for DVC purposes, the best way to go about this is to collect info about all tasks first (e.g. total number of bytes) and have them all report back to just one total bar if there are more than say 10 tasks.

@shcheklein / @efiop lemme know if actually you would prefer some per-task info printed at least after each task completes. If you do, this may be fairly easy to solve using something similar to this PR.

@shcheklein
Copy link
Member

@casperdcl there are definitely cases where 10 or even 20 jobs is not enough. Since some workloads are IO bound it might be beneficial to have 100 jobs, so I doubt that it's a good solution to limit to some upper bound.

Could we check some other tools that do parallel processing? sync tools, package managers, etc .. to see some ideas and how people solve this problem.

@casperdcl
Copy link
Contributor

Well I thought originally nobody wanted this approach (#2905 (comment)) but it looks like a temporary patch is indeed best for now until jobs are combined into a single bar in other PR(s).

Rather than commenting a lot on this PR I'll open a new one.

@casperdcl
Copy link
Contributor

@JIoJIaJIu feel free to take over #3500

@efiop
Copy link
Contributor

efiop commented Mar 27, 2020

Closing in favour or #3500

@efiop efiop closed this Mar 27, 2020
casperdcl added a commit to casperdcl/dvc that referenced this pull request Mar 28, 2020
efiop pushed a commit that referenced this pull request Mar 29, 2020
* progress: limit maximum running bars

Fixes #2905
Closes #3453

* progress: add hidden message for overflow bars

* progress: fix nested bars properly

Just use latest `tqdm` :)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pull: going completely crazy on large dataset
5 participants