Better TQDM output #3654

Merged: mariosasko merged 14 commits into master from better-tqdm-output on Feb 3, 2022

mariosasko (Collaborator) commented on Jan 31, 2022:

This PR does the following:

  • if dataset_infos.json exists for a dataset, uses num_examples to print the total number of examples that need to be generated (in builder.py)
  • fixes tqdm + multiprocessing in Jupyter Notebook/Colab (the issue stems from this commit in the tqdm repo: tqdm/tqdm@f7722ed)
  • adds the missing drop_last_batch and with_ranks params to DatasetDict.map
  • correctly computes the number of iterations in map and the CSV/JSON loader when batched=True to fix tqdm progress bars
  • removes the bool(logging.get_verbosity() == logging.NOTSET) condition (or, where it appears as bool(logging.get_verbosity() == logging.NOTSET) or not utils.is_progress_bar_enabled(), simplifies it to not utils.is_progress_bar_enabled()) and relies on utils.is_progress_bar_enabled alone to check whether tqdm output is enabled (this comment from @stas00 explains why the bool(logging.get_verbosity() == logging.NOTSET) check is problematic: "[logging] unable to turn off tqdm logging", transformers#14889 (comment))
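
As a rough illustration of the last two points, here is a minimal sketch, not the code from this PR. It assumes the progress bar counts one step per batch when batched=True, and that a dedicated flag (rather than the logging verbosity) decides whether the bar is shown. The names set_progress_bar_enabled, num_iterations and tqdm_kwargs, and the module-level flag, are hypothetical and exist only for this example; total and disable are real tqdm keyword arguments.

```python
# Hypothetical module-level switch (an assumption for this sketch);
# the library keeps its own state behind utils.is_progress_bar_enabled.
_progress_bar_enabled = True


def set_progress_bar_enabled(enabled: bool) -> None:
    """Turn progress bars on or off, independently of the logging verbosity."""
    global _progress_bar_enabled
    _progress_bar_enabled = enabled


def is_progress_bar_enabled() -> bool:
    """Single source of truth for whether tqdm output should be shown."""
    return _progress_bar_enabled


def num_iterations(num_rows: int, batched: bool, batch_size: int = 1000,
                   drop_last_batch: bool = False) -> int:
    """Number of steps the progress bar should count (assumed: one per batch)."""
    if not batched:
        return num_rows  # one step per example
    if drop_last_batch:
        return num_rows // batch_size  # an incomplete final batch is skipped
    # Ceiling division: a partial final batch still counts as one step.
    return (num_rows + batch_size - 1) // batch_size


def tqdm_kwargs(num_rows: int, batched: bool, batch_size: int = 1000,
                drop_last_batch: bool = False) -> dict:
    """Keyword arguments that could be passed on as tqdm(**kwargs)."""
    return {
        "total": num_iterations(num_rows, batched, batch_size, drop_last_batch),
        "disable": not is_progress_bar_enabled(),
    }
```

With 2,500 rows and a batch size of 1,000, this gives a total of 3 steps, or 2 with drop_last_batch=True. Calling set_progress_bar_enabled(False) disables the bar regardless of the configured log level.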

Fix #2630

lhoestq (Member) left a comment:

This is awesome, thank you! Looking forward to seeing this in action.

It helps a lot to understand exactly what's happening; we have many progress bars now x)

Review threads (all resolved): src/datasets/arrow_dataset.py, src/datasets/io/csv.py (outdated), src/datasets/utils/py_utils.py (outdated)

mariosasko (Collaborator, Author) commented:

@lhoestq I've created a notebook for you to see the difference: https://colab.research.google.com/drive/1by3EqnoKvC2p-yKW4lPDGOFOZHyGVyeQ?usp=sharing.

Feel free to suggest better descriptions for the progress bars.

If everything looks good, I think we can merge.

lhoestq (Member) left a comment:

That's awesome! Love it :)

mariosasko merged commit 6ed6ac9 into master on Feb 3, 2022
mariosasko deleted the better-tqdm-output branch on February 3, 2022 15:55
Linked issue: Progress bars are not properly rendered in Jupyter notebook