losing DatasetInfo in Dataset.map when num_proc > 1 #6585
Comments
Hi! This issue comes from how the DatasetInfo objects are merged after a parallel map. The DatasetInfo merge is defined here, in case you'd like to contribute an improvement: Lines 269 to 270 in d2e0034
#self-assign
* try not to merge DatasetInfos if they're equal
* fixes losing DatasetInfo during parallel Dataset.map
Co-authored-by: Quentin Lhoest
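The commit message above describes skipping the merge when all DatasetInfo objects are equal. A minimal sketch of that idea follows. It uses a hypothetical `ToyInfo` dataclass and a hypothetical `merge_infos` function rather than the library's actual classes, and it assumes a merge that only knows how to combine some fields, so the others fall back to their defaults:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyInfo:
    """Hypothetical stand-in for a metadata record; not the library's DatasetInfo."""

    description: str = ""
    dataset_name: Optional[str] = None


def merge_infos(infos: List[ToyInfo]) -> ToyInfo:
    # Shortcut: if every record is identical there is nothing to merge,
    # so return the first one unchanged and keep all of its fields.
    if infos and all(info == infos[0] for info in infos):
        return infos[0]

    # Fallback: a lossy merge that only combines the description field.
    # Fields it does not know how to combine revert to their defaults.
    descriptions: List[str] = []
    for info in infos:
        if info.description and info.description not in descriptions:
            descriptions.append(info.description)
    return ToyInfo(description="\n\n".join(descriptions))
```

With the shortcut in place, merging identical records (as would be expected when every shard comes from the same dataset) returns the record untouched, while genuinely different records still go through the lossy path.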
Describe the bug
Hello and thanks for developing this package!
When I process a Dataset with the map function using multiple processes (num_proc > 1), some attributes that were set on the DatasetInfo get lost and are None in the resulting Dataset.
Steps to reproduce the bug
This prints:
Expected behavior
I expect the DatasetInfo to be kept as it was and there should be no difference in the output of running map with num_proc=1 and num_proc=2.
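The expectation above can be checked with a small sketch. The function names `identity` and `dataset_name_after_map`, the toy columns, and the choice of `dataset_name` with the value `"my_dataset"` are assumptions made for illustration; the report does not say which attributes were affected.

```python
from typing import Optional


def identity(example: dict) -> dict:
    # No-op transform: any change in the metadata comes from map itself.
    return example


def dataset_name_after_map(num_proc: int) -> Optional[str]:
    # Requires the third-party `datasets` package (pip install datasets).
    from datasets import Dataset, DatasetInfo

    dataset = Dataset.from_dict(
        {"col1": [0, 1], "col2": [3, 4]},
        # "my_dataset" is an illustrative value, not taken from the report.
        info=DatasetInfo(dataset_name="my_dataset"),
    )
    mapped = dataset.map(identity, num_proc=num_proc)
    return mapped.info.dataset_name
```

Comparing `dataset_name_after_map(1)` with `dataset_name_after_map(2)` shows whether the attribute survives a parallel map. On platforms that spawn worker processes, the calls should sit under an `if __name__ == "__main__":` guard.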
Expected output:
Environment info
* datasets version: 2.16.1
* huggingface_hub version: 0.20.2
* fsspec version: 2023.9.2