Allow to give the dataset multiprocessing_context #34793

Open
ierezell opened this issue Nov 18, 2024 · 1 comment
Labels
Feature request Request for a new feature

Comments

@ierezell
Contributor

Feature request

Allow the Hugging Face Trainer to accept a multiprocessing context and forward it to the PyTorch DataLoader (the multiprocessing_context argument): https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

Motivation

When a dataset is loaded by several worker processes across multiple CPU cores, the fork start method sometimes causes problems (with polars, for example), and the spawn start method is a better fit.
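To make the fork/spawn distinction concrete, here is a minimal sketch using only the Python standard library. It shows how a start method name can be turned into a context object, which is the kind of value DataLoader's multiprocessing_context argument accepts (either the name or the context). The helper name `resolve_context` is hypothetical and not part of any library.

```python
import multiprocessing as mp


def resolve_context(start_method: str) -> mp.context.BaseContext:
    """Return a multiprocessing context for the requested start method.

    Hypothetical helper for illustration; raises ValueError if the
    start method is not available on the current platform.
    """
    available = mp.get_all_start_methods()
    if start_method not in available:
        raise ValueError(
            f"start method {start_method!r} is not available here; "
            f"choose one of {available}"
        )
    # get_context returns an object scoped to this start method without
    # changing the process-wide default.
    return mp.get_context(start_method)


ctx = resolve_context("spawn")
print(ctx.get_start_method())
```

Using a context object rather than `multiprocessing.set_start_method` keeps the choice local to the data loading code, so the rest of the program is unaffected.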

Your contribution

I could open a PR. One fix would be to add a parameter to Trainer and pass it through to the DataLoader when it is constructed.
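A rough sketch of what that forwarding might look like, written in plain Python so it runs without transformers or torch installed. `LoaderArgs` and `dataloader_kwargs` are hypothetical names, and the `multiprocessing_context` field is the proposed addition, not an existing Trainer option. The sketch also assumes that DataLoader rejects a context when num_workers is 0, so the value is only forwarded when worker processes are in use.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LoaderArgs:
    """Hypothetical stand-in for the Trainer arguments; not the real class."""

    batch_size: int = 8
    num_workers: int = 0
    # Hypothetical new field proposed in this issue.
    multiprocessing_context: Optional[str] = None


def dataloader_kwargs(args: LoaderArgs) -> Dict[str, Any]:
    """Build keyword arguments intended for torch.utils.data.DataLoader."""
    kwargs: Dict[str, Any] = {
        "batch_size": args.batch_size,
        "num_workers": args.num_workers,
    }
    # Assumption: the context only makes sense with worker processes,
    # so it is skipped when loading happens in the main process.
    if args.multiprocessing_context is not None and args.num_workers > 0:
        kwargs["multiprocessing_context"] = args.multiprocessing_context
    return kwargs


print(dataloader_kwargs(LoaderArgs(num_workers=4, multiprocessing_context="spawn")))
```

The resulting dictionary could then be unpacked into the loader, for example `DataLoader(dataset, **dataloader_kwargs(args))`.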

@ierezell ierezell added the Feature request Request for a new feature label Nov 18, 2024
@Rocketknight1
Member

This looks like a crossover datasets/Trainer issue, so cc @lhoestq @SunMarc @muellerzr
