
Custom Data Training #313

Closed
shyama-20 opened this issue Apr 1, 2022 · 5 comments
Labels
question Further information is requested

Comments

@shyama-20

❓ Question

Hello,
I have a custom dataset of very small size (only a few hundred MB) and I have been trying to train on it using dora run dset=my_dset (as mentioned in #221), which led to the PyTorch error "Killed".
i) Does this command run the training on the CPU by default?
ii) If it is running on the CPU, why does the process get killed?

I'm using a system with an Intel Core i7-10750H CPU @ 2.60GHz × 12, running Ubuntu 20.04. The system has 16 GB of RAM (currently 12 GB free). This is the my_dset.yaml I used (as mentioned in #300). In config.yaml I have changed dset.sources, epochs, batch_size and augment.

[screenshot of my_dset.yaml]
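
For context, a custom dset file such as conf/dset/my_dset.yaml typically just points dset.wav at the dataset root. A minimal sketch (the path and values below are placeholders, not the actual config from this issue):

dset:
  wav: /path/to/my_dset   # placeholder path; expected to contain train/ and valid/ subfolders
  samplerate: 44100
  channels: 2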

I tried using dora run -d dset=my_dset to train on the GPU (NVIDIA GeForce GTX 1650, 4 GB) but ran into a "CUDA out of memory" error.

I hope I'm following the steps correctly for training a custom dataset.

Can I get a solution for this issue?

Thanks in advance!

@shyama-20 added the question label Apr 1, 2022
@KimberleyJensen

@shyama-20 lower your batch_size; you might even have to use batch_size: 1 with only 4 GB of VRAM.
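
In case it helps others: with the dora-based training, this kind of hyperparameter can apparently be overridden directly on the command line, along the lines of (treat the exact value as a placeholder to tune for your GPU):

dora run -d dset=my_dset batch_size=1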

@shyama-20
Author

Thank you @KimberleyJensen, the training is running now.

Can anyone please tell me approximately how much RAM my GPU should have for several GBs of training data?

@adefossez
Contributor

The issue is not so much the size of your training data, but really the optimization and loading parameters like batch size, model size, or training segment length. The default segment length is 10 seconds; you can lower that a bit, but not too much (this will in general lower quality), for instance by setting

dset:
    segment: 6

You can also try training a smaller model by lowering the number of channels a bit:

demucs:
    channels: 48
# or for hybrid
hdemucs:
    channels: 32
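
Presumably these can also be passed as command-line overrides rather than edited into config.yaml, something like the following (the values are just examples, not recommendations):

dora run -d dset=my_dset dset.segment=6 demucs.channels=48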

@adefossez
Contributor

I don't remember the memory usage, but you can adjust the batch size, segment length, or number of channels a bit until it fits.
Also, to go below a batch size of 4, you need to change this value: https://github.com/facebookresearch/demucs/blob/main/conf/config.yaml#L63 to be the actual batch size.

@shyama-20
Author

Thank you so much for your time @adefossez!
