Custom Data Training #313
@shyama-20 lower your batch_size, you might even have to use batch_size: 1 with 4 GB of VRAM
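A minimal sketch of that change, assuming batch_size is a top-level key in the config.yaml used for training (the placement of the key should be verified against your own config):

    batch_size: 1   # smallest possible batch; often needed to fit training on a 4 GB GPU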
Thank you @KimberleyJensen, the training is running now. Can anyone please tell me approximately how much RAM my GPU should have in the case of several GB of training data?
The issue is not so much the size of your training data, but really the optimization and loading parameters like batch size, model size, or training segment length. The default segment size is 10 seconds; you can lower that a bit, but not too much (in general this will lower quality), setting for instance something like the snippet below.
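A sketch of such a change, assuming the segment length is exposed as dset.segment in the training config; the value 6 is only illustrative:

    dset:
      segment: 6   # default is 10 seconds; shorter segments reduce memory use but can hurt quality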
You can also try to train a smaller model, lowering the number of channels a bit:
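As a sketch, assuming the model width is controlled by a channels key in the config (depending on the model variant it may sit under a model-specific group, e.g. demucs); the value 48 is only illustrative:

    channels: 48   # fewer channels gives a smaller model and lower memory use, at some cost in separation quality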
I don't remember the memory usage, but you can adjust the batch size, segment length, or number of channels a bit until it fits.
Thank you so much for your time @adefossez!
❓ Question
Hello,
I have a custom dataset of very small size (only hundreds of MB) and I have been trying to train on it using
dora run dset=my_dset
(as mentioned in #221), which led to the PyTorch process being "Killed".
i) Does this command run the training on the CPU by default?
ii) If it is running on the CPU, why does the process get killed?
I'm using a system with an Intel Core i7-10750H CPU @ 2.60GHz × 12, running Ubuntu 20.04, with 16 GB of RAM (currently 12 GB free). This is the my_dset.yaml I used (as mentioned in #300). In the config.yaml file I have changed dset.sources, epochs, batch_size, and augment.
I tried using
dora run -d dset=my_dset
for training on the GPU (NVIDIA GeForce GTX 1650, 4 GB), but ran into a "CUDA out of memory" error. I hope I'm following the steps correctly for training a custom dataset.
Can I get a solution for this issue?
Thanks in advance!