GPU usage for RNN training #172
Comments
Are you using your own feature embedding .npy files, or the provided ones?
I use both my own features and the 2,000-hour feature set available on Hugging Face.
I see. I believe the issue comes from this line of the train function: openWakeWord/openwakeword/train.py, line 868 (commit c40fe92).
The 75% comes from this line: openWakeWord/openwakeword/train.py, line 275 (commit c40fe92),
since it runs the false-positive validation test at 75% completion of training. That is not the actual source of the issue, though; it is just the reason it occurs at 75%. Also, are you generating your own features using the training_models notebook? I think there might be an issue with using those generated embeddings with the automatic model training notebook as-is. I am working on a robust fix for that, but if you are, I can post the sort of band-aid fix I am using now to make it compatible.
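For anyone following along, here is a purely illustrative sketch (not the actual openwakeword/train.py code; the function and variable names are made up) of the kind of mid-training check being described, where a false-positive validation pass fires once training reaches 75% of the configured steps:

```python
# Illustrative sketch only -- not the actual openwakeword/train.py code.
# It shows the behaviour being described: a false-positive validation pass
# that fires once training reaches 75% of the configured number of steps,
# which is why the progress bar stops at exactly 75% when that pass fails.
import torch

def train_with_fp_check(model, train_batches, x_val_fp, total_steps, device="cuda"):
    optimizer = torch.optim.Adam(model.parameters())
    criterion = torch.nn.BCEWithLogitsLoss()
    fp_check_step = int(0.75 * total_steps)

    for step, (x, y) in enumerate(train_batches):
        if step >= total_steps:
            break
        optimizer.zero_grad()
        loss = criterion(model(x.to(device)), y.to(device))
        loss.backward()
        optimizer.step()

        # The false-positive validation check: a single forward call over the
        # whole validation tensor. With an RNN/LSTM model, this one call is
        # where GPU memory can run out.
        if step == fp_check_step:
            with torch.no_grad():
                fp_predictions = model(x_val_fp.to(device))
    return model
```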
@EthanEpp is correct: the script currently loads the validation data into memory, as it is generally small enough not to cause any issues. Training RNN-based models can dramatically increase the memory requirements for training (at least in comparison to the default simple DNN models), so in this case you may need to make modifications to how that validation data is loaded and evaluated. If it helps, in my testing RNN-based models only rarely perform better than DNN models for short wake words.
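Since the traceback further down in this thread shows the failure happening in the single `val_predictions = self.model(x_val)` call, one possible modification (a sketch only, assuming the model accepts a batch of feature tensors; the helper name and chunk size are arbitrary) is to run that validation pass in chunks under torch.no_grad(), keeping the full tensor on the CPU:

```python
# A minimal sketch of one possible modification, not an official fix.
# Instead of a single forward pass over the full validation tensor
# (val_predictions = self.model(x_val)), evaluate it in chunks and keep the
# full tensor on the CPU, moving only one chunk to the GPU at a time.
import torch

def chunked_validation(model, x_val_cpu, chunk_size=1024, device="cuda"):
    model.eval()
    predictions = []
    with torch.no_grad():  # no autograd buffers -> much lower peak memory
        for start in range(0, x_val_cpu.shape[0], chunk_size):
            chunk = x_val_cpu[start:start + chunk_size].to(device)
            predictions.append(model(chunk).cpu())
    return torch.cat(predictions, dim=0)
```

Peak GPU memory then scales with chunk_size rather than with the size of the whole validation set.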
Hello, when I try to train through train.py, I find that there is no "generate_samples" function. Have you encountered this problem? Thanks.
With an NVIDIA GeForce RTX 3070 GPU and 16 GB of RAM on my PC, when training my LSTM model on the GPU I encountered a CUDA out-of-memory error, indicating insufficient GPU memory to allocate tensors. I've tried reducing the batch size and simplifying the model architecture, but the issue persists. Any suggestions or guidance on how to address this problem would be greatly appreciated!
torchvision is not available - cannot save figures
INFO:root:##################################################
Starting training sequence 1...
##################################################
Training: 75%|█████████████████████ | 3750/5000 [00:12<00:04, 310.02it/s]
Traceback (most recent call last):
File "/home/thatsri9ht/python/openwakeword/openwakeword/train.py", line 888, in
best_model = oww.auto_train(
File "/home/thatsri9ht/python/openwakeword/openwakeword/train.py", line 276, in auto_train
self.train_model(
File "/home/thatsri9ht/python/openwakeword/openwakeword/train.py", line 519, in train_model
val_predictions = self.model(x_val)
File "/home/thatsri9ht/anaconda3/envs/openwakeword/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/thatsri9ht/anaconda3/envs/openwakeword/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/thatsri9ht/python/openwakeword/openwakeword/train.py", line 94, in forward
out, h = self.layer1(x)
File "/home/thatsri9ht/anaconda3/envs/openwakeword/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/thatsri9ht/anaconda3/envs/openwakeword/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/thatsri9ht/anaconda3/envs/openwakeword/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 888, in forward
c_zeros = torch.zeros(self.num_layers * num_directions,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 472.00 MiB. GPU