munmap_chunk(): invalid pointer #358

Hello,
I am encountering an error while trying to use onmt-main train_and_eval.
After evaluation, the following error happens: munmap_chunk(): invalid pointer
I am running on an HPC cluster with:
2 x IBM Power9 8335-GTH
4 x NVIDIA V100 GPUs
CUDA 9.1
cuDNN 7.1.3
Python 3.6.5
The job is running on an exclusive node using all 4 GPUs.
My configuration file is attached (config_basic1.txt), along with the complete stderr dump.
Any ideas about what is causing this and how to solve it?

Comments
Hi, the first thing I notice is the very high `sample_buffer_size`. More generally, this error can hide an out-of-memory error. Are you running low on system memory when running the training?
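If low host memory is the suspect, a small monitor script can watch it while training runs. This is a minimal sketch using the third-party `psutil` package (my addition, not something from the thread); running `free -h` in another terminal works just as well:

```python
import time

import psutil  # third-party package; an assumption, not used in the thread

# Poll available host memory every 5 seconds while training runs in
# another process. A steady decline toward zero right before the crash
# would suggest the shuffle buffer is exhausting system RAM.
# Stop with Ctrl-C.
while True:
    mem = psutil.virtual_memory()
    print("available: %.1f GiB / total: %.1f GiB"
          % (mem.available / 2**30, mem.total / 2**30))
    time.sleep(5)
```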
Hi, actually I already set the `sample_buffer_size`. I managed to get it running without `train_and_eval`, only with `train`. However, the speed is incredibly slow (~500 words/sec). From the logs, TensorFlow seems to be using the GPUs. I am using the standard TensorFlow build on the cluster (version 1.9.0); later I will try the latest one with CUDA 10.0 to see if anything changes. Regards,
I still recommend setting it to a small value. Can you try without this parameter to use the default value?
What does the GPU usage look like when running the training?
Thanks for all the info. Can you try setting a fixed and smaller shuffle buffer, e.g.:

```yaml
train:
  sample_buffer_size: 1000000
```
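For context, this `train` block goes in the YAML run configuration that is passed to `onmt-main` with `--config`; OpenNMT-tf merges it with the other training settings in that file.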
It finally worked! By the way, what exactly is `sample_buffer_size`? Thank you so much for your help and quick responses!
Great! This parameter controls the level of data shuffling. Instead of reading the dataset sequentially, it will randomly sample sentences from the next `sample_buffer_size` sentences. I find it a bit surprising that performance is impacted that much when using a very large dataset. I should probably set an upper limit of around 5M on this buffer.
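For intuition, this matches the `buffer_size` semantics of `tf.data.Dataset.shuffle`, which OpenNMT-tf's input pipeline builds on: elements are drawn at random from a sliding buffer of unread examples, so the buffer size bounds both shuffling quality and host memory use. A toy sketch against the TF 1.x API used in the thread (toy data, my example rather than OpenNMT-tf code):

```python
import tensorflow as tf  # written against the TF 1.x API used in the thread

# Toy dataset standing in for the training corpus.
dataset = tf.data.Dataset.range(100)

# With buffer_size=10, each emitted element is sampled uniformly from a
# buffer of the next 10 unread elements, so shuffling is only local: an
# element near the end of the dataset can never be emitted near the
# beginning. A larger buffer mixes better but keeps more examples in
# host RAM.
shuffled = dataset.shuffle(buffer_size=10, seed=42)

# TF 1.x iteration goes through an iterator and a session.
iterator = shuffled.make_one_shot_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
    for _ in range(10):
        print(sess.run(next_element))
```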
Yes, in fact, this also solved the long wait for the shuffle buffer to fill.
Just for information, the maximum value I could use for this dataset was 8M for the `sample_buffer_size`. Everything is working fine. Thanks :)
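(A rough back-of-the-envelope estimate, not from the thread: the shuffle buffer keeps examples in host RAM, so peak usage grows roughly linearly with `sample_buffer_size`; at 8M sentences of ~100 bytes each that is on the order of a gigabyte for the buffer alone, on top of TensorFlow's own allocations.)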
Thanks for the information!