Disk reads and writes too fast, causing server lag: how to reduce read/write speed? #1267

frankst-debug · 2022-10-18T01:55:17Z

❓ Questions and Help

I am training the movie_mcan model on the VQAv2 dataset with two GPUs. GPU memory is not sufficient, so I set batch_size to 16. However, every time I run the code, the server becomes abnormally laggy, and while it lags I cannot do anything on it. When I check disk activity with sar -d 3 5, the read speed is very high. How can I fix this?
This is my training command:

```shell
CUDA_VISIBLE_DEVICES=2,3 mmf_run config=projects/movie_mcan/configsqa2/defaults.yaml model=movie_mcan dataset=vqa2 run_type=train env.cache_dir=/data/students/zzj/ env.data_dir=/data/students/zzj/ training.batch_size=16
```
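One way to make the same run less disruptive is to give it the lowest CPU priority and the idle I/O scheduling class, so other processes are served by the disk first. This is a minimal sketch, assuming util-linux ionice and coreutils nice are installed. low_io_run is a hypothetical helper name, and the idle class only has an effect with I/O schedulers that honor priorities (such as BFQ). The commented example also assumes that MMF's training.num_workers option sets the number of DataLoader worker processes; fewer workers typically means fewer concurrent reads, but check the option name against the defaults.yaml of your MMF version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MMF): run a command with the lowest CPU
# priority and the "idle" I/O class so interactive work gets the disk first.
low_io_run() {
  if command -v ionice >/dev/null 2>&1; then
    # -c 3: idle I/O class; -t: still run the command if the class cannot be set
    nice -n 19 ionice -c 3 -t "$@"
  else
    # Fall back to CPU niceness only when ionice (util-linux) is not installed
    nice -n 19 "$@"
  fi
}

# Example (training.num_workers=2 is an illustrative value, and the option name
# is an assumption to verify against your MMF version):
# low_io_run env CUDA_VISIBLE_DEVICES=2,3 mmf_run \
#   config=projects/movie_mcan/configsqa2/defaults.yaml model=movie_mcan \
#   dataset=vqa2 run_type=train env.cache_dir=/data/students/zzj/ \
#   env.data_dir=/data/students/zzj/ training.batch_size=16 training.num_workers=2
```

Lowering the worker count trades data-loading throughput, and therefore training speed, for less pressure on the disk.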

Here are the read and write speeds reported by sar:
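The figures that sar -d 3 5 prints (five reports at 3-second intervals) come from per-device kernel counters, which can also be read directly from /proc/diskstats to check how much the job actually reads. This is a minimal sketch, assuming Linux and bash. diskstats_delta and sample_disks are hypothetical helper names, not part of MMF or sysstat, and the kB/s values are computed independently, so they may not match sar's output exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of MMF or sysstat): measure per-device disk
# throughput by diffing two snapshots of /proc/diskstats, similar to `sar -d`.

# diskstats_delta BEFORE_FILE AFTER_FILE SECONDS
# /proc/diskstats fields: $3 = device name, $6 = sectors read,
# $10 = sectors written; these counters always use 512-byte sectors.
diskstats_delta() {
  awk -v t="$3" '
    NR == FNR { r[$3] = $6; w[$3] = $10; next }   # first file: remember counters
    ($3 in r) {                                   # second file: print the rates
      printf "%s read_kB/s=%.1f write_kB/s=%.1f\n", $3,
        ($6 - r[$3]) * 512 / 1024 / t, ($10 - w[$3]) * 512 / 1024 / t
    }' "$1" "$2"
}

# sample_disks SECONDS: take two snapshots SECONDS apart and print the rates
sample_disks() {
  local interval="${1:-3}" before after
  before=$(mktemp) && after=$(mktemp) || return 1
  cat /proc/diskstats > "$before"
  sleep "$interval"
  cat /proc/diskstats > "$after"
  diskstats_delta "$before" "$after" "$interval"
  rm -f "$before" "$after"
}

# Example: one 3-second sample, comparable to a single report of `sar -d 3 5`
# sample_disks 3
```

Running sample_disks 3 once with the training job and once without it makes it easier to attribute the read load to the job.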
