Insufficient space problem #16

percise · 2023-11-15T04:07:35Z

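My environment is pytorch 1.8.1, python 3.8.18, cuda 11.1, and ctcdecode 0.4 installed successfully. However, at the end of the first training epoch it reports an out-of-memory error. The server has an A100 80G.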

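Could you tell me roughly what the problem is? I read your other issues, which say it is a version problem, so I switched to pytorch 1.13.0 and ctcdecode also installed successfully, but at runtime it immediately reports a ctc error. Please give me some pointers on what I should do, thanks. Could it be a gcc version problem? I am currently on gcc 11.4.
Or could you tell me what your environment is?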

percise · 2023-11-15T04:08:24Z

This is the code where the problem occurs.

hulianyuyy · 2023-11-15T07:51:25Z

My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. Based on other issues, you could try upgrading the pytorch version.
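When comparing environments like this, it can help to print the installed versions rather than rely on memory. The following is only a minimal sketch: the helper name `report_versions` is hypothetical, and it assumes the packages are registered under the distribution names `torch` and `ctcdecode`. It needs Python 3.8 or newer for `importlib.metadata`.

```python
import platform
from importlib import metadata  # standard library from Python 3.8 onward


def report_versions(packages):
    """Return a dict mapping each name to its installed version, or None if absent."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this interpreter's environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # "torch" and "ctcdecode" are assumed distribution names.
    for name, version in report_versions(["torch", "ctcdecode"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it inside the same conda or virtualenv environment used for training gives the versions to paste into an issue.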

percise · 2023-11-15T08:05:29Z

> My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. Based on other issues, you could try upgrading the pytorch version.

Thank you for your patient answer, I will try again.

percise · 2023-11-15T10:19:41Z

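> My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. Based on other issues, you could try upgrading the pytorch version.

Hello, I would like to ask whether the insufficient space is caused by insufficient host memory. I see that pin_memory is set to True in main.py, which keeps memory locked the whole time. How much memory does your machine have? I have now changed it to False and am trying again.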
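Since pin_memory page-locks host RAM rather than GPU memory, one way to narrow down an "insufficient space" error is to look at host RAM and free disk space separately from the GPU. The snippet below is a rough sketch only: the helper name `host_resources` is hypothetical, and the RAM query assumes a Linux-like system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`.

```python
import os
import shutil

GIB = 1024 ** 3


def host_resources(path="."):
    """Return free disk space at `path` and total physical RAM, both in GiB."""
    disk = shutil.disk_usage(path)
    # Total RAM = page size * number of physical pages (Linux-like systems only).
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return {
        "disk_free_gib": disk.free / GIB,
        "ram_total_gib": page_size * phys_pages / GIB,
    }


if __name__ == "__main__":
    for key, value in host_resources().items():
        print(f"{key}: {value:.1f}")
```

If host RAM is small compared with what the data loader holds in memory, host memory is a more plausible suspect than the 80 GB GPU.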

hulianyuyy · 2023-11-15T14:25:48Z

I use a single 3090 GPU with 24G memory to train. But I figure that this issue is not caused by GPU memory, since your GPU has 80 GB memory.

hulianyuyy · 2023-11-15T14:29:37Z

Besides, you may refer to this issue. This is mostly caused by ctcdecode.

hulianyuyy · 2023-11-15T14:35:47Z

You could try a few things. If you still encounter this problem, I will add a Python decoder to perform decoding instead of ctcdecode, to get rid of this problem. My schedule is around 11.25.
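For readers wondering what decoding in plain Python might look like, here is a minimal sketch of greedy (best-path) CTC decoding. It is not the author's planned implementation, and it is simpler than the beam search that ctcdecode performs. The function name `greedy_ctc_decode` and the convention that label 0 is the blank are assumptions made for this example.

```python
from itertools import groupby


def greedy_ctc_decode(frame_scores, blank=0):
    """Greedy CTC decoding for one sequence.

    frame_scores: list of per-frame score lists, one score per label.
    blank: index of the CTC blank label (assumed to be 0 here).
    """
    # 1. Take the highest-scoring label in every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # 2. Collapse consecutive repeats of the same label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks.
    return [label for label in collapsed if label != blank]


if __name__ == "__main__":
    scores = [
        [0.1, 0.8, 0.1],    # label 1
        [0.1, 0.8, 0.1],    # label 1 (repeat, collapsed)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # label 1 again (kept, separated by a blank)
        [0.1, 0.1, 0.8],    # label 2
    ]
    print(greedy_ctc_decode(scores))  # [1, 1, 2]
```

A decoder like this has no compiled extension, so it avoids build and version mismatches, though it may be less accurate than beam search.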

percise · 2023-11-15T14:47:24Z

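> You could try a few things. If you still encounter this problem, I will add a Python decoder to perform decoding instead of ctcdecode, to get rid of this problem. My schedule is around 11.25.

OK, full support for the author. Thank you for contributing to sign language!!!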

kido1412y2y · 2023-12-06T16:00:47Z

> I use a single 3090 GPU with 24G memory to train. But I figure that this issue is not caused by GPU memory, since your GPU has 80 GB memory.

Hello, may I ask how much of the 24GB of memory was actually used during training? I am using two 3060 GPUs with 12GB of memory each. Is that enough?
Because I have two GPUs, I changed the setting here in main.py, but when I actually ran the code, the computer only used one GPU and then reported an error. Is there anything I missed? I hope to receive your reply.

hulianyuyy · 2023-12-07T12:25:46Z

> > I use a single 3090 GPU with 24G memory to train. But I figure that this issue is not caused by GPU memory, since your GPU has 80 GB memory.

> Hello, may I ask how much of the 24GB of memory was actually used during training? I am using two 3060 GPUs with 12GB of memory each. Is that enough? Because I have two GPUs, I changed the setting here in main.py, but when I actually ran the code, the computer only used one GPU and then reported an error. Is there anything I missed? I hope to receive your reply.

About 20 GB of memory for a batch size of 2. As we use AMP to accelerate training, this code currently doesn't support multiple GPUs. You may manually disable AMP, or try a batch size of 1 to run this code.

xxxiaosong · 2024-03-10T05:16:15Z

Hello, I would like to ask why training on two 4090 graphics cards is much slower than training on a single card.

hulianyuyy · 2024-03-10T14:56:56Z

Maybe you have some other code running on the 4090, and so it slows down.

hulianyuyy closed this as completed Jan 13, 2024
