Insufficient space problem #16

Closed
percise opened this issue Nov 15, 2023 · 12 comments

Comments

@percise

percise commented Nov 15, 2023

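My environment is pytorch 1.8.1, python 3.8.18, cuda 11.1, and ctcdecode 0.4 installed successfully, but at the end of the first training epoch an out-of-memory error was reported. The server has an A100 80G.

Uploading 1700021110772.jpg…
Could you tell me roughly what the problem is? I saw in your other issues that it was a version problem, so I switched to pytorch 1.13.0 and ctcdecode again installed successfully, but at runtime it immediately reports a ctc error. Could you give me some pointers on what I should do? Thank you. Is it a problem with the gcc version? I am currently on gcc 11.4.
Or could you tell me what your environment is?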
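
When comparing environments like this, it can help to collect the relevant versions in one place. The following is a minimal sketch, not part of this repository: `collect_versions` is a hypothetical helper, and the gcc check assumes that the compiler on `PATH` is the one used when ctcdecode's C++ extension was built.

```python
import importlib.util
import platform
import shutil
import subprocess


def collect_versions():
    """Gather the version strings discussed in this thread (best effort)."""
    info = {"python": platform.python_version()}

    # Only import torch if it is installed, so the script also runs without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["pytorch"] = torch.__version__
        info["cuda (torch build)"] = torch.version.cuda
    else:
        info["pytorch"] = "not installed"

    # Assumption: the gcc found on PATH is the one used to build ctcdecode.
    gcc = shutil.which("gcc")
    if gcc is None:
        info["gcc"] = "not found"
    else:
        out = subprocess.run([gcc, "--version"], capture_output=True, text=True)
        info["gcc"] = out.stdout.splitlines()[0] if out.stdout else "unknown"
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Posting this output alongside the error message makes it easier to compare against a known working setup.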

@percise
Author

percise commented Nov 15, 2023

1700021110772
This is the code in question.

@hulianyuyy
Owner

My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. Based on other issues, you could try upgrading your pytorch version.

@percise
Author

percise commented Nov 15, 2023

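My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. Based on other issues, you could try upgrading your pytorch version.

Thank you for your patient answer. I will go and try again.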

@percise
Author

percise commented Nov 15, 2023

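My environment is pytorch 1.10.1, ctcdecode 0.4.0, python 3.7.1, cuda 11.2. Based on other issues, you could try upgrading your pytorch version.

Hello, I would like to ask whether the insufficient space is caused by insufficient memory. I see that pin_memory is set to True in main.py, which keeps memory locked the whole time. How much memory does your machine have? I have now changed it to False and am trying that.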
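
For context, `pin_memory` is an argument of PyTorch's `DataLoader`: when it is `True`, batches are placed in page-locked host RAM before being copied to the GPU, so it affects host memory rather than GPU memory. The sketch below shows one way to make the setting switchable. `make_loader_kwargs` and `build_loader` are hypothetical helpers, not code from this repository's main.py, and the `num_workers` default is a placeholder.

```python
def make_loader_kwargs(batch_size=2, num_workers=4, pin_memory=False):
    """Keyword arguments for torch.utils.data.DataLoader.

    pin_memory=False avoids page-locked host buffers, usually at the
    cost of somewhat slower host-to-GPU copies.
    """
    return {
        "batch_size": batch_size,
        "shuffle": True,
        "num_workers": num_workers,
        "pin_memory": pin_memory,
    }


def build_loader(dataset, **overrides):
    # Imported lazily so the helper above can be inspected without PyTorch.
    from torch.utils.data import DataLoader

    return DataLoader(dataset, **make_loader_kwargs(**overrides))
```

Calling `build_loader(dataset, pin_memory=True)` would restore the pinned behaviour for comparison.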

@hulianyuyy
Owner

I use a single 3090 GPU with 24 GB of memory to train. But I figure that this issue is not caused by GPU memory, since your GPU has 80 GB of memory.

@hulianyuyy
Owner

Besides, you may refer to this issue. The problem is most likely caused by ctcdecode.

@hulianyuyy
Owner

You could try a few things. If you still encounter this problem, I will add a Python decoder to replace ctcdecode for decoding, which should get rid of it. I plan to do this around 11.25.
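
To illustrate what a pure-Python decoder could look like, here is a minimal sketch of greedy (best-path) CTC decoding. It is an assumption about one possible approach, not the decoder planned for this repository, and it does not reproduce the beam search that ctcdecode performs. The function name and the choice of index 0 as the blank label are hypothetical.

```python
def greedy_ctc_decode(frame_scores, blank=0):
    """Best-path CTC decoding for one sequence.

    frame_scores: list of per-frame score lists (one score per label).
    Returns label indices after merging repeats and dropping blanks.
    """
    decoded = []
    previous = None
    for scores in frame_scores:
        # Pick the highest-scoring label for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the label changes and is not the blank symbol.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


if __name__ == "__main__":
    scores = [
        [0.1, 0.8, 0.1],    # label 1
        [0.1, 0.8, 0.1],    # label 1 again, merged with the previous frame
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # label 2
    ]
    print(greedy_ctc_decode(scores))  # [1, 2]
```

Because it needs no compiled extension, a decoder of this kind sidesteps the build and version issues discussed above, though greedy decoding may give slightly different results from beam search.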

@percise
Author

percise commented Nov 15, 2023

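You could try a few things. If you still encounter this problem, I will add a Python decoder to replace ctcdecode for decoding, which should get rid of it. I plan to do this around 11.25.

OK, full support to the author for contributing to sign language!!!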

@kido1412y2y

I use a single 3090 GPU with 24 GB of memory to train. But I figure that this issue is not caused by GPU memory, since your GPU has 80 GB of memory.

Hello, may I ask how much memory was used during training on the 24 GB card? I am using two 3060 GPUs with 12 GB of memory each. Is that enough?
Because I am using two GPUs, I changed main.py as shown here, but when I actually ran the code, the computer only used one GPU and then reported an error. Is there anything I missed? I hope to receive your reply.
image
image
image

@hulianyuyy
Owner

I use a single 3090 GPU with 24 GB of memory to train. But I figure that this issue is not caused by GPU memory, since your GPU has 80 GB of memory.

Hello, may I ask how much memory was used during training on the 24 GB card? I am using two 3060 GPUs with 12 GB of memory each. Is that enough? Because I am using two GPUs, I changed main.py as shown here, but when I actually ran the code, the computer only used one GPU and then reported an error. Is there anything I missed? I hope to receive your reply.

About 20 GB of memory for a batch size of 2. As we use AMP to accelerate training, this code currently doesn't support multiple GPUs. You may manually disable AMP, or try using a batch size of 1 to run this code.
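
The suggestion to disable AMP can be sketched as follows. This is only an illustration under stated assumptions: `resolve_amp` and `make_scaler` are hypothetical names, and how this repository actually wires AMP into its training loop is not shown in this thread. The sketch relies on `torch.cuda.amp.GradScaler` and `torch.cuda.amp.autocast` both accepting an `enabled` flag, so passing `False` turns them into pass-throughs.

```python
def resolve_amp(requested, num_gpus):
    """Use AMP only for single-GPU runs, following the reply above."""
    return bool(requested) and num_gpus <= 1


def make_scaler(use_amp):
    # Imported lazily; requires PyTorch to be installed.
    import torch

    # With enabled=False the scaler leaves losses and optimizer steps unchanged.
    return torch.cuda.amp.GradScaler(enabled=use_amp)


# Sketch of a training step using the flag (model, batch, optimizer assumed):
#
#   use_amp = resolve_amp(requested=True, num_gpus=2)   # -> False
#   scaler = make_scaler(use_amp)
#   with torch.cuda.amp.autocast(enabled=use_amp):
#       loss = model(batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
```

Keeping the decision in one function means the same training step runs with or without mixed precision.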

@xxxiaosong

Hello, I would like to ask why training with two 4090 graphics cards is much slower than training with a single card.

@hulianyuyy
Owner

Maybe you have other code running on the 4090, and so it slows down.
