Train problem #7

Closed
xxxiaosong opened this issue Jun 26, 2023 · 15 comments
Comments

@xxxiaosong

Hello! The following issues were encountered during the training process. Could you give me some guidance?
[image: screenshot of the error]

@hulianyuyy
Owner

hulianyuyy commented Jun 26, 2023 via email

@xxxiaosong
Author

Thank you for your reply! The dataset soft link seems to be successful because I can preprocess the dataset.

@hulianyuyy
Owner

According to the information in your screenshot, the model fails to read the sign language images and raises a "list index out of range" error. To locate the issue, you can check the type of the input data in dataloader_video.py by adding 'print(type(video))' after line 107. You will most likely get None as the output. Let me know if you have further problems.
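
One possible way to narrow this down outside of training is to check whether every video folder actually contains frame files, since a pattern that matches nothing yields an empty list and any later indexing fails. The following is a minimal sketch under stated assumptions, not the code in dataloader_video.py: the function name `find_empty_frame_dirs`, the `*.png` pattern, and the one-folder-per-video layout are all hypothetical and should be adapted to the actual dataset.

```python
import glob
import os
import tempfile


def find_empty_frame_dirs(frame_dirs, pattern="*.png"):
    """Return the directories in which `pattern` matches no files.

    Hypothetical helper: the pattern and folder layout are assumptions,
    not taken from the repository's dataloader.
    """
    empty = []
    for d in frame_dirs:
        frames = sorted(glob.glob(os.path.join(d, pattern)))
        if not frames:
            # Indexing frames[0] here would raise "list index out of range".
            empty.append(d)
    return empty


def demo():
    # Build two throwaway folders: one with a fake frame file, one without.
    with tempfile.TemporaryDirectory() as root:
        good = os.path.join(root, "good")
        bad = os.path.join(root, "bad")
        os.makedirs(good)
        os.makedirs(bad)
        open(os.path.join(good, "0001.png"), "wb").close()
        found = find_empty_frame_dirs([good, bad])
        return [os.path.basename(d) for d in found]
```

Running the helper over the real frame folders before training would list any folder where nothing was matched, which usually points to a wrong dataset path or a file extension mismatch.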

@xxxiaosong
Author

Thank you for your reply! The problem has been resolved. I am training on an RTX 3080 with 10 GB of memory, and to prevent OOM I set the batch size to 1. However, after the first epoch ended, the following error occurred, and I suspect it is OOM again.
[image: screenshot of the error]

@hulianyuyy
Owner

hulianyuyy commented Jun 28, 2023 via email

@xxxiaosong
Author

Thank you for your reply! It was not due to insufficient disk space. The problem was resolved by following this issue:
https://github.com/parlance/ctcdecode/issues/124

@hulianyuyy
Owner

hulianyuyy commented Jun 29, 2023 via email

@xxxiaosong
Author

Hello! I am here again. I have switched to a new device to run this project, but I have encountered some inexplicable errors. The error messages are as follows.
[image: screenshot of the error]
[image: screenshot of the error]
Could you give me some guidance?

@hulianyuyy
Owner

I don't have an exact answer, but I suspect you may have set a number of classes that differs from that of the target dataset. This issue may be related to the number of classes.

@xxxiaosong
Author

Thank you for your reply. The problem has been resolved; it was a problem with sclite. I have another question: when my training process terminates unexpectedly, can I execute the following command to continue training from the saved weights?

python main.py --load-weights work_dir/baseline_res18/dev_23.60_epoch15_model.pt

@hulianyuyy
Owner

hulianyuyy commented Jul 21, 2023 via email

@xxxiaosong
Author

Haha. Thank you. I used the wrong command.

@youthxin

Thank you for your reply! Not due to insufficient disk space. The problem has been resolved as follows: https://github.com/parlance/ctcdecode/issues/124

Hello, I have also encountered this problem. The link is invalid. How was it resolved?

@hulianyuyy
Owner

You may refer to this issue and this.

@youthxin

Thank you very much for taking the time to reply to me. I will try it out.
