--cuda option with strange results #356

Closed
fmobrj opened this issue Jan 3, 2019 · 13 comments

Comments

@fmobrj

fmobrj commented Jan 3, 2019

Hello. Thanks for the great work.

When I train without using my GPU (i.e., without the --cuda option set), training proceeds normally, with the loss slowly dropping, as I would expect. As expected, nvidia-smi shows no GPU activity.

$ python train.py --train-manifest data/libri_train_manifest.csv --val-manifest data/libri_val_manifest.csv

Training results without the --cuda option:

[Screenshot: training output without --cuda]

When I use the --cuda option, nvidia-smi shows GPU usage and training advances much faster, but the results are strange. The loss starts at 0 and stays there for the whole training run:

[Screenshot: training output with --cuda]

It seems to be some kind of problem with the input tensor. Any hints?
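CTC loss is the negative log-likelihood of the transcript, -log p(y|x), so it is never negative and reaches 0 only when the model assigns every target transcript probability 1. A loss that starts at 0 on an untrained model therefore points to a broken input or loss computation rather than fast convergence. A minimal sanity check, using a hypothetical helper check_ctc_loss that is not part of deepspeech.pytorch, might look like:

```python
import math


def check_ctc_loss(loss_value, step):
    """Hypothetical helper: return a warning for CTC loss values that
    indicate a broken pipeline, or None if the value looks plausible."""
    if not math.isfinite(loss_value):
        return f"step {step}: non-finite loss ({loss_value})"
    if loss_value < 0:
        # CTC loss is -log p(y|x), so it can never be negative
        return f"step {step}: negative loss ({loss_value})"
    if loss_value == 0.0:
        # 0 would mean p(y|x) == 1 for every utterance in the batch
        return f"step {step}: loss is exactly 0, check the inputs and the CTC binding"
    return None


print(check_ctc_loss(0.0, step=1))
print(check_ctc_loss(152.3, step=1))  # None
```

Calling it on each batch's loss before the backward pass would surface the symptom described here on the first step instead of after a full epoch.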

Best regards,
Fabio.

@fmobrj
Author

fmobrj commented Jan 6, 2019

Hello yuxwang1102, I found this alternative binding for Baidu's warp-ctc: https://github.com/jpuigcerver/pytorch-baidu-ctc.

I just replaced the warp-ctc import with this one, and now I get losses similar to the CPU run when using the --cuda option.

I installed the new package, then replaced

"from warpctc_pytorch import CTCLoss"

with

"from torch_baidu_ctc import ctc_loss, CTCLoss"

@fmobrj
Author

fmobrj commented Jan 7, 2019

You are welcome, my friend.

>>> import torch
>>> torch.__version__
'1.0.0'
>>> torch.version.cuda
'9.0.176'

gcc version 6.5.0 20181026 (Ubuntu 6.5.0-2ubuntu1~16.04)

@miguelvr
Contributor

miguelvr commented Jan 7, 2019

hey guys, AFAIK both repos work with pytorch 0.4.X and weren't really tested with pytorch 1.0

@SeanNaren
Owner

@miguelvr you think that is the issue? sorry for the silence, I'll hopefully get time to address this ASAP

@fmobrj
Author

fmobrj commented Jan 7, 2019

Thanks, Miguel!

For some reason, I couldn't make it work with --cuda on either torch 0.4.X or 1.0.0 using the warp-ctc binding from the README instructions. After trying everything (updating gcc, multiple torch versions, even different Python versions), I found the alternative repo for the CTC bindings, tried it with 1.0.0, and it works for me.

I suspect it has something to do with the CUDA driver.

Hope yux can figure out an environment that works for him.

@miguelvr
Contributor

miguelvr commented Jan 7, 2019

@SeanNaren no clue, but it would be worth a try. Also, @fmobrj seems to be using Windows, so there's that.

@fmobrj
Author

fmobrj commented Jan 7, 2019

@miguelvr I am using Ubuntu 18.04.1.

@miguelvr
Contributor

miguelvr commented Jan 7, 2019

oh sorry, my bad.

@fmobrj
Author

fmobrj commented Jan 7, 2019

No problem.

@fmobrj
Author

fmobrj commented Jan 7, 2019

Thank you very much, @SeanNaren.

@SeanNaren
Owner

Could you change sound.shape[1] == 1: to sound.shape[0] == 1: at: https://github.com/SeanNaren/deepspeech.pytorch/blob/master/data/data_loader.py#L26 and tell me if it works?
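To see why this small change matters, here is a minimal NumPy-only sketch (no torchaudio), assuming the loaded clip arrives as a (channels, samples) array and that the surrounding code averages over axis 1 when the single-channel check fails. The helper names downmix_old and downmix_suggested are hypothetical:

```python
import numpy as np


def downmix_old(sound):
    # Original check, written for a (samples, channels) layout
    if sound.ndim > 1:
        if sound.shape[1] == 1:
            sound = sound.squeeze()
        else:
            sound = sound.mean(axis=1)  # multiple channels: average
    return sound


def downmix_suggested(sound):
    # Suggested change: test the first axis for a single channel
    if sound.ndim > 1:
        if sound.shape[0] == 1:
            sound = sound.squeeze()
        else:
            sound = sound.mean(axis=1)
    return sound


# 1 s of mono audio at 16 kHz in (channels, samples) layout
clip = np.zeros((1, 16000), dtype=np.float32)
print(downmix_old(clip).shape)        # (1,)
print(downmix_suggested(clip).shape)  # (16000,)
```

With the original check, a mono clip in (channels, samples) layout is averaged over its samples and collapses to a single value, leaving almost nothing to compute a spectrogram from. The suggested check fixes the mono case, but the else branch still averages over axis 1, so multi-channel audio in this layout would still be collapsed.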

@SeanNaren
Owner

Well, that's a mess... I'll investigate further to find out what happened. It seems like the dimensions returned by torchaudio have been swapped, so we'll just have to transpose before doing any other transformations. Will push a fix once I verify it works!
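The transpose-first approach described in this comment can be sketched as follows. It assumes, as the comment suggests, that torchaudio now returns audio as (channels, samples); to_mono is a hypothetical name, not the function that was actually pushed to master:

```python
import numpy as np


def to_mono(sound):
    """Downmix audio assumed to arrive as (channels, samples)."""
    if sound.ndim > 1:
        sound = sound.T  # (channels, samples) -> (samples, channels)
        if sound.shape[1] == 1:
            sound = sound.squeeze(axis=1)  # single channel: drop the channel axis
        else:
            sound = sound.mean(axis=1)  # multiple channels: average them
    return sound


mono = np.zeros((1, 16000), dtype=np.float32)
stereo = np.stack([np.ones(16000), -np.ones(16000)]).astype(np.float32)
print(to_mono(mono).shape)    # (16000,)
print(to_mono(stereo).shape)  # (16000,)
```

Transposing once at the start lets the rest of the code keep its original (samples, channels) assumptions, which is why this handles both the mono and the multi-channel case.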

@SeanNaren
Owner

Fixed on the master branch!
