torch_device errors in latest builds #1118

Closed
shikokuchuo opened this issue Nov 4, 2023 · 4 comments · Fixed by #1122

Comments

@shikokuchuo
Contributor

I am encountering

runtime_error("y is not a torch_device")

This is the error message from your ==.torch_device method. It occurs with the latest development build of torch (fe44f6b), in a script that creates a torch device and works with torch <= 0.11.0.

Moreover I get the following:

> device <- torch_device("cuda:0")
Error in `value[[3L]]()`:
! "length" is not support for objects with class <torch_device/R7>
Run `rlang::last_trace()` to see where the error occurred.

This seems to be a regression caused by #1111.

@shikokuchuo
Contributor Author

Breaking this down after further investigation:

The "y is not a torch_device" error is actually related to the use of safetensors. If I set torch.serialization_version to 2, things work as before. This points to the error occurring while loading previously saved serialised tensors (model and optimiser). I haven't had time to work out exactly where it happens, but this appears to be a breaking change for existing code.
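For anyone wanting to try the same fallback, a minimal sketch follows. It assumes torch.serialization_version is read as a global R option set through options(); the thread only says the value was set to 2, so the mechanism shown here is an assumption.

```r
# Assumption: torch reads torch.serialization_version as a global R option.
# options() returns the previous value so it can be restored afterwards.
old <- options(torch.serialization_version = 2)

# Record what is currently set, to confirm the change took effect.
current <- getOption("torch.serialization_version")
print(current)

# ... torch_save() / torch_load() calls would go here ...

# Restore the previous setting (unset if it was not set before).
options(old)
```

Restoring the previous value keeps the change local to the part of the script that needs the older serialisation format.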

The second "length" error only happens within RStudio, I'm guessing because it automatically tries to determine the object's length for the environment pane or something similar. This is more of a cosmetic issue.

@shikokuchuo
Contributor Author

Thanks, I see you've already addressed the second point.

I've pinpointed the first issue. It seems that the 'device' argument of safetensors::safe_load() only takes a character string and not a torch device already created with torch_device().

Specifically, this was causing an issue when loading an optimizer state dict to the correct device, in the context of something like the code below:

device <- torch_device("cuda:0")

net <- torch_load(modelfile)
net$to(device = device)$train()
optimiser <- optim_adam(net$parameters)
optimiser$load_state_dict(torch_load(optimfile, device = device))

This worked before, but now the second torch_load() fails. It works if torch_load(optimfile, device = "cuda:0") is specified instead.
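Until a fix lands, one possible workaround is to convert the device to its character form before passing it on. The sketch below is illustrative only: device_string() is a hypothetical helper, not part of torch or safetensors, and it assumes a device object exposes type and index fields. A plain list stands in for the device so the code runs without torch installed.

```r
# Hypothetical helper (not part of torch): return the character form of a
# device, e.g. "cuda:0", which the string-only code path accepts.
# Assumption: device objects expose `type` and `index` fields.
device_string <- function(device) {
  # Strings are passed through unchanged.
  if (is.character(device)) {
    return(device)
  }
  # Devices without an index (e.g. "cpu") are described by type alone.
  if (is.null(device$index)) {
    return(device$type)
  }
  paste0(device$type, ":", device$index)
}

# Stand-in for a device object so the sketch runs without torch installed.
mock_device <- list(type = "cuda", index = 0L)

print(device_string(mock_device))
print(device_string("cpu"))
```

In the script above, the failing call would then become torch_load(optimfile, device = device_string(device)), which is equivalent to passing "cuda:0" directly.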

@dfalbel
Member

dfalbel commented Nov 14, 2023

Thanks for the investigation @shikokuchuo! This was really helpful! I think #1122 will fix it.

@shikokuchuo
Contributor Author

Thanks, can confirm this fixes the issue.
