torch_device errors in latest builds #1118

shikokuchuo · 2023-11-04T16:22:54Z

I am encountering

runtime_error("y is not a torch_device")

which is the error message from your ==.torch_device method. It occurs with the latest development build of torch (fe44f6b), in a script that creates a torch device and works with torch <= 0.11.0.

Moreover I get the following:

> device <- torch_device("cuda:0")
Error in `value[[3L]]()`:
! "length" is not support for objects with class <torch_device/R7>
Run `rlang::last_trace()` to see where the error occurred.

which seems to be a regression caused by #1111

shikokuchuo · 2023-11-04T18:11:52Z

Breaking this down after further investigation:

The "y is not a torch_device" error is actually related to the use of safetensors. If I set torch.serialization_version to 2, things work as before. This points to the error occurring while loading previously saved serialised tensors (model and optimiser). I haven't had time to figure out exactly where it happens, but it seems that this is a breaking change for existing code.
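A minimal sketch of that workaround, assuming torch.serialization_version is an ordinary R option set through options() before any save or load call. The thread does not show how it was set, so treat this as illustrative:

```r
# Assumption: torch reads this setting as a regular R option.
# Version 2 selects the earlier (pre-safetensors) serialisation format.
options(torch.serialization_version = 2)

# Check the value that subsequent torch_save() / torch_load() calls would see.
getOption("torch.serialization_version")
```

Setting the option at the top of the script keeps existing save and load code unchanged while the underlying issue is investigated.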

The second "length" error only happens within RStudio. I'm guessing it automatically tries to determine the object's length for the environment pane or something similar. More of a cosmetic issue.

shikokuchuo · 2023-11-13T14:28:06Z

Thanks, I see you've addressed the second point already.

I've pinpointed the first issue. It seems that the 'device' argument of safetensors::safe_load() only takes a character string and not a torch device already created with torch_device().

Specifically where this was causing an issue was with loading of an optimizer state dict to the correct device, in the context of something like the code below:

device <- torch_device("cuda:0")

net <- torch_load(modelfile)
net$to(device = device)$train()
optimiser <- optim_adam(net$parameters)
optimiser$load_state_dict(torch_load(optimfile, device = device))

This worked before, but now the second torch_load() fails. It works if torch_load(optimfile, device = "cuda:0") is specified instead.
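Until a fix lands, one way to keep a single device object in the script is to convert it to its character form before passing it to torch_load(). The sketch below is illustrative only: device_as_string() is a hypothetical helper, not part of torch, and it assumes that torch_device objects expose `$type` and `$index` fields. A plain list stands in for the device so the example runs without torch or a GPU.

```r
# Hypothetical helper (not part of torch): return the character form of a
# device argument, e.g. "cuda:0". Assumes the device exposes `$type` and
# `$index`; character input is passed through unchanged.
device_as_string <- function(device) {
  if (is.character(device)) {
    return(device)
  }
  type <- device$type
  index <- device$index
  # Devices without an index (e.g. "cpu") are returned as the bare type.
  if (is.null(index) || is.na(index)) {
    return(type)
  }
  paste0(type, ":", index)
}

# Stand-in for torch_device("cuda:0"), used only to make the sketch runnable.
mock_device <- list(type = "cuda", index = 0L)
device_as_string(mock_device)
device_as_string("cpu")

# Intended use with the code above (requires torch and the saved file):
# optimiser$load_state_dict(
#   torch_load(optimfile, device = device_as_string(device))
# )
```

This keeps the rest of the script working with the torch_device object while giving torch_load() the character string it currently accepts.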

dfalbel · 2023-11-14T12:50:23Z

Thanks for the investigation, @shikokuchuo! This was really helpful. I think #1122 will fix it.

shikokuchuo · 2023-11-14T19:50:47Z

Thanks, can confirm this fixes the issue.

dfalbel mentioned this issue Nov 8, 2023

R7 with no custom length shouldn't fail #1120

Merged

dfalbel mentioned this issue Nov 14, 2023

Allow torch_load to work with torch_device() #1122

Merged

dfalbel closed this as completed in #1122 Nov 14, 2023
