Problem caching instances of torch modules and datasets #2339
I'll take a look this weekend.
@atusy Thanks. Caching a simple
For information, I think that torch modules, torch datasets, and torch dataloaders are implemented with R6, while torch tensors appear to be implemented with R7 (recently renamed S7).

One way to address the problem of caching objects implemented with reference classes is to explicitly load them in every chunk. The torch package offers methods for this (`torch_save` and `torch_load`).

A question I have about the cache mechanism is whether data cached in a chunk stays in memory or must be reloaded in the next chunk. Apparently, the answer is not simple.

Update: I have proposed a strategy for caching torch objects in mlverse/torch#1199 (comment). This would actually not require any work to be done on the knitr side. It would rely on the existing cache mechanism only to skip over some time-consuming chunks, and it would require the user to explicitly save and reload the torch object (I think that reloading the cached object needs to be done only once).
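To make the reference-class issue concrete, here is a minimal sketch (hedged: it assumes the torch package is installed, and that knitr's cache relies on R's native serialization) of why a plain `saveRDS()`/`readRDS()` round trip loses the module's state, while torch's own serializers preserve it:

```r
library(torch)

lin <- nn_linear(2, 3)          # R6 object wrapping external pointers to C++ state

f <- tempfile(fileext = ".rds")
saveRDS(lin, f)                 # serializes only the R-side wrapper
lin2 <- readRDS(f)
# lin2$forward(torch_randn(2))  # would error: "external pointer is not valid"

g <- tempfile(fileext = ".pt")
torch_save(lin, g)              # torch-aware serialization of the C++ state
lin3 <- torch_load(g)
lin3$forward(torch_randn(2))    # works
```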
I confirmed #2340 solves this issue. Here is a reproducible example.

```{r}
library(knitr)

# torch package should implement this
registerS3method(
  "knit_cache_hook",
  "nn_module",
  function(x, nm, path) {
    # Cache the object
    d <- paste0(path, "__extra")
    dir.create(d, showWarnings = FALSE, recursive = TRUE)
    f <- file.path(d, paste0(nm, ".pt"))
    torch::torch_save(x, f)
    # Return loader function
    structure(function(...) torch::torch_load(f), class = "knit_cache_loader")
  },
  envir = asNamespace("knitr")
)
```
```{r, cache=TRUE}
lin <- torch::nn_linear(2, 3)
```
```{r}
x <- torch::torch_randn(2)
lin$forward(x)
```
That is great! I will refer to your comment in the issue that I raised in the torch repo to prompt the maintainers to include these hooks in their package. Thank you.
Caching chunks that create an instance of a torch module or of a torch dataset yields an `external pointer is not valid` error when the instance is used in another chunk.

Example with torch module:
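The original chunk contents did not survive this copy; here is a hedged reconstruction of the failing pattern (the error appears on the second knit, when the cached chunk is skipped and `lin` is restored from the cache):

```{r, cache=TRUE}
lin <- torch::nn_linear(2, 3)
```

```{r}
# On a knit that restores `lin` from the cache:
# Error: external pointer is not valid
lin$forward(torch::torch_randn(2))
```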
Example with torch dataset:
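Likewise for a dataset, a hedged reconstruction (the dataset definition is illustrative, not the original):

```{r, cache=TRUE}
toy_dataset <- torch::dataset(
  "toy_dataset",
  initialize = function() {
    self$x <- torch::torch_randn(10, 2)  # tensors hold external pointers
  },
  .getitem = function(i) self$x[i, ],
  .length = function() 10
)
ds <- toy_dataset()
```

```{r}
# Fails with "external pointer is not valid" once `ds` comes from the cache
ds$.getitem(1)
```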
If there is no cache, the chunks are executed without problems. However, when a cache exists, an error is created when trying to access the cached instance of the module or of the dataset:
This might be due to the fact that the R torch package relies on reference classes (R6 and/or R7) and could be related to issue #2176. In any case, caching would be useful for trained instances of a module, or for instances of datasets whose initialization involves a lot of processing.
At the moment, the only alternative is to save the torch model in the cached chunk with `torch_save` and load it in the uncached chunk with `torch_load` (see comments in the chunk above). However, afaik, there is no method to save and load torch datasets.
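The manual workaround for modules can be sketched as follows (hedged: the file name and chunk options are illustrative):

```{r, cache=TRUE}
model <- torch::nn_linear(2, 3)
# ... expensive training would happen here ...
torch::torch_save(model, "model.pt")  # persist the C++ state explicitly
```

```{r}
# Uncached chunk: always reload so the external pointer is valid
model <- torch::torch_load("model.pt")
model$forward(torch::torch_randn(2))
```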