Feature: knitr cache dependency on box file versions #142

Closed
ashirwad opened this issue Jan 20, 2020 · 5 comments

Comments

@ashirwad
Contributor

Hello,

I was wondering if it's possible to make a knitr cache depend on Box file versions? So, anytime a newer version of a file is available, I want to invalidate the cache and rerun the code block when knitting. Also, for files in a folder that I am merging into a single data frame, I want to check if any of the files in the folder has a newer version available and run the code again.

Thanks!

@ashirwad
Contributor Author

Never mind, I figured out how to do it:

  1. Create a function that returns the current version number of a file.
# Get the current version label of one or more Box files
library(magrittr)  # provides %>%

box_current_version <- Vectorize(
  function(file_id) {
    # Call the API once; NULL means the file has no previous versions
    prev_versions <- boxr::box_previous_versions(file_id)
    if (is.null(prev_versions)) {
      return("V1")
    }

    # Label of the most recent previous version, e.g. "V3"
    prev_version <- prev_versions %>%
      dplyr::pull(version) %>%
      dplyr::last() %>%
      as.character()

    # The current version is one higher; match the whole number so that
    # labels with more than one digit (e.g. "V10") are handled correctly
    stringr::str_replace(
      prev_version,
      "[:digit:]+",
      as.character(
        as.integer(stringr::str_extract(prev_version, "[:digit:]+")) + 1
      )
    )
  }
)
  2. Add cache = TRUE and cache.extra = box_current_version(file_id) to the chunk options of the R code chunk that reads the file:
df <- boxr::box_read(file_id)

It is also possible to pass multiple file IDs, i.e., cache.extra = box_current_version(c(file_id1, file_id2)), if several files are read inside one code chunk.

df1 <- boxr::box_read(file_id1)
df2 <- boxr::box_read(file_id2)
  3. To depend on a change to ANY file inside a Box folder that has no sub-folders, use cache.extra = box_current_version(dplyr::pull(dplyr::as_tibble(boxr::box_ls(folder_id)), id))
boxr::box_fetch(dir_id = folder_id, local_dir = tempdir(), recursive = FALSE)
# do further processing
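
The version-increment logic from step 1 can be tried offline, without a Box connection. The following is a minimal sketch in base R; next_version is a hypothetical helper name (not part of boxr), and it assumes version labels of the form "V<number>", as in the code above.

```r
# Hypothetical helper (not part of boxr): bump a label such as "V3" to "V4".
# Assumes the label contains a single run of digits.
next_version <- function(label) {
  digits <- regmatches(label, regexpr("[0-9]+", label))
  if (length(digits) == 0) {
    stop("no version number found in: ", label)
  }
  # Replace the whole run of digits with the incremented number
  sub("[0-9]+", as.character(as.integer(digits) + 1L), label)
}

next_version("V3")
next_version("V9")
```

Matching the whole run of digits matters once a file has ten or more versions: a pattern that matches only one digit would turn "V10" into "V20" rather than "V11".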

@nathancday
Member

Awesome, glad you figured it out!

Question for you: do you think box_previous_versions() should just return "v1" when the file is still on its first version? The current behavior is a recent fix, and I see you added some extra handling there to deal with the message/NULL return.

I think you've highlighted a very common and useful use case for the package. I wonder if box_current_version() should be added too.

@ashirwad
Contributor Author

ashirwad commented Jan 29, 2020

Hey, @nathancday! I think having box_previous_versions() return "v1" when the file is on its first version would be ambiguous: going by the function name, it would look as if "v2" were the current version, unless an additional message is printed. Also, I think it would be good to add box_current_version() to the existing list!

@ashirwad
Contributor Author

ashirwad commented Jun 6, 2020

Hey @nathancday, I found another use case for box_current_version(). I have recently started using the workflowr package to set up a reproducible website for my research project. In addition to tracking the version of code that produced the analyses, I also want to track the version of the file that underpins the analyses. I can use the workaround that I showed previously, but do you know if there is a better way to directly query for the current version? If so, addition of box_current_version() will be really helpful!

@nathancday
Member

Box doesn't support getting that directly in their API, which is why box_previous_versions() behaves the way it does: it only returns earlier versions. I don't know if this would work for you right now, but each time you upload a file you get back the version information in the response:

file_info <- box_ul("my_file.txt")
as.integer(file_info$etag) + 1 # current version (etag comes back as a string)

I'll look into adding box_current_version() as a shortcut of the previous logic.
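
As a rough illustration of the etag idea, here is a sketch that runs without Box access. It assumes, following the comment above, that etag is a zero-based counter returned as a string; this has not been checked against the Box API documentation. version_from_etag is a hypothetical name, and the mock list stands in for the object returned by box_ul().

```r
# Hypothetical helper: derive a "V<n>" label from an upload response.
# Assumes `etag` is a zero-based counter stored as a string.
version_from_etag <- function(file_info) {
  etag <- suppressWarnings(as.integer(file_info$etag))
  if (length(etag) != 1 || is.na(etag)) {
    stop("etag is missing or not an integer")
  }
  paste0("V", etag + 1L)
}

mock_info <- list(etag = "2")  # stand-in for the result of box_ul()
version_from_etag(mock_info)
```

Returning a label in the same "V<n>" form as the workaround earlier in the thread would let either approach feed cache.extra interchangeably.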
