Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use_zip error in unzipping directory with a different name from the zipfile #1961

Closed
burnsal opened this issue Mar 8, 2024 · 2 comments · Fixed by #2028
Closed

use_zip error in unzipping directory with a different name from the zipfile #1961

burnsal opened this issue Mar 8, 2024 · 2 comments · Fixed by #2028

Comments

@burnsal
Copy link

burnsal commented Mar 8, 2024

I am running into a missing value error trying to download and unzip large files with the use_zip function.

I was able to isolate the issue to an incompatibility with the top_directory function for extracting the head directory of the zip files in the helper function tidy_unzip. Here is the reprex with output included:

## usethis::use_zip() issue reprex

# Goal: download and unzip large files stored on a Figshare platform

# long link: https://agdatacommons.nal.usda.gov/ndownloader/files/44576230
# bitly link: https://bit.ly/ctdv2sp

# attempt with existing function
> usethis::use_zip("https://bit.ly/ctdv2sp",
                 destdir = "C:/temp")
✔ Downloading from 'https://bit.ly/ctdv2sp'
Downloaded: 312.74 MB  (100%)
✔ Download stored in 'C:/temp/species_v2_3.zip'
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

# separate process into its helper functions to source the error
> usethis:::tidy_download("https://bit.ly/ctdv2sp",
                        destdir = "C:/temp")
Downloaded: 312.74 MB  (100%)

> usethis:::tidy_unzip("C:/temp/species_v2_3.zip")
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed


## Dig deeper into `tidy_unzip`

# ABnote: first, define zipfile downloaded with `tidy_download`
> zipfile <- "C:/temp/species_v2_3.zip"
> file.exists(zipfile)
[1] TRUE

> base_path <- path_dir(zipfile)
> print(base_path)
[1] "C:/temp"

> filenames <- utils::unzip(zipfile, list = TRUE)[["Name"]]
> print(filenames) # you can see that the zipfile has a single-folder directory with a different name than the zip file
[1] "species_v2/label_encoder.txt" "species_v2/model_arch.pt"     "species_v2/model_weights.pth"

# dropbox particularites do not apply here, skip next two lines

> td <- top_directory(filenames)
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

## AH HA! Look at `top_directory` function
> in_top <- path_dir(filenames) == "."
[1] FALSE FALSE FALSE
> unique_top <- unique(filenames[in_top])
character(0)
# I think the code should extract the dir `species_v2` from my zipfile here
> is_directory <- grepl("/$", unique_top)
logical(0)

# address conditionals one at a time
> length(unique_top) > 1 
[1] FALSE
> !is_directory
logical(0)

# conditional statement is throwing the error b/c directory was not correctly extracted
> length(unique_top) > 1 || !is_directory
[1] NA
> if (length(unique_top) > 1 || !is_directory) {
+   NA_character_
+ } else {
+   unique_top
+ }
Error in if (length(unique_top) > 1 || !is_directory) { : 
  missing value where TRUE/FALSE needed

The code creating the top directory is not correctly addressing the structure of my zip files and extracting the unique directory within the zipfile.

# current structure
> in_top <- path_dir(filenames) == "."
> all(in_top) == FALSE
[1] TRUE
> unique_top <- unique(filenames[in_top])
character(0)

# the folder name at the top of the directory inside the zip file should be extracted
> path_dir(filenames)
[1] "species_v2" "species_v2" "species_v2"
> unique(path_dir(filenames))
[1] "species_v2"

A revised top_directory function makes this work:

top_directory <- function(filenames) {
  in_top <- path_dir(filenames) == "."
  unique_top <- unique(filenames[in_top])
  is_directory <- grepl("/$", unique_top)
  if(length(unique(path_dir(filenames)))==1 & length(unique_top) == 0){
    unique_top <- unique(path_dir(filenames))
  } else {
    if (length(unique_top) > 1 || !is_directory) {
      NA_character_
    } else {
      unique_top
    }
  }
}

# run use_zip with the revised helper
use_zip(url = "https://bit.ly/ctdv2sp",
+         destdir = "C:/temp", cleanup=T)
✔ Downloading from <https://bit.ly/ctdv2sp>.
Downloaded: 312.74 MB  (100%)
✔ Download stored in C:/temp/species_v2_3.zip.Unpacking ZIP file into species_v2/ (3 files extracted).Deleting species_v2_3.zip.Opening species_v2/ in the file manager.

I have made this change to a new branch in my forked clone of the repository here, and it passes the devtools::check tests. Please let me know if this change needs to be more generalized or if it is ready for a pull request. Thank you!

@jennybc
Copy link
Member

jennybc commented Mar 8, 2024

Thanks for the analysis! Your proposed change is definitely easier to evaluate in pull request form, so yes please go ahead and make one.

jennybc added a commit that referenced this issue Jul 27, 2024
Fixes #1961, closes #1962.

Analyzing the above gave me a new/different understanding of common ZIP file organization structures. I think I've got a better heuristic for setting `exdir`.
@jennybc
Copy link
Member

jennybc commented Jul 27, 2024

Thanks very much for the .zip examples and analysis you gave above and in #1962! I ultimately reached a different understanding of the root problem and how to fix it. So I'm closing these with an alternative PR to #1962, but the work you did there was very helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants