Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zip::zip_list can return absolute file paths #17

Closed
mpadge opened this issue Jun 21, 2021 · 2 comments
Closed

zip::zip_list can return absolute file paths #17

mpadge opened this issue Jun 21, 2021 · 2 comments

Comments

@mpadge
Copy link
Collaborator

mpadge commented Jun 21, 2021

The zip::zip_list() does not necessarily return file names without paths - see ?zip::zip for details. The following code demonstrates (in a possibly OS-dependent manner, so don't worry if you can't reproduce):

f <- file.path (tempdir (), "mtcars.csv")
write.table (mtcars, f, sep = ",")
z <- file.path (tempdir (), "my.zip")
utils::zip (z, files = f, flags = "-q")
zip::zip_list (z)
#>                    filename compressed_size uncompressed_size
#> 1 tmp/Rtmp8OkYhX/mtcars.csv             861              1780
#>             timestamp permissions    crc32 offset
#> 1 2021-06-21 08:40:22         666 16e2dfc6      0

Created on 2021-06-21 by the reprex package (v2.0.0.9000)

In such cases, the names of the items read in import_gtfs will be the full paths, and the returned object retains those names. Everything works, but any other packages expecting a gtfs feed to have standard names will then fail. This is what happens:

library (gtfsrouter)
library (gtfsio)
f <- berlin_gtfs_to_zip ()
g <- gtfsio::import_gtfs (f)
names (g)
#> [1] "tmp/Rtmp2EVj5z/calendar"   "tmp/Rtmp2EVj5z/routes"    
#> [3] "tmp/Rtmp2EVj5z/trips"      "tmp/Rtmp2EVj5z/stop_times"
#> [5] "tmp/Rtmp2EVj5z/stops"      "tmp/Rtmp2EVj5z/transfers"

Created on 2021-06-21 by the reprex package (v2.0.0.9000)

I'll submit a PR to fix the names once #16 has been merged. The package will then need a final check_gtfs_format() or gtfsio_is_valid() or similar function that confirms that everything has the expected format - that all required tables are present, and that all required columns of required tables are also present. We can worry about that once #16 and this issue have been addressed. 👍

@dhersz
Copy link
Collaborator

dhersz commented Jun 21, 2021

Great catch, as always, @mpadge. I can fully reproduce this behaviour in my computer as well.

Regarding the check_gtfs_format() function you propose, I suggest using assert_gtfs(), which I created as a validator for the gtfs class (the name is not great, I know, but both {gtfstools} and {tidytransit} had their own validate_gtfs() already, so I opted for the current one).
Currently it checks if all elements inside the GTFS are named and if they inherit from data.frame, but it doesn't check the content of their names. What do you think?

@mpadge
Copy link
Collaborator Author

mpadge commented Jun 21, 2021

The names are official, and so checking them should be part of assert_gtfs(). I would suggest that it should assert that all required names exist, and that any other names match one of the optional names and nothing else. I'll leave you to do that, and i'll PR with a fix for this. Thanks!!

mpadge added a commit to mpadge/gtfsio that referenced this issue Jun 21, 2021
@dhersz dhersz closed this as completed in bce23f9 Jun 21, 2021
dhersz added a commit that referenced this issue Jun 21, 2021
strip file paths from zip_list; closes #17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants