Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: option to set cache dir to permanent directory #106

Merged
merged 1 commit into from
Apr 8, 2020

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Nov 16, 2019

Closes https://github.com/darribas/contextily/issues/46

cc @uvchik

For now, just added a function that the user can call as (and didn't add it in other places, like add_basemap):

contextily.set_cache_dir("/path/to/local/cache/")

I used set_cache_dir, but @uvchik made the suggestion to call it use_cache :

I would name it use_cache because one has to execute it every time one wants to plot. For me 'set' indicates, that I have to execute it once. But I am not a native speaker and I am open to other names.

(note that this "every time one wants to plot is not "for every add_basemap() call", you only need to do it once in your python session (eg put it at the beginning of your script or notebook), but it is true that you still need to do it every time once per session, it's not a permanent setting that survives across python sessions. So use_cache might then indeed be better, no strong feeling here)

I am not sure whether we should add a default value.

Since by default it is not called, I think it is fine to have the path a required argument without default (as user will only call this function if they want to set a specific cache directory, by default, a temporary directory is created by contextily which is destroyed at the end of the python session).

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 92.978% when pulling 9eea895 on jorisvandenbossche:cache-dir into 3e48cf9 on darribas:master.

@darribas
Copy link
Collaborator

So let me see if I get it correctly: if the user runs this method (whatever we end up calling it), say when contextily is imported, all the tiles pulled in the session will go into the directory specified. A couple of questions/comments/suggestions:

  • Is the dir removed automatically everytime the session is closed? Should we make that a parameter in the function (auto_discard=True/False or something like that?)
  • If the user does not remove it, can it be enabled for a future session? ie. if the user path contains tiles, they can be used without download.
  • Do we want to set a default path in so the user can just run the method without arguments and get a decent default?

I think this looks like a cool addition btw!

"""
memory.store_backend.location = path


def _clear_cache():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this method integrated with set_cache_dir? For example but setting tmpdir to path automatically?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in the current design, as we want to delete the default tmpdir at the end, but the cache dir set with set_cache_dir does not need to be removed at the end of a session

@darribas darribas mentioned this pull request Nov 18, 2019
@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Nov 18, 2019

So let me see if I get it correctly: if the user runs this method (whatever we end up calling it), say when contextily is imported, all the tiles pulled in the session will go into the directory specified.

Yes, that is correct. But note that by default, we also download all tiles into a specific directory to cache them, we only do that in a temporary directory that will be deleted afterwards.

Is the dir removed automatically everytime the session is closed? Should we make that a parameter in the function (auto_discard=True/False or something like that?)

No, that is not done, as it is specifically the purpose of using this method to not do that. If you are OK with deleting the cache at the end of the session, you can use the default temp-dir cache. A user would use this function to set a non-temp specific directory to have the cache live across several sessions.

If the user does not remove it, can it be enabled for a future session? ie. if the user path contains tiles, they can be used without download.

That's the purpose of the this functionality, so the same cache can be re-used for a next session. But, the user will still need to call contextily.set_cache_dir(..) in that next session to actually use that existing cache.

Do we want to set a default path in so the user can just run the method without arguments and get a decent default?

I am not sure this is worth it, as there is already a cache by default (just one that does not live across sessions)
On the other hand, this does make it easier to use this though (you don't have to create and pass a directory there). My only hesitation is a bit to check for

@uvchik
Copy link
Contributor

uvchik commented Nov 20, 2019

The name set_cache_dir(..) would be fine for me, I do not insist on use_cache(..).

I would vote for using the cache by default to reduce server usage. Even though contextily is not the main demand on such map servers, it is always good to reduce traffic.

@jorisvandenbossche
Copy link
Member Author

I would vote for using the cache by default

To be clear: you mean "use the persistent cache by default" (the default is already to cache within a session)? So that would mean to no create a temporary directory (which is destroyed afterwards), but to create somewhere in user data a contextily directory to cache?

The problems that arise from this are things like: when to invalidate the cache?
(that's the reason in my initial PR to add caching functionality, I just limited to within-session caching to avoid those questions :))

@uvchik
Copy link
Contributor

uvchik commented Nov 20, 2019

To be clear: you mean "use the persistent cache by default" (the default is already to cache within a session)? So that would mean to no create a temporary directory (which is destroyed afterwards), but to create somewhere in user data a contextily directory to cache?

Yes

The problems that arise from this are things like: when to invalidate the cache?
(that's the reason in my initial PR to add caching functionality, I just limited to within-session caching to avoid those questions :))

You are right, there are some open questions in that case. A compromise could be to promote the new function in the documentations of contextily and geopandas and let the user decided.

@jorisvandenbossche
Copy link
Member Author

Reading the discussion above again, I think this is actually more or less ready.

I think there are two questions:

  • The name of the function: set_cache_dir vs use_cache
  • Whether we should provide a default (right now there is no default, so the user has to manually provide a directory path)

The first we need to decide upon of course, but the second could also be done as a follow-up (it would be an addition, but not changing any behaviour otherwise).
The main thing to solve for the default cache dir is to find a cross-platform way to determine a good directory (we could eg check how IPyhon decides where to store user data, as their are conventions around this). But so given that added complexity, I would leave that for a follow-up.

@jorisvandenbossche jorisvandenbossche merged commit 2b072b2 into geopandas:master Apr 8, 2020
@jorisvandenbossche jorisvandenbossche deleted the cache-dir branch April 8, 2020 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to cache the downloaded web tiles
4 participants