Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend env #3858

Closed
wants to merge 5 commits into from
Closed

Backend env #3858

wants to merge 5 commits into from

Conversation

pgierz
Copy link

@pgierz pgierz commented Mar 12, 2020

This merge request allows the user to set a backend_env while opening a file. This should be a dictionary, and the key/value pairs are temporarily added to os.enviorn while opening the file. The old environment is restored later.

  • Closes Custom Table when opening GRIB Files #3853
    Maybe -- I'm not sure if it is clever to actually close this issue at this point. I added the backend_env idea.

  • Tests added

    Here, I need some help: How should I actually design the tests? The environment is only temporarily modified, so as soon as the open_dataset function ends again, the environment is restored. I would have though temporarily adding an equivalent to $ export lala=tada into the environment and checking for that would be an idea; but since I restore the environment right away, there is no way I can see to actually access the new os.environ

  • Passes isort -rc . && black . && mypy . && flake8

  • Fully documented, including whats-new.rst for all changes and api.rst for new API

    I added a section to the relevant docstring. Not sure how much this needs to also be included in the other files.

@pep8speaks
Copy link

pep8speaks commented Mar 12, 2020

Hello @pgierz! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-13 06:29:59 UTC

@shoyer
Copy link
Member

shoyer commented Mar 13, 2020

Thanks for putting together this pull request!

My main concern here is that setting environment variables feels pretty decoupled from the logic of open_dataset. It's also rather poor design to make a library only programmable from environment variables, so I wouldn't want to encourage other backends to make use of this practice.

What do you think about writing your own utility for this sort of thing? e.g., based on one of these examples from StackOverflow:
https://stackoverflow.com/questions/2059482/python-temporarily-modify-the-current-processs-environment

@pgierz
Copy link
Author

pgierz commented Mar 13, 2020

Yes, I agree. Having a library only depend on the environment is less than optimal to say the least. The main reason behind this was to allow one of the GRB backends to read custom tables. I've already contacted both the cfgrib and PyNIO people to see about changing this to something more flexible.

Regarding writing my own utility: I'm afraid this introduces one more hoop for end users to jump through. If I can tell my colleagues "Hey just use Xarray", that normally works pretty well. If I now need to say "Hey, use my own little thing plus Xarray plus ...." (who knows what else) might introduce more friction.

However, it's an interesting idea: How challenging would it be to write my own Xarray backend (In this case for GRB files) and include that instead? Would Xarray be open to including a further backend? That would separate the logic into it's own project and if anything breaks, then the Xarray team would have less to worry about and can just refer issues to the backend development team (so, I guess me 😉)...

@shoyer
Copy link
Member

shoyer commented Mar 13, 2020

If Nio.open_file supported these options as keyword arguments instead of environment variables, these arguments could get set in open_dataset via backend_kwargs.

I think that would be the ideal resolution, but if I recall PyNIO isn't under active development anymore. So in that case, we could consider adding our own solution in xarray to add a new backend argument, but it should be specific PyNIO, not all xarray backends, i.e., it should live entirely in xarray/backends/pynio_.py

To make this work robustly with all of xarray's file caching machinery (used with dask, etc), the setup of the environment variable needs to happen inside a helper function wrapping Nio.open_file, which could replace Nio.open_file on this line:

Nio.open_file, filename, lock=lock, mode=mode, kwargs=kwargs

I think you could do something similar with overriding environment variables here inside this helper function.

Ideally you could also do clean-up here, deleting these environment variables, but I'm not sure if there's an easy/safe way to do this currently. These methods can get called in multiple threads (e.g., from dask) and I don't think we have a global clean-up mechanism that would work for this.

@jhamman
Copy link
Member

jhamman commented Jan 5, 2023

I believe this can be closed now. Pynio is on the way out and it seems like we were leaning away from including this anyway.

@pgierz - please feel free to reopen if I have that wrong.

@jhamman jhamman closed this Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Custom Table when opening GRIB Files
5 participants