Investigate segfaults with NetCDF4 >1.6.0
#5016
Comments
@bjlittle has shared how xarray handles NetCDF4 not being thread-safe: https://github.com/pydata/xarray/blob/main/xarray/backends/file_manager.py#L52 |
Ping @valeriupredoi 👍 |
awesome, cheers @bjlittle - also, let me know if I can help 🍺 |
Current thinking: Fix will be a lot of work. We should get started as soon as we can. But we don't need to coincide with the planned release schedule. Rationale
|
@valeriupredoi will Iris |
@trexfeathers this was recently mentioned ESMValGroup/ESMValCore#1776 (comment) |
Note: IIUC, the experiment is showing that the problem still occurs with a process-based scheduler. See also #5031 (lazy saving trial code): we see that the mutex problems there are solved by xarray with file-specific locks, which is probably what we need here too. |
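To make the "file-specific locks" idea concrete, here is a minimal sketch using only the standard library. It is not how xarray or Iris implement it; `lock_for` and `read_chunk` are hypothetical names, and `read_fn` stands in for whatever actually opens the file and reads a chunk.

```python
import os
import threading

# Guards the registry itself, so two threads cannot create
# two different locks for the same file at the same time.
_REGISTRY_GUARD = threading.Lock()
_FILE_LOCKS = {}  # absolute path -> threading.Lock


def lock_for(path):
    """Return the one lock shared by every user of `path` (hypothetical helper)."""
    key = os.path.abspath(path)
    with _REGISTRY_GUARD:
        lock = _FILE_LOCKS.get(key)
        if lock is None:
            lock = _FILE_LOCKS[key] = threading.Lock()
        return lock


def read_chunk(path, read_fn):
    """Run `read_fn(path)` while holding that file's lock.

    `read_fn` is a placeholder for the real open + read + close of one chunk.
    """
    with lock_for(path):
        return read_fn(path)
```

Reads of different files can still run in parallel; only accesses to the same file are serialised. Note that a `threading.Lock` only protects threads within one process, so on its own it would not cover the process-based scheduler case mentioned above.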
IMHO code changes for this should probably be well localised: just the netcdf load+save modules. So, not a huge rewrite (and not much to document for users, either). Aside: note also the approach of xarray, as mentioned above. Its locking solution also has a (netcdf) file caching scheme to improve efficiency. It's probably not a requirement, but it would be interesting to see whether this approach would be worth copying too. After all, the current NetcdfDataProxy objects open and close the file for every chunk of data read, which at some point has to be pretty inefficient (!) |
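As a rough illustration of the file caching idea (again a sketch under assumptions, not xarray's actual scheme), a small least-recently-used cache of open handles could look like this. `HandleCache` and its `opener` argument are hypothetical; in practice `opener` would be whatever opens a netCDF file, and callers would still need a per-file lock around the reads themselves.

```python
import threading
from collections import OrderedDict


class HandleCache:
    """Keep up to `maxsize` open file handles, evicting the least recently used."""

    def __init__(self, opener, maxsize=4):
        self._opener = opener          # callable: path -> object with .close()
        self._maxsize = maxsize
        self._handles = OrderedDict()  # path -> open handle, oldest first
        self._lock = threading.RLock()

    def acquire(self, path):
        """Return a cached handle for `path`, opening it only on a cache miss."""
        with self._lock:
            if path in self._handles:
                self._handles.move_to_end(path)  # mark as most recently used
                return self._handles[path]
            handle = self._opener(path)
            self._handles[path] = handle
            while len(self._handles) > self._maxsize:
                _, oldest = self._handles.popitem(last=False)
                oldest.close()
            return handle

    def close_all(self):
        """Close every cached handle, e.g. at shutdown."""
        with self._lock:
            while self._handles:
                _, handle = self._handles.popitem()
                handle.close()
```

With something like this, a data proxy could call `cache.acquire(path)` per chunk instead of reopening the file each time. One caveat of this simple version: eviction closes a handle without checking whether another thread is still reading from it, so a real implementation would need reference counting or similar.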
On the urgency, @valeriupredoi has recently said "pinning the package is not a long term solution" |
sorry I forgot to get meself here, guys! No, not a problem at all - as @zklaus pointed out, moving to Python=3.11 gonna be a bit lengthy anyway since we'd have to wait for all our dependency ducks to get in line with the new Python. But in the long run (ie a few months) it'd be good to have it. Let me know if I can help you in any way BTW 🍺 |
hi guys, heads up and hope you don't mind I opened a repodata patch PR to get the pin in for previous iris versions since it's a bit of a nightmare for us to pin/repodata patch, plus, it's better for the other packages that use iris (initially we wanted to repodata patch in ESMVal-suite, but @bouweandela had the good idea to patch iris instead, and got this through a vote with the other ESMVal tech leads) - please have a look when you have time conda-forge/conda-forge-repodata-patches-feedstock#358 |
OK that PR got merged so iris should now be repo-patched to use only netCDF4 <1.6.1 (after nearly destroying your conda forge package by replacing all the deps with netCDF4 <1.6.1 instead of appending it to them 🤣 ) |
@valeriupredoi are any of you in ESMValTool able to test against a branch on my fork? |
Martin, sure, I'll take it for a spin a short bit! @bouweandela @zklaus may want to look at it as well, Cheers for working on this issue, mate 🍺 |
hi @trexfeathers I have installed locally your fork but I am getting a lot of iris-related issues a la:
in fact, most of ESMValCore's 308 failed tests (out of 3038 total tests) are due to these (the FAIL, not the ERROR - that occurred once and I had to remove our test so I can run the test suite). Also running pytest in your dev dir:
Happily no SegFault from running our tests (once, will run more) but am not 100% sure the missing/broken functionality in the dev iris may not mask the SegFaults |
BTW we can take the discussion to your fork, can open issues there, if you want me to (just so we don't heavy load the discussion on this thread)? 🍺 |
Sounds good. We can talk on #5095. |
📰 Custom Issue
A bad interaction between Iris, Dask and NetCDF4 is causing segfaults when loading. This is apparently fixed if Dask is set to single-threaded mode - see Unidata/netcdf4-python#1192 (comment)
This workaround removes a large part of the benefit of using Dask in Iris, so we need to find out if we can instead fix via changes to the Iris loading code.
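For reference, the single-threaded workaround can be sketched as below. `dask.config.set(scheduler="synchronous")` selects Dask's single-threaded scheduler; the small array is only an illustrative stand-in for lazily loaded data, not an actual Iris load.

```python
import dask
import dask.array as da

# Stand-in for lazy data; in practice this would come from an Iris load.
lazy = da.ones((4, 4), chunks=(2, 2))

# Inside this block every compute() runs in the calling thread only,
# so no two chunks are ever read concurrently.
with dask.config.set(scheduler="synchronous"):
    total = lazy.sum().compute()
```

Used as a context manager the setting is scoped to the block; calling `dask.config.set(...)` without `with` applies it globally, which is where the loss of parallelism described above comes in.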
Before the next release (v3.4), 1 of the below needs to happen: <=1.6.0