nc_get_vars incredibly slow in Windows compared to Linux #2721
Comments
I would rewrite the code to use nc_get_vara and see if the speed problem goes away.
You mean use nc_get_vara to read the values with stride 1 and then do the slicing later?
Use nc_get_vara and jump around to get the slicing you need, so you are reading the exact same data, but without nc_get_vars.
Hello Ed, Even with nc_get_vara, I still find that Windows is significantly slower: Windows time: 102 seconds The only change I made to the previous code is to replace And
I am taking a look at this to see if I can determine if the slowdown is in libnetcdf, or if it is something in libhdf5. @abhibaruah a couple questions, if I may, to ensure I'm on the same page.
Since we're using libhdf5 for file access, my fear is that this is an issue in libhdf5; that may limit our ability to address this. But it's not necessarily the case. I'll start by reproducing the issue, and go from there :).
Thanks @WardF for taking a look.
I recall that this issue was raised some time ago. If memory serves, we proposed to convert vars code to use the corresponding HDF5 operations (I assume we are talking netcdf-4 and not netcdf-3). But apparently this proposal never got implemented.
Was the proposed change to use the corresponding HDF5 operations only for Windows?
I'm making some progress on this; I haven't narrowed it down to a solution, yet, but I'm able to replicate the observed issue using netCDF
@abhibaruah I'm seeing some mostly consistent results; out of curiosity, can you give it a try with
Hello @WardF, currently we do not have v4.9.2 in our harness, so it will be difficult for me to build v4.9.2 with HDF5 v1.10.10 (I would have to go through legal and administrative hoops for that). I can download the Windows binaries from here (https://downloads.unidata.ucar.edu/netcdf/) and give it a try, but I am guessing that you have already tried that.
Let me clarify, thanks :). I'm seeing results consistent with what you've described, and I'm seeing them in a way I've been able to reproduce them. I'm not certain what the underlying issue is, but I am seeing much faster speeds using netCDF-C I'm at a loss as to why this is only happening in Linux, and will continue trying to figure that out. I've tested with HDF5 1.10.10 as well as HDF5 1.14.1; the results are the same when using
Just a note to follow up: HDF5 1.14.2 is out, and I'm going to try to test this on Windows. I understand there are hoops to jump through, but the issue does appear to be related to the underlying HDF5 library.
@WardF I am not sure why I am still seeing the slowness on Windows.
@abhibaruah thank you, that is good to know at least. The HDF5 script does suggest it is something in netCDF, although why it would be Windows-specific is puzzling. I'll pop this back to the top of the stack and see what I can sort out.
Hello @WardF Windows time - ~123 s Let me know if you find any new information regarding the same. Thanks,
I also tried the netCDF repro steps with older versions of netCDF. Here are the results (in seconds).
4.6.1 284.1 228
Looks like the Windows regression was introduced sometime between 4.8.1 and 4.9.1.
Thank you @abhibaruah, that certainly narrows it down. Thank you for bringing this back to the top of the stack, I will see what I can do to dial it in. If I can come up with a test on Windows to replicate this (I should be able to), I can do a
OS: Windows 10
NetCDF version: 4.9.1
I am trying to read a 3D double variable (2000 x 512 x 512) from a netCDF4 file with the following parameters:
start[] = {0, 0, 0}
count[] = {1000, 256, 256}
stride[] = {2, 2, 2}
chunk size: {20, 10, 10}
shuffle: no
deflate: yes
deflate_level: 6
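To put numbers on these parameters, here is a minimal sketch of the arithmetic. It is not code from the report, and the helper names `chunks_touched` and `chunks_touched_3d` are hypothetical. It counts how many chunks contain at least one selected index. With stride 2 and chunks of {20, 10, 10}, the selection lands in 100 x 52 x 52 = 270,400 chunks, which is every chunk of a 2000 x 512 x 512 variable. Since HDF5 applies deflate per chunk, a strided read presumably has to inflate all of them, even though it returns only 1/8 of the elements (65,536,000 doubles, about 500 MiB).

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks along one dimension that contain at least one of the
 * selected indices start, start + stride, ... (count indices in total). */
static size_t chunks_touched(size_t start, size_t count, size_t stride,
                             size_t chunk_len)
{
    assert(stride > 0 && chunk_len > 0);
    size_t touched = 0;
    size_t last_chunk = (size_t)-1;   /* sentinel: no chunk seen yet */
    for (size_t i = 0; i < count; i++) {
        size_t chunk = (start + i * stride) / chunk_len;
        if (chunk != last_chunk) {    /* indices increase, so chunks do too */
            touched++;
            last_chunk = chunk;
        }
    }
    return touched;
}

/* Total number of chunks touched by a 3D strided selection: the product
 * of the per-dimension counts. */
static size_t chunks_touched_3d(const size_t start[3], const size_t count[3],
                                const size_t stride[3], const size_t chunk[3])
{
    size_t total = 1;
    for (int d = 0; d < 3; d++)
        total *= chunks_touched(start[d], count[d], stride[d], chunk[d]);
    return total;
}
```

Calling `chunks_touched_3d` with the start, count, stride, and chunk size listed above gives 270,400, the same as the total chunk count of the variable (100 x 52 x 52, rounding 512 / 10 up to 52).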
I time the call to nc_get_vars.
On Debian 11, it takes ~25 seconds.
On Windows 10, it takes ~130 seconds.
I would expect Windows to be slightly slower, but a >5x slowdown is unexpected.
I see similar slowdowns with 'nc_get_vars_double'.
In contrast, using 'nc_get_var_double' or 'nc_get_var' to read the whole variable is significantly faster (~3 sec on Linux, and ~1 sec on Windows).
Is there a way to optimize the performance of 'nc_get_vars' or 'nc_get_vars_double' so that Windows performance is closer to Linux performance?
Is reading the whole variable into memory using 'nc_get_var' and then slicing it later an option? I have seen that there were some discussions regarding this (Make netcdf-4 use the the stride > 1 facilities of hdf5 #908) and that a submission was made to make strided reads faster. But for my variable, reading the whole variable still seems to be significantly faster than strided reads (especially on Windows).
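For illustration, the "read everything, then slice" option could look roughly like the sketch below. It assumes `full` has already been filled by a call such as nc_get_var_double (not shown) and is laid out the way the netCDF C API returns data, in row-major order with the last dimension varying fastest. `subsample_3d` is a hypothetical helper, not part of netCDF. The full variable here is 2000 x 512 x 512 doubles, roughly 3.9 GiB, so this only works if that fits in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a strided selection out of a full row-major 3D array.
 * `full` has dimensions dims[0] x dims[1] x dims[2];
 * `out` must have room for count[0] * count[1] * count[2] doubles. */
static void subsample_3d(const double *full, const size_t dims[3],
                         const size_t start[3], const size_t count[3],
                         const size_t stride[3], double *out)
{
    for (int d = 0; d < 3; d++) {
        assert(stride[d] > 0);
        /* the last selected index must lie inside the array */
        assert(count[d] == 0 ||
               start[d] + (count[d] - 1) * stride[d] < dims[d]);
    }

    size_t n = 0;   /* next free slot in out */
    for (size_t i = 0; i < count[0]; i++) {
        size_t src_i = start[0] + i * stride[0];
        for (size_t j = 0; j < count[1]; j++) {
            size_t src_j = start[1] + j * stride[1];
            /* first element of source row (src_i, src_j, 0) */
            const double *row = full + (src_i * dims[1] + src_j) * dims[2];
            for (size_t k = 0; k < count[2]; k++)
                out[n++] = row[start[2] + k * stride[2]];
        }
    }
}
```

With dims = {2000, 512, 512} and the start, count, and stride from this report, `out` would need space for 65,536,000 doubles (about 500 MiB) in addition to the full buffer.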
Please find the link to the nc file here.
Here is my code: