nc_get_vars incredibly slow in Windows compared to Linux #2721

abhibaruah · 2023-07-19T15:31:05Z

OS: Windows 10
NetCDF version: 4.9.1

I am trying to read a 3D double variable (2000 x 512 x 512) from a netCDF4 file with the following parameters:
start[] = {0, 0, 0};
count[] = {1000, 256, 256};
stride[] = {2, 2, 2};
chunk size: {20, 10, 10}
shuffle: no
deflate: yes
deflate_level: 6
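
For context on the cost of this read, the number of chunks the strided selection touches can be worked out from the parameters above. The sketch below is only arithmetic on the stated start, count, stride and chunk size; the helper name `chunks_touched` is made up for illustration and is not part of netCDF or HDF5. It matters because the deflate filter is applied per chunk, so a chunk has to be decompressed in full even when only some of its elements are selected.

```cpp
#include <cstddef>
#include <set>

// Count how many chunks along one dimension contain at least one selected
// index. Selected indices are start, start + stride, ...,
// start + (count - 1) * stride. (Illustrative helper, not a library call.)
std::size_t chunks_touched(std::size_t start, std::size_t count,
                           std::size_t stride, std::size_t chunk) {
    std::set<std::size_t> touched;
    for (std::size_t i = 0; i < count; ++i) {
        touched.insert((start + i * stride) / chunk);
    }
    return touched.size();
}
```

With the values from this report, `chunks_touched(0, 1000, 2, 20)` is 100 and `chunks_touched(0, 256, 2, 10)` is 52, so the selection touches 100 x 52 x 52 = 270,400 chunks. That is every chunk of the 2000 x 512 x 512 variable, because a stride of 2 is smaller than the chunk extent in each dimension.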

I time the call to nc_get_vars.
On Debian 11, it takes ~25 seconds.
On Windows 10, it takes ~130 seconds.

I would expect Windows to be slightly slower, but >5x slowdown is unexpected.
I see similar slowdowns with 'nc_get_vars_double'.

In contrast, using 'nc_get_var_double' or 'nc_get_var' to read the whole variable is significantly faster (~3 sec on Linux, and ~1 sec on Windows).

Is there a way to optimize the performance of 'nc_get_vars' or 'nc_get_vars_double' so that Windows performance is closer to Linux performance?
Is reading the whole variable into memory with 'nc_get_var' and then slicing it later an option? I have seen that there was some discussion about this ("Make netcdf-4 use the the stride > 1 facilities of hdf5 #908") and that a submission was made to make strided reads faster. But for my variable, reading the whole variable still seems to be significantly faster than a strided read, especially on Windows.

Please find the link to the nc file here.
Here is my code:

#include <stdio.h>
#include <string.h>
#include <netcdf.h>
#include <cstdlib>
#include <iostream>
#include <chrono>

int
main()
{
    int status;
    int ncid;
    int varid;

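    int elems_x = 256;
    int elems_y = 256;
    int elems_z = 1000;
    // cast first so the byte count is computed in size_t rather than int
    double* outData = (double*)malloc((size_t)elems_x * elems_y * elems_z * sizeof(double));
    if (outData == NULL) {
        printf("Could not allocate output buffer.\n");
        return 1;
    }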

    size_t start[] = {0, 0, 0};
    size_t count[] = {1000, 256, 256};
    ptrdiff_t stride[] = {2, 2, 2};

    
    // open the NetCDF-4 file
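    status = nc_open("repro_nc4file.nc", NC_NOWRITE, &ncid);
    if(status != NC_NOERR) {
         // stop here: ncid is not valid if the open failed
         printf("Could not open file: %s\n", nc_strerror(status));
         free(outData);
         return 1;
    }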
   
    // get the varid 
    status = nc_inq_varid(ncid, "my_var", &varid);
    printf("status after inq var = %d\n", status);
    printf("varid = %d\n", varid);

    // get the strided subset
    auto timestart = std::chrono::high_resolution_clock::now();
    status = nc_get_vars(ncid, varid, start, count, stride, outData);
    auto timeend = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::seconds>(timeend - timestart);
    std::cout << "Execution time: " << duration.count() << " seconds" << std::endl;
    printf("status after getting strided subset = %d\n", status);

    // close the file 
    status = nc_close(ncid);
    printf("status after close = %d\n", status);

    printf("End of test.\n\n");

    // release the output buffer
    free(outData);
    return 0;
}


edwardhartnett · 2023-07-19T15:33:53Z

I would rewrite the code to try to use vara to see if the speed problem goes away.

abhibaruah · 2023-07-19T15:41:14Z

You mean use vara to read the values with stride 1 and then do the slicing later?
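
A minimal sketch of the "read with stride 1, slice later" idea in this question, assuming the data has already been read into a row-major buffer (for example by 'nc_get_var_double'). The function name `subsample3d` and its signature are illustrative only and are not part of netCDF.

```cpp
#include <cstddef>
#include <vector>

// Copy every stride-th element of a row-major 3D array into a smaller array.
// dims: extents of the full array, slowest-varying dimension first.
// stride: step along each dimension, each >= 1.
// (Illustrative helper, not a netCDF function.)
std::vector<double> subsample3d(const std::vector<double>& full,
                                const std::size_t dims[3],
                                const std::size_t stride[3]) {
    // Number of selected elements per dimension: ceil(dims / stride).
    std::size_t out_dims[3];
    for (int d = 0; d < 3; ++d) {
        out_dims[d] = (dims[d] + stride[d] - 1) / stride[d];
    }
    std::vector<double> out;
    out.reserve(out_dims[0] * out_dims[1] * out_dims[2]);
    for (std::size_t i = 0; i < dims[0]; i += stride[0]) {
        for (std::size_t j = 0; j < dims[1]; j += stride[1]) {
            // Offset of element (i, j, 0) in the row-major buffer.
            const std::size_t row = (i * dims[1] + j) * dims[2];
            for (std::size_t k = 0; k < dims[2]; k += stride[2]) {
                out.push_back(full[row + k]);
            }
        }
    }
    return out;
}
```

For the variable in this issue the full buffer holds 2000 x 512 x 512 doubles, which is 4,194,304,000 bytes (about 4.2 GB), so this approach trades memory for read time.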

edwardhartnett · 2023-07-19T15:42:59Z

Use vara and jump around to get the slicing you need, so you are reading the exact same data, but without vars.
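
One way to read this suggestion is to issue contiguous reads for one slab at a time and keep only the wanted elements from each slab. The sketch below is an assumption about what that could look like, not code from the thread: `SlabReader` is a hypothetical callback that would wrap a contiguous read such as 'nc_get_vara_double', and the stride is fixed at 2 with start 0 to match the original report.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback: reads the contiguous (stride 1) block described by
// start[3] and count[3] into buf. With netCDF it could wrap nc_get_vara_double.
using SlabReader = std::function<void(const std::size_t start[3],
                                      const std::size_t count[3],
                                      double* buf)>;

// Read the variable in slabs of slab_depth planes along the first dimension
// and keep every second element along each dimension (stride 2, start 0).
std::vector<double> read_stride2_by_slabs(const SlabReader& read,
                                          const std::size_t dims[3],
                                          std::size_t slab_depth) {
    std::vector<double> out;
    std::vector<double> slab(slab_depth * dims[1] * dims[2]);
    for (std::size_t z0 = 0; z0 < dims[0]; z0 += slab_depth) {
        // The last slab may be shallower than slab_depth.
        const std::size_t depth = std::min(slab_depth, dims[0] - z0);
        const std::size_t start[3] = {z0, 0, 0};
        const std::size_t count[3] = {depth, dims[1], dims[2]};
        read(start, count, slab.data());
        for (std::size_t z = z0; z < z0 + depth; ++z) {
            if (z % 2 != 0) continue;  // keep even global plane indices only
            for (std::size_t y = 0; y < dims[1]; y += 2) {
                const double* row =
                    slab.data() + ((z - z0) * dims[1] + y) * dims[2];
                for (std::size_t x = 0; x < dims[2]; x += 2) {
                    out.push_back(row[x]);
                }
            }
        }
    }
    return out;
}
```

With a slab depth of 20, matching the chunk depth in this file, each slab buffer is 20 x 512 x 512 doubles (41,943,040 bytes) and slab boundaries line up with chunk boundaries. Whether this is any faster on Windows is not something the sketch can show.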

abhibaruah · 2023-07-19T20:51:41Z

Hello Ed,
I tried your recommendation. The issue is that to use 'nc_get_vara', I'll have to read twice as many elements along each dimension (8x as many in total), since the stride in my original case is 2. So, instead of 1000 x 256 x 256 elements, I have to read 2000 x 512 x 512 elements.

Even with nc_get_vara, I still find that Windows is significantly slower:

Windows time: 102 seconds
Linux time: 19 seconds

The only change I made to the previous code is to replace
status = nc_get_vars(ncid, varid, start, count, stride, outData);
with
status = nc_get_vara(ncid, varid, start, count, outData);

And
int elems_x = 512;
int elems_y = 512;
int elems_z = 2000;

WardF · 2023-07-20T16:45:14Z

I am taking a look at this to see if I can determine if the slowdown is in libnetcdf, or if it is something in libhdf5.

@abhibaruah a couple questions, if I may, to ensure I'm on the same page.

When you say Windows, you mean Visual Studio, correct? Or a gcc variant on Windows?
What version of libhdf5 are you linking against?

Since we're using libhdf5 for file access, my fear is that this is an issue in libhdf5; that may limit our ability to address this. But it's not necessarily the case. I'll start by reproducing the issue, and go from there :).

abhibaruah · 2023-07-20T16:55:59Z

Thanks @WardF for taking a look.

Yes, I am using Visual Studio (VS2019v16.11.7)
I am linking against HDF5 v1.10.10

DennisHeimbigner · 2023-07-20T19:04:34Z

I recall that this issue was raised some time ago. If memory serves, we proposed to convert vars code to use the corresponding HDF5 operations (I assume we are talking netcdf-4 and not netcdf-3). But apparently this proposal never got implemented.

abhibaruah · 2023-07-21T19:28:02Z

Was the proposed change to use the corresponding HDF5 operations only for Windows?
Because for my use case Linux time is reasonable (~20 sec) vs (>100 sec) for Windows.

WardF · 2023-08-04T21:09:13Z

I'm making some progress on this. I haven't narrowed it down to a solution yet, but I'm able to replicate the observed issue using netCDF v4.9.1 and HDF5 1.10.10. Testing with netCDF main and HDF5 1.14.1, I see performance in line with what's observed in your Linux environment. I'm still trying to determine whether the culprit is a change in the netCDF code or a change in the HDF5 code.

WardF · 2023-08-11T17:45:41Z

@abhibaruah I'm seeing some mostly consistent results; out of curiosity, can you give it a try with v4.9.2?

abhibaruah · 2023-08-11T18:08:04Z

Hello @WardF ,
When you say 'consistent' results, you mean consistent with the slow speeds I saw or similar to the speed on Linux?

Currently, we do not have v4.9.2 in our harness, and hence it will be difficult for me to build v4.9.2 with HDF5 v1.10.10 (will have to go through legal and administrative hoops for that).

I can download the Windows binaries from here (https://downloads.unidata.ucar.edu/netcdf/) and give it a try but I am guessing that you must have already tried it.

WardF · 2023-08-11T19:38:57Z

Let me clarify, thanks :). I'm seeing results consistent with what you've described, and I can reproduce them reliably. I'm not certain what the underlying issue is, but I am seeing much faster speeds using netCDF-C v4.9.2: around 45 seconds instead of > 100. That is still slightly slower than on Linux, but that could be because of the VM I'm using.

I'm at a loss as to why this is only happening on Windows, and will continue trying to figure that out. I've tested with HDF5 1.10.10 as well as HDF5 1.14.1; the results are the same when using v4.9.1 (> 100 seconds), and faster when using netCDF v4.9.2 (< 50 seconds), regardless of which version of HDF5 I'm using.

WardF · 2023-10-19T16:35:23Z

Just a note to follow up, HDF5 1.14.2 is out, I'm going to try to test this on Windows. I understand there are hoops to jump through, but the issue does appear to be related to the underlying HDF5 library.

abhibaruah · 2023-11-30T15:53:46Z

@WardF
I tried the repro with netCDF 4.9.2 and HDF5 1.10.11. Unfortunately, I am still seeing the same performance difference between Windows 11 and Debian 11.
Windows 11 : ~130s
Debian 11: ~11s

I am not sure why I am still seeing the slowness on Windows.
I created an HDF5 script to mimic my repro above (but with an H5 file), and the reading of the dataset is much faster (~30s).

WardF · 2023-12-01T17:56:24Z

@abhibaruah thank you, that is good to know at least, the HDF5 script does suggest it is something in netCDF, although why it would be Windows specific is puzzling. I'll pop this back to the top of the stack and see what I can sort out.

abhibaruah · 2024-08-15T20:46:50Z

Hello @WardF
Hope you are doing well. I tried the repro steps for this issue with netCDF 4.9.2 + HDF5 1.14.4.3 and I could still see the slowdown.

Windows time - ~123 s
Debian 12 time - ~15 s

Let me know if you find any new information regarding the same.

Thanks,
Abhi

abhibaruah · 2024-08-19T13:33:32Z

I also tried the netCDF repro steps with older versions of netCDF. Here are the results (in seconds).

netCDF version    Windows    Linux
4.6.1             284.1      228
4.8.1             17.8       10.51
4.9.1             115.55     12.25
4.9.2             140        23

Looks like the Windows regression was introduced sometime between 4.8.1 and 4.9.1.

WardF · 2024-08-19T17:20:56Z

Thank you @abhibaruah, that certainly narrows it down. Thank you for bringing this back to the top of the stack, I will see what I can do to dial it in. If I can come up with a test on Windows to replicate this (I should be able to), I can do a git bisect to narrow it down even further. To answer a question I see you asked separately (while I was out of the office on PTO last week), I'm hoping to have rc2 for 4.9.3 out by the end of next week, and then moving forward with the full release barring any feedback which would prevent that. Thanks!

WardF mentioned this issue Aug 4, 2023

Explicitly suppress variable length type compression #2730

Merged
