NetCDF chunking using extra memory with Iris 2.4 #4107

Closed
alastair-gemmell opened this issue Apr 26, 2021 · 11 comments

@alastair-gemmell (Contributor)

On behalf of Scitools user(s):

I have an Iris/Dask/NetCDF question about chunking data and collapsing dimensions, in this case to compute ensemble statistics (ensemble mean/median/standard deviation fields).
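For context, the operation in question looks roughly like the following sketch (hypothetical file path and coordinate name, not the reporter's actual script); the collapse stays lazy until the result is realised:

import iris
import iris.analysis

# Load all ensemble members as a single cube (placeholder glob).
cube = iris.load_cube('[path]/ensemble_members_*.nc')
print(cube.lazy_data().chunks)  # how Iris/Dask chunked the data on load

# Collapse over the ensemble dimension lazily; touching .data (or saving)
# triggers the actual reads and computation.
mean_cube = cube.collapsed('realization', iris.analysis.MEAN)
std_cube = cube.collapsed('realization', iris.analysis.STD_DEV)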

The NetCDF files that I'm using have chunking that is optimized for this operation. Rather than using the NetCDF file's chunking, Iris/Dask is putting each ensemble member into its own chunk. I think this is resulting in my code trying to read all of the data into memory at once. Is there a simple way to make Iris use the NetCDF file's chunking when it loads NetCDF data into a Dask array?

Also, the memory requirement for this and other operations seems to have gone up noticeably when moving from Iris 2.2 to Iris 2.4, and I've been busy doubling my memory allocations to allow for it. Running on Iris 2.2, this task took a bit over an hour to compute with 16 GB of RAM allocated; with Iris 2.4 it no longer fits into that memory or completes within a 2-hour job.

To recreate issue:

First, I created two conda environments, one with Iris v2.2 and one with Iris v2.4:
conda create --name iris-v2.2 -c conda-forge python=3.7 iris=2.2 memory_profiler
conda create --name iris-v2.4 -c conda-forge python=3.7 iris=2.4 memory_profiler

Then, after activating each environment, I ran (python code below):
python -m mprof run dask_issue.py
mprof plot

For Iris v2.2:
Loading the data:
Has lazy data: True
Chunksize: (1, 1, 36, 72)

Extracting a single year:
Has lazy data: True
Chunksize: (1, 1, 36, 72)

Realising the data of the single year:
Has lazy data: False
Chunksize: (9, 12, 36, 72)

For Iris v2.4:
Loading the data:
Has lazy data: True
Chunksize: (1, 2028, 36, 72)

Extracting a single year:
Has lazy data: True
Chunksize: (1, 12, 36, 72)

Realising the data of the single year:
Has lazy data: False
Chunksize: (9, 12, 36, 72)

The plots produced by memory_profiler show that the memory usage when realising the extracted data with Iris v2.4 is over 5 times the memory usage with Iris v2.2. The time taken to realise the data also increases. The cube prior to extraction contains data from 1850 to 2018. Is it possible that all the data (rather than just the year that was extracted) are being realised when using Iris v2.4?

dask_issue.py code referenced above:

#!/usr/bin/env python
import iris

def main():
    print('Loading the data:')
    cube = load_cube()
    print_cube_info(cube)

    print('Extracting a single year:')
    cube = extract_year(cube)
    print_cube_info(cube)

    print('Realising the data of the single year:')
    realise_data(cube)
    print_cube_info(cube)


@profile
def load_cube():
    filename = ('[path]/HadSST4/analysis/HadCRUT.5.0.0.0.SST.analysis.anomalies.?.nc')
    cube = iris.load_cube(filename)
    return cube


@profile
def extract_year(cube):
    year = 1999
    time_constraint = iris.Constraint(
        time=lambda cell: cell.point.year == year)
    cube = cube.extract(time_constraint)
    return cube


@profile
def realise_data(cube):
    cube.data


def print_cube_info(cube):
    tab = ' ' * 4
    print(f'{tab}Has lazy data: {cube.has_lazy_data()}')
    print(f'{tab}Chunksize: {cube.lazy_data().chunksize}')


if __name__ == '__main__':
    main()
@DPeterK (Member) commented Apr 26, 2021

Looks like NetCDF chunk handling was changed in Iris between v2.2 and v2.4, here: #3361. That might be a good place to start looking; it's possible that that change had an uncaught consequence, and this is what's reported here.

For the sake of the original reporter...

Is there a simple way to make Iris use the NetCDF file's chunking when it loads NetCDF data into a Dask array?

Iris already does - this was introduced in #3131.

@alastair-gemmell have you checked what the behaviour is in Iris v3?

Something else that would be interesting to know is what NetCDF reports the dataset chunking as for these problem files. For example:

import netCDF4
ds = netCDF4.Dataset("/path/to/my-file.nc")
print(ds.variables["my-data-var-name"].chunking())
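For comparison, the Dask chunking that Iris chose can be printed in the same way the reporter's script does (placeholder path):

import iris
cube = iris.load_cube("/path/to/my-file.nc")
print(cube.lazy_data().chunksize)  # compare against the file chunking reported by netCDF4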

@trexfeathers (Contributor)

Looks like NetCDF chunk handling was changed in Iris between v2.2 and v2.4, here: #3361. That might be a good place to start looking; it's possible that that change had an uncaught consequence, and this is what's reported here.

I did some cursory analysis on this problem and I can confirm it stems from the changes made in #3361 - if I substitute the v2.2.0 version of _lazy_data.py into the latest Iris, the v2.2 behaviour described above is replicated.

@cpelley commented Oct 11, 2021

I suspect we (IMPROVER) are also affected by this issue; see metoppv/improver#1579.
We are seeing an order of magnitude slowdown in some cases due to the unnecessary reading of data :(

@cpelley commented Oct 12, 2021

I suspect we (IMPROVER) are also affected by this issue...

Confirmed: this issue is what is affecting us - substituting as_lazy_data as per #4107 (comment) results in considerable performance improvements.

@trexfeathers (Contributor)

This is the most frustrating sort of software issue 😆

We made the change to fix known performance issues for some users. Many other users had been silent because performance was acceptable for them, but now they are the ones experiencing performance issues.

For us to have been aware of this before making the change, we would have needed a very comprehensive benchmark suite; unfortunately that wasn't in place at the time, nor is there currently the resource to develop one.

The potential solution I'm aware of that offers acceptable performance for everyone is much greater user configurability of chunking, but that doesn't appear to be a simple thing to deliver (#3333).
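For anyone following along, one coarse control that already exists is Dask's global chunk-size target, which recent Iris versions consult when deciding how far to combine file chunks into Dask chunks. A minimal sketch (illustrative value and placeholder path; not a tailored fix for this issue):

import dask.config
import iris

# Lower Dask's target chunk size before loading; recent Iris versions use this
# target when deciding how many file chunks to combine into each Dask chunk.
dask.config.set({'array.chunk-size': '32MiB'})
cube = iris.load_cube('/path/to/file.nc')
print(cube.lazy_data().chunksize)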

@cpelley commented Oct 12, 2021

Thanks for getting back to me on this @trexfeathers, appreciated.

The potential solution I'm aware of that offers acceptable performance for everyone is much greater user configurability of chunking, but that doesn't appear to be a simple thing to deliver (#3333).

Yes. I think this has always been my issue with dask - necessarily having to expose such low-level parameters to end users :(
It seems dask expects you to know ahead of time what you will be giving it (what form it takes) and what you will be doing with it.

Is there no possibility (a development path in dask) of dask arrays being made to genuinely re-chunk in place? (They are currently immutable.)

FYI @benfitzpatrick, @BelligerG

@trexfeathers (Contributor) commented Oct 12, 2021

Is there no possibility (a development path in dask) of dask arrays being made to genuinely re-chunk in place? (They are currently immutable.)

@cpelley I had thought the root of the problem was the shape of re-chunking. Without knowing the 'correct' chunking shape, wouldn't the quoted idea still manifest the same problem?

@pp-mo (Member) commented Feb 9, 2022

Possibly some progress: #4572 aims to address #3333.

@cpelley: Is there no possibility (a development path in dask) of dask arrays being made to genuinely re-chunk in place? (They are currently immutable.)

I really don't think this is likely.
Chunking as a static, ahead-of-time decision is deeply wired into Dask because, I think, its concept of a graph requires that chunks behave like 'delayed' objects -- i.e. they are graph elements representing a black-box task with a single result. So the chunks make the graph, and different chunks mean a new graph.
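To illustrate the point: rechunking a Dask array always produces a new array backed by a new graph; the original array and its graph are untouched. A small standalone sketch:

import dask.array as da

x = da.zeros((9, 2028, 36, 72), chunks=(1, 2028, 36, 72))
y = x.rechunk((1, 12, 36, 72))  # returns a new array with a new graph; x is unchanged

print(x.chunksize)  # (1, 2028, 36, 72)
print(y.chunksize)  # (1, 12, 36, 72)
print(len(dict(x.__dask_graph__())), len(dict(y.__dask_graph__())))  # y's graph contains many more tasks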

@trexfeathers (Contributor)

@ehogan & @cpmorice, you were the ones who originally brought this to our attention - thanks very much for your vigilance!

We have an open internal support ticket for it, but this has since moved well beyond support - we diagnosed the precise cause, and as you can see here it's difficult to find a resolution that gives everyone acceptable performance.

I'm therefore closing the internal support ticket. I wanted to tag you both here so that you can monitor our latest thoughts on the issue, should you wish.

@github-actions (bot)

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days' time.

github-actions bot added the Stale label on Jan 28, 2024.
@trexfeathers (Contributor)

Gonna keep this open until the release of 3.8 (#5363), since #5588 should allow users to avoid this problem. Would be good to know if this works in action.
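If #5588 is the netCDF chunk-control work, then once 3.8 is out, opting back in to the file's own chunking might look roughly like this (API names recalled from memory, so treat them as an assumption and check the 3.8 release notes):

import iris
from iris.fileformats.netcdf.loader import CHUNK_CONTROL  # assumed module path and name

# Assumed behaviour: within this context, Iris chunks each variable as it is
# chunked in the file, instead of applying its own chunking heuristic.
with CHUNK_CONTROL.from_file():
    cube = iris.load_cube('/path/to/file.nc')  # placeholder path
print(cube.lazy_data().chunksize)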

trexfeathers removed the Stale label on Jan 29, 2024.