DayNightCompositor triggers early dask computation #2614

Closed
gerritholl opened this issue Oct 24, 2023 · 3 comments · Fixed by #2617

Comments

@gerritholl
Collaborator

gerritholl commented Oct 24, 2023

Describe the bug

Loading composites that use the DayNightCompositor triggers an early dask computation.

To Reproduce

import dask.config
from satpy.tests.utils import CustomScheduler
from satpy import Scene
from glob import glob
seviri_files = glob("/media/nas/x21308/scratch/SEVIRI/202103300900/H-000*")
sc = Scene(filenames={"seviri_l1b_hrit": seviri_files})
with dask.config.set(scheduler=CustomScheduler(max_computes=1)):
    sc.load(["natural_color_with_night_ir"])
    ls = sc.resample("eurol")
    ls.compute()

Expected behavior

I expect that exactly one computation occurs, in ls.compute(). Therefore, I expect no output.

Actual results

In reality, this fails with:

Traceback (most recent call last):
  File "/data/gholl/checkouts/protocode/mwe/day-night-compute.py", line 10, in <module>
    ls.compute()
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 1294, in compute
    datasets = compute(*(new_scn._datasets.values()), **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/gholl/checkouts/satpy/satpy/tests/utils.py", line 288, in __call__
    raise RuntimeError("Too many dask computations were scheduled: "
RuntimeError: Too many dask computations were scheduled: 3

NB: the error reports three computations, even though one would expect it to fail as soon as the second computation is attempted; see "Additional context" for comments on why this happens.

Environment Info:

  • OS: Linux
  • Satpy Version: main

Additional context

The early computes that happen before ls.compute() are not easily seen, because the RuntimeError raised in satpy.tests.utils.CustomScheduler is swallowed by dask in compute_meta. See #2615 for more details, but briefly:

https://github.com/dask/dask/blob/2023.9.3/dask/array/utils.py#L95-L96
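
To illustrate why the final error reports three computations rather than two, here is a minimal sketch that needs neither dask nor Satpy. CountingScheduler is a simplified stand-in for satpy.tests.utils.CustomScheduler, and infer_meta is a hypothetical stand-in for any helper that catches every exception and returns None. Neither is the real implementation, and the assumption that exactly two hidden computes happen is made only to match the number in the error message.

```python
class CountingScheduler:
    """Simplified stand-in for a compute-counting test scheduler (assumption)."""

    def __init__(self, max_computes=1):
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self, task):
        # Count every request, including the ones that end up failing.
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                f"Too many dask computations were scheduled: {self.total_computes}"
            )
        return task()


def infer_meta(func):
    """Hypothetical helper that swallows any exception, returning None instead."""
    try:
        return func()
    except Exception:
        return None


scheduler = CountingScheduler(max_computes=1)

# Two hidden computes during graph construction (assumed count).
first = infer_meta(lambda: scheduler(lambda: "data"))   # allowed, count is now 1
second = infer_meta(lambda: scheduler(lambda: "data"))  # raises, but is swallowed

# The explicit compute is the first place where the error becomes visible.
try:
    scheduler(lambda: "data")
except RuntimeError as err:
    message = str(err)

print(first, second, message)
```

The counter keeps incrementing even when the exception is swallowed, so the first visible failure already carries a count higher than max_computes + 1.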

When we run with max_computes=0 and set a breakpoint in satpy/tests/utils.py:288, we can investigate the traceback and why compute_meta gets called:

  File "/data/gholl/checkouts/protocode/mwe/day-night-compute.py", line 9, in <module>
    ls = sc.resample("eurol")
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 983, in resample
    new_scn.generate_possible_composites(unload)
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 1508, in generate_possible_composites
    keepables = self._generate_composites_from_loaded_datasets()
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 1527, in _generate_composites_from_loaded_datasets
    return self._generate_composites_nodes_from_loaded_datasets(needed_comp_nodes)
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 1533, in _generate_composites_nodes_from_loaded_datasets
    self._generate_composite(node, keepables)
  File "/data/gholl/checkouts/satpy/satpy/scene.py", line 1591, in _generate_composite
    composite = compositor(prereq_datasets,
  File "/data/gholl/checkouts/satpy/satpy/composites/__init__.py", line 714, in __call__
    weights = self._mask_weights_with_data(weights, day_data, night_data)
  File "/data/gholl/checkouts/satpy/satpy/composites/__init__.py", line 789, in _mask_weights_with_data
    return da.where(mask, weights, np.nan)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/routines.py", line 2107, in where
    return elemwise(np.where, condition, x, y)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/core.py", line 4831, in elemwise
    result = blockwise(
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/blockwise.py", line 286, in blockwise
    meta = compute_meta(func, dtype, *args[::2], **kwargs)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/utils.py", line 140, in compute_meta
    meta = func(*args_meta, **kwargs_meta)
  File "<__array_function__ internals>", line 200, in where
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/common.py", line 165, in __array__
    return np.asarray(self.values, dtype=dtype)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/dataarray.py", line 759, in values
    return self.variable.values
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/variable.py", line 616, in values
    return _as_array_or_item(self._data)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/variable.py", line 309, in _as_array_or_item
    data = np.asarray(data)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/core.py", line 1700, in __array__
    x = self.compute()
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 342, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/data/gholl/checkouts/satpy/satpy/tests/utils.py", line 288, in __call__
    raise RuntimeError("Too many dask computations were scheduled: "

The da.where call in satpy.composites passes xarray types to dask:

In [3]: print(mask)
<xarray.DataArray (y: 2048, x: 2560)>
dask.array<or_, shape=(2048, 2560), dtype=bool, chunksize=(2048, 2560), chunktype=numpy.ndarray>
Coordinates:
  * y            (y) float64 -1.502e+06 -1.504e+06 ... -7.64e+06 -7.642e+06
  * x            (x) float64 -3.778e+06 -3.776e+06 ... 3.896e+06 3.898e+06
    crs          object PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["Unknow...
    bands        <U1 'R'
    spatial_ref  int64 0

In [4]: print(weights)
<xarray.DataArray '_cos_zen_ndarray-2b25ca5da362f711551dec6e02deccd8' (y: 2048,
                                                                       x: 2560)>
dask.array<clip, shape=(2048, 2560), dtype=float64, chunksize=(2048, 2560), chunktype=numpy.ndarray>
Dimensions without coordinates: y, x

I'm still not sure why this triggers a computation of metadata, but perhaps it could be avoided by passing dask arrays or using xarray.where.

@djhoese
Member

djhoese commented Oct 24, 2023

The key point of confusion here is that compute_meta is NOT a Satpy function and it is not talking about METADATA. It is talking about a dask "meta array". The meta array in dask is the object that describes the computed version of the dask array. So in normal dask usage this would be something like np.array((), dtype=np.float64). This tells dask the object type it will get when it calls .compute() and also the dtype.
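
As a small illustration of what a meta array looks like, the sketch below inspects it on a plain dask array. It reads the private _meta attribute, which is an implementation detail used here only for demonstration and may change between dask versions; the array shape and chunking are arbitrary.

```python
import dask.array as da
import numpy as np

# A lazy 2D array; nothing is computed here.
arr = da.ones((4, 6), chunks=(2, 3), dtype=np.float64)

# Private attribute, inspected for illustration only (assumption: still
# available under this name in the installed dask version).
meta = arr._meta

# The meta array is an empty array of the same type, dtype and number of
# dimensions as the result that .compute() would return.
print(type(meta), meta.shape, meta.dtype)
```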

My guess, which is not extremely well founded, is that dask is being given DataArrays and it says "these aren't dask arrays that contain an internal meta representation, so I'll call the low-level function with fake data and see what the low-level function produces". I'm not entirely sure though, but I can't imagine dask is doing the meta checking on normal dask arrays as that would be a huge waste of time. I'll see if I can test this.

@pnuu
Member

pnuu commented Oct 25, 2023

I replaced the da.where() calls with foo.where() calls and the example script no longer triggers the compute exceptions. Will PR in a bit.
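
For reference, a minimal sketch of the two lazy alternatives mentioned in this thread: calling .where() on the DataArray, or handing the underlying dask arrays to da.where via .data. This is not the code from the PR. The array contents and the CountingScheduler class are made up for the example, and only the general pattern is meant to carry over.

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr


class CountingScheduler:
    """Hypothetical scheduler that counts computations and delegates to dask.get."""

    def __init__(self):
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        self.total_computes += 1
        return dask.get(dsk, keys, **kwargs)


counter = CountingScheduler()
with dask.config.set(scheduler=counter):
    # Made-up stand-ins for the weights and mask in the compositor.
    weights = xr.DataArray(da.ones((3, 4), chunks=(3, 2)), dims=("y", "x"))
    mask_np = np.array([[True, False, True, True]] * 3)
    mask = xr.DataArray(da.from_array(mask_np, chunks=(3, 2)), dims=("y", "x"))

    # Option 1: xarray's own method; masked-out values become NaN by default.
    masked_xr = weights.where(mask)

    # Option 2: unwrap to dask arrays before calling da.where.
    masked_da = da.where(mask.data, weights.data, np.nan)

    # Neither option should have computed anything yet.
    computes_before = counter.total_computes

    result_xr = masked_xr.compute()
    result_da = masked_da.compute()

print(computes_before, counter.total_computes)
```

Both variants keep the operation on dask arrays, so no DataArray has to be converted to NumPy while the graph is being built.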

@pnuu
Member

pnuu commented Oct 25, 2023

Draft PR going here, checking the failing tests.
