CustomScheduler does not warn about early computes via compute_meta #2615

Open
gerritholl opened this issue Oct 24, 2023 · 3 comments
@gerritholl
Collaborator

Describe the bug

No warning from the CustomScheduler utility in satpy.tests.utils reaches the user/developer when an (accidental) computation occurs via dask's compute_meta function.

To Reproduce

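import dask.config
import dask.array as da
import numpy as np
import xarray as xr
from satpy.tests.utils import CustomScheduler

# max_computes=0: any dask computation should raise a RuntimeError
cs = CustomScheduler(max_computes=0)
dabl = xr.DataArray(da.array([[True, True], [False, True]]), dims=("y", "x"))
dain = xr.DataArray(da.array([[0, 1], [2, 3]]), dims=("y", "x"))
with dask.config.set(scheduler=cs):
    for i in range(5):
        # passing xarray objects to da.where triggers a compute in compute_meta
        da.where(dabl, dain, np.nan)
    # the scheduler still counts the computes, although no error surfaced
    print(cs.total_computes)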

Expected behavior

I expect either that cs.total_computes equals zero, or that I'm told as soon as it becomes larger than max_computes.

Actual results

5

Environment Info:

  • OS: Linux
  • Satpy Version: main

Additional context

This isn't really Satpy's fault, and I'm not sure what Satpy could do about it. The CustomScheduler does raise a RuntimeError, but dask swallows it. We can tell by setting a breakpoint and inspecting the stack:

  File "/data/gholl/checkouts/protocode/mwe/custom-scheduler-doesnt-catch.py", line 12, in <module>
    da.where(dabl, dain, np.nan)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/routines.py", line 2107, in where
    return elemwise(np.where, condition, x, y)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/core.py", line 4831, in elemwise
    result = blockwise(
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/blockwise.py", line 286, in blockwise
    meta = compute_meta(func, dtype, *args[::2], **kwargs)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/utils.py", line 140, in compute_meta
    meta = func(*args_meta, **kwargs_meta)
  File "<__array_function__ internals>", line 200, in where
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/common.py", line 165, in __array__
    return np.asarray(self.values, dtype=dtype)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/dataarray.py", line 759, in values
    return self.variable.values
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/variable.py", line 616, in values
    return _as_array_or_item(self._data)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/xarray/core/variable.py", line 309, in _as_array_or_item
    data = np.asarray(data)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/array/core.py", line 1700, in __array__
    x = self.compute()
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 342, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/data/gholl/mambaforge/envs/py311/lib/python3.11/site-packages/dask/base.py", line 628, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/data/gholl/checkouts/satpy/satpy/tests/utils.py", line 288, in __call__
    raise RuntimeError("Too many dask computations were scheduled: "

However, in dask.array.utils.compute_meta we have:

            try:
                ...
                meta = func(*args_meta, **kwargs_meta)
                ...
            except Exception:
                return None

which is why the user never sees the RuntimeError raised in the CustomScheduler, and the early compute (such as in #2614) goes unnoticed.
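The swallowing mechanism can be illustrated without dask or Satpy. What follows is only a minimal sketch: CountingScheduler and compute_meta_like are hypothetical stand-ins written for this illustration, not the actual CustomScheduler or compute_meta implementations. It assumes, as the output above suggests, that the scheduler increments its counter before raising.

```python
class CountingScheduler:
    """Hypothetical stand-in for satpy.tests.utils.CustomScheduler."""

    def __init__(self, max_computes=0):
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self):
        # Count first, then raise: the counter is updated even if the
        # caller later discards the exception.
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                "Too many dask computations were scheduled: "
                f"{self.total_computes}"
            )
        return "computed"


def compute_meta_like(func):
    """Hypothetical stand-in mimicking the broad except in compute_meta."""
    try:
        return func()
    except Exception:
        # RuntimeError is a subclass of Exception, so it is discarded here.
        return None


cs = CountingScheduler(max_computes=0)
results = [compute_meta_like(cs) for _ in range(5)]
print(cs.total_computes)  # 5
print(results)            # [None, None, None, None, None]
```

In this sketch the counter reaches 5 while every RuntimeError is turned into None, which mirrors the behaviour reported above.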

I'm not sure what type of workaround could work to still inform the user.

@djhoese
Member

djhoese commented Oct 24, 2023

Wow, that except Exception in dask is really bad. I don't see a reasonable way for us to raise any other normal exception and not have it swallowed by that.

@djhoese
Member

djhoese commented Oct 24, 2023

Dask issue: dask/dask#10595

@pnuu
Member

pnuu commented Nov 1, 2023

One workaround in tests seems to be to call arr.compute() and use max_computes=1. In #2623, before the fix, the scheduler reported four computations when triggered. So deliberately causing one intended computation can reveal the hidden ones. Maybe.
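A rough sketch of why this workaround might work, again using hypothetical stand-ins (CountingScheduler and swallowing_call are invented for illustration and are not Satpy or dask APIs). The hidden computes are counted but their errors are discarded; the one intended compute runs outside the swallowing code path, so its RuntimeError reaches the test. The number of hidden computes is illustrative only.

```python
class CountingScheduler:
    """Hypothetical stand-in for satpy.tests.utils.CustomScheduler."""

    def __init__(self, max_computes=0):
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self):
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                "Too many dask computations were scheduled: "
                f"{self.total_computes}"
            )
        return "computed"


def swallowing_call(func):
    """Hypothetical stand-in for a code path with a broad except."""
    try:
        return func()
    except Exception:
        return None


cs = CountingScheduler(max_computes=1)

# Three hidden computes (an illustrative number): errors are swallowed,
# but the counter keeps increasing.
for _ in range(3):
    swallowing_call(cs)

# One intended compute, outside the swallowing code path: the error
# now propagates and reports the accumulated total.
try:
    cs()
    error = None
except RuntimeError as err:
    error = str(err)

print(error)
```

With max_computes=1 a test that performs exactly one intended compute would pass when there are no hidden computes and fail otherwise, assuming the intended compute happens after the hidden ones.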
