Test chunking (including Hypothesis tests) #57
One of the CI runs (ubuntu-latest, 3.9) has 10 …
Do you think the failures are implementation dependent? In other words, should I merge this branch with #49 and see if the tests fare any better? Or do you think there is a problem with the tests themselves?
This branch builds atop #49, so if you merge them you will only end up with exactly the same code that's here. Even locally I don't get a consistent number of failures - I just ran the whole suite 3 times and got 17, then 19, then 18 failures. 😕 What is consistent is that every parametrization of the … I don't know what could be causing the non-deterministic behaviour apart from the dask …
I just tried that in #58 (messed up a rebase before realising I actually needed to cherry-pick), but the tests still fail. Similar test behaviour - the same tests fail, though now a lot of them fail with:

```
xhistogram/test/test_chunking.py:156: in test_all_chunking_patterns_dd_hist
    h = histogram(*[da for name, da in ds.data_vars.items()], bins=bins)
xhistogram/xarray.py:163: in histogram
    h_data, bins = _histogram(
xhistogram/core.py:339: in histogram
    bin_counts = _histogram_2d_vectorized(
xhistogram/core.py:163: in _histogram_2d_vectorized
    bin_indices = ravel_multi_index(each_bin_indices, hist_shapes)
xhistogram/duck_array_ops.py:24: in f
    return getattr(module, name)(*args, **kwargs)
<__array_function__ internals>:5: in ravel_multi_index
    ???
../../../../miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/core.py:1551: in __array_function__
    return da_func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

multi_index = [dask.array<digitize, shape=(1, 72), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>, dask.array<digitize, sha... chunktype=numpy.ndarray>, dask.array<digitize, shape=(1, 72), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>], dims = [9, 10, 11, 12], mode = 'raise', order = 'C'

    @wraps(np.ravel_multi_index)
    def ravel_multi_index(multi_index, dims, mode="raise", order="C"):
>       return multi_index.map_blocks(
            _ravel_multi_index_kernel,
            dtype=np.intp,
            chunks=(multi_index.shape[-1],),
            drop_axis=0,
            func_kwargs=dict(dims=dims, mode=mode, order=order),
        )
E       AttributeError: 'list' object has no attribute 'map_blocks'

../../../../miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/routines.py:1763: AttributeError
```
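For context, here is a minimal NumPy-only sketch (an editorial illustration, not xhistogram's code) of what `ravel_multi_index` computes - the operation the traceback above shows being dispatched to dask's implementation, which expects a single stacked dask array rather than the list it receives:

```python
import numpy as np

# np.ravel_multi_index turns per-axis indices into flat indices into an
# array of shape `dims`. xhistogram uses this to convert per-variable bin
# indices into a single flat bin index per sample.
multi_index = [np.array([0, 1, 2]), np.array([1, 0, 3])]
flat = np.ravel_multi_index(multi_index, dims=(3, 4))
# row-major ('C' order): flat[k] = multi_index[0][k] * 4 + multi_index[1][k]
print(flat.tolist())  # [1, 4, 11]
```

Dask's `ravel_multi_index` calls `multi_index.map_blocks(...)` directly, which only works when `multi_index` is one dask array, hence the `AttributeError` when a plain list of arrays is passed.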
That error is #27 (comment)
I've opened a dask issue to ask about the …
Hmm - I guess I could pin my local environment to …
I just commented over in the upstream dask issue. If you pass `align_arrays=False` to the `blockwise` call here, that will avoid the `PerformanceWarning` being raised (though there are still other test failures).
Thanks @jrbourbeau , that silences the warning, but unfortunately doesn't fix the failures, and the failures are still inconsistent! 😭
I also see flaky tests when trying this PR out locally. FWIW the … Interestingly, the failure for this particular test has to do with the … Full traceback: …

See dask/dask#7711 (comment) for more, but I don't think … From a quick glance at the failures, it seems like there are generally 2 types of errors: …
I haven't looked carefully at these tests yet, but I can try to take a closer look soon. One thing I noticed is that

```python
dims = [random.choice(string.ascii_lowercase) for ax in shape]
```

does allow for the potential of repeated dimension names in the same array.
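As a hedged sketch of one possible fix (my own illustration, not necessarily the commit the PR ends up with): drawing the names without replacement guarantees every dimension name is unique.

```python
import random
import string

shape = (5, 7, 3)  # hypothetical array shape used in a test

# random.choice draws with replacement, so two axes can collide on a name;
# random.sample draws without replacement, so each dim gets a distinct letter.
dims = random.sample(string.ascii_lowercase, len(shape))
assert len(set(dims)) == len(shape)
```

This only works while the array has at most 26 dimensions, which is safely true for these tests.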
Thanks both. This is very helpful.
Makes sense - I'll undo that now. Looks like my …
Good point! I've pushed a commit to stop that happening, and everything seems to pass locally now! 🍾
Codecov Report

```diff
@@            Coverage Diff             @@
##           master      #57      +/-   ##
===========================================
+ Coverage   81.81%   97.18%   +15.37%
===========================================
  Files           3        2        -1
  Lines         242      249        +7
  Branches       68       71        +3
===========================================
+ Hits          198      242       +44
+ Misses         37        5       -32
+ Partials        7        2        -5
```

Continue to review full report at Codecov.
Nice! It looks like there are also some linting errors. Could you run the pre-commit hooks and commit the changes so CI passes?
Yep, I've just fixed them. (flake8 didn't like my fixtures though, so I did just have to stick a …)
I don't actually think the tests are complete yet though - there should also be tests targeting dask arrays of weights and bins.
Weights yes. Bins no. I think we want to always require bins to be in-memory.
Am I missing something, or are the Hypothesis tests gone now?
Co-authored-by: James Bourbeau <[email protected]>
@gjoseph92 I moved them to another file to avoid a linting error with the hypothesis import, but forgot to `git add` that file before committing at the end of the day yesterday! Thanks everyone for your comments - I think I've addressed them all. I've also turned the fixtures into normal functions, and finally I added a test for chunked weights. One question is whether it would be a good idea to have a test for input arrays with unaligned chunks?
@TomNicholas I definitely think you should test with unaligned chunks, in both the inputs and the weights. |
```python
# TODO mark as slow?
@pytest.mark.parametrize("n_vars", [1, 2, 3, 4])
@given(chunk_shapes(n_dim=2, max_arr_len=7))
```
Might be nice to also test `dims=` and `weights=` with Hypothesis. It can be nice to throw all the possible axes of variation into a Hypothesis test as an easy way to check all possible cases, without having to write as many individual tests.
What exactly do you mean? If I make `test_all_chunking_patterns_dd_hist` accept a `dims` (or `reduce_axes`) argument, then I also need a `np.histogramdd` function that can handle that generality. Is there a quick way to achieve that in the test? Possibly with `np.apply_over_axes`?

For the weights, then, I guess I could pass weights and allow the data and weights to have different chunking patterns - is that what you meant?
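One possible answer to the question above, as a rough editorial sketch (the helper name `hist_along` and the approach are mine, not from the PR): move the kept axes to the front, flatten the reduced axes, and apply `np.histogram` along the last axis to get a NumPy reference that reduces over arbitrary axes.

```python
import numpy as np

def hist_along(arr, bins, reduce_axes):
    # Hypothetical reference helper: histogram `arr` over `reduce_axes` only,
    # keeping the remaining axes, built from plain NumPy for use in tests.
    kept = [ax for ax in range(arr.ndim) if ax not in reduce_axes]
    moved = np.moveaxis(arr, kept, range(len(kept)))      # kept axes first
    flat = moved.reshape(moved.shape[: len(kept)] + (-1,))  # flatten reduced axes
    return np.apply_along_axis(
        lambda v: np.histogram(v, bins=bins)[0], -1, flat
    )

data = np.random.default_rng(0).normal(size=(2, 3, 50))
h = hist_along(data, bins=np.linspace(-3, 3, 5), reduce_axes=[1, 2])
assert h.shape == (2, 4)  # one 4-bin histogram per element of the kept axis
```

`np.apply_along_axis` is slow for large inputs, but as a ground-truth oracle in a test that should not matter much.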
Yeah, I suppose it's trickier to test that, since you'd need something to do N-D histograms (xhistogram) to verify the results.
I suppose you could just compare against `histogram` of the computed (NumPy) arrays, and make it purely a test of the dask functionality. If we're confident the NumPy code paths are well-tested, that seems reasonable to me.
But it was just a thought; I think the tests here are already quite good, so fine to leave it as-is too.
Co-authored-by: Gabe Joseph <[email protected]>
Thanks for doing this @TomNicholas. Looks good to me. It's great that these tests include testing of multiple dask arguments, which was previously untested and would have caught #48.
I see that this PR is to merge into `xgcm:refactor-histogram-map-blocks`, which was merged with `master` in #49. What's the right way to merge this PR into `master`? I'm a bit of a gitwit.
I'm stealing that haha
Good question - apparently you used to have to make a new local branch and push that as a new PR, but now GitHub allows me (or probably you, as a maintainer) to edit the target branch directly.
This builds on #49 by adding a pretty comprehensive set of tests of different chunking arrangements.
There are some normal tests, and some tests that use the Hypothesis library to try out all sorts of different chunk shapes (inspired by @rabernat 's similar test in the rechunker library).
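As a rough stdlib illustration (my own sketch, not the PR's actual Hypothesis strategy) of the space such a strategy explores: every chunking of an axis of length n is a way of splitting it into contiguous pieces, i.e. a composition of n.

```python
import itertools

def chunk_patterns(n):
    # Each choice of cut points in 1..n-1 is one chunking of an axis of
    # length n; the chunk sizes are the gaps between consecutive cuts.
    for k in range(n):
        for cuts in itertools.combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield tuple(b - a for a, b in zip(bounds, bounds[1:]))

patterns = list(chunk_patterns(3))
# -> [(3,), (1, 2), (2, 1), (1, 1, 1)]
```

Hypothesis samples from this (exponentially large) space rather than enumerating it, which is why it finds odd chunkings that hand-written parametrizations miss.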
There are some failures, but I think that they are because sometimes dask decides it knows better than me and changes the chunks:
I'm not quite sure how that causes those tests to fail though - I'm not even sure that behaviour is deterministic.
How do I turn this feature off @jrbourbeau @gjoseph92? Or alternatively, how do I debug what happened to cause those tests to fail?