Test chunking (including Hypothesis tests) #57

Merged
merged 25 commits into from
Jun 22, 2021

TomNicholas
Member

@TomNicholas TomNicholas commented May 25, 2021

This builds on #49 by adding a pretty comprehensive set of tests of different chunking arrangements.

There are some normal tests, and some tests that use the Hypothesis library to try out all sorts of different chunk shapes (inspired by @rabernat's similar test in the rechunker library).
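
The property those chunking tests target can be stated without dask: a histogram computed chunk by chunk and then summed should match the histogram of the whole array, whatever the chunk sizes. A minimal NumPy-only sketch of that invariant follows. `chunked_histogram` is a hypothetical helper for illustration, not part of xhistogram, and the chunk patterns are hand-picked here rather than generated by Hypothesis.

```python
import numpy as np


def chunked_histogram(data, chunk_sizes, bins):
    """Histogram 1-D `data` one chunk at a time and sum the per-chunk counts.

    Hypothetical helper for illustration only (not xhistogram's API).
    """
    if sum(chunk_sizes) != data.size:
        raise ValueError("chunk sizes must add up to the length of the data")
    counts = np.zeros(len(bins) - 1, dtype=np.intp)
    start = 0
    for size in chunk_sizes:
        chunk_counts, _ = np.histogram(data[start:start + size], bins=bins)
        counts += chunk_counts
        start += size
    return counts


rng = np.random.default_rng(0)       # local, seeded generator
data = rng.normal(size=10)
bins = np.linspace(-4, 4, 9)         # 8 equal-width bins
expected, _ = np.histogram(data, bins=bins)

# The result should not depend on how the data is split up.
for chunk_sizes in [(10,), (5, 5), (3, 3, 3, 1), (1,) * 10]:
    assert np.array_equal(chunked_histogram(data, chunk_sizes, bins), expected)
```

A Hypothesis-based test would replace the hand-written list of chunk patterns with a strategy that generates them.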

There are some failures, but I think that they are because sometimes dask decides it knows better than me and changes the chunks:

  /home/tegn500/Documents/Work/Code/xhistogram/xhistogram/core.py:334: 
  PerformanceWarning: Increasing number of chunks by factor of 100
    bin_counts = dsa.blockwise(

I'm not quite sure how that causes those tests to fail though - I'm not even sure that behaviour is deterministic.

How do I turn this feature off, @jrbourbeau @gjoseph92? Alternatively, how do I debug what happened to cause those tests to fail?

@TomNicholas
Member Author

> I think that they are because sometimes dask decides it knows better than me

One of the CI runs (ubuntu-latest, 3.9) has 10 PerformanceWarnings and 17 failures though, so there might also be other problems...

@rabernat
Contributor

Do you think the failures are implementation dependent? In other words, should I merge this branch with #49 and see if the tests fare any better? Or do you think there is a problem with the tests themselves?

@TomNicholas
Member Author

TomNicholas commented May 26, 2021

> should I merge this branch with #49 and see if the tests fare any better?

This branch builds atop #49 so if you merge them you will only end up with exactly the same code that's here.

Even locally I don't get a consistent number of failures - I just ran the whole suite 3 times and got 17, then 19, then 18 failures. 😕

What is consistent is that every parametrization of the test_2d_chunks_2d_hist test fails every time, as does the test_all_chunking_patterns_2d hypothesis test. So either those tests are wrong (I don't think they are...) or they indicate a real bug in the code.

I don't know what could be causing the non-deterministic behaviour apart from the dask PerformanceWarnings - I'm putting random data in the test fixtures but the numpy random seed does get set in one of the existing tests... (test_histogram_results_1d). I'll check whether we should be setting the seed at the test module level or something, but that's the only other reason for inconsistent behaviour I can think of. (It's not the hypothesis tests that are inconsistent either, so that's not the problem.)
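
One way to rule out the seeding question is to give the data factory its own seeded generator, so the values do not depend on which test happened to set NumPy's global seed first. This is only a sketch: `make_random_data` is a hypothetical stand-in for the fixture's data-generation step, not the code in this PR.

```python
import numpy as np


def make_random_data(shape, seed=0):
    """Return normally distributed test data from a generator local to this call.

    Hypothetical helper: using np.random.default_rng(seed) instead of the
    global np.random state makes the output independent of test ordering.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape)


a = make_random_data((10, 4), seed=42)
b = make_random_data((10, 4), seed=42)

# Same seed, same data, regardless of what other tests did beforehand.
assert np.array_equal(a, b)
```

If the failure counts still vary between runs with data generated this way, the random values themselves are unlikely to be the cause.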

@rabernat
Contributor

> This branch builds atop #49 so if you merge them you will only end up with exactly the same code that's here.

Ah thanks, I had missed that 😄

Would it be worthwhile running the tests on the old, pre-#49 code?

@TomNicholas
Member Author

TomNicholas commented May 26, 2021

> Would it be worthwhile running the tests on the old, pre-#49 code?

I just tried that in #58 (messed up a rebase before realising I actually needed to cherry-pick), but the tests still fail. Similar test behaviour - the same tests fail, though now a lot of them fail with

xhistogram/test/test_chunking.py:156: in test_all_chunking_patterns_dd_hist
    h = histogram(*[da for name, da in ds.data_vars.items()], bins=bins)
xhistogram/xarray.py:163: in histogram
    h_data, bins = _histogram(
xhistogram/core.py:339: in histogram
    bin_counts = _histogram_2d_vectorized(
xhistogram/core.py:163: in _histogram_2d_vectorized
    bin_indices = ravel_multi_index(each_bin_indices, hist_shapes)
xhistogram/duck_array_ops.py:24: in f
    return getattr(module, name)(*args, **kwargs)
<__array_function__ internals>:5: in ravel_multi_index
    ???
../../../../miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/core.py:1551: in __array_function__
    return da_func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

multi_index = [dask.array<digitize, shape=(1, 72), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>, dask.array<digitize, sha... chunktype=numpy.ndarray>, dask.array<digitize, shape=(1, 72), dtype=int64, chunksize=(1, 1), chunktype=numpy.ndarray>], dims = [9, 10, 11, 12], mode = 'raise', order = 'C'

    @wraps(np.ravel_multi_index)
    def ravel_multi_index(multi_index, dims, mode="raise", order="C"):
>       return multi_index.map_blocks(
            _ravel_multi_index_kernel,
            dtype=np.intp,
            chunks=(multi_index.shape[-1],),
            drop_axis=0,
            func_kwargs=dict(dims=dims, mode=mode, order=order),
        )
E       AttributeError: 'list' object has no attribute 'map_blocks'

../../../../miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/routines.py:1763: AttributeError
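
The AttributeError arises because a plain Python list of dask arrays reaches a function that calls `.map_blocks` on its first argument. One possible workaround, offered as an assumption and not as the fix adopted in xhistogram or dask, is to stack the per-variable index arrays into a single dask array before flattening. The sketch below applies `np.ravel_multi_index` blockwise through a hypothetical kernel `_ravel_kernel`, and the index values are made up for illustration.

```python
import dask.array as dsa
import numpy as np


def _ravel_kernel(block, dims):
    # `block` has shape (ndim, n): one row of bin indices per histogram axis.
    return np.ravel_multi_index(block, dims)


i_np = np.array([0, 1, 2, 0, 1, 2])
j_np = np.array([0, 1, 2, 3, 0, 1])
dims = (3, 4)  # number of bins along each histogram axis

# A list of separate dask arrays, as in the traceback above.
indices = [dsa.from_array(i_np, chunks=3), dsa.from_array(j_np, chunks=3)]

# Stack into one array and keep the stacking axis in a single chunk,
# so every block sees the indices for all histogram axes at once.
stacked = dsa.stack(indices).rechunk({0: -1})

flat = stacked.map_blocks(_ravel_kernel, dims=dims, dtype=np.intp, drop_axis=0)

# The chunked result should agree with the eager NumPy computation.
assert np.array_equal(flat.compute(), np.ravel_multi_index((i_np, j_np), dims))
```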

@rabernat
Contributor

That error is #27 (comment)

@TomNicholas
Member Author

TomNicholas commented May 26, 2021

I've opened a dask issue to ask about the PerformanceWarnings.

> That error is #27 (comment)

Hmm - I guess I could pin my local environment to dask=2021.02.0 to see if the tests pass then... (EDIT: that did not work - same errors)

Contributor

@jrbourbeau jrbourbeau left a comment

I just commented over in the upstream dask issue. If you pass align_arrays=False to the blockwise call here, that will avoid the PerformanceWarning being raised (though there are still other test failures).
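
For reference, here is a minimal sketch of passing `align_arrays=False` to `dask.array.blockwise`. The arrays and the `add` function are made up for illustration and are not the xhistogram call. With alignment switched off, dask does not rechunk the inputs to a common chunking, so they must already have matching chunks along shared indices.

```python
import dask.array as dsa


def add(a, b):
    # Runs once per pair of corresponding blocks.
    return a + b


# Both inputs share the same chunk layout, so no alignment is needed.
x = dsa.ones((6, 4), chunks=(2, 4))
y = dsa.ones((6, 4), chunks=(2, 4))

z = dsa.blockwise(
    add, "ij",
    x, "ij",
    y, "ij",
    dtype=x.dtype,
    align_arrays=False,  # skip the automatic chunk unification step
)

result = z.compute()
assert z.chunks == x.chunks  # chunk layout is left untouched
```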

@TomNicholas
Member Author

Thanks @jrbourbeau, that silences the warning, but unfortunately it doesn't fix the failures, and the failures are still inconsistent! 😭

@jrbourbeau
Contributor

I also see flaky tests when trying this PR out locally. FWIW the pytest-repeat plugin is a nice way to trigger a flaky test by running it several times. For example, pytest -v xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks -x --count=20 (the --count=20 part is where pytest-repeat comes in) consistently triggers a failure for me locally.

Interestingly, the failure for this particular test has to do with the dataarray_factory utility (see the traceback below) and not the actual histogramming logic (or rather, the test isn't getting to the histogram logic yet).
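
The ValueError in that traceback reports conflicting sizes for dimension 'l' (length 4 and length 10) on an array of shape (10, 4), which suggests both axes were given the same dimension name. How dataarray_factory chooses names is not shown here, so the following is a guess: if names are drawn at random with replacement, an occasional collision would produce exactly this kind of intermittent failure. The two helper functions are hypothetical and use only the standard library.

```python
import random
import string


def pick_dims_with_replacement(ndim, rng):
    # Each name is drawn independently, so duplicates are possible.
    return tuple(rng.choices(string.ascii_lowercase, k=ndim))


def pick_dims_unique(ndim, rng):
    # Sampling without replacement can never repeat a name.
    return tuple(rng.sample(string.ascii_lowercase, k=ndim))


rng = random.Random(0)

# Count how often two independently drawn names collide.
collisions = sum(
    len(set(pick_dims_with_replacement(2, rng))) < 2 for _ in range(1000)
)
print(f"{collisions} of 1000 two-dimensional draws reused a name")

# Drawing without replacement always yields distinct names.
assert all(len(set(pick_dims_unique(2, rng))) == 2 for _ in range(1000))
```

With 26 candidate names, a two-dimensional draw with replacement collides about once in 26 attempts, so a single run of the suite can pass while repeated runs fail.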

Full traceback:
(xhistogram) ➜  xhistogram git:(chunk_tests) ✗ pytest -v xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks -x --count=20
===================================================================== test session starts ======================================================================
platform darwin -- Python 3.8.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- /Users/james/miniforge3/envs/xhistogram/bin/python3.8
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/james/projects/xgcm/xhistogram/.hypothesis/examples')
rootdir: /Users/james/projects/xgcm/xhistogram, configfile: setup.cfg
plugins: hypothesis-6.13.6, repeat-0.9.1
collected 160 items

xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-1-20] PASSED                                                                        [  0%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-2-20] PASSED                                                                        [  1%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-3-20] PASSED                                                                        [  1%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-4-20] PASSED                                                                        [  2%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-5-20] PASSED                                                                        [  3%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-6-20] PASSED                                                                        [  3%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-7-20] PASSED                                                                        [  4%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-8-20] PASSED                                                                        [  5%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-9-20] PASSED                                                                        [  5%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-10-20] PASSED                                                                       [  6%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-11-20] PASSED                                                                       [  6%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-12-20] PASSED                                                                       [  7%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-13-20] PASSED                                                                       [  8%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-14-20] PASSED                                                                       [  8%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-15-20] PASSED                                                                       [  9%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-16-20] PASSED                                                                       [ 10%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-17-20] PASSED                                                                       [ 10%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-18-20] PASSED                                                                       [ 11%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-19-20] PASSED                                                                       [ 11%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-20-20] PASSED                                                                       [ 12%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-1-20] PASSED                                                                        [ 13%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-2-20] PASSED                                                                        [ 13%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-3-20] PASSED                                                                        [ 14%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-4-20] PASSED                                                                        [ 15%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-5-20] PASSED                                                                        [ 15%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-6-20] PASSED                                                                        [ 16%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-7-20] PASSED                                                                        [ 16%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-8-20] PASSED                                                                        [ 17%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-9-20] PASSED                                                                        [ 18%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-10-20] PASSED                                                                       [ 18%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-11-20] PASSED                                                                       [ 19%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-12-20] PASSED                                                                       [ 20%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-13-20] PASSED                                                                       [ 20%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-14-20] PASSED                                                                       [ 21%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-15-20] PASSED                                                                       [ 21%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-16-20] PASSED                                                                       [ 22%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-17-20] PASSED                                                                       [ 23%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-18-20] PASSED                                                                       [ 23%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-19-20] PASSED                                                                       [ 24%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-2-20-20] PASSED                                                                       [ 25%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-1-20] PASSED                                                                        [ 25%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-2-20] PASSED                                                                        [ 26%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-3-20] PASSED                                                                        [ 26%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-4-20] PASSED                                                                        [ 27%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-5-20] PASSED                                                                        [ 28%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-6-20] PASSED                                                                        [ 28%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-7-20] PASSED                                                                        [ 29%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-8-20] PASSED                                                                        [ 30%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-9-20] PASSED                                                                        [ 30%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-10-20] PASSED                                                                       [ 31%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-11-20] PASSED                                                                       [ 31%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-12-20] PASSED                                                                       [ 32%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-13-20] PASSED                                                                       [ 33%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-14-20] PASSED                                                                       [ 33%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-15-20] PASSED                                                                       [ 34%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-16-20] PASSED                                                                       [ 35%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-17-20] PASSED                                                                       [ 35%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-18-20] PASSED                                                                       [ 36%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-19-20] PASSED                                                                       [ 36%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-3-20-20] PASSED                                                                       [ 37%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-1-20] PASSED                                                                       [ 38%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-2-20] PASSED                                                                       [ 38%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-3-20] PASSED                                                                       [ 39%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-4-20] PASSED                                                                       [ 40%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-5-20] PASSED                                                                       [ 40%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-6-20] PASSED                                                                       [ 41%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-7-20] PASSED                                                                       [ 41%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-8-20] PASSED                                                                       [ 42%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-9-20] PASSED                                                                       [ 43%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-10-20] PASSED                                                                      [ 43%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-11-20] PASSED                                                                      [ 44%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-12-20] PASSED                                                                      [ 45%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-13-20] PASSED                                                                      [ 45%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-14-20] PASSED                                                                      [ 46%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-15-20] PASSED                                                                      [ 46%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-16-20] PASSED                                                                      [ 47%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-17-20] PASSED                                                                      [ 48%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-18-20] PASSED                                                                      [ 48%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-19-20] PASSED                                                                      [ 49%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-10-20-20] PASSED                                                                      [ 50%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-1-20] PASSED                                                                        [ 50%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-2-20] PASSED                                                                        [ 51%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-3-20] PASSED                                                                        [ 51%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-4-20] PASSED                                                                        [ 52%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-5-20] PASSED                                                                        [ 53%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-6-20] PASSED                                                                        [ 53%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-7-20] PASSED                                                                        [ 54%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-8-20] PASSED                                                                        [ 55%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-9-20] PASSED                                                                        [ 55%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-10-20] PASSED                                                                       [ 56%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-11-20] PASSED                                                                       [ 56%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-12-20] PASSED                                                                       [ 57%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-13-20] PASSED                                                                       [ 58%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-14-20] PASSED                                                                       [ 58%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-15-20] PASSED                                                                       [ 59%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-16-20] PASSED                                                                       [ 60%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-17-20] PASSED                                                                       [ 60%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-18-20] PASSED                                                                       [ 61%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-19-20] PASSED                                                                       [ 61%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-20-20] PASSED                                                                       [ 62%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-1-20] PASSED                                                                        [ 63%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-2-20] PASSED                                                                        [ 63%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-3-20] PASSED                                                                        [ 64%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-4-20] PASSED                                                                        [ 65%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-5-20] PASSED                                                                        [ 65%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-6-20] PASSED                                                                        [ 66%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-7-20] PASSED                                                                        [ 66%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-8-20] PASSED                                                                        [ 67%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-9-20] PASSED                                                                        [ 68%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-10-20] PASSED                                                                       [ 68%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-11-20] PASSED                                                                       [ 69%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-12-20] PASSED                                                                       [ 70%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-13-20] PASSED                                                                       [ 70%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-14-20] PASSED                                                                       [ 71%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-15-20] PASSED                                                                       [ 71%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-16-20] PASSED                                                                       [ 72%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-17-20] PASSED                                                                       [ 73%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-18-20] PASSED                                                                       [ 73%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-19-20] PASSED                                                                       [ 74%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-20-20] PASSED                                                                       [ 75%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-1-20] PASSED                                                                        [ 75%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-2-20] PASSED                                                                        [ 76%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-3-20] PASSED                                                                        [ 76%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-4-20] PASSED                                                                        [ 77%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-5-20] PASSED                                                                        [ 78%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-6-20] PASSED                                                                        [ 78%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-7-20] PASSED                                                                        [ 79%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-8-20] PASSED                                                                        [ 80%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-9-20] PASSED                                                                        [ 80%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-10-20] PASSED                                                                       [ 81%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-11-20] PASSED                                                                       [ 81%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-12-20] PASSED                                                                       [ 82%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-13-20] PASSED                                                                       [ 83%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-14-20] PASSED                                                                       [ 83%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-15-20] PASSED                                                                       [ 84%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-16-20] PASSED                                                                       [ 85%]
xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-17-20] FAILED                                                                       [ 85%]

=========================================================================== FAILURES ===========================================================================
__________________________________________________________ test_fixed_size_1d_chunks[shape1-3-17-20] ___________________________________________________________

dataarray_factory = <function dataarray_factory.<locals>._dataarray_factory at 0x7f9969444040>, chunksize = 3, shape = (10, 4)

    @pytest.mark.parametrize("chunksize", [1, 2, 3, 10])
    @pytest.mark.parametrize("shape", [(10,), (10,4)])
    def test_fixed_size_1d_chunks(dataarray_factory, chunksize, shape):

>       data_a = dataarray_factory(shape).chunk((chunksize,))

xhistogram/test/test_chunking.py:12:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../miniforge3/envs/xhistogram/lib/python3.8/site-packages/xarray/core/dataarray.py:1057: in chunk
    ds = self._to_temp_dataset().chunk(
../../../miniforge3/envs/xhistogram/lib/python3.8/site-packages/xarray/core/dataarray.py:488: in _to_temp_dataset
    return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)
../../../miniforge3/envs/xhistogram/lib/python3.8/site-packages/xarray/core/dataarray.py:540: in _to_dataset_whole
    dataset = Dataset._construct_direct(variables, coord_names, indexes=indexes)
../../../miniforge3/envs/xhistogram/lib/python3.8/site-packages/xarray/core/dataset.py:1008: in _construct_direct
    dims = calculate_dimensions(variables)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

variables = {<this-array>: <xarray.Variable (l: 4)>
array([[ 0.31830519,  1.19267377, -0.36415368,  1.65018558],
       [-0.397767...   [ 0.18824782, -1.29960002,  0.54894081,  0.75569833],
       [ 0.72254191, -0.54123615, -0.24358458,  0.91154796]])}

    def calculate_dimensions(variables: Mapping[Hashable, Variable]) -> Dict[Hashable, int]:
        """Calculate the dimensions corresponding to a set of variables.

        Returns dictionary mapping from dimension names to sizes. Raises ValueError
        if any of the dimension sizes conflict.
        """
        dims: Dict[Hashable, int] = {}
        last_used = {}
        scalar_vars = {k for k, v in variables.items() if not v.dims}
        for k, var in variables.items():
            for dim, size in zip(var.dims, var.shape):
                if dim in scalar_vars:
                    raise ValueError(
                        "dimension %r already exists as a scalar variable" % dim
                    )
                if dim not in dims:
                    dims[dim] = size
                    last_used[dim] = k
                elif dims[dim] != size:
>                   raise ValueError(
                        "conflicting sizes for dimension %r: "
                        "length %s on %r and length %s on %r"
                        % (dim, size, k, dims[dim], last_used[dim])
                    )
E                   ValueError: conflicting sizes for dimension 'l': length 4 on <this-array> and length 10 on <this-array>

../../../miniforge3/envs/xhistogram/lib/python3.8/site-packages/xarray/core/dataset.py:206: ValueError
===================================================================== slowest 10 durations =====================================================================
0.02s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape0-1-1-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-17-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-15-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-16-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-20-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-18-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-2-7-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-1-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-14-20]
0.01s call     xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-1-12-20]
=================================================================== short test summary info ====================================================================
FAILED xhistogram/test/test_chunking.py::test_fixed_size_1d_chunks[shape1-3-17-20] - ValueError: conflicting sizes for dimension 'l': length 4 on <this-array...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================ 1 failed, 136 passed in 2.59s =================================================================

@gjoseph92
Contributor

See dask/dask#7711 (comment) for more, but I don't think align_arrays=False is the right thing to do here (without adding other rechunking logic to align the input arrays). I think eventually, it could be a good idea to pick the chunk pattern ourselves (so that one input array with small chunks doesn't split all the others into tiny pieces), but that should only affect performance, not correctness.

From a quick glance at the failures, it seems like there are generally 2 types of errors:

  1. cases where the resulting histogram is different (in particular, it contains more zeros / the counts are much lower than expected)
  2. errors inside Dataset.chunk like conflicting sizes for dimension 'n': length 12 on <this-array> and length 10 on {'n': <this-array>}.

I haven't looked carefully at these tests yet, but I can try to take a closer look soon. One thing I noticed is that:

dims = [random.choice(string.ascii_lowercase) for ax in shape]

does allow for the potential of repeated dimension names in the same array.
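
A minimal sketch of one way to avoid the repeats, assuming the names only need to be distinct lowercase letters: `random.sample` draws without replacement, so no letter can appear twice. The helper name `unique_dim_names` is hypothetical and is not taken from this PR.

```python
import random
import string


def unique_dim_names(shape):
    """Return one distinct single-letter dimension name per axis of `shape`.

    Hypothetical helper for illustration; not the fixture used in this PR.
    """
    # random.sample draws without replacement, so names cannot repeat.
    # It raises ValueError if len(shape) exceeds the 26 available letters.
    return random.sample(string.ascii_lowercase, k=len(shape))


dims = unique_dim_names((10, 4))
```

With `random.choice` in a list comprehension, each axis is drawn independently, so a 2D array has a 1-in-26 chance of getting the same name twice.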

@TomNicholas
Member Author

Thanks both. This is very helpful.

I don't think align_arrays=False is the right thing to do here

Makes sense - I'll undo that now.

Looks like my dataarray_factory fixture is causing at least some of the test failures.

allow for the potential of repeated dimension names

Good point! I've pushed a commit to stop that happening, and everything seems to pass locally now! 🍾

@codecov

codecov bot commented May 26, 2021

Codecov Report

Merging #57 (6fc4161) into master (9c7c722) will increase coverage by 15.37%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master      #57       +/-   ##
===========================================
+ Coverage   81.81%   97.18%   +15.37%     
===========================================
  Files           3        2        -1     
  Lines         242      249        +7     
  Branches       68       71        +3     
===========================================
+ Hits          198      242       +44     
+ Misses         37        5       -32     
+ Partials        7        2        -5     
Impacted Files                  Coverage Δ
xhistogram/duck_array_ops.py
xhistogram/xarray.py            96.42% <0.00%> (+4.90%) ⬆️
xhistogram/core.py              97.40% <0.00%> (+18.05%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9c7c722...6fc4161.

@rabernat
Contributor

Coverage 96.61% <0.00%> (+5.08%) 😄 😄 😄

Contributor

@jrbourbeau jrbourbeau left a comment

Nice! It looks like there are also some linting errors. Could you run the pre-commit hooks and commit the changes so CI passes?

@TomNicholas
Member Author

Yep, I've just fixed them. (flake8 didn't like my fixtures, though, so I had to put a # noqa on the whole test_chunking file.)

@TomNicholas
Member Author

I don't actually think the tests are complete yet though - there should also be tests targeting dask arrays of weights and bins.

@rabernat
Contributor

dask arrays of weights and bins.

Weights yes. Bins no. I think we want to always require bins to be in-memory.

Contributor

@gjoseph92 gjoseph92 left a comment

Am I missing something, or are the Hypothesis tests gone now?

@TomNicholas
Member Author

Am I missing something, or are the Hypothesis tests gone now?

@gjoseph92 I moved them to another file to avoid a linting error with the hypothesis import, but forgot to git add that file before committing at the end of the day yesterday!

Thanks everyone for your comments - I think I've addressed them all. I've also turned the fixtures into normal functions, and I added a test for chunked weights.

One question is whether it would be a good idea to have a test for input arrays with unaligned chunks?

@gjoseph92
Contributor

@TomNicholas I definitely think you should test with unaligned chunks, in both the inputs and the weights.
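
As a rough illustration of what such a test needs to establish, the sketch below uses plain NumPy rather than dask. Data and weights are split at different positions, the split points are merged into a common set of blocks (loosely mimicking what aligning the arrays does), and the per-block weighted histograms are summed. The function name and the split positions are made up for this example.

```python
import numpy as np


def blockwise_weighted_hist(data, weights, bins, data_splits, weight_splits):
    """Sum per-block weighted histograms of 1D `data` (illustrative only)."""
    # Merge both sets of split points so data and weights share block edges.
    common_splits = sorted(set(data_splits) | set(weight_splits))
    counts = np.zeros(len(bins) - 1)
    for d, w in zip(np.split(data, common_splits), np.split(weights, common_splits)):
        block_counts, _ = np.histogram(d, bins=bins, weights=w)
        counts += block_counts
    return counts


rng = np.random.default_rng(0)
data = rng.normal(size=10)
weights = rng.uniform(size=10)
bins = np.linspace(-4, 4, 9)

# Data "chunked" as (3, 7), weights as (5, 5): the boundaries do not line up.
chunked = blockwise_weighted_hist(
    data, weights, bins, data_splits=[3], weight_splits=[5]
)
expected, _ = np.histogram(data, bins=bins, weights=weights)
```

If `chunked` and `expected` agree (up to floating-point rounding) for every choice of split points, the chunk layout only changes how the work is divided, not the result.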

TomNicholas and others added 2 commits May 27, 2021 13:16
Co-authored-by: James Bourbeau <[email protected]>

# TODO mark as slow?
@pytest.mark.parametrize("n_vars", [1, 2, 3, 4])
@given(chunk_shapes(n_dim=2, max_arr_len=7))
Contributor

Might be nice to also test dims= and weights= with Hypothesis. It can be nice to throw all the possible axes of variation into a Hypothesis test as an easy way to check all possible cases, without having to write as many individual tests.

Member Author

@TomNicholas TomNicholas May 27, 2021

What exactly do you mean? If I make test_all_chunking_patterns_dd_hist accept a dims (or reduce_axes) argument, then I also need an np.histogramdd-like reference function that can handle that generality. Is there a quick way to achieve that in the test? Possibly with np.apply_over_axes?

For the weights, I guess I could pass weights and allow the data and weights to have different chunking patterns - is that what you meant?

Contributor

Yeah, I suppose it's trickier to test that, since you'd need something to do N-D histograms (xhistogram) to verify the results.

I suppose you could just compare against the histogram of the computed (NumPy) arrays, and make it purely a test of the dask functionality. If we're confident the NumPy code paths are well-tested, that seems reasonable to me.

But it was just a thought; I think the tests here are already quite good, so fine to leave it as-is too.
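
For reference, here is one possible NumPy-only check for reducing over a single axis, sketched under the assumption that the bins are fixed and held in memory. `np.apply_along_axis` applies a 1D `np.histogram` along the chosen axis, so that axis is replaced by a bins axis. The helper name is hypothetical, and this covers only the one-variable, one-axis case, not the general dims= handling discussed above.

```python
import numpy as np


def reference_hist_along_axis(data, bins, axis):
    """Histogram `data` along one axis with NumPy only (illustrative helper)."""
    # np.histogram returns (counts, edges); keep only the counts per 1D slice.
    return np.apply_along_axis(
        lambda a: np.histogram(a, bins=bins)[0], axis, data
    )


data = np.arange(20).reshape(4, 5)
bins = np.array([0, 5, 10, 15, 20])

# Each row of `data` falls entirely inside one bin, so the length-5 axis
# is replaced by a length-4 bins axis with all 5 counts in a single bin.
counts = reference_hist_along_axis(data, bins, axis=1)
```

A dask-backed result could then be compared against `counts` after computing it, which keeps the reference independent of the code under test.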

@TomNicholas TomNicholas mentioned this pull request Jun 21, 2021
Contributor

@dougiesquire dougiesquire left a comment

Thanks for doing this @TomNicholas. Looks good to me. It's great that these tests cover multiple dask arguments, which were previously untested; that coverage would have caught #48.

I see that this PR targets xgcm:refactor-histogram-map-blocks, which was merged into master in #49. What's the right way to merge this PR into master? I'm a bit of a gitwit.

@TomNicholas
Member Author

gitwit

I'm stealing that haha

What's the right way to merge this PR into master?

Good question - apparently you used to have to make a new local branch and push that as a new PR, but now GitHub allows me (or probably you, as a maintainer) to edit the target branch directly.

@TomNicholas TomNicholas changed the base branch from refactor-histogram-map-blocks to master June 22, 2021 04:12
@TomNicholas TomNicholas merged commit 3144ebd into xgcm:master Jun 22, 2021
5 participants