Unit tests are slow #213

ZedThree · 2021-09-01T17:19:38Z

The unit tests take really quite a long time to run. I used the pytest profiling extension, and it looks like creating and opening datasets takes the vast majority of the time.

Is there a lot of reading/writing to disk? Would it be possible to just do things in memory?

johnomotani · 2021-09-02T09:43:53Z

I did implement creating-in-memory in #132. I thought it was being used in most places (there's a save-to-disk argument to the input-creation functions that defaults to not writing). It's possible we need to switch more of the tests over. But I think the combining-into-a-single-Dataset operation may be relatively expensive. The data is created with a lot of small variations on the inputs in different tests or when parameterizing - it might still be worth caching the created datasets, and maybe trying to reduce the number of variations, but it's probably quite a faff to do!

johnomotani · 2021-09-07T13:28:28Z

I had an idea for caching the input data, to avoid re-creating it. Working on it now, PR hopefully coming soon...

johnomotani · 2021-09-07T16:16:23Z

Looking at how long different tests take, the biggest chunks of time are the parallel interpolation tests (which you'd expect to be computationally heavy because they have four FFT transforms as well as the interpolation stencil operation), and the Region tests and TestPlot::test_region_disconnecteddoublenull, which have a lot of different cases to cover. It might be worth someone taking a look at those two sets in particular: are there more cases than we really need? Could the tests be done with smaller arrays without compromising anything? For TestPlot::test_region_disconnecteddoublenull specifically, it might be worth splitting into one test that includes all the plot functions currently included for one or a few combinations of input parameters, and a second that only uses one plot function (say pcolormesh) to test all the parameters.

ZedThree · 2022-10-19T09:02:07Z

xarray 2022.10 causes further slow down of the tests -- deep copying seems to have gotten x20 slower

See, e.g. #252 (comment)

For TestPlot::test_region_disconnecteddoublenull specifically, it might be worth splitting into one test that includes all the plot functions currently included for one or a few combinations of input parameters, and a second that only uses one plot function (say pcolormesh) to test all the parameters.

Sounds good

ZedThree · 2022-10-25T10:57:41Z

The deep copying is more expensive because it's now copying DataArrays in attributes -- and the Regions have two or three nested levels of this. So ways of removing that nesting are probably worth investigating. Can the Regions make use of views somehow perhaps?

johnomotani · 2022-10-25T12:54:38Z

hmm... wonder if we actually need to deep-copy at all? I suspect the idea at the time was that it would be safer, but maybe it's not needed...

ZedThree · 2022-10-25T13:02:54Z

It seems that prior to 2022.10, deep copying didn't deep copy the attributes (and therefore regions?), so presumably a shallow copy should be fine now?

I switched some deep copies to shallow ones and it did make a substantial difference to the one test I've been running, but it's still x10 slower than before.

TomNicholas · 2023-01-19T22:17:05Z

Can I ask why xBOUT stores arrays of data in the attrs?

Whilst xarray's .attrs can technically store arbitrary objects, it's intended for storing small metadata (e.g. names of units, provenance, explanation of what variables represent etc.). Putting entire xr.DataArrays in the .attrs is not an intended use at all. If xBOUT didn't put actual data arrays in the attrs then the difference between deep- or shallow-copying metadata wouldn't matter.

johnomotani · 2023-01-20T10:04:47Z

The DataArrays in attrs were in the 'regions'. They were actually just a few scalars, so probably makes sense to switch stick in .values to have them as just numbers.

This change is a 4-liner, and substantially reduces the number of errors we get from xarray-2022.9.0. I'll make a draft PR as a starting point for fixing #275.

johnomotani mentioned this issue Sep 7, 2021

Cache input data for tests #215

Merged

johnomotani mentioned this issue Jan 19, 2023

xarray - compatibility with latest versions #275

Open

johnomotani mentioned this issue Jan 20, 2023

Compatibility with xarray>=2022.9.0 #276

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unit tests are slow #213

Unit tests are slow #213

ZedThree commented Sep 1, 2021

johnomotani commented Sep 2, 2021

johnomotani commented Sep 7, 2021

johnomotani commented Sep 7, 2021 •

edited

Loading

ZedThree commented Oct 19, 2022

ZedThree commented Oct 25, 2022

johnomotani commented Oct 25, 2022

ZedThree commented Oct 25, 2022

TomNicholas commented Jan 19, 2023

johnomotani commented Jan 20, 2023 •

edited

Loading

Unit tests are slow #213

Unit tests are slow #213

Comments

ZedThree commented Sep 1, 2021

johnomotani commented Sep 2, 2021

johnomotani commented Sep 7, 2021

johnomotani commented Sep 7, 2021 • edited Loading

ZedThree commented Oct 19, 2022

ZedThree commented Oct 25, 2022

johnomotani commented Oct 25, 2022

ZedThree commented Oct 25, 2022

TomNicholas commented Jan 19, 2023

johnomotani commented Jan 20, 2023 • edited Loading

johnomotani commented Sep 7, 2021 •

edited

Loading

johnomotani commented Jan 20, 2023 •

edited

Loading