
smart read #49

Closed
malmans2 opened this issue Jun 14, 2023 · 12 comments

Comments

@malmans2
Contributor

Hi,

I'm surprised you had to code smart_read to get good performance.
I have a few tips, but first I'd like to explore xarray/dask/numpy optimisations.

Do you have a good MRE where smart_read makes a difference that I can use for benchmarking?

@MaceKuailv
Owner

Essentially, if you do many unstructured reads on a small number of chunks, the difference is very large.

I think if I generate a dataset using things like zeros, it is going to behave differently from reading from disk, and the two methods will have similar performance. So I would need an actual dataset to write an example. Does SciServer work for you?

@malmans2
Contributor Author

Yup, SciServer works.

@MaceKuailv
Owner

Here is an example:

import oceanspy as ospy
import numpy as np
import xarray as xr

od = ospy.open_oceandataset.from_catalog('HYCOM')
size = 100000
iy = np.random.randint(1251, size=size)
ix = np.random.randint(701, size=size)
it = np.random.randint(5, size=size)
# these random indexes fall within two chunks

# %%time: this block runs in about 130 ms
array = np.array(od._ds.Eta.isel(time=slice(0, 5)))
res = array[it, iy, ix]

# %%time: this block runs in a little less than 2 seconds
xrind = tuple(xr.DataArray(ind, dims=["new"]) for ind in (it, iy, ix))
res = np.array(od._ds.Eta[xrind])

The difference here is actually not very large, but if you do the same operation repeatedly, it becomes significant.
I remember from a year ago there were cases on ECCO where the former was hundreds of times faster than the latter.

@MaceKuailv
Owner

I agree this is an ad hoc thing, and the majority of the function is not called very frequently.

When I first started developing this, I remember that reading a single entry from ECCO using xarray took 2 seconds. Either the xarray code has improved since then, or it was because we were using NetCDF at that point.

@malmans2
Contributor Author

I think you are showing something slightly different here.
Yes, xarray and dask are slower when you can load all the data in memory (see pydata/xarray#2799).

If you are sure that you can load the data, this is faster:

da.values[indexes0, ...]

But in smart_read it looks like you are doing something different. I don't follow exactly: are you looping over chunks to extract the particles of each chunk separately? If so, is that faster than xarray, or even better, plain dask?

da.data.vindex[indexes0, ...]  # .data returns a dask array if chunked, numpy otherwise

@malmans2
Contributor Author

This is my benchmark:

import oceanspy as ospy
import numpy as np
import xarray as xr

od = ospy.open_oceandataset.from_catalog("HYCOM")
da = od._ds.Eta.isel(time=slice(0, 5))

size = 100000
iy = np.random.randint(1251, size=size)
ix = np.random.randint(701, size=size)
it = np.random.randint(5, size=size)

indexers = {dim: xr.DataArray(indexes) for dim, indexes in zip(da.dims, (it, iy, ix))}
_ = da.values  # make sure there's no caching

%%timeit
da.values[it, iy, ix]
92.1 ms ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
da.isel(indexers).values
1.49 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
da.data.vindex[it, iy, ix].compute()
1.42 s ± 75.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@malmans2
Contributor Author

Here is the same benchmark without the slicing of da performed in the previous one.
The data is big now, so I had to drop the numpy benchmark because it blows up memory.
However, both xarray and pure dask work with no problem, and they take about the same time as before:

import oceanspy as ospy
import numpy as np
import xarray as xr

od = ospy.open_oceandataset.from_catalog("HYCOM")
da = od._ds.Eta

size = 100000
iy = np.random.randint(1251, size=size)
ix = np.random.randint(701, size=size)
it = np.random.randint(5, size=size)

indexers = {dim: xr.DataArray(indexes) for dim, indexes in zip(da.dims, (it, iy, ix))}

%%timeit
da.isel(indexers).values
1.47 s ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
da.data.vindex[it, iy, ix].compute()
1.58 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@MaceKuailv
Owner

For my purposes, there tends to be some locality in the data. For example, the velocities I read are going to be at the same time, so the indexes I am interested in tend to fall within a few chunks. The problem with using values is that it may not be easy to figure out how to slice. For example, for LLC4320 data, each face is a chunk, but I may want to read only from faces 2 and 10 and not 3, 4, 5, 6, ...

Since most of the time is spent reading the data, figuring out which chunks to read is not too time consuming. If I only need to read a couple of chunks, it is faster than xarray (or at least it was).

Again, I am not claiming this is anywhere near the best way to do it. I isolated the function precisely because I would like to improve it in the future.
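
A minimal sketch of that chunk-grouping idea (not the actual smart_read implementation) might look like the following. The name chunk_grouped_read is hypothetical; the code assumes a dask-backed DataArray and a tuple of equal-length integer index arrays, as in the benchmarks above:

import numpy as np

def chunk_grouped_read(da, indexes):
    # Hypothetical sketch: read pointwise values by loading only the
    # chunks that the requested indexes actually touch.
    data = da.data  # dask array
    # chunk boundaries along each dimension, e.g. (0, 5, 10, ...)
    bounds = [np.cumsum((0,) + chunks) for chunks in data.chunks]
    # which block each point falls in, per dimension
    block_ids = [np.searchsorted(b, ix, side="right") - 1
                 for b, ix in zip(bounds, indexes)]
    keys = np.stack(block_ids, axis=-1)
    result = np.empty(len(indexes[0]), dtype=da.dtype)
    # load each needed chunk once, then index it with local offsets
    for block in np.unique(keys, axis=0):
        mask = (keys == block).all(axis=-1)
        arr = data.blocks[tuple(block)].compute()
        local = tuple(ix[mask] - b[k]
                      for ix, b, k in zip(indexes, bounds, block))
        result[mask] = arr[local]
    return result

# usage, matching the benchmarks above:
# res = chunk_grouped_read(da, (it, iy, ix))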

@malmans2
Contributor Author

Ahhh, now I understand the goal of this function.
Indeed, it's much faster in the example above! I'll see if we can implement something similar in xarray.

%%timeit
smart_read(da, (it, iy, ix))
281 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@MaceKuailv
Owner

Got quite a bit of improvement here: #51. We might revisit some day.

@malmans2
Contributor Author

Ah, you merged quickly. Final comments:

  1. The docstring is too long and still a little vague. I'd use a single sentence and get rid of all the useless stuff (e.g., if you want to say that you'll change it, say what needs to be improved; otherwise it's just adding noise, and we have more docstring than code).
  2. The TODO comment. We had a discussion about it, and it looks like you know what's going on, so replace the comment with something more meaningful, like `# TODO: we need this because ...`
  3. Not 100% sure, but I think it's better to add `dask[array]` in pyproject rather than `dask`.
  4. Why did you change the dask switch from 100 to 10?

@MaceKuailv
Owner

1. The docstring is too long and still a little vague. I'd use a single sentence and get rid of the useless stuff.

OK, I will change that.

2. The TODO comment. Replace it with something more meaningful, like `# TODO: we need this because ...`

I don't think there is anything that needs to be done about it here, but I will change that.

3. Not 100% sure, but I think it's better to add `dask[array]` in pyproject rather than `dask`.

I didn't know you could add `dask[array]` as a dependency. I will try that.
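
For illustration, the pyproject declaration might look something like this (a sketch using the standard PEP 621 layout, not the project's actual file):

# pyproject.toml (illustrative snippet)
[project]
dependencies = [
    "dask[array]",  # dask core plus the "array" extra, which pulls in numpy
]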

4. Why did you change the dask switch from 100 to 10?

Based on some rough benchmarking, for a typical number of particles of 10^4–10^7 (I tested 10^5), this is a good criterion for the switch.
