
smart read #49

Closed
malmans2 opened this issue Jun 14, 2023 · 12 comments

Comments

@malmans2
Contributor

Hi,

I'm surprised you had to code smart_read to get good performance.
I have a few tips, but first I'd like to explore xarray/dask/numpy optimisations.

Do you have a good MRE where smart_read makes a difference that I can use for benchmarking?

@MaceKuailv
Owner

Essentially, if you do many unstructured reads on a small number of chunks, the difference is very large.

I think if I generate a dataset using things like zeros, it is going to behave differently from reading from disk, and the two methods will have similar performance. So I would need an actual dataset to write an example. Does SciServer work for you?

@malmans2
Contributor Author

Yup, SciServer works.

@MaceKuailv
Owner

Here is an example:

import oceanspy as ospy
import numpy as np
import xarray as xr

od = ospy.open_oceandataset.from_catalog('HYCOM')
size = 100000
iy = np.random.randint(1251, size=size)
ix = np.random.randint(701, size=size)
it = np.random.randint(5, size=size)
# these random indexes fall within two chunks

# %%time: this block runs in about 130 ms
array = np.array(od._ds.Eta.isel(time=slice(0, 5)))
res = array[it, iy, ix]

# %%time: this block runs in a little less than 2 seconds
xrind = tuple(xr.DataArray(ind, dims=["new"]) for ind in (it, iy, ix))
res = np.array(od._ds.Eta[xrind])

The difference here is actually not very large, but if you do the same operation repeatedly, it becomes significant.
I remember from a year ago there were cases on ECCO where the former was hundreds of times faster than the latter.

@MaceKuailv
Owner

I agree this is an ad hoc thing, and the majority of the function is not called very frequently.

When I first started developing this, I remember that reading a single entry from ECCO using xarray took 2 seconds. Either the xarray code has improved since then, or it was because we were using NetCDF at that point.

@malmans2
Contributor Author

I think you are showing something slightly different here.
Yes, xarray and dask are slower when you can load all the data in memory (see pydata/xarray#2799).

If you are sure that you can load the data, this is faster:

da.values[indexes0, ...]

But in smart_read it looks like you are doing something different. I don't follow exactly: are you looping over chunks to extract the particles of each chunk separately? If so, is that faster than xarray, or even better, plain dask?

da.data.vindex[indexes0, ...]  # .data returns a dask array if chunked, numpy otherwise

@malmans2
Contributor Author

This is my benchmark:

import oceanspy as ospy
import numpy as np
import xarray as xr

od = ospy.open_oceandataset.from_catalog("HYCOM")
da = od._ds.Eta.isel(time=slice(0, 5))

size = 100000
iy = np.random.randint(1251, size=size)
ix = np.random.randint(701, size=size)
it = np.random.randint(5, size=size)

indexers = {dim: xr.DataArray(indexes) for dim, indexes in zip(da.dims, (it, iy, ix))}
_ = da.values  # make sure there's no caching

%%timeit
da.values[it, iy, ix]
92.1 ms ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
da.isel(indexers).values
1.49 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
da.data.vindex[it, iy, ix].compute()
1.42 s ± 75.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@malmans2
Contributor Author

Here is the same benchmark without the slicing of da performed in the previous one.
The data is big now, so I had to drop the numpy benchmark because it blows up memory.
However, both xarray and pure dask work with no problem, and they take about the same time as before:

import oceanspy as ospy
import numpy as np
import xarray as xr

od = ospy.open_oceandataset.from_catalog("HYCOM")
da = od._ds.Eta

size = 100000
iy = np.random.randint(1251, size=size)
ix = np.random.randint(701, size=size)
it = np.random.randint(5, size=size)

indexers = {dim: xr.DataArray(indexes) for dim, indexes in zip(da.dims, (it, iy, ix))}

%%timeit
da.isel(indexers).values
1.47 s ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
da.data.vindex[it, iy, ix].compute()
1.58 s ± 118 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@MaceKuailv
Owner

For my purposes, there tends to be some locality in the data. For example, the velocities I read are going to be at the same time, so the indexes I am interested in tend to fall within a few chunks. The problem with using values is that it may not be easy to figure out how to slice. For example, for LLC4320 data, each face is a chunk, but I may want to read only from faces 2 and 10 and not 3, 4, 5, 6, ...

Since most of the time is spent reading the data, figuring out which chunks to read is not too time consuming. If I only need to read a couple of chunks, it is faster than xarray (or at least it was).

Again, I am not claiming this is anywhere near the best way to do it. I isolated the function precisely because I would like to improve it in the future.
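
A minimal sketch of that chunk-grouping idea (not the actual smart_read implementation) might look like the following. The name chunk_grouped_read is hypothetical; the code assumes a dask-backed DataArray and a tuple of equal-length integer index arrays, as in the benchmarks above:

import numpy as np

def chunk_grouped_read(da, indexes):
    # Hypothetical sketch: read pointwise values by loading only the
    # chunks that the requested indexes actually touch.
    data = da.data  # dask array
    # chunk boundaries along each dimension, e.g. (0, 5, 10, ...)
    bounds = [np.cumsum((0,) + chunks) for chunks in data.chunks]
    # which block each point falls in, per dimension
    block_ids = [np.searchsorted(b, ix, side="right") - 1
                 for b, ix in zip(bounds, indexes)]
    keys = np.stack(block_ids, axis=-1)
    result = np.empty(len(indexes[0]), dtype=da.dtype)
    # load each needed chunk once, then index it with local offsets
    for block in np.unique(keys, axis=0):
        mask = (keys == block).all(axis=-1)
        arr = data.blocks[tuple(block)].compute()
        local = tuple(ix[mask] - b[k]
                      for ix, b, k in zip(indexes, bounds, block))
        result[mask] = arr[local]
    return result

# usage, matching the benchmarks above:
# res = chunk_grouped_read(da, (it, iy, ix))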

@malmans2
Contributor Author

Ahhh, now I understand the goal of this function.
Indeed, it's much faster in the example above! I'll see if we can implement something similar in xarray.

%%timeit
smart_read(da, (it, iy, ix))
281 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@MaceKuailv
Owner

Got quite a bit of improvement here: #51. We might revisit some day.

@malmans2
Contributor Author

Ah, you merged quickly. Final comments:

  1. The docstring is too long and still a little vague. I'd use a single sentence and get rid of all the useless stuff (e.g., if you want to say that you'll change it, say what needs to be improved; otherwise it's just adding noise, and we have more docstring than code).
  2. The TODO comment. We had a discussion about it, and it looks like you know what's going on, so replace the comment with something more meaningful, like `# TODO: we need this because ...`
  3. Not 100% sure, but I think it's better to add `dask[array]` in pyproject rather than `dask`.
  4. Why did you change the dask switch from 100 to 10?

@MaceKuailv
Owner

1. The docstring is too long and still a little vague. I'd use a single sentence and get rid of the useless stuff.

OK, I will change that.

2. The TODO comment. Replace it with something more meaningful, like `# TODO: we need this because ...`

I don't think there is anything that needs to be done about it here, but I will change that.

3. Not 100% sure, but I think it's better to add `dask[array]` in pyproject rather than `dask`.

I didn't know you could add `dask[array]` as a dependency. I will try that.
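
For illustration, the pyproject declaration might look something like this (a sketch using the standard PEP 621 layout, not the project's actual file):

# pyproject.toml (illustrative snippet)
[project]
dependencies = [
    "dask[array]",  # dask core plus the "array" extra, which pulls in numpy
]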

4. Why did you change the dask switch from 100 to 10?

Based on some rough benchmarking, for a typical number of particles of 10^4–10^7 (I tested 10^5), this is a good criterion for the switch.
