Problem
Group by is an expensive operation, so I want to store my dataset on disk already split into the groups produced by a groupby operation. My use case works on individual groups: I want to take advantage of lazy loading and load only selected groups into memory for processing.
Preferred Solution
An additional parameter to pass the groups to cache when writing the dataset to disk
Or
A separate function to write the dataset as a collection of groups to file.
Alternatives considered
Treating each group as a separate dataset and writing each one to its own file. This is not suitable when the number of groups is large and each group is very small.
Additional context
It would also be great if the groupby operation natively supported grouping by multiple coordinates.
Thanks for the suggestion. This didn't get traction and we're trying to keep issues < 1000, so I'll close, but feel free to suggest again, possibly with some motivating examples.
Funnily enough, the shuffle_by op here is one way to accomplish this: #9320. Chunk boundaries will line up with group boundaries, so with the right data format this will work well.
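To illustrate the idea of group boundaries lining up with storage boundaries, here is a minimal, library-agnostic sketch using plain NumPy. It is not the shuffle_by implementation from #9320; the helper name `sort_by_group` and the sample data are hypothetical. It reorders the data so that each group is one contiguous block and records the slice for each group, so a single group can later be read without touching the rest.

```python
import numpy as np

def sort_by_group(labels, values):
    """Reorder values so that each group occupies one contiguous block.

    Hypothetical helper for illustration; not part of any library.
    """
    # A stable sort keeps the original order of elements within each group.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    sorted_values = values[order]
    # First index and size of every group in the sorted layout.
    keys, starts, counts = np.unique(
        sorted_labels, return_index=True, return_counts=True
    )
    slices = {
        k.item(): slice(int(s), int(s + c))
        for k, s, c in zip(keys, starts, counts)
    }
    return sorted_values, slices

# Made-up sample data: one group label per element.
labels = np.array(["b", "a", "c", "a", "b", "a"])
values = np.array([10, 20, 30, 40, 50, 60])

sorted_values, slices = sort_by_group(labels, values)

# Selecting a group is now a contiguous slice rather than a scattered lookup.
group_a = sorted_values[slices["a"]]
```

If the sorted array were written to a chunked on-disk format with chunk sizes equal to the group sizes, reading one group would touch only its own chunk. That part is an assumption about the storage layer and is not specified in this thread.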