-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calc nobs divisions #288
Calc nobs divisions #288
Conversation
pinging both @wilsonbb for general technical review and @nevencaplar to make sure this works for his use case. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #288 +/- ##
==========================================
+ Coverage 94.01% 94.02% +0.01%
==========================================
Files 23 23
Lines 1203 1206 +3
==========================================
+ Hits 1131 1134 +3
Misses 72 72 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, it seems we should likely file an issue for adding a multi-partition data fixture in our testing suite. This hopefully would be more robust for catching issues like these.
Ultimately, looks good to me!
band_col = self._band_col | ||
|
||
# Get the band metadata | ||
unq_bands = np.unique(self._source[band_col]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'm not fully certain about the performance differences, but I feel self._source[band_col].unique()
might be preferable and slightly more readable
I have verified that this solves issue 287 |
Addresses #287. Makes calc_nobs use map partitions to propagate metadata instead of groupby when divisions are known. This introduces an annoying cost of having to calculate the unique bands for the resulting meta in the by_band=True case, it's slow, but works for now.