Improve Per-Band Support in Ensemble.batch #317

Closed
wilsonbb opened this issue Dec 9, 2023 · 1 comment · Fixed by #327
Labels
bug Something isn't working

Comments

@wilsonbb
Collaborator

wilsonbb commented Dec 9, 2023

It's pretty typical for a user to want to perform per-band analysis via Ensemble.batch. Currently, users can implement this themselves in their custom function via array masking, but this a) is a poor user experience and b) gives the user an opportunity to make errors.
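For illustration, the manual workaround looks roughly like this. It is a minimal sketch assuming the custom function receives flux and band columns as array-likes; the function name per_band_mean, the band labels "g"/"r", and the dict return layout are hypothetical and not TAPE's API.

```python
import numpy as np


def per_band_mean(flux, band, bands=("g", "r")):
    """Hypothetical user function: mean flux in each band via boolean masks."""
    flux = np.asarray(flux, dtype=float)
    band = np.asarray(band)
    result = {}
    for b in bands:
        mask = band == b  # True where the observation was taken in band b
        # The user has to remember edge cases, e.g. a band with no observations
        result[f"mean_{b}"] = float(flux[mask].mean()) if mask.any() else float("nan")
    return result


print(per_band_mean([1.0, 2.0, 3.0, 5.0], ["g", "r", "g", "r"]))
# {'mean_g': 2.0, 'mean_r': 3.5}
```

Every user who needs per-band results ends up rewriting this masking loop, including the empty-band special case.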

The on parameter of Ensemble.batch can already be used to group the results by both object ID and band. However, this yields an output table with a MultiIndex which, in my experiments with TAPE, initially works but eventually produces an error in the computation graph (unsurprising, since Dask lacks MultiIndex support).
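To make the MultiIndex concrete, here is the equivalent grouping in plain pandas, outside of Ensemble.batch and Dask. The column names id, band, and flux and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy source table: object 1 observed in g and r, object 2 only in g
sources = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2],
        "band": ["g", "r", "g", "r", "g", "g"],
        "flux": [1.0, 2.0, 3.0, 5.0, 4.0, 6.0],
    }
)

# Grouping on both keys indexes the result by an (id, band) MultiIndex,
# a kind of index that Dask DataFrames do not support
result = sources.groupby(["id", "band"])["flux"].mean()
print(result)
```

Note that the combination (2, "r") is simply absent rather than present as NaN, so the set of rows per object depends on which bands were observed.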

Possible solutions include:

  • @dougbrn has suggested that we could consider using a pivot table in Ensemble.batch similar to what we do in Ensemble.calc_nobs
  • Provide a specific per_band boolean parameter to Ensemble.batch that will take a user's output columns and create one per observed band (though we probably want to do some variation of the above anyway).
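One reading of the pivot-table suggestion is to turn the band level into columns so that the index is a plain object ID again. A minimal pandas sketch, assuming long-form source data with id, band, and flux columns; the flux_{band} naming is hypothetical, and this is not a description of how Ensemble.calc_nobs is implemented.

```python
import pandas as pd

sources = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2],
        "band": ["g", "r", "g", "r", "g", "g"],
        "flux": [1.0, 2.0, 3.0, 5.0, 4.0, 6.0],
    }
)

# One row per object ID, one column per band. "mean" stands in for a
# user's reduction; pivot_table's aggfunc also accepts a callable.
wide = sources.pivot_table(index="id", columns="band", values="flux", aggfunc="mean")

# Flatten the column labels into per-band names (naming scheme is illustrative)
wide.columns = [f"flux_{b}" for b in wide.columns]
print(wide)
```

Object 2 has no r-band observations, so its flux_r is NaN. The single-level index avoids the MultiIndex problem; the trade-off is that the set of band columns has to be known ahead of time, e.g. to build Dask meta.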
@wilsonbb wilsonbb added the bug Something isn't working label Dec 9, 2023
@hombit
Contributor

hombit commented Dec 11, 2023

I like this idea; it would simplify feature extraction support. However, I see some discussion points here. Would we like the user to select a list of bands? Do we really want it in batch?

  1. I don't really understand the first proposal...
  2. It would make batch quite complicated, especially with respect to the user's meta specification and the return types. I would consider alternatives, for example a separate batch_per_band function or something similar.
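A separate entry point that also lets the user select bands could look roughly like the plain-pandas sketch below. batch_per_band, its bands argument, and the {name}_{band} column naming are hypothetical names for discussion, not existing TAPE API. It assumes the user function returns a dict of scalars, which keeps the output columns predictable: one per (output name, band) pair.

```python
import pandas as pd


def batch_per_band(df, func, bands, id_col="id", band_col="band"):
    """Hypothetical helper (not TAPE API): run func on each object's
    observations in each requested band and return one row per object,
    with output columns renamed to f"{name}_{band}"."""
    per_band = []
    for band in bands:
        subset = df[df[band_col] == band]
        # func receives a single object's rows for a single band,
        # so it needs no masking logic of its own
        records = {obj_id: func(group) for obj_id, group in subset.groupby(id_col)}
        frame = pd.DataFrame.from_dict(records, orient="index")
        per_band.append(frame.add_suffix(f"_{band}"))
    # Outer-join on object ID; objects missing a band get NaN in its columns
    out = pd.concat(per_band, axis=1)
    out.index.name = id_col
    return out


sources = pd.DataFrame(
    {
        "id": [1, 1, 1, 1, 2, 2],
        "band": ["g", "r", "g", "r", "g", "g"],
        "flux": [1.0, 2.0, 3.0, 5.0, 4.0, 6.0],
    }
)


def stats(group):
    # Example user function: no band handling needed
    return {"mean": group["flux"].mean(), "n": len(group)}


result = batch_per_band(sources, stats, bands=["g", "r"])
print(result)
```

One gap this sketch leaves open is exactly the meta concern raised above: if a requested band has no observations at all, its columns are missing entirely, so a real implementation would need the output names up front.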

3 participants