ColumnMapper Quality of Life #352

Open
dougbrn opened this issue Jan 18, 2024 · 1 comment
Labels
question Further information is requested

Comments


dougbrn commented Jan 18, 2024

The concept of column mappings was an early component of TAPE's design, the idea being that an Ensemble knows which columns map to known timeseries quantities. The benefit is that any (internal) operation that relies on these quantities can find the right columns without the user needing to specify column names. We use this most heavily in Ensemble.batch with TAPE analysis functions, where the difference looks like this:

Without Column Mapping:

def tape_analysis_function(time, flux, error, band):
    result = ...  # placeholder for the actual analysis
    return result

ensemble.batch(tape_analysis_function, flux="flux_col", time="time_col", error="error_col", band="band_col")

With Column Mapping:

def tape_analysis_function(time, flux, error, band):
    result = ...  # placeholder for the actual analysis
    return result

ensemble.batch(tape_analysis_function)
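
To make the mechanism concrete, here is a minimal sketch of how a mapping could fill in a function's arguments by name. It is not TAPE's actual implementation: `ToyColumnMap`, `resolve_columns`, and the column names are all hypothetical, and the only assumption is that the function's argument names match the mapped quantities.

```python
import inspect
from dataclasses import dataclass


@dataclass
class ToyColumnMap:
    """Hypothetical stand-in for a column mapping (not TAPE's ColumnMapper)."""
    time: str
    flux: str
    error: str
    band: str


def resolve_columns(func, column_map, **overrides):
    """Return {argument name: column name} for every argument of func.

    Explicit keyword overrides win; otherwise the mapping supplies the column.
    """
    resolved = {}
    for name in inspect.signature(func).parameters:
        if name in overrides:
            resolved[name] = overrides[name]
        elif hasattr(column_map, name):
            resolved[name] = getattr(column_map, name)
        else:
            raise ValueError(f"no column known for argument {name!r}")
    return resolved


def tape_analysis_function(time, flux, error, band):
    return len(time)


# Column names here are invented for illustration.
toy_map = ToyColumnMap(time="mjd", flux="psf_flux", error="psf_flux_err", band="filter")

# With a mapping, no column names are needed at the call site.
implicit = resolve_columns(tape_analysis_function, toy_map)

# A single column can still be overridden explicitly.
explicit = resolve_columns(tape_analysis_function, toy_map, flux="ap_flux")
```

Note that a function whose argument names are not known quantities would raise here, which mirrors the limitation that the shortcut only helps functions written against the mapped names.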

This convenience applies only to TAPE analysis functions, not to externally defined user functions. A further downside is that column mapping requires users to set up the mapping manually up front, before any data is loaded. It is probably the most cumbersome part of setting up a new Ensemble workflow, with code looking like this:

ens = Ensemble()

column_mapper = ColumnMapper(id_col="object_id", time_col="mjd", flux_col="flux", err_col="err", band_col="band")

ens.from_parquet(...)

This ticket is really just asking whether column mapping adds value to TAPE or not. @nevencaplar has had issues with it from a usability perspective. Do we think users would prefer to specify manually which columns each operation uses in all cases? We have had plans to build out a suite of default mappings for the major surveys, so that users of ZTF, LSST, or PS1 data would only need to specify:

column_mapper = ColumnMapper().use_known_map("ZTF")

Even in this case, we would still be choosing one particular set of columns, and it's possible that users will want to map different columns.
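
One way to reconcile survey defaults with user choice would be to let users override individual columns on top of a known map. The following is only a sketch of that idea: `ToyColumnMap`, `KNOWN_MAPS`, `known_map`, the survey key, and every column name are hypothetical placeholders, not a real survey schema or TAPE API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyColumnMap:
    """Hypothetical stand-in for a column mapping (not TAPE's ColumnMapper)."""
    id_col: str
    time_col: str
    flux_col: str
    err_col: str
    band_col: str


# Placeholder registry: the survey key and column names are invented.
KNOWN_MAPS = {
    "SURVEY_A": ToyColumnMap(
        id_col="obj_id",
        time_col="mjd",
        flux_col="flux_default",
        err_col="flux_default_err",
        band_col="band",
    ),
}


def known_map(survey, **overrides):
    """Return the default map for a survey, with selected columns replaced."""
    try:
        base = KNOWN_MAPS[survey]
    except KeyError:
        raise ValueError(f"no known column map for survey {survey!r}") from None
    # dataclasses.replace copies the frozen default and swaps only the given fields.
    return replace(base, **overrides)


default_map = known_map("SURVEY_A")
custom_map = known_map("SURVEY_A", flux_col="flux_alt", err_col="flux_alt_err")
```

The defaults stay immutable, so one user's override cannot leak into another workflow that asks for the same survey.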

dougbrn added the question label Jan 18, 2024

dougbrn commented Jan 25, 2024

From LINCC-UP: the decision was made to deprecate column mapping. @dougbrn will investigate the scope of work needed in the internals to move away from it. Edit: It's possible we may actually keep this...

dougbrn self-assigned this Jan 25, 2024
dougbrn removed their assignment Aug 9, 2024