Save Ensemble to Parquet #151

Closed
6 tasks done
dougbrn opened this issue Jul 11, 2023 · 1 comment · Fixed by #343
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@dougbrn
Collaborator

dougbrn commented Jul 11, 2023

Saving the ensemble to parquet would be useful for saving state. The object and source tables should probably be saved as two separate parquet files.

Necessary Components:

  • ensemble.save_ensemble() function: saves an Ensemble to disk, with subdirectories for the object and source partitions. An option (perhaps on by default) controls whether all result frames or only a subset are included.
  • ensemble.from_ensemble() function: reads the parquet-like directory structure established by save_ensemble() and loads it into a new Ensemble. I think this needs to be a separate function from read_parquet so that it can implicitly handle the addition of many result tables.
  • from_ensemble(): an Ensemble constructor function that bypasses ensemble initialization.
  • unit tests: with some consideration of how we test the save function within the actions framework. Does it just purge any created files automatically? Edit: No, it doesn't. Resolved this by using pytest with pathlib.Path temporary directories.
  • documentation: this feature is important enough to need documentation beyond just autoapi.

Non-critical Components:

  • Potentially save the column mapper and load it within the save/load internals, so that it does not need to be specified on load.
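
One possible way to do this is to write the mapping as a small JSON file next to the parquet subdirectories. The sketch below is an assumption, not the project's implementation: a plain dict stands in for the column mapper, and `save_column_mapper`, `load_column_mapper`, and the `column_mapper.json` file name are hypothetical.

```python
import json
from pathlib import Path

MAPPER_FILE = "column_mapper.json"  # hypothetical file name


def save_column_mapper(path, mapping):
    """Store the column mapping alongside the saved tables."""
    root = Path(path)
    root.mkdir(parents=True, exist_ok=True)
    (root / MAPPER_FILE).write_text(json.dumps(mapping, indent=2))


def load_column_mapper(path):
    """Return the saved mapping, or None if the directory has none."""
    mapper_path = Path(path) / MAPPER_FILE
    if not mapper_path.exists():
        return None
    return json.loads(mapper_path.read_text())
```

Returning `None` when the file is missing lets a loader fall back to a mapper supplied by the caller.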

Open Questions/Issues:

  • Schema error when saving Stetson_J result frames to parquet. These frames have a dictionary result column. There may be a simple schema argument to pass, but this would also be resolved by making stetson_J return a column per band (as has been discussed for a while). Given that stetson_J is really just a tester function for us, I'm opting not to spend time on getting it working.
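
For illustration, the "column per band" alternative could look like the sketch below. It assumes each cell of the result column holds a dict mapping band name to value, which may not match what stetson_J actually returns; `expand_band_column` and the column naming are hypothetical, and pandas is assumed to be installed.

```python
import pandas as pd


def expand_band_column(result, column, prefix=None):
    """Replace a column of per-band dicts with one flat column per band."""
    prefix = column if prefix is None else prefix
    # Build one column per dict key, aligned on the original index.
    expanded = pd.DataFrame(result[column].tolist(), index=result.index)
    expanded = expanded.add_prefix(f"{prefix}_")
    return result.drop(columns=[column]).join(expanded)


# Toy result frame: one dict per object, keyed by band.
result = pd.DataFrame(
    {"stetson_J": [{"g": 0.1, "r": 0.2}, {"g": 0.3, "r": 0.4}]},
    index=pd.Index([1, 2], name="id"),
)
flat = expand_band_column(result, "stetson_J")
```

The flattened frame contains only scalar columns, so it no longer depends on how the parquet writer infers a schema for dictionary values.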
@hombit
Contributor

hombit commented Dec 26, 2023

We should also save result frames.

@dougbrn dougbrn self-assigned this Jan 8, 2024
@dougbrn dougbrn mentioned this issue Jan 8, 2024
20 tasks