Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make SchemaError and SchemaErrors picklable #722

Merged
merged 3 commits into from
Dec 31, 2021

Conversation

matthiashuschle
Copy link
Contributor

@matthiashuschle matthiashuschle commented Dec 27, 2021

description

  • Solves SchemaError breaks pickle #713
  • The issue extends to ParserError.
  • As BaseException implements __reduce__, which expects an args attribute in the object, it needs to be overridden.
  • To abstract the logic for all three affected classes, ReducedPickleExceptionBase was created as base class for exceptions that need to convert attributes before pickling, so actual changes in the final exception classes are minimal.
    • Its __reduce__ composes the state from args, and __dict__, and converts all __dict__ elements listed in TO_STRING_KEYS to strings. The return signature of __reduce__ uses the three-tuple, that uses __new__ for object creation and then updates it via __setstate__.
    • Its __setstate__ creates a warning stating the non-conservation of state, listing converted keys.
  • Most non-primitive members are converted to string.
  • SchemaErrors.schema_errors is not primitive, but not transformed in the process. I did not encounter an unpicklable case and it is highly unlikely to become problematic in size.

rejected ideas - up for discussion

  • using sample or head instead of string conversion to reduce memory footprint of data members. The idea is to allow some degree of inspection. However, all ways to reduce data size are arbitrary to some degree, and string representation breaks any attempt to accidentally use incomplete data in further processing.

@codecov
Copy link

codecov bot commented Dec 28, 2021

Codecov Report

Merging #722 (7ee5d82) into master (eb130b4) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #722   +/-   ##
=======================================
  Coverage   98.40%   98.40%           
=======================================
  Files          39       39           
  Lines        3757     3773   +16     
=======================================
+ Hits         3697     3713   +16     
  Misses         60       60           
Impacted Files Coverage Δ
pandera/errors.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eb130b4...7ee5d82. Read the comment docs.

@cosmicBboy
Copy link
Collaborator

Thanks @matthiashuschle !

SchemaError.failure_cases seems to be limited to 10 entries. Is this documented somewhere? I'd like to add a hint to TestSchemaError._validate_error

It's not great, but the only place this is documented is the Check init signature: https://pandera.readthedocs.io/en/stable/reference/generated/pandera.checks.Check.html#pandera.checks.Check

Added a warning on unpickling (to be discussed).

This makes sense to do. Maybe state explicitly what state isn't preserved? (e.g. "Pickling ParserError does not preserve the following state: failure_cases, data, etc.")

@cosmicBboy
Copy link
Collaborator

also, be sure to set up pre-commit to catch mypy/linting/unit test errors

@matthiashuschle matthiashuschle marked this pull request as ready for review December 30, 2021 17:38
@matthiashuschle
Copy link
Contributor Author

Thanks @cosmicBboy !
The latest commit does some streamlining and I consider this now in shape for reviewing. The Linter errors also appear in the current dev branch. I don't think they are related to my changes.

@cosmicBboy cosmicBboy changed the base branch from dev to master December 31, 2021 15:16
@cosmicBboy
Copy link
Collaborator

okay, gonna merge this into master, lint tests aren't related to PR

@cosmicBboy cosmicBboy merged commit 9448d0a into unionai-oss:master Dec 31, 2021
@cosmicBboy cosmicBboy mentioned this pull request Dec 31, 2021
3 tasks
@cosmicBboy
Copy link
Collaborator

thanks @matthiashuschle, and congrats on your first contribution to pandera! 🎉🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants