SchemaError breaks pickle #713

matthiashuschle · 2021-12-18T09:48:43Z

Description
SchemaError (and probably SchemaErrors) has a couple of problems that make it impossible to use with pickle:

Pickling breaks when the schema attribute contains Check objects built from lambdas or from local functions, such as those created by Check.isin.
Unpickling always breaks, because the __init__ signature differs from that of Exception and has more than one required positional argument.
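Both failure modes can be reproduced without pandera. The sketch below is only an illustration: `HypotheticalSchemaError` is a stand-in whose `__init__` mimics the shape described above (several required positional arguments, only the message forwarded to `Exception.__init__`), and `make_isin_check` is a hypothetical factory returning a local function, similar in spirit to what `Check.isin` builds. Neither is pandera's actual code.

```python
import pickle


class HypotheticalSchemaError(Exception):
    """Stand-in: several required __init__ arguments, but only the
    message is forwarded to Exception.__init__ (so self.args has one item)."""

    def __init__(self, schema, data, message):
        super().__init__(message)
        self.schema = schema
        self.data = data


def make_isin_check(allowed):
    """Hypothetical factory returning a *local* function, which pickle
    cannot serialize by reference."""
    allowed = set(allowed)

    def _isin(values):
        return all(value in allowed for value in values)

    return _isin


# Problem 1: an attribute holding a local function cannot be pickled.
err_with_local = HypotheticalSchemaError(make_isin_check([0, 1]), [-1, 0, 1], "isin failed")
try:
    pickle.dumps(err_with_local)
    pickling_error = None
except (AttributeError, pickle.PicklingError) as exc:
    # Older Pythons raise AttributeError("Can't pickle local object ..."),
    # newer ones may raise PicklingError instead.
    pickling_error = exc

# Problem 2: even with picklable attributes, unpickling fails. The default
# BaseException.__reduce__ records only self.args == ("dtype mismatch",),
# so the unpickler calls HypotheticalSchemaError("dtype mismatch") and the
# other required arguments are missing.
err_plain = HypotheticalSchemaError("schema repr", [-1, 0, 1], "dtype mismatch")
payload = pickle.dumps(err_plain)  # pickling itself succeeds here
try:
    pickle.loads(payload)
    unpickling_error = None
except TypeError as exc:
    unpickling_error = exc

print(type(pickling_error).__name__, "|", unpickling_error)
```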

This matters because when a subprocess raises an uncaught exception (which may well be the point of using pandera), the exception object becomes part of the return value, which is sent to the parent process via pickle. This use case also raises a third issue: the pipes used for this limit each pickled object to 2 GiB, and the data attached to the exception can easily exceed that.
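The transport step can be sketched in a single process: `multiprocessing` `Connection` objects pickle on `send()` and unpickle on `recv()`, so passing an exception through a `Pipe` exercises the same round trip a worker's exception makes on its way back to the parent. `HypotheticalSchemaError` and `send_through_pipe` are illustrative names only (not pandera or multiprocessing API), and the sketch does not exercise the 2 GiB limit.

```python
import multiprocessing


class HypotheticalSchemaError(Exception):
    """Stand-in exception with several required __init__ arguments."""

    def __init__(self, schema, data, message):
        super().__init__(message)
        self.schema = schema
        self.data = data


def send_through_pipe(obj):
    """Send obj across a one-way Pipe and return what arrives, or the
    TypeError raised while unpickling it on the receiving end."""
    # With duplex=False, the first connection can only receive and the
    # second can only send.
    receiving_end, sending_end = multiprocessing.Pipe(duplex=False)
    try:
        sending_end.send(obj)  # pickles obj
        try:
            return receiving_end.recv()  # unpickles obj
        except TypeError as exc:
            return exc
    finally:
        receiving_end.close()
        sending_end.close()


# A plain exception survives the trip...
ok = send_through_pipe(ValueError("validation failed"))
# ...but the stand-in turns into a TypeError raised during unpickling.
broken = send_through_pipe(
    HypotheticalSchemaError("schema repr", [-1, 0, 1], "validation failed")
)
print(repr(ok), "|", repr(broken))
```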

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera.
(optional) I have confirmed this bug exists on the master branch of pandera.

Code Sample

import pickle
import pandas as pd
from pandera import DataFrameSchema, Check, Column
from pandera.errors import SchemaError
data = pd.DataFrame({"a": [-1, 0, 1]})

# case 1 with Check.isin:
schema = DataFrameSchema({
    "a": Column(int, Check.isin([0, 1]))
})
try:
    schema.validate(data)
except SchemaError as exc:
    # raises AttributeError
    pickle.loads(pickle.dumps(exc))

# case 1 with lambda:
schema = DataFrameSchema({
    "a": Column(int, Check(lambda x: x > 0))
})
try:
    schema.validate(data)
except SchemaError as exc:
    # raises PicklingError
    pickle.loads(pickle.dumps(exc))

# case 2:
schema = DataFrameSchema({
    "a": Column(str)
})
try:
    schema.validate(data)
except SchemaError as exc:
    # raises TypeError during unpickling
    pickle.loads(pickle.dumps(exc))

Expected behavior

None of these calls should raise an exception; the exception object could then be handed to the parent process in a multiprocessing setting. Since the actual data and schema attributes cannot be kept in this case, they should be replaced with something picklable.

Desktop:

OS: Ubuntu 20.04
Python: 3.7, 3.8

Proposal

The unpickling issue can be solved by implementing __reduce__.
The problem with unpicklable content and possibly huge attributes cannot be solved while preserving those attributes. My proposal would be to implement __getstate__ to map all attributes in __dict__ to their string representations.
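A minimal sketch of the proposal, assuming a hypothetical `PicklableSchemaError` with the same three-argument `__init__` as the stand-ins above; it is not the code merged for this issue. `__reduce__` tells pickle to rebuild the instance with placeholder arguments plus the original message (it assumes `self.args` holds exactly one item), and `__getstate__` supplies the attributes as strings, which are then set on the rebuilt instance.

```python
import pickle


class PicklableSchemaError(Exception):
    """Hypothetical exception following the proposal: __reduce__ fixes
    reconstruction, __getstate__ stringifies the attributes."""

    def __init__(self, schema, data, message):
        super().__init__(message)
        self.schema = schema
        self.data = data

    def __getstate__(self):
        # Replace every instance attribute by its string representation,
        # so unpicklable objects (e.g. checks built from lambdas) and
        # large data objects are not pickled themselves.
        return {key: str(value) for key, value in self.__dict__.items()}

    def __reduce__(self):
        # (callable, args, state): the unpickler calls
        # PicklableSchemaError(None, None, *self.args), then sets the
        # stringified attributes on the new instance.
        return (self.__class__, (None, None, *self.args), self.__getstate__())


err = PicklableSchemaError({"a": lambda x: x > 0}, [-1, 0, 1], "validation failed")
restored = pickle.loads(pickle.dumps(err))
print(repr(restored), restored.data)
```

The trade-off is the one the proposal accepts: after the round trip the receiver gets readable text for `schema` and `data`, not the live objects.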

If there is consensus on the desired behavior, I can implement it.


cosmicBboy · 2021-12-18T21:43:25Z

Thanks for raising this issue @matthiashuschle!

Your proposal looks good, and thanks in advance for the contribution!

Let me know if you have any questions about dev environment setup.


cosmicBboy · 2021-12-31T16:55:54Z

fixed by #722

matthiashuschle added the "bug" label (Something isn't working) Dec 18, 2021

cosmicBboy assigned matthiashuschle Dec 18, 2021

matthiashuschle mentioned this issue Dec 27, 2021

make SchemaError and SchemaErrors picklable #722 (merged)

matthiashuschle added a commit to matthiashuschle/pandera that referenced this issue Dec 30, 2021

refactor unionai-oss#713

7ee5d82

cosmicBboy pushed a commit that referenced this issue Dec 31, 2021

make SchemaError and SchemaErrors picklable (#722)

9448d0a

* make SchemaError and SchemaErrors picklable
* make ParserError picklable
* refactor #713

cosmicBboy closed this as completed Dec 31, 2021
