Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pickling BuildsFoo #333

Closed
Jasha10 opened this issue Nov 8, 2022 · 6 comments · Fixed by #360
Closed

Support pickling BuildsFoo #333

Jasha10 opened this issue Nov 8, 2022 · 6 comments · Fixed by #360
Labels
enhancement New feature or request

Comments

@Jasha10
Copy link
Contributor

Jasha10 commented Nov 8, 2022

Currently it seems pickling the output of hydra_zen.builds is not possible:

import pickle
import re
from hydra_zen import builds
from omegaconf import OmegaConf
from pytest import raises

def foo() -> None:
    pass

BuildsFoo = builds(foo)
cfg = OmegaConf.create(BuildsFoo)

with raises(pickle.PickleError, match=re.escape("Can't pickle <class 'types.Builds_foo'>: attribute lookup Builds_foo on types failed")):
    pickle.dumps(cfg)

I believe the PickleError comes up because BuildsFoo is generated dynamically at runtime.

I wonder if it's possible to work around this unpicklability by defining a custom BuildsFoo.__reduce__ dunder method.
(Ref: Stackoverflow question on pickling dynamically generated classes)

@rsokl rsokl added the enhancement New feature or request label Nov 8, 2022
@rsokl
Copy link
Contributor

rsokl commented Nov 8, 2022

Thanks so much for bringing this to our attention, @Jasha10 !

Fortunately, dataclasses – even dynamically generated ones – already have a __reduce__ method:

>>> from hydra_zen import builds

>>> builds(dict, x=1)().__reduce__()
(<function copyreg._reconstructor(cls, base, state)>,
 (types.Builds_dict, object, None),
 {'x': 1})

So what this all seems to come down to is: the generated dataclass has to have __module__ and a __name__ that enables it to be "found". That is

  • its __module__ cannot be the default types that make_dataclass uses, it needs to reflect the caller's module
  • the referencing symbol needs to have the same name as the class object

See the following:

import pickle
import re
from hydra_zen import builds
from omegaconf import OmegaConf
from pytest import raises

def foo(x) -> None:
    pass

# Note: reference name and dataclass_name must match 
A = builds(foo, dataclass_name="A", populate_full_signature=True)
A.__module__ = __name__

cfg1 = OmegaConf.create(A)
cfg2 = OmegaConf.create(A(x=3))

assert pickle.loads(pickle.dumps(A)) is A
assert pickle.loads(pickle.dumps(A(x=2))) == A(x=2)
assert pickle.loads(pickle.dumps(cfg1)) == cfg1
assert pickle.loads(pickle.dumps(cfg2)) == cfg2

So, unfortunately, I do not think that there is anything we can do to add automatic pickling support for just, builds, et al. That being said, I could add a "module" argument that users can specify:

# contents of my_lib/configs.py
from hydra_zen import make_custom_builds_fn

builds = make_custom_builds_fn(zen_module=__name__)

# these can be pickled
Foo = builds(foo, dataclass_name="Foo")
Bar = builds(bar, zen_partial=True, dataclass_name="Bar")

I can also take care to ensure that all of hydra-zen's auto-config functionality (e.g. just(1+2j)) produces pickle-able dataclasses, so that things like A = builds(dict, x=1+2j, dataclass_name="A", zen_module=__name__) work.

Unfortunately, patterns like builds(dict, x=builds(int, 2)) will never support pickling, I fear. Also many of the patterns encouraged by hydra_zen.store will produce un-pickleable configs.

In total, hydra_zen can enable users to configure the __module__ of their dynamically-generated dataclasses 1 and we can document patterns that users can stick to in order to ensure pickle support. We can also ensure that our auto-config functionality always produces pickle-compatible dataclasses. That all being said, there will inevitably be patterns that preclude some configs from being pickled.

I am curious to know if this pickle compatibility issue is potentially a big hurdle for compatibility with upcoming omegaconf/Hydra features. Would this be a sticking point that hydra-zen users hit frequently?

Footnotes

  1. And perhaps I can do something to make specifying dataclass_name="Bar" less verbose – if I could go back in time and change some of hydra-zen's lengthier names I would. Another thought: maybe builds et al could have a mode concise auto-naming scheme. Right now Builds_<Target> is also verbose.

@Jasha10
Copy link
Contributor Author

Jasha10 commented Nov 8, 2022

Thanks for the suggested workaround!

I am curious to know if this pickle compatibility issue is potentially a big hurdle for compatibility with upcoming omegaconf/Hydra features. Would this be a sticking point that users hit frequently?

Hydra's core functionality (i.e. config composition) does not depend on pickle at all, and I don't foresee any upcoming features that leverage pickle either. As far as current use-cases for pickle+Hydra, here's what comes to mind:

  • Some of hydra's launcher plugins make use of cloudpickle to serialize job configs so that they can be sent between processes or over the network. This is probably the most common use-case for pickle among Hydra users broadly.
  • Hydra's experimental re-run functionality depends on pickle. I believe this means pytorch-ligntning's Hydra+DDP support #11617 depends on configs being picklable too, as that PR makes use of the --experimental-rerun flag. (I wonder if @jgbos has run into pickling issues when using builds+PL+DDP)
  • OmegaConf objects are generally picklable, and we try to support forward-compatibility of old pickles with new versions of omegaconf on a best-effort basis. Some people may manually pickle their configs with the goal of reanimating them later, though I suspect this usage pattern is not super common.

@Jasha10
Copy link
Contributor Author

Jasha10 commented Nov 8, 2022

Amazingly, cloudpickle gets the job done!

from cloudpickle import dumps, loads
from hydra_zen import builds
from omegaconf import OmegaConf

def foo(x) -> None:
    pass

A = builds(foo, populate_full_signature=True)

cfg1 = OmegaConf.create(A)
cfg2 = OmegaConf.create(A(x=3))

assert loads(dumps(A)) is A
assert loads(dumps(A(x=2))) == A(x=2)
assert loads(dumps(cfg1)) == cfg1
assert loads(dumps(cfg2)) == cfg2

B = builds(dict, x=builds(int, 2))

cfg3 = OmegaConf.create(B)
cfg4 = OmegaConf.create(B())

assert loads(dumps(B)) is B
assert loads(dumps(cfg1)) == cfg1
assert loads(dumps(cfg2)) == cfg2

@jgbos
Copy link
Contributor

jgbos commented Nov 8, 2022

Yep, cloudpickle is what I've had to use.

@rsokl
Copy link
Contributor

rsokl commented Nov 8, 2022

Just a heads-up that cloudpickle has issues with dataclasses: cloudpipe/cloudpickle#386

Thanks for the additional context @Jasha10 . I will plan on tackling the following:

  • Expose a zen_module argument to config-creation functions to enable users to specify __module__ for discoverability
  • Ensure that our auto-config capabilities produce pickle-compatible results
  • Write a How-To for serializing dynamically-generated configs

@rsokl
Copy link
Contributor

rsokl commented Nov 8, 2022

I was curious to see how dill fares. It doesn't preserve object identity

from dill import dumps, loads
from dataclasses import dataclass, asdict

from hydra_zen import builds

def foo(x) -> None:
    pass

A = builds(foo, populate_full_signature=True)

@dataclass
class B:
    a: A


a = A(1)
before = B(a)
save = dumps(before)
after = loads(save)

assert type(before.a) is not type(after.a)   # note the "not"!

assert asdict(before) == asdict(after)
assert asdict(after.a) == asdict(before.a)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants