Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize data classes based on their fields only #2697

Closed
wants to merge 1 commit into from

Conversation

esantorella
Copy link
Contributor

Summary:

Context:

I think the unit test is the easiest way to understand this!

There are several cases in Ax where an instance has an attribute that

  1. can't or shouldn't be serialized; this is usually subclasses of torch.nn.Module, including BoTorch models, but can be other large and complex objects,
  2. isn't typically passed at initialization, but rather constructed afterwards, and
  3. should be an attribute rather than a property because it's too expensive to construct more than once

These classes have thus required custom serialization logic so that such attributes are not serialized. Classes that have this issue include many benchmarking classes that work with surrogates or neural nets, such as SurrogateRunner, PyTorchCNNTorchvisionRunner, and PyTorchCNNTorchvisionBenchmarkProblem, as well as MBM classes.

A simpler solution is to use dataclasses, by

  • specifying features that satisfy (1)-(3) with InitVar, and, if they are needed immediately, constructing them in the __post_init__
  • only serializing fields; InitVars are not fields

This gives more flexibility in what we serialize without taking any away: If an attribute is constructed in the post-init and should be serialized, that is still supported by marking it as a field and not an InitVar. Attributes that are constructed in the init will be serialized, even if they are modified elsewhere.

Downside

It is not quite the normal usage to use an InitVar to define a persistent non-field attribute; instead InitVar is intended for something that is needed only for initializing fields. So the usage pattern outlined in the unit test causes Pyre errors, and someone who uses an InitVar that way might be surprised that it can't be recovered. However, it might not need to be, since the other fields would be recovered. Also, InitVar is a rarely used feature.

This PR:

  • Changes Ax's JSON serialization for dataclasses to exclude non-fields
  • Adds a unit test

Differential Revision: D61665461

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Aug 22, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D61665461

@codecov-commenter
Copy link

codecov-commenter commented Aug 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.28%. Comparing base (9c2fa96) to head (2345564).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2697   +/-   ##
=======================================
  Coverage   95.28%   95.28%           
=======================================
  Files         493      493           
  Lines       47861    47876   +15     
=======================================
+ Hits        45605    45620   +15     
  Misses       2256     2256           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Summary:
Pull Request resolved: facebook#2697

# Context:

**I think the unit test is the easiest way to understand this!**

There are several cases in Ax where an instance has an attribute that
1) can't or shouldn't be serialized; this is usually subclasses of `torch.nn.Module`, including BoTorch models, but can be other large and complex objects,
2) isn't typically passed at initialization, but rather constructed afterwards, and
3) should be an attribute rather than a property because it's too expensive to construct more than once

These classes have thus required custom serialization logic so that such attributes are not serialized. Classes that have this issue include many benchmarking classes that work with surrogates or neural nets, such as `SurrogateRunner`, `PyTorchCNNTorchvisionRunner`, and `PyTorchCNNTorchvisionBenchmarkProblem`, as well as MBM classes.

A simpler solution is to use dataclasses, by
- specifying features that satisfy (1)-(3) with `InitVar`, and, if they are needed immediately, constructing them in the `__post_init__`
- only serializing fields; `InitVar`s are not fields

This gives more flexibility in what we serialize without taking any away: If an attribute is constructed in the post-init and *should* be serialized, that is still supported by marking it as a `field` and not an `InitVar`. Attributes that are constructed in the init will be serialized, even if they are modified elsewhere.

## Downside
It is not quite the normal usage to use an `InitVar` to define a persistent non-field attribute; instead `InitVar` is intended for something that is needed only for initializing fields. So the usage pattern outlined in the unit test causes Pyre errors, and someone who uses an `InitVar` that way might be surprised that it can't be recovered. However, it might not need to be, since the other fields would be recovered. Also, `InitVar` is a rarely used feature.

# This PR:

* Changes Ax's JSON serialization for dataclasses to exclude non-fields
* Adds a unit test

Reviewed By: danielcohenlive

Differential Revision: D61665461
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D61665461

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 726bb84.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants