[RFC] How to handle BC breaking changes on Model weights or hyper-parameters #2955
Comments
I think the future-proof way of handling this is option 3, with a factory function for versioning. I think we need at least two versions: one for the code (model version) and one for the parameters (param version). The model version would be bumped whenever there is a significant change to the code that is BC-breaking relative to the previous model.
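As a rough illustration of the two-version idea, here is a minimal sketch of such a factory function. The registry, URLs, and version numbers below are assumptions made up for this example, not an existing torchvision API:

```python
# Hypothetical sketch of option 3: a factory function keyed by an explicit
# code version ("model version") and weights version ("param version").
# All entries in the registry are illustrative placeholders.
from typing import Dict, Tuple

# Maps (model_version, param_version) -> weight URL.
_WEIGHT_REGISTRY: Dict[Tuple[int, int], str] = {
    (1, 1): "https://example.com/resnet50-v1-p1.pth",
    (2, 1): "https://example.com/resnet50-v2-p1.pth",
}

def resnet50(model_version: int = 2, param_version: int = 1) -> str:
    """Look up the weights matching (model_version, param_version).

    Bump `model_version` on BC-breaking code changes; bump `param_version`
    when only the trained parameters change.
    """
    key = (model_version, param_version)
    if key not in _WEIGHT_REGISTRY:
        raise ValueError(f"no weights registered for {key}")
    # A real factory would construct the architecture for `model_version`
    # and load the weights from the URL; here we just return the URL.
    return _WEIGHT_REGISTRY[key]
```

One design consequence: requesting an unregistered combination fails loudly instead of silently loading mismatched weights.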
Here is an example of parameters we don't currently store in the state, and it's unclear whether they should be considered part of the code or part of the params:
It's worth discussing whether it makes sense to store these inside the state of the module.
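One low-tech way to keep such hyper-parameters tied to the weights, without deciding yet whether they belong in the module's state, is to bundle them alongside the state dict in the checkpoint. The checkpoint layout below is an assumption for illustration, not an existing format:

```python
# Hypothetical sketch: ship inference-time hyper-parameters together with
# the serialized weights so they cannot drift apart. The key names are
# made up for this example.
from typing import Any, Dict, Tuple

def save_checkpoint(state_dict: Dict[str, Any],
                    hyper_params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a checkpoint dict bundling weights and hyper-parameters."""
    return {"state_dict": state_dict, "hyper_params": hyper_params}

def load_checkpoint(ckpt: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a checkpoint back into weights and hyper-parameters."""
    return ckpt["state_dict"], ckpt["hyper_params"]

# Example: detection-style thresholds stored next to the weights.
ckpt = save_checkpoint({"fc.weight": [0.1, 0.2]},
                       {"score_thresh": 0.05, "nms_thresh": 0.5})
weights, hp = load_checkpoint(ckpt)
```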
On a general note I also agree with option 3: since the weights affect the behavior of the model, you could, for the purposes of this discussion, think of them as code. With Python code we use versioning (git), so why not also with model weights? From that perspective we should rigorously version every model (whether a change is planned or not) and tag it with the version of the code that was used to generate it.

The Dropout probability parameter is an interesting one, since it doesn't affect inference but will affect fine-tuning. We should make a decision as to the level of BC-compatibility we provide. Also, on another note, this affects the entirety of the PyTorch domain libraries and also projects such as TorchServe.

From a technical perspective, a low-tech way of associating model weights with versions is by using md5 or similar. We can further encode that mapping into the link we use to store the model.
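The hash-in-the-URL idea above can be sketched in a few lines. The file name and URL are illustrative assumptions; the comment suggests md5, so that is what is used here:

```python
# Sketch of tying a weight file to its content via a short hash embedded in
# the file name, as suggested above. The URL scheme is made up.
import hashlib

def weight_fingerprint(payload: bytes, length: int = 8) -> str:
    """Return a short md5-based content tag for a serialized weight blob."""
    return hashlib.md5(payload).hexdigest()[:length]

blob = b"serialized model weights"  # stand-in for the .pth file bytes
tag = weight_fingerprint(blob)
url = f"https://example.com/resnet50-{tag}.pth"
```

Any change to the weight bytes changes the tag, so a stale cached file or a silently replaced upload is detectable by recomputing the hash at load time.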
🚀 Feature
In order to fix bugs we are sometimes forced to introduce BC-breaking changes. While the process for introducing them is clear when it comes to code changes, it is not when it comes to model weights or hyper-parameters. Thus we should define when, why, and how to introduce BC-breaking changes to model weights or model hyper-parameters.
Motivation
We have recently bumped into a few issues that motivate this. Here are a few examples:
Approaches
There are quite a few different approaches for this:
It's worth discussing whether we want to adapt our approach to the characteristics of each problem or go with a single approach for all cases. Moreover, it's worth investigating whether changes to weights need to be handled differently from changes to hyper-parameters used at inference.
cc @fmassa @cpuhrsch @vfdev-5 @mthrok