Make leela2onnx Conv-nodes compatible with onnx2pytorch #1924

Merged
borg323 merged 9 commits into LeelaChessZero:master on Nov 14, 2023

Conversation

patrik-ha
Contributor

Adds kernel_shape as an attribute to all Conv-nodes in generated ONNX-networks, to partially address #1736. This was specifically done because the onnx2pytorch-package requires it. The ONNX-standard states that this field is optional, and if not provided it should be inferred from the dimensionality of the weights. These changes make some networks generated by leela2onnx readable by onnx2pytorch.
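
For illustration, this is what the attribute looks like on a single Conv node when built with the onnx Python helpers (a minimal sketch, separate from the lc0 C++ builder; tensor names and shapes are made up):

```python
from onnx import helper

# Hypothetical 3x3 convolution; the weight tensor "w" would have shape
# [out_channels, in_channels, 3, 3], so the kernel shape is recoverable
# from the last two weight dimensions even when the attribute is omitted.
conv = helper.make_node(
    "Conv",
    inputs=["x", "w", "b"],
    outputs=["y"],
    pads=[1, 1, 1, 1],
    # Optional per the ONNX spec, but onnx2pytorch expects it to be present:
    kernel_shape=[3, 3],
)
print(conv)
```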

I verified that it works with the network that comes bundled with the latest lc0 release (791556), at least. However, many networks still fail because onnx2pytorch hasn't implemented layer normalisation, which was the case for most of the networks I tried (mostly from https://lczero.org/play/networks/bestnets/). I also considered making a PR over there to implement it, but that project seems pretty dead.
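
Roughly the kind of round trip being verified (a sketch; the file name is a placeholder, and the 112x8x8 input shape is lc0's standard input format):

```python
import onnx
import torch
from onnx2pytorch import ConvertModel

# Placeholder path: a network exported beforehand with leela2onnx.
onnx_model = onnx.load("leela-net.onnx")
model = ConvertModel(onnx_model)

# lc0 networks take 112 input planes of 8x8; batch of one dummy position.
dummy = torch.zeros(1, 112, 8, 8)
with torch.no_grad():
    outputs = model(dummy)
```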

I'm not sure whether it's right for this project to add things like this to accommodate requirements of other projects. It is practical that the output works with other software people are likely to use, especially in this case since it's quite a small change (and follows an optional part of the standard). I'm a big advocate for making the barrier as low as possible for other people to tinker with lc0 networks (and for that it's nice that conversion between pytorch and ONNX works as well as it can), but probably not at the cost of more bloat. I haven't made any large contributions, so that judgment is up to someone else, but I wouldn't lose any sleep if this wasn't merged.

@patrik-ha patrik-ha changed the title from "Make some leela2onnx-generated networks play nice with onnx2pytorch" to "Make leela2onnx Conv-nodes compatible with onnx2pytorch" on Oct 10, 2023
@@ -135,12 +135,15 @@ std::string PopulateStdNodeFields(pblczero::NodeProto* node,
 std::string OnnxBuilder::Conv(const std::string& name,
                               const std::string& input_name,
                               const OnnxConst& kernel_weights,
-                              const OnnxConst& bias_weights, int pads) {
+                              const OnnxConst& bias_weights,
+                              int filters,
Member

Why not just do a `kernel_weights.GetDimensions().back()` instead of passing the number of filters?

Contributor Author

Agreed, that's better

@borg323
Member

borg323 commented Oct 10, 2023

You can find an optional LayerNormalization replacement in borg323@bc18bec

@patrik-ha
Contributor Author

Thanks for that. I confirmed that all the nets I tested can now be loaded correctly by onnx2pytorch using your alternative LayerNorm-definition.

Is it worth considering making this the default implementation of LayerNorm for the ONNX converter? It adds nodes and makes the ONNX graph a bit harder to read, I guess, but since it's just the same operation defined more explicitly (roughly the decomposition sketched below), it's nice that it just works without any extra hassle. But that again comes down to how much one wants to accommodate projects like onnx2pytorch (a somewhat dead project, but still the first thing most people run into when they want to convert leela ONNX nets to pytorch).

I pushed some changes to this branch making that implementation the default for LayerNorm, but they can be ignored if it shouldn't be used.
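
For reference, the replacement amounts to spelling LayerNormalization out with primitive ops and an explicit fp32 cast. A rough numpy sketch of the computation (not the exact node layout of borg323's commit; the epsilon value is illustrative):

```python
import numpy as np

def layer_norm_explicit(x, gamma, beta, eps=1e-5, axis=-1):
    """Same result as ONNX LayerNormalization, written out with the
    primitive ops the replacement graph uses (ReduceMean, Sub, Mul,
    Sqrt, Div, Add), including an explicit cast to fp32."""
    x32 = x.astype(np.float32)
    mean = x32.mean(axis=axis, keepdims=True)
    var = ((x32 - mean) ** 2).mean(axis=axis, keepdims=True)
    normed = (x32 - mean) / np.sqrt(var + eps)
    return (normed * gamma + beta).astype(x.dtype)
```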

@borg323
Member

borg323 commented Oct 11, 2023

The last commit went too far, removing the LayerNormalization() operator unconditionally. I think the best way is to add an --onnx2pytorch switch to leela2onnx to control this. And while you are there, are there any other converter options that could be useful to make available to the user?

@patrik-ha
Contributor Author

Yeah, I understand; I've reverted that and put it behind a switch instead. The user-facing switch is named explicitly for onnx2pytorch, but internally I think it's nicer that the option toggles whatever needs to be different for the converter to produce compatible networks (i.e. if the user sets onnx2pytorch, it internally enables alt_ln or whatever other behaviour is needed). The main point is that there shouldn't be an internal option that exists just for onnx2pytorch compatibility.

Regarding other relevant options, I'm not completely sure, but I don't think so. I wondered whether the behaviour addressing #1735 could be put behind a similar switch, but that doesn't look viable anyway. It might be nice to add explicit switches for the alternate implementations of mish and layer-norm since they already have internal options ready to go, but I don't know if that's relevant for most people. They could maybe be added as hidden options to avoid cluttering --help.

@borg323
Member

borg323 commented Oct 12, 2023

Do you think options to change the onnx opset, or the ability to generate fp16 models (with either fp16 or fp32 inputs/outputs), would be useful?
Regarding the alternative implementations we already have, LayerNormalization is a special case since onnxruntime accepts it for every opset even though it was only added in opset 17, so a separate switch makes sense.
This is all unrelated to this PR, just asking here since you are using the onnx models generated by lc0.

@patrik-ha
Contributor Author

I think it's already quite easy to convert fp32 leela models to fp16/mixed precision in general. The few times I have needed to, the converter I used (https://onnxruntime.ai/docs/performance/model-optimizations/float16.html) worked for every net I tried. It's also supposedly quite good at handling nodes that aren't supported in fp16, which means the leela ONNX converter doesn't have to maintain workarounds for them. If fp16 export were an explicit feature here, it would be something that had to be maintained going forward.
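
For reference, the linked onnxruntime page documents post-hoc fp16 conversion via the onnxconverter-common package, roughly as follows (exact arguments may vary by version, and the paths are placeholders):

```python
import onnx
from onnxconverter_common import float16

# Placeholder path for a net exported by leela2onnx.
model = onnx.load("leela-net.onnx")

# keep_io_types=True keeps the graph inputs/outputs in fp32 while internal
# weights and ops are converted to fp16 where supported.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, "leela-net-fp16.onnx")
```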

I haven't run into any cases where I've been explicitly limited by the opset of my model (i.e. "I couldn't run my model because my device/software/whatever doesn't support this opset"), but I've only been using them on up-to-date systems where I can install whatever I need, so I'm not a good reference point on that. It also seems like a burden to have to support. But there may be use cases I'm not seeing, and if it's something that is going to be maintained anyway, I guess it could be useful to expose as an option.

So for both of them I would say they would be nice to have, but might not be worth maintaining as explicit features (based solely on my own usage, though).

@@ -44,6 +44,7 @@ struct WeightsToOnnxConverterOptions {
   int batch_size = -1;
   int opset = 17;
   bool alt_mish = false;
+  bool alt_ln = false;
Member

Maybe rename it to something slightly less cryptic? (alternative_layer_normalization or alt_layer_norm, or at the very least add a comment.) And I'd say the same about the command line option.

Member

This comes from my original code that was only meant for tests...

Contributor Author

@patrik-ha patrik-ha Oct 13, 2023

I agree, fixed (I should have caught that myself when I reused it).

@borg323 borg323 merged commit ed4e669 into LeelaChessZero:master Nov 14, 2023
PikaCat-OuO pushed a commit to official-pikafish/px0 that referenced this pull request Dec 18, 2023
…ro#1924)

* Accomodating onnx2pytorch for conv-blocks
* alternate onnx layernorm implementation with explicit fp32 casting
* Switch for supporting onnx2pytorch

Co-authored-by: borg323 <[email protected]>
(cherry picked from commit ed4e669)