
Bug fix for named nn.Sequential in pytorch parser #848

Merged
merged 10 commits into fastmachinelearning:main on Aug 28, 2023

Conversation

JanFSchulte (Contributor)

Parsing an nn.Sequential that is a named member of a model class produces a tensor naming convention in the model's state_dict that differs from what the parser expects, since the parser had so far been tested only on unnamed nn.Sequentials. This PR catches that case and adjusts the names of the tensors imported from the state_dict accordingly. A test is added to ensure that both cases keep parsing successfully.

Type of change


  • Bug fix (non-breaking change that fixes an issue)

Tests

To reproduce: without this PR, the following will fail:


import torch.nn as nn

from hls4ml.converters import convert_from_pytorch_model
from hls4ml.utils.config import config_from_pytorch_model



# simple model with a named nn.Sequential
class SeqModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Sequential(
            nn.Conv2d(1, 20, 5),
            nn.ReLU(),
            nn.Conv2d(20, 64, 5),
            nn.ReLU(),
        )

    def forward(self, x):
        output = self.layer(x)
        return output

model = SeqModel()

config = config_from_pytorch_model(model)
output_dir = 'test_pytorch'

convert_from_pytorch_model(
        model, (None, 1, 5, 5), hls_config=config, output_dir=output_dir)
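
For contrast, the previously supported unnamed case discussed below would look like this (a minimal sketch; the variable names and output_dir here are illustrative):

# unnamed top-level nn.Sequential: the case the parser already handled
seq_model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU(),
)

seq_config = config_from_pytorch_model(seq_model)
convert_from_pytorch_model(
    seq_model, (None, 1, 5, 5), hls_config=seq_config, output_dir='test_pytorch_unnamed')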

Pytest tests have been added to verify that both cases keep working.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@jmitrevs jmitrevs added this to the v0.8.0 milestone Aug 12, 2023
@jmitrevs jmitrevs added the please test Trigger testing by creating local PR branch label Aug 12, 2023
@JanFSchulte JanFSchulte mentioned this pull request Aug 16, 2023
if "layer." in key:
layerInKeyName = True

if '_' in layer_name:
Contributor

I am not sure I follow the logic here, but I am not a PyTorch expert. Can you double-check?

JanFSchulte (Contributor, Author)

I had another look and this is a bit convoluted, but it works. The issue is that the layers inside a torch.nn.Sequential get named in the pattern nameOfSequential_n, where n just numbers the layers inside the sequential. If the sequential does not have a name, which happens if you just do model = nn.Sequential(...), the layers get names that start with a bare underscore, which hls4ml doesn't like. So we add the prefix "layer" to those in the loop over the layers, and we have to remove it again when we load the tensors. The changes in this PR account for the fact that someone could create a named nn.Sequential that is itself named layer, which clashes with our previous assumption that any layer name starting with layer_ was added by hand to work around the leading-underscore issue.
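
A minimal sketch of the two naming conventions (the class and variable names here are illustrative):

import torch.nn as nn

# named member: state_dict keys are prefixed with the attribute name
class Named(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Sequential(nn.Conv2d(1, 20, 5), nn.ReLU())

print(list(Named().state_dict().keys()))
# ['layer.0.weight', 'layer.0.bias']

# unnamed top-level nn.Sequential: keys start with the bare index, and
# torch.FX sanitizes the node names to _0, _1, ... (hence the 'layer'
# prefix that hls4ml adds by hand)
model = nn.Sequential(nn.Conv2d(1, 20, 5), nn.ReLU())
print(list(model.state_dict().keys()))
# ['0.weight', '0.bias']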

A further complication is that torch.FX reports these structured layer names with underscores, while the state_dict uses a '.', so we have to replace the '_' with '.' here. The last complication is layers that are used multiple times in the same model. torch.FX reports them as distinct layers, appending _n to the layer name for the n-th use, but the tensors in the state_dict do not carry those suffixes, so we also have to account for that.
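
A minimal sketch of the two translations just described (an illustration of the idea, not the PR's actual code):

import re

# torch.FX reports 'layer_0'; the corresponding state_dict prefix is 'layer.0'
fx_name = 'layer_0'
key_prefix = fx_name.replace('_', '.')       # -> 'layer.0'
# (the PR also has to guard against user-chosen names like 'layer' that make
# this naive replacement ambiguous)

# a module reused in forward() shows up in torch.FX as 'fc' and 'fc_1', but
# the state_dict only has 'fc.weight' and 'fc.bias'; strip the reuse suffix
node_name = 'fc_1'
base_name = re.sub(r'_\d+$', '', node_name)  # -> 'fc'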

Bit of a mess, but I have tested all these cases and this implementation catches all the edge cases that I'm aware of.

JanFSchulte (Contributor, Author)

This has now been solved much more nicely by Vladimir :)

JanFSchulte (Contributor, Author)

pre-commit.ci autofix

vloncar (Contributor) commented Aug 25, 2023

This now includes the changes from #840. There'll be a follow-up PR with the remaining bits we need to parse sPHENIX tracking GNN.

@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Aug 27, 2023
@jmitrevs jmitrevs merged commit 5cac79c into fastmachinelearning:main Aug 28, 2023
4 checks passed
Labels: please test (Trigger testing by creating local PR branch)
3 participants