Refactor repo with more complete models and documentation #70

Merged: 54 commits into FluxML:master on Jul 29, 2021

Conversation

@darsnack (Member) commented Aug 28, 2020

WIP: DO NOT MERGE!

This is a major refactor to Metalhead.jl with the following changes:

  • Datasets are removed
  • Classification/prediction logic is removed
  • Updated VGG, GoogLeNet, ResNet to code from FluxModels.jl
  • Updated DenseNet, SqueezeNet (post-FluxModels.jl)
  • Added AlexNet, Inception v3

Overall, the repository will now operate as the de-facto source of vision models for the Flux ecosystem (similar to torchvision.models). The codebase has been slimmed down so that we can provide users with a lightweight dependency to get standard pre-trained models. The models are also updated to use the latest layers in Flux like SkipConnection, Parallel, and adaptive pooling.

Most importantly, the model code is now more flexible. For example, the ResNet code is generic enough that users can extend it to create ResNet variants that are not in the original paper. This is useful for research where varying model architecture parameters is important (e.g. double descent). We also provide all the ResNet variants from the paper, not just ResNet-50 like most other ecosystems. The same can be said for other models like VGG, etc.
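To give a flavour of what this flexibility looks like in practice, here is a small sketch that builds a toy ResNet-style model directly from Flux's newer layers (Parallel and adaptive pooling). The helper name, channel sizes, and block counts are illustrative only; they are not Metalhead's actual constructors.

```julia
using Flux

# Illustrative residual block; `basicblock` is not a Metalhead function.
function basicblock(inplanes, outplanes)
    branch = Chain(Conv((3, 3), inplanes => outplanes, pad = 1),
                   BatchNorm(outplanes, relu),
                   Conv((3, 3), outplanes => outplanes, pad = 1),
                   BatchNorm(outplanes))
    shortcut = inplanes == outplanes ? identity :
               Chain(Conv((1, 1), inplanes => outplanes), BatchNorm(outplanes))
    return Chain(Parallel(+, branch, shortcut), x -> relu.(x))
end

# A toy ResNet-style network: stem, two blocks, adaptive pooling, classifier.
toy_resnet = Chain(Conv((7, 7), 3 => 64, stride = 2, pad = 3),
                   BatchNorm(64, relu),
                   MaxPool((3, 3), stride = 2, pad = 1),
                   basicblock(64, 64),
                   basicblock(64, 128),
                   AdaptiveMeanPool((1, 1)),
                   Flux.flatten,
                   Dense(128, 1000))
```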

Future work:

  • Add pre-trained weights
    • alexnet (future PR)
    • VGG
      • vgg11 (future PR)
      • vgg11bn (future PR)
      • vgg13 (future PR)
      • vgg13bn (future PR)
      • vgg16 (future PR)
      • vgg16bn (future PR)
      • vgg19
      • vgg19bn (future PR)
    • ResNet
      • resnet18 (future PR)
      • resnet34 (future PR)
      • resnet50
      • resnet101 (future PR)
      • resnet152 (future PR)
    • googlenet
    • inceptionnet
    • squeezenet
    • DenseNet
      • densenet_121
      • densenet_161 (future PR)
      • densenet_169 (future PR)
      • densenet_201 (future PR)
  • Add documentation
  • Tests

This PR will supersede #69. cc @DhairyaLGandhi

@darsnack (Member Author)

Currently tests are failing without FluxML/Flux.jl#1305

@dmolina commented Aug 28, 2020

Nice, great work! However, because functionality from the previous version is missing, I think it would be better to develop this in another branch and merge it into master once the missing pieces (especially the pre-trained weights and missing models) are restored. It is a matter of style, but I prefer it that way to avoid problems for people who may be using the #master version. Don't take this to mean I think Metalhead shouldn't change; I'm convinced it should clearly be improved. But we should avoid temporarily creating problems for users.

@darsnack (Member Author)

I guess I can't mark a PR as a draft after it's already been created? But I added a disclaimer at the top of the PR. I agree, we shouldn't merge this into master until it restores all the lost functionality.

@darsnack changed the title from "Refactor with vision models from FluxModels.jl" to "WIP: Refactor with vision models from FluxModels.jl" on Aug 28, 2020
@DhairyaLGandhi (Member)

(There should be a little text field under the reviewers area marked "convert to draft")

@darsnack darsnack marked this pull request as draft September 21, 2020 15:49
@DhairyaLGandhi (Member)

Why is this removing densenet and squeezenet?

@darsnack (Member Author)

It isn't. I just haven't updated the PR to have them which is why I marked it as WIP.

@darsnack (Member Author) left a review comment

@rishabhvarshney14 looks good. I added a couple of comments to use utility functions instead of repeating code.

Are you interested in working on the other remaining models (SqueezeNet and DenseNet)?

Review threads on src/inception.jl, src/resnet.jl, and src/vgg.jl (resolved)
@dmolina commented Nov 23, 2020

A comment about the Inception commit (thank you for that; sorry, I don't know how to comment on the commit directly): I think the test has a small error, because in the test the model is googlenet, not inception.

@darsnack (Member Author)

Good catch! @rishabhvarshney14 can you address the test error too?

@rishabhvarshney14 (Contributor)

@dmolina Thanks for noticing; I have fixed that.
@darsnack Sure, I can work on the other models too and make the required changes. Thank you.

@darsnack (Member Author)

Awesome, thank you!

@darsnack (Member Author) left a review comment

Thank you! It looks great. After these changes, I'll spin some cycles on my GPU to add pre-trained weights.

Review threads on src/densenet.jl and src/vgg.jl (resolved)
@darsnack (Member Author) left a review comment

Looks good, thanks. I'll add the pre-trained weights, and then we can merge this PR!

@cyborg1995 (Contributor)

@darsnack, referencing this comment, do you mean training the models from scratch on the ImageNet dataset myself, or writing the code for training?

Since it is such a massive dataset, it would be difficult for me to train it myself, but I can certainly write the code, which you or another contributor with the appropriate resources could then use to train the models.

@darsnack (Member Author)

Yeah, even the code would help. Can you run it for a few epochs? If that's feasible, then doing that and posting a gist of the script that trains the models would allow me to just run that script on a GPU machine. Then we'd have the trained weights in short order.

@cyborg1995 (Contributor)

@darsnack, I have created a gist for training AlexNet using the architecture here, but I haven't tried running it yet. Please try to train it on your end and let me know whether it works (or not!). Currently, checkpointing of the model weights is not implemented; the weights are saved only after the model has completed training.
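For reference, a minimal sketch of what such a training script might look like with Flux at the time. The constructor name, the dummy `train_loader` of (image, one-hot label) batches, and the hyperparameters are placeholders, not the gist's actual contents.

```julia
using Flux, Metalhead, BSON

model = AlexNet() |> gpu                      # constructor name may differ in this PR
loss(x, y) = Flux.Losses.logitcrossentropy(model(x), y)
opt = ADAM(3e-4)
ps = Flux.params(model)

# Dummy stand-in for a real ImageNet data iterator of (images, one-hot labels).
train_loader = [(randn(Float32, 224, 224, 3, 2),
                 Flux.onehotbatch(rand(1:1000, 2), 1:1000))]

for epoch in 1:90
    for (x, y) in train_loader
        x, y = gpu(x), gpu(y)
        gs = gradient(() -> loss(x, y), ps)
        Flux.update!(opt, ps, gs)
    end
end

# No mid-training checkpointing here either: save once at the end.
model_cpu = cpu(model)
BSON.@save "alexnet.bson" model_cpu
```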

@darsnack (Member Author)

Thanks @cyborg1995, I'll give it a go tonight and report back.

@rishabhvarshney14 (Contributor) commented Dec 16, 2020

@darsnack @cyborg1995 Sorry if I've misunderstood this, but Transformers.jl uses the pretrained weights of Hugging Face models to provide pretrained models in Julia/Flux. It might be better to reuse the pretrained weights of other implementations here instead of training from scratch.

@darsnack (Member Author)

@rishabhvarshney14 Unfortunately, that isn't a simple translation, since a Flux model and a PyTorch model don't have the same structure.

I think it's easier, as a "once and done", to just train from scratch. It's just a question of someone with GPUs (e.g. me) having the bandwidth to do it.

@darsnack (Member Author)

Actually I take that back. There would be a lot of parsing work involved to map the pre-trained weights from PyTorch to the corresponding layers in the Flux models. If someone wants to attempt to tackle that, then that would be great. Otherwise, I'll have to find some time to train these models later this weekend.

@cyborg1995 (Contributor)

@darsnack, isn't it possible to convert the pre-trained ONNX models to Flux models using ONNX.jl?

@darsnack (Member Author)

Not quite. It's easy to read the weights in with ONNX.jl, but since ONNX serializes to more primitive operations, you need some translation code to figure out which higher level Flux layer contains the convolution etc. that you read from ONNX.
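To make that concrete, here is a rough sketch of the kind of translation glue being described, assuming the ONNX initializer tensors have already been read into a plain Dict of arrays. The dict contents, key names, and layer shapes below are made up for illustration.

```julia
using Flux

# Dummy stand-in for tensors extracted from an ONNX graph (e.g. via ONNX.jl),
# keyed by the exporting framework's parameter names.
raw = Dict("features.0.weight" => randn(Float32, 64, 3, 7, 7),
           "features.0.bias"   => randn(Float32, 64))

# ONNX stores conv kernels as (out, in, kH, kW); Flux expects (kW, kH, in, out).
# Depending on the Flux version, a spatial flip of the kernel may also be needed,
# since ONNX's Conv node is defined as a cross-correlation.
onnx2flux(w) = permutedims(w, (4, 3, 2, 1))

# Regroup the primitive tensors into a higher-level Flux layer by hand; the
# hyperparameters (stride, pad) must be read from the ONNX node attributes.
conv1 = Conv(onnx2flux(raw["features.0.weight"]), raw["features.0.bias"];
             stride = 2, pad = 3)
```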

@cyborg1995 (Contributor)

I think doing it from ONNX would be easier. I'll try to figure it out.

@darsnack (Member Author)

@cyborg1995 You might look at ONNXmutable.jl which is more up to date/maintained. If you do figure out some translation code that works, then do you mind sharing it in this issue? Having a good translation flow from ONNX to Flux is one of our goals.

@DhairyaLGandhi (Member)

I think I'd prefer to have the backbone and classifier separated since there are many models that drop the classifier to use resnet as a backbone, so this should make it easier.

Any reason why we couldn't achieve the same results of "indexing" using structs? It seems much more Julian.

@darsnack (Member Author)

It would still be easy to drop the classifier. model = Chain(resnet.backbone, myclassifier) vs model = Chain(resnet[1], myclassifier).

I would argue the struct approach is not Julian but rather Python-like. Defining tons of little classes or wrapper classes is very PyTorch-like. You achieve the same "indexing" result, but if all you need is named indexing, then I don't see why you wouldn't use a NamedTuple-like solution. If you introduce a new struct type just for naming, it comes at a higher cost: you have boilerplate to @functor the type, and you need to define the forward pass. In the current approach we need to do neither. We take advantage of the higher-level layers available in Flux to avoid any of that boilerplate.

More importantly, going with the struct solution means we only solve the naming problem for these models in this repo. It isn't a flexible solution that helps anyone else. I'd say that, more than anything, is the best reason to solve this more generically.
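For concreteness, a sketch of the two patterns being compared. The backbone and classifier here are tiny stand-ins, and the wrapper struct is hypothetical; it is not how Metalhead defines its models.

```julia
using Flux

# Tiny stand-ins; in Metalhead these would come from the model constructors.
backbone   = Chain(Conv((3, 3), 3 => 16, relu), AdaptiveMeanPool((1, 1)), Flux.flatten)
classifier = Dense(16, 1000)

# Chain/indexing style: the model is just a two-element Chain,
# so swapping the head is ordinary indexing.
model    = Chain(backbone, classifier)
finetune = Chain(model[1], Dense(16, 10))

# Struct style: naming the parts needs a wrapper type, a @functor call,
# and a hand-written forward pass.
struct BackboneClassifier{B,C}
    backbone::B
    classifier::C
end
Flux.@functor BackboneClassifier
(m::BackboneClassifier)(x) = m.classifier(m.backbone(x))
```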

@DhairyaLGandhi (Member) commented Jul 29, 2021

Well, if we see Chains everywhere anyway, it's harder to tell what exactly is happening in the forward pass, and named tuples aren't great when you have large objects inside them. Structs are good for containerisation, labelling, printing, etc. They make it easier to track how something is implemented too. I agree we should have a nicer solution to avoid the boilerplate, which is where the functional aspect is better. Not a fan of boilerplate here :)

@DhairyaLGandhi (Member)

It's good that Functors can already handle most of the plumbing we would want from these models.

@DhairyaLGandhi (Member)

We might need to ease up on the batch size for CI. Could we make it 2 or something?

@darsnack (Member Author)

Is there a performance degradation when using NamedTuples vs. Tuples for large objects? Chain already uses Tuples. TBH the biggest difference w.r.t. knowing what the forward pass does has been @mcabbott's "big show" method. Without that, it wouldn't really matter what the internal representation was, because it would just print as one big blob.

I'm still a bit unconvinced about this issue. Can we solve this issue in another PR to Metalhead or Flux (depending on which solution we select)? Right now, the outermost object is a struct. It's possible to swap out parts of the model easily like I described above. We can always enrich the repo later instead of bikeshedding on this issue now.

@DhairyaLGandhi (Member)

Wouldn't you want a separate show method for different containers? Looking at Chains is the bottleneck to me. I'd love to see this in once CI is happy.

We should try to bring the bindings for datasets etc. back and have docs using FluxML - we can do that later.

@DhairyaLGandhi (Member) left a review comment

It may be too much for GA (GitHub Actions) to handle large models on CI - we'll set up a Buildkite pipeline, but in the meanwhile we need to keep the CI load manageable.

Review threads on test/runtests.jl (resolved)
@DhairyaLGandhi (Member)

Thanks all! I'll let this be on master for a couple days before releasing.

@DhairyaLGandhi DhairyaLGandhi merged commit 88b7ca6 into FluxML:master Jul 29, 2021
@lorenzoh (Member)

Great! I'll update the install instructions in FastAI.jl as soon as a new release is tagged.

@ablaom commented Aug 24, 2021

Any chance we could get a new release tagged? This was merged quite a while ago.

@DhairyaLGandhi (Member) commented Aug 24, 2021

see FluxML/FluxBench.jl#4

I also want to know the status of the training pipeline for pretrained weights as a todo. I plan on releasing this this week or next.

@DhairyaLGandhi (Member)

We also need to get rid of some unnecessary typing, for example

struct SqueezeNet{T}

and the like, since that leads to extended compile times without offering runtime benefits.
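For illustration, the two shapes being discussed might look roughly like this (the names are placeholders, not the exact Metalhead definitions):

```julia
using Flux: Chain

# Parametric wrapper: the full (potentially enormous) Chain type becomes part of
# the wrapper's type, and the compiler specialises on all of it.
struct TypedSqueezeNet{T}
    layers::T
end

# Unparameterised wrapper: the wrapper type stays small, while the layers inside
# the Chain are still concretely typed, so the forward pass remains fast behind
# a function barrier.
struct UntypedSqueezeNet
    layers::Chain
end
```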

@darsnack (Member Author)

So would you want to just leave them untyped?

@DhairyaLGandhi (Member)

I think it would be fine to have some kind of helpful hints that are useful for dispatch, but avoid a massive type when it's not necessary for inference. Currently, this yields a massive type, which slows down the compiler and type inference significantly. Since the layers themselves are typed already, we get the inference either way. In fact, we may want to consider dropping the type parameter from Chain and checking the impact of that on small models, both latency- and runtime-wise.

@ToucheSir (Member)

Can't find my earlier comment on this, but could we define type aliases for certain basic blocks in order to help with display? AIUI that was a big reason for having dedicated types in the first place, right?
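A rough sketch of the alias idea, using a made-up two-convolution block layout; the real Metalhead blocks contain more layers, and a custom show method would still be needed to change how instances print.

```julia
using Flux: Chain, Conv, BatchNorm

# Name a particular Chain layout so it can be recognised and dispatched on
# (e.g. by a custom show method) without introducing a new wrapper struct.
const ConvBNBlock = Chain{<:Tuple{Conv,BatchNorm,Conv,BatchNorm}}

isbasicblock(x) = x isa ConvBNBlock
```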

@ablaom commented Sep 28, 2021

Are we waiting for some more work before tagging a new release?

@DhairyaLGandhi (Member)

Yeah, there are a few cleanups that we need before a full release. I'll get a PR in within a couple of days.
