Refactor repo with more complete models and documentation #70

darsnack · 2020-08-28T15:55:06Z

~~WIP: DO NOT MERGE!~~

This is a major refactor to Metalhead.jl with the following changes:

Datasets are removed
Classification/prediction logic is removed
Updated VGG, GoogLeNet, ResNet to code from FluxModels.jl
Updated DenseNet, SqueezeNet (post-FluxModels.jl)
Added AlexNet, Inception v3

Overall, the repository will now operate as the de-facto source of vision models for the Flux ecosystem (similar to torchvision.models). The codebase has been slimmed down so that we can provide users with a lightweight dependency to get standard pre-trained models. The models are also updated to use the latest layers in Flux like SkipConnection, Parallel, and adaptive pooling.

Most importantly, the model is code is now more flexible. For example, the ResNet code is generic enough that users can extend it to create variants of ResNet that are not in the original paper. This is useful for research where varying model architecture parameters is important (e.g. double descent). We also provide all the ResNet variants from the paper, not just ResNet-50 like most other ecosystems. The same can be said for other models like VGG, etc.

Future work:

This PR will supersede #69. cc @DhairyaLGandhi

darsnack · 2020-08-28T15:56:11Z

Currently tests are failing without FluxML/Flux.jl#1305

dmolina · 2020-08-28T17:02:13Z

Nice, great work!. However, because there is missing functionallity from previous version, I think it could be better to update it in another branch, and when the missing functionality (specially the pre-trained weights and missing models) merge it with master. It is a topic of style, but I way it to avoid problems to people that could be using #master version. Don't think I don't believe that MetalHead should be changed, I'm convinced that it should be clearly improved.But we must avoid temporarily generating problems to users.

darsnack · 2020-08-28T17:17:51Z

I guess I can't mark a PR as a draft after it's already been created? But I added a disclaimer at the top of the PR. I agree, we shouldn't merge this into master until it restores all the lost functionality.

DhairyaLGandhi · 2020-09-13T21:17:22Z

(There should be a little text field under the reviewers area marked "convert to draft")

DhairyaLGandhi · 2020-09-28T05:37:26Z

Why is this removing densenet and squeezenet?

darsnack · 2020-09-28T11:00:51Z

It isn't. I just haven't updated the PR to have them which is why I marked it as WIP.

darsnack

@rishabhvarshney14 looks good. I added a couple of comments to use utility functions instead of repeating code.

Are you interested in working on the other remaining models (SqueezeNet and DenseNet)?

src/inception.jl

src/resnet.jl

src/vgg.jl

dmolina · 2020-11-23T17:19:51Z

A comment about the commit about inception, thank you for that, sorry but I do not know how to comment the commit, I think the test has a small error, because in the test the model is googlenet, not inception.

darsnack · 2020-11-23T17:24:15Z

Good catch! @rishabhvarshney14 can you address the test error too?

rishabhvarshney14 · 2020-11-23T17:41:45Z

@dmolina Thanks for noticing I have fixed that.
@darsnack Sure I can work on other models too and also make the required changes Thank You.

darsnack · 2020-11-24T22:25:37Z

Awesome, thank you!

darsnack

Thank you! It looks great. After these changes, I'll spin some cycles on my GPU to add pre-trained weights.

src/densenet.jl

src/vgg.jl

darsnack

Looks good, thanks. I'll add the pre-trained weights, and then we can merge this PR!

cyborg1995 · 2020-12-14T19:51:09Z

@darsnack, referencing this comment, do you mean to train the models from scratch on Imagenet dataset myself or writing the code for training?

Since it is such a massive dataset, It would be difficult for me to train it myself but I can surely write the code which can be used by you/another contributor with the appropriate resources to train the model.

darsnack · 2020-12-14T21:51:35Z

Yeah even the code would help. Can you run for a few epochs? If that's feasible, then just doing that and posting a gist of the script that trains the models will allow me to just run that script on GPU machine. Then we'd have the trained weights in short order.

cyborg1995 · 2020-12-15T12:37:07Z

@darsnack, I have created the gist for training alexnet using the architecture here but haven't tried running it though.
Please try to train it on your end and let me know if it works (or not!)
Currently, checkpointing the model weights is not implemented.
The weights are saved only after the model has completed training.

darsnack · 2020-12-15T14:30:02Z

Thanks @cyborg1995, I'll give it a go tonight and report back.

rishabhvarshney14 · 2020-12-16T09:31:17Z

@darsnack @cyborg1995 Sorry if I misunderstand this but in Transformers.jl it uses pretrained weights of Hugging Face models to provide pretrained models in Julia/Flux. It might be better if we use pretrained weights of other models and use it here instead of training it from the start.

darsnack · 2020-12-16T13:26:27Z

@rishabhvarshney14 Unfortunately, that isn't a simple translation since a Flux model and a PyTorch model aren't the same structure.

I think it's an easier "once and done" to just train from scratch. It's just a question of someone with GPUs (e.g. me) to have the bandwidth to do it.

darsnack · 2020-12-16T14:07:47Z

Actually I take that back. There would be a lot of parsing work involved to map the pre-trained weights from PyTorch to the corresponding layers in the Flux models. If someone wants to attempt to tackle that, then that would be great. Otherwise, I'll have to find some time to train these models later this weekend.

cyborg1995 · 2020-12-17T01:53:47Z

@darsnack, isn't it possible to convert the pre-trained ONNX models to Flux models using ONNX.jl

darsnack · 2020-12-17T02:02:50Z

Not quite. It's easy to read the weights in with ONNX.jl, but since ONNX serializes to more primitive operations, you need some translation code to figure out which higher level Flux layer contains the convolution etc. that you read from ONNX.

cyborg1995 · 2020-12-17T02:25:44Z

I think doing it from ONNX would be easier. I'll try to figure it out.

darsnack · 2020-12-17T12:02:25Z

@cyborg1995 You might look at ONNXmutable.jl which is more up to date/maintained. If you do figure out some translation code that works, then do you mind sharing it in this issue? Having a good translation flow from ONNX to Flux is one of our goals.

DhairyaLGandhi · 2021-07-29T14:17:55Z

I think I'd prefer to have the backbone and classifier separated since there are many models that drop the classifier to use resnet as a backbone, so this should make it easier.

Any reason why we couldn't achieve the same results of "indexing" using structs. It seems much more Julian.

darsnack · 2021-07-29T14:28:03Z

It would still be easy to drop the classifier. model = Chain(resnet.backbone, myclassifier) vs model = Chain(resnet[1], myclassifier).

I would argue the struct approach is not Julian and more Python. Defining tons of little classes or wrapper classes is very PyTorch-like. You achieve the same "indexing" result, but if all you need is named indexing, then I don't see why you wouldn't use a NamedTuple-like solution. If you introduce a new struct type just for naming, it comes at a higher cost. You have boilerplate to @functor the type, and you need to define the forward pass. In the current approach we need to do neither. We take advantage of the higher level layers available in Flux to avoid doing any of that boiler plate.

More importantly, going with the struct solution means we only solve the naming problem for these models in this repo. It isn't a flexible solution that helps anyone else. I'd say that on top of anything is the best reason to solve this more generically.

DhairyaLGandhi · 2021-07-29T16:12:22Z

Well if we see Chains everywhere anyway, its harder to tell what exactly is happening in the forwards pass, and named tuples aren't great for when you have large objects inside them. structs are good for containerisation and labelling, printing etc. It makes it easier to track how something is implemented too. I agree we should have a nicer solution to avoid the boilerplate, which is where the functional aspect is better. Not a fan of boilerplate here :)

DhairyaLGandhi · 2021-07-29T16:14:03Z

Its good that Functors can already handle most of the plumbing we would want from these models

DhairyaLGandhi · 2021-07-29T16:19:21Z

We might need to ease up on the batch size for CI. Could we make it 2 or something?

darsnack · 2021-07-29T16:25:39Z

Is there a performance degradation when using NamedTuples vs. Tuples for large objects? Chain already uses Tuples. TBH the biggest difference w.r.t. knowing what the forward pass does has been @mcabbott's "big show" method. Without that, it wouldn't really matter what the internal representation was cause it would just print like a big blob.

I'm still a bit unconvinced about this issue. Can we solve this issue in another PR to Metalhead or Flux (depending on which solution we select)? Right now, the outermost object is a struct. It's possible to swap out parts of the model easily like I described above. We can always enrich the repo later instead of bikeshedding on this issue now.

DhairyaLGandhi · 2021-07-29T16:37:36Z

Wouldn't you want a separate show method for different containers? Looking at Chains is the bottleneck to me. I'd love to see this in once CI is happy.

We should try to bring in bindings for datasets etc back and have docs using FluxML - we can do that later.

DhairyaLGandhi

It may be too much for GA to handle large models on CI - we'll set up a buildkite but in the meanwhile.

test/runtests.jl

DhairyaLGandhi · 2021-07-29T17:22:50Z

Thanks all! I'll let this be on master for a couple days before releasing.

lorenzoh · 2021-07-29T23:34:26Z

Great! I'll update the install instructions in FastAI.jl as soon as a new release is tagged.

ablaom · 2021-08-24T03:03:29Z

Any chance we could a new release tagged? This was merged quite a while ago.

DhairyaLGandhi · 2021-08-24T17:13:57Z

see FluxML/FluxBench.jl#4

I also want to know the status of the training pipeline for pretrained weights as a todo. I plan on releasing this this week or next.

DhairyaLGandhi · 2021-08-24T18:34:56Z

We also need to get rid of some unnecessary typing for example

Metalhead.jl/src/squeezenet.jl

Line 60 in 3f458cd

struct SqueezeNet{T}

and the like, since that leads to an extended compile time without offering runtime benefits.

darsnack · 2021-08-24T19:16:51Z

So would you want to just leave them untyped?

DhairyaLGandhi · 2021-08-24T22:01:09Z

I think it would be fine to have some kind of helpful hints that are useful for dispatch, but avoid a massive type when its not necessary for inference. Currently, this would yield a massive type, which slows down the compiler and type inference significantly. Since the layers themselves are typed already, we get the inference either way. In fact, we may want to consider dropping the type param from Chain and checking the impact of that on small models. Both latency and runtime wise.

ToucheSir · 2021-08-24T23:49:09Z

Can't find my earlier comment on this, but could we define type aliases for certain basic blocks in order to help with display? AIUI that was a big reason for having dedicated types in the first place, right?

ablaom · 2021-09-28T02:11:55Z

Are we waiting for some more work before tagging a new release?

DhairyaLGandhi · 2021-09-28T05:08:13Z

Yeah, there's a few cleanups that we need before a full release. I'll get a pr in a couple days.

darsnack changed the title ~~Refactor with vision models from FluxModels.jl~~ WIP: Refactor with vision models from FluxModels.jl Aug 28, 2020

darsnack marked this pull request as draft September 21, 2020 15:49

This was referenced Oct 26, 2020

Add vision models FluxML/FluxML-Community-Call-Minutes#2

Closed

Refresh Model Zoo FluxML/FluxML-Community-Call-Minutes#9

Open

This was referenced Nov 15, 2020

ONNX import/export FluxML/FluxML-Community-Call-Minutes#10

Closed

added untrained model Inception v3 darsnack/FluxModels.jl#19

Closed

darsnack commented Nov 23, 2020

View reviewed changes

src/inception.jl Outdated Show resolved Hide resolved

src/resnet.jl Outdated Show resolved Hide resolved

src/resnet.jl Outdated Show resolved Hide resolved

src/resnet.jl Outdated Show resolved Hide resolved

src/vgg.jl Outdated Show resolved Hide resolved

darsnack commented Nov 26, 2020

View reviewed changes

src/densenet.jl Outdated Show resolved Hide resolved

src/densenet.jl Outdated Show resolved Hide resolved

src/densenet.jl Show resolved Hide resolved

src/densenet.jl Outdated Show resolved Hide resolved

src/vgg.jl Outdated Show resolved Hide resolved

darsnack commented Nov 26, 2020

View reviewed changes

darsnack added 2 commits July 29, 2021 09:34

Add gradient tests

37538af

Break classifier and backbone into two sub-Chains

a587b3d

DhairyaLGandhi closed this Jul 29, 2021

DhairyaLGandhi reopened this Jul 29, 2021

Reduce BS

f9e7be5

DhairyaLGandhi reviewed Jul 29, 2021

View reviewed changes

skip backwards pass tests

07925ea

ToucheSir mentioned this pull request Jul 29, 2021

Support NamedTuples for Container Layers FluxML/Flux.jl#1680

Closed

DhairyaLGandhi merged commit 88b7ca6 into FluxML:master Jul 29, 2021

lorenzoh mentioned this pull request Sep 3, 2022

fastai parity FluxML/FastAI.jl#259

Open

37 tasks

darsnack deleted the darsnack/vision-refactor branch October 7, 2021 15:25

ToucheSir mentioned this pull request Oct 13, 2021

flatten layers appear as closures #94

Closed

darsnack mentioned this pull request Feb 10, 2022

Very slow first-time gradient calculation FluxML/Zygote.jl#1119

Open

Refactor repo with more complete models and documentation #70

Refactor repo with more complete models and documentation #70

Conversation

darsnack commented Aug 28, 2020 • edited by DhairyaLGandhi Loading

darsnack commented Aug 28, 2020

dmolina commented Aug 28, 2020

darsnack commented Aug 28, 2020

DhairyaLGandhi commented Sep 13, 2020

DhairyaLGandhi commented Sep 28, 2020

darsnack commented Sep 28, 2020

darsnack left a comment

Choose a reason for hiding this comment

dmolina commented Nov 23, 2020

darsnack commented Nov 23, 2020

rishabhvarshney14 commented Nov 23, 2020

darsnack commented Nov 24, 2020

darsnack left a comment

Choose a reason for hiding this comment

darsnack left a comment

Choose a reason for hiding this comment

cyborg1995 commented Dec 14, 2020

darsnack commented Dec 14, 2020

cyborg1995 commented Dec 15, 2020

darsnack commented Dec 15, 2020

rishabhvarshney14 commented Dec 16, 2020 • edited Loading

darsnack commented Dec 16, 2020

darsnack commented Dec 16, 2020

cyborg1995 commented Dec 17, 2020

darsnack commented Dec 17, 2020

cyborg1995 commented Dec 17, 2020

darsnack commented Dec 17, 2020

DhairyaLGandhi commented Jul 29, 2021

darsnack commented Jul 29, 2021

DhairyaLGandhi commented Jul 29, 2021 • edited Loading

DhairyaLGandhi commented Jul 29, 2021

DhairyaLGandhi commented Jul 29, 2021

darsnack commented Jul 29, 2021

DhairyaLGandhi commented Jul 29, 2021

DhairyaLGandhi left a comment

Choose a reason for hiding this comment

DhairyaLGandhi commented Jul 29, 2021

lorenzoh commented Jul 29, 2021

ablaom commented Aug 24, 2021

DhairyaLGandhi commented Aug 24, 2021 • edited Loading

DhairyaLGandhi commented Aug 24, 2021

darsnack commented Aug 24, 2021

DhairyaLGandhi commented Aug 24, 2021

ToucheSir commented Aug 24, 2021

ablaom commented Sep 28, 2021

DhairyaLGandhi commented Sep 28, 2021

darsnack commented Aug 28, 2020 •

edited by DhairyaLGandhi

Loading

rishabhvarshney14 commented Dec 16, 2020 •

edited

Loading

DhairyaLGandhi commented Jul 29, 2021 •

edited

Loading

DhairyaLGandhi commented Aug 24, 2021 •

edited

Loading