
Hypernetwork - Variable Dropout Structure #4288

Closed
wants to merge 11 commits

Conversation

@aria1th (Collaborator) commented Nov 4, 2022

Note: this PR continues from here.

We were using a fixed 0.3 dropout probability; that was just an example and should be converted to a variable.

This patch supports previous HNs made either with or without dropout.

Someone might argue that dropout is not meaningful, but I have a working example made with dropout.

Prompt information:
1girl, golden hair, masterpiece, looking at viewer, school uniform
Steps: 34, Sampler: Euler a, CFG scale: 6.5, Seed: 2272754403, Size: 512x512, Model hash: 925997e9
[example image]

This example is not meant to show whether dropout is a good or bad approach.

UI change

[UI screenshot]

Users will be able to input dropout probabilities if they want.

If the dropout structure is empty or the 'use dropout' option is False, dropout won't be applied.

The first and last values must be zero.

All values must be in the range [0, 1).

The dropout structure length must match the layer structure length.
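
A minimal validation sketch of the rules above (a hypothetical helper, not the PR's actual code):

```python
def validate_dropout_structure(dropout_structure, layer_structure):
    # Hypothetical helper illustrating the rules above; not the PR's implementation.
    if len(dropout_structure) != len(layer_structure):
        raise ValueError("dropout structure length must match layer structure length")
    if dropout_structure[0] != 0 or dropout_structure[-1] != 0:
        raise ValueError("first and last dropout values must be zero")
    if any(not 0 <= p < 1 for p in dropout_structure):
        raise ValueError("all dropout probabilities must be in [0, 1)")
    return dropout_structure

validate_dropout_structure([0, 0.1, 0.2, 0], [1, 2, 2, 1])  # passes
```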

This uses shared options, which can be changed asynchronously.
HN releases should be done with this option OFF, unless the author plans to allow others to continue training from them.
Tested with all of my own HNs.
@enn-nafnlaus commented Nov 4, 2022

  • I notice you still have dropout after activation and normalization. Shouldn't it be before the activation?

  • I haven't dug far enough into your code: are you going to suggest reasonable dropout values? I would STRONGLY recommend either suggesting reasonable values given the chosen layer structure, or simply hiding the values from the user and picking reasonable ones behind the scenes. Otherwise, I guarantee people will endlessly pick bad ones and be frustrated by it.

@aria1th (Collaborator, Author) commented Nov 4, 2022

@enn-nafnlaus Typically dropout is applied after the activation, but the position relative to LayerNorm (or other norms) doesn't really matter; in practice it's used both ways.

I'd say both are practical. Actually, we might want a very general way to handle ANY HN structure, unlike the current approach. I'm still waiting for someone's brilliant torch.load solution...

In practice, for classification tasks, recommended values are between 0.2 and 0.5.

But for RL or transfer learning it varies, typically between 0 and 0.35.

If you have a large dataset (possibly containing general images too), the dropout ratio doesn't have to be large. Dropout is mainly meant for smaller datasets, to avoid overfitting.

Personally, I use small values close to the input and bigger values toward the output, for example [0, 0.1, 0.2, 0].

That way you drop only a small amount of the actual inputs, but a larger share of the hidden-layer connections.
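
For illustration, here is a rough sketch (not the PR's exact module) of how a dropout structure like [0, 0.1, 0.2, 0] can be interleaved into a small MLP, with dropout placed after the activation as described above; the leading zero corresponds to the input side and adds nothing:

```python
import torch.nn as nn

def build_mlp(dim, layer_structure, dropout_structure, activation=nn.ReLU):
    # Rough sketch only: dropout is applied after the activation, and entries
    # of 0 in dropout_structure add no Dropout layer at all.
    layers = []
    for i in range(len(layer_structure) - 1):
        layers.append(nn.Linear(int(dim * layer_structure[i]),
                                int(dim * layer_structure[i + 1])))
        if i < len(layer_structure) - 2:      # no activation after the final layer
            layers.append(activation())
        p = dropout_structure[i + 1]          # dropout_structure[0] (input side) is unused
        if p > 0:
            layers.append(nn.Dropout(p=p))
    return nn.Sequential(*layers)

model = build_mlp(768, [1, 2, 2, 1], [0, 0.1, 0.2, 0])
```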

@enn-nafnlaus commented Nov 4, 2022

As per this:

#2670 (reply in thread)

It seems we're applying far, far too much dropout given the small size of our hypernetworks; the smaller the network, the lower the dropout should be. A 5-layer network might ideally use, say, a mere 1% dropout.

Again, I think it's very important that we set reasonable defaults, and that these at least be suggested to the user, if not outright chosen behind the scenes.

@aria1th (Collaborator, Author) commented Nov 4, 2022

The dropout ratio depends heavily on how our data is represented. If we assume a shallowly decomposed latent space, the dropout ratio should be bigger. If we assume a very critical, sparsely decomposed latent space, we need a smaller dropout ratio.

The problem is: do we know the appropriate ratio? Well, no... we only rely on common practice. But note that the dropout rate should be lower the closer it is to the input.

The only thing I can say for certain is not to use a very high dropout ratio like 0.5.

But I'll suggest [0, 0.05, 0.15, 0] as the default structure, which follows the pattern of small near the input, bigger for the hidden layers.
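
As a rough sketch (hypothetical, not necessarily how the PR derives it), such a default could be generated from the layer structure with zeros at both ends and hidden values growing toward the output, capped well below 0.5:

```python
def default_dropout_structure(layer_structure):
    # Hypothetical sketch: zero at input and output, small values for hidden
    # positions that grow toward the output, capped at 0.35.
    n = len(layer_structure)
    dropout = [0.0] * n
    for i in range(1, n - 1):
        dropout[i] = round(min(0.05 + 0.10 * (i - 1), 0.35), 2)
    return dropout

print(default_dropout_structure([1, 2, 2, 1]))  # [0.0, 0.05, 0.15, 0.0]
```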

@aria1th aria1th changed the title Hypernetwork - Variable Dropout Structure [Draft]Hypernetwork - Variable Dropout Structure Nov 5, 2022
@aria1th aria1th marked this pull request as draft November 5, 2022 08:18
@aria1th aria1th marked this pull request as ready for review November 5, 2022 16:06
@aria1th aria1th changed the title [Draft]Hypernetwork - Variable Dropout Structure Hypernetwork - Variable Dropout Structure Nov 5, 2022
@enn-nafnlaus commented

So, I'm trying this out now. Thanks so much for adding this. :) That said, the implementation is rather weird. Basically, we have to lead with a dummy zero that doesn't actually mean anything, to account for the fact that there is one fewer dropout site than the number of layers? Why not just omit it altogether and have the number of dropout sites be one less than the number of layers, as it actually is?

I had to open up the code to figure out what was going on.

I'd advise having the dropout specification be what's actually used (no leading dummy zero; that will just confuse people, as it did me), and adding better documentation in the UI.
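
A sketch of the kind of conversion being suggested (hypothetical, not part of the PR): let users enter one value per actual dropout site and prepend the dummy zero internally.

```python
def user_to_internal(user_dropout, layer_structure):
    # Hypothetical conversion: the user enters one value per dropout site
    # (one fewer than the layer structure length); the leading dummy zero
    # is prepended internally so the existing format is preserved.
    assert len(user_dropout) == len(layer_structure) - 1
    return [0.0] + list(user_dropout)

user_to_internal([0.1, 0.2, 0.0], [1, 2, 2, 1])  # -> [0.0, 0.1, 0.2, 0.0]
```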

Anyway, though, thanks a bunch for adding this! :)
