Hypernetwork - Variable Dropout Structure #4288
Conversation
This uses shared options, which can be changed async. HN Release should be done with this option OFF, unless they're planning to allow others to continue training from it.
Tested with all of my own HNs.
@enn-nafnlaus Typically dropout is applied after the activation, but for LayerNorm (or other norms) the order doesn't really matter; in practice it's used both ways, and I'd say both are practical. Ideally we'd want a very general way to handle ANY HN structure, not the current approach; I'm still waiting for a brilliant idea there. In practice, for classification tasks the recommended values are between 0.2 and 0.5, but for RL or transfer learning the values vary, usually between 0 and 0.35. If you have a large dataset (possibly containing general images too), the dropout ratio does not have to be large; dropout is meant to avoid overfitting on smaller datasets. Personally I use small values close to the input and bigger values toward the output, for example [0, 0.1, 0.2, 0]: you drop only a small amount of the actual inputs, but a larger amount of the hidden-layer connections.
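A minimal sketch of what per-layer dropout after the activation could look like, assuming a hypothetical `build_mlp` helper (not the actual webui code); `layer_structure` entries are treated as multipliers of the base dimension, and `dropout_structure` follows the [0, 0.1, 0.2, 0] convention above, with the first and last entries kept at zero.

```python
import torch.nn as nn

def build_mlp(dim, layer_structure, dropout_structure, activation=nn.ReLU):
    # layer_structure:   multipliers of `dim`, e.g. [1, 2, 2, 1]
    # dropout_structure: per-layer dropout probabilities of the same length,
    #                    e.g. [0, 0.1, 0.2, 0] (first and last stay 0)
    assert len(layer_structure) == len(dropout_structure)
    layers = []
    for i in range(len(layer_structure) - 1):
        layers.append(nn.Linear(int(dim * layer_structure[i]),
                                int(dim * layer_structure[i + 1])))
        if i + 1 < len(layer_structure) - 1:      # no activation/dropout after the output layer
            layers.append(activation())           # dropout goes after the activation
            if dropout_structure[i + 1] > 0:
                layers.append(nn.Dropout(p=dropout_structure[i + 1]))
    return nn.Sequential(*layers)
```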
As per this: it seems we're applying far, far too large dropouts given the small size of our hypernetworks; the smaller the network, the lower the dropout should be. A 5-layer network might ideally use, say, a mere 1% dropout. Again, I think it's very important that we set reasonable defaults, and that these at least be suggested to the user, if not outright chosen behind the scenes.
The dropout ratio depends highly on the representation of our data. If we assume a shallowly decomposed latent space, the dropout ratio should be bigger; if we assume a very critical, sparsely decomposed latent space, we need a smaller dropout ratio. The problem is: do we know the appropriate ratio? Well, no... we can only rely on common practice. But note that the dropout rate should be lower the closer it is to the input. The only certain thing I can say is not to use a dropout ratio that is too high, like 0.5. I'll suggest [0, 0.05, 0.15, 0] as the default structure, which matches "small for input, big for hidden layers".
So, I'm trying this out now. Thanks so much for adding this. :) That said, the implementation is rather weird. Basically, we have to lead with a dummy zero that doesn't actually mean anything, to account for the fact that there is one fewer dropout site than the number of layers? Why not just omit it altogether and have the number of dropout sites be one less than the number of layers, as it actually is? I had to open up the code to figure out what was going on. I'd advise having the dropout specification be what's actually used (no leading dummy zero; that will just confuse people, as it did me), and having better documentation in the UI. Anyway, thanks a bunch for adding this! :)
Note: this PR continues from here
We were using a fixed 0.3 dropout probability; it was just an example, and should be converted to a variable value.
This patch includes support for previous HNs made with or without dropout.
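A rough sketch of what the backward-compatibility handling could look like when loading older hypernetworks; the field names (`dropout_structure`, `use_dropout`) and the fixed-0.3 fallback are assumptions based on the description above, not the actual file format.

```python
def infer_dropout_structure(hn_state, layer_structure):
    # Hypothetical loader shim for older hypernetwork files.
    if "dropout_structure" in hn_state:            # new-style file: explicit per-layer list
        return hn_state["dropout_structure"]
    if hn_state.get("use_dropout", False):         # old-style file: fixed 0.3 on hidden layers
        return [0.0] + [0.3] * (len(layer_structure) - 2) + [0.0]
    return [0.0] * len(layer_structure)            # file from before dropout support: no dropout
```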
Someone might argue that dropout is not meaningful - but no, I have a working example trained with dropout.
Prompt information:
1girl, golden hair, masterpiece, looking at viewer, school uniform
Steps: 34, Sampler: Euler a, CFG scale: 6.5, Seed: 2272754403, Size: 512x512, Model hash: 925997e9
This example does not say whether dropout is a good or bad way to do it.
UI change
Users will be able to input dropout probabilities if they want.
If the dropout structure is empty or the 'use dropout' option is False, dropout won't be applied.
The first and last values must be zero.
All values must be in [0, 1).
The dropout structure length must match the layer structure length (see the sketch below).
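A minimal sketch of the validation rules listed above; the function name and the "empty means no dropout" fallback are illustrative assumptions, not the actual UI code.

```python
def validate_dropout_structure(dropout_structure, layer_structure, use_dropout):
    # Returns a usable per-layer dropout list, or raises on invalid input.
    if not use_dropout or not dropout_structure:
        return [0.0] * len(layer_structure)          # dropout disabled entirely
    if len(dropout_structure) != len(layer_structure):
        raise ValueError("Dropout structure length must match layer structure length.")
    if dropout_structure[0] != 0 or dropout_structure[-1] != 0:
        raise ValueError("First and last dropout values must be zero.")
    if any(not (0 <= p < 1) for p in dropout_structure):
        raise ValueError("All dropout values must be in [0, 1).")
    return list(dropout_structure)
```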