`nothing` does not correspond to updating the state with a zero gradient. #140

CarloLucibello · 2023-04-07T14:32:02Z

As mentioned in #137 (comment), when a `nothing` gradient is encountered, the `apply!` rule is not called at all and the optimiser state is not updated. So these two calls

Optimisers.update!(st, x, nothing)
Optimisers.update!(st, x, zero(x))

give different results. In the same discussion, @mcabbott said:

> I suspect this is more an accident than a design, but I'm not sure it's an awful one.
>
> If you are doing ordinary AD and happen to get an array of zeros on some batch, probably you do want that to update the momenta etc.
>
> But you won't get `nothing` just because of the data in that batch. Instead, you'll get it because you are e.g. doing transfer learning, or the generator & discriminator on even/odd steps, or something like that. You will get `nothing` not for one array, but for a whole part of the model. And it seems like you probably don't want to update the momenta for the part of the model not being trained, but instead just ignore them completely.

But I think these examples are better handled by having the `opt_tree` cover only part of the model, or by using separate trees for the discriminator and the generator.
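To make the difference between the two `update!` calls concrete, here is a minimal sketch using `Optimisers.Momentum`. The learning rate, momentum coefficient, array values and variable names are arbitrary choices for illustration, and the `state` field of the leaf is read directly only to inspect the velocity buffer.

```julia
using Optimisers

x = [1.0, 2.0]
st = Optimisers.setup(Optimisers.Momentum(0.1, 0.9), x)

# One ordinary step so that the momentum buffer is non-zero.
st, x = Optimisers.update!(st, x, [1.0, 1.0])

# Independent copies, because `update!` mutates both the state and the array.
st_nothing, x_nothing = deepcopy(st), copy(x)
st_zero, x_zero = deepcopy(st), copy(x)

st_nothing, x_nothing = Optimisers.update!(st_nothing, x_nothing, nothing)
st_zero, x_zero = Optimisers.update!(st_zero, x_zero, zero(x_zero))

# `nothing`: the rule is skipped, so velocity and parameters are untouched.
@show st_nothing.state == st.state
@show x_nothing == x

# Zero gradient: the rule runs, so the velocity decays and the
# parameters still move along the remaining momentum.
@show st_zero.state == st.state
@show x_zero == x
```

With a stateless rule such as `Descent` the two calls would leave the parameters in the same place; the distinction only shows up for rules that carry state (momentum, Adam moments, and so on).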

So in this issue I argue that we should treat `nothing` as semantically equivalent to a zero gradient, and define another type, e.g. `NoUpdate`, to signal that the `apply!` rule should not be called at all (so no momentum updates etc.).
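The proposed semantics can be sketched in plain Julia, without touching Optimisers.jl. Everything below is hypothetical: `NoUpdate` is only the name suggested above and does not exist in the package, and `ToyMomentum`, `toy_apply` and `toy_update` are stand-ins for a real rule, `apply!` and `update!`.

```julia
# Hypothetical sentinel from the proposal; not part of Optimisers.jl.
struct NoUpdate end

# Toy momentum rule standing in for a real optimisation rule.
struct ToyMomentum
    eta::Float64
    rho::Float64
end

# Stand-in for `apply!`: returns the new velocity and the step to subtract.
function toy_apply(rule::ToyMomentum, vel, dx)
    newvel = rule.rho .* vel .+ rule.eta .* dx
    return newvel, newvel
end

# Ordinary gradient: run the rule and take the step.
function toy_update(rule, vel, x, dx::AbstractArray)
    newvel, step = toy_apply(rule, vel, dx)
    return newvel, x .- step
end

# Proposed: `nothing` means exactly "zero gradient", so the rule still runs.
toy_update(rule, vel, x, ::Nothing) = toy_update(rule, vel, x, zero(x))

# Proposed: `NoUpdate()` skips the rule, leaving state and parameters alone.
toy_update(rule, vel, x, ::NoUpdate) = (vel, x)

rule = ToyMomentum(0.1, 0.9)
vel, x = [0.1], [0.9]

@show toy_update(rule, vel, x, nothing) == toy_update(rule, vel, x, zero(x))
@show toy_update(rule, vel, x, NoUpdate()) == (vel, x)
```

The cost of this design is visible in the `::Nothing` method: it has to allocate `zero(x)` for every parameter array that received no gradient.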

CarloLucibello · 2024-11-06T10:04:13Z

After some extra thought, I have convinced myself that it is OK for `nothing` to signal "no update". Otherwise, we would have to materialize the zero gradient, and possibly the higher-order derivatives as well, and that should be the responsibility of the user.

We just need to document this.

CarloLucibello added this to the 0.4 milestone Nov 6, 2024

CarloLucibello closed this as not planned Nov 6, 2024

CarloLucibello mentioned this issue Nov 6, 2024

docs for nothing behavior and for walking a tree with keypath #191

Merged