
sparse.grad only returns the gradient with respect to the first element of a PyTree #16582

Closed
al-jshen opened this issue Jun 28, 2023 · 4 comments · Fixed by #19760
Labels: bug (Something isn't working)

al-jshen commented Jun 28, 2023

Description

When applying sparse.grad from jax.experimental.sparse to a function whose first argument is a PyTree, only the gradient with respect to the first item in the PyTree is returned. This is both unexpected and inconsistent with the behaviour of jax.grad, which returns a gradient with the same tree structure as the input.

Here is a small working example demonstrating this behaviour:

import jax
import jax.numpy as jnp
from jax.experimental import sparse

def foo1(wb, x, y): # first argument is a tuple with w first and b second
    w, b = wb
    return ((w @ x + b) - y).sum()

def foo2(bw, x, y): # first argument is a tuple with b first and w second
    b, w = bw
    return ((w @ x + b) - y).sum()

rng = jax.random.PRNGKey(0)
keys = jax.random.split(rng, 4)
w = jax.random.normal(keys[0], (3, 3))
b = jax.random.normal(keys[1], (3,))
x = jax.random.normal(keys[2], (3,))
y = jax.random.normal(keys[3], (3,))
"""

Here are the outputs for the different tests:

# normal jax.grad, W first and B second

jax.grad(foo1)((w, b), x, y)

(Array([[-0.47994015,  0.42577833,  0.765658  ],
        [-0.47994015,  0.42577833,  0.765658  ],
        [-0.47994015,  0.42577833,  0.765658  ]], dtype=float32),
 Array([1., 1., 1.], dtype=float32))

# ============================

# normal jax.grad, B first and W second

jax.grad(foo2)((b, w), x, y)

(Array([1., 1., 1.], dtype=float32),
 Array([[-0.47994015,  0.42577833,  0.765658  ],
        [-0.47994015,  0.42577833,  0.765658  ],
        [-0.47994015,  0.42577833,  0.765658  ]], dtype=float32))

# ============================

# sparse.grad, W first and B second. only the gradient with respect to W is returned!

sparse.grad(foo1)((w, b), x, y)

Array([[-0.47994015,  0.42577833,  0.765658  ],
       [-0.47994015,  0.42577833,  0.765658  ],
       [-0.47994015,  0.42577833,  0.765658  ]], dtype=float32)

# ============================

# sparse.grad, B first and W second. only the gradient with respect to B is returned!

sparse.grad(foo2)((b, w), x, y)

Array([1., 1., 1.], dtype=float32)
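
# ============================

One possible workaround (not suggested in this report, and untested here) is to pass the tuple's leaves as separate positional arguments so that each differentiated argument is a single array; this assumes sparse.grad accepts an argnums parameter the way jax.grad does:

# hypothetical workaround: expose the tuple's leaves as separate arguments
# so each argnum maps to exactly one array
def foo1_flat(w, b, x, y):
    return foo1((w, b), x, y)

# should return the pair (dL/dw, dL/db), matching jax.grad(foo1)
gw, gb = sparse.grad(foo1_flat, argnums=(0, 1))(w, b, x, y)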

What jax/jaxlib version are you using?

jax v0.4.13

Which accelerator(s) are you using?

No response

Additional system info

No response

NVIDIA GPU info

No response

al-jshen added the bug label on Jun 28, 2023
jakevdp self-assigned this on Jul 6, 2023
Blair-Johnson (Contributor) commented

Have there been any updates on this bug or hints about where it may originate from?

jakevdp (Collaborator) commented Feb 8, 2024

Hey - sorry for being silent here. This is a bug in how sparse.grad is implemented. I don't think we have any plans to fix it at the moment: jax.experimental.sparse is experimental, and you should expect it to have some rough edges.

Blair-Johnson (Contributor) commented

It looks like this is a bug in the logic for postprocessing gradients:
https://github.com/google/jax/blob/c1f234a95cb0932cd23ad63a9ddbe0a8d43333b7/jax/experimental/sparse/ad.py#L69-L71
Currently, if argnums indexes a single pytree, such as a dictionary of parameters, only the first of the computed gradients is returned. The logic doesn't account for a pytree being unpacked into multiple arguments when it is flattened.
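
In outline, the fix needs to regroup the flat leaf gradients by each selected argument's treedef before returning them. A minimal sketch of that repacking, with a hypothetical helper name (an illustration of the idea, not the actual jax source):

import jax.tree_util as tree_util

# hypothetical helper: regroup a flat list of leaf gradients into one
# pytree per selected argument
def repack_grads(flat_grads, arg_treedefs):
    grads, pos = [], 0
    for treedef in arg_treedefs:
        n = treedef.num_leaves
        grads.append(tree_util.tree_unflatten(treedef, flat_grads[pos:pos + n]))
        pos += n
    # match jax.grad: a single argnum returns its gradient directly,
    # multiple argnums return a tuple of gradients
    return grads[0] if len(grads) == 1 else tuple(grads)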

I opened a draft PR which passes the current sparse tests and should match the behavior of jax.grad() when argnums indexes a pytree. The current sparse test suite clearly lacks coverage of this pytree repacking behavior, so writing tests for it is the next step if this looks reasonable.
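
For reference, a test along these lines would catch the regression (a hypothetical sketch, assuming sparse.grad is meant to mirror jax.grad for pytree inputs):

import jax
import jax.numpy as jnp
from jax.experimental import sparse

def loss(params, x):
    # params is a pytree (dict) with two leaves
    return jnp.sum(params['w'] @ x + params['b'])

params = {'w': jnp.ones((3, 3)), 'b': jnp.zeros(3)}
x = jnp.ones(3)

dense_grads = jax.grad(loss)(params, x)
sparse_grads = sparse.grad(loss)(params, x)

# the gradients should have the same tree structure either way
assert (jax.tree_util.tree_structure(dense_grads)
        == jax.tree_util.tree_structure(sparse_grads))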

Blair-Johnson (Contributor) commented

Added some tests and took the PR out of draft.
