Gradient of scan fails when it involves a shared variable #555
This works:

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

mu = pt.dscalar("mu")
sigma = pt.dscalar("sigma")
x0 = pt.dscalar("x0")
rng = pytensor.shared(np.random.default_rng())

# scan passes sequences first, then the recurring output, then non_sequences
def step(epsilon, x, mu, sigma):
    next_x = x + mu + sigma * epsilon
    return next_x

# draw all the noise up front, outside the scan
new_rng, epsilons = pt.random.normal(size=10, rng=rng).owner.outputs

traj, updates = pytensor.scan(step, outputs_info=[x0], non_sequences=[mu, sigma], sequences=[epsilons])
df_ds = pt.grad(traj[-1], sigma)
f = pytensor.function([x0, mu, sigma], df_ds, updates={rng: new_rng})
```

So as long as I can write the sequence as conditionally independent it works? It seems like it should be possible to get the gradient without doing that, though.
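(Not part of the original comment — a quick check under the definitions above: for this linear recursion the gradient of `traj[-1]` with respect to `sigma` is just the sum of the epsilons drawn in that call, so both can be returned from one function and compared.)

```python
# Sketch only, assuming the snippet above. Returning epsilons.sum() alongside df_ds
# means both outputs use the same draws within a single call.
check = pytensor.function([x0, mu, sigma], [df_ds, epsilons.sum()], updates={rng: new_rng})
grad_val, eps_sum = check(0.0, 0.1, 0.5)
print(grad_val, eps_sum)  # these should match
```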
This is a case where I am not sure we want to be too clever for the user's sake. If you're taking gradients in stochastic graphs, perhaps you should know exactly what you're doing and do the reparametrization yourself (you can create your own suite of rewrites to change the graph before calling grad). Note we never do any random rewrites by default (other than when certain distributions are missing in a backend), because depending on how the random generator routine is used it can alter the results. This is a decision the Theano devs took for reproducibility/ease of debugging that we can revisit, but we should do so consciously.
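(Not from the thread — a hedged sketch of the "do the reparametrization yourself before calling grad" workflow suggested above; the variable names and the use of `graph_replace` are my own assumptions.)

```python
import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.graph.replace import graph_replace

mu = pt.dscalar("mu")
sigma = pt.dscalar("sigma")
rng = pytensor.shared(np.random.default_rng(), "rng")

# A non-reparameterized draw: the path from the cost to sigma runs through the RV itself
x = pt.random.normal(mu, sigma, rng=rng)
cost = (x - 1.0) ** 2

# Manually swap in mu + sigma * N(0, 1) before asking for the gradient
eps = pt.random.normal(0.0, 1.0, rng=rng)
cost_reparam = graph_replace(cost, {x: mu + sigma * eps})
d_cost_d_sigma = pt.grad(cost_reparam, sigma)
```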
I guess the tags I chose for this issue are quite bad, because I don't think I want any kind of special automatic handling here. It's more that when scan is constructing its gradient, it fails because it asks the random generator for a gradient, which it (obviously) doesn't have. Shouldn't these have a pass-through? If there's another complication (because an actual random variable -- NOT a generator -- is on the backwards graph) it can and should still error, I agree.
Can you provide a full example? Your original one has [...] The only issue with scans and gradient stuff that I know of is that they must be passed explicitly: #6
```python
import numpy as np
import pymc as pm
import pytensor
import pytensor.tensor as pt

mu = pt.dscalar('mu')
sigma = pt.dscalar('sigma')
x0 = pt.dscalar('x0')
rng = pytensor.shared(np.random.default_rng(), 'rng')

def step(x, mu, sigma, rng):
    new_rng, epsilon = pm.Normal.dist(0, 1, rng=rng).owner.outputs
    next_x = x + mu + sigma * epsilon
    return next_x, {rng: new_rng}

traj, updates = pytensor.scan(step, outputs_info=[x0], non_sequences=[mu, sigma, rng], n_steps=10)
pt.grad(traj[-1], sigma)
```

Gives:

```
NullTypeGradError                         Traceback (most recent call last)
File ~/mambaforge/envs/cge-dev/lib/python3.11/site-packages/pytensor/gradient.py:616, in grad(cost, wrt, consider_constant, disconnected_inputs, add_names, known_grads, return_disconnected, null_gradients)
NullTypeGradError:
```
This seems to be an old known bug/limitation of Scan: https://groups.google.com/g/theano-users/c/dAwr1j8-QOY/m/8fmDmQPkPJkJ Maybe something we can also address better in the Scan refactor, since we don't treat shared variables as magical entities anymore.
First refactor all of pytensor (and pymc) to remove shared variables? :D
Shared variables are fine-ish; it's treating them differently in PyTensor internals that's a source of unnecessary complexity. This special treatment is also a thorn in OpFromGraph: #473, which led me to basically reimplement it completely to be usable in PyMC: pymc-devs/pymc#6947
I just read #473, so your thoughts on [...]
In sum, I think shared variables should be explicit inputs everywhere except in the outer PyTensor function, where there is an ambiguity about whether the call signature would require them or not.
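(An illustrative aside, not from the thread: the ambiguity referred to is that shared variables are picked up implicitly by `pytensor.function` rather than appearing in the call signature.)

```python
import pytensor
import pytensor.tensor as pt

a = pt.dscalar("a")
s = pytensor.shared(2.0, name="s")

# `s` never appears in the input list, yet the compiled function uses it
f = pytensor.function([a], a * s)
print(f(3.0))  # 6.0
```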
I reopened; until we have a solution I think it's good to track the issue.
Before
Currently, this graph has valid gradients with respect to `mu` and `sigma`:

But this graph does not:
After
I imagine that in cases where the "reparameterization trick" is used, stochastic gradients can be computed for scan graphs.
Context for the issue:
The "reparameterization trick" is well known in the machine learning literature as a way to get stochastic gradients from graphs with sampling operations. It seems like we already support this, because this graph can be differentiated:
But this graph cannot:
The fact that even the "good" version breaks down in scan is, I suppose, a bug? Or a missing feature? Or neither? In the recursion `x[t+1] = x[t] + mu + sigma * epsilon[t]`, with `epsilon[t] ~ Normal(0, 1)`, taking the gradient of the final state with respect to `sigma` I should get back the sum of the random draws for the sequence.
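(Spelling out the expectation, as I read the recursion above:)

```math
x_T = x_0 + T\mu + \sigma \sum_{t=1}^{T} \epsilon_t
\quad\Longrightarrow\quad
\frac{\partial x_T}{\partial \sigma} = \sum_{t=1}^{T} \epsilon_t .
```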
Context: I'm trying to use pytensor to compute greeks for options, which involves taking the derivative of sampled trajectories.
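(Illustrative only, not from the issue: the sort of calculation being described — a pathwise Monte Carlo delta for a European call, written with the reparameterized terminal price so that `pt.grad` can differentiate through the samples. All names and parameter values here are made up.)

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

S0 = pt.dscalar("S0")
r, sigma_, T, K = 0.05, 0.2, 1.0, 100.0
rng = pytensor.shared(np.random.default_rng(), "rng")

# Reparameterized GBM terminal price: S_T = S0 * exp((r - sigma^2/2) T + sigma sqrt(T) eps)
next_rng, eps = pt.random.normal(0.0, 1.0, size=10_000, rng=rng).owner.outputs
S_T = S0 * pt.exp((r - 0.5 * sigma_**2) * T + sigma_ * np.sqrt(T) * eps)

# Discounted mean payoff; its derivative w.r.t. S0 is a Monte Carlo delta estimate
payoff = pt.exp(-r * T) * pt.maximum(S_T - K, 0.0).mean()
delta = pt.grad(payoff, S0)

f_delta = pytensor.function([S0], delta, updates={rng: next_rng})
print(f_delta(100.0))
```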