How to structure aesara gradients? #1015
-
Following my discussion with @aseyboldt, I worked out the jacobian for the discrete Lyapunov equation... I think. I'm a bit like a blind squirrel finding a nut. I put all my work into a gist; I am hoping someone could check it out and answer a couple of questions:
1. I think my notation is clearly broken, because I ended up with the implicit jacobians dX/dA and dX/dB when I set dF/dA = 0 and solved for dX. Can anyone point out what I am writing down incorrectly?
2. I am using the Magnus and Neudecker methodology for solving matrix-by-matrix differentials. The shapes and ordering of their results are different from what I get back from aesara.gradient.jacobian. What are the shape expectations on the Aesara side?
3. Given that the jacobian is a tensor (I think?), is there anything special that needs to be done in the R_op and L_op methods?
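
For reference, here is the vec identity I am leaning on (a sketch, assuming scipy's convention A X A' - X + B = 0, i.e. X = A X A' + B):

\operatorname{vec}(X) = (A \otimes A)\,\operatorname{vec}(X) + \operatorname{vec}(B)
\;\Longrightarrow\;
(I_{n^2} - A \otimes A)\,\operatorname{vec}(X) = \operatorname{vec}(B),
\qquad
\frac{\partial\,\operatorname{vec}(X)}{\partial\,\operatorname{vec}(B)} = (I_{n^2} - A \otimes A)^{-1}.

That last factor is the inv_term in the Op below; the dX/dA piece is the part I am unsure about.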
-
Here is my first crack at the Op:

import numpy as np
from scipy import linalg

import aesara
import aesara.tensor as at


def commutation_matrix(p, q):
"""
Create the commutation matrix K_{p,q} satisfying vec(A') = K_{p,q} vec(A)
Copied from statsmodels.tsa.tsatools:
https://github.com/statsmodels/statsmodels/blob/main/statsmodels/tsa/tsatools.py
Parameters
----------
p : int
q : int
Returns
-------
K : ndarray (pq x pq)
"""
p = int(p)
q = int(q)
K = np.eye(p * q)
indices = np.arange(p * q).reshape((p, q), order="F")
return K.take(indices.ravel(), axis=0)
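
# Quick sanity check of the commutation matrix (a sketch, separate from the Op
# itself): for a (p, q) matrix A, K_{p,q} should map the column-stacked vec(A)
# onto vec(A').
_A = np.arange(6.0).reshape(2, 3)
_K = commutation_matrix(2, 3)
assert np.allclose(_K @ _A.ravel(order="F"), _A.T.ravel(order="F"))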


class DiscreteLyapunovGrad(at.Op):
    __props__ = ()

    def make_node(self, A, B):
        A = at.as_tensor_variable(A)
        B = at.as_tensor_variable(B)
        # self._check_input_dims(A, B)
        out_dtype = aesara.scalar.upcast(A.dtype, B.dtype)
        dA = at.matrix(dtype=out_dtype)
        dB = at.matrix(dtype=out_dtype)
        return aesara.graph.basic.Apply(self, [A, B], [dA, dB])

    def perform(self, node, inputs, outputs):
        (A, B) = inputs
        X = linalg.solve_discrete_lyapunov(A, B)
        n, _ = A.shape
        I = np.eye(n)
        K = commutation_matrix(n, n)
        big_I = linalg.kron(I, I)
        A_kron_A = linalg.kron(A, A)
        inv_term = -linalg.solve(A_kron_A - big_I, big_I)
        dA_term = linalg.kron(A @ X.T, I) - linalg.kron(I, A @ X) @ K
        dA = inv_term @ dA_term
        dB = inv_term @ big_I
        # Each output gets its own storage cell: outputs[i][0] holds output i.
        outputs[0][0] = dA
        outputs[1][0] = dB

    # def infer_shape(self, fgraph, node, shapes):
    #     return [[x ** 2 for x in shape] for shape in shapes]


class SolveDiscreteLyapunov(at.Op):
    __props__ = ()

    def make_node(self, A, B):
        A = at.as_tensor_variable(A)
        B = at.as_tensor_variable(B)
        # self._check_input_dims(A, B)
        out_dtype = aesara.scalar.upcast(A.dtype, B.dtype)
        X = at.matrix(dtype=out_dtype)
        return aesara.graph.basic.Apply(self, [A, B], [X])

    def perform(self, node, inputs, output_storage):
        (A, B) = inputs
        X = output_storage[0]
        X[0] = linalg.solve_discrete_lyapunov(A, B)

    def infer_shape(self, fgraph, node, shapes):
        return [shapes[0]]

    def grad(self, inputs, output_grads):
        A, B = inputs
        return DiscreteLyapunovGrad()(A, B)

The forward part works fine (it's just a glorified wrapper around scipy's solve_discrete_lyapunov):

A = at.dmatrix('A')
B = at.dmatrix('B')
a = np.array([[1, 2], [3, 4]], dtype='float64')
b = np.array([[9, 10], [11, 12]], dtype='float64')
X = SolveDiscreteLyapunov()(A, B)
aesara.gradient.jacobian(X.ravel(), A).eval({A:a, B:b})
Obviously I don't understand what precisely should be returned by grad.
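
In the meantime, here is a quick numerical cross-check of the dX/dB block that DiscreteLyapunovGrad.perform computes (a sketch using only numpy/scipy; the a and b arrays are the ones from the snippet above, and the closed form comes from vectorizing X = A X A' + B):

import numpy as np
from scipy import linalg

a = np.array([[1, 2], [3, 4]], dtype="float64")
b = np.array([[9, 10], [11, 12]], dtype="float64")
n = a.shape[0]
big_i = np.eye(n * n)

# Closed form: d vec(X) / d vec(B) = (I - A kron A)^{-1}, with column-stacked vec.
analytic_jac = np.linalg.solve(big_i - np.kron(a, a), big_i)

# Finite-difference jacobian of vec(X) with respect to vec(B).
x0 = linalg.solve_discrete_lyapunov(a, b)
eps = 1e-6
numeric_jac = np.zeros((n * n, n * n))
for j in range(n * n):
    db = np.zeros(n * n)
    db[j] = eps
    x1 = linalg.solve_discrete_lyapunov(a, b + db.reshape((n, n), order="F"))
    numeric_jac[:, j] = (x1 - x0).ravel(order="F") / eps

# X is linear in B, so the two should agree up to roundoff.
print(np.allclose(numeric_jac, analytic_jac, atol=1e-5))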
-
I'm assuming that the answer was given in #1015 (reply in thread). If so, let's mark that as the answer (somehow).