How to structure aesara gradients? #1015
-
Following my discussion with @aseyboldt, I worked out the jacobian for the discrete Lyapunov equation... I think. I'm a bit like a blind squirrel finding a nut. I put all my work into a gist; I am hoping someone could check it out and answer a couple of questions:
1. I think my notation is clearly broken, because I ended up with the implicit jacobians dX/dA and dX/dB when I set dF/dA = 0 and solved for dX. Can anyone point out what I am writing down incorrectly?
2. I am using the Magnus and Neudecker methodology for solving matrix-by-matrix differentials. The shapes and ordering of their results are different from what I get back from aesara.gradient.jacobian. What are the shape expectations on the Aesara side?
3. Given that the jacobian is a tensor (I think?), is there anything special that needs to be done in the R_op and L_op methods?
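
For reference, here is the vec identity I am leaning on (a sketch, assuming scipy's convention A X A' - X + B = 0, i.e. X = A X A' + B):

\operatorname{vec}(X) = (A \otimes A)\,\operatorname{vec}(X) + \operatorname{vec}(B)
\;\Longrightarrow\;
(I_{n^2} - A \otimes A)\,\operatorname{vec}(X) = \operatorname{vec}(B),
\qquad
\frac{\partial\,\operatorname{vec}(X)}{\partial\,\operatorname{vec}(B)} = (I_{n^2} - A \otimes A)^{-1}.

That last factor is the inv_term in the Op below; the dX/dA piece is the part I am unsure about.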
-
Here is my first crack at the Op:

import numpy as np
from scipy import linalg

import aesara
import aesara.tensor as at


def commutation_matrix(p, q):
"""
Create the commutation matrix K_{p,q} satisfying vec(A') = K_{p,q} vec(A)
Copied from statsmodels.tsa.tsatools:
https://github.com/statsmodels/statsmodels/blob/main/statsmodels/tsa/tsatools.py
Parameters
----------
p : int
q : int
Returns
-------
K : ndarray (pq x pq)
"""
p = int(p)
q = int(q)
K = np.eye(p * q)
indices = np.arange(p * q).reshape((p, q), order="F")
return K.take(indices.ravel(), axis=0)
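
# Quick sanity check of the commutation matrix (a sketch, separate from the Op
# itself): for a (p, q) matrix A, K_{p,q} should map the column-stacked vec(A)
# onto vec(A').
_A = np.arange(6.0).reshape(2, 3)
_K = commutation_matrix(2, 3)
assert np.allclose(_K @ _A.ravel(order="F"), _A.T.ravel(order="F"))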


class DiscreteLyapunovGrad(at.Op):
    __props__ = ()

    def make_node(self, A, B):
        A = at.as_tensor_variable(A)
        B = at.as_tensor_variable(B)
        # self._check_input_dims(A, B)
        out_dtype = aesara.scalar.upcast(A.dtype, B.dtype)
        dA = at.matrix(dtype=out_dtype)
        dB = at.matrix(dtype=out_dtype)
        return aesara.graph.basic.Apply(self, [A, B], [dA, dB])

    def perform(self, node, inputs, outputs):
        (A, B) = inputs
        X = linalg.solve_discrete_lyapunov(A, B)
        n, _ = A.shape
        I = np.eye(n)
        K = commutation_matrix(n, n)
        big_I = linalg.kron(I, I)
        A_kron_A = linalg.kron(A, A)
        inv_term = -linalg.solve(A_kron_A - big_I, big_I)
        dA_term = linalg.kron(A @ X.T, I) - linalg.kron(I, A @ X) @ K
        dA = inv_term @ dA_term
        dB = inv_term @ big_I
        # Each output gets its own storage cell: outputs[i][0] holds output i.
        outputs[0][0] = dA
        outputs[1][0] = dB

    # def infer_shape(self, fgraph, node, shapes):
    #     return [[x ** 2 for x in shape] for shape in shapes]


class SolveDiscreteLyapunov(at.Op):
    __props__ = ()

    def make_node(self, A, B):
        A = at.as_tensor_variable(A)
        B = at.as_tensor_variable(B)
        # self._check_input_dims(A, B)
        out_dtype = aesara.scalar.upcast(A.dtype, B.dtype)
        X = at.matrix(dtype=out_dtype)
        return aesara.graph.basic.Apply(self, [A, B], [X])

    def perform(self, node, inputs, output_storage):
        (A, B) = inputs
        X = output_storage[0]
        X[0] = linalg.solve_discrete_lyapunov(A, B)

    def infer_shape(self, fgraph, node, shapes):
        return [shapes[0]]

    def grad(self, inputs, output_grads):
        A, B = inputs
        return DiscreteLyapunovGrad()(A, B)

The forward part works fine (it's just a glorified wrapper around scipy's solve_discrete_lyapunov):

A = at.dmatrix('A')
B = at.dmatrix('B')
a = np.array([[1, 2], [3, 4]], dtype='float64')
b = np.array([[9, 10], [11, 12]], dtype='float64')
X = SolveDiscreteLyapunov()(A, B)
aesara.gradient.jacobian(X.ravel(), A).eval({A:a, B:b})
Obviously I don't understand what precisely should be returned by grad.
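
In the meantime, here is a quick numerical cross-check of the dX/dB block that DiscreteLyapunovGrad.perform computes (a sketch using only numpy/scipy; the a and b arrays are the ones from the snippet above, and the closed form comes from vectorizing X = A X A' + B):

import numpy as np
from scipy import linalg

a = np.array([[1, 2], [3, 4]], dtype="float64")
b = np.array([[9, 10], [11, 12]], dtype="float64")
n = a.shape[0]
big_i = np.eye(n * n)

# Closed form: d vec(X) / d vec(B) = (I - A kron A)^{-1}, with column-stacked vec.
analytic_jac = np.linalg.solve(big_i - np.kron(a, a), big_i)

# Finite-difference jacobian of vec(X) with respect to vec(B).
x0 = linalg.solve_discrete_lyapunov(a, b)
eps = 1e-6
numeric_jac = np.zeros((n * n, n * n))
for j in range(n * n):
    db = np.zeros(n * n)
    db[j] = eps
    x1 = linalg.solve_discrete_lyapunov(a, b + db.reshape((n, n), order="F"))
    numeric_jac[:, j] = (x1 - x0).ravel(order="F") / eps

# X is linear in B, so the two should agree up to roundoff.
print(np.allclose(numeric_jac, analytic_jac, atol=1e-5))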
-
I'm assuming that the answer was given in #1015 (reply in thread). If so, let's mark that as the answer (somehow).