
How do Dense layers work? #219

Open
miladdona opened this issue Sep 26, 2021 · 3 comments

@miladdona

Hi,

For example, we have a dense layer of shape (100, 100) and we decompose it with shape [[2, 2, 5, 5], [2, 5, 2, 5]] and max_tt_rank=4.
Based on this example we get these tt_cores:
(1, 2, 2, 4)
(4, 2, 5, 4)
(4, 5, 2, 4)
(4, 5, 5, 1)
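
(Counting parameters, these cores hold 1*2*2*4 + 4*2*5*4 + 4*5*2*4 + 4*5*5*1 = 16 + 160 + 160 + 100 = 436 values, versus 100 * 100 = 10000 in the dense matrix.)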

  1. Are the tt_cores always 4-D?
  2. How does the dense layer work? For example, an SVD decomposition of a dense layer gives two thinner dense layers:
    I mean, for a dense layer of shape (100, 100) and rank=20 we get two dense layers of shapes (100, 20) and (20, 100) (see the sketch after this list).
    I want to know how the T3F library works in this respect.
  3. Is there a way to extract the number of operations for each layer? For example, in the example above a normal dense layer takes 100 * 100 = 10000 operations, but I cannot work out the number of operations in the T3F library!
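
To illustrate question 2, I mean something like this minimal NumPy sketch (generic truncated SVD, not T3F-specific):

import numpy as np

W = np.random.randn(100, 100)                     # a hypothetical (100, 100) dense weight
U, s, Vt = np.linalg.svd(W, full_matrices=False)  # full SVD; the slicing below keeps the top 20 components
rank = 20
W1 = U[:, :rank] * s[:rank]                       # first thin layer, shape (100, 20)
W2 = Vt[:rank, :]                                 # second thin layer, shape (20, 100)

x = np.random.randn(100)
y_approx = (x @ W1) @ W2                          # 100*20 + 20*100 = 4000 multiply-adds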

Thank you in advance.
Best regards,
Miladdona

@Bihaqo
Owner

Bihaqo commented Oct 5, 2021

  1. Yes, the TT-core is always 4-D for TT-matrices in this Keras layer. (It's 3-D for TT-tensors, in contrast to TT-matrices. Also, sometimes you can consider a batch of TT-objects, which adds an extra dimension.)
  2. It will reshape your input vector into a tensor of shape (2, 2, 5, 5) and then contract the cores one by one with the input tensor. This can be interpreted as d (4 in this case) weird sparse linear layers.
    It looks something like this:
import numpy as np  # the einsums below are np.einsum

res = input_vector.reshape(2, 2, 5, 5)  # input indices (i1, i2, i3, i4)
# Contract the last core over i4: equivalent to multiplying a (4*5 x 5*1) matricized
# core by a (5 x 2*2*5) matricized input, i.e. 4*5 * 5 * 2*2*5 = 2000 multiply-adds,
# not counting reshapes.
res = np.einsum('aijb,qwei->qweja', tt_cores[-1], res)
# Contract the next core over i3 and the shared rank index a: a (4*2 x 5*4) matricized
# core by a (5*4 x 2*2*5) matricized input, i.e. 4*2 * 5*4 * 2*2*5 = 3200 multiply-adds.
res = np.einsum('ceka,qweja->qwkjc', tt_cores[-2], res)
res = np.einsum('dwlc,qwkjc->qlkjd', tt_cores[-3], res)  # contract over i2 and rank c: 3200 multiply-adds
res = np.einsum('fqnd,qlkjd->nlkj', tt_cores[0], res)    # contract over i1 and rank d: 800 multiply-adds
res = res.reshape(2 * 5 * 2 * 5)  # output vector of length 100, indexed by (j1, j2, j3, j4)
res += bias
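
As for question 3 (a sketch, not a T3F API): for contractions like the ones above, the multiply-add count of an einsum is simply the product of the sizes of all distinct index labels, so you can count operations per step:

import numpy as np

def einsum_madds(subscripts, *operands):
    # Multiply-add count of a two-operand einsum contraction: the product
    # of the sizes of all distinct index labels (valid when no label is
    # repeated within a single operand, as in the steps above).
    dims = {}
    for labels, op in zip(subscripts.split('->')[0].split(','), operands):
        dims.update(zip(labels, op.shape))
    return int(np.prod(list(dims.values())))

# First contraction above: 4*5*5*1 * 2*2*5 = 2000 multiply-adds.
print(einsum_madds('aijb,qwei->qweja', np.zeros((4, 5, 5, 1)), np.zeros((2, 2, 5, 5))))
# All four steps: 2000 + 3200 + 3200 + 800 = 9200 multiply-adds,
# versus 100 * 100 = 10000 for the plain dense layer (the savings
# grow with layer size and shrink with rank).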

@miladdona
Author

Hi,

How do you derive these strings (in fact, operations), e.g. 'aijb,qwei->qweja'?
I mean, in this example the input has a 4-D shape and the cores are 4-D tensors. If the input has a 3-D, 2-D, or 5-D shape, how do we define these operations?

For example, if we have [[4, 5, 5], [5, 4, 5]] instead of [[2, 2, 5, 5], [2, 5, 2, 5]].

Thank you in advance.
Kind regards,
Miladdona

@Bihaqo
Owner

Bihaqo commented Jun 8, 2022

Hi,

Do you mean how to read this notation, or how I come up with these particular formulas? If the former, check out an einsum tutorial, e.g. https://rockt.github.io/2018/04/30/einsum
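
The key rule for reading it: an index label that appears in the inputs but not after '->' is summed over. A minimal generic NumPy example (not T3F-specific):

import numpy as np

A = np.random.randn(3, 4)
B = np.random.randn(4, 5)
C = np.einsum('ij,jk->ik', A, B)  # j is shared and absent from the output, so it is summed out: C == A @ B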

If the latter, then check out the Tensorizing Neural Networks [1] paper for the definition of a TT layer, formula (5).

You have an input vector x which you reshape into, e.g., a [2, 2, 5, 5] (or [4, 5, 5]) tensor X, and then you do the summation w.r.t. j1, j2, j3, j4 (or j1, j2, j3 in the [4, 5, 5] case): in the paper's notation, Y(i1, ..., id) = sum_{j1, ..., jd} G1[i1, j1] G2[i2, j2] ... Gd[id, jd] X(j1, ..., jd).
Also, the terms Gk[ik, jk] are themselves matrices which are multiplied by each other, so you also need to sum out the intermediate dimensions (which correspond to the ranks).

E.g. the first step in the pseudocode above, res = einsum('aijb,qwei->qweja', tt_cores[-1], res), does the summation w.r.t. j4 in the paper's notation (which I call i in the einsum string, because of the transposed convention noted below). It says res[q, w, e, j, a] = sum_{i, b} Gd[i, j](a, b) x[q, w, e, i], where tt_cores[-1] is Gd and res is x.

Note that in these formulas we compute X @ TTW, while in the paper they compute TTW.T @ X; sorry about the confusing notation.
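
To make the [[4, 5, 5], [5, 4, 5]] case concrete, here is a sketch in the same style as the pseudocode above, assuming max_tt_rank=4 gives three cores of shapes (1, 4, 5, 4), (4, 5, 4, 4), (4, 5, 5, 1) (hypothetical shapes, following the same (rank, input mode, output mode, rank) pattern):

res = input_vector.reshape(4, 5, 5)                    # input indices (i1, i2, i3)
res = np.einsum('aijb,qwi->qwja', tt_cores[-1], res)   # sum over i3 (and the trivial rank b)
res = np.einsum('cwka,qwja->qkjc', tt_cores[-2], res)  # sum over i2 and the shared rank a
res = np.einsum('fqnc,qkjc->nkj', tt_cores[0], res)    # sum over i1 and the rank c
res = res.reshape(5 * 4 * 5)                           # output vector of length 100
res += bias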

[1] https://arxiv.org/pdf/1509.06569.pdf
