What is different between max_tt_rank and tt_rank? #218

Open
miladdona opened this issue Sep 24, 2021 · 3 comments

@miladdona

Hi guys,

I have a simple model and I want to apply the T3F library to a dense layer with shape (4536, 100).
There are different possible factorizations, but I want to use [[2, 2, 2, 567], [2, 2, 5, 5]] and set the rank to 10.

Wtt = t3f.to_tt_matrix(W, shape=[[2, 2, 2, 567], [2, 2, 5, 5]], max_tt_rank=10)
tt_layer = t3f.nn.KerasDense(input_dims=[2, 2, 2, 567], output_dims=[2, 2, 5, 5], tt_rank=10, activation='relu')

But after running I get this error:
ValueError: Layer weight shape (1, 2, 2, 20) not compatible with provided weight shape (1, 2, 2, 4)

I think this is related to max_tt_rank in the first statement and tt_rank in the second.
I'd like to know what the difference is between them and how I can control this.

Thanks.

@Bihaqo
Owner

Bihaqo commented Sep 24, 2021

Hi,

A few things:

  1. I had reasons to give max_tt_rank and tt_rank different names, but now that you question it, I realise those reasons were never convincing enough. You're totally right: they should have the same name (tt_rank).
  2. You hit a frequent problem, common to many TT codebases, where the requested TT-rank is bigger than the theoretical maximally useful TT-rank. The TT-rank is actually a list: when you pass a single number 10, it gets silently converted into the list (1, 10, 10, 10, 1) for you (the list has 5 elements because your underlying tensor is 4-dimensional; it always has 1 as the first and last element). The second of those TT-ranks is redundantly big. You can change the code to the following (see also the sketch after this list):
tt_layer = t3f.nn.KerasDense(input_dims=[2, 2, 2, 567], output_dims=[2, 2, 5, 5], tt_rank=(1, 4, 10, 10, 1), activation='relu')

and I believe it should work.
  3. Actually, I wouldn't recommend using such an unbalanced tensor shape. Very likely you would be better off padding your input size 4536 to e.g. 5000 and then using input_dims = (10, 10, 10, 5) or something like that. This would also fix your previous problem: with a more balanced shape, the TT-rank 10 should work out of the box.
  4. Also note that a TT-layer might be sensitive to the order of inputs and outputs, i.e. it might work a lot worse if you shuffle your output dimensions. This is not a problem if the layer is in the middle of an MLP (because the surrounding dense layers can provide features in whatever order is useful for your TT-layer), but it might be problematic when the TT-layer is the last layer, since the order of outputs would be defined by the (arbitrary) order of your labels. TL;DR: if this is the last layer in your network, I would try adding yet another dense layer of size 100 x 100 on top of it.
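
A minimal sketch of points 2 and 3 (assuming TensorFlow 2.x and a recent t3f, and assuming Wtt.get_tt_ranks() returns the actual ranks of the decomposition):

import numpy as np
import t3f

# max_tt_rank is only an upper bound: to_tt_matrix clips it to the
# theoretical maximum at every position, so the resulting ranks come
# out as (1, 4, 10, 10, 1) rather than (1, 10, 10, 10, 1).
W = np.random.randn(4536, 100).astype(np.float32)
Wtt = t3f.to_tt_matrix(W, shape=[[2, 2, 2, 567], [2, 2, 5, 5]], max_tt_rank=10)
print(Wtt.get_tt_ranks())

# Passing the clipped ranks explicitly makes the layer's weight shapes match Wtt:
tt_layer = t3f.nn.KerasDense(input_dims=[2, 2, 2, 567], output_dims=[2, 2, 5, 5],
                             tt_rank=(1, 4, 10, 10, 1), activation='relu')

# Point 3 alternative: pad the 4536 input features to 5000 so a balanced
# factorization is possible; then a uniform TT-rank of 10 is valid everywhere.
tt_layer_balanced = t3f.nn.KerasDense(input_dims=[10, 10, 10, 5],
                                      output_dims=[2, 2, 5, 5],
                                      tt_rank=10, activation='relu')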

@miladdona
Author

Thanks.
Is there a way to find the list of TT-ranks? I mean, how did you find tt_rank=(1, 4, 10, 10, 1)? Did you find it by trial and error, or from some equation?

@Bihaqo
Owner

Bihaqo commented Sep 27, 2021

So the idea is that if your input dims are [a1, a2, a3] and your output dims are [b1, b2, b3], then your TT-ranks should be elementwise at most np.minimum([1, a1*b1, a1*b1*a2*b2, a1*b1*a2*b2*a3*b3], [a1*b1*a2*b2*a3*b3, a2*b2*a3*b3, a3*b3, 1]).

In this case (four dims, hence five ranks) it's np.minimum([1, 4, 16, 160, 453600], [453600, 113400, 28350, 2835, 1]) = [1, 4, 16, 160, 1]. Clipping your requested rank 10 against these maxima gives np.minimum(10, [1, 4, 16, 160, 1]) = (1, 4, 10, 10, 1), which is the tuple from my earlier comment.
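
A minimal numpy sketch of this computation, using the shapes from this issue:

import numpy as np

input_dims = np.array([2, 2, 2, 567])
output_dims = np.array([2, 2, 5, 5])
p = input_dims * output_dims                              # per-core sizes: [4, 4, 10, 2835]
left = np.concatenate(([1], np.cumprod(p)))               # [1, 4, 16, 160, 453600]
right = np.concatenate((np.cumprod(p[::-1])[::-1], [1]))  # [453600, 113400, 28350, 2835, 1]
max_ranks = np.minimum(left, right)                       # maximal useful ranks: [1, 4, 16, 160, 1]
print(np.minimum(10, max_ranks))                          # clipped request: [1, 4, 10, 10, 1]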
