use cached matrix of pairwise dot products where possible #95
Hi!
Yeah, I was going back over this in my head and remembering that what makes it practical is that the full VxV matrix doesn't need to be stored. It should be possible to store a VxT matrix of cosine values, where T is the number of words in the analogy test set. Then the solution for a given analogy is just a combination of precomputed entries from that matrix. Maybe you still couldn't do it with all words for all 40 tests at once, but I bet it would still give a good speedup if you did it just once for each test — that would be, what, 2500 words or so? For 32-bit floats that seems like about 2 GB, well within the capacity of most modern laptops.
Wait, no! It's way smaller, because the number of distinct words per test is far smaller than the number of tests, since the tests look at every pair against every other pair. There must only be ~2k distinct terms in the entire BATS test set — is that right? In that case you could definitely do the full VxT matrix, though the approach of splitting them by test might be more practical from an implementation standpoint.
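To make the shapes concrete, here's a minimal sketch of building that VxT matrix with NumPy. All names and sizes here are hypothetical stand-ins (random vectors, `test_words` as a placeholder for the ~2k distinct BATS terms), not the project's actual API:

```python
import numpy as np

# Hypothetical setup: V = vocabulary size, d = embedding dimension.
# Rows of `embeddings` are word vectors; `test_words` stands in for the
# vocabulary indices of the ~2k distinct terms appearing in the test set.
rng = np.random.default_rng(0)
V, d = 10_000, 50
embeddings = rng.standard_normal((V, d)).astype(np.float32)
test_words = np.arange(2_000)  # placeholder for the distinct test terms

# Normalize rows once, so that dot products are cosine similarities.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# V x T matrix: cosine of every vocabulary word against every test term.
sims = normed @ normed[test_words].T  # shape (V, T)
```

At real scale the memory cost is just `V * T * 4` bytes for float32, which is where the ~2 GB estimate above comes from.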
You cannot restrict the space of possible answers to words from the test set. Basically, vocabulary size and content affect the performance of different embeddings tremendously. For a fair comparison between two sets of embeddings, we first make the vocabularies identical. If only words from the test were allowed, the answer would most often be correct.
This won't restrict the space of possible answers, though — we need all possible outputs (the V dimension), but we only need a subset of inputs (the T dimension).
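A small self-contained sketch of that point, with toy random vectors and hypothetical names throughout: the precomputed matrix has one column per test term, but every query still scores and ranks all V vocabulary words.

```python
import numpy as np

# Toy setup (hypothetical sizes): a full vocabulary of V words, of which
# only T ever appear as the inputs a, a*, b of analogy queries.
rng = np.random.default_rng(1)
V, T, d = 5_000, 100, 50
vecs = rng.standard_normal((V, d)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit rows: dot == cosine

test_idx = rng.choice(V, size=T, replace=False)  # vocab rows of the test terms
sims = vecs @ vecs[test_idx].T                   # shape (V, T)

# One 3CosAdd query a : a* :: b : ?, with a, a*, b at test-term columns
# ta, tas, tb. The score vector covers ALL V words — the answer space
# is not restricted to the test set.
ta, tas, tb = 0, 1, 2
scores = sims[:, tas] - sims[:, ta] + sims[:, tb]

# Exclude the query words themselves (standard practice), then argmax over V.
scores[test_idx[[ta, tas, tb]]] = -np.inf
answer = int(np.argmax(scores))
```

The answer index ranges over the whole vocabulary; only the query words had to come from the T columns.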
I wrote a quick example to show what I mean. I believe it works as it should, but let me know if you see something wrong. (This used to be a gist, but gist wasn't playing nicely with the sample vector file, so I just created a standard repo.)
Currently the `3CosAdd` and `3CosMul` methods run very slowly. It should be possible to speed up both operations with the strategy used by hyperwords: precalculate all pairwise cosine similarity scores once, and then simply add, subtract, multiply, or divide the resulting scalars for the tests. This is possible because `cos(b*, a* − a + b)` can be distributed out to `cos(b*, a*) − cos(b*, a) + cos(b*, b)`. (The second form looked worse to me until I realized you can precompute all possible pairwise dot products and never have to calculate any more, whereas every new combination of `a*, a, b` in the first form requires executing a new dot product.)
Given the current architecture this might not be a trivial operation, but at least for smaller vocabularies it should definitely be possible, and could yield a ~100x speedup by leveraging fast linear algebra routines.
This is something I'd be interested in working on, though I may not be able to devote time to it in the near term. If that would be of interest, though, please let me know.