Memory collapses with precomputed block matrix #180
Comments
Please provide a minimal code example where this fails so we can reproduce and debug this. There really isn't anything obvious that would cause the memory to balloon. Does the same thing happen when using negative_gradient_method="fft"? |
Okay, with negative_gradient_method="fft" it works, but I sometimes get the RuntimeWarning: invalid value encountered in log. A minimal example would be:
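A sketch of that kind of minimal example (a reconstruction, since the exact snippet isn't shown above; the block-matrix construction is a stand-in for the attached matrix.csv):

```python
import numpy as np
from openTSNE import TSNE

# Stand-in: symmetric 200x200 block distance matrix, 4 groups of 50 points,
# zero distance within a group and distance 1 between groups
matrix = 1 - np.kron(np.eye(4), np.ones((50, 50)))

tsne = TSNE(metric="precomputed", initialization="spectral",
            negative_gradient_method="fft")
embedding = tsne.fit(matrix)
```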
with the matrix in matrix.csv. Note: it doesn't always crash, but every third or fourth time, so I think it's due to some initialization? |
What I suspect may be happening is that if the points are too close together in the embedding, the BH tree balloons. However, I should have some safeguards in place so that this doesn't happen. Maybe those safeguards aren't enough. This might be plausible since you're trying to force all the points super close together. But I'd need to investigate it further. I suggest you use |
Actually the points should not be on top of each other in the embedding. When some points have zero distance in the original space, they should get finite affinities and should end up close but not overlapping in the embedding. If you get overlapping points in sklearn, that would be weird. Can you post a picture of the sklearn embedding, and also of what you get from openTSNE when it does not crash? |
Wow. None of that makes any sense to me! Unless I am missing something, the points should not overlap at all, so we should see 200 separate points. BH-openTSNE makes the most sense, but even there many fewer than 200 points are visible. In FFT-openTSNE there is |
I agree, none of these plots seem correct. The repulsive forces should push the points apart at least a little bit. However, the spans of the embeddings are quite big, so maybe it could be a visual plotting thing that they look like a single point. Are the final coordinates really just a single point, or are they just really close to one another and the large point sizes and large spans just make it look that way? |
It seems that has nothing to do with
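The counting snippet isn't shown above; a sketch of how one might count distinct embedded positions, assuming embedding is the fitted result from the example earlier in the thread:

```python
import numpy as np

# Number of unique rows, i.e. distinct 2D positions, in the embedding
print(len(np.unique(np.asarray(embedding), axis=0)))
```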
This prints 6 🤯
Update: with sklearn I get points that are almost overlapping but are not identical. However, this result seems fishy to me too.
Update 2: with FFT I get 7 distinct points. So points are exactly overlapping.
Update 3: I checked and the same thing happens with
Also note that adding a minuscule amount of noise like |
I realized that I was wrong when I said that the optimal embedding should not have overlapping points. On reflection, it should be possible to reduce KL to zero by making all points from each class overlap exactly: all pairs of points overlapping in high dimensions have identical affinities, and if they overlap in 2D they will have identical low-dimensional similarities too. If so, this means that sklearnTSNE returns the correct result. But I see a lot of things going weird when optimizing it with openTSNE... Using the same data as above
It works fine when starting from random initialization:
both reduce KL to near-zero and make the points roughly (but not exactly) overlap. But using PCA initialization (where points in the initialization overlap exactly), the optimization goes wild.
BH yields negative KL values, which is mathematically impossible!
While FFT...
...diverges!
|
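For reference (restating the standard objective, not something from the thread): the KL divergence that t-SNE minimizes is

$$\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \;\ge\; 0,$$

which is non-negative by Gibbs' inequality and reaches zero only when $q_{ij} = p_{ij}$ for every pair. So negative reported values can only come from how the quantity is computed (for instance through the BH approximation), not from a genuinely better embedding.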
Update: sklearn also shows negative KL values when using PCA initialization (but not with random initialization)!
results in
so it seems that sklearnTSNE and openTSNE have the same problem here. |
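A sketch of how one might reproduce that observation (a reconstruction; the exact initialization used isn't shown, and sklearn does not accept init="pca" with a precomputed metric, so an explicitly overlapping init array is passed instead):

```python
import numpy as np
from sklearn.manifold import TSNE as skTSNE

# Same stand-in 200x200 block distance matrix as above
matrix = 1 - np.kron(np.eye(4), np.ones((50, 50)))

# Overlapping initialization: the 50 points of each group start at exactly the same spot
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
init = np.repeat(centers, 50, axis=0)

tsne = skTSNE(metric="precomputed", init=init, random_state=0, verbose=2)
emb = tsne.fit_transform(matrix)
print(tsne.kl_divergence_)  # the thread reports this ending up negative
```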
Hmm, this is all very strange, and I'll need to look into it some more. But I think that in my implementation of BH, one of the assumptions I made was that the embedding points would never actually overlap like that. So the BH one doesn't surprise me that much. And also, such things are pretty difficult to solve with BH, if memory serves. I am a bit surprised about FIt-SNE not working, since it doesn't make any assumptions about how the points are spaced out. Perhaps the negative KL values are the most puzzling of all. I definitely consider this an edge case, and a very unusual usage. Adding just a little bit of noise to the distance matrix makes everything run smoothly. |
Aren't the KL values during BH optimization themselves computed using expressions approximated via BH? If so, then it's not surprising that the KL values end up being very wrong.
I have no idea about this. But if this is really true, then a dirty workaround would be to add some tiny noise to the initialization (with a warning) if the repulsion method is BH and if there are overlapping points... I'm just not sure what's the best way to detect that there are overlapping points in the initialization.
In fact, we may do the same for FFT as well if we cannot otherwise find out what's going on there. |
Maybe something like the check sketched below can work to detect if there are duplicates in the initialization array... |
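A sketch of the kind of duplicate check meant there (the actual snippet isn't shown above, so the use of np.unique is an assumption):

```python
import numpy as np

def has_duplicate_rows(init):
    """True if any two rows of the initialization coincide exactly."""
    return len(np.unique(init, axis=0)) < len(init)
```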
Somewhat fleshing out this workaround:
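The fleshed-out snippet itself isn't shown above; a sketch of what such a jitter-on-duplicates helper might look like (function name, noise scale, and the exact-duplicate test are assumptions):

```python
import warnings
import numpy as np

def jitter_duplicates(init, scale=1e-4, seed=None):
    """Add tiny Gaussian noise to the initialization if any rows coincide exactly."""
    rng = np.random.default_rng(seed)
    if len(np.unique(init, axis=0)) < len(init):
        warnings.warn("Duplicate points found in the initialization; adding a small jitter.")
        init = init + rng.normal(scale=scale, size=init.shape)
    return init
```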
works in under 1 second for n = 1 million:
So my suggestion would be to run this on any initialization (explicitly provided by the user or computed using PCA) and print a warning when adding jitter. |
Thanks for this solution, I'm actually amazed it works this fast! At first I was on the fence about this, thinking: "How often would this actually happen?" A block distance matrix is something you'll rarely ever see in practice. However, duplicate entries aren't rare at all; even in the iris data set there are two identical rows. For iris this isn't a problem, though: the two identical data points get embedded into the exact same position, so that seems to work okay.

Then the question becomes: why doesn't it work when there are a lot of duplicate entries? For iris, I tried turning a bunch of points into duplicates and seeing where it failed (see the sketch after this comment). Interestingly enough, it failed once I put in 30 duplicate points. This doesn't correlate with the perplexity: using 30 duplicate rows and increasing the perplexity to e.g. 70 still results in negative KL values. So it seems like the problem has always been there, but the KL values don't always dip below zero, and things work as long as the number of duplicate entries isn't too big.

I'm fine with using your fix as a temporary solution. Clearly there is a bug somewhere and it needs to be found, but I don't think this is something that would happen very often. If you want to open a PR with this temporary fix, I'll be happy to merge it. |
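A sketch of that kind of iris experiment (a reconstruction, not the exact code used; the figure of 30 duplicated rows comes from the comment above):

```python
import numpy as np
from sklearn.datasets import load_iris
from openTSNE import TSNE

X = load_iris().data.copy()
X[:30] = X[0]  # make the first 30 rows exact duplicates

# With verbose output, the reported KL divergence can be watched during optimization
embedding = TSNE(negative_gradient_method="bh", verbose=True).fit(X)
```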
Just want to comment here briefly that I implemented this in a branch but then observed some odd behaviour (I was getting some "near-duplicates" detected in the initialization when there should not have been any) and did not get to investigating it further since then... |
Expected behaviour
When I run tSNE on a symmetric 200x200 block matrix such as this one
I expect TSNE to return 4 distinct clusters (actually 4 points only). Sklearn yields this.
Actual behaviour
Using openTSNE, the process crashes after filling up all available memory (about 50% of the time). If it survives, the clusters are visible; however, the result is not as satisfying.
Steps to reproduce the behavior
import numpy as np
from openTSNE import TSNE

matrix = 1 - np.kron(np.eye(4), np.ones((50, 50)))  # stand-in for the attached 200x200 block matrix
tsne = TSNE(metric='precomputed', initialization='spectral', negative_gradient_method='bh')
embedding = tsne.fit(matrix)
NOTE: I am using the direct installation from GitHub this morning.