Issue
Fixes #223.
The is_duplicate function in the BH implementation had a threshold set to 1e-6, likely taken from other implementations, perhaps scikit-learn. During the EE phase, where everything is squished together, this condition was triggered a lot, so BH wasn't splitting the points into different quadrants but was treating them as a single point. This meant that no repulsive forces were being applied to the points at all, leading to an increasing collapse of all the points.

Description of changes
I simply changed the threshold for duplicate detection to machine precision, and the issue goes away.
I also copied over the test by @dkobak from #234, but I've also let it run the standard phase. We want to make sure that the embedding isn't collapsed at the end of the optimization, and we don't really care if it is super compressed at the end of the EE phase.
EDIT: Not machine precision after all: with that threshold, duplicate detection doesn't actually work and the tests fail on iris. Setting the threshold to 1e-16 resolves both issues.
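
To make the effect of the threshold concrete, here is a minimal Python sketch. It is not the actual BH code: this is_duplicate is a hypothetical stand-in that assumes the check is a per-axis distance comparison against the threshold, and the sample coordinates are made up to mimic points squished together during EE.

```python
def is_duplicate(point, center, threshold):
    """Hypothetical stand-in for the duplicate check (an assumption, not
    the library's code): a point counts as a duplicate of `center` if it
    lies within `threshold` of it along every axis."""
    return all(abs(p - c) < threshold for p, c in zip(point, center))


# Two distinct points that sit very close together, as can happen when
# the embedding is compressed (illustrative values only).
a = (0.0, 0.0)
b = (3e-7, -2e-7)

# With the old threshold the points are merged, so no repulsion between them.
print(is_duplicate(b, a, threshold=1e-6))   # True

# With the new threshold they stay distinct and can be split into quadrants.
print(is_duplicate(b, a, threshold=1e-16))  # False

# Exact duplicates are still detected with the new threshold.
print(is_duplicate(a, (0.0, 0.0), threshold=1e-16))  # True
```

Under these assumptions, the smaller threshold still catches true duplicates while leaving tightly packed but distinct points separable.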
Includes