Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Valid Distance Keywrods Not Specified #22

Open
DarioS opened this issue Nov 30, 2023 · 1 comment
Open

Valid Distance Keywrods Not Specified #22

DarioS opened this issue Nov 30, 2023 · 1 comment

Comments

@DarioS
Copy link

DarioS commented Nov 30, 2023

distance: A string specifying the distance metric to use.

O.K. but what are permitted values? Perhaps KmknnParam should have distance = c('A', 'B', 'C') and inside use match.arg. I am keen to try Louvain clustering with a Pearson correlation distance, for instance.

@LTLA
Copy link
Member

LTLA commented Nov 30, 2023

Probably should say "See ?buildKmknn for details", as it just forwards its arguments to that. Happy to take a PR for the doc fix (might consider doing it for all constructors). I couldn't be bothered repeating the whole vector of options in the KmknnParam constructor, then I would have to update things in two places if I ever added a new distance metric.

It seems that all methods are restrained to distance = c("Euclidean", "Manhattan", "Cosine") (see ?bndistance). I suppose we could also have method-specific distances, for those that support them, e.g., Hamming for Annoy. I haven't come across a demand for that yet, though again, PRs are welcome if someone can be bothered.

Note that my "Cosine" is really just L2-normalized euclidean because the typical "1 - cosine similarity" isn't a real distance metric. I mention this because, as you may already know, "1 - correlation" isn't a distance metric either. Instead I typically center and scaled the input values for each sample, and then use the typical Euclidean distance, which turns out to be a monotonic function of the Pearson correlation (sqrt((1 - correlation)/2), or something like that, see here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants