Enable clustering by distance matrix input #33

Merged 4 commits, Jul 29, 2024

huddlej
Contributor

@huddlej huddlej commented Jul 26, 2024

Adds a mutually exclusive input argument, --distance-matrix, to the pathogen-cluster command and corresponding logic to load this matrix and find clusters from it using HDBSCAN's precomputed metric option. When an embedding is provided instead, pathogen-cluster runs HDBSCAN with the default Euclidean distance metric.
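
For readers unfamiliar with the pattern, here is a minimal sketch of the input handling described above. It is not the code from this PR: the `--embedding` flag name, the `required=True` setting, and the helper names `build_parser`, `choose_metric`, and `find_clusters` are all assumptions made for illustration. It relies on argparse's mutually exclusive groups and passes `metric="precomputed"` to HDBSCAN only when a distance matrix is given.

```python
import argparse


def build_parser():
    # Hypothetical parser; only --distance-matrix is named in this PR.
    parser = argparse.ArgumentParser(prog="pathogen-cluster")
    # Exactly one of the two inputs must be given (required=True is an assumption).
    inputs = parser.add_mutually_exclusive_group(required=True)
    inputs.add_argument("--embedding", help="embedding coordinates (assumed flag name)")
    inputs.add_argument("--distance-matrix", help="square matrix of pairwise distances")
    return parser


def choose_metric(args):
    # A distance matrix already holds pairwise distances, so HDBSCAN
    # should use them directly instead of computing Euclidean distances.
    return "precomputed" if args.distance_matrix is not None else "euclidean"


def find_clusters(values, metric, min_cluster_size=5):
    # Imported inside the function so that parsing arguments does not
    # require hdbscan to be installed or loaded.
    import hdbscan

    clusterer = hdbscan.HDBSCAN(metric=metric, min_cluster_size=min_cluster_size)
    # Returns one integer label per row; -1 marks unclustered points.
    return clusterer.fit_predict(values)
```

With a parser like this, passing both inputs at once makes argparse exit with a usage error, so the clustering code only ever has to handle one input type.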

huddlej added 4 commits July 26, 2024 15:14
Adds a mutually exclusive input argument, `--distance-matrix`, to the
`pathogen-cluster` command and corresponding logic to load this matrix
and find clusters from it using HDBSCAN's precomputed metric option.
When an embedding is provided instead, `pathogen-cluster` runs HDBSCAN
with the default Euclidean distance metric.
Moves imports of subcommand functions to after arguments have been
parsed, which greatly reduces the time to print help output for each command.
Speeds up runtime of individual commands by moving command-specific
imports into each command's function, avoiding expensive imports like
HDBSCAN when they won't be used.
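
The deferred-import idea in the last two commits can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the `COMMANDS` table, `load_command`, and `main` are hypothetical names, and standard-library modules stand in for the real command modules so the sketch runs anywhere.

```python
import argparse
import importlib

# Hypothetical mapping of subcommand name to "module:function".
# json and math are stand-ins for expensive command modules.
COMMANDS = {
    "dumps": "json:dumps",
    "sqrt": "math:sqrt",
}


def load_command(name):
    # Import the module only once the requested command is known.
    module_name, function_name = COMMANDS[name].split(":")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


def main(argv):
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("command", choices=sorted(COMMANDS))
    parser.add_argument("value", type=float)
    # --help exits inside parse_args, before any command module is imported.
    args = parser.parse_args(argv)
    return load_command(args.command)(args.value)
```

Because `parse_args` runs before `load_command`, printing help never pays for a command's imports, and each command only loads the modules it actually uses.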
@huddlej huddlej merged commit fd2baa1 into main Jul 29, 2024
4 checks passed
@huddlej huddlej deleted the cluster-distances branch July 29, 2024 20:00