GitHub - koaning/cluestar: Gain clues from clustering!

cluestar

Gain a clue by clustering!

This library contains visualisation tools that might help you get started with classification tasks. The idea is that if you can inspect clusters easily, you might gain a clue on what good labels for your dataset might be!

It generates charts that looks like this:

There's even a fancy chart that can compare embedding techniques.

Install

python -m pip install cluestar

Interactive Demo

You can see an interactive demo of the generated widgets here.

You can also toy around with the demo notebook found here.

Usage

The first step is to encode textdata in two dimensions, like below.

from sklearn.pipeline import make_pipeline
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

pipe = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))

X = pipe.fit_transform(texts)

From here you can make an interactive chart via;

from cluestar import plot_text

plot_text(X, texts)

The best results are likely found when you use umap together with something like universal sentence encoder.

You might also improve the understandability by highlighting points that have a certain word in it.

plot_text(X, texts, color_words=["plastic", "voucher", "deliver"])

You can also use a numeric array, one that contains proba-values for prediction, to influence the color.

# First, get an array of pvals from some model
p_vals = some_model.predict(texts)[:, 0]
# Use these to assign pretty colors.
plot_text(X, texts, color_array=p_vals)

You can also compare two embeddings interactively. To do this:

from cluestar import plot_text_comparison

plot_text(X1=X, X2=X, texts)

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.github/workflows		.github/workflows
cluestar		cluestar
data		data
docs		docs
notebooks		notebooks
tests		tests
.flake8		.flake8
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
cluestar.png		cluestar.png
gif-multi.gif		gif-multi.gif
gif.gif		gif.gif
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cluestar

Install

Interactive Demo

Usage

About

Releases

Packages

Contributors 2

Languages

License

koaning/cluestar

Folders and files

Latest commit

History

Repository files navigation

cluestar

Install

Interactive Demo

Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages