embedland

In theory, this is a universe of code for playing with embeddings. In reality, it contains just a couple of files. More to come, I hope.

bench.py

This file benchmarks several embedding engines on the Enron email corpus. Once you've installed the libraries it depends on, run it with python bench.py. It will:

Download the Enron email dataset.
Unzip it.
Attempt to embed the emails (OpenAI's embedder is the default; you can switch to T5 or another engine at the end of the file).
Cluster the embeddings.
Label each cluster by sampling subject lines from it and sending them to GPT-3.
Show you a pretty chart, like the one you see above.
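The cluster-and-label steps can be sketched in plain Python as below. This is an illustration under stated assumptions, not the code in bench.py: the README doesn't say which clustering algorithm bench.py uses, so plain k-means stands in here; `subject_of`, `kmeans`, `nearest`, and `labeling_prompts` are hypothetical names; and the prompt wording is made up. The GPT-3 call itself is left out, so the sketch stops at building one prompt per cluster.

```python
import math
import random
from collections import defaultdict
from email.parser import HeaderParser


def subject_of(raw_message: str) -> str:
    """Return the Subject header of a raw email message ("" if it has none)."""
    return HeaderParser().parsestr(raw_message).get("Subject", "")


def nearest(v, centroids):
    """Index of the centroid closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm), a stand-in for whatever bench.py uses.

    Returns one cluster index per input vector.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [nearest(v, centroids) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def labeling_prompts(subjects, labels, per_cluster=5, seed=0):
    """Build one (hypothetical) labeling prompt per cluster from sampled subjects."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for subject, label in zip(subjects, labels):
        by_cluster[label].append(subject)
    prompts = {}
    for label, members in sorted(by_cluster.items()):
        sample = rng.sample(members, min(per_cluster, len(members)))
        listing = "\n".join(f"- {s}" for s in sample)
        prompts[label] = (
            "Give a short topic label for these email subject lines:\n"
            f"{listing}\nLabel:"
        )
    return prompts
```

Sampling a handful of subjects per cluster keeps each prompt short no matter how large the cluster grows.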

viz.py

Visualization helper. This file helps you go from "a list of embeddings" to "something pretty to look at".
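One common way to get from a list of embeddings to something you can plot is to project them down to two dimensions. The sketch below assumes that approach; the README doesn't say how viz.py actually does it (it may use t-SNE, UMAP, or something else entirely). It runs principal component analysis via power iteration in pure Python, and `project_2d`, `_top_direction`, and `_dot` are hypothetical names.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _top_direction(rows, orthogonal_to=(), iterations=200, seed=0):
    """Power iteration for the leading eigenvector of X^T X, kept orthogonal
    to previously found directions (so a second call yields the 2nd component)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(len(rows[0]))]
    for _ in range(iterations):
        # Multiply by X^T X without building it: w <- X^T (X w).
        scores = [_dot(r, w) for r in rows]
        w = [_dot(scores, col) for col in zip(*rows)]
        # Gram-Schmidt: remove components along earlier directions.
        for u in orthogonal_to:
            p = _dot(w, u)
            w = [wi - p * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:  # no variance left in this subspace
            break
        w = [wi / norm for wi in w]
    return w


def project_2d(vectors):
    """Map each embedding to (x, y) along its top two principal components."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    first = _top_direction(centered)
    second = _top_direction(centered, orthogonal_to=[first])
    return [(_dot(r, first), _dot(r, second)) for r in centered]
```

Each (x, y) pair can go straight into a scatter plot, colored by cluster label. For thousands of high-dimensional embeddings, a library routine such as NumPy's SVD would be far faster than this pure-Python loop.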

TODO:

Support longer texts by splitting them into chunks, embedding each chunk, and averaging the results.
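A minimal sketch of that TODO, assuming a caller-supplied `embed` function that maps a string to a list of floats. `chunk_words` and `embed_long` are hypothetical names, and chunking by whitespace-separated words is a stand-in for chunking by tokens, which is what model input limits are actually measured in.

```python
from typing import Callable, List

Vector = List[float]


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed_long(text: str, embed: Callable[[str], Vector], max_words: int = 200) -> Vector:
    """Embed each chunk separately, then return the element-wise mean vector."""
    # An empty input still gets one (empty) chunk so the result has a defined size.
    chunks = chunk_words(text, max_words) or [""]
    vectors = [embed(chunk) for chunk in chunks]
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

A plain mean treats every chunk equally; weighting chunks by length, or re-normalizing the average to unit length, are reasonable variants depending on the embedder.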

danielgross/embedland