danielgross/embedland

embedland

Theoretically this is a universe of code for playing with embeddings. In reality it contains one file. More to come, I hope.

bench.py

This file benchmarks various embeddings using the Enron email corpus. Once you install the various libraries it needs, you can run it with python bench.py. It will:

  • Download the Enron email dataset.
  • Unzip it.
  • Attempt to run embeddings on it. OpenAI's embedder is the default; you can change that at the end of the file to T5 or some other engine.
  • Cluster the embeddings.
  • Label the clusters by sampling the subject lines from the clusters and sending them to GPT-3.
  • Show you a pretty chart, like the one you see above.
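The clustering and labeling steps above can be sketched roughly as follows. This is a minimal illustration, not the code in bench.py: the tiny k-means routine, the function names (kmeans, sample_subjects, label_prompt), the prompt wording, and the toy data are all assumptions made for the example, and the call to GPT-3 is left out.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Tiny k-means (an assumption; bench.py may cluster differently).

    Returns one cluster index per input vector.
    """
    # Deterministic farthest-point initialization: start from the first
    # vector, then repeatedly add the vector farthest from all centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    assignment = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            assignment[i] = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def sample_subjects(subjects, assignment, per_cluster=3, seed=0):
    """Pick up to per_cluster subject lines from each cluster."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for subject, cluster in zip(subjects, assignment):
        by_cluster[cluster].append(subject)
    return {c: rng.sample(s, min(per_cluster, len(s))) for c, s in by_cluster.items()}


def label_prompt(sampled):
    """Build a hypothetical prompt asking a language model to name a cluster."""
    lines = "\n".join(f"- {s}" for s in sampled)
    return (
        "These email subject lines belong to one cluster:\n"
        f"{lines}\n"
        "Give a short label for the cluster:"
    )


# Toy stand-ins for real embeddings and Enron subject lines.
embeddings = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
subjects = ["Q3 budget review", "Budget numbers", "Lunch on Friday?", "Dinner plans"]

assignment = kmeans(embeddings, k=2)
sampled = sample_subjects(subjects, assignment, per_cluster=2)
prompts = {cluster: label_prompt(lines) for cluster, lines in sampled.items()}
```

Each string in prompts would then be sent to the language model, and the reply used as the cluster's label on the chart.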

viz.py

Visualization helper. This file helps you go from "a list of embeddings" to "something pretty to look at".
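One common way to get from "a list of embeddings" to something plottable is to project the vectors down to two dimensions. The sketch below does this with a small pure-Python PCA (power iteration plus deflation). It is an assumption about the general approach, not a description of what viz.py does; viz.py may well use a different reduction method and a plotting library. The function names here are made up for the example.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(vectors):
    """Subtract the per-dimension mean from every vector."""
    n = len(vectors)
    means = [sum(col) / n for col in zip(*vectors)]
    return [[x - m for x, m in zip(v, means)] for v in vectors]


def top_component(rows, iters=100, seed=0):
    """Approximate the leading principal direction by power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # Apply X^T X to w without forming the covariance matrix.
        scores = [dot(r, w) for r in rows]
        new_w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(dot(new_w, new_w))
        if norm == 0.0:
            break  # No variance left in the data; keep the current direction.
        w = [x / norm for x in new_w]
    return w


def project_2d(vectors):
    """Return one (x, y) pair per embedding using the top two components."""
    rows = center(vectors)
    pc1 = top_component(rows)
    xs = [dot(r, pc1) for r in rows]
    # Deflate: remove the first component before finding the second.
    deflated = [[v - x * p for v, p in zip(r, pc1)] for r, x in zip(rows, xs)]
    pc2 = top_component(deflated)
    ys = [dot(r, pc2) for r in deflated]
    return list(zip(xs, ys))


# Toy 3-dimensional "embeddings" that happen to lie in a plane.
points = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.5]]
coords = project_2d(points)
```

The resulting (x, y) pairs can be handed to any scatter-plot routine, with points colored by cluster.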

TODO:

  • Make longer texts work by chunking them and averaging the resulting embeddings.
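A minimal sketch of what that TODO might look like, assuming the input is split on whitespace into fixed-size word chunks and the chunk embeddings are averaged with weights proportional to chunk length. Nothing like this exists in the repository yet; embed_long_text, chunk_words, and the toy_embed stand-in are hypothetical names, and a real version would pass in the actual embedding function and a token-based limit.

```python
def chunk_words(text, max_words):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed_long_text(text, embed, max_words=200):
    """Embed each chunk with embed() and return the length-weighted mean vector."""
    chunks = chunk_words(text, max_words)
    if not chunks:
        raise ValueError("text is empty")
    vectors = [embed(chunk) for chunk in chunks]
    # Weight each chunk by its word count so a short tail chunk counts less.
    weights = [len(chunk.split()) for chunk in chunks]
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) / total for j in range(dim)]


def toy_embed(chunk):
    """Hypothetical stand-in for a real embedder: [word count, character count]."""
    words = chunk.split()
    return [float(len(words)), float(sum(len(w) for w in words))]


long_vector = embed_long_text("a bb ccc dddd eeeee", toy_embed, max_words=2)
```

Weighting by chunk length is one design choice among several; a plain unweighted mean, or re-normalizing the averaged vector to unit length, would be equally reasonable starting points.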

About

A collection of text embedding experiments