Open-source Text Embeddings

We made a free OpenAI-like API so you can instantly use better, open-source text embedding models.

Model	MTEB Score	Dimensions	Seq Length	Cost per 10M tokens
bge-large-en	63.98%	1024	512	$0
bge-base-en	63.36%	768	512	$0
gte-large	63.13%	1024	512	$0
gte-base	62.39%	768	512	$0
text-embedding-ada-002	60.99%	1536	8192	$1

Legend

Model is the name of the model to use in the API. If it has a red background, we do not support it.

MTEB Score is the Massive Text Embedding Benchmark score. It is the average of the 10 downstream tasks in the Text Embedding Benchmark.

Dimensions is the number of dimensions of the embedding vectors. Lower is better, as it is cheaper to store in a vector DB.

Seq Length is the maximum number of tokens in a sequence. Typically anything above 256 is enough for most use cases, as most embedding models perform poorly with large chunks of text.

Cost is the cost per 10M tokens.

Limitations

To keep this API free and available to as many people as possible, you may only make 1,000 requests per hour. If this is insufficient for your needs, feel free to self-host on your own infrastructure. We recommend at least 4 CPU cores and 16GB of RAM. A GPU is not required, but will speed up requests.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.github/workflows		.github/workflows
scripts		scripts
site		site
src		src
tests		tests
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
depot.json		depot.json
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Open-source Text Embeddings

Legend

Limitations

About

Releases

Packages

Languages

License

Trelent/opentextembeddings

Folders and files

Latest commit

History

Repository files navigation

Open-source Text Embeddings

Legend

Limitations

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages