Updated for latest LLM
simonw authored Sep 26, 2023
1 parent 665ed19 commit 2fa1004
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions llms/embed-paragraphs.md
@@ -111,11 +111,13 @@ llm embed-multi blog-paragraphs -m lv2 \
```
This creates a new collection of embeddings called `blog-paragraphs` using the E5-large-v2 embedding model. Then it reads through the CSV file and generates embeddings for each line. The `--store` line causes it to store the original text content in the database as well.

-This took about 25 minutes to run (on an M2 Macbook Pro with 64GB of RAM). The result was an embeddings collection containing 18,918 rows.
+The calculated embeddings will be stored in the default embeddings database, which on a Mac is at `~/Library/Application Support/io.datasette.llm/embeddings.db`. You can add `-d my-embeddings.db` to store them in a different location.
+
+The command took about 25 minutes to run (on an M2 Macbook Pro with 64GB of RAM). The result was an embeddings collection containing 18,918 rows.

Here's a query to preview three rows from that collection:
```bash
-sqlite-utils "$(llm embed-db path)" '
+sqlite-utils "$(llm collections path)" '
select id, content, length(embedding)
from embeddings where collection_id = (
select id from collections where name = "blog-paragraphs"