docs: add memory profile (#841)
* docs: add memory profile

* docs: add memory profile

* docs: remove reference and polish words
jemmyshin authored Oct 11, 2022
1 parent 7ee58c8 commit 87fdc54
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions docs/user-guides/retriever.md
@@ -154,6 +154,22 @@ The results will look like this, the most relevant doc is "she smiled, with pain
You can set the `limit` parameter (default is `10`) to control the number of the most similar documents to be retrieved.


### Memory Estimation

This section shows how to estimate the memory usage of the `AnnLite` indexer.
This is useful for determining the amount of memory required for indexing and querying.

In `AnnLite`, the memory usage is determined by the following two components:

- `HNSW` indexer: N * 1.1 * (4 bytes * `dimension` + 8 bytes * `max_connection`), where N is the number of embedding vectors, `dimension` is the dimension of the embedding vectors, and `max_connection` is the maximum number of connections in the graph.
- `cell_table`: its memory usage grows almost linearly with the number of columns and the number of documents. With the default setting (no columns used for filtering), `cell_table` uses about 0.12 GB per million documents.
Columns used for filtering are stored as strings, so their memory usage depends on the length of the strings.
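
The two components above can be combined into a rough estimate. The following is a minimal sketch, not part of the `AnnLite` API: the function name `estimate_annlite_memory_gb` is hypothetical, 1 GB is assumed to mean 10^9 bytes, and the `cell_table` term assumes the default setting with no filter columns.

```python
def estimate_annlite_memory_gb(n_vectors: int, dimension: int, max_connection: int) -> dict:
    """Rough memory estimate (in GB) based on the two components listed above.

    Hypothetical helper for illustration only; it is not provided by AnnLite.
    Assumes 1 GB = 1e9 bytes and no columns used for filtering.
    """
    bytes_per_gb = 1e9  # assumption: decimal gigabytes

    # HNSW indexer: N * 1.1 * (4 bytes * dimension + 8 bytes * max_connection)
    hnsw_bytes = n_vectors * 1.1 * (4 * dimension + 8 * max_connection)
    hnsw_gb = hnsw_bytes / bytes_per_gb

    # cell_table: about 0.12 GB per million documents with the default setting
    cell_table_gb = 0.12 * (n_vectors / 1_000_000)

    return {
        "hnsw_gb": hnsw_gb,
        "cell_table_gb": cell_table_gb,
        "total_gb": hnsw_gb + cell_table_gb,
    }


if __name__ == "__main__":
    # Example values chosen for illustration: 1 million 128-dimensional vectors
    # with max_connection set to 16.
    estimate = estimate_annlite_memory_gb(1_000_000, 128, 16)
    for name, value in estimate.items():
        print(f"{name}: {value:.3f}")
```

For the illustrative values in the example, the formula gives roughly 0.70 GB for the `HNSW` indexer and 0.12 GB for `cell_table`, about 0.82 GB in total. Filter columns would add to this, depending on string length.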

```{Notice}
If you use `AnnLiteIndexer` in your Jina Flow, the memory usage will be slightly higher, since we keep a `SQLite` table in memory to support indexing in `DocumentArray`.
```


## Support large-scale dataset

When we want to index a large number of documents, for example, 100 million or even 1 billion documents,