Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Postgres for the doc and index store #1706

Merged
merged 14 commits into from
Mar 14, 2024
Merged
2 changes: 2 additions & 0 deletions fern/docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,8 @@ navigation:
contents:
- page: Vector Stores
path: ./docs/pages/manual/vectordb.mdx
- page: Node Stores
path: ./docs/pages/manual/nodestore.mdx
- section: Advanced Setup
contents:
- page: LLM Backends
Expand Down
66 changes: 66 additions & 0 deletions fern/docs/pages/manual/nodestore.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
## NodeStores
PrivateGPT supports **Simple** and [Postgres](https://www.postgresql.org/) providers. Simple being the default.

In order to select one or the other, set the `nodestore.database` property in the `settings.yaml` file to `simple` or `postgres`.

```yaml
nodestore:
database: simple
```

### Simple Document Store

Setting up simple document store: Persist data with in-memory and disk storage.

Enabling the simple document store is an excellent choice for small projects or proofs of concept where you need to persist data while maintaining minimal setup complexity. To get started, set the nodestore.database property in your settings.yaml file as follows:

```yaml
nodestore:
database: simple
```
The beauty of the simple document store is its flexibility and ease of implementation. It provides a solid foundation for managing and retrieving data without the need for complex setup or configuration. The combination of in-memory processing and disk persistence ensures that you can efficiently handle small to medium-sized datasets while maintaining data consistency across runs.

### Postgres Document Store

To enable Postgres, set the `nodestore.database` property in the `settings.yaml` file to `postgres` and install the `storage-nodestore-postgres` extra. Note: Vector Embeddings Storage in Postgres is configured separately

```bash
poetry install --extras storage-nodestore-postgres
```

The available configuration options are:
| Field | Description |
|---------------|-----------------------------------------------------------|
| **host** | The server hosting the Postgres database. Default is `localhost` |
| **port** | The port on which the Postgres database is accessible. Default is `5432` |
| **database** | The specific database to connect to. Default is `postgres` |
| **user** | The username for database access. Default is `postgres` |
| **password** | The password for database access. (Required) |
| **schema_name** | The database schema to use. Default is `private_gpt` |

For example:
```yaml
nodestore:
database: postgres

postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: <PASSWORD>
schema_name: private_gpt
```

Given the above configuration, Two PostgreSQL tables will be created upon successful connection: one for storing metadata related to the index and another for document data itself.

```
postgres=# \dt private_gpt.*
List of relations
Schema | Name | Type | Owner
-------------+-----------------+-------+--------------
private_gpt | data_docstore | table | postgres
private_gpt | data_indexstore | table | postgres

postgres=#
```
4 changes: 2 additions & 2 deletions fern/docs/pages/manual/vectordb.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -51,10 +51,10 @@ By default `chroma` will use a disk-based database stored in local_data_path / "

### PGVector

To enable PGVector, set the `vectorstore.database` property in the `settings.yaml` file to `pgvector` and install the `pgvector` extra.
To enable PGVector, set the `vectorstore.database` property in the `settings.yaml` file to `pgvector` and install the `vector-stores-postgres` extra.

```bash
poetry install --extras pgvector
poetry install --extras vector-stores-postgres
```

PGVector settings can be configured by setting values to the `pgvector` property in the `settings.yaml` file.
Expand Down
34 changes: 32 additions & 2 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

65 changes: 49 additions & 16 deletions private_gpt/components/node_store/node_store_component.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
from llama_index.core.storage.index_store.types import BaseIndexStore

from private_gpt.paths import local_data_path
from private_gpt.settings.settings import Settings

logger = logging.getLogger(__name__)

Expand All @@ -16,19 +17,51 @@ class NodeStoreComponent:
doc_store: BaseDocumentStore

@inject
def __init__(self) -> None:
try:
self.index_store = SimpleIndexStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local index store not found, creating a new one")
self.index_store = SimpleIndexStore()

try:
self.doc_store = SimpleDocumentStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local document store not found, creating a new one")
self.doc_store = SimpleDocumentStore()
def __init__(self, settings: Settings) -> None:
match settings.nodestore.database:
case "simple":
try:
self.index_store = SimpleIndexStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local index store not found, creating a new one")
self.index_store = SimpleIndexStore()

try:
self.doc_store = SimpleDocumentStore.from_persist_dir(
persist_dir=str(local_data_path)
)
except FileNotFoundError:
logger.debug("Local document store not found, creating a new one")
self.doc_store = SimpleDocumentStore()

case "postgres":
try:
from llama_index.core.storage.docstore.postgres_docstore import (
PostgresDocumentStore,
)
from llama_index.core.storage.index_store.postgres_index_store import (
PostgresIndexStore,
)
except ImportError:
raise ImportError(
"Postgres dependencies not found, install with `poetry install --extras storage-nodestore-postgres`"
) from None

if settings.postgres is None:
raise ValueError("Postgres index/doc store settings not found.")

self.index_store = PostgresIndexStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)
self.doc_store = PostgresDocumentStore.from_params(
**settings.postgres.model_dump(exclude_none=True)
)

case _:
# Should be unreachable
# The settings validator should have caught this
raise ValueError(
f"Database {settings.nodestore.database} not supported"
)
19 changes: 14 additions & 5 deletions private_gpt/settings/settings.py
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,10 @@ class VectorstoreSettings(BaseModel):
database: Literal["chroma", "qdrant", "pgvector"]


class NodeStoreSettings(BaseModel):
database: Literal["simple", "postgres"]


class LlamaCPPSettings(BaseModel):
llm_hf_repo_id: str
llm_hf_model_file: str
Expand Down Expand Up @@ -204,7 +208,7 @@ class UISettings(BaseModel):
)


class PGVectorSettings(BaseModel):
class PostgresSettings(BaseModel):
host: str = Field(
"localhost",
description="The server hosting the Postgres database",
Expand All @@ -225,14 +229,17 @@ class PGVectorSettings(BaseModel):
"postgres",
description="The database to use to connect to the Postgres database",
)
schema_name: str = Field(
"public",
description="The name of the schema in the Postgres database to use",
)


class PGVectorSettings(PostgresSettings):
embed_dim: int = Field(
384,
description="The dimension of the embeddings stored in the Postgres database",
)
schema_name: str = Field(
"public",
description="The name of the schema in the Postgres database where the embeddings are stored",
)
table_name: str = Field(
"embeddings",
description="The name of the table in the Postgres database where the embeddings are stored",
Expand Down Expand Up @@ -305,7 +312,9 @@ class Settings(BaseModel):
openai: OpenAISettings
ollama: OllamaSettings
vectorstore: VectorstoreSettings
nodestore: NodeStoreSettings
qdrant: QdrantSettings | None = None
postgres: PostgresSettings | None = None
pgvector: PGVectorSettings | None = None


Expand Down
8 changes: 7 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,12 @@ llama-index-embeddings-openai = {version ="^0.1.6", optional = true}
llama-index-vector-stores-qdrant = {version ="^0.1.3", optional = true}
llama-index-vector-stores-chroma = {version ="^0.1.4", optional = true}
llama-index-vector-stores-postgres = {version ="^0.1.2", optional = true}
llama-index-storage-docstore-postgres = {version ="^0.1.2", optional = true}
llama-index-storage-index-store-postgres = {version ="^0.1.2", optional = true}
# Postgres
psycopg2-binary = {version ="^2.9.9", optional = true}
dbzoo marked this conversation as resolved.
Show resolved Hide resolved
asyncpg = {version="^0.29.0", optional = true}

# Optional Sagemaker dependency
boto3 = {version ="^1.34.51", optional = true}
# Optional UI
Expand All @@ -46,7 +52,7 @@ embeddings-sagemaker = ["boto3"]
vector-stores-qdrant = ["llama-index-vector-stores-qdrant"]
vector-stores-chroma = ["llama-index-vector-stores-chroma"]
vector-stores-postgres = ["llama-index-vector-stores-postgres"]

storage-nodestore-postgres = ["llama-index-storage-docstore-postgres","llama-index-storage-index-store-postgres","psycopg2-binary","asyncpg"]

[tool.poetry.group.dev.dependencies]
black = "^22"
Expand Down
43 changes: 43 additions & 0 deletions settings-ollama-pg.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# Using ollama and postgres for the vector, doc and index store. Ollama is also used for embeddings.
# To use install these extras:
# poetry install --extras "llms-ollama ui vector-stores-postgres embeddings-ollama storage-nodestore-postgres"
server:
env_name: ${APP_ENV:ollama}

llm:
mode: ollama
max_new_tokens: 512
context_window: 3900

embedding:
mode: ollama

ollama:
llm_model: mistral
embedding_model: nomic-embed-text
api_base: http://localhost:11434

nodestore:
database: postgres

vectorstore:
database: pgvector

pgvector:
host: localhost
port: 5432
database: postgres
user: postgres
password: admin
embed_dim: 768
schema_name: private_gpt
table_name: embeddings

postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: admin
schema_name: private_gpt

11 changes: 11 additions & 0 deletions settings.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,9 @@ huggingface:
vectorstore:
database: qdrant

nodestore:
database: simple

qdrant:
path: local_data/private_gpt/qdrant

Expand All @@ -69,6 +72,14 @@ pgvector:
schema_name: private_gpt
table_name: embeddings

postgres:
host: localhost
port: 5432
database: postgres
user: postgres
password: postgres
schema_name: private_gpt

sagemaker:
llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479
Expand Down
Loading