From e203b1b92acf0064e89c3714d381c51d127e1c50 Mon Sep 17 00:00:00 2001 From: Scott Lyons Date: Fri, 15 Nov 2024 17:37:01 -0800 Subject: [PATCH] Standardizing and enhancing Database connector documentation (#610) * Enhancing MSSQL documentation * Standardizing and enhancing MySQL connector docs * Standardizing and enhancing Postgres docs * Standardizing ODBC documentation * Updating secrets section * Update spiceaidocs/docs/components/data-connectors/mssql.md Co-authored-by: Phillip LeBlanc * Update spiceaidocs/docs/components/data-connectors/mssql.md Co-authored-by: Phillip LeBlanc * Update spiceaidocs/docs/components/data-connectors/mssql.md Co-authored-by: Phillip LeBlanc * Update spiceaidocs/docs/components/data-connectors/odbc.md Co-authored-by: Phillip LeBlanc * Update spiceaidocs/docs/components/data-connectors/mysql.md Co-authored-by: Phillip LeBlanc * Update spiceaidocs/docs/components/data-connectors/odbc.md Co-authored-by: Phillip LeBlanc * Update spiceaidocs/docs/components/data-connectors/postgres/index.md Co-authored-by: Phillip LeBlanc * Update spiceaidocs/docs/components/data-connectors/postgres/index.md Co-authored-by: Phillip LeBlanc * Updating postgres based on suggestions * Fixing MSSQL `from` section * Fixing MySQL `from` section * Updating Postgres `from` section with clearer table names * Adding Postgres example to ODBC * Re-adding ODBC docker section * Improving information about the default database --------- Co-authored-by: Phillip LeBlanc --- .../docs/components/data-connectors/mssql.md | 72 +++++-- .../docs/components/data-connectors/mysql.md | 178 +++++++++++++----- .../docs/components/data-connectors/odbc.md | 151 ++++++++++++--- .../data-connectors/postgres/index.md | 170 +++++++++++------ 4 files changed, 414 insertions(+), 157 deletions(-) diff --git a/spiceaidocs/docs/components/data-connectors/mssql.md b/spiceaidocs/docs/components/data-connectors/mssql.md index 60e0cf11..c98188cd 100644 --- 
a/spiceaidocs/docs/components/data-connectors/mssql.md +++ b/spiceaidocs/docs/components/data-connectors/mssql.md @@ -4,7 +4,16 @@ sidebar_label: 'Microsoft SQL Server' description: 'Microsoft SQL Server Data Connector' --- -The Microsoft SQL Server Data Connector enables federated SQL queries on data stored in [Microsoft SQL Server](https://www.microsoft.com/en-us/sql-server) databases. +[Microsoft SQL Server](https://www.microsoft.com/en-us/sql-server) is a relational database management system developed by Microsoft. + +The Microsoft SQL Server Data Connector enables federated/accelerated SQL queries on data stored in MSSQL databases. + +:::warning[Limitations] + +1. The connector supports SQL Server authentication (SQL Login and Password) only. +1. Spatial types (`geography`) are not supported, and columns with these types will be ignored. + +::: ```yaml datasets: @@ -16,21 +25,51 @@ datasets: ## Configuration +### `from` + +The `from` field takes the form `mssql:database.schema.table` where `database.schema.table` is the fully-qualified table name in the SQL server. + +### `name` + +The dataset name. This will be used as the table name within Spice. + +Example: +```yaml +datasets: + - from: mssql:path.to.my_dataset + name: cool_dataset + params: + ... +``` + +```sql +SELECT COUNT(*) FROM cool_dataset; +``` + +```shell ++----------+ +| count(*) | ++----------+ +| 6001215 | ++----------+ +``` + +### `params` + The data connector supports the following `params`. Use the [secret replacement syntax](../secret-stores/index.md) to load the secret from a secret store, e.g. `${secrets:my_mssql_conn_string}`. -- `mssql_connection_string`: The ADO connection string to use to connect to the server. This can be used instead of providing individual connection parameters. -- `mssql_host`: The hostname or IP address of the Microsoft SQL Server instance. -- `mssql_port`: (Optional) The port of the Microsoft SQL Server instance. Default value is 1433. 
-- `mssql_database`: (Optional) The name of the database to connect to. The default database will be used if not specified. -- `mssql_username`: The username for the SQL Server authentication. -- `mssql_password`: The password for the SQL Server authentication. -- `mssql_encrypt`: (Optional) Specifies whether encryption is required for the connection. - - `true`: (default) This mode requires an SSL connection. If a secure connection cannot be established, server will not connect. - - `false`: This mode will not attempt to use an SSL connection, even if the server supports it. Only the login procedure is encrypted -- `mssql_trust_server_certificate`: Optional parameter to specify whether the server certificate should be trusted without validation when encryption is enabled - - `true`: The server certificate will not be validated and it is accepted as-is - - `false`: (default) Server certificate will be validated against system's certificate storage +| Parameter Name | Description | +| -------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `mssql_connection_string` | The ADO connection string to use to connect to the server. This can be used instead of providing individual connection parameters. | +| `mssql_host` | The hostname or IP address of the Microsoft SQL Server instance. | +| `mssql_port` | (Optional) The port of the Microsoft SQL Server instance. Default value is 1433. | +| `mssql_database` | (Optional) The name of the database to connect to. The default database (`master`) will be used if not specified. | +| `mssql_username` | The username for the SQL Server authentication. 
| +| `mssql_password` | The password for the SQL Server authentication. | +| `mssql_encrypt` | (Optional) Specifies whether encryption is required for the connection.
  • `true`: (default) This mode requires an SSL connection. If a secure connection cannot be established, the connector will not connect.
  • `false`: This mode will not attempt to use an SSL connection, even if the server supports it. Only the login procedure is encrypted.
| +| `mssql_trust_server_certificate` | (Optional) Specifies whether the server certificate should be trusted without validation when encryption is enabled.
  • `true`: The server certificate will not be validated and it is accepted as-is.
  • `false`: (default) The server certificate will be validated against the system's certificate store.
| +### Example ```yaml datasets: @@ -45,9 +84,6 @@ datasets: mssql_trust_server_certificate: true ``` -:::warning[Limitations] +## Secrets -1. The connector supports SQL Server authentication (SQL Login and Password) only. -1. Spatial types (`geography`) are not supported, and columns with these types will be ignored. - -::: +Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets). \ No newline at end of file diff --git a/spiceaidocs/docs/components/data-connectors/mysql.md b/spiceaidocs/docs/components/data-connectors/mysql.md index c2c4068a..a74391e3 100644 --- a/spiceaidocs/docs/components/data-connectors/mysql.md +++ b/spiceaidocs/docs/components/data-connectors/mysql.md @@ -4,13 +4,13 @@ sidebar_label: 'MySQL Data Connector' description: 'MySQL Data Connector Documentation' --- -## Federated SQL query +MySQL is an open-source relational database management system that uses structured query language (SQL) for managing and manipulating databases. -To connect to any MySQL database as connector for federated SQL query, specify `mysql` as the selector in the `from` value for the dataset. +The MySQL Data Connector enables federated/accelerated SQL queries on data stored in MySQL databases. ```yaml datasets: - - from: mysql:path.to.my_dataset + - from: mysql:mytable name: my_dataset params: mysql_host: localhost @@ -22,21 +22,115 @@ datasets: ## Configuration +### `from` + +The `from` field takes the form `mysql:database_name.table_name` where `database_name.table_name` is the fully-qualified table name in the MySQL server. + +If the `database_name` is omitted in the `from` field, the connector will use the database specified in the `mysql_db` parameter. 
If the `mysql_db` parameter is not provided, it will default to the user's default database. + +These two examples are identical: + +```yaml +datasets: + - from: mysql:mytable + name: my_dataset + params: + mysql_db: my_database + ... +``` + +```yaml +datasets: + - from: mysql:my_database.mytable + name: my_dataset + params: + ... +``` + +### `name` + +The dataset name. This will be used as the table name within Spice. + +Example: +```yaml +datasets: + - from: mysql:path.to.my_dataset + name: cool_dataset + params: + ... +``` + +```sql +SELECT COUNT(*) FROM cool_dataset; +``` + +```shell ++----------+ +| count(*) | ++----------+ +| 6001215 | ++----------+ +``` + +### `params` + The MySQL data connector can be configured by providing the following `params`. Use the [secret replacement syntax](../secret-stores/index.md) to load the secret from a secret store, e.g. `${secrets:my_mysql_conn_string}`. -- `mysql_connection_string`: The connection string to use to connect to the MySQL server. This can be used instead of providing individual connection parameters. -- `mysql_host`: The hostname of the MySQL server. -- `mysql_tcp_port`: The port of the MySQL server. -- `mysql_db`: The name of the database to connect to. -- `mysql_user`: The MySQL username. -- `mysql_pass`: The password to connect with. -- `mysql_sslmode`: Optional. Specifies the SSL/TLS behavior for the connection, supported values: - - `required`: (default) This mode requires an SSL connection. If a secure connection cannot be established, server will not connect. - - `preferred`: This mode will try to establish a secure SSL connection if possible, but will connect insecurely if the server does not support SSL. - - `disabled`: This mode will not attempt to use an SSL connection, even if the server supports it. -- `mysql_sslrootcert`: Optional parameter specifying the path to a custom PEM certificate that the connector will trust. 
+| Parameter Name | Description | +| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `mysql_connection_string` | The connection string to use to connect to the MySQL server. This can be used instead of providing individual connection parameters. | +| `mysql_host` | The hostname of the MySQL server. | +| `mysql_tcp_port` | The port of the MySQL server. | +| `mysql_db` | The name of the database to connect to. | +| `mysql_user` | The MySQL username. | +| `mysql_pass` | The password to connect with. | +| `mysql_sslmode` | Optional. Specifies the SSL/TLS behavior for the connection, supported values:
  • `required`: (default) This mode requires an SSL connection. If a secure connection cannot be established, the connector will not connect.
  • `preferred`: This mode will try to establish a secure SSL connection if possible, but will connect insecurely if the server does not support SSL.
  • `disabled`: This mode will not attempt to use an SSL connection, even if the server supports it.
| +| `mysql_sslrootcert` | Optional parameter specifying the path to a custom PEM certificate that the connector will trust. | -Configuration `params` are provided either in the top level `dataset` for a dataset source and federated SQL query. +## Types + +The table below shows the MySQL data types supported, along with the type mapping to Apache Arrow types in Spice. + +| MySQL Type | Arrow Type | +| ------------ | ------------------------------ | +| `TINYINT` | `Int8` | +| `SMALLINT` | `Int16` | +| `INT` | `Int32` | +| `MEDIUMINT` | `Int32` | +| `BIGINT` | `Int64` | +| `DECIMAL` | `Decimal128` / `Decimal256` | +| `FLOAT` | `Float32` | +| `DOUBLE` | `Float64` | +| `DATETIME` | `Timestamp(Microsecond, None)` | +| `TIMESTAMP` | `Timestamp(Microsecond, None)` | +| `YEAR` | `Int16` | +| `TIME` | `Time64(Nanosecond)` | +| `DATE` | `Date32` | +| `CHAR` | `Utf8` | +| `BINARY` | `Binary` | +| `VARCHAR` | `Utf8` | +| `VARBINARY` | `Binary` | +| `TINYBLOB` | `Binary` | +| `TINYTEXT` | `Utf8` | +| `BLOB` | `Binary` | +| `TEXT` | `Utf8` | +| `MEDIUMBLOB` | `Binary` | +| `MEDIUMTEXT` | `Utf8` | +| `LONGBLOB` | `LargeBinary` | +| `LONGTEXT` | `LargeUtf8` | +| `SET` | `Utf8` | +| `ENUM` | `Dictionary(UInt16, Utf8)` | +| `BIT` | `UInt64` | + +:::note + +- MySQL `TIMESTAMP` value is the local time to the MySQL server timezone, the corresponding arrow `Timestamp(Microsecond, None)` type has the same local time value as MySQL `TIMESTAMP` value. 
+ +::: + +## Examples + +### Connecting using username and password ```yaml datasets: @@ -50,6 +144,8 @@ datasets: mysql_pass: ${secrets:mysql_pass} ``` +### Connecting using SSL + ```yaml datasets: - from: mysql:path.to.my_dataset @@ -64,6 +160,8 @@ datasets: mysql_sslrootcert: ./custom_cert.pem ``` +### Connecting using a Connection String + ```yaml datasets: - from: mysql:path.to.my_dataset @@ -72,43 +170,19 @@ datasets: mysql_connection_string: mysql://${secrets:my_user}:${secrets:my_password}@localhost:3306/my_db ``` -## Types - -The table below shows the MySQL data types supported, along with the type mapping to Apache Arrow types in Spice. - -| MySQL Type | Arrow Type | -| ---------- | ---------------------------- | -| TINYINT | Int8 | -| SMALLINT | Int16 | -| INT | Int32 | -| MEDIUMINT | Int32 | -| BIGINT | Int64 | -| DECIMAL | Decimal128 / Decimal256 | -| FLOAT | Float32 | -| DOUBLE | Float64 | -| DATETIME | Timestamp(Microsecond, None) | -| TIMESTAMP | Timestamp(Microsecond, None) | -| YEAR | Int16 | -| TIME | Time64(Nanosecond) | -| DATE | Date32 | -| CHAR | Utf8 | -| BINARY | Binary | -| VARCHAR | Utf8 | -| VARBINARY | Binary | -| TINYBLOB | Binary | -| TINYTEXT | Utf8 | -| BLOB | Binary | -| TEXT | Utf8 | -| MEDIUMBLOB | Binary | -| MEDIUMTEXT | Utf8 | -| LONGBLOB | LargeBinary | -| LONGTEXT | LargeUtf8 | -| SET | Utf8 | -| ENUM | Dictionary(UInt16, Utf8) | -| BIT | UInt64 | +### Connecting to the default database -:::note +```yaml +datasets: + - from: mysql:mytable + name: my_dataset + params: + mysql_host: localhost + mysql_tcp_port: 3306 + mysql_user: my_user + mysql_pass: ${secrets:mysql_pass} +``` -- MySQL `TIMESTAMP` value is the local time to the MySQL server timezone, the corresponding arrow `Timestamp(Microsecond, None)` type has the same local time value as MySQL `TIMESTAMP` value. +## Secrets -::: +Spice integrates with multiple secret stores to help manage sensitive data securely. 
For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets). diff --git a/spiceaidocs/docs/components/data-connectors/odbc.md b/spiceaidocs/docs/components/data-connectors/odbc.md index e8c606bc..dfee8fc1 100644 --- a/spiceaidocs/docs/components/data-connectors/odbc.md +++ b/spiceaidocs/docs/components/data-connectors/odbc.md @@ -4,24 +4,35 @@ sidebar_label: 'ODBC Data Connector' description: 'ODBC Data Connector Documentation' --- -## Setup +ODBC (Open Database Connectivity) is a standard API that allows applications to connect to and interact with various database management systems using a common interface. To connect to any ODBC database for federated/accelerated SQL queries, specify `odbc` as the selector in the `from` value for the dataset. The `odbc_connection_string` parameter is required. + :::warning -ODBC support is not included in the released binaries. To use ODBC with Spice, you need to [checkout and compile the code](https://github.com/spiceai/spiceai/blob/trunk/CONTRIBUTING.md#building) with the `--features odbc` flag (`cargo build --release --features odbc`). +Spice must be [built with the `odbc` feature](#building-spice-with-odbc), and the host/container must have a [valid ODBC configuration](https://www.unixodbc.org/odbcinst.html). Alternatively, use the official Spice Docker image. 
To use the official Spice Docker image from [DockerHub](https://hub.docker.com/r/spiceai/spiceai): -```bash # Pull the latest official Spice image +```bash docker pull spiceai/spiceai:latest +``` -# Pull the official v0.17.1-beta Spice image -docker pull spiceai/spiceai:0.17.1-beta +# Pull the official v0.20.0-beta Spice image +```bash +docker pull spiceai/spiceai:0.20.0-beta ``` ::: +```yaml +datasets: + - from: odbc:path.to.my_dataset + name: my_dataset + params: + odbc_connection_string: Driver={Foo Driver};Host=db.foo.net;Param=Value +``` + An ODBC connection requires a compatible ODBC driver and valid driver configuration. ODBC drivers are available from their respective vendors. Here are a few examples: - [PostgreSQL](https://odbc.postgresql.org/) @@ -34,18 +45,6 @@ Non-Windows systems additionally require the installation of an ODBC Driver Mana - Ubuntu: `sudo apt-get install unixodbc` - MacOS: `brew install unixodbc` -## Federated SQL query - -To connect to any ODBC database for federated SQL queries, specify `odbc` as the selector in the `from` value for the dataset. The `odbc_connection_string` parameter is required. Spice must be built with the `odbc` feature, and the host/container must have a [valid ODBC configuration](https://www.unixodbc.org/odbcinst.html). - -```yaml -datasets: - - from: odbc:path.to.my_dataset - name: my_dataset - params: - odbc_connection_string: Driver={Foo Driver};Host=db.foo.net;Param=Value -``` - :::info For the best `JOIN` performance, ensure all ODBC datasets from the same database are configured with the exact same `odbc_connection_string` in Spice. 
@@ -83,15 +82,47 @@ datasets: ## Configuration -In addition to the connection string, the following [arrow_odbc builder parameters](https://docs.rs/arrow-odbc/latest/arrow_odbc/struct.OdbcReaderBuilder.html) are exposed as params: +### `from` + +The `from` field takes the form `odbc:path.to.my.dataset` where `path.to.my.dataset` is the table name in the ODBC-supporting server to read from. + +### `name` + +The dataset name. This will be used as the table name within Spice. + +Example: +```yaml +datasets: + - from: odbc:my.cool.table + name: cool_dataset + params: + ... +``` + +```sql +SELECT COUNT(*) FROM cool_dataset; +``` + +```shell ++----------+ +| count(*) | ++----------+ +| 6001215 | ++----------+ +``` + +### `params` -| Parameter | Type | Description | Default | -|-------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------| -| sql_dialect | string | Override what SQL dialect is used for the ODBC connection. Supports `postgresql`, `mysql`, `sqlite`, `athena` or `databricks` values. | Unset (auto-detected) | -| odbc_max_bytes_per_batch | number (bytes) | Upper allocation limit for transit buffer. | `512_000_000` | -| odbc_max_num_rows_per_batch | number (rows) | Upper limit for number of rows fetched for one batch. | `65536` | -| odbc_max_text_size | number (bytes) | Upper limit for value buffers bound to columns with text values. | Unset (allocates driver-reported max column size) | -| odbc_max_binary_size | number (bytes) | Upper limit for value buffers bound to columns with binary values. 
| Unset (allocates driver-reported max column size) | +The following [arrow_odbc builder parameters](https://docs.rs/arrow-odbc/latest/arrow_odbc/struct.OdbcReaderBuilder.html) are exposed as params: + +| Parameter | Type | Description | +| ----------------------------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `sql_dialect` | string | Override what SQL dialect is used for the ODBC connection. Supports `postgresql`, `mysql`, `sqlite`, `athena` or `databricks` values. Default is unset (auto-detected). | +| `odbc_max_bytes_per_batch` | number (bytes) | Upper allocation limit for transit buffer. Default is `512_000_000`. | +| `odbc_max_num_rows_per_batch` | number (rows) | Upper limit for number of rows fetched for one batch. Default is `65536`. | +| `odbc_max_text_size` | number (bytes) | Upper limit for value buffers bound to columns with text values. Default is unset (allocates driver-reported max column size). | +| `odbc_max_binary_size` | number (bytes) | Upper limit for value buffers bound to columns with binary values. Default is unset (allocates driver-reported max column size). | +| `odbc_connection_string` | string | Connection string to use to connect to the ODBC server | ```yaml datasets: @@ -101,7 +132,7 @@ datasets: odbc_connection_string: Driver={Foo Driver};Host=db.foo.net;Param=Value ``` -### Selecting SQL Dialect +## Selecting SQL Dialect The default SQL dialect may not be supported by every ODBC connection. The `sql_dialect` parameter allows overriding the selected SQL dialect for a specified connection. @@ -124,9 +155,24 @@ datasets: odbc_connection_string: Driver={Foo Driver};Host=db.foo.net;Param=Value ``` +## Building Spice with ODBC + +ODBC support is not included in the released binaries. 
To use ODBC with Spice, you need to [checkout and compile the code](https://github.com/spiceai/spiceai/blob/trunk/CONTRIBUTING.md#building) with the `--features odbc` flag (`cargo build --release --features odbc`). + +Alternatively, use the official Spice Docker image. To use the official Spice Docker image from [DockerHub](https://hub.docker.com/r/spiceai/spiceai): + +```bash +# Pull the latest official Spice image +docker pull spiceai/spiceai:latest + +# Pull the official v0.20.0-beta Spice image +docker pull spiceai/spiceai:0.20.0-beta +``` + + ## Baking an image with ODBC Support -There are many dozens of ODBC adapters; this recipe covers making your own image and configuring it to work with Spice. +There are many dozens of ODBC adapters; this recipe covers making a custom image and configuring it to work with Spice. ```Dockerfile FROM spiceai/spiceai:latest @@ -235,3 +281,54 @@ sql> select * from spice_test; Query took: 1.8504053329999999 seconds. 3/3 rows displayed. ``` + +## Examples + +### Connecting to an SQLite database + +```yaml +version: v1beta1 +kind: Spicepod +name: sqlite +datasets: +- from: odbc:spice_test + name: spice_test + mode: read + acceleration: + enabled: false + params: + odbc_connection_string: DRIVER={SQLite3};SERVER=localhost;DATABASE=test.db;Trusted_connection=yes +``` + +### Connecting to Postgres + +Ensure that the Postgres ODBC driver is installed. 
On Unix systems, this will create an entry in `/etc/odbcinst.ini` similar to: + +```ini +[PostgreSQL Unicode] +Description=PostgreSQL ODBC driver (Unicode version) +Driver=psqlodbcw.so +Setup=libodbcpsqlS.so +Debug=0 +CommLog=1 +UsageCount=1 +``` + +Then, in your `spicepod.yaml` the `odbc_connection_string` parameter can be used for the ODBC connection string: + +```yaml +version: v1beta1 +kind: Spicepod +name: odbc-demo +datasets: +- from: odbc:taxi_trips + name: taxi_trips + params: + odbc_connection_string: Driver={PostgreSQL Unicode};Server=localhost;Port=5432;Database=spice_demo;Uid=postgres +``` + +See the [ODBC Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/odbc/README.md) for more help on getting started with ODBC and Postgres. + +## Secrets + +Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets). \ No newline at end of file diff --git a/spiceaidocs/docs/components/data-connectors/postgres/index.md b/spiceaidocs/docs/components/data-connectors/postgres/index.md index 8c7f2e11..ee1483b6 100644 --- a/spiceaidocs/docs/components/data-connectors/postgres/index.md +++ b/spiceaidocs/docs/components/data-connectors/postgres/index.md @@ -4,92 +4,76 @@ sidebar_label: 'PostgreSQL Data Connector' description: 'PostgreSQL Data Connector Documentation' --- -## Dataset Source/Federated SQL Query +PostgreSQL is an advanced open-source relational database management system known for its robustness, extensibility, and support for SQL compliance. -To use PostgreSQL as a dataset source or for federated SQL query, specify `postgres` as the selector in the `from` value for the dataset. 
+The PostgreSQL Data Connector enables federated/accelerated SQL queries on data stored in PostgreSQL databases. ```yaml datasets: - - from: postgres:path.to.my_dataset + - from: postgres:my_table name: my_dataset + params: + ... ``` -:::warning[Limitations] - -- The Postgres federated queries may result in unexpected result types due to the difference in DataFusion and Postgres size increase rules. Please explicitly specify the expected output type of aggregation functions when writing query involving Postgres table in Spice. For example, rewrite `SUM(int_col)` into `CAST (SUM(int_col) as BIGINT`. - -::: - ## Configuration -The connection to PostgreSQL can be configured by providing the following `params`: +### `from` - +The `from` field takes the form `postgres:my_table` where `my_table` is the table identifier in the PostgreSQL server to read from. -- `pg_host`: The hostname of the PostgreSQL server. -- `pg_port`: The port of the PostgreSQL server. -- `pg_db`: The name of the database to connect to. -- `pg_user`: The username to connect with. -- `pg_pass`: The password to connect with. Use the [secret replacement syntax](../../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_pg_pass}`. -- `pg_sslmode`: Optional. Specifies the SSL/TLS behavior for the connection, supported values: - - `verify-full`: (default) This mode requires an SSL connection, a valid root certificate, and the server host name to match the one specified in the certificate. - - `verify-ca`: This mode requires a TLS connection and a valid root certificate. - - `require`: This mode requires a TLS connection. - - `prefer`: This mode will try to establish a secure TLS connection if possible, but will connect insecurely if the server does not support TLS. - - `disable`: This mode will not attempt to use a TLS connection, even if the server supports it. 
-- `pg_sslrootcert`: Optional parameter specifying the path to a custom PEM certificate that the connector will trust. -- `connection_pool_size`: Optional. The maximum number of connections to keep open in the connection pool. Default is 10. - -Configuration `params` are provided either in the top level `dataset` for a dataset source and federated SQL query, or in the `acceleration` section for a data store. +The fully-qualified table name (`database.schema.table`) can also be used in the `from` field. ```yaml datasets: - - from: postgres:path.to.my_dataset + - from: postgres:my_database.my_schema.my_table name: my_dataset params: - pg_host: localhost - pg_port: 5432 - pg_db: my_database - pg_user: my_user - pg_pass: ${secrets:my_pg_pass} + ... ``` +### `name` + +The dataset name. This will be used as the table name within Spice. + +Example: ```yaml datasets: - - from: postgres:path.to.my_dataset - name: my_dataset + - from: postgres:my_database.my_schema.my_table + name: cool_dataset params: - pg_host: localhost - pg_port: 5432 - pg_db: my_database - pg_user: my_user - pg_pass: ${secrets:my_pg_pass} - pg_sslmode: verify-ca - pg_sslrootcert: ./custom_cert.pem + ... 
``` -Specify different secrets for a PostgreSQL source and acceleration: +```sql +SELECT COUNT(*) FROM cool_dataset; +``` -```yaml -datasets: - - from: spice.ai:path.to.my_dataset - name: my_dataset - params: - pg_host: localhost - pg_port: 5432 - pg_db: data_store - pg_user: my_user - pg_pass: ${secrets:pg1_pass} - acceleration: - engine: postgres - params: - pg_host: localhost - pg_port: 5433 - pg_db: acceleration - pg_user: two_user_two_furious - pg_pass: ${secrets:pg2_pass} +```shell ++----------+ +| count(*) | ++----------+ +| 6001215 | ++----------+ ``` +### `params` + +The connection to PostgreSQL can be configured by providing the following `params`: + + + +| Parameter Name | Description | +| ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `pg_host` | The hostname of the PostgreSQL server. | +| `pg_port` | The port of the PostgreSQL server. | +| `pg_db` | The name of the database to connect to. | +| `pg_user` | The username to connect with. | +| `pg_pass` | The password to connect with. Use the [secret replacement syntax](../../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_pg_pass}`. | +| `pg_sslmode` | Optional. Specifies the SSL/TLS behavior for the connection, supported values:
  • `verify-full`: (default) This mode requires an SSL connection, a valid root certificate, and the server host name to match the one specified in the certificate.
  • `verify-ca`: This mode requires a TLS connection and a valid root certificate.
  • `require`: This mode requires a TLS connection.
  • `prefer`: This mode will try to establish a secure TLS connection if possible, but will connect insecurely if the server does not support TLS.
  • `disable`: This mode will not attempt to use a TLS connection, even if the server supports it.
| +| `pg_sslrootcert` | Optional parameter specifying the path to a custom PEM certificate that the connector will trust. | +| `connection_pool_size` | Optional. The maximum number of connections to keep open in the connection pool. Default is 10. | + ## Types The table below shows the PostgreSQL data types supported, along with the type mapping to Apache Arrow types in Spice. @@ -128,3 +112,69 @@ The table below shows the PostgreSQL data types supported, along with the type m | `geography` | `Binary` | | `enum` | `Dictionary(Int8, Utf8)` | | Composite Types | `Struct` | + +:::info + +Federated queries against Postgres may produce unexpected result types because DataFusion and Postgres apply different size increase rules. Explicitly specify the expected output type of aggregation functions when writing queries that involve Postgres tables in Spice. For example, rewrite `SUM(int_col)` as `CAST(SUM(int_col) AS BIGINT)`. + +::: + +## Examples + +### Connecting using Username/Password + +```yaml +datasets: + - from: postgres:my_database.my_schema.my_table + name: my_dataset + params: + pg_host: localhost + pg_port: 5432 + pg_db: my_database + pg_user: my_user + pg_pass: ${secrets:my_pg_pass} +``` + +### Connecting using SSL + +```yaml +datasets: + - from: postgres:my_database.my_schema.my_table + name: my_dataset + params: + pg_host: localhost + pg_port: 5432 + pg_db: my_database + pg_user: my_user + pg_pass: ${secrets:my_pg_pass} + pg_sslmode: verify-ca + pg_sslrootcert: ./custom_cert.pem +``` + +### Separate dataset/accelerator secrets + +Specify different secrets for a PostgreSQL source and acceleration: + +```yaml +datasets: + - from: postgres:my_schema.my_table + name: my_dataset + params: + pg_host: localhost + pg_port: 5432 + pg_db: my_database + pg_user: my_user + pg_pass: ${secrets:pg1_pass} + acceleration: + engine: postgres + params: + pg_host: localhost + pg_port: 5433 + pg_db: acceleration + pg_user: two_user_two_furious + pg_pass: ${secrets:pg2_pass} +``` 
+ +## Secrets + +Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets). \ No newline at end of file