diff --git a/spiceaidocs/docs/components/data-connectors/file.md b/spiceaidocs/docs/components/data-connectors/file.md index 169a1048..2d208d53 100644 --- a/spiceaidocs/docs/components/data-connectors/file.md +++ b/spiceaidocs/docs/components/data-connectors/file.md @@ -4,10 +4,8 @@ sidebar_label: 'File Data Connector' description: 'File Data Connector Documentation' --- -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; -The File Data Connector enables federated SQL queries on files stored by locally accessible filesystems. It supports querying individual files or entire directories, where all child files within the directory will be loaded and queried. +The File Data Connector enables federated/accelerated SQL queries on files stored on locally accessible filesystems. It supports querying individual files or entire directories, where all child files within the directory are loaded and queried. File formats are specified using the `file_format` parameter, as described in [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats). @@ -19,20 +17,45 @@ datasets: name: customer params: file_format: parquet +``` + +## Configuration + +### `from` + +The `from` field for the File connector takes the form `file://path`, where `path` is the path to the file or directory to read from. See the [examples](#examples) below for relative and absolute paths. + +### `name` - - from: file://path/to/orders.csv - name: orders +The dataset name. This will be used as the table name within Spice. + +Example: +```yaml +datasets: + - from: file://path/to/customer.parquet + name: cool_dataset params: - file_format: csv - csv_has_header: false + ... 
``` -## Parameters +```sql +SELECT COUNT(*) FROM cool_dataset; +``` + +```shell ++----------+ +| count(*) | ++----------+ +| 6001215 | ++----------+ +``` -| Parameter name | Description | -|------------------------|-------------------------------------------------------------------------------------------------------| -| `file_format` | Specifies the data file format. Required if the format cannot be inferred from the `from` path. | -| `hive_partitioning_enabled`| Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` | +### `params` + +| Parameter name | Description | +| --------------------------- | ------------------------------------------------------------------------------------------------ | +| `file_format` | Specifies the data file format. Required if the format cannot be inferred from the `from` path. | +| `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` | For CSV-specific parameters, see [CSV Parameters](/reference/file_format.md#csv). @@ -52,3 +75,39 @@ datasets: ``` When the file is modified, the acceleration will be refreshed and will include the latest data. + +## Examples + +### Absolute path + +In this example, `path` is an absolute path to the file on the filesystem. + +```yaml +datasets: + - from: file://path/to/customer.parquet + name: customer + params: + file_format: parquet +``` + +### Relative path + +In this example, the path is relative to the directory where the `spicepod.yaml` is located. + +```bash +├── foo +│   └── yellow_tripdata_2024-01.parquet +└── spicepod.yaml +``` + +```yaml +datasets: + - from: file:foo/yellow_tripdata_2024-01.parquet + name: trip_data + params: + file_format: parquet +``` + +## Quickstarts and Samples + +Refer to the [File quickstart](https://github.com/spiceai/quickstarts/tree/trunk/file) to see an example of the File connector in use. 
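Hive-style partitioning, which `hive_partitioning_enabled` turns on, encodes partition column values in `key=value` directory names, for example `year=2024/month=01/`. The Python sketch below illustrates only that naming convention. `hive_partitions` is a hypothetical helper and not Spice's implementation. It assumes every `key=value` directory component is a partition column and keeps every value as a string.

```python
from pathlib import PurePosixPath
from typing import Dict


def hive_partitions(relative_path: str) -> Dict[str, str]:
    """Extract hive-style `key=value` partition columns from a file path.

    Hypothetical helper for illustration only; it is not part of Spice.
    Only directory components are inspected; the file name is ignored.
    """
    directories = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    partitions: Dict[str, str] = {}
    for part in directories:
        key, sep, value = part.partition("=")
        if sep and key:  # only `key=value` directories become columns
            partitions[key] = value
    return partitions


print(hive_partitions("year=2024/month=01/part-0.parquet"))
# {'year': '2024', 'month': '01'}
```

Directories that do not follow the `key=value` form, such as a plain `raw/` folder, contribute no columns in this sketch.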
\ No newline at end of file diff --git a/spiceaidocs/docs/components/data-connectors/ftp.md b/spiceaidocs/docs/components/data-connectors/ftp.md index 100162bd..7d9a55d5 100644 --- a/spiceaidocs/docs/components/data-connectors/ftp.md +++ b/spiceaidocs/docs/components/data-connectors/ftp.md @@ -4,69 +4,108 @@ sidebar_label: 'FTP/SFTP Data Connector' description: 'FTP/SFTP Data Connector Documentation' --- -import Tabs from '@theme/Tabs'; -import TabItem from '@theme/TabItem'; +FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) are network protocols for transferring files between a client and a server. FTP transfers data, including credentials, unencrypted, while SFTP encrypts the transfer over SSH. -The FTP/SFTP Data Connector enables federated SQL query across Parquet/CSV files stored in FTP/SFTP servers. +The FTP/SFTP Data Connector enables federated/accelerated SQL queries across [supported file formats](/components/data-connectors/index.md#object-store-file-formats) stored on FTP/SFTP servers. -If a folder is provided, all child Parquet/CSV files will be loaded. +```yaml +datasets: + - from: sftp://remote-sftp-server.com/path/to/folder/ + name: my_dataset + params: + file_format: csv + sftp_port: 22 + sftp_user: my-sftp-user + sftp_pass: ${secrets:my_sftp_password} +``` ## Configuration - - - ### Parameters - - The connection to FTP can be configured by providing the following params: - - - `file_format`: Specifies the data file format. Required if the format cannot be inferred by from the `from` path. See [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats). - - `ftp_port`: Optional, specifies the port of the FTP server. Default is 21. E.g. `ftp_port: 21` - - `ftp_user`: The username for the FTP server. E.g. `ftp_user: my-ftp-user` - - `ftp_pass`: The password for the FTP server. 
Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_ftp_pass}`. 
- - `client_timeout`: Optional. Specifies timeout for FTP connection. E.g. `client_timeout: 30s`. When not set, no timeout will be configured for FTP client. - - `hive_partitioning_enabled`: Optional. Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` - - More CSV related parameters can be configured, see [CSV Parameters](/reference/file_format.md#csv) - - ### Examples - ```yaml - - from: ftp://remote-ftp-server.com/path/to/folder/ - name: my_dataset - params: - file_format: csv - ftp_user: my-ftp-user - ftp_pass: ${secrets:my_ftp_password} - hive_partitioning_enabled: false - ``` - - - - ### Parameters - - The connection to SFTP can be configured by providing the following params: - - - `file_format`: Optional, specifies the requested file format. - - `parquet`: (default) Parquet file format. - - `csv`: CSV file format. - - `sftp_port`: Optional, specifies the port of the SFTP server. Default is 22. E.g. `sftp_port: 22` - - `sftp_user`: The username for the SFTP server. E.g. `sftp_user: my-sftp-user` - - `sftp_pass`: The password for the SFTP server. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_sftp_pass}`. - - `client_timeout`: Optional. Specifies timeout for SFTP connection. E.g. `client_timeout: 30s`. When not set, no timeout will be configured for SFTP client. - - `hive_partitioning_enabled`: Optional. Enable partitioning using hive-style partitioning from the folder structure. 
Defaults to `false` - - More CSV related parameters can be configured, see [CSV Parameters](/reference/file_format.md#csv) - - ### Examples - ```yaml - - from: sftp://remote-sftp-server.com/path/to/folder/ - name: my_dataset - params: - file_format: csv - sftp_port: 20 - sftp_user: my-sftp-user - sftp_pass: ${secrets:my_sftp_password} - hive_partitioning_enabled: true - ``` - - - +### `from` + +The `from` field takes one of two forms: `ftp://<host>/<path>` or `sftp://<host>/<path>`, where `<host>` is the host to connect to and `<path>` is the path to the file or directory to read from. + +If a folder is provided, all child files will be loaded. + +### `name` + +The dataset name. This will be used as the table name within Spice. + +Example: +```yaml +datasets: + - from: sftp://remote-sftp-server.com/path/to/folder/ + name: cool_dataset + params: + ... +``` + +```sql +SELECT COUNT(*) FROM cool_dataset; +``` + +```shell ++----------+ +| count(*) | ++----------+ +| 6001215 | ++----------+ +``` + +### `params` + +#### FTP + +| Parameter Name | Description | +| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `file_format` | Specifies the data file format. Required if the format cannot be inferred from the `from` path. See [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats). | +| `ftp_port` | Optional, specifies the port of the FTP server. Default is 21. E.g. `ftp_port: 21` | +| `ftp_user` | The username for the FTP server. E.g. `ftp_user: my-ftp-user` | +| `ftp_pass` | The password for the FTP server. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_ftp_pass}`. | +| `client_timeout` | Optional. Specifies the timeout for the FTP connection. E.g. `client_timeout: 30s`. 
When not set, no timeout is configured for the FTP client. | +| `hive_partitioning_enabled` | Optional. Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` | + +#### SFTP + +| Parameter Name | Description | +| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `file_format` | Specifies the data file format. Required if the format cannot be inferred from the `from` path. See [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats). | +| `sftp_port` | Optional, specifies the port of the SFTP server. Default is 22. E.g. `sftp_port: 22` | +| `sftp_user` | The username for the SFTP server. E.g. `sftp_user: my-sftp-user` | +| `sftp_pass` | The password for the SFTP server. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_sftp_pass}`. | +| `client_timeout` | Optional. Specifies the timeout for the SFTP connection. E.g. `client_timeout: 30s`. When not set, no timeout is configured for the SFTP client. | +| `hive_partitioning_enabled` | Optional. Enable partitioning using hive-style partitioning from the folder structure.
Defaults to `false` | + +## Examples + +### Connecting to FTP + +```yaml +datasets: + - from: ftp://remote-ftp-server.com/path/to/folder/ + name: my_dataset + params: + file_format: csv + ftp_user: my-ftp-user + ftp_pass: ${secrets:my_ftp_password} + hive_partitioning_enabled: false +``` + +### Connecting to SFTP + +```yaml +datasets: + - from: sftp://remote-sftp-server.com/path/to/folder/ + name: my_dataset + params: + file_format: csv + sftp_port: 22 + sftp_user: my-sftp-user + sftp_pass: ${secrets:my_sftp_password} + hive_partitioning_enabled: false +``` + +## Quickstarts and Samples + +Refer to the [FTP quickstart](https://github.com/spiceai/quickstarts/tree/trunk/ftp) to see an example of the FTP connector in use. + +## Secrets + +Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets). \ No newline at end of file diff --git a/spiceaidocs/docs/components/data-connectors/https.md b/spiceaidocs/docs/components/data-connectors/https.md index dc79c4cd..58754be6 100644 --- a/spiceaidocs/docs/components/data-connectors/https.md +++ b/spiceaidocs/docs/components/data-connectors/https.md @@ -5,26 +5,76 @@ description: 'HTTP(s) Data Connector Documentation' pagination_prev: null --- -The HTTP(s) Data Connector enables federated SQL query against a variety of tabular formatted (e.g. Parquet/CSV) files stored at a HTTP endpoint. +The HTTP(s) Data Connector enables federated/accelerated SQL queries across [supported file formats](/components/data-connectors/index.md#object-store-file-formats) stored at an HTTP(s) endpoint. -The connector supports Basic HTTP authentication via `param` values. 
+```yaml +datasets: + - from: http://static_username@localhost:3001/report.csv + name: local_report + params: + http_password: ${env:MY_HTTP_PASS} +``` + +## Configuration + +### `from` + +The `from` field must contain a valid URI to the location of a [supported file](/components/data-connectors/index.md#object-store-file-formats). For example, `http://static_username@localhost:3001/report.csv`. + +### `name` + +The dataset name. This will be used as the table name within Spice. + +Example: +```yaml +datasets: + - from: http://static_username@localhost:3001/report.csv + name: cool_dataset + params: + ... +``` -### Parameters +```sql +SELECT COUNT(*) FROM cool_dataset; +``` + +```shell ++----------+ +| count(*) | ++----------+ +| 6001215 | ++----------+ +``` -- `http_port`: Optional. Port to create HTTP(s) connection over. Default: 80 and 443 for HTTP and HTTPS respectively. -- `http_username`: Optional. Username to provide connection for HTTP basic authentication. Default: None. -- `http_password`: Optional. Password to provide connection for HTTP basic authentication. Default: None. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_http_pass}`. -- `client_timeout`: Optional. Specifies timeout for HTTP operations. Default value is `30s` E.g. `client_timeout: 60s` +### `params` -### Examples +The connector supports HTTP Basic authentication through the parameters below. + +| Parameter Name | Description | +| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `http_port` | Optional. Port to use for the HTTP(s) connection. Default: 80 for HTTP and 443 for HTTPS. | +| `http_username` | Optional. Username for HTTP Basic authentication. Default: None. 
| +| `http_password` | Optional. Password for HTTP Basic authentication. Default: None. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_http_pass}`. | +| `client_timeout` | Optional. Specifies the timeout for HTTP operations. Default value is `30s`. E.g. `client_timeout: 60s` | + +## Examples + +### Basic example ```yaml datasets: - from: https://github.com/LAION-AI/audio-dataset/raw/7fd6ae3cfd7cde619f6bed817da7aa2202a5bc28/metadata/freesound/parquet/freesound_parquet.parquet name: laion_freesound +``` + +### Using Basic Authentication + +```yaml +datasets: - from: http://static_username@localhost:3001/report.csv name: local_report params: http_password: ${env:MY_HTTP_PASS} ``` + +## Secrets + +Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets). \ No newline at end of file
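The Basic Authentication example puts the username in the `from` URL (`static_username@`) and supplies the password through `http_password`. The Python sketch below shows what HTTP Basic authentication does with those two values, and how the documented default ports (80 and 443) apply when the URL has no port. `split_http_source` is a hypothetical helper and not Spice's implementation. One behavior is an assumption: an explicit username, standing in for `http_username`, takes precedence over the one embedded in the URL.

```python
import base64
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports as described for `http_port`: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_http_source(
    from_url: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Tuple[str, int, str, Optional[str]]:
    """Return (host, port, path, Authorization header) for a `from` URL.

    Hypothetical helper for illustration; it is not Spice's implementation.
    Assumption: an explicit `username` (like `http_username`) takes
    precedence over a username embedded in the URL (`static_username@`).
    """
    parts = urlsplit(from_url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    user = username if username is not None else parts.username
    header = None
    if user is not None:
        # HTTP Basic authentication (RFC 7617): "Basic " + base64("user:password").
        credentials = f"{user}:{password or ''}".encode("utf-8")
        header = "Basic " + base64.b64encode(credentials).decode("ascii")
    return parts.hostname or "", port, parts.path, header


host, port, path, auth = split_http_source(
    "http://static_username@localhost:3001/report.csv", password="example-password"
)
print(host, port, path)  # localhost 3001 /report.csv
```

Basic credentials are only base64-encoded, not encrypted, so in this sketch they are readable by anyone observing a plain `http://` connection. That is one reason to prefer `https://` endpoints when using `http_password`.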