diff --git a/spiceaidocs/docs/data-connectors/s3.md b/spiceaidocs/docs/data-connectors/s3.md
index 96df0fee..a9a4279b 100644
--- a/spiceaidocs/docs/data-connectors/s3.md
+++ b/spiceaidocs/docs/data-connectors/s3.md
@@ -4,25 +4,47 @@ sidebar_label: 'S3 Data Connector'
 description: 'S3 Data Connector Documentation'
 ---
 
-S3 as a connector for federated SQL query across Parquet files stored in S3, or S3-compatible storage solutions (e.g. Minio, Cloudflare R2).
+The S3 Data Connector enables federated SQL queries across Parquet files stored in S3 or S3-compatible storage solutions (e.g. MinIO, Cloudflare R2).
 
-## `params`
+Support for Iceberg, CSV, and other file formats is on the roadmap.
 
-- `endpoint`: The S3 endpoint, or equivalent (e.g. Minio endpoint), for the S3-compatible storage.
-- `region`: Region of the S3 bucket, if region specific.
+If a folder is provided, all child Parquet files will be loaded.
 
-## `auth`
+## Dataset Schema Reference
 
-Check [Secrets Stores](/secret-stores).
+### `from`
 
-Required attributes:
+The S3-compatible URI to a folder or object, in the form `from: s3://<bucket>/<path>`.
 
-- `key`: The access key authorised to access the S3 data (e.g. `AWS_ACCESS_KEY_ID` for AWS)
-- `secret`The secret key authorised to access the S3 data (e.g. `AWS_SECRET_ACCESS_KEY` for AWS)
+Example: `from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet`
 
-## Example
-
-### Minio
+### `name`
+
+The dataset name.
+
+Example: `name: cool_dataset`
+
+### `params` (optional)
+
+- `endpoint`: The S3 endpoint, or equivalent (e.g. MinIO endpoint), for the S3-compatible storage. E.g. `endpoint: https://my.minio.server`
+- `region`: Region of the S3 bucket, if region specific. E.g. `region: us-east-1`
+
+### `auth` (optional)
+
+Not required for public endpoints.
+
+- `key`: The access key (e.g. `AWS_ACCESS_KEY_ID` for AWS)
+- `secret`: The secret key (e.g. `AWS_SECRET_ACCESS_KEY` for AWS)
+
+For endpoints protected by access keys, `key` and `secret` are required and must be passed using a [Secrets Store](/secret-stores) or via `spice login s3`. Support for dataset-specific authentication is on the roadmap.
+
+Example: `spice login s3 -k AKIAIOSFODNN7EXAMPLE -s wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`
+
+## Examples
+
+### MinIO Example
+
+Create a dataset named `cool_dataset` from a Parquet file stored in MinIO.
 
 ```yaml
 - from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet
@@ -32,12 +54,24 @@ Required attributes:
     region: 'us-east-1' # Best practice for Minio
 ```
 
-#### S3
+### S3 Authenticated Example
+
+Create a dataset named `cool_dataset` from a protected Parquet file stored in S3.
+
+First, log in using `spice login s3 -k AKIAIOSFODNN7EXAMPLE -s wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`, then use the dataset:
 
 ```yaml
 - from: s3://my-startups-data/path/to/parquet/cool_dataset.parquet
   name: cool_dataset
   params:
-    endpoint: http://my-startups-data.s3.amazonaws.com
-    region: 'ap-southeast-2'
+    region: 'us-east-1'
 ```
+
+### S3 Public Example
+
+Create a dataset named `taxi_trips` from a public S3 folder.
+
+```yaml
+- from: s3://spiceai-demo-datasets/taxi_trips/2024/
+  name: taxi_trips
+```
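Reviewer note: the per-field snippets in this change assemble into a single dataset entry. A sketch, assuming the entry lives under a top-level `datasets` key of the pod manifest (the `endpoint` and `region` values are the MinIO examples from the docs, not required values):

```yaml
# Hypothetical assembled manifest fragment; endpoint/region taken from the MinIO example.
datasets:
  - from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet
    name: cool_dataset
    params:
      endpoint: https://my.minio.server
      region: 'us-east-1'
```

`params` and `auth` stay optional here: for a public bucket the entry reduces to just `from` and `name`, as in the S3 Public Example.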