Spark docs #222

Merged 2 commits on May 1, 2024
109 changes: 109 additions & 0 deletions spiceaidocs/docs/data-connectors/spark.md
@@ -0,0 +1,109 @@
---
title: 'Apache Spark Connector'
sidebar_label: 'Apache Spark Connector'
description: 'Apache Spark Connector Documentation'
pagination_prev: null
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

The Apache Spark Connector enables federated SQL queries against a Spark cluster using [Spark Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html).

## Configuration

The Apache Spark Connector can be configured in two ways: by specifying a plaintext connection string with the `spark_remote` parameter, or by providing a `spark_remote` secret. The connector will fail if both are set.

### Parameters
- `spark_remote`: A [Spark remote](https://spark.apache.org/docs/latest/spark-connect-overview.html#set-sparkremote-environment-variable) connection URI.
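
As a sketch of the URI shape inside a dataset's `params` block (the host below is a placeholder; `15002` is the default Spark Connect server port, so adjust it for your cluster):

```yaml
params:
  # Spark Connect URIs take the form sc://<host>[:<port>]; 15002 is the default server port
  spark_remote: sc://spark-connect.example.com:15002
```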

### Auth

For Spark clusters that require authentication, do not set `spark_remote` as an inline dataset param, since the connection URI will contain sensitive data. Instead, use a secret named `spark` with the key `spark_remote`; a dataset sketch for this case follows the tabs below.

See [Secret Stores](/secret-stores) for more details.

<Tabs>
<TabItem value="local" label="Local" default>
```bash
spice login spark --spark_remote <spark-remote>
```

Learn more about [File Secret Store](/secret-stores/file).
</TabItem>
<TabItem value="env" label="Env">
```bash
SPICE_SECRET_SPARK_SPARK_REMOTE=<spark-remote> \
spice run
```

`spicepod.yaml`
```yaml
version: v1beta1
kind: Spicepod
name: spice-app

secrets:
  store: env

# <...>
```

Learn more about [Env Secret Store](/secret-stores/env).
</TabItem>
<TabItem value="k8s" label="Kubernetes">
```bash
kubectl create secret generic spark \
--from-literal=spark_remote='<spark-remote>'
```

`spicepod.yaml`
```yaml
version: v1beta1
kind: Spicepod
name: spice-app

secrets:
  store: kubernetes

# <...>
```

Learn more about [Kubernetes Secret Store](/secret-stores/kubernetes).
</TabItem>
<TabItem value="keyring" label="Keyring">
Add a new keychain entry (macOS) with the connection details as a JSON string:

```bash
security add-generic-password -l "Spark Remote" \
-a spiced -s spice_secret_spark \
-w $(echo -n '{"spark_remote": "<spark-remote>"}')
```

`spicepod.yaml`
```yaml
version: v1beta1
kind: Spicepod
name: spice-app

secrets:
  store: keyring

# <...>
```

Learn more about [Keyring Secret Store](/secret-stores/keyring).
</TabItem>
</Tabs>
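
When the connection URI is supplied through the `spark` secret, the dataset definition would then omit the `spark_remote` param. A minimal sketch, assuming the connector resolves the URI from the configured secret store and using a hypothetical catalog and table name:

```yaml
datasets:
  - from: spark:my_catalog.my_schema.my_table
    name: my_table
    # no spark_remote param here; the URI is read from the `spark` secret's `spark_remote` key
```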

## Example

```yaml
datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      spark_remote: sc://localhost
```
1 change: 1 addition & 0 deletions spiceaidocs/docs/reference/spicepod/datasets.md
@@ -78,6 +78,7 @@ Where:

- [`spiceai`](../../data-connectors/spiceai.md)
- [`dremio`](../../data-connectors/dremio.md)
- [`spark`](../../data-connectors/spark.md)
- [`databricks`](../../data-connectors/databricks.md)
- [`s3`](../../data-connectors/s3.md)
- [`postgres`](../../data-connectors/postgres/index.md)