[apache#3595] Improvement(spark-connector): update spark-iceberg connector document (apache#3596)

### What changes were proposed in this pull request?

Update the spark-iceberg connector document.

### Why are the changes needed?

Fix: apache#3595
caican00 authored and diqiu50 committed Jun 13, 2024
1 parent 877254c commit 6fdfc88
Showing 1 changed file with 59 additions and 16 deletions: docs/spark-connector/spark-catalog-iceberg.md
@@ -12,31 +12,35 @@ The Gravitino Spark connector offers the capability to read and write Iceberg ta

#### Supported DML and DDL operations:

- `CREATE TABLE`

  Supports the basic create table clause, including table schema, properties, and partitioning; distribution and sort orders are not supported.

- `DROP TABLE`
- `ALTER TABLE`
- `INSERT INTO` & `INSERT OVERWRITE`
- `SELECT`
- `MERGE INTO`
- `DELETE FROM`
- `UPDATE`
- `CALL`
- `TIME TRAVEL QUERY`
- `DESCRIBE TABLE`
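
For example, in addition to the timestamp-based queries in the SQL example below, time travel by snapshot ID uses the standard Spark SQL syntax; the snapshot ID `1` here is an illustrative value:

```sql
-- Time travel by snapshot ID (illustrative value), complementing the
-- TIMESTAMP AS OF queries in the SQL example below.
SELECT * FROM employee VERSION AS OF 1;
SELECT * FROM employee FOR SYSTEM_VERSION AS OF 1;
```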

#### Unsupported operations:

- View operations.
- Metadata tables, like:
  - `{iceberg_catalog}.{iceberg_database}.{iceberg_table}.snapshots`
- Other Iceberg extension SQL, like:
  - `ALTER TABLE prod.db.sample ADD PARTITION FIELD xx`
  - `ALTER TABLE ... WRITE ORDERED BY`
  - `ALTER TABLE prod.db.sample CREATE BRANCH branchName`
  - `ALTER TABLE prod.db.sample CREATE TAG tagName`
- `AtomicCreateTableAsSelect` & `AtomicReplaceTableAsSelect`
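
For instance, a metadata-table query of the form listed above is expected to fail when issued through the connector; the names below are illustrative:

```sql
-- Not supported: querying an Iceberg metadata table through the connector
-- (illustrative catalog, database, and table names).
SELECT * FROM iceberg_a.mydatabase.employee.snapshots;
```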

## SQL example

```sql
-- Suppose iceberg_a is the Iceberg catalog name managed by Gravitino
USE iceberg_a;

CREATE DATABASE IF NOT EXISTS mydatabase;
@@ -57,7 +61,7 @@ VALUES
(2, 'Bob', 'Marketing', TIMESTAMP '2021-02-01 10:30:00'),
(3, 'Charlie', 'Sales', TIMESTAMP '2021-03-01 08:45:00');

SELECT * FROM employee WHERE date(hire_date) = '2021-01-01';

UPDATE employee SET department = 'Jenny' WHERE id = 1;

@@ -74,18 +78,57 @@ USING (SELECT 4 as id, 'David' as name, 'Engineering' as department, TIMESTAMP '
ON employee.id = new_employee.id
WHEN MATCHED THEN DELETE
WHEN NOT MATCHED THEN INSERT *;

-- Suppose that the first snapshotId of employee is 1L and the second snapshotId is 2L
-- Rollback the snapshot for iceberg_a.mydatabase.employee to 1L
CALL iceberg_a.system.rollback_to_snapshot('iceberg_a.mydatabase.employee', 1);
-- Set the snapshot for iceberg_a.mydatabase.employee to 2L
CALL iceberg_a.system.set_current_snapshot('iceberg_a.mydatabase.employee', 2);

-- Suppose that the commit timestamp of the first snapshot is older than '2024-05-27 01:01:00'
-- Time travel to '2024-05-27 01:01:00'
SELECT * FROM employee TIMESTAMP AS OF '2024-05-27 01:01:00';
SELECT * FROM employee FOR SYSTEM_TIME AS OF '2024-05-27 01:01:00';

-- Show the details of employee, such as schema and reserved properties (like location, current-snapshot-id, provider, format, format-version, etc.)
DESC EXTENDED employee;
```

For more details about `CALL`, please refer to the [Spark Procedures description](https://iceberg.apache.org/docs/latest/spark-procedures/#spark-procedures) in the official Iceberg documentation.
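
As a further hedged illustration, other maintenance procedures such as `expire_snapshots` follow the same `CALL` pattern shown above; the table name and timestamp are placeholders:

```sql
-- Expire snapshots older than the given timestamp (illustrative arguments),
-- assuming the connector exposes this procedure like rollback_to_snapshot above.
CALL iceberg_a.system.expire_snapshots(
  table => 'mydatabase.employee',
  older_than => TIMESTAMP '2024-05-27 01:01:00'
);
```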

## Iceberg backend-catalog support
- HiveCatalog
- JdbcCatalog
- RESTCatalog

### Catalog properties

The Gravitino Spark connector transforms the following property names, defined in the catalog properties, into the Spark Iceberg connector configuration.

#### HiveCatalog

| Gravitino catalog property name | Spark Iceberg connector configuration | Default Value | Required | Description | Since Version |
|---------------------------------|---------------------------------------|---------------|----------|---------------------------|---------------|
| `catalog-backend` | `type` | `hive` | Yes | Catalog backend type | 0.5.0 |
| `uri`                           | `uri`                                 | (none)        | Yes      | Catalog backend URI       | 0.5.0         |
| `warehouse` | `warehouse` | (none) | Yes | Catalog backend warehouse | 0.5.0 |
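
As a hedged sketch of the mapping, a Hive-backed catalog named `iceberg_a` carrying the properties above would be expected to surface in Spark roughly as the following settings, assuming the standard `spark.sql.catalog.<name>.*` key layout of the Spark Iceberg runtime; all values are placeholders:

```sql
-- Illustrative only: these settings are established by the connector itself,
-- not run by hand; shown as SET statements purely to display the key layout.
SET spark.sql.catalog.iceberg_a.type = hive;
SET spark.sql.catalog.iceberg_a.uri = thrift://hive-metastore:9083;
SET spark.sql.catalog.iceberg_a.warehouse = hdfs://namenode:8020/user/hive/warehouse;
```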

#### JdbcCatalog

| Gravitino catalog property name | Spark Iceberg connector configuration | Default Value | Required | Description | Since Version |
|---------------------------------|---------------------------------------|---------------|----------|---------------------------|---------------|
| `catalog-backend` | `type` | `jdbc` | Yes | Catalog backend type | 0.5.0 |
| `uri`                           | `uri`                                 | (none)        | Yes      | Catalog backend URI       | 0.5.0         |
| `warehouse` | `warehouse` | (none) | Yes | Catalog backend warehouse | 0.5.0 |
| `jdbc-user` | `jdbc.user` | (none) | Yes | JDBC user name | 0.5.0 |
| `jdbc-password` | `jdbc.password` | (none) | Yes | JDBC password | 0.5.0 |

#### RESTCatalog

| Gravitino catalog property name | Spark Iceberg connector configuration | Default Value | Required | Description | Since Version |
|---------------------------------|---------------------------------------|---------------|----------|---------------------------|---------------|
| `catalog-backend` | `type` | `rest` | Yes | Catalog backend type | 0.5.1 |
| `uri`                           | `uri`                                 | (none)        | Yes      | Catalog backend URI       | 0.5.1         |
| `warehouse` | `warehouse` | (none) | No | Catalog backend warehouse | 0.5.1 |

Gravitino catalog property names with the prefix `spark.bypass.` are passed to the Spark Iceberg connector. For example, use `spark.bypass.io-impl` to pass `io-impl` to the Spark Iceberg connector.
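
A minimal sketch of the bypass, reusing the key layout assumed above; `org.apache.iceberg.aws.s3.S3FileIO` is Iceberg's stock S3 `FileIO` implementation:

```sql
-- Illustrative only: a Gravitino catalog property `spark.bypass.io-impl`
-- is expected to surface as the catalog-level `io-impl` setting below.
SET spark.sql.catalog.iceberg_a.io-impl = org.apache.iceberg.aws.s3.S3FileIO;
```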
