Commit

[apache#3680] docs(trino-connector): Update Trino connector doc for using Trino dynamic catalog (apache#3682)

### What changes were proposed in this pull request?

 Update Trino connector doc for using Trino dynamic catalog

### Why are the changes needed?

Fix: apache#3680 

### Does this PR introduce _any_ user-facing change?

Update docs

### How was this patch tested?

No
diqiu50 authored May 31, 2024
1 parent 2935f6e commit 22f0986
Showing 5 changed files with 82 additions and 109 deletions.
16 changes: 10 additions & 6 deletions docs/trino-connector/configuration.md
@@ -6,9 +6,13 @@ license: "Copyright 2023 Datastrato Pvt Ltd.
This software is licensed under the Apache License version 2."
---

| Property | Type | Default Value | Description | Required | Since Version |
|-----------------------------------|---------|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------|
| connector.name | string | (none) | The `connector.name` defines the type of Trino connector, this value is always 'gravitino'. | Yes | 0.2.0 |
| gravitino.metalake | string | (none) | The `gravitino.metalake` defines which metalake in Gravitino server the Trino connector uses. Trino connector should set it at start, the value of `gravitino.metalake` needs to be a valid name, Trino connector can detect and load the metalake with catalogs, schemas and tables once created and keep in sync. | Yes | 0.2.0 |
| gravitino.uri | string | http://localhost:8090 | The `gravitino.uri` defines the connection URL of the Gravitino server, the default value is `http://localhost:8090`. Trino connector can detect and connect to Gravitino server once it is ready, no need to start Gravitino server beforehand. | Yes | 0.2.0 |
| gravitino.simplify-catalog-names | boolean | true | The `gravitino.simplify-catalog-names` setting omits the metalake prefix from catalog names when set to true. | NO | 0.5.0 |
| Property | Type | Default Value | Description | Required | Since Version |
|----------------------------------|---------|-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------|
| connector.name | string | (none) | The `connector.name` property defines the type of Trino connector; its value is always `gravitino`. | Yes | 0.2.0 |
| gravitino.metalake | string | (none) | The `gravitino.metalake` property defines which metalake in the Gravitino server the Trino connector uses. It must be set at startup to a valid metalake name. The Trino connector detects and loads the metalake with its catalogs, schemas, and tables once it is created, and keeps them in sync. | Yes | 0.2.0 |
| gravitino.uri | string | http://localhost:8090 | The `gravitino.uri` property defines the connection URL of the Gravitino server; the default value is `http://localhost:8090`. The Trino connector detects and connects to the Gravitino server once it is ready, so there is no need to start the Gravitino server beforehand. | Yes | 0.2.0 |
| gravitino.simplify-catalog-names | boolean | true | The `gravitino.simplify-catalog-names` setting omits the metalake prefix from catalog names when set to true. | No | 0.5.0 |
| trino.jdbc.uri | string | jdbc:trino://localhost:8080 | The JDBC URI of the current Trino server. | No | 0.5.1 |
| trino.catalog.store | string | etc/catalog | The directory that stores the catalog configuration of the current Trino server. | No | 0.5.1 |
| trino.jdbc.user | string | admin | The JDBC user name for the current Trino server. | No | 0.5.1 |
| trino.jdbc.password | string | (none) | The JDBC password for the current Trino server. | No | 0.5.1 |
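
As an illustration, the properties in the table could be combined into a catalog configuration file along the following lines. This is a minimal sketch, not a definitive setup: the file name `gravitino.properties`, the temporary directory, the metalake name `test`, and the host `gravitino-server-host` are assumptions chosen for the example, and the `trino.*` values simply restate the defaults from the table.

```shell
#!/bin/sh
# Sketch: write a Gravitino connector catalog file from the properties in the table.
# Assumptions: the file name, the metalake "test", and the host "gravitino-server-host"
# are placeholders; adjust them to your deployment.
set -eu

# A temporary directory stands in for the Trino catalog directory (etc/catalog).
CATALOG_DIR="$(mktemp -d)"
CATALOG_FILE="$CATALOG_DIR/gravitino.properties"

cat > "$CATALOG_FILE" <<'EOF'
connector.name=gravitino
gravitino.metalake=test
gravitino.uri=http://gravitino-server-host:8090
gravitino.simplify-catalog-names=true
trino.jdbc.uri=jdbc:trino://localhost:8080
trino.catalog.store=etc/catalog
trino.jdbc.user=admin
EOF

echo "Wrote $CATALOG_FILE"
```

Only the first three properties are marked as required in the table; the remaining lines are optional and are shown here with their default values.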
133 changes: 35 additions & 98 deletions docs/trino-connector/development.md
@@ -37,7 +37,7 @@ To develop the Gravitino connector locally, you need to do the following steps:

### IDEA

1. Clone the Trino repository from the [GitHub](https://github.com/trinodb/trino) repository. We advise you to use the release version 426 or 435.
1. Clone the Trino repository from [GitHub](https://github.com/trinodb/trino). Trino release 435 is the minimum version that Gravitino supports.
2. Open the Trino project in your IDEA.
3. Create a new module for the Gravitino connector in the Trino project as the following picture (we will use the name `trino-gravitino` as the module name in the following steps). ![trino-gravitino](../assets/trino/create-gravitino-connector.jpg)
4. Add a soft link to the Gravitino trino connector module in the Trino project. Assuming the src java main directory of the Gravitino trino connector in project Gravitino is `gravitino/path/to/gravitino-trino-connector/src/main/java`,
@@ -54,17 +54,12 @@ then you can see the `gravitino-trino-connector` source files and directories in

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright 2024 Datastrato Pvt Ltd.
This software is licensed under the Apache License version 2.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>io.trino</groupId>
<artifactId>trino-root</artifactId>
<version>426</version>
<version>435</version>
<relativePath>../../pom.xml</relativePath>
</parent>

@@ -77,142 +72,81 @@ then you can see the `gravitino-trino-connector` source files and directories in
</properties>

<dependencies>
<!--
The following dependencies are required for the Gravitino connector. You can install them
locally (./gradlew publishToMavenLocal) or just use the release version like 0.4.0
-->
<dependency>
<groupId>com.datastrato.gravitino</groupId>
<artifactId>bundled-catalog</artifactId>
<version>0.5.0-SNAPSHOT</version>
</dependency>

<dependency>
<groupId>com.datastrato.gravitino</groupId>
<artifactId>client-java-runtime</artifactId>
<version>0.5.0-SNAPSHOT</version>
</dependency>

<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</dependency>

<dependency>
<groupId>com.google.inject</groupId>
<artifactId>guice</artifactId>
</dependency>

<dependency>
<groupId>io.airlift</groupId>
<artifactId>bootstrap</artifactId>
<exclusions>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-to-slf4j</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>io.airlift</groupId>
<artifactId>configuration</artifactId>
</dependency>

<dependency>
<groupId>io.airlift</groupId>
<artifactId>json</artifactId>
</dependency>

<dependency>
<groupId>io.trino</groupId>
<artifactId>trino-plugin-toolkit</artifactId>
<artifactId>trino-spi</artifactId>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>jakarta.validation</groupId>
<artifactId>jakarta.validation-api</artifactId>
<groupId>io.trino</groupId>
<artifactId>trino-jdbc</artifactId>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-collections4</artifactId>
<version>4.4</version>
<groupId>io.trino</groupId>
<artifactId>trino-client</artifactId>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</dependency>

<dependency>
<groupId>org.apache.httpcomponents.client5</groupId>
<artifactId>httpclient5</artifactId>
<version>5.2.1</version>
</dependency>

<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<scope>provided</scope>
</dependency>

<dependency>
<groupId>io.airlift</groupId>
<artifactId>slice</artifactId>
<scope>provided</scope>
<groupId>io.trino</groupId>
<artifactId>trino-jdbc</artifactId>
</dependency>

<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-api</artifactId>
<scope>provided</scope>
<groupId>org.apache.commons</groupId>
<artifactId>commons-collections4</artifactId>
<version>4.4</version>
</dependency>

<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-context</artifactId>
<scope>provided</scope>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.32</version>
</dependency>

<dependency>
<groupId>io.trino</groupId>
<artifactId>trino-spi</artifactId>
<scope>provided</scope>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>2.0.9</version>
</dependency>

<!--
You can switch to the snapshot version as you like, for example,
if you want to use the jar of latest main branch,
you can execute the following command to install Gravitino `client-java-runtime` jar locally.
./gradlew publishToMavenLocal
-->
<dependency>
<groupId>org.openjdk.jol</groupId>
<artifactId>jol-core</artifactId>
<scope>provided</scope>
<groupId>com.datastrato.gravitino</groupId>
<artifactId>client-java-runtime</artifactId>
<version>0.5.1</version>
</dependency>

<dependency>
<groupId>io.airlift</groupId>
<artifactId>node</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>io.trino</groupId>
<artifactId>trino-memory</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.trino</groupId>
<artifactId>trino-testing</artifactId>
<scope>test</scope>
<groupId>com.datastrato.gravitino</groupId>
<artifactId>bundled-catalog</artifactId>
<version>0.5.1</version>
</dependency>

<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<scope>test</scope>
</dependency>
</dependencies>

</project>
```

@@ -298,6 +232,9 @@ plugin.bundles=\
../../plugin/trino-gravitino/pom.xml

node-scheduler.include-coordinator=true

# Note: The Gravitino connector only supports the dynamic catalog manager
catalog.management=dynamic
```

:::note
28 changes: 27 additions & 1 deletion docs/trino-connector/installation.md
@@ -15,6 +15,18 @@ Please refer to the [Deploying Trino documentation](https://trino.io/docs/curren
2. Copy the connector directory to the Trino's plugin directory.
Normally, the directory location is `Trino-server-<version>/plugin`, and the directory contains other catalogs used by Trino.
3. Add Trino JVM arguments `-Dlog4j.configurationFile=file:////etc/trino/log4j2.properties` to enable logging for the Gravitino connector.
4. Update Trino coordinator configuration.
You need to set `catalog.management=dynamic` and `catalog.store=file`.
The configuration file is located at `Trino-server-<version>/etc/config.properties`, and its contents look like this:

```text
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
catalog.management=dynamic
catalog.store=file
discovery.uri=http://0.0.0.0:8080
```

Alternatively,
you can build the Gravitino connector package from the sources
@@ -31,7 +43,7 @@ Use the docker command to create a container from the `trinodb/trino` image. Ass
Run it in the background, and map the default Trino port, which is 8080, from inside the container to port 8080 on your machine.

```shell
docker run --name trino-gravitino -d -p 8080:8080 trinodb/trino:426
docker run --name trino-gravitino -d -p 8080:8080 trinodb/trino:435
```

Run `docker ps` to check whether the container is running.
Expand Down Expand Up @@ -64,6 +76,20 @@ cd /lib/trino/plugin

Now you can see the Gravitino connector directory in the plugin directory.

### Configuring Trino

You can find the Trino configuration file `config.properties` in the directory `/etc/trino`. Change the file as follows:

```text
#single node install config
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://localhost:8080
catalog.management=dynamic
catalog.store=file
```

### Configuring the Gravitino connector

Assuming you have now started the Gravitino server on the host `gravitino-server-host` and already created a metalake named `test`, if those have not been prepared, please refer to the [Gravitino getting started](../getting-started.md).
5 changes: 3 additions & 2 deletions docs/trino-connector/requirements.md
@@ -8,9 +8,10 @@ This software is licensed under the Apache License version 2."

To install and deploy the Gravitino connector, the following environment setup is necessary:

- Trino server version should be higher than Trino-server-360, ideally using Trino-server-426.
- Trino server version should be at least Trino-server-435.
Other versions of Trino have not undergone thorough testing.
- Ensure that all nodes running Trino can access the Gravitino server's port, which defaults to 8090.
- Ensure that all nodes running Trino can access the underlying catalog resources, such as Hive, Iceberg, MySQL, PostgreSQL, etc.
- Ensure that you have installed the following connectors in Trino: Hive, Iceberg, MySQL, PostgreSQL.
- Ensure that you have set the `catalog.management` to `static` in the Trino configuration.
- Ensure that you have set the `catalog.management` to `dynamic` in the Trino coordinator configuration.
- Ensure that you have set the `catalog.store` to `file` in the Trino coordinator configuration.
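
The two coordinator settings in the list above can be checked with a small script before starting Trino. The following is only a sketch: the function name `check_trino_config` and the sample file are hypothetical, and the real location of `config.properties` depends on your installation.

```shell
#!/bin/sh
# Sketch: verify that a Trino coordinator config enables dynamic catalog management.
# check_trino_config is a hypothetical helper, not part of Trino or Gravitino.
set -eu

check_trino_config() {
    config_file="$1"
    for setting in "catalog.management=dynamic" "catalog.store=file"; do
        # -x matches the whole line, -F treats the setting as a literal string.
        if ! grep -qxF "$setting" "$config_file"; then
            echo "missing: $setting"
            return 1
        fi
    done
    echo "ok"
}

# Demonstration against a sample file; point this at your own config.properties instead.
SAMPLE_CONFIG="$(mktemp)"
printf '%s\n' "coordinator=true" "catalog.management=dynamic" "catalog.store=file" > "$SAMPLE_CONFIG"
check_trino_config "$SAMPLE_CONFIG"
```

The check is deliberately strict: it expects each setting on its own line with no surrounding whitespace, so a file that writes `catalog.management = dynamic` would be reported as missing the setting.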
9 changes: 7 additions & 2 deletions docs/trino-connector/trino-connector.md
@@ -8,10 +8,13 @@ This software is licensed under the Apache License version 2."

Trino can manage and access data using the Trino connector provided by `Gravitino`, commonly referred to as the `Gravitino connector`.
After configuring the Gravitino connector in Trino, Trino can automatically load catalog metadata from Gravitino, allowing users to directly access these catalogs in Trino.
Once integrated with Gravitino, Trino can operate on all Gravitino data without requiring additional configuration.
Once integrated with Gravitino, Trino can operate on all Gravitino data without requiring additional configuration.
The Gravitino connector uses the [Trino dynamic catalog management mechanism](https://trino.io/docs/current/admin/properties-catalog.html) to load catalogs.
When the Gravitino connector retrieves catalogs from the Gravitino server, it generates a `CREATE CATALOG` statement and executes
the statement on the current Trino server to register the catalogs with Trino.

:::note
Once metadata such as catalogs, schemas, or tables are changed in Gravitino, Trino can update itself through Gravitino, this process usually takes
Once metadata such as catalogs is changed in Gravitino, Trino can update itself through Gravitino; this process usually takes
about 3~10 seconds.
:::

@@ -26,3 +29,5 @@ Usage in queries is as follows:
```text
SELECT * from catalog.dbname.tabname
```

