Improve and expand client protocol docs
Include the new spooling protocol and its configuration for CLI and JDBC
driver.
mosabua committed Oct 25, 2024
1 parent d545e45 commit 30d3bfc
Showing 8 changed files with 302 additions and 35 deletions.
189 changes: 189 additions & 0 deletions docs/src/main/sphinx/admin/properties-client-protocol.md
@@ -0,0 +1,189 @@
# Client protocol properties

bla bla intro with links to clients

(prop-protocol-v1)=
## v1 protocol

blabla intro

### `protocol.v1.alternate-header-name`

- **Type:** [](prop-type-string)

The 351 release of Trino changes the HTTP client protocol headers to start with
`X-Trino-`. Clients for versions 350 and lower expect the HTTP headers to
start with `X-Presto-`, while newer clients expect `X-Trino-`. You can support these
older clients by setting this property to `Presto`.

The preferred approach to migrating from versions earlier than 351 is to update
all clients together with the release, or immediately afterwards, and then
remove usage of this property.

Use this property only as a temporary measure to assist in your migration
efforts.
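
The naming scheme described above can be illustrated with a minimal Python
sketch. The helper `protocol_header` is hypothetical and only shows how the
header prefix follows from the property value; it does not reflect how the
server or any client is actually implemented.

```python
from typing import Optional


def protocol_header(name: str, alternate_header_name: Optional[str] = None) -> str:
    """Build a client protocol header name (hypothetical helper).

    Without an alternate name, headers use the `X-Trino-` prefix. With
    `protocol.v1.alternate-header-name=Presto`, older clients can keep
    sending headers that start with `X-Presto-`.
    """
    prefix = alternate_header_name or "Trino"
    return f"X-{prefix}-{name}"


print(protocol_header("User"))            # header name used by current clients
print(protocol_header("User", "Presto"))  # header name used by clients for 350 and lower
```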

### `protocol.v1.prepared-statement-compression.length-threshold`

- **Type:** [](prop-type-integer)
- **Default value:** `2048`

Prepared statements that are submitted to Trino for processing, and are longer
than the value of this property, are compressed for transport via the HTTP
header to improve handling, and to avoid failures due to hitting HTTP header
size limits.

### `protocol.v1.prepared-statement-compression.min-gain`

- **Type:** [](prop-type-integer)
- **Default value:** `512`

Prepared statement compression is not applied if the size gain is less than the
configured value. Smaller statements do not benefit from compression, and are
left uncompressed.
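
How the two prepared statement compression properties interact can be sketched
as follows. This is an illustration under stated assumptions, not the actual
implementation: `zlib` and URL-safe base64 stand in for whatever codec and
encoding the server and clients really use, statement length is measured in
characters, and the function name and return shape are hypothetical.

```python
import base64
import zlib
from typing import Tuple

LENGTH_THRESHOLD = 2048  # protocol.v1.prepared-statement-compression.length-threshold
MIN_GAIN = 512           # protocol.v1.prepared-statement-compression.min-gain


def encode_prepared_statement(sql: str,
                              length_threshold: int = LENGTH_THRESHOLD,
                              min_gain: int = MIN_GAIN) -> Tuple[str, bool]:
    """Return (header value, compressed flag) for a prepared statement."""
    # Statements at or below the threshold are sent as they are.
    if len(sql) <= length_threshold:
        return sql, False
    # Assumption: zlib plus URL-safe base64 as a header-safe stand-in codec.
    compressed = base64.urlsafe_b64encode(
        zlib.compress(sql.encode("utf-8"))).decode("ascii")
    # Skip compression when the size gain is below the configured minimum.
    if len(sql) - len(compressed) < min_gain:
        return sql, False
    return compressed, True
```

With the defaults, a short statement is passed through unchanged, while a long
and repetitive statement, such as one with thousands of `?` placeholders, is
replaced by its much shorter compressed form.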

(prop-protocol-spooling)=
## Spooling protocol

intro and more - do we want a separate page in the admin section like we do for
FTE instead or in addition? maybe a separate page since the file system details
need to be configured

### `protocol.spooling.worker-access`

- **Type:** [](prop-type-boolean)
- **Default value:** `false`

Use worker nodes to retrieve data from the spooling location.


### `protocol.spooling.direct-storage-access`

- **Type:** [](prop-type-boolean)
- **Default value:** `true`

Retrieve segments directly from the spooling location.

### `protocol.spooling.direct-storage-fallback`

- **Type:** [](prop-type-boolean)
- **Default value:** `false`

Fall back to segment retrieval through the coordinator when direct storage
access is not possible.

### `protocol.spooling.initial-segment-size`

- **Type:** [](prop-type-data-size)
- **Default value:** `8MB`

Initial size of the spooled segments.

### `protocol.spooling.maximum-segment-size`

- **Type:** [](prop-type-data-size)
- **Default value:** `16MB`

tbd

### `protocol.spooling.inline-segments`

- **Type:** [](prop-type-boolean)
- **Default value:** `false`

Allow the protocol to inline data.

### `protocol.spooling.shared-secret-key`

- **Type:** [](prop-type-string)

256-bit, base64-encoded secret key used to secure segment identifiers.
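
One way to produce a value of the required shape is sketched below, assuming
the key is simply 32 random bytes in standard base64 encoding. The function
name `generate_shared_secret_key` is hypothetical.

```python
import base64
import secrets


def generate_shared_secret_key() -> str:
    """Return a random 256-bit key, base64-encoded (hypothetical helper)."""
    # 32 bytes * 8 bits = 256 bits from a cryptographically secure source.
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


print(generate_shared_secret_key())
```

The printed value can then be used as the value of
`protocol.spooling.shared-secret-key`.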


(prop-spooling-filesystem)=
## Spooling filesystem

maybe this should go onto another page .. but I actually think not .. it fits
here unless there is any other usage for the spooling file system besides the
spooling protocol


### `fs.azure.enabled`

- **Type:** [](prop-type-boolean)

tbd, link to azure object storage for more details, exclusive to other


### `fs.s3.enabled`

- **Type:** [](prop-type-boolean)


### `fs.gcs.enabled`

- **Type:** [](prop-type-boolean)


### `fs.location`

- **Type:**

similar to https://trino.io/docs/current/connector/delta-lake.html#register-table

location with some schema

### `fs.layout`

- **Type:**

layout class, some sort of `SIMPLE` or `PARTITIONED`

Spooling segments file system layout


### `fs.layout.partitions`

- **Type:** [](prop-type-integer)
- **Default value:** `32`
- **Minimum value:** `2`
- **Maximum value:** `1024`

Only applicable for `fs.layout=PARTITIONED`.
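
The constraints on `fs.layout.partitions` can be expressed as a small
validation sketch. The function `resolve_partitions` is hypothetical and only
mirrors the default, minimum, and maximum noted above; it is not taken from
the Trino source.

```python
from typing import Optional

DEFAULT_PARTITIONS = 32
MIN_PARTITIONS = 2
MAX_PARTITIONS = 1024


def resolve_partitions(layout: str, partitions: Optional[int] = None) -> Optional[int]:
    """Return the effective partition count, or None for other layouts."""
    # fs.layout.partitions only applies to fs.layout=PARTITIONED.
    if layout != "PARTITIONED":
        return None
    value = DEFAULT_PARTITIONS if partitions is None else partitions
    if not MIN_PARTITIONS <= value <= MAX_PARTITIONS:
        raise ValueError(
            f"fs.layout.partitions must be between {MIN_PARTITIONS} "
            f"and {MAX_PARTITIONS}, got {value}")
    return value
```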


### `fs.segment.ttl`

- **Type:**

Maximum duration for the client to retrieve a spooled segment before it expires.


### `fs.segment.encryption`

- **Type:**

Encrypt segments with ephemeral keys


### `fs.segment.pruning.enabled`

- **Type:**

Prune expired segments periodically


### `fs.segment.pruning.interval`

- **Type:**

Interval to prune expired segments


### `fs.segment.pruning.batch-size`

- **Type:**

Prune expired segments in batches of the provided size.
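
How the segment TTL and the pruning batch size could work together is sketched
below. Everything here is an assumption for illustration: the `Segment`
record, the `expired_batches` function, and the idea that expiry is computed
from a creation timestamp are hypothetical, and the caller is expected to run
the function once per pruning interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterator, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical record for one spooled segment."""
    identifier: str
    created_at: datetime


def expired_batches(segments: List[Segment], now: datetime,
                    ttl: timedelta, batch_size: int) -> Iterator[List[Segment]]:
    """Yield expired segments in batches of at most batch_size."""
    # A segment is expired once its age reaches the TTL (fs.segment.ttl).
    expired = [s for s in segments if now - s.created_at >= ttl]
    # Delete in bounded batches (fs.segment.pruning.batch-size).
    for start in range(0, len(expired), batch_size):
        yield expired[start:start + batch_size]
```

Each yielded batch would then be passed to a single bulk delete call against
the spooling location.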
35 changes: 0 additions & 35 deletions docs/src/main/sphinx/admin/properties-general.md
@@ -33,41 +33,6 @@ across nodes in the cluster. It can be disabled, when it is known that the
output data set is not skewed, in order to avoid the overhead of hashing and
redistributing all the data across the network.

## `protocol.v1.alternate-header-name`

**Type:** `string`

The 351 release of Trino changes the HTTP client protocol headers to start with
`X-Trino-`. Clients for versions 350 and lower expect the HTTP headers to
start with `X-Presto-`, while newer clients expect `X-Trino-`. You can support these
older clients by setting this property to `Presto`.

The preferred approach to migrating from versions earlier than 351 is to update
all clients together with the release, or immediately afterwards, and then
remove usage of this property.

Ensure to use this only as a temporary measure to assist in your migration
efforts.

## `protocol.v1.prepared-statement-compression.length-threshold`

- **Type:** {ref}`prop-type-integer`
- **Default value:** `2048`

Prepared statements that are submitted to Trino for processing, and are longer
than the value of this property, are compressed for transport via the HTTP
header to improve handling, and to avoid failures due to hitting HTTP header
size limits.

## `protocol.v1.prepared-statement-compression.min-gain`

- **Type:** {ref}`prop-type-integer`
- **Default value:** `512`

Prepared statement compression is not applied if the size gain is less than the
configured value. Smaller statements do not benefit from compression, and are
left uncompressed.

(file-compression)=
## File compression and decompression

1 change: 1 addition & 0 deletions docs/src/main/sphinx/admin/properties.md
@@ -15,6 +15,7 @@ properties, refer to the {doc}`connector documentation </connector/>`.
:titlesonly: true
General <properties-general>
Client protocol <properties-client-protocol>
HTTP server <properties-http-server>
Resource management <properties-resource-management>
Query management <properties-query-management>
13 changes: 13 additions & 0 deletions docs/src/main/sphinx/client.md
@@ -8,6 +8,15 @@ user interface directly. Clients like the [JDBC driver](/client/jdbc), provide a
mechanism for other applications, including your own custom applications, to
connect to Trino.


protocol stuff

```{toctree}
:maxdepth: 1
client/client-protocol
```

The following clients are available as part of every Trino release:

```{toctree}
@@ -24,3 +33,7 @@ The Trino project maintains the following other client libraries:

In addition, other communities and vendors provide [numerous other client
libraries, drivers, and applications](https://trino.io/ecosystem/client)

Configure support for the [spooling protocol on the cluster](prop-protocol-spooling)
and your client to improve performance for client interactions with higher data
transfer demands.
32 changes: 32 additions & 0 deletions docs/src/main/sphinx/client/cli.md
@@ -602,6 +602,38 @@ Query 20200707_170726_00030_2iup9 failed: line 1:25: Column 'region' cannot be r
SELECT nationkey, name, region FROM tpch.sf1.nation LIMIT 3
```

(cli-spooling-protocol)=
## Spooling protocol


--encoding=<encoding> Experimental spooled protocol encoding [available: json, json+zstd, json+lz4]

validate encoding?

CLI options


--encoding-id - renamed to --encoding yet?

validate encoding in CLI?


segment expiry configuration

spooling location configuration
accessible from all nodes


sizing of spooling storage location
management of old spooled data
survives cluster restart? Or is picked up cleanly after restart?


https://www.linkedin.com/posts/mateuszgajewski_trino-trino-performance-activity-7244658731991392258-0udu?utm_source=share&utm_medium=member_desktop


link to server config docs

(cli-output-format)=
## Output formats

50 changes: 50 additions & 0 deletions docs/src/main/sphinx/client/client-protocol.md
@@ -0,0 +1,50 @@
# Client protocol


two modes

spool and direct (v1)


Table with steps


client protocol page in clients section

new one is default

small add in dev guide

query finished faster

json can be compressed (20% ratio) 5x


fs.location and how it interacts with config of fs from object storage


client gets all details to read data (url, credentials, and keys for decryption)

uses encryption on object storage

client reads segments and deletes them

pruning done by coordinator, only for stuff that wasn't already cleaned up by client

spooling separate for each cluster

need good connectivity from cluster to object storage, same region, same availability zone,

same as object storage connector, or also FTE exchange

assumption that storage is unbounded, if storage fills up it will fail



remove `protocol.v1.alternate-header-name`

maybe rename `protocol.v1.prepared-statement-compression.length-threshold`


have list of clients that support spooling
10 changes: 10 additions & 0 deletions docs/src/main/sphinx/client/jdbc.md
@@ -262,3 +262,13 @@ may not be specified using both methods.
network overhead and uses smaller HTTP headers and requires Trino 431 or
greater.
:::


(jdbc-spooling-protocol)=
## Spooling protocol


--encoding=<encoding> Experimental spooled protocol encoding [available: json, json+zstd, json+lz4]


link to server config docs, server config must be in place
7 changes: 7 additions & 0 deletions docs/src/main/sphinx/develop/client-protocol.md
@@ -260,3 +260,10 @@ subsequent requests to be consistent with the response headers received.
Class `io.trino.client.ProtocolHeaders` in module `trino-client` in the
`client` directory of Trino source enumerates all the HTTP request and
response headers allowed by the Trino client REST API.


(spooled-protocol)=
## Spooled protocol

https://www.linkedin.com/posts/mateuszgajewski_trino-trino-performance-activity-7244658731991392258-0udu?utm_source=share&utm_medium=member_desktop
