From 0ff72f76a24ad2134c95270efa5f528467c3d907 Mon Sep 17 00:00:00 2001
From: Manfred Moser
Date: Thu, 17 Oct 2024 16:08:18 -0700
Subject: [PATCH] Improve and expand client protocol docs

Include the new spooling protocol and its configuration for CLI and JDBC
driver.
---
 .../admin/properties-client-protocol.md       | 164 ++++++++++++++++++
 .../main/sphinx/admin/properties-general.md   |  35 ----
 docs/src/main/sphinx/admin/properties.md      |   1 +
 docs/src/main/sphinx/client.md                |   4 +
 docs/src/main/sphinx/client/cli.md            |  30 ++++
 docs/src/main/sphinx/client/jdbc.md           |   7 +
 .../main/sphinx/develop/client-protocol.md    |   7 +
 7 files changed, 213 insertions(+), 35 deletions(-)
 create mode 100644 docs/src/main/sphinx/admin/properties-client-protocol.md

diff --git a/docs/src/main/sphinx/admin/properties-client-protocol.md b/docs/src/main/sphinx/admin/properties-client-protocol.md
new file mode 100644
index 00000000000000..5531c65c1aa418
--- /dev/null
+++ b/docs/src/main/sphinx/admin/properties-client-protocol.md
@@ -0,0 +1,164 @@
+# Client protocol properties
+
+bla bla intro with links to clients
+
+(prop-protocol-v1)=
+## v1 protocol
+
+blabla intro
+
+### `protocol.v1.alternate-header-name`
+
+**Type:** [](prop-type-string)
+
+The 351 release of Trino changes the HTTP client protocol headers to start with
+`X-Trino-`. Clients for versions 350 and lower expect the HTTP headers to
+start with `X-Presto-`, while newer clients expect `X-Trino-`. You can support these
+older clients by setting this property to `Presto`.
+
+The preferred approach to migrating from versions earlier than 351 is to update
+all clients together with the release, or immediately afterwards, and then
+remove usage of this property.
+
+Use this property only as a temporary measure to assist in your migration
+efforts.
+
+### `protocol.v1.prepared-statement-compression.length-threshold`
+
+- **Type:** [](prop-type-integer)
+- **Default value:** `2048`
+
+Prepared statements that are submitted to Trino for processing, and are longer
+than the value of this property, are compressed for transport via the HTTP
+header to improve handling, and to avoid failures due to hitting HTTP header
+size limits.
+
+### `protocol.v1.prepared-statement-compression.min-gain`
+
+- **Type:** [](prop-type-integer)
+- **Default value:** `512`
+
+Prepared statement compression is not applied if the size gain is less than the
+configured value. Smaller statements do not benefit from compression, and are
+left uncompressed.
+
+(prop-protocol-spooling)=
+## Spooling protocol
+
+intro and more - do we want a separate page in the admin section like we do for
+FTE instead or in addition? maybe a separate page since the file system details
+need to be configured
+
+### `protocol.spooling.worker-access`
+
+- **Type:** [](prop-type-boolean)
+- **Default value:** `false`
+
+Use worker nodes to retrieve data from the spooling location.
+
+
+### `protocol.spooling.direct-storage-access`
+
+- **Type:** [](prop-type-boolean)
+- **Default value:** `true`
+
+Retrieve segments directly from the spooling location.
+
+### `protocol.spooling.direct-storage-fallback`
+
+- **Type:** [](prop-type-boolean)
+- **Default value:** `false`
+
+Fall back to segment retrieval through the coordinator when direct storage
+access is not possible.
+
+### `protocol.spooling.initial-segment-size`
+
+- **Type:** [](prop-type-data-size)
+- **Default value:** `8MB`
+
+Initial size of the spooled segments.
+
+### `protocol.spooling.maximum-segment-size`
+
+- **Type:** [](prop-type-data-size)
+- **Default value:** `16MB`
+
+tbd
+
+### `protocol.spooling.inline-segments`
+
+- **Type:** [](prop-type-boolean)
+- **Default value:** `false`
+
+Allow the protocol to inline data.
+
+### `protocol.spooling.shared-secret-key`
+
+- **Type:** [](prop-type-string)
+
+256 bit, base64-encoded secret key used to secure segment identifiers.
+
+
+(prop-spooling-filesystem)=
+## Spooling filesystem
+
+maybe this should go onto another page .. but I actually think not .. it fits
+here unless there is any other usage for the spooling file system besides the
+spooling protocol
+
+
+### `fs.azure.enabled`
+
+- **Type:** [](prop-type-boolean)
+
+tbd, link to azure object storage for more details, exclusive to other
+
+
+### `fs.s3.enabled`
+
+- **Type:** [](prop-type-boolean)
+
+
+### `fs.gcs.enabled`
+
+- **Type:** [](prop-type-boolean)
+
+
+### `fs.location`
+
+
+
+### `fs.layout`
+
+
+layout class, some sort of `SIMPLE` or `PARTITIONED`
+
+Spooling segments file system layout
+
+
+### `fs.segment.ttl`
+
+Maximum duration for the client to retrieve a spooled segment before it expires
+
+
+### `fs.segment.encryption`
+
+
+Encrypt segments with ephemeral keys
+
+
+### `fs.segment.pruning.enabled`
+
+Prune expired segments periodically
+
+
+### `fs.segment.pruning.interval`
+
+Interval to prune expired segments
+
+
+### `fs.segment.pruning.batch-size`
+
+
+Prune expired segments in batches of provided size
\ No newline at end of file
diff --git a/docs/src/main/sphinx/admin/properties-general.md b/docs/src/main/sphinx/admin/properties-general.md
index 11fa3282b6678b..5b2e56a25b5836 100644
--- a/docs/src/main/sphinx/admin/properties-general.md
+++ b/docs/src/main/sphinx/admin/properties-general.md
@@ -33,41 +33,6 @@ across nodes in the cluster. It can be disabled, when it is known that the
 output data set is not skewed, in order to avoid the overhead of hashing and
 redistributing all the data across the network.
 
-## `protocol.v1.alternate-header-name`
-
-**Type:** `string`
-
-The 351 release of Trino changes the HTTP client protocol headers to start with
-`X-Trino-`. Clients for versions 350 and lower expect the HTTP headers to
-start with `X-Presto-`, while newer clients expect `X-Trino-`. You can support these
-older clients by setting this property to `Presto`.
-
-The preferred approach to migrating from versions earlier than 351 is to update
-all clients together with the release, or immediately afterwards, and then
-remove usage of this property.
-
-Ensure to use this only as a temporary measure to assist in your migration
-efforts.
-
-## `protocol.v1.prepared-statement-compression.length-threshold`
-
-- **Type:** {ref}`prop-type-integer`
-- **Default value:** `2048`
-
-Prepared statements that are submitted to Trino for processing, and are longer
-than the value of this property, are compressed for transport via the HTTP
-header to improve handling, and to avoid failures due to hitting HTTP header
-size limits.
-
-## `protocol.v1.prepared-statement-compression.min-gain`
-
-- **Type:** {ref}`prop-type-integer`
-- **Default value:** `512`
-
-Prepared statement compression is not applied if the size gain is less than the
-configured value. Smaller statements do not benefit from compression, and are
-left uncompressed.
-
 (file-compression)=
 ## File compression and decompression
 
diff --git a/docs/src/main/sphinx/admin/properties.md b/docs/src/main/sphinx/admin/properties.md
index 4789f7e1e5af07..9f2f3e6ddd2d99 100644
--- a/docs/src/main/sphinx/admin/properties.md
+++ b/docs/src/main/sphinx/admin/properties.md
@@ -15,6 +15,7 @@ properties, refer to the {doc}`connector documentation `.
 :titlesonly: true
 
 General
+Client protocol
 HTTP server
 Resource management
 Query management
diff --git a/docs/src/main/sphinx/client.md b/docs/src/main/sphinx/client.md
index b8274606bf8b36..7fcb6b867706ab 100644
--- a/docs/src/main/sphinx/client.md
+++ b/docs/src/main/sphinx/client.md
@@ -24,3 +24,7 @@ The Trino project maintains the following other client libraries:
 
 In addition, other communities and vendors provide [numerous other client
 libraries, drivers, and applications](https://trino.io/ecosystem/client)
+
+Configure support for the [spooling protocol on the cluster](prop-protocol-spooling)
+and your client to improve performance for client interactions with higher data
+transfer demands.
diff --git a/docs/src/main/sphinx/client/cli.md b/docs/src/main/sphinx/client/cli.md
index 015c07ef255c29..a899ad1dea83a6 100644
--- a/docs/src/main/sphinx/client/cli.md
+++ b/docs/src/main/sphinx/client/cli.md
@@ -602,6 +602,36 @@ Query 20200707_170726_00030_2iup9 failed: line 1:25: Column 'region' cannot be r
 SELECT nationkey, name, region FROM tpch.sf1.nation LIMIT 3
 ```
 
+(cli-spooled-protocol)=
+## Spooled protocol
+
+
+ --encoding= Experimental spooled protocol encoding [available: json, json+zstd, json+lz4]
+
+validate encoding?
+
+ CLI options
+
+
+--encoding-id - renamed to --encoding yet?
+
+validate encoding in CLI?
+
+
+segment expiry configuration
+
+spooling location configuration
+accessible from all nodes
+
+
+sizing of spooling storage location
+management of old spooled data
+survives cluster restart? Or is picked up cleanly after restart?
+
+
+https://www.linkedin.com/posts/mateuszgajewski_trino-trino-performance-activity-7244658731991392258-0udu?utm_source=share&utm_medium=member_desktop
+
+
 (cli-output-format)=
 ## Output formats
 
diff --git a/docs/src/main/sphinx/client/jdbc.md b/docs/src/main/sphinx/client/jdbc.md
index 016236c68d138e..c4d1cf874026b9 100644
--- a/docs/src/main/sphinx/client/jdbc.md
+++ b/docs/src/main/sphinx/client/jdbc.md
@@ -262,3 +262,10 @@ may not be specified using both methods.
 network overhead and uses smaller HTTP headers and requires Trino 431 or
 greater.
 :::
+
+
+(jdbc-spooled-protocol)=
+## Spooled protocol
+
+
+ --encoding= Experimental spooled protocol encoding [available: json, json+zstd, json+lz4]
diff --git a/docs/src/main/sphinx/develop/client-protocol.md b/docs/src/main/sphinx/develop/client-protocol.md
index 346617d585d37c..39345f5105e5fc 100644
--- a/docs/src/main/sphinx/develop/client-protocol.md
+++ b/docs/src/main/sphinx/develop/client-protocol.md
@@ -260,3 +260,10 @@ subsequent requests to be consistent with the response headers received.
 Class `io.trino.client.ProtocolHeaders` in module `trino-client` in the
 `client` directory of Trino source enumerates all the HTTP request and
 response headers allowed by the Trino client REST API.
+
+
+(spooled-protocol)=
+## Spooled protocol
+
+https://www.linkedin.com/posts/mateuszgajewski_trino-trino-performance-activity-7244658731991392258-0udu?utm_source=share&utm_medium=member_desktop
+