Project Swift #22271

wendigo · 2024-06-05T08:03:47Z

Trino has had its protocol since its inception in 2012. Both the client and cluster protocols are REST-oriented, use JSON as the only serialization format, and use HTTP/1.1 as the transport layer. While in 2012 the client and server protocols were good enough for the majority of use cases, the amount of data that clients want to retrieve efficiently from a Trino cluster has since increased significantly.
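
For readers unfamiliar with the current client protocol, here is a minimal sketch of the polling loop it implies: the client submits a statement, then follows the `nextUri` link in each JSON response until none is returned. The `send` callable, the `fake_send` stub, and the `trino.example` URLs are hypothetical stand-ins for a real HTTP client and coordinator; the `/v1/statement` path and the `nextUri`, `data`, and `error` fields are taken from the documented client protocol.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: (method, url, body) -> decoded JSON response.
Send = Callable[[str, str, Optional[str]], dict]


def run_query(send: Send, base_url: str, sql: str) -> Iterator[list]:
    """Yield result rows by following nextUri until the server stops returning one."""
    response = send("POST", f"{base_url}/v1/statement", sql)
    while True:
        if "error" in response:
            raise RuntimeError(response["error"].get("message", "query failed"))
        # Any response may carry a batch of rows, each serialized as a JSON array.
        yield from response.get("data", [])
        next_uri = response.get("nextUri")
        if next_uri is None:
            return  # no nextUri means the query has finished
        response = send("GET", next_uri, None)


def fake_send(method: str, url: str, body: Optional[str]) -> dict:
    """Canned responses standing in for a coordinator (illustration only)."""
    pages = {
        "http://trino.example/v1/statement": {"nextUri": "http://trino.example/q/1"},
        "http://trino.example/q/1": {"data": [[1], [2]], "nextUri": "http://trino.example/q/2"},
        "http://trino.example/q/2": {"data": [[3]]},
    }
    return pages[url]


rows = list(run_query(fake_send, "http://trino.example", "SELECT x FROM t"))
print(rows)
```

Every batch of rows costs one HTTP round trip plus JSON decoding, which is where the volume concerns above come from.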

We are starting Project Swift with the goal of improving the existing Trino protocol, for both client-to-server and server-to-server communication.

Introducing a v2 protocol is not a goal of this project.

Tasks

Client protocol improvements

Support and enable HTTP/2 #21793 (labels: roadmap)
Add support for path prefix routing using "X-Forwarded-Prefix" header #22227 (labels: cla-signed, stale-ignore)
[Swift] Spooled client protocol extension #22662 (labels: jdbc, performance, roadmap)

Server protocol improvements

Support and enable HTTP/2 #21793 (labels: roadmap)
Test H2C/HTTP/2 for internal communication #22249 (labels: cla-signed)
Reject X-Forwarded-* headers by default #6552
Reject X-FORWARDED-* headers if processing is disabled airlift/airlift#1183
Add HTTP version to client traces airlift/airlift#1158
Reimplement request/response timing - improves accuracy and correctness airlift/airlift#1161
Update airlift to 249 #22457 (labels: cla-signed)
[Swift] Spooling protocol extension #22995 (labels: cla-signed, enhancement, jdbc, roadmap)
Missing :authority pseudo-header when proxy converts HTTP/1 to HTTP/2 #23237
Announce GA of the Spooled Protocol

sajjoseph · 2024-06-05T13:52:28Z

Wonderful. Thanks for this initiative, though I would be thrilled if we ever saw a green light for a V2 protocol.
How about the following?

Add nextURI to the HTTP header response
Add partialCancelUri to the HTTP header response
targetResultSize enhancement
Add cluster identifier as a request parameter

wendigo · 2024-06-05T14:04:36Z

@sajjoseph can you elaborate more on the use-cases for each of the points?

himanshpal · 2024-06-05T14:50:37Z

In a world where Arrow and Arrow Flight are the new standards and are increasingly being adopted by many databases, do we ever plan to invest in Arrow and integrate it into Trino?

I know that a couple of years ago the Netflix team did a PoC for integrating Arrow into Trino, but it was never completed.

wendigo · 2024-06-06T07:20:21Z

@himanshpal we are not considering the introduction of an entirely new protocol (like Arrow Flight) at the moment. We are thinking about other serialization formats for client-server communication, and Arrow is one of the candidates.

losipiuk · 2024-06-06T09:13:49Z

cc: @losipiuk

mosabua · 2024-06-19T16:20:19Z

@himanshpal just to clarify what @wendigo mentioned: we are considering Arrow as one of the candidates, but in its current form it has significant limitations in its type system, so it cannot be used to cover all data from Trino and its richer type system. So we might end up in a situation where Arrow can be used with limitations in place, and another format is used for full support. However, the Arrow project is advancing, and we are still quite a way from even starting on a V2 protocol. There is a lot of room to improve the current protocol, and that is our focus in Project Swift.

mosabua · 2024-06-19T16:23:29Z

@wendigo I think some of the ideas from @sajjoseph are related to Trino Gateway and other tools being able to redirect more easily by just using info in the HTTP headers rather than having to parse the response. I kinda recall us talking about that in some Trino Gateway dev syncs as well, so maybe @oneonestar @vishalya @willmostly @Chaho12 have a better memory than me and can detail this more.
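
To make the header idea concrete, here is a small sketch of how a proxy could pick up the follow-up URI from a response header and only fall back to parsing the JSON body when the header is absent. The header name `X-Trino-Next-Uri` is hypothetical (Trino does not define it today), and `next_uri` is an illustrative helper, not part of Trino Gateway.

```python
import json
from typing import Mapping, Optional

# Hypothetical header name, used only for illustration; Trino does not define it.
NEXT_URI_HEADER = "X-Trino-Next-Uri"


def next_uri(headers: Mapping[str, str], body: bytes) -> Optional[str]:
    """Return the follow-up URI, preferring the header over the JSON body."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    from_header = lowered.get(NEXT_URI_HEADER.lower())
    if from_header is not None:
        return from_header  # cheap path: the body is never parsed
    # Fallback for today's protocol: the URI only exists inside the JSON document.
    return json.loads(body).get("nextUri")
```

With the header present the body can be streamed through untouched, which matters most when responses carry large result batches.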

wendigo · 2024-06-19T16:27:22Z

@mosabua I recall it.

mosabua · 2024-06-24T14:58:44Z

Some very interesting numbers from a user were reported in #22303, related to changing targetResultSize. This could be a great quick win. Maybe it's worth changing the current default to more than 16MB for starters, and maybe figuring out some way to adjust it automatically.
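
As a rough illustration of the knob being discussed: #22303 describes `targetResultSize` as a query parameter, so a client could append it to the URI it polls. The helper name and the example URI below are made up for illustration; check the client protocol documentation for the accepted values and limits.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_target_result_size(uri: str, size: str) -> str:
    """Return uri with targetResultSize set, replacing any existing value."""
    parts = urlsplit(uri)
    # Keep every other query parameter and drop a previous targetResultSize.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "targetResultSize"
    ]
    query.append(("targetResultSize", size))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_target_result_size("http://trino.example/v1/statement/executing/q1/x/1", "32MB"))
```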

nickalexander053 · 2024-06-27T10:35:55Z

Does Project Swift include the ability to do parallel reads directly from the worker nodes? I would love to remove all file system access from users and pump everything through Trino, but I need to support use cases where very large full tables are loaded into Spark for model training. Any idea when work on the v2 protocol will begin?

wendigo · 2024-06-27T10:41:08Z

@nickalexander053 yes, parallel reads are part of the project, but exposing data directly from the workers is not an option, so we've approached it the other way around. The protocol changes that we are planning to introduce will support your use case.
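
The comment above does not spell out the mechanism. Purely as an illustration of what client-side parallel reads could look like, assume the server hands the client a list of independently retrievable segment locations instead of exposing workers. The `fetch_segments` helper, the `fetch` callable, and the `seg/N` names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fetch_segments(uris: Iterable[str], fetch: Callable[[str], bytes], workers: int = 4) -> List[bytes]:
    """Download segments concurrently and return them in their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order even when downloads finish out of order.
        return list(pool.map(fetch, uris))


# Hypothetical in-memory "storage" standing in for real segment downloads.
storage = {"seg/0": b"a,b\n", "seg/1": b"c,d\n", "seg/2": b"e,f\n"}
result = b"".join(fetch_segments(storage.keys(), storage.__getitem__))
print(result)
```

Keeping the result order stable matters because a query with ORDER BY must reach the consumer in the order the server produced it.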

nickalexander053 · 2024-06-27T10:46:16Z

@wendigo Thanks, could you elaborate or point me to some documentation/discussion as to what the protocol changes are? Any idea when work on the protocol changes may begin? I would love to help.

wendigo · 2024-06-27T11:14:37Z

@nickalexander053 we will post more details soon, we already have a first iteration of a working prototype

shohamyamin · 2024-06-28T07:59:16Z

Could improving the protocol also help with creating an ODBC driver for Trino?

For example, if I am not mistaken, someone implemented the Flight SQL protocol in their Trino fork and used the Flight SQL ODBC driver, and that worked for them.

So maybe taking the need for an ODBC driver into consideration while designing the protocol will make it easier to build one in the future.

wendigo · 2024-06-28T08:57:25Z

@shohamyamin this is not a goal, and Flight SQL is out of scope.

sajjoseph · 2024-07-15T01:44:47Z

Replying to @wendigo: "@sajjoseph can you elaborate more on the use-cases for each of the points?"

I added more details here - #22662 (comment).
Thanks!

FHTMitchell · 2024-10-25T15:54:42Z

@mosabua

Can you please expand on which Trino types you think aren't expressible in Arrow?

wendigo · 2024-10-25T15:56:30Z

@FHTMitchell for example timestamp with picosecond precision with timezone

wendigo · 2024-10-25T15:59:29Z

Arrow only supports timestamp(6)

FHTMitchell · 2024-10-25T16:17:46Z

@wendigo

Thanks for the quick reply!

Yeah, the 64-bit precision of the Arrow timestamp wouldn't fit the full range of the Trino equivalents. Have you considered using Arrow extension types? Arrow should cover 99% of users' use cases, so it feels, to me at least, that building on an established format would be preferable.
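
The arithmetic behind this point can be checked directly. The sketch assumes a timestamp stored as a single signed 64-bit count of ticks since an epoch; the function name is illustrative, and the year length is the mean Gregorian year.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.2425  # mean Gregorian year


def signed_64bit_range_days(ticks_per_second: int) -> float:
    """Days representable on each side of the epoch by a signed 64-bit tick count."""
    max_ticks = 2**63 - 1
    return max_ticks / ticks_per_second / SECONDS_PER_DAY


for name, ticks in [("microseconds", 10**6), ("nanoseconds", 10**9), ("picoseconds", 10**12)]:
    days = signed_64bit_range_days(ticks)
    print(f"{name}: about {days:,.0f} days ({days / DAYS_PER_YEAR:,.1f} years)")
```

At picosecond resolution a single 64-bit value spans only about 107 days on either side of the epoch, compared with roughly 292 years at nanosecond resolution, so a picosecond timestamp needs more than one 64-bit field.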

mosabua · 2024-10-25T16:18:24Z

From what I know, Arrow also does not support some of our more complex data types. We will essentially have to create a mapping and translation layer between https://arrow.apache.org/docs/python/api/datatypes.html and https://trino.io/docs/current/language/types.html, at first from Trino to Arrow but potentially in both directions.
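
A minimal sketch of what the simplest part of such a mapping layer could look like, assuming a plain lookup from Trino type names to pyarrow-style type names. The table is deliberately partial and illustrative, and `to_arrow_type` and `UnsupportedTypeError` are hypothetical names; the interesting work is in the types that have no direct counterpart, which this sketch only reports.

```python
from typing import Dict

# Partial, illustrative table; a real layer would need to cover far more types.
TRINO_TO_ARROW: Dict[str, str] = {
    "boolean": "bool",
    "integer": "int32",
    "bigint": "int64",
    "double": "float64",
    "varchar": "string",
    "varbinary": "binary",
    "date": "date32",
}


class UnsupportedTypeError(ValueError):
    """Raised when a Trino type has no entry in the illustrative table."""


def to_arrow_type(trino_type: str) -> str:
    """Map a Trino type name to an Arrow type name, failing loudly on gaps."""
    key = trino_type.strip().lower()
    try:
        return TRINO_TO_ARROW[key]
    except KeyError:
        raise UnsupportedTypeError(f"no Arrow mapping for Trino type {trino_type!r}") from None
```

Failing loudly on unmapped types, rather than silently narrowing them, keeps any loss of precision or range visible to the client.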

This might make sense from a compatibility perspective for client tools that work with Arrow directly.
From a performance perspective, it might be better to figure out a way to move the memory format from Trino (that inspired Arrow) over the wire directly to the clients. That would avoid the translation, but we would have to expand what our client drivers can do to adjust for that.

At this stage we don't know which is better, and we might end up doing both. For now we are already seeing amazing improvements with the spooling protocol, and we are working on documenting it and getting it supported in all clients. It is flexible enough to support other encodings, so the doors are open.

mosabua · 2024-10-25T16:30:44Z

Also, one last note: we essentially want to do what is best for Trino users first and foremost. That is mostly about performance and Trino-specific use cases. When and where integration with Arrow is important (which we don't know at this stage), we would love to find out more about it and for people with Arrow knowledge and development skills to help.

losipiuk · 2024-10-28T08:56:48Z

Quoting @mosabua: "From a performance perspective it might be better to figure out a way to move the memory format from Trino (that inspired Arrow) over the wire directly to the clients. That would avoid the translation"

Actually, this is not that simple :) You have very different backward compatibility requirements for the protocol and the internal representation. You need to be free to make any internal changes to unlock performance and other improvements, while you are basically not allowed to make any changes to the external protocol. So translation is needed no matter what.

wendigo added the roadmap label ("Top level issues for major efforts in the project") Jun 5, 2024

wendigo changed the title ~~Project XXX~~ Project Swift Jun 6, 2024

mosabua mentioned this issue Jun 19, 2024

Trino-cli has no option to accept targetResultSize query parameter #22303

Closed

wendigo mentioned this issue Jun 20, 2024

Update airlift to 249 #22457

Merged

wendigo mentioned this issue Jul 13, 2024

[Swift] Spooled client protocol extension #22662

Closed

wendigo mentioned this issue Oct 1, 2024

Upgrade from 429 to 443 introduced java.lang.IllegalArgumentException: Unknown java type class io.trino.spi.block.SqlMap #23632

Open

wendigo mentioned this issue Oct 11, 2024

Query state shows Finishing and stays for long when the result set is large #23759

Closed

This was referenced Oct 18, 2024

Improve and expand client protocol docs #23836

Draft

Enable HTTP/2 for internal communication by default #23857

Merged
