-
I'll soon have to start making test cases for the validator, which I've been dreading and postponing in part because the JSON for them would be so verbose, so anything that abbreviates it would be great. (Part of it is also that prost, the protobuf code generator I'm using, only supports the binary format, so I was planning to rely on the Python bindings for the validator for the bulk of the testing; that way I can use the Python protobuf library to convert between JSON and binary.)
As for a more explain-y, debugging output, I made an HTML exporter in the validator, which currently looks like this for the plan in your gist. My plan is that, once the validator starts to actually understand the plans (I've primarily been working on infrastructure thus far; the actual validation is lagging far behind), the tree will include human-readable notes for things like expressions, so you don't have to unfold that mess to see what things mean, and the relation graph should also include a short summary of what each relation does. Also, the JS for jumping between nodes still needs some work, and I should probably factor out fontawesome (or at least figure out the actual license and embed the SVGs for the icons if allowed).
But, obviously, this is an output-only format. NOSJ is primarily interesting to me for writing plans manually. I like the backronym, by the way :) Is the Python code for this already public somewhere?
-
This is super cool stuff, @saulpw. My slowness in responding should not be read as a lack of excitement about this; I've been thinking about it a bunch. The current proposal is not a human-first format (@saulpw's comments call this out). I think the proposal outlined here is useful and far easier to read than raw proto JSON, but I think the canonical text representation should be designed for humans first, as opposed to something derived from the protobuf serialized objects (which have their own weird shapes because of things like proto's rules). TL;DR: I vote to offer this as a useful tool for users, but let's not call it the text format. I think that needs to be built human-first.
-
Hmm, I didn't see this before I started all the work on a text format in substrait-io/substrait-cpp. The new form is less tied to the protobuf, though, so it may be even more useful than what we have here (although I suspect that this implementation is more complete).
-
As I play around with Substrait, I'm finding the default JSON representation of Substrait protobuf messages to be unwieldy: hard to read, unpleasant to diff, and impossible to write. I think a more compact and nicer Substrait text format would make it easier to develop, debug, and hack on Substrait. [This is related to Discussion #11, which I didn't see when I wrote this.]
To that end, I've developed a prototype for a "Substrait assembly language", which has some neat features:
The basic format looks like this (without any macro substitutions):
I think this is already a big improvement over the original JSON, in that it removes most of the unnecessary line noise that JSON itself requires. With a few macros, it gets even more compact and easier to read:
I believe @jacques-n has some stated interest in a higher-level Substrait language, designed top-down from the spec (instead of bottom-up from the protobuf JSON output). He suggested that the output from `EXPLAIN` in various SQL engines could be a source of inspiration, as it's familiar to many data people. To that end, I compiled a gist of TPC-H query 3 `EXPLAIN` output from several SQL engines, alongside the Substrait plan in current JSON protobuf output (from isthmus) and in this proposed and tentatively-named NOSJ (Nicer Object Syntax for JSON) format.
This prototype already hits many of the design goals I was aiming for, but one essential element that's currently missing is the ability to refer to functions/columns/tables by name instead of by anchor/ordinal. So far, the NOSJ syntax is nicely generic and could apply as well to any JSON object; only the macros are Substrait-specific. I'm hoping to make the base format extensible enough that at least something like `&and:bool` could be used inline as a function reference, and automatically add the function and its anchor to the outermost `.extensions` list.
What do y'all think of this format? Would this improve your workflow when dealing with Substrait plans?
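To make the inline function-reference idea a bit more concrete, here is a rough Python sketch of how such a pass might work on an already-parsed plan. Everything in it is an assumption for illustration: the helper name `resolve_function_refs`, the `function_anchor`/`name` entry shape (a simplification, not the real Substrait extension schema), the `gt:i32_i32` reference, and the requirement that the input has no pre-existing `extensions` entries. Only the `&and:bool` spelling comes from the proposal itself.

```python
import json


def resolve_function_refs(plan, extensions_key="extensions"):
    """Replace '&name' strings with integer anchors and declare them once.

    Hypothetical sketch: assumes `plan` is a dict with no existing
    extension declarations, and uses a simplified entry shape.
    """
    anchors = {}  # reference text -> anchor, in first-seen order

    def walk(node):
        # Rebuild dicts and lists so the caller's input is left untouched.
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        if isinstance(node, str) and node.startswith("&"):
            name = node[1:]
            if name not in anchors:
                anchors[name] = len(anchors) + 1
            return anchors[name]
        return node

    resolved = walk(plan)
    # Append one declaration per distinct reference to the outermost list.
    declared = resolved.setdefault(extensions_key, [])
    for name, anchor in anchors.items():
        declared.append({"function_anchor": anchor, "name": name})
    return resolved


# Made-up plan fragment; the field names are placeholders, not Substrait's.
plan = {
    "relations": [
        {"filter": {"condition": {"function": "&and:bool", "args": [
            {"function": "&gt:i32_i32", "args": [0, 10]},
            {"function": "&and:bool", "args": []},
        ]}}}
    ]
}

resolved = resolve_function_refs(plan)
print(json.dumps(resolved, indent=2))
```

Repeated references share one anchor, so the extensions list gets one entry per distinct function rather than one per use. A real version would also need to respect anchors already declared in the plan and emit the actual extension message structure.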