Structured Data Over Pipes
andychu edited this page May 30, 2019 · 11 revisions
Takeaway from http://www.oilshell.org/blog/2017/09/19.html
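- Use `%` format strings, as is a common convention.
- Implement a way to output the `NUL` byte. `git log` uses `%x00`, and `find -printf` uses `\0`.
- Use UTF-8 encoding for strings. In UTF-8 the `\0` byte only ever encodes U+0000 itself and never occurs inside another character, so text without NUL characters can safely use it as a terminator.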
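A minimal sketch of the consumer side, assuming a producer that ends every field with a NUL byte and emits a fixed number of fields per record. The function name `parse_nul_records` and the sample bytes are hypothetical; the sample is hand-written, not captured from `git log` or `find`.

```python
from typing import List, Tuple


def parse_nul_records(data: bytes, fields_per_record: int) -> List[Tuple[str, ...]]:
    """Split NUL-terminated UTF-8 fields and group them into fixed-size records."""
    if not data:
        return []
    if not data.endswith(b"\0"):
        raise ValueError("input must end with a NUL terminator")
    # Drop the final terminator, then split on the remaining NUL bytes.
    fields = [f.decode("utf-8") for f in data[:-1].split(b"\0")]
    if len(fields) % fields_per_record != 0:
        raise ValueError("field count is not a multiple of the record size")
    return [
        tuple(fields[i:i + fields_per_record])
        for i in range(0, len(fields), fields_per_record)
    ]


# Hand-written sample: two records of (hash, subject), every field NUL-terminated.
# The second subject contains a newline, which survives intact.
sample = b"abc123\0Fix bug\0def456\0Add feature\nwith a newline\0"
records = parse_nul_records(sample, 2)
```

Because the delimiter is a byte that ordinary text never contains, the parser needs no quoting or unescaping step.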
Advantages:
- It's already a common practice. See Unix Tools.
  - Works with `xargs -0` (which was meant for `find -print0`)
  - `%x00` is a trivial patch if it exists.
- You can save serialization cost by selecting the fields you want.
- In an escaping context, you can make this safe against adversarial input.
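To illustrate the adversarial-input point, here is a small sketch comparing newline-delimited and NUL-delimited encodings of the same list of names. The file names are made up for the example.

```python
# Two names; the second contains an embedded newline, as a hostile file name might.
names = ["report.txt", "evil\nname.txt"]

# Newline-delimited: the embedded newline is indistinguishable from a separator.
line_encoded = "".join(n + "\n" for n in names).encode("utf-8")
line_decoded = line_encoded.decode("utf-8").splitlines()

# NUL-delimited: each name is followed by a NUL byte, which the names cannot contain.
nul_encoded = b"".join(n.encode("utf-8") + b"\0" for n in names)
# split() leaves one empty item after the final terminator; drop it.
nul_decoded = [f.decode("utf-8") for f in nul_encoded.split(b"\0")[:-1]]
```

The line-based reader sees three names instead of two, while the NUL-based reader recovers the original list.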
Disadvantages:
- `%s` is not that readable. But this can be mitigated by Oil Metaprogramming, that is, by turning it into `"hash: $hash commit: $commit"`.
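One way to read the mitigation is as a translation step from named placeholders to `%` codes. The sketch below uses Python's `string.Template` as a stand-in for Oil metaprogramming. The `FIELD_CODES` mapping and the `to_format_string` helper are hypothetical, and the particular codes assigned to each name are only illustrative.

```python
from string import Template

# Hypothetical mapping from readable field names to '%' format codes.
FIELD_CODES = {"hash": "%H", "commit": "%s"}


def to_format_string(template: str) -> str:
    """Replace each $name placeholder with its '%' code (KeyError if unknown)."""
    return Template(template).substitute(FIELD_CODES)


fmt = to_format_string("hash: $hash commit: $commit")
```

The user writes the readable template, and the tool still receives a conventional `%` format string.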
- JSON for structured data (with proper escaping)
- CSV for tabular data (with proper escaping)
  - Also need a foo.csv_schema for the types. JSON has types in the data encoding, but CSV doesn't.
- Provide `%#s` for a length prefix, for truly binary data. What use cases exist?
  - Alternative: base64 encode
  - Alternative: pass the file system path of the file (could be in memory on `tmpfs`).
- Netstrings -- for fixed formats
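As a sketch of the length-prefix idea, here is a netstring encoder and decoder. A netstring is the decimal byte length, a colon, the payload, and a trailing comma. The function names are hypothetical, and this is not a proposal for how `%#s` itself would be encoded.

```python
from typing import Tuple


def encode_netstring(payload: bytes) -> bytes:
    """Encode payload as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes) -> Tuple[bytes, bytes]:
    """Decode one netstring from the front of data; return (payload, rest)."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    # The payload must be followed immediately by a comma.
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("truncated payload or missing trailing comma")
    return rest[:length], rest[length + 1:]


# Binary payload containing NUL, a non-UTF-8 byte, and a newline.
binary = b"\x00\xff binary \n data"
stream = encode_netstring(binary) + encode_netstring(b"next")
payload, rest = decode_netstring(stream)
```

Since the reader knows the length up front, the payload may contain any byte, including `NUL`, without escaping.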
- Elm: Understanding Pipes in Elm -- In Elm we have the pipe operators `|>` (pipe forward) and `<|` (pipe backward). They represent how the data is being passed.
- Elixir
- R -- with `%>%` and `->` and magrittr
- Tulip -- shell-like / Haskell-like language
- Clojure ?
- Haskell ?
I don't think Julia has it.