
Structured Data Over Pipes

andychu edited this page May 30, 2019 · 11 revisions

Up: Structured Data in Oil

Takeaway from http://www.oilshell.org/blog/2017/09/19.html

Minimal Solution, That Basically Exists

  • Use % format strings, as is a common convention.
  • Implement a way to output the NUL byte. git log uses %x00, and find -printf uses \0.
  • Use UTF-8 encoding for strings. The byte \0 never occurs inside the UTF-8 encoding of any character other than NUL itself, so it is safe to use as a terminator.
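
As a rough sketch of the consumer side, the Python below splits a byte stream in which every field is followed by a NUL byte, as a format along the lines of %H%x00%s%x00 would produce. The function name parse_nul_records and the sample bytes are hypothetical, made up for illustration; the only assumptions about the producer are that it terminates each field with NUL and encodes text as UTF-8.

```python
from typing import List, Tuple


def parse_nul_records(data: bytes, fields_per_record: int) -> List[Tuple[str, ...]]:
    """Split NUL-terminated UTF-8 fields into records of a fixed width."""
    if not data:
        return []
    if not data.endswith(b"\0"):
        raise ValueError("stream must end with a NUL terminator")
    # Drop the final terminator, then split on the remaining ones.
    fields = [f.decode("utf-8") for f in data[:-1].split(b"\0")]
    if len(fields) % fields_per_record != 0:
        raise ValueError("field count is not a multiple of the record width")
    return [
        tuple(fields[i:i + fields_per_record])
        for i in range(0, len(fields), fields_per_record)
    ]


# Hypothetical sample: two records of (hash, subject), each field NUL-terminated.
# The second subject contains a multi-byte UTF-8 character.
sample = b"abc123\0Fix parser\0def456\0Add \xe2\x9c\x93 tests\0"
records = parse_nul_records(sample, 2)
```

Spaces, newlines, and multi-byte characters inside a field need no escaping here, because the only byte with special meaning is NUL.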

Advantages:

  • It's already a common practice. See Unix Tools.
    • Works with xargs -0 (which was meant for find -print0)
    • Adding %x00 is a trivial patch for a tool that already has format strings.
  • You can save serialization cost by selecting the fields you want.
  • In an escaping context, you can make this safe against adversarial input.
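
To make the adversarial-input point concrete, here is a small sketch comparing newline-terminated and NUL-terminated output for a file name that contains a newline. The file names and the helper split_terminated are invented for illustration; the relevant fact is that Unix file names may contain any byte except NUL and "/".

```python
# Hypothetical file names; the second one embeds a newline on purpose.
names = [b"report.txt", b"evil\nname.txt", b"notes.md"]

# Newline-terminated output, as a plain `find -print` would emit.
newline_stream = b"".join(n + b"\n" for n in names)
# NUL-terminated output, as `find -print0` would emit.
nul_stream = b"".join(n + b"\0" for n in names)


def split_terminated(stream: bytes, terminator: bytes) -> list:
    """Split a stream in which every record ends with `terminator`."""
    parts = stream.split(terminator)
    return parts[:-1]  # the last element is the empty tail after the final terminator


from_newlines = split_terminated(newline_stream, b"\n")
from_nuls = split_terminated(nul_stream, b"\0")

print(len(from_newlines))  # 4: the embedded newline split one name in two
print(from_nuls == names)  # True: the names round-trip unchanged
```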

Disadvantages:

  • %s is not that readable. This can be mitigated by Oil Metaprogramming: the user writes "hash: $hash commit: $commit", and it is translated into the equivalent % format string.
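
One way such a translation could look, purely as a sketch and not a description of how Oil does it: rewrite $name references into % placeholders using a lookup table. The table PLACEHOLDERS and the function to_format_string are hypothetical. The placeholders on the right-hand side (%H, %an, %s, and %% for a literal percent sign) are the ones git log accepts in its pretty formats.

```python
import re

# Hypothetical mapping from readable names to git pretty-format placeholders.
PLACEHOLDERS = {"hash": "%H", "author": "%an", "subject": "%s"}


def to_format_string(template: str) -> str:
    """Rewrite $name references into % placeholders; escape a literal % as %%."""
    def replace(match):
        if match.group(0) == "%":
            return "%%"
        name = match.group(1)
        if name not in PLACEHOLDERS:
            raise KeyError(f"unknown field: {name}")
        return PLACEHOLDERS[name]

    # Match either a literal percent sign or a $identifier.
    return re.sub(r"%|\$([A-Za-z_][A-Za-z0-9_]*)", replace, template)


fmt = to_format_string("hash: $hash subject: $subject")
print(fmt)  # hash: %H subject: %s
```

Unknown names raise an error instead of passing through, so a typo in the template fails early.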

Other Solutions

  • JSON for structured data (with proper escaping)
  • CSV for tabular data (and proper escaping)
    • Also need a foo.csv_schema for the types. JSON has types in the data encoding, but CSV doesn't.
  • Provide %#s for a length prefix, for truly binary data. What use cases exist?
    • Alternative: base64 encode
    • Alternative: pass the file system path of the file (could be in memory on tmpfs).
  • Netstrings -- for fixed formats
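
For the length-prefix idea, netstrings are a small existing format: the payload length in decimal, a colon, the payload bytes, then a comma. The sketch below is a minimal encoder and decoder under that definition. The function names are my own, and it makes no attempt at streaming input or at rejecting lengths with leading zeros.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Encode bytes as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstrings(data: bytes) -> list:
    """Decode a concatenation of netstrings into a list of payloads."""
    payloads = []
    pos = 0
    while pos < len(data):
        colon = data.index(b":", pos)  # raises ValueError if no colon is left
        length_text = data[pos:colon]
        if not length_text.isdigit():
            raise ValueError("invalid length prefix")
        start = colon + 1
        end = start + int(length_text)
        if data[end:end + 1] != b",":
            raise ValueError("missing trailing comma")
        payloads.append(data[start:end])
        pos = end + 1
    return payloads


# The payload may contain NUL, commas, colons, and newlines without escaping,
# because the decoder trusts the length prefix and never scans the payload.
binary = b"\x00\xff,:\n"
stream = encode_netstring(b"hello") + encode_netstring(binary) + encode_netstring(b"")
decoded = decode_netstrings(stream)
```

Unlike NUL termination, this works for truly binary data, at the cost of the producer knowing each field's length before writing it.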

Languages with Pipe-Like Features

  • Elm: Understanding Pipes in Elm -- Elm has two pipe operators, |> (pipe forward) and <| (pipe backward). The direction of the operator shows how the data is being passed.
  • Elixir
  • R -- with %>% (from the magrittr package) and ->
  • Tulip shell-like / Haskell-like language
  • Clojure -- threading macros -> and ->>
  • Haskell -- & (reverse application, in Data.Function)

Julia has |> as well, for chaining single-argument functions.
