OSH Word Evaluation Algorithm
OBSOLETE, See http://www.oilshell.org/release/latest/doc/simple-word-eval.html
This page documents a portion of the OSH implementation. It differs significantly from other shells in this respect.
- They tend to use a homogeneous tree with various flags on nodes (e.g. `nosplit`, `assignment`, etc.).
- OSH uses a typed, heterogeneous tree (now statically checked with MyPy).
For example:
word_part =
Literal(...)
| BracedVarSub(...)
| CommandSub(...)
| SingleQuoted(...)
| DoubleQuoted(...)
| ...
https://github.com/oilshell/oil/blob/master/frontend/syntax.asdl#L107
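To make the contrast concrete, here is a minimal Python sketch of a typed, heterogeneous `word_part` tree and an evaluator that dispatches on node type. The class names follow the ASDL snippet above, but the field names (`text`, `name`, `parts`), the `eval_part` function, and the dict-based environment are assumptions for illustration, not OSH's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Literal:
    text: str


@dataclass
class SingleQuoted:
    text: str


@dataclass
class BracedVarSub:
    name: str  # hypothetical field: the variable being substituted


@dataclass
class DoubleQuoted:
    parts: List["WordPart"]


# Each variant has its own fields, instead of one node type with flags.
WordPart = Union[Literal, SingleQuoted, BracedVarSub, DoubleQuoted]


def eval_part(part: WordPart, env: Dict[str, str]) -> str:
    """Evaluate one part by dispatching on its type."""
    if isinstance(part, (Literal, SingleQuoted)):
        return part.text
    if isinstance(part, BracedVarSub):
        return env.get(part.name, "")
    if isinstance(part, DoubleQuoted):
        return "".join(eval_part(p, env) for p in part.parts)
    raise TypeError(part)


word = DoubleQuoted([Literal("hi "), BracedVarSub("x")])
result = eval_part(word, {"x": "there"})
```

In this sketch the quotes never appear in the tree as characters, so there is nothing to "remove" later: evaluating a `DoubleQuoted` node just evaluates its children.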
Notes:
- Specifying ML-like data structures with ASDL was an implementation style borrowed from CPython itself: see posts tagged #ASDL.
- Smoosh is written in OCaml and Lem and also uses a typed, heterogeneous tree in some places. However, it also has a notion of "control codes", probably inherited from libdash, and word expansion operates on these control codes (section 4.2 of the paper).
- As much parsing as possible is done in a single pass, with lexer modes.
- There are subsequent tweaks for detecting assignments, tildes, etc. They re-write small parts of the syntax tree, but are not a full parsing pass.
- There is a "metaprogramming" pass for brace expansion: `i=0; echo {$((i++)),foo,bar}`. (Bash brace expansion syntax is more like metaprogramming, whereas zsh implements it more like word evaluation.)
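The "metaprogramming" view of brace expansion can be sketched as a rewrite that runs before any evaluation. The Python below is a simplified illustration that works on source strings (OSH works on its syntax tree); the function name `brace_expand` is hypothetical, and the sketch handles only comma-separated groups, not ranges like `{1..3}`.

```python
from typing import List


def brace_expand(word: str) -> List[str]:
    """Expand the first {a,b,...} group in a word, then recurse."""
    start = word.find("{")
    # Skip "${": that is a variable substitution, not brace expansion.
    while start > 0 and word[start - 1] == "$":
        start = word.find("{", start + 1)
    if start == -1:
        return [word]

    depth = 0
    pieces: List[str] = []
    piece_start = start + 1
    end = -1
    for i in range(start, len(word)):
        c = word[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                pieces.append(word[piece_start:i])
                end = i
                break
        elif c == "," and depth == 1:
            pieces.append(word[piece_start:i])
            piece_start = i + 1

    if end == -1 or len(pieces) < 2:
        return [word]  # unbalanced, or {foo} with no comma: leave it alone

    prefix, suffix = word[:start], word[end + 1:]
    out: List[str] = []
    for p in pieces:
        out.extend(brace_expand(prefix + p + suffix))
    return out


expanded = brace_expand("{$((i++)),foo,bar}")
```

Here `expanded` holds three unevaluated words, the first of which is still the text `$((i++))`. The arithmetic runs later, during word evaluation, which is why the expansion behaves like a rewrite of the program rather than a step of evaluation.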
There are three stages (not four as in POSIX). `EvalWordSequence2` in `osh/word_eval.py` is a tiny function that shows the stages explicitly.
- Evaluation of the typed tree.
  - There is also a restricted variant of word evaluation for completion, e.g. so arbitrary processes aren't run when you hit TAB.
  - `part_value = String(...) | Array(...)` in `osh/runtime.asdl` is an important intermediate data type.
- Splitting with IFS. This is specified with a state machine in `osh/split.py`. (I think OSH is unique in this regard too.)
  - Splitting involves the concept of "frames", to handle things like `x='a b'; y='c d'; echo $x"${@}"$y`. The last part of `$x` has to be joined with `argv[0]`, and `argv[n-1]` has to be joined with `$y`.
- Globbing.
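The frame example above can be sketched in Python. This is a toy model, not the code in `osh/split.py`: the `String` and `Array` classes mirror the `part_value` type, but the `quoted` field, the function names `make_frames` and `split_frame`, and the whitespace-only IFS handling are assumptions made for illustration. Globbing is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class String:
    s: str
    quoted: bool  # hypothetical flag: quoted text is never split


@dataclass
class Array:
    strs: List[str]  # e.g. the elements of "${@}"; never split


PartValue = Union[String, Array]
Frame = List[Tuple[str, bool]]  # (text, quoted) fragments that touch


def make_frames(parts: List[PartValue]) -> List[Frame]:
    """Group part values into frames; each array element after the
    first starts a new frame."""
    frames: List[Frame] = [[]]
    for p in parts:
        if isinstance(p, String):
            frames[-1].append((p.s, p.quoted))
        else:
            for i, s in enumerate(p.strs):
                if i > 0:
                    frames.append([])
                frames[-1].append((s, True))
    return frames


def split_frame(frame: Frame, ifs: str = " \t\n") -> List[str]:
    """Split the unquoted fragments of one frame on IFS whitespace,
    joining fragments that are adjacent."""
    args: List[str] = []
    current: Optional[str] = None  # None means no argument in progress
    for text, quoted in frame:
        if quoted:
            current = (current or "") + text
            continue
        for ch in text:
            if ch in ifs:
                if current is not None:
                    args.append(current)
                    current = None
            else:
                current = (current or "") + ch
    if current is not None:
        args.append(current)
    return args


# x='a b'; y='c d'; set -- 1 2 3; echo $x"${@}"$y
parts = [String("a b", False), Array(["1", "2", "3"]), String("c d", False)]
argv = [arg for frame in make_frames(parts) for arg in split_frame(frame)]
```

With these inputs `argv` is `['a', 'b1', '2', '3c', 'd']`: the last field of `$x` joins the first array element, and the last array element joins the first field of `$y`.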
There is no such thing as "quote removal" in OSH (e.g. any more than a Python or JavaScript interpreter has "quote removal"). It's just evaluation.
- Caveat: Bug with `IFS='\'`
  - Splitting and globbing are separate stages, but have "dependencies" because of statements like `echo $prefix_that_could_be_split/"constant string"*.sh`
  - Internally, splitting and globbing both use `\` to inhibit "evaluation". That is, `\*` is an escaped glob. And `\ ` is an escaped space (IFS character).
  - This causes problems when `IFS='\'`. I think I could choose a different character for OSH, maybe even the `NUL` byte.
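The collision between the escape character and the delimiter can be shown with a toy model. The two functions below are hypothetical and are not OSH's implementation; they only assume, as described above, that quoted text is backslash-escaped and that the splitter treats a backslash as protecting the next character.

```python
from typing import List


def escape_quoted(s: str, meta: str = "*?[]\\ ") -> str:
    """Backslash-escape characters that splitting or globbing would act on."""
    return "".join("\\" + c if c in meta else c for c in s)


def split_escaped(s: str, ifs: str) -> List[str]:
    """Split on IFS characters, except those protected by a backslash."""
    fields: List[str] = []
    cur = ""
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            cur += s[i:i + 2]  # keep the escape for the glob stage
            i += 2
        elif s[i] in ifs:
            fields.append(cur)
            cur = ""
            i += 1
        else:
            cur += s[i]
            i += 1
    fields.append(cur)
    return fields


# Quoted text is protected from both later stages.
protected = escape_quoted("constant string")

# With an ordinary IFS, an escaped delimiter is not a split point.
normal = split_escaped("a:b\\:c", ":")

# With IFS='\', the value a\b should split into two fields, but this
# splitter reads the backslash as an escape and never splits.
collision = split_escaped("a\\b", "\\")
```

In this model `collision` is a single field, whereas a shell with `IFS='\'` would be expected to produce `a` and `b`. Using a character that cannot appear in shell strings as the internal escape (such as the NUL byte mentioned above) would remove the ambiguity.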
OSH wants to treat all sublanguages uniformly. (Command, Word, Arith, and the non-POSIX bool `[[` are the main sublanguages.)
For some "dynamic" sublanguages like builtin flag syntax, we fall a bit short, but that could change in the future.
This matters for interactive completion, where it would be useful to understand every sublanguage statically.
For example, note that you can have variable references in several sublanguages:
Static:
- `x=1` -- assignments are in the command language
- `echo ${x:-${x:-$y}}` -- word language
- `echo $(( x == y ))` -- arithmetic language
- `[[ $x -eq $y ]]` -- boolean language
Dynamic:
- `code='x=1'; readonly $code` -- the dynamic builtin language
- Other builtins that manage variables: `getopts`, `read`, `unset`
- `printf -v` in bash
- Variable references in `${!x}` in bash/ksh
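The static/dynamic distinction can be illustrated with a deliberately naive Python sketch. The regex, `static_var_refs`, and `readonly_target` are hypothetical helpers, not part of OSH; a real tool would walk the parsed tree of each sublanguage rather than scan text.

```python
import re
from typing import List

# Naive pattern for $name, ${name...} and ${!name} in the word language.
VAR_REF = re.compile(r"\$\{?!?([A-Za-z_][A-Za-z0-9_]*)")


def static_var_refs(source: str) -> List[str]:
    """Variable names that can be seen without running anything."""
    return VAR_REF.findall(source)


def readonly_target(arg_value: str) -> str:
    """Name that `readonly NAME=VALUE` would affect, given an
    already-evaluated argument."""
    return arg_value.split("=", 1)[0]


word_refs = static_var_refs("echo ${x:-${x:-$y}}")

# Bare names inside $(( )) belong to the arithmetic sublanguage, so a
# pattern written for the word language does not see them.
arith_refs = static_var_refs("echo $(( x == y ))")

# Statically, only `code` is visible here ...
dynamic_refs = static_var_refs("code='x=1'; readonly $code")

# ... the variable actually made readonly is known only after $code
# has been evaluated.
env = {"code": "x=1"}
target = readonly_target(env["code"])
```

The empty result for the arithmetic example is the point of treating sublanguages uniformly: each one needs its own parser before its variable references can be found. The `readonly $code` case is different in kind, since no amount of parsing reveals `x` without evaluating the word first.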
Claim: There should only be one step to word evaluation! I prefer to get rid of multi-stage "expansion" as a notion in shell.
- Splitting is a mistake; a holdover from when shell had no arrays. See Thirteen Incorrect Ways and Two Awkward Ways to Use Arrays
- Dynamically parsed globs are a mistake (just like dynamically parsed arithmetic is). Shell already has `eval`, so you lose nothing by making globs static.
The Oil language will statically parse globs, and I may add something like `shopt -s static-glob` to OSH in the short term.
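The difference between dynamic and static globs can be sketched in Python. The semantics assumed here (only glob characters that appear literally in the source act as patterns, while characters that come from variable values match themselves) are an interpretation of the proposal, not a description of an implemented option. The `('lit', text)` / `('var', name)` part encoding and both function names are hypothetical.

```python
import fnmatch
import glob
from typing import Dict, List, Tuple

Part = Tuple[str, str]  # ('lit', source text) or ('var', variable name)


def _match(pattern: str, files: List[str]) -> List[str]:
    """Names matching the pattern, or the pattern itself if none match."""
    hits = [f for f in files if fnmatch.fnmatchcase(f, pattern)]
    return hits or [pattern]


def dynamic_glob(parts: List[Part], env: Dict[str, str], files: List[str]) -> List[str]:
    """Traditional behavior: substitute first, then parse the result as a glob."""
    pattern = "".join(t if kind == "lit" else env[t] for kind, t in parts)
    return _match(pattern, files)


def static_glob(parts: List[Part], env: Dict[str, str], files: List[str]) -> List[str]:
    """Assumed static behavior: variable values are escaped, so only
    glob characters written in the source are patterns."""
    pattern = "".join(
        t if kind == "lit" else glob.escape(env[t]) for kind, t in parts
    )
    return _match(pattern, files)


files = ["a.sh", "b.sh", "*.sh"]
env = {"pat": "*.sh"}

from_var_dynamic = dynamic_glob([("var", "pat")], env, files)
from_var_static = static_glob([("var", "pat")], env, files)
from_source_static = static_glob([("lit", "*.sh")], env, files)
```

Under these assumptions a `*` stored in a variable matches every `.sh` file in the dynamic case but only the file literally named `*.sh` in the static case, while a `*` written in the source still globs normally. Code that really wants the dynamic behavior can still use `eval`.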