From c4fb75bed98fd61626c99cf38dda721e81aa3a3f Mon Sep 17 00:00:00 2001 From: Andy Chu Date: Wed, 18 Mar 2020 23:20:44 -0700 Subject: [PATCH] [doc] Draft data-model, and add process-model. Move some content to doc/legacy-array.md. --- build/doc.sh | 1 + doc/data-model.md | 364 +++++++++++++++++++------------------------ doc/index.md | 3 +- doc/legacy-array.md | 155 ++++++++++++++++++ doc/process-model.md | 58 +++++++ test/spec-bin.sh | 5 - 6 files changed, 378 insertions(+), 208 deletions(-) create mode 100644 doc/legacy-array.md create mode 100644 doc/process-model.md diff --git a/build/doc.sh b/build/doc.sh index 3a5da67cc8..be34b9d777 100755 --- a/build/doc.sh +++ b/build/doc.sh @@ -96,6 +96,7 @@ readonly MARKDOWN_DOCS=( # Internal stuff data-model + process-model architecture-notes toil ) diff --git a/doc/data-model.md b/doc/data-model.md index 006455da17..9bf8f010a2 100644 --- a/doc/data-model.md +++ b/doc/data-model.md @@ -5,16 +5,39 @@ in_progress: yes Data Model for OSH and Oil ========================== -This doc internal data structure in the Oil interpreter, and gives examples of -how you manipulate them with shell or Oil code. + + +It's confusing that shell has many syntaxes for the same semantics. For +example, in bash, these four statements do similar things: + +```sh-prompt +$ foo='bar' +$ declare -g foo=bar +$ x='foo=bar'; typeset $x +$ printf -v foo bar + +$ echo $foo +bar +``` + +In addition, Oil adds JavaScript-like syntax: + +``` +var foo = 'bar' +``` -The interpreter is "unified". +This syntax can express more data types, but it may also confuse new users. -- OSH semantics are based on: - - POSIX shell for strings - - bash and ksh for arrays and associative arrays. bash largely follows ksh - is the case of arrays. Its associative arrays are quirkier. -- TODO: Python coercions. +This document describes user-facing data structures in the Oil interpreter, +which should help users reason about the meaning of programs. 
+ +A shortcut: after creating shell variables, use the `repr` builtin to inspect +them!
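+
+In bash, which has no `repr`, `declare -p` serves a similar purpose. It prints
+each variable along with the type flags (like `-a` or `-A`) that bash stores on
+the variable's location. A minimal sketch, assuming bash 4 or later (needed for
+associative arrays):
+
+```sh
+declare -a a=('1 2' 3)   # indexed array
+declare -A m=([k]=v)     # associative array
+s='foo'                  # plain string
+
+declare -p a m s         # one "declare ..." line per variable, with its flags
+```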
+## Design Goals -## Why Use this Information? - -The goal of Oil is to replace this quirky language. But we still made it -compatible. +### Simplify and Rationalize bash -If you want to write scripts compatible with OSH and bash. +POSIX shell has a fairly simple model: everything is a string, and `"$@"` is a +special case. +Bash adds many features on top of POSIX, including arrays and associative +arrays. Oil implements those features, and a few more. -## Oil's Data Model is Slightly Different Than Bash +However, it also significantly simplifies the model. -It's meant to be more sane. +A primary difference is mentioned in [Known Differences](known-differences.html): -See [Known Differences](known-differences.html). - -I salvaged these semantics. - -Worst of the language! Newest and most "grafted on". - -### Surprising Parsing - -Parsing bash is undecidable. - - A[x] - a[x] +- In bash, the *locations* of values are tagged with types, e.g. `declare -A + unset_assoc_array`. +- In Oil, *values* are tagged with types. This is how common dynamic languages + like Python and JavaScript behave. -### Surprising Coercions +In other words, Oil "salvages" the confusing semantics of bash and produces +something simpler, while still being very compatible. - Horrible - - a=('1 2' 3) - b=(1 '2 3') # two different elements - - [[ $a == $b ]] - [[ ${a[0]} == ${b[0]} ]] - - [[ ${a[@]} == ${b[@]} ]] - - -Associative arrays and being undefined - -- half an array type - - strict_array removes this - - case $x in "$@" -- half an associative array type +### Add New Features and Types -### Bugs +TODO -- test -v -- empty array conflicts with `set -o nounset` (in bash 4.3). I can't recommend - in good faith +- eggex type +- later: floating point type -## Memory +## High Level Description +### Memory Is a Stack Shell has a stack but no heap. It has values and locations, but no references/pointers. Oil adds references to data structures on the heap, which may be recurisve. 
+- The stack also has the **arguments array** which is spelled `"$@"` in shell, + and `@ARGV` in Oil. -### Undef, Str, Sequential/Indexed Arrays, Associative Array - -- "array" refers to both. - - although Oil has a "homogeneous array type" that's entirely different - - OSH array vs. Oil array -- no integers, but there is (( )) -- "$@" is an array, and "${a[@]}" too - - not true in bash -- it's fuzzy there - - but $@ and ${a[@]} are NOT arrays -- flags: readonly and exported (but arrays/assoc arrays shouldn't be exported) - - TODO: find that - -### Arrays Can't Be Nested and Can't Escape Functions - -- Big limitation! Lifting it in Oil -- You have to splice -- There's no Garbage collection. - -### OSH Doesn't have True Integers +### Functions and Variables Are Separate -We save those for Oil! +There are two distinct namespaces. For example: -There are lots of coercions instead. - -bash has '-i' but that's true anyway. - - -## Operations on All Variables +``` +foo() { + echo 'function named foo' +} +foo=bar # a variable; doesn't affect the function +``` -### assignment +### Variable Name Lookup with "Dynamic Scope" -### unset +OSH has it, but Oil limits it. -You can't unset an array in OSH? But you can in bash. +### Limitations of Arrays And Compound Data Structures -### readonly +Shell is a value-oriented language. -### export only applies to strings +- Can't Be Nested +- Can't Be Passed to Functions or Returned From Functions +- Can't Take References; Must be Copied +Example: -## Operations on Arrays +``` +declare -a myarray=("${other_array[@]}") # shell -### Initialization +var myarray = @( @other_array ) # Oil +``` - declare -a array - declare -a array=() +Reason: There's no Garbage collection. - declare -A assoc - # there is no empty literal here +### Integers and Coercion -Also valid, but not necessary since `declare` is local: +- Strings are coerced to integers to do math. +- What about `-i` in bash? 
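+
+A minimal bash sketch of both kinds of coercion. The values in the comments
+assume bash's arithmetic rules:
+
+```sh
+x='2'                  # a string
+echo $(( x + 1 ))      # coerced to an integer to do math: prints 3
+
+declare -i n           # -i tags the *location* n, not the value
+n='2 * 3'              # so the assigned string is evaluated as arithmetic
+echo "$n"              # prints 6
+```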
+ - local -a array - local -A assoc -Makes a global array: +### Unix `fork()` Has Copy-On-Write Semantics - array=() +See the [Process Model](process-model.html) document. -### Array Literals -Respects the normal rules of argv. +## Key Data Types - prefix=foo - myarray=(one two -{three,four}- {5..8} *.py "$prefix*.py" '$prefix*.py') +TODO: [osh/runtime.asdl]($oil-src) - myarray=( - $var ${var} "$var" - $(echo hi) "$(echo hi)" - $(1 + 2 * 3) - ) + -### Associative Array Literals +### `cell` - (['k']=v) +TODO - Unlike bash, ([0]=v) is still an associative array literal. +- [export]($help) only applies to **strings** - It's not an indexed array literal. This matters when you take slices and - so forth? +### `value` +Undef, Str, Sequential/Indexed Arrays, Associative Array -### "${a[@]}" is Evaluating (Splicing) +- "array" refers to both. + - although Oil has a "homogeneous array type" that's entirely different + - OSH array vs. Oil array +- no integers, but there is (( )) +- "$@" is an array, and "${a[@]}" too + - not true in bash -- it's fuzzy there + - but $@ and ${a[@]} are NOT arrays +- flags: readonly and exported (but arrays/assoc arrays shouldn't be exported) + - TODO: find that - echo "${array[@]}" - echo "${assoc[@]}" +### `cmd_value` for shell builtins -Not Allowed, unlike in bash! +Another important type: - $assoc ${assoc} "${assoc}" - ${!assoc} ${assoc//pattern/replace} # etc. +``` + assign_arg = (lvalue lval, value? rval, int spid) + cmd_value = + Argv(string* argv, int* arg_spids, command__BraceGroup? block) + | Assign(builtin builtin_id, + string* argv, int* arg_spids, + assign_arg* pairs) +``` -### Iteration -Note that since a for loop takes an array of words, evaluating/splicing works: +## Printing State - for i in "${a1[@]}" "${a2[@]}"; do - echo $i - done +### Shell Builtins -### ${#a[@]} is the Length +Oil supports various shell and bash operations to view the interpreter state.
+- `set` prints variables and their values +- `set -o` prints options +- `declare/typeset/readonly/export -p` prints a subset of variables +- `test -v` tests if a variable is defined. - echo ${#array[@]} - echo ${#assoc[@]} +### [repr]($help) in Oil +Pretty prints a cell. -### Coercion to String by Joining Elements +This is cleaner! - echo ${!array[@]} - echo ${!assoc[@]} +TODO: What about functions - echo ${!array[*]} - echo ${!assoc[*]} - echo "${!array[*]}" - echo "${!assoc[*]}" -### Look Up By Index / Key With a[] - matrix: - a['x'] a["x"] - a["$x"] - a[$x] - a[${x}] - a[${x#a}] +## Modifying State - a[x] -- allowed - A[x] -- NOT allowed? It should be a string +### Oil Keywords - (( 'a' )) -- parsed, but can't evaluate +TODO: See Oil Keywords doc. - # This is a string in both cases - a[0] - A[0] +### Shell Assignment Builtins: declare/typeset, readonly, export +... -undef[0]=1 automatically creates an INDEXED array -undef=(1) +### [unset]($help) -### Assign / Append To Location Specified by Index / Key +You can't unset an array in OSH? But you can in bash. - a[expr]= # int_coerce - A[expr]= # no integer coercion +### Other Builtins -Just like you can append to strings: +- [read]($help). Sometimes sets the magic `$REPLY` variable. +- [getopts]($help) - s+='foo' -Append to elements of an array, which are strings: +## Links - a[x+1]+=x - a[x+1]+=$x +- +- -### Slicing With ${a[@]:5:2} +## Appendix: Bash Issues - ${array[@]:1:3} + - # TODO: disallow this? 
because no order - ${assoc[@]:1:3} +### Strings and Arrays Are Confused + Horrible -NOTE: string slicing: + a=('1 2' 3) + b=(1 '2 3') # two different elements + [[ $a == $b ]] + [[ ${a[0]} == ${b[0]} ]] + [[ ${a[@]} == ${b[@]} ]] -### Append Array to Array - a=(1 2 3) - a+=(4 5 6) +Associative arrays and being undefined +- half an array type + - strict_array removes this + - case $x in "$@" +- half an associative array type -### Get All Indices With ${!a[@]} +### Indexed Arrays and Associative Arrays Are Confused - echo ${!array[@]} - echo ${!assoc[@]} +### Empty and Unset Are Confused +- empty array conflicts with `set -o nounset` (in bash 4.3). I can't recommend + in good faith. -### Vectorized String Operations + - echo ${array[@]//x/X} - echo ${assoc[@]//x/X} + - -## Links - -- -- +--> diff --git a/doc/index.md b/doc/index.md index d31e27ecee..e3de0a1e72 100644 --- a/doc/index.md +++ b/doc/index.md @@ -64,7 +64,8 @@ These docs span both OSH and Oil. Internal Details: - [Data Model](data-model.html) -- The interpreter. - - TODO: Rules for scope +- [Process Model](process-model.html). Shell is a language that lets you use + low-level Unix constructs. - [Architecture Notes](architecture-notes.html) -- The interpreter - [Error List](errors.html) - [Toil](toil.html). Continuous Testing on Many Platforms. diff --git a/doc/legacy-array.md b/doc/legacy-array.md new file mode 100644 index 0000000000..80fdbaf257 --- /dev/null +++ b/doc/legacy-array.md @@ -0,0 +1,155 @@ +--- +in_progress: yes +--- + +Draft +===== + +## Operations on Arrays + +### Initialization + + declare -a array + declare -a array=() + + declare -A assoc + # there is no empty literal here + +Also valid, but not necessary since `declare` is local: + + local -a array + local -A assoc + +Makes a global array: + + array=() + +### Array Literals + +Respects the normal rules of argv. 
+ + prefix=foo + myarray=(one two -{three,four}- {5..8} *.py "$prefix*.py" '$prefix*.py') + + myarray=( + $var ${var} "$var" + $(echo hi) "$(echo hi)" + $((1 + 2 * 3)) + ) + +### Associative Array Literals + + (['k']=v) + + Unlike bash, ([0]=v) is still an associative array literal. + + It's not an indexed array literal. This matters when you take slices and + so forth? + + +### Evaluating (Splicing) With "${a[@]}" + + echo "${array[@]}" + echo "${assoc[@]}" + +Not Allowed, unlike in bash! + + $assoc ${assoc} "${assoc}" + ${!assoc} ${assoc//pattern/replace} # etc. + + +### Iteration + +Note that since a for loop takes an array of words, evaluating/splicing works: + + for i in "${a1[@]}" "${a2[@]}"; do + echo $i + done + +### ${#a[@]} is the Length + + + echo ${#array[@]} + echo ${#assoc[@]} + + +### Coercion to String by Joining Elements + + echo ${array[@]} + echo ${assoc[@]} + + echo ${array[*]} + echo ${assoc[*]} + + echo "${array[*]}" + echo "${assoc[*]}" + +### Look Up By Index / Key With a[] + + matrix: + a['x'] a["x"] + a["$x"] + a[$x] + a[${x}] + a[${x#a}] + + a[x] -- allowed + A[x] -- NOT allowed? It should be a string + + (( 'a' )) -- parsed, but can't evaluate + + # This is a string in both cases + a[0] + A[0] + + +undef[0]=1 automatically creates an INDEXED array +undef=(1) + +### Assign / Append To Location Specified by Index / Key + + a[expr]= # int_coerce + A[expr]= # no integer coercion + +Just like you can append to strings: + + s+='foo' + +Append to elements of an array, which are strings: + + a[x+1]+=x + a[x+1]+=$x + +### Slicing With ${a[@]:5:2} + + ${array[@]:1:3} + +Note the presence of DISALLOWED VALUES. + + + # TODO: disallow this?
because no order + ${assoc[@]:1:3} + + +NOTE: string slicing: + + + +### Append Array to Array + + a=(1 2 3) + a+=(4 5 6) + + +### Get All Indices With ${!a[@]} + + echo ${!array[@]} + echo ${!assoc[@]} + + +### Vectorized String Operations + + echo ${array[@]//x/X} + + echo ${assoc[@]//x/X} + diff --git a/doc/process-model.md b/doc/process-model.md new file mode 100644 index 0000000000..63ab4057c9 --- /dev/null +++ b/doc/process-model.md @@ -0,0 +1,58 @@ +--- +in_progress: yes +--- + +Process Model +============= + + + + +Related: [Data Model](data-model.html). These two docs are the missing +documentation for shell! + + +
+
+ +## Constructs + + +### Pipelines + +- `shopt -s lastpipe` + +### Functions Can Be Transparently Put in Pipelines + + +### Explicit Subshells are Rarely Needed + +- prefer `pushd` / `popd`, or `cd { }` in Oil. + +### Redirects + + +### Other + +- xargs, xargs -P +- find -exec + + + +## Builtins + +### [wait]($help) + +### [fg]($help) + +### [bg]($help) + +### [trap]($help) + + + diff --git a/test/spec-bin.sh b/test/spec-bin.sh index cc12c7ba9b..c6c7b1f8f9 100755 --- a/test/spec-bin.sh +++ b/test/spec-bin.sh @@ -17,11 +17,6 @@ # all at once with: # # test/spec-bin.sh all-steps -# -# Could also build these: -# - coreutils -# - re2c for the OSH build (now in build/codegen.sh) -# - cmark set -o nounset set -o pipefail