unset should unshadow variables higher on the stack (at least for nonlocals) #706

Closed
andychu opened this issue Apr 15, 2020 · 19 comments

@andychu
Contributor

andychu commented Apr 15, 2020

This is the issue from Michael Greenberg, who is working on POSIX shell (Smoosh, the executable formal semantics). So even though POSIX doesn't have locals, it has temp bindings, and this also affects unset!

#329

I didn't refresh my memory of this fully yet, but I think we can work it out between these two issues ... this is why we have the test cases test/spec.sh assign -r 24-27, noted in the commit

aac8712


Looking at case 24 only, I think I went with that behavior because only dash and zsh agree. Every other shell disagrees to an extent.

Since people rely on the bash idiom, it's probably better to change it to be closer to that, but I'm not sure exactly how. So yeah having the exact test cases for ble.sh will help, e.g. in spec/ble-idioms.test.sh.

Originally posted by @andychu in #653 (comment)

@andychu
Contributor Author

andychu commented Apr 15, 2020

@akinomyoga Let's talk about the unset issue here because it will probably be a long conversation :-)

I think if we can reconcile our test cases / results it will be easier to see the solution

@andychu
Contributor Author

andychu commented Apr 15, 2020

Also, if it helps, we could upgrade the version in test/spec-bin.sh to bash 5.0 rather than 4.4


And here are the 4 test cases that fail with the new behavior. But they aren't set in stone. I mainly want to see if we're missing any cases before figuring out what to do

http://travis-ci.oilshell.org/jobs/2020-04-15__02-09-39.wwz/_tmp/spec/osh.html

andychu pushed a commit that referenced this issue Apr 15, 2020
Still need to figure out what the right behavior is, but this makes spec
tests pass again.
@mgree

mgree commented Apr 17, 2020

That is hysterical. I can't see from the test results... what's your test case for temp bindings and/or locals and unset? On my tests for nested function calls with temporary bindings:

f() {
    x=$((x+1))
    unset x
}

g() {
    x=$((x+1)) f
    post_f=$x
}

h() {
    orig=$x
    x=$((x+1)) g
    post_g=$x
    case $x in
        ( $((orig+3)) ) echo "$1 UNNESTED VALUES (BUT NESTED UNSET)" ;;
        ( ${orig} ) echo "$1 NESTED VALUES WITH NESTED UNSET" ;;
        ( "" ) echo "$1 HAS GLOBAL UNSET" ;;
        ( * ) echo "unexpected value: '$x'";;
    esac
}

x=0
h GLOBAL

x=0 h LOCAL

In bash, dash, Smoosh, and zsh, I get NESTED VALUES WITH NESTED UNSET for both GLOBAL and LOCAL bindings on h. In yash, mksh, and ksh, I get GLOBAL UNSET for both GLOBAL and LOCAL bindings on h.

Only a very strange implementation would treat temp bindings and locals differently, but you could, I guess? Nobody does, as far as I can tell. (I didn't test osh or bosh or other shells.)

@andychu
Contributor Author

andychu commented Apr 17, 2020

Yeah I feel like there should be a consistent rule, but I haven't reconciled that with the test results yet.

FWIW this is the page with some idioms that rely on subtle bash behavior: https://www.fvue.nl/wiki/Bash:_Passing_variables_by_reference

I think we have to come up with some better test cases ...

Also in Koichi's table on #653, there is sometimes a difference in behavior depending on whether the variable being unset is in the current scope (local) or in a frame above (dynamic scope). I would like to avoid those special cases if possible...


Does Smoosh have the concept of a "cell"? That is, it's a location for a value plus its metadata. I think it must, because you can have the -x exported bit on a variable that's unset, etc.

I don't know if POSIX mentions anything like that. But the two behaviors we are experimenting with are whether unset should remove the cell, or fill the cell with the Undef value (which is what fails on set -u, etc.).
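To make the "cell" idea a bit more concrete, here is a minimal sketch showing that a name can carry the export attribute while holding no value. The names demo_var and show_export_bit are made up for illustration; this only demonstrates the observable behavior and says nothing about how any particular shell stores the attribute internally.

```shell
# Hypothetical demo: the export attribute belongs to the cell, not the value.
show_export_bit() {
  unset -v demo_var
  export demo_var                          # attribute only, no value yet
  echo "before: ${demo_var-(unset)}"       # the name is still unset
  demo_var=hello                           # plain assignment, no 'export'
  # The child still sees the value, so the export bit survived being unset.
  sh -c 'echo "child sees: ${demo_var-(unset)}"'
}
show_export_bit
```

The second line of output is the interesting one: the assignment never mentions export, yet the child process sees the value.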

As far as I remember, that was the exact same distinction that we played with back in 2019 for the temp bindings... But I don't remember it more clearly.

Anyway I think it will be possible to get to the bottom of this but I just wanted to loop you in, in case you had an opinion.


Here's a more concrete question: of the two different behaviors you observed, is one of them POSIX compliant and the other isn't, or does POSIX simply not specify it?

@mgree

mgree commented Apr 17, 2020

From Section 2.9.1 of the POSIX spec:

If the command name is a function that is not a standard utility implemented as a function, variable assignments shall affect the current execution environment during the execution of the function. It is unspecified:

  • Whether or not the variable assignments persist after the completion of the function
  • Whether or not the variables gain the export attribute during the execution of the function
  • Whether or not export attributes gained as a result of the variable assignments persist after the completion of the function (if variable assignments persist after the completion of the function)

Unfortunately, none of this has any bearing on unset, readonly, or shadowed bindings. And local itself is unspecified, so it can really do whatever you want, so far as I can tell.

On the following script, bash/dash/zsh/Smoosh all yield "reverts to global", while yash/ksh/mksh yield "overrides":

ORIG=5
NEW=6
OVERRIDE=10


f() {
    if [ "$x" = "$ORIG" ]
    then
        echo "unshadowed"
    fi
    x=$OVERRIDE
}


x=$ORIG
x=$NEW f
if [ "$x" = "$ORIG" ]
then
    echo "reverts to global"
elif [ "$x" = "$OVERRIDE" ]
then
    echo "overrides"
elif [ "$x" = "$NEW" ]
then
    echo "reverts to temp binding"
else
    echo "unexpected value '$x'"
fi

As for Smoosh, you can see the binding code in os.lem.

The logic of where metadata is stored is... unhappy. Locals and globals are treated differently.

Local environments store a possible value for the string (where Nothing means unset and Just "" means the empty string) right along with the readonly/export metadata (called local_opts).

The top-level global environments have (a) a map defining the values of variables (unset values aren't in the map, set values map to possibly empty symbolic strings), (b) a set of names marked for export, and (c) a set of names marked readonly.

So morally, yes, I have "cells". Notably, they're not mutable---nothing is! That's why smoosh is sooooooo slow (4x off other shells written in C, haven't compared to OSH).

andychu pushed a commit that referenced this issue Apr 17, 2020
We don't set the cell to Undef.

Changed assertions in 'spec/assign'.

We no longer match dash/zsh in most cases.

- Case 24 we have unique behavior, but we always did (/)
- We match bash with one case
- We match mksh with two cases

Addresses issue #706.

This is sort of an experiment.  Let's see if ble.sh runs.  Either way
the semantics are simple, and this appears to be closer to what it
wants.
@andychu
Contributor Author

andychu commented Apr 17, 2020

@akinomyoga I changed Oil back to "delete the cell", which makes one of the test cases you gave pass.

Can you test it out more and tell me if the rest of ble.sh works? Adding more test cases in spec/ble-idioms is welcome if necessary.

If this works I think it's fine as the behavior, since POSIX doesn't seem to address it, and shells diverge a lot anyway. This is apparently the part of shell where shells disagree the most.

I believe the only reason we had it this way is because zsh and dash seemed to mostly agree, not because any program needed the behavior.

This makes Oil behave more like bash and mksh, and less like dash and zsh. At least for the test cases I have. I'm not that confident they're comprehensive though...


Thanks @mgree for looking into it and for the background. Oil still prints "reverts to global" on that case.

@akinomyoga
Collaborator

But it is even more complicated than local scope and dynamic scope, because you also have these "temp bindings" to worry about. #653 (comment) by @andychu

I'm sorry, it took a long time to investigate the behavior of these temporary bindings (tempenv). I had never tested the interaction of unset and tempenv, so I investigated it in Bash and was puzzled by its strange behavior. I think I have now reached a conclusion. Here is a summary of my investigation.

1. Bug in Bash-4.3..5.0 and tempenv/local/unset

I was searching inside the Bash source code (where I found that these temporary bindings are called tempenv). It finally turned out that there is actually a bug in Bash-4.3..5.0, which has now been fixed in the devel branch of Bash. Also, the treatment of tempenv changed substantially in Bash 4.3, so we need to test with the devel branch of Bash to know its behavior.

Details of Bug
#!/bin/bash

# bash 4.3..5.0 bug

f1() {
  local v=local
  unset -v v # this is local-scope unset
  echo "v: ${v-(unset)}"
}

v=global
v=tempenv f1

# Results:
#   Bash 2.05b-4.2    outputs "v: (unset)"
#   Bash 4.3-5.0      outputs "v: global"
#   Bash devel branch outputs "v: (unset)"

The fix was made just about two months ago in the commit f65f3d54 (commit bash-20200207 snapshot). This is the related ChangeLog:

            2/6
            ---
variables.c
  - make_local_variable: make sure local variables that have the same
    names as variables found in the temporary environment are marked as
    local. From Grisha Levit <address@hidden> back in 12/2018

This bug was actually reported a year and a half ago.

https://lists.gnu.org/archive/html/bug-bash/2018-12/msg00031.html

From: Grisha Levit
Subject: should 'local' create local variables when they exist in the tempenv?
Date: Sun, 9 Dec 2018 01:30:53 -0500

When a variable is present in the temporary environment and then declared local in a function, it seems to not actually make a local variable, in the sense that the variable does not show up in the output of `local', unsetting the variable reveals the variable from the higher scope rather than marking it invisible, etc.

$ f() { local v=x; local -p; }; v=t f

$ f() { local v; declare -p v; }; v=t f
declare -x v="t"

$ f() { local v=x; unset v; declare -p v; }; v=g; v=t f
declare -- v="g"

Is this intentional?

2. Interaction of tempenv/localvar/eval/unset

Here I summarize the behavior in Bash 4.3+. The treatment of tempenv has been largely changed in Bash 4.3. And there was an additional fix in the devel branch which will be released in Bash 5.1. I haven't checked the actual implementation in detail, but the observable behavior can be explained by the following model.

The function has its own variable context (let us call it local_context). In addition, we can create multilevel nested variable contexts in a function scope by v=xxx eval '...'. Two types of variables can be defined in any of the function-scope contexts: tempenv and localvar. Each variable has a flag to indicate whether it is a tempenv or a localvar.

Behavior of function call with tempenv

The function call of the form v=xxx fn creates a tempenv in local_context of fn.

Behavior of builtin local

When one attempts to create a variable by local builtin, Bash first searches an existing variable cell with the same name.

  • If localvar or tempenv is found in the current function, it reuses the cell. When it is tempenv, Bash removes the tempenv flag to turn it into localvar. When a value is specified for local, the existing value is overwritten by the specified value.
  • Otherwise, Bash creates a new localvar in local_context of the current function. If a value is specified for local, the variable is initialized with the specified value. Otherwise, if shopt -s localvar_inherit is set or there is an existing tempenv, the value is inherited from the existing variable. Otherwise, the value is Undef.

The behavior is complicated, so I don't think Oil should follow Bash exactly, particularly where eval (nested contexts in a function) is involved, at least until we find a Bash script that uses a structure like v=xxx eval 'unset v' or v=xxx eval 'local v'. (ble.sh doesn't use such a structure.)

Behavior of builtin unset

unset can have two different behaviors: local-scope unset (value-unset) and dynamic unset (cell-unset). If shopt -s localvar_unset is set or the target variable is localvar in the current function, unset performs value-unset. Otherwise, unset is always cell-unset.

To implement it in Oil, you can forget about the option shopt -s localvar_unset. Then, the rule is simple: If the variable is localvar of the current function scope, value-unset is performed. Otherwise, cell-unset is performed.

Test cases

Here are test cases to demonstrate the above behavior. Results were obtained with the devel branch of Bash. I will create a PR for spec tests later.

1. local-scope/dynamic unset (local)

#!/bin/bash

unlocal() { unset -v "$1"; }

f1() {
  local v=local
  unset v
  echo "[$1,local,(unset)] v: ${v-(unset)}"
}
v=global
f1 global

f1() {
  local v=local
  unlocal v
  echo "[$1,local,(unlocal)] v: ${v-(unset)}"
}
v=global
f1 global

Result

[global,local,(unset)] v: (unset)
[global,local,(unlocal)] v: global

2. local-scope/dynamic unset (tempenv&local)

local mutates tempenv to localvar rather than shadowing it.

#!/bin/bash

unlocal() { unset -v "$1"; }

f1() {
  local v=local
  unset v
  echo "[$1,local,(unset)] v: ${v-(unset)}"
}
v=global
v=tempenv f1 global,tempenv

f1() {
  local v=local
  unlocal v
  echo "[$1,local,(unlocal)] v: ${v-(unset)}"
}
v=global
v=tempenv f1 global,tempenv

Result

[global,tempenv,local,(unset)] v: (unset)
[global,tempenv,local,(unlocal)] v: global

3. local-scope/dynamic unset (tempenv)

unset for tempenv is always dynamic unset.

#!/bin/bash

unlocal() { unset -v "$1"; }

f1() {
  unset v
  echo "[$1,(unset)] v: ${v-(unset)}"
}
v=global
v=tempenv f1 global,tempenv

f1() {
  unlocal v
  echo "[$1,(unlocal)] v: ${v-(unset)}"
}
v=global
v=tempenv f1 global,tempenv

Result

[global,tempenv,(unset)] v: global
[global,tempenv,(unlocal)] v: global

4. tempenv through eval

While v=xxx fn creates tempenv in local_context of fn, v=xxx eval fn creates tempenv outside of the function fn.

#!/bin/bash

unlocal() { unset -v "$1"; }

f5() {
  echo "[$1] v: ${v-(unset)}"
  local v
  echo "[$1,local] v: ${v-(unset)}"
  ( unset v
    echo "[$1,local+unset] v: ${v-(unset)}" )
  ( unlocal v
    echo "[$1,local+unlocal] v: ${v-(unset)}" )
}
v=global
f5 global
v=tempenv f5 global,tempenv
v=tempenv eval 'f5 "global,tempenv,(eval)"'

Result

[global] v: global
[global,local] v: (unset)
[global,local+unset] v: (unset)
[global,local+unlocal] v: global
[global,tempenv] v: tempenv
[global,tempenv,local] v: tempenv
[global,tempenv,local+unset] v: (unset)
[global,tempenv,local+unlocal] v: global
[global,tempenv,(eval)] v: tempenv
[global,tempenv,(eval),local] v: tempenv
[global,tempenv,(eval),local+unset] v: (unset)
[global,tempenv,(eval),local+unlocal] v: tempenv

5. local inherits the value of tempenv

It doesn't inherit the value of normal exported variables. It only inherits the value of tempenv.

#!/bin/bash

f1() {
  local v
  echo "[$1,(local)] v: ${v-(unset)}"
}
f2() {
  f1 "$1,(func)"
}
v=global
v=tempenv f2 global,tempenv
(export v=global; f2 xglobal)

Result

[global,tempenv,(func),(local)] v: tempenv
[xglobal,(func),(local)] v: (unset)

6. v=xxx eval ''

v=xxx eval '...' can create a nested variable context in a function. local mutates tempenv to localvar.

#!/bin/bash

f1() {
  local v=local1
  echo "[$1,local1] v: ${v-(unset)}"
  v=tempenv2 eval '
    echo "[$1,local1,tempenv2,(eval)] v: ${v-(unset)}"
    local v=local2
    echo "[$1,local1,tempenv2,(eval),local2] v: ${v-(unset)}"
  '
  echo "[$1,local1] v: ${v-(unset)} (after)"
}
v=global
v=tempenv1 f1 global,tempenv1

Result

[global,tempenv1,local1] v: local1
[global,tempenv1,local1,tempenv2,(eval)] v: tempenv2
[global,tempenv1,local1,tempenv2,(eval),local2] v: local2
[global,tempenv1,local1] v: local1 (after)

7. local-scope/dynamic unset (nested context localvar)

#!/bin/bash

unlocal() { unset -v "$1"; }

f2() {
  local v=local1
  v=tempenv2 eval '
    local v=local2
    (unset v  ; echo "[$1,local1,tempenv2,(eval),local2,(unset)] v: ${v-(unset)}")
    (unlocal v; echo "[$1,local1,tempenv2,(eval),local2,(unlocal)] v: ${v-(unset)}")
  '
}
v=tempenv1 f2 global,tempenv1

Result

[global,tempenv1,local1,tempenv2,(eval),local2,(unset)] v: (unset)
[global,tempenv1,local1,tempenv2,(eval),local2,(unlocal)] v: local1

8. dynamic unset (nested context localvar x3)

#!/bin/bash

unlocal() { unset -v "$1"; }

f3() {
  local v=local1
  v=tempenv2 eval '
    local v=local2
    v=tempenv3 eval "
      local v=local3
      echo \"[\$1/local1,tempenv2/local2,tempenv3/local3] v: \${v-(unset)}\"
      unlocal v
      echo \"[\$1/local1,tempenv2/local2,tempenv3/local3] v: \${v-(unset)} (unlocal 1)\"
      unlocal v
      echo \"[\$1/local1,tempenv2/local2,tempenv3/local3] v: \${v-(unset)} (unlocal 2)\"
      unlocal v
      echo \"[\$1/local1,tempenv2/local2,tempenv3/local3] v: \${v-(unset)} (unlocal 3)\"
      unlocal v
      echo \"[\$1/local1,tempenv2/local2,tempenv3/local3] v: \${v-(unset)} (unlocal 4)\"
    "
  '
}
v=global
v=tempenv1 f3 global,tempenv1

Result

[global,tempenv1/local1,tempenv2/local2,tempenv3/local3] v: local3
[global,tempenv1/local1,tempenv2/local2,tempenv3/local3] v: local2 (unlocal 1)
[global,tempenv1/local1,tempenv2/local2,tempenv3/local3] v: local1 (unlocal 2)
[global,tempenv1/local1,tempenv2/local2,tempenv3/local3] v: global (unlocal 3)
[global,tempenv1/local1,tempenv2/local2,tempenv3/local3] v: (unset) (unlocal 4)

9. dynamic unset by unlocal (nested context tempenv x3)

unset removes the cell for each nested context one by one.

#!/bin/bash

unlocal() { unset -v "$1"; }

f4.unlocal() {
  v=tempenv2 eval '
    v=tempenv3 eval "
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)}\"
      unlocal v
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)} (unlocal 1)\"
      unlocal v
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)} (unlocal 2)\"
      unlocal v
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)} (unlocal 3)\"
      unlocal v
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)} (unlocal 4)\"
    "
  '
}
v=global
v=tempenv1 f4.unlocal global,tempenv1

Result

[global,tempenv1,tempenv2,tempenv3] v: tempenv3
[global,tempenv1,tempenv2,tempenv3] v: tempenv2 (unlocal 1)
[global,tempenv1,tempenv2,tempenv3] v: tempenv1 (unlocal 2)
[global,tempenv1,tempenv2,tempenv3] v: global (unlocal 3)
[global,tempenv1,tempenv2,tempenv3] v: (unset) (unlocal 4)

10. dynamic unset by unset (nested context tempenv x3)

unset for tempenv in the current function is also dynamic unset.

#!/bin/bash

f4.unset() {
  v=tempenv2 eval '
    v=tempenv3 eval "
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)}\"
      unset v
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)} (unset 1)\"
      unset v
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)} (unset 2)\"
      unset v
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)} (unset 3)\"
      unset v
      echo \"[\$1,tempenv2,tempenv3] v: \${v-(unset)} (unset 4)\"
    "
  '
}
v=global
v=tempenv1 f4.unset global,tempenv1

Result

[global,tempenv1,tempenv2,tempenv3] v: tempenv3
[global,tempenv1,tempenv2,tempenv3] v: tempenv2 (unset 1)
[global,tempenv1,tempenv2,tempenv3] v: tempenv1 (unset 2)
[global,tempenv1,tempenv2,tempenv3] v: global (unset 3)
[global,tempenv1,tempenv2,tempenv3] v: (unset) (unset 4)

3. shopt -s localvar_unset

Is the bash 5.0 behavior with localvar_unset OK? I would rather keep it simple rather than have too many special cases, but still compatible. It's hard to write documentation when there are too many special cases. #653 (comment) by @andychu

No. Actually you can just forget about localvar_unset. Many existing Bash programs, including bash-completion, don't work with shopt -s localvar_unset. And effectively no Bash program uses shopt -s localvar_unset. I just tested with that option for completeness.

There is a story behind localvar_unset. About two years ago, a man appeared on the bug-bash mailing list, insisted that the dynamic unset behavior was a bug, and tried to get Bash's existing behavior changed. Chet and other people tried to convince him that the behavior is intentional even if it is tricky or quirky, that many existing scripts rely on it, and that the behavior varies among shells, so there is no established correct behavior. But we could not convince him, and he continued to post replies to the mailing list for a long time. Finally Chet implemented the localvar_unset option for him, but I don't think he has actually written a script affected by that option, because the original discussion was started by a question from another user, not by him.

I have searched for uses of localvar_unset on GitHub. We can find only two programs that use localvar_unset: ble.sh and modernish. A few other scripts just enumerate all the shopt options.

  • ble.sh temporarily turns off localvar_unset to implement dynamic unset when localvar_unset is enabled.
  • modernish turns on localvar_unset to make the behavior closer to that of other shells. As you know, modernish tries to provide a universal shell scripting experience that can run on a wide range of POSIX shells. In this sense, modernish is not compatible with real Bash scripts like bash-completion and others, which are full of Bashisms.
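For reference, a minimal sketch of what the option changes, assuming bash 5.0 or newer is installed as bash. The wrapper name demo_localvar_unset and the labels are made up; on older versions the shopt call fails and the second line is simply not printed.

```shell
# Hypothetical wrapper: runs the demo under bash regardless of the calling shell.
demo_localvar_unset() {
  bash <<'EOF'
unlocal() { unset -v "$1"; }               # unset issued from a callee's frame
f() { local v=local; unlocal v; echo "$1: ${v-(unset)}"; }
v=global
f default                                  # dynamic unset: the global shows through
if shopt -s localvar_unset 2>/dev/null; then
  f localvar_unset                         # option set: v stays unset inside f
fi
EOF
}
demo_localvar_unset
```

With the option off, the unset reveals the global; with it on, the same call leaves v unset until f returns.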

Does the behavior as of the last commit make it work? #653 (comment) by @andychu

To implement the Bash behavior, we need to switch between these two behaviors depending on the situation. The conclusion of the above discussion is: If the variable is localvar of the current function scope, value-unset is performed. Otherwise, cell-unset is performed.

This statement and the table below doesn't exactly match what I observe... maybe you can add some test cases below #24 in spec/builtin-vars.test.sh? #653 (comment) by @andychu

I will look at it more later but having a test case for exactly what's used in ble.sh will help. #653 (comment) by @andychu

Since people rely on the bash idiom, it's probably better to change it to be closer to that, but I'm not sure exactly how. So yeah having the exact test cases for ble.sh will help, e.g. in spec/ble-idioms.test.sh. #653 (comment) by @andychu

That is hysterical. I can't see from the test results... what's your test case for temp bindings and/or locals and unset? #706 (comment) by @mgree

I think we have to come up with some better test cases ... #706 (comment) by @andychu

I'm sorry for the late reply. Later, I will try to add test cases in spec/{builtin-vars,ble-idioms}.test.sh. I haven't tested much with other shells, but it seems the model differs completely from shell to shell.

Also, if it helps, we could upgrade the version in test/spec-bin.sh to bash 5.0 rather than 4.4 #706 (comment) by @andychu

Although Bash 5.0 has an option shopt -s localvar_unset, effectively no Bash program uses that option. Also, as the local/unset bug of Bash 4.3 is not yet fixed in Bash 5.0, I think we don't have to upgrade it.

@akinomyoga
Collaborator

@akinomyoga I changed Oil back to "delete the cell", which makes one of the test cases you gave pass.

Oh, I'm sorry, I didn't notice your reply from two hours ago. But value-unset and cell-unset should be switched depending on the situation. There are many Bash scripts that rely on the behavior of both types of unset.

I tried the current master of Oil. In the following case, unset should perform value-unset but not cell-unset. This difference will break many scripts...

$ bash -c 'v=global; f1() { local v=1234; unset v; echo "${v-(unset)}"; }; f1'
(unset)
$ osh -c 'v=global; f1() { local v=1234; unset v; echo "${v-(unset)}"; }; f1'
global

@andychu
Contributor Author

andychu commented Apr 17, 2020

OK wow!! I was not expecting that level of investigation.

But thank you for doing that -- those kinds of bugs are why I don't just want to copy bash behavior. I don't want to blindly copy bash 5.0 and then they change it in bash 5.1. I would rather design something that makes sense but is also compatible with existing programs.

I guess I'm glad the rule is relatively simple, although it's also just what you originally said in the bug.


But I still would want to see the comparisons with other shells. I think the question now is whether Oil:

  • Implements the rule you say: If the variable is localvar of the current function scope, value-unset is performed. Otherwise, cell-unset is performed.
  • Implements the rule behind some kind of option, like shopt -s compat_unset.

On the face of it, it seems weird that there's a difference. Why wouldn't you want unset mylocal to reveal variables in higher frames, but you do want unset callerlocal to reveal it? That is inconsistent to me.

i.e. are bash scripts using documented behavior, or are they just taking advantage of implementation quirks in bash? I would be interested in if the rule is documented anywhere.


And I would like to see the test cases first. Let's do it in two steps -- add test cases that reveals what bash does, and compare with other shells. (I assume no shell behaves like bash, but it will be interesting to see where exactly the differences are.)

And then the second step is to figure out what Oil should do, and if there should be a shopt option (which I generally like to avoid, but I also don't want to copy implementation details when bash is the only shell that behaves a certain way.) Again dash and zsh did widely agree, so I chose that behavior. My memory was also that smoosh agreed, although I don't remember what cases it was exactly.

Thanks again for the detailed investigation! (There's no rush to do this, I'm going to make a release tomorrow anyway, but we can settle it in a later release)

@andychu
Contributor Author

andychu commented Apr 17, 2020

Also to clarify, I think either of these behaviors is acceptable if they would run existing programs:

  • always cell-unset
  • always value-unset

But you are saying that those simpler rules are not sufficient for some programs.

I guess I should try bash-completion. bash-completion works with the old value-unset rule. I use it under Oil every time I make an Oil release. Maybe it doesn't work with the cell-unset rule?


I looked at the upvar page you mentioned:

https://www.fvue.nl/wiki/Bash:_Passing_variables_by_reference

Were you saying that this relies on the mixed "value-unset cell-unset" rule? I read the page but didn't see that. However I guess I can just try it.

(By the way I plan for something a lot simpler in Oil -- #586 is setref, which is basically a better syntax to use the -n nameref flag in bash. )

@andychu
Contributor Author

andychu commented Apr 17, 2020

Hm to make this decision process shorter, I would say that:

  • If the behavior is documented, we could implement the mixed rule
  • If the behavior isn't documented but matches some other shell, we could also implement the mixed rule.
  • otherwise we should implement a simpler rule by default, and have shopt -s bash_unset to make it compatible. Unless you think that there is some patch to apply to ble.sh that can use one of the simpler value-unset or cell-unset rules. Then we can avoid global options, which I am trying to have fewer of.

@andychu
Contributor Author

andychu commented Apr 17, 2020

For what it's worth, I just tested bash-completion with the current bin/osh, which now does unset-cell.

It seems to work, even though it uses the upvar trick?

https://github.com/oilshell/bash-completion/blob/master/bash_completion#L162

(It also worked with the previous unset-value rule.)

I didn't do a comprehensive test but I just hit TAB a bunch, and everything seemed to work the same. git-completion.bash also works (although it doesn't really use unset).
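For readers who haven't seen it, here is a simplified sketch of the upvar idea, assuming bash is available. It is not the code from bash-completion or the fvue.nl page; the names upvar, get_answer, main, and demo_upvar are illustrative only.

```shell
# Hypothetical wrapper: runs the demo under bash regardless of the calling shell.
demo_upvar() {
  bash <<'EOF'
upvar() {
  # Called one frame below get_answer, so this unset removes get_answer's
  # local cell and exposes the variable one level further up the stack.
  unset -v "$1" && eval "$1=\$2"
}
get_answer() {
  local "$1"            # shadow the name, so the unset has a cell to remove
  upvar "$1" 42         # the assignment then lands in the caller's variable
}
main() {
  local result=not-set-yet
  get_answer result
  echo "result: $result"
}
main
EOF
}
demo_upvar
```

In bash this prints result: 42. The assignment only reaches main's variable because the unset inside upvar removes the shadowing cell instead of just marking it unset.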

@akinomyoga
Collaborator

akinomyoga commented Apr 17, 2020

Implements the rule behind some kind of option, like shopt -s compat_unset.

I think shopt -s bash_unset is one of the reasonable choices for Oil implementation. Anyway, there should be at least a (special) mode in which unset behaves like that in Bash.

On the face of it, it seems weird that there's a difference. Why wouldn't you want unset mylocal to reveal variables in higher frames, but you do want unset callerlocal to reveal it? That is inconsistent to me.

Yes. This is inconsistent. I guess originally Bash unset was "always cell-unset", but the behavior was changed for the current-scope local variables to match the behavior of global variables in POSIX. This is a post-hoc explanation, but one can remember the behavior in this way: If the variable is found by local lookup, unset sets Undef. If the variable is found by dynamic variable lookup, unset performs dynamic unset (cell-unset). Users who don't know about the dynamic scoping of variables would expect unset to make the variable Undef locally. Users who know about dynamic scoping would expect the dynamic behavior of unset.

i.e. are bash scripts using documented behavior, or are they just taking advantage of implementation quirks in bash? I would be interested in if the rule is documented anywhere.

It's documented. But the interaction with tempenv is not documented.

Bash Reference Manual - 3.3 Shell Functions

The unset builtin also acts using the same dynamic scope: if a variable is local to the current scope, unset will unset it; otherwise the unset will refer to the variable found in any calling scope as described above. If a variable at the current local scope is unset, it will remain so until it is reset in that scope or until the function returns. Once the function returns, any instance of the variable at a previous scope will become visible. If the unset acts on a variable at a previous scope, any instance of a variable with that name that had been shadowed will become visible.
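A minimal sketch of the first half of that paragraph (unsetting a variable that is local to the current scope). The function names are made up for illustration.

```shell
v=global
f_local_unset() {
  local v=local
  unset v                                  # v is local to this function
  echo "inside f: ${v-(unset)}"            # stays unset until the function returns
}
demo_scope() {
  f_local_unset
  echo "after f: ${v-(unset)}"             # the global is visible again
}
demo_scope
```

Here the global is hidden, not revealed, for the rest of the function, and it reappears only after the function returns.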

I guess I should try bash-completion. bash-completion works with the old value-unset rule. I use it under Oil every time I make an Oil release. Maybe it doesn't work with the cell-unset rule?

It's the opposite. Some parts of bash-completion use cell-unset. It doesn't use value-unset as far as I know. Even if it doesn't explicitly fail, some types of completion candidates will be missing if unset always performs value-unset.

Were you saying that this relies on the mixed "value-unset cell-unset" rule?

No, Freddy Vulto's trick only uses cell-unset. I didn't mean that a single program uses both, but that there are programs that use cell-unset and also programs that use value-unset.

  • Actually I don't know specific programs that use value-unset, so maybe "always cell-unset" is reasonable in this sense.
  • But light users who don't know the concept of the dynamic scope would be likely to use unset as value-unset. If unset is "always cell-unset", such users will be confused because the variables in previous frames can be unexpectedly rewritten. For light users who never use the dynamic scope of variables, "always value-unset" is reasonable.
  • "mixed unset" is somewhat in between the above two approaches. The users without knowledge of the dynamic scope will only use unset for variables defined in the same function. So, unset for the variables in the current function is value-unset. But we do need cell-unset in programs that use the dynamic scope of variables. So, unset for the variables found by dynamic lookup is cell-unset. The users without knowledge of the dynamic scope will never use unset for variables not defined in the current function, so there will be no problem.
  • But I think if we design a new interface for Oil, we should prepare two different builtins for both types of unset [e.g. unset for value-unset and unlocal for cell-unset], or two different options [e.g. unset -u for value-unset (local-scope unset), and unset -d for cell-unset (dynamic unset)].
  • If the behavior is documented, we could implement the mixed rule

It's documented as mentioned above.

  • if the behavior isn't documented but matches some other shell, we could also implement the mixed rule.

I believe no other shell behaves like Bash. Actually I think every shell behaves in its own way here. But probably we need more investigation on the behavior of other shells.

Unless you think that there is some patch to apply to ble.sh that can use one of the simpler value-unset or cell-unset rules.

I think ble.sh works with "always cell-unset". It will not work with "always value-unset". But I don't know whether "always cell-unset" is friendly for light users. Actually, shopt -s localvar_unset, introduced in Bash 5.0, forces unset to "always value-unset", which implies there is some demand for value-unset.
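A minimal sketch of the localvar_unset difference, assuming bash 5.0 or later for the second half (the option does not exist in 4.4). The function names inner and outer are illustrative only.

```bash
x=global

inner() {
  unset x                 # x belongs to outer's frame, not to inner's
  echo "${x-<unset>}"
}
outer() {
  local x=outer-local
  inner
}

outer                     # default bash: global (cell-unset of outer's local)

if (( BASH_VERSINFO[0] >= 5 )); then
  shopt -s localvar_unset
  outer                   # with localvar_unset: <unset> (value-unset)
  shopt -u localvar_unset
fi
```

With the option set, the local in outer is marked unset instead of being removed, so the global stays shadowed until outer returns.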

@andychu
Contributor Author

andychu commented Apr 17, 2020

OK thank you for the thorough analysis! That helps a lot. I changed Oil to cell-unset recently, and I will release that. So if ble.sh will run with that, that's great.

I see what you're saying about it being potentially confusing.

And "local cell unset" is a rarer behavior -- only yash and mksh match, and they are relatively unpopular. dash/bash/zsh all use "local value unset". (I just tested it)

But I'm inclined to leave it now until someone complains. We can add shopt -s bash_unset later if they do.

I don't think naive users use unset very much. In fact I don't think I have ever used it in a program (just like I don't use del in Python). I might only use it in an interactive session.

If someone unsets a variable, presumably they don't want to use it at all anymore in that function. They could be surprised if it reveals a global, but if they never use it again, that won't happen. So let's see if anyone notices basically :) It does match mksh and yash so it's not completely bizarre.

@andychu
Contributor Author

andychu commented Apr 17, 2020

Also if ble.sh works with this, I wouldn't bother adding the additional spec tests. We can spend the effort on some of the other things in #653, e.g. maybe the mapfile builtin since you starred it, or spec tests for other differences, etc.

@andychu
Contributor Author

andychu commented Apr 17, 2020

And honestly I don't see why you would want to unset a local. Unsetting a global makes sense because maybe you don't want to pollute your namespace.

But the local will disappear anyway once the function returns.

And if unset mylocal does not reveal anything in higher frames, what use is it? If you don't want to use a local again, then you can simply not use it, rather than unset it.

I suspect this is the reason why shells can differ so much... because that is not something programs use a lot. But I can see advanced use cases for cell-unset of nonlocals, so we can stick with that.

(Something like this could go in the Oil manual, which is why I am thinking about it)


Also I think mylocal='' can always be used rather than unset mylocal. Unless you want to reveal variables higher in the stack, which Oil now provides :)
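One caveat when substituting mylocal='' for unset mylocal: the two states are still distinguishable through parameter expansion. A small sketch (the function name f is illustrative; there is no global named mylocal here, so the result does not depend on which unset rule the shell uses):

```bash
f() {
  local mylocal=value

  mylocal=''
  echo "empty: [${mylocal+set}] [${mylocal-fallback}] [${mylocal:-fallback}]"

  unset mylocal
  echo "unset: [${mylocal+set}] [${mylocal-fallback}] [${mylocal:-fallback}]"
}

f
# empty: [set] [] [fallback]
# unset: [] [fallback] [fallback]
```

So the replacement is safe as long as the code only tests for emptiness (for example with the `:-` form or `[ -z ... ]`) and does not run under set -u expecting an error for the cleared variable.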

@andychu
Contributor Author

andychu commented Apr 17, 2020

Actually back in #329 I mentioned that I wanted Oil to have shopt --unset dynamic_scope. I'm not sure about that yet, but if so that would be a more general solution than shopt -s bash_unset.

Basically the idea is that I want dynamic scope to be obvious

  • For writing, you would do something like setref x = 1, which is like x=1 where x has the -n nameref flag.
  • In terms of reading, if you turn off dynamic scope, then you could have an explicit syntax to look up the stack like ${^var} or something.
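For comparison, bash 4.3 and later can already make the write side explicit with a nameref. This is only an analogy for the setref idea above, not Oil syntax; set_result and caller are hypothetical names.

```bash
set_result() {
  # ref is a nameref: assigning to it assigns to the variable named by $1.
  # Caveat: this breaks if the caller's variable is itself called "ref".
  local -n ref=$1
  ref=1
}

caller() {
  local x=0
  set_result x
  echo "$x"
}

caller   # 1
```

The lookup of x still goes through dynamic scope, but the intent to write into a caller's variable is now visible at the point of assignment.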

Basically I think the dynamic scope rule is confusing in general, e.g. to most people coming from Python and JS. It's not just unset that is confusing! I will think about this a bit more, but I don't think we need a special case for unset.


BTW I think Perl has something like this... dynamic scope is an option, not the default.

@andychu andychu changed the title from "unset behavior differs from bash" to "unset should unshadow variables higher on the stack (at least for nonlocals)" Apr 17, 2020
@akinomyoga
Collaborator

Thank you!

If someone unsets a variable, presumably they don't want to use it at all anymore in that function. They could be surprised if it reveals a global, but if they never use it again, that won't happen. So let's see if anyone notices basically :) It does match mksh and yash so it's not completely bizarre.

OK! That makes sense.

And honestly I don't see why you would want to unset a local. Unsetting a global makes sense because maybe you don't want to pollute your namespace.

There are several use cases for cell-unset in ble.sh. In any case, the purpose is not just to remove the current local variables but to access variables defined in previous frames. So it is helpful to have cell-unset.

  • One is the situation of upvar. When one wants to return a value through a shell variable, as in fn() { eval "$1=something"; }, one needs to remove a local variable if that local variable has the same name as "$1".
  • Another case is the situation that one wants to read the value of global variables which might be shadowed by local variables. In this case, one can read global variables without affecting local variables by performing repeated unset in a subshell.
  • Also, ble.sh does something like fn1() { local var; fn2; unlocal var; fn2; } where fn2 reads/writes the variable var. We want to make the effect of only the second fn2 call remain in previous frames.
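The first and third use cases can be sketched as follows. This is a simplified illustration, not ble.sh's or bash-completion's actual code: upvar, unlocal, get_answer, bump and fn1 are hypothetical names, and the outputs assume a shell where unset on a caller's variable is a cell-unset (bash with default options, or osh after this change).

```bash
# upvar NAME VALUE: the unset runs one frame below the function that owns
# the local, so the local's cell is removed and the assignment lands on
# the variable of the same name further up the stack.
upvar() {
  unset -v "$1" && eval "$1=\$2"
}

get_answer() {
  local ret=42      # happens to clash with the caller's output variable
  local value=$ret
  # Declaring "$1" local first guarantees that upvar removes a local of
  # this function rather than the caller's own variable.
  local "$1" && upvar "$1" "$value"
}

main() {
  local ret=
  get_answer ret
  echo "$ret"
}
main                # 42

# unlocal NAME...: drop a local from inside the function that owns it.
unlocal() { unset -v "$@"; }

counter=0
bump() { counter=$((counter + 1)); }

fn1() {
  local counter=100
  bump              # updates fn1's local (100 -> 101)
  unlocal counter   # removes the local, so the global is visible again
  bump              # updates the global (0 -> 1)
}
fn1
echo "$counter"     # 1
```

Under an "always value-unset" rule neither helper works: the unset would only mark the local as unset, and the assignment would still hit the local.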

And if unset mylocal does not reveal anything in higher frames, what use is it? If you don't want to use a local again, then you can simply not use it, rather than unset it.

Yeah, that is a good point. I think that's the reason why I don't know of any real shell scripts that rely on value-unset behavior, although I sometimes see this topic on the bug-bash mailing list. Maybe only learners try value-unset, in order to understand how unset works.

Summary of behavior of different shells

I added test cases in PR #718. Here is the summary of behavior of different shells.

                        bash    osh    yash   mksh   zsh    ash    dash
unset localvar          switch  cell   cell   cell   value  value  value
unset tempenv           cell    cell   cell   value  value  value  value
tempenv-in-localctx     yes     no     no     no     no     yes    ?
localvar-tempenv-share  yes     yes    no     no     no     no     no
localvar-init           unset   unset  unset  unset  empty  unset  inherit
localvar-nest           yes     yes    no     no     yes    no     no
nested-unset            bash    cell   yash   mksh   value  value  value

  • unset localvar, tempenv
    • switch = value-unset for the local scope, cell-unset for the dynamic scope
    • cell = always cell-unset
    • value = always value-unset
  • tempenv-in-localctx: Is tempenv in v=tempenv fn created in the local context of fn?
  • localvar-tempenv-share: Are tempenv and localvar in v=tempenv eval 'local v=localvar' created in the same variable context?
  • localvar-init: How is the variable initialized by local v?
    • unset = the variable is put in the unset state
    • empty = the variable is set to the empty string
    • inherit = the variable inherits the value of the variable with the same name in the previous context
  • localvar-nest: Are local1 and local2 in local v=local1; v=tempenv2 eval 'local v=local2' created in different nested contexts?
  • nested-unset:
    • bash = value-unset for localvar, one-level cell-unset for tempenv
    • yash = cell-unset all tempenvs and cell-unset one-level localvar in the function
    • mksh = cell-unset all tempenvs and localvars in the function
    • cell = one-level cell-unset
    • value = value-unset
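Two of the rows can be probed with a few lines of shell. This is a rough sketch rather than the spec tests from PR #718; the probe function names are made up, and the per-shell results in the comments are simply read off the table above.

```bash
v=global

# localvar-init: what does a bare `local v` leave in v?
probe_local_init() {
  local v
  echo "${v-<unset>}"
}

# unset tempenv: what is visible after unsetting a temporary binding?
probe_unset_tempenv() {
  unset v
  echo "${v-<unset>}"
}

# Table row localvar-init: "unset" prints <unset>, "empty" prints an
# empty string, "inherit" prints global.
echo "localvar-init: $(probe_local_init)"

# Table row unset tempenv: "cell" should reveal global, "value" should
# print <unset>.
echo "unset tempenv: $(v=tempenv probe_unset_tempenv)"
```

Running the same file under each shell and comparing the two output lines is enough to reproduce those two rows; the nested cases need the eval-based setups described in the legend.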

@andychu
Contributor Author

andychu commented Apr 18, 2020

Yes, it seems clear that this is the area where shells disagree the most! That's what I remember from the "temp binding" issue in summer 2019. It looks like it also extends to local and unset.

On the one hand, it's pretty weird because these are fundamental parts of the language. On the other hand I can see that most shell programs don't actually use this flexibility. I would say once something is using "upvars" and unshadowing, it is starting to be less like a "shell script" and more like using shell as a programming language.

I hope in Oil we can come up with some nicer idioms. I think the core data structure of a "stack of cells" with 3 flags (export/readonly/nameref) will hold up, but we can find cleaner ways to manipulate it.

But either way I'm glad that Oil has a pretty simple and documentable behavior. I think we're starting to run some of the biggest shell scripts in the world, so I'm optimistic it will hold up.


The sentence you quoted in the bash manual is an example of what I dislike about bash... It tells you literally what bash does, almost repeating the interpreter in prose form.

But it doesn't tell you WHY it did that. And a lot of times there is no "why". It's just a mistake that was later documented.

In Oil, the "why" for unset is basically because it runs programs like ble.sh :) And it's a simple and consistent rule.

But I also think that most idiomatic Oil programs will not need to use unset at all. I would go as far as to say there could be a mode which only allows unset a[i] and unset a["key"].
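For reference, the element forms of unset mentioned here look like this in bash 4 or later (associative arrays need bash 4). The array names are illustrative.

```bash
declare -a a=(zero one two)
declare -A m=([key]=value [other]=kept)

# Quote the argument so the brackets are not treated as a glob pattern.
unset 'a[1]'
unset 'm[key]'

echo "${#a[@]} ${a[0]} ${a[2]}"   # 2 zero two
echo "${!a[*]}"                   # 0 2  (indices stay sparse)
echo "${!m[*]}"                   # other
```

These forms remove a single element and never touch scoping, which is why they would remain unambiguous even in a mode that disallows unsetting whole variables.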


Thanks for the help with this!

@andychu andychu closed this as completed Apr 19, 2020