Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autoconf $(( $* )) requires shopt -s eval_unsafe_arith #1450

Closed
zackw opened this issue Jan 13, 2023 · 17 comments
Closed

autoconf $(( $* )) requires shopt -s eval_unsafe_arith #1450

zackw opened this issue Jan 13, 2023 · 17 comments

Comments

@zackw
Copy link

zackw commented Jan 13, 2023

Here's another fun word-splitting problem exposed by Autoconf's test suite.

as_fn_arith ()                     
{
    as_val=$(( $* ))
}
as_fn_arith 1 + 1
echo $as_val

prints "2" when executed by ksh, dash, or bash, but when executed by osh it prints

 as_val=$(( $* ))
            ^~
test.sh:3: fatal: Invalid integer constant '1 + 1'

Since this is a use of $*, not within double quotes, I think IFS should be irrelevant.

@andychu
Copy link
Contributor

andychu commented Jan 14, 2023

Oof, I reproduced this. So this one can actually be fixed by setting shopt -s eval_unsafe_arith first (an OSH-specific option)

The issue is that this kind of dynamic arithmetic can be unsafe in certain contexts with shells like ksh / bash that have array references.

i.e. the 1 + 1 can also be a[$(echo 42 > PWNED)]

https://github.com/oilshell/blog-code/tree/master/crazy-old-bug

I reported this to OpenBSD ksh awhile ago, and they disallowed it as well. But OSH is stricter than what they did -- we disallow all implicit parsing of data as code, not just array references

  • I wonder how many autoconf scripts rely on it? Is it mainly a test suite thing, or is it semantics autoconf relies on commonly ?

I guess we should test more besides CPython's. CPython ./configure apparently does not rely on it. (Although there is a slight possibility we could be running it incorrectly because of an issue like this.)


I also reproduced the IFS bug, so let's look at that and then circle back

@andychu
Copy link
Contributor

andychu commented Jan 14, 2023

I guess the minimum we should do is actually put shopt -s eval_unsafe_arith in the error message. We've done that for a few other strict options

But yeah I would say autoconf compatibility without extra flags is pretty clearly essential

@zackw
Copy link
Author

zackw commented Jan 15, 2023

Well, what autoconf (technically, the M4sh layer) is trying to do with this construct, is use $(( ... )) for arithmetic if possible, falling back to expr if not. So, early in a script, it emits

if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null
then :
  eval 'as_fn_arith ()
  {
    as_val=$(( $* ))
  }'
else case e in #(
  e) as_fn_arith ()
  {
    as_val=`expr "$@" || test $? -eq 1`
  } ;;
esac
fi # as_fn_arith

with the expectation that later it will be able to do things like

# Create a file whose timestamp is in the future.
# (next year)-01-01 00:01 UTC should always be in the future,
# even on slow machines.
echo BAD >file
this_year=`TZ=UTC0 date +%Y`
as_fn_arith $this_year + 1 && next_year=$as_val
TZ=UTC0 touch -t ${next_year}01010001 file

This is much more commonly used in Autoconf's test harness (which is an even gnarlier generated shell script than anything autoconf proper is likely to produce, see "Generating Test Suites with Autotest") than in autoconf-generated configure scripts, but it does happen sometimes -- any script that uses AC_CHECK_SIZEOF potentially uses it (you may have to be cross-compiling to make it happen), and the M4 macro AS_VAR_ARITH, which directly invokes the above shell logic, is a documented utility for use by people writing checks.

It would not be a completely crazy notion to add OSH-specific smarts to M4sh. For instance, if something not entirely unlike

as_fn_arith ()
{
    setvar as_val = $*
}

would work in OSH (without turning on all the rest of Oil ;-) then we could detect OSH and use that. The big downside of going this route would of course be that some existing autoconf scripts, and all existing autotest scripts, would not work under OSH until regenerated, but perhaps that would be acceptable, given that nobody outside of the Oil project is really expecting OSH to be usable for autoconf scripts at the moment.

@zackw
Copy link
Author

zackw commented Jan 15, 2023

Also, Autoconf proper never uses arrays (since not all shells have them) but I can't guarantee the same for all existing configure scripts and third-party .m4 files, many of which have never been tested on the more unusual systems that Autoconf proper tries to support.

@zackw
Copy link
Author

zackw commented Jan 15, 2023

(In case you're wondering, the weird else case e in #( e) thing is to prevent a shell syntax error if the body of the else clause is empty, which the script generator cannot reliably detect.)

@andychu
Copy link
Contributor

andychu commented Jan 15, 2023

Hm actually this makes me wonder if a reason OSH is running Python's configure more slowly now is if it fails some kind of test like that, and then it starts spawning a lot of expr processes. (I've been meaning to add a simple external binary / builtin histogram profiling feature, and that may help answer that question)

Right, I think practically speaking, since autoconf is released infrequently, it makes sense to do things on the OSH side. As far as I know autoconf scripts should be "easy" to run because they already run in dash / bash / BSD ksh and more. Anything that runs in all those shells should run in OSH

I don't see any reason we shouldn't be able to run all the old bash and Python configure scripts going back a decade or more ...

The scripts that only run in bash are the hard ones, and sometimes those need to be modified.


I want to find a minimal distro to build from source with only OSH on the disk ... So presumably this bug might occur in that context, or maybe it just slows everything down ...

It would be annoying to change how our arithmetic is parsed but it's definitely possible. We can parse dynamically and then look for the a[$(echo 42 > PWNED)] stuff in the AST

@zackw
Copy link
Author

zackw commented Jan 15, 2023

I'm not sure I understand the security problem here, but would it be sufficient to say that if you encounter $(( $* )) or $(( "$@" )) then the expansion of the variable is word-split according to the usual behavior of $* or "$@" and then parsed as an arithmetic expression, but no further expansion occurs? So you cannot smuggle in $anything that way.

@andychu andychu changed the title OSH: Incompatible behavior of $(( $* )) inside a shell function autoconf $(( $* )) requires shopt -s eval_unsafe_arith Jan 16, 2023
@andychu
Copy link
Contributor

andychu commented Jan 16, 2023

That rule would still have problems for shells with ksh arrays, which includes bash and OSH. (e.g. see the links on https://github.com/oilshell/blog-code/tree/master/crazy-old-bug )

We can obviously do it because we have shopt -s eval_unsafe_arith, but I don't want that to be the default for security reasons.

But I also want to run autoconf with default settings.


So I guess I would want to fix any remaining autoconf issues, and then circle back to this one.

I found another one last night while running bash ./configure with OSH. It completes but it gives different output.

#1455

Unfortunately it's a bit hard for me to debug that ...

  1. I assume the LHS variable is empty
  2. then OSH complains about the test = no. Other shells complain about this too -- it's not valid shell.

But I haven't looked at why the variable is empty in OSH and not bash.

The configure completes, but I think it probably gives different results, which is not good.

Not sure if we need to set up a better debugging / test harness ... I would like to make fixing these "mechanical" and knock a buch of bugs off.

I guess fixing things through the autoconf test suite could be a better option than trying to fix generated autoconf scripts. I would definitely be interested in more bug reports !

@andychu
Copy link
Contributor

andychu commented Jan 16, 2023

An idea I thought of that might be easy to implement

  • dynamic parsing of arithmetic can be on by default in OSH (not in Oil)
  • to mitigate the security issue, turn off array references in arithmetic

Array references are what cause the security issue, and dash / POSIX shell doesn't have arrays, so autoconf should work fine.

This might slow things down a little, but also might be a pretty effective and simple solution

@zackw
Copy link
Author

zackw commented Feb 1, 2023

Unfortunately, because this issue breaks the autoconf test suite's driver, I'm unable to isolate any more autoconf-related issues until it's resolved.

I don't understand why the rule I suggested above -- "if you encounter $(( $* )) or $(( "$@" )) then the inner variable is expanded and word-split according to the usual behavior of $* or "$@" and then parsed as an arithmetic expression, but no further expansion occurs" -- would "still have problems with shells with ksh arrays". The array reference would not be expanded in this case, due to the rule that no further expansion occurs. This would be sufficient for auto*, which never use arrays anyway.

@andychu
Copy link
Contributor

andychu commented Feb 1, 2023

Ah OK, I can probably prioritize a fix ... I'm working sort of in this area now.

The way our parsing and evaluation works, these are all the same:

echo $(( $* ))
echo $(( "$@" ))
echo $(( x ))
echo $(( $x ))

So let's put some code inside data. There is no problem here, bash and other shells give an error:

$ x='$(echo 42 > PWNED)'

$ echo $(( x ))
-bash: $(echo 42 > PWNED): syntax error: operand expected (error token is "$(echo 42 > PWNED)")

Now let's change it to an ksh-array reference:

$ x='a[$(echo 42 > PWNED)]=1'

$ echo $(( x ))
1

$ cat PWNED 
42

That's just the way all ksh-derived shells work, which is very surprising.

I think your rule is pretty close to what we'll end up doing, but it's not the way I think of it. OSH doesn't really have a notion of "expansion" -- it has parsing and evaluation, more like what's described in books about Lisp.

(These are organized in the files osh/*_parse.py and osh/*_eval.py -- interleaved parsers and evaluators of sublanguages)

The parsing is static (done without reference to variables at runtime), unlike other shells which mix the two stages. The surprising thing is that this slightly different architecture can be highly compatible, but this is one area where it causes differing behavior.

Not sure if I mentioned this already: How to Parse Shell Like a Programming Language

POSIX shell arithmetic is inherently dynamic -- it comes AFTER variable expansion. We are able to emulate that with "dynamic parsing" and evaluation


Evaluation includes both:

  • 1+2 => 3
  • $(echo 42 > hi) => execute the command

So basically for autoconf to work, we need the evaluation of 1+2 to work (on dynamically parsed cdoe)

But to avoid the security issue, we do NOT want the evaluation of $(echo 42 > hi) to work (on dynamically parsed code).

I think this should be possible -- I'll update this bug

Thanks!

@andychu
Copy link
Contributor

andychu commented Feb 1, 2023

Also to get a feel for how it parses and evaluates, you can do osh -n on any file, and it will print the detailed "lossless syntax tree" (which I'm refactoring now)

$ osh -n -c 'a[$(echo 42 > PWNED)]=1'
(command.ShAssignment
  pairs: [
    (assign_pair
      lhs: 
        (sh_lhs_expr.IndexedName
          name: a
          index: 
            {
              (command_sub
                left_token: <Id.Left_DollarParen '$('>
                child: 
                  (command.Simple
                    words: [{<echo>} {<42>}]
                    redirects: [(redir op:<Id.Redir_Great '>'> loc:(redir_loc.Fd fd:1) arg:{<PWNED>})]
                    do_fork: T
                  )
              )
            }
        )
      op: assign_op.Equal
      rhs: {<1>}
      spids: [0]
    )
  ]
)

@zackw
Copy link
Author

zackw commented Feb 1, 2023

OK, I think I understand what the problem is now.

I phrased things in terms of conventional shell "expansion" operations, but what I was trying to communicate is the limited subset of variable-inside-arithmetic-parentheses constructs that Autoconf needs to work. A better way to put it is, perhaps, that Autoconf only wants to use $(( $* )), where $* was set by invocation of a shell function. Furthermore, if Autoconf is stuck with a shell that doesn't implement $(( ... )), it tries to use expr instead, which means that the value of $* will never contain any variables that need evaluating.

For example:

as_fn_arith () {
  retval=$(( $* ))
}
x=1
as_fn_arith $x + 1    # should set retval to 2
as_fn_arith x + 1     # does not need to work, because `expr x + 1` would not work

Does that help?

@andychu
Copy link
Contributor

andychu commented Feb 1, 2023

Yes that makes sense for sure. We want to allow 1 + 1 and x + 1 so forth, but not allow a[$(echo 42 > PWNED)]

OSH is technically being too strict -- there is no real security problem with parsing and evaluating 1+1, although IMO it's very confusing if you think of it as a programming language

(i.e. most programming languages don't have implicit "expansion" of data into code, without eval )


I did notice the as_fn_arith snippet in CPython's configure, which we CAN apparently run, but it runs more slowly

So I conjectured that's part of the reason it's running more slowly!

There seem to be a bunch of other differences I haven't debugged yet either ... it's very hard to debug the entire generated script, so I think it will help to get the test suite running and giving a report, since presumably it will give more localized failures!

Thanks for the help

andychu pushed a commit that referenced this issue Feb 21, 2023
This is to issue #1450, related to autoconf.

Instead of disallowing dynamic parsing in $(( x )) unconditionally, we
parse, and then disallow command subs, e.g.

    a[$(echo 42 | tee PWNED)]=1

in DATA as opposed to code.

I split shopt allow_csub_psub into _allow_command_sub _allow_process sub
to implement this.

Makes a bunch of spec tests pass.
andychu pushed a commit that referenced this issue Feb 22, 2023
This is related to the fix for #1450.  It relaxes eval_unsafe_arith in
the remaining places.

One more test in spec/nix-idioms now fails.  It passed for the wrong
reason -- it because 'foo[@]' wasn't a valid var ref, not because we
implemented the weird and arguably buggy "empty array + var ref" rule
that bash has.

Also:

- Remove eval_unsafe_arith from almost all spec tests.
- Update docs.
andychu pushed a commit that referenced this issue Feb 22, 2023
With a new option

    shopt -u parse_sh_arith

It was already disallowed in most places with

    shopt -u parse_dparen
    shopt -u parse_sh_assign

The remaining places are the static:

    echo $(( a[i] ))
    echo ${a[i]}

And the dynamic:

    unset a[i]
    printf -v a[i]
    ${!ref}
    declare -n not fully implemented

Follow-up to #1450.
andychu pushed a commit that referenced this issue Feb 22, 2023
With a new option

    shopt -u parse_sh_arith

It was already disallowed in most places with

    shopt -u parse_dparen
    shopt -u parse_sh_assign

The remaining places are the static:

    echo $(( a[i] ))
    echo ${a[i]}

And the dynamic:

    unset a[i]
    printf -v a[i]
    ${!ref}
    declare -n not fully implemented

Follow-up to #1450.

With test/lint fix.
@andychu
Copy link
Contributor

andychu commented Mar 8, 2023

Hi @zackw , sorry for the delay, I fixed this a couple weeks ago, and just released it. I would appreciate testing:

https://www.oilshell.org/release/0.14.2/

osh-0.14.2$ set -- 1 + 2; echo $(( $* ))
3

It disallows the unsafe constructs by default:

osh-0.14.2$ a=( 1 2 3 )

osh-0.14.2$ set -- 1 + 'a[$(echo 1)]'; echo $(( $* ))
  1 + a[$(echo 1)]
        ^~
[ var ? at line 19 of [ interactive ] ]:1: fatal: Command subs not allowed here because eval_unsafe_arith is off

But allows you to opt in:

osh-0.14.2$ shopt -s eval_unsafe_arith
osh-0.14.2$ set -- 1 + 'a[$(echo 1)]'; echo $(( $* ))
3

It should work in both the slow Python tarball, and the fast oils-for-unix C++ tarball (that we will eventually make the default.)


BTW I'm writing a release announcement right now and this will be mentioned. I'll probably call it Oil 0.14.2 - Interactive Shell; Conceding to Reality

Running autoconf without any shopt is basically conceding to reality :)

The Oil language is and will always be statically parsed, but OSH arithmetic is now compatible and dynamically parsed

(Technically it's a little different since it doesn't involve "expansion" / string rewriting, but rather dynamic parsing and evaluation, so that's why we need testing !!!)


I'm going to make a call for more testing like this -- it will be bad if we can't run autoconf because as I understand it runs under like 10 shells. But I think we can!

@andychu
Copy link
Contributor

andychu commented Mar 8, 2023

BTW this was not very hard to fix, took about a day or so, including updating some docs ...

Also I implemented shopt --unset parse_sh_arith to disallow shell arithmetic in the new Oil language

It's just that there dozens of other things to do on the project :) But I consider autoconf very important

@andychu
Copy link
Contributor

andychu commented Mar 18, 2023

@andychu andychu closed this as completed Mar 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants