Warn that evaluate() should not be used on user input #442

corgeman · 2023-07-07T21:32:16Z

The evaluate() function eventually calls eval() on the data provided. eval() is extremely dangerous when supplied with user input, and to my knowledge the documentation doesn't mention that the function does this. I would add a warning in the documentation about this. As a proof-of-concept, the following code should execute the command 'echo verybad' on your computer when run:

import numexpr
s = """
(lambda fc=(
    lambda n: [
        c for c in 
            ().__class__.__bases__[0].__subclasses__() 
            if c.__name__ == n
        ][0]
    ):
    fc("function")(
        fc("Popen")("echo verybad",shell=True),{}
    )()
)()
"""
numexpr.evaluate(s)
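The proof of concept above relies on the fact that attribute access alone is enough to walk from any literal to every class loaded in the interpreter. A minimal sketch of just that first step, using plain eval() rather than numexpr and without spawning any process; it assumes nothing beyond the standard library, and it suggests that emptying `__builtins__` does not prevent the walk:

```python
# Walk from an empty tuple to `object`, then list every loaded class.
# No names are looked up, so an empty __builtins__ mapping does not help.
expr = "().__class__.__bases__[0].__subclasses__()"
classes = eval(expr, {"__builtins__": {}}, {})

class_names = {cls.__name__ for cls in classes}
print(len(classes), "classes reachable, for example:", sorted(class_names)[:5])
```

From that list an attacker only has to pick a class by name, which is what the `fc` helper in the proof of concept does.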

robbmcleod · 2023-07-08T16:56:08Z

Yeah developers seem to be using NumExpr more and more as a parser for handling input from a UI. It would be much safer if we could use ast.parse instead. I think adding a warning to the docs is fine, but it would also make sense to me to add an optional sanitizer. We could search the string for various Python keywords that have no business being in an expression, such as:

import
lambda
eval
__ (dunder)
locals
globals

Sanitization could be enabled by default but give the user the means to turn it off.
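A rough sketch of what such a keyword scan could look like, assuming a plain substring search over the tokens listed above. The names `FORBIDDEN_TOKENS` and `find_forbidden` are hypothetical and are not part of NumExpr:

```python
import re

# Tokens from the list above; "__" stands for any dunder.
FORBIDDEN_TOKENS = ("import", "lambda", "eval", "__", "locals", "globals")
_forbidden = re.compile("|".join(re.escape(tok) for tok in FORBIDDEN_TOKENS))


def find_forbidden(expr):
    """Return the sorted, de-duplicated forbidden tokens found in expr."""
    return sorted(set(_forbidden.findall(expr)))


print(find_forbidden("a + b * 2"))
print(find_forbidden("__import__('os')"))
print(find_forbidden("lambda_max * 2"))  # a harmless variable name is flagged too
```

The last call hints at the main weakness of a substring blacklist: it cannot tell a keyword from a variable name that merely contains one.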

robbmcleod · 2023-07-22T19:44:47Z

Alright I implemented a check in 4b2d89c that forbids certain operators used in the expression. Namely: ;, :, [, and __. As far as I can tell this defeats all of the commonly cited attack vectors against eval(). I.e. it bans multiple expressions, lambdas, indexing, and all the dunders. If we want to be very safe, we would want to ban . as a character. However, this is difficult because . is both the reference operator and the decimal operator.

If we want to ban . we could do so if we require it to be followed by a numeral. E.g. 0.1 would pass the check, os.remove would not pass the check. However, this would ban valid Python code such as a * 5. which is a shorthand for casting to double. Perhaps there is someone out there with better regex-fu than myself who can suggest a regex that bans the Python . operator but doesn't cause issues for decimal points?
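The rule described above ("a dot must be followed by a numeral") can be sketched with a negative lookahead. This is only an illustration of the idea and its trade-off, not the check that was committed; `has_attribute_dot` is a hypothetical name:

```python
import re

# A dot that is NOT immediately followed by a digit is treated as attribute access.
_dot_not_decimal = re.compile(r"\.(?!\d)")


def has_attribute_dot(expr):
    return _dot_not_decimal.search(expr) is not None


print(has_attribute_dot("0.1 + x"))    # decimal point, passes
print(has_attribute_dot("os.remove"))  # attribute access, rejected
print(has_attribute_dot("a * 5."))     # valid float shorthand, but rejected as well
```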

Regardless I think this is a good step forward and we're at the point where we should do a new release.

corgeman · 2023-07-26T06:18:20Z

Looks good to me!

robbmcleod · 2023-08-06T19:31:32Z

Ok, made some more improvements. I can ban attribute access to everything but .real and .imag, via

_forbidden_re = re.compile('[\;[\:]|__|\.[abcdefghjklmnopqstuvwxyzA-Z_]')

I also strip the string of all whitespace beforehand.
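For illustration, here is how that pattern behaves when applied after whitespace stripping. This is a sketch, not the released code: the pattern is rewritten as a raw string with the same character set, and `is_rejected` is a hypothetical helper name.

```python
import re

# Same characters as the pattern above: ';', '[' or ':', any "__", or a dot
# followed by a letter other than 'i' or 'r' (the first letters of imag/real).
_forbidden_re = re.compile(r"[;\[:]|__|\.[abcdefghjklmnopqstuvwxyzA-Z_]")


def is_rejected(expr):
    no_whitespace = re.sub(r"\s+", "", expr)
    return _forbidden_re.search(no_whitespace) is not None


for expr in ("a.real + b.imag", "0.5 * a", "x . __class__", "a[0]", "a__b / 2", "a.ravel()"):
    print(f"{expr!r}: rejected={is_rejected(expr)}")
```

Two things stand out in this sketch: any name containing a double underscore such as `a__b` is rejected, and because only the first character after the dot is inspected, an attribute beginning with `i` or `r` (for example `.ravel`) is not caught by this particular pattern.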

robbmcleod · 2023-08-06T21:13:33Z

Closing with release of 2.8.5.

jan-kubena · 2023-08-07T07:44:51Z

Hi @robbmcleod, just wanted to point out that you can still access other attributes because Python translates some utf chars into ascii chars automatically (mainly greek alphabet used in math), thus:

numexpr.evaluate("(3+1).ᵇit_count()")

Results in:

array(1, dtype=int32)

It might be better to whitelist real and imag rather than blacklist chars.
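The behaviour described here is identifier normalization: Python converts identifiers to Unicode normal form NFKC while parsing (PEP 3131, and the "Identifiers and keywords" section of the language reference). A small standard-library sketch of why an ASCII-only character class misses the example above; the variable names are illustrative:

```python
import ast
import re
import unicodedata

tricky = "(3+1).\u1d47it_count()"  # U+1D47 is MODIFIER LETTER SMALL B

# An ASCII-only check does not see attribute access here, because the
# character after the dot is not an ASCII letter.
ascii_attr = re.compile(r"\.[abcdefghjklmnopqstuvwxyzA-Z_]")
print(ascii_attr.search(tricky))

# NFKC normalization folds the modifier letter to a plain "b".
print(unicodedata.normalize("NFKC", "\u1d47it_count"))

# The parser therefore sees an ordinary attribute name.
call = ast.parse(tricky, mode="eval").body
print(call.func.attr)
```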

robbmcleod · 2023-08-07T14:46:27Z

@jan-kubena ok thanks for the heads up. The trouble here is that decimal needs to work, and numbers can appear in variable names, but not at the start. Do you happen to know where the documentation for which character sets are mangled is?

The really proper way to do it would be to back-port the ast parser work I did for my attempt at making 3.0.

robbmcleod · 2023-08-07T19:31:15Z

Apparently the pandas group is using dunders as private variables in some of their NumExpr calls.

pandas-dev/pandas#54449

I'm of two minds about this. I could regex against "[a-zA-Z0-9_]+" instead of just "__". But for the security conscious, it does feel to me that NumExpr shouldn't be able to access private variables in the locals and globals dictionaries.

nicoddemus · 2023-08-07T20:28:15Z

Hi folks,

I have a simpler example which also breaks in the new version:

import pandas as pd

name = "Mass (kg)"
df = pd.DataFrame({name: [200, 300, 400]})
df.query(f"`{name}` < 300")

ValueError: Expression (BACKTICK_QUOTED_STRING_Mass__LPAR_kg_RPAR_) < (300) has forbidden control characters.

I understand Mass (kg) is a reasonable column name, so the validation definitely needs more tuning.

zorion · 2023-08-08T13:12:43Z

Hi folks,
I'm not sure if it is the same issue here, we are using numexpr for calculations in a pandas dataframe and we are using double underscore in some of our calculations (for instance, a__b) but now it fails in 2.8.5 (working in 2.8.4).
I have a minimal example without involving pandas:

import numexpr
numexpr.evaluate('a__b / 2', {'a__b': 4})

Or a oneliner:

python -c "import numexpr; print(numexpr.evaluate('a__b / 2', {'a__b': 4}))"

In numexpr==2.8.4 that one works perfectly.
In numexpr==2.8.5 we have the following:
ValueError: Expression a__b / 2 has forbidden control characters.

Is it an intentional feature or something that can be worked around easily (sending some kwarg to ignore the error)?
We will pin our numexpr requirements to "<2.8.5" in the meanwhile.

Many thanks!

robbmcleod · 2023-08-08T23:17:00Z

@nicoddemus and @zorion, just waiting to hear back from Pandas' devs on the original report: pandas-dev/pandas#54449 before I do anything. I've tried in the past to establish some sort of line of communication with them and gotten crickets.

zorion · 2023-08-09T10:36:13Z

Hi, thanks for your reply.

I think we are having a legit use of dunder but it is not allowed in a breaking change from 2.8.4 to 2.8.5 so we have to fix our version to 2.8.4 (or lower) and this is an ok workaround for now.
On the other hand, if we had a way to flag "evaluate" that we trust our input we may remove this version restriction. Is it possible to do so?

Many thanks in advance!

lithomas1 · 2023-08-10T20:35:39Z

> @nicoddemus and @zorion, just waiting to hear back from Pandas' devs on the original report: pandas-dev/pandas#54449 before I do anything. I've tried in the past to establish some sort of line of communication with them and gotten crickets.

Hi, one of the pandas devs here.

I don't maintain the eval code (I don't think anyone still here does anymore), so not really the best person to comment on this.

Is there a way to gate this stricter checking behind an option?
(For pandas, we have a warning in the docs saying that eval will let users run arbitrary code)

Thanks,
Thomas

robbmcleod · 2023-08-11T06:06:56Z

@lithomas1 and company,

I see a few approaches here.

1. We can deprecate the implementation of the __ filter for an immediate release and put it back in for a future release.
2. We can put in an option to disable the security check. However, I do think it should default to be on.

In order to avoid forcing everyone to do an emergency release, we probably need to do both, assuming the option defaults to sanitize.

lithomas1 · 2023-08-11T08:58:28Z

> @lithomas1 and company,
>
> I see a few approaches here.
>
> 1. We can deprecate the implementation of the __ filter for an immediate release and put it back in for a future release.
> 2. We can put in an option to disable the security check. However, I do think it should default to be on.
>
> In order to avoid forcing everyone to do an emergency release, we probably need to do both, assuming the option defaults to sanitize.

Thanks, this sounds good to me.
2 is probably good enough for pandas.
(We are going to be releasing soon anyways in a couple weeks. I think users can be fine pinning numexpr for now).

It would be nice to actually fix pandas, however, I'm not too sure what the future of eval/query will be given its current "zombie"-like state.

rebecca-palmer · 2023-08-14T22:03:12Z

There are actually two changes in numexpr 2.8.5 that fail pandas tests: this, and a change to integer overflow behaviour (possibly a result of the negative-powers changes, but I'm not sure yet) pandas-dev/pandas#54546

robbmcleod · 2023-08-15T00:04:44Z

@rebecca-palmer You can make a new issue if you want, but NumExpr could never cover 2**100 as that's way in excess of 64-bit integers.
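A quick arithmetic check of that claim, using Python's arbitrary-precision integers (the constant name is illustrative):

```python
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer

value = 2**100
print(value > INT64_MAX)   # True: the value cannot be stored in 64 bits
print(value.bit_length())  # number of bits needed for the magnitude
```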

robbmcleod · 2023-08-15T04:39:40Z

@zorion, @nicoddemus, @lithomas1 I made a push in 397cc98 that should hopefully fix these issues, if you could please test?

I improved the blacklisting. It should better match the funny unicode coercion that can be done for the attribute access attack.
It only blocks dunders and not a single double underscore.
You can disable it by calling validate(..., sanitize=False) or equivalently evaluate(..., sanitize=False).

If you have a chance please test and provide me with any feedback you may have.
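To illustrate the difference between a dunder name and a name that merely contains a double underscore, here is a hypothetical pattern. It is a sketch of the idea only and is not claimed to be the pattern used in 397cc98:

```python
import re

# A whole identifier that both starts and ends with a double underscore.
_dunder = re.compile(r"\b__\w+__\b")


def has_dunder(expr):
    return _dunder.search(expr) is not None


for expr in (
    "a__b / 2",
    "(BACKTICK_QUOTED_STRING_Mass__LPAR_kg_RPAR_) < (300)",
    "__pd_eval_local_x + 1",
    "().__class__",
):
    print(f"{expr!r}: dunder={has_dunder(expr)}")
```

Only the last expression is flagged in this sketch; the first three are the kinds of names reported as broken by 2.8.5 earlier in this thread.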

nicoddemus · 2023-08-15T13:33:13Z

Hi @robbmcleod,

Thanks for attempting a fix.

My original example now passes, but this one breaks (reduced from the actual code):

import pandas as pd

name = "II (MM)"
df = pd.DataFrame({name: [200, 300, 400]})
df.query(f"`{name}` >= 3.1e-05")

ValueError: Expression (BACKTICK_QUOTED_STRING_II__LPAR_MM_RPAR_) >= (3.1e-05) has forbidden control characters.

The problem in this case seems to be the scientific notation, if I change 3.1e-05 to something that does not format to scientific (say 0.31) then it no longer errors out. If I use the actual number in decimal notation (0.000031) it still fails because it seems internally it formats back to scientific, because it generates the exact same message as above (with 3.1e-05):

import pandas as pd

name = "II (MM)"
df = pd.DataFrame({name: [200, 300, 400]})
df.query(f"`{name}` >= 0.000031")

ValueError: Expression (BACKTICK_QUOTED_STRING_II__LPAR_MM_RPAR_) >= (3.1e-05) has forbidden control characters.

sanitize=False would be great for us, however we call DataFrame.query which does not have that argument yet.

rebecca-palmer · 2023-08-15T18:31:19Z

@nicoddemus @robbmcleod I think changing _attr_pat to r'.\b(?!(real|imag|\d+e?)\b)' (i.e. adding 'e?') fixes that, but I haven't actually tested it.

nicoddemus · 2023-08-16T11:59:01Z

Perhaps a more reliable approach would be to use ast.parse, and reject the tree if we find statements, lambdas, dunder import, etc?
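A minimal sketch of that idea, assuming a whitelist of node types rather than a list of things to reject. The function name, the allowed attributes, and the allowed function names are all illustrative assumptions and do not reflect NumExpr's actual syntax support:

```python
import ast

# Node types a purely numerical expression needs. ast.operator, ast.unaryop,
# ast.cmpop and ast.boolop are base classes covering +, -, <, &, and so on.
_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Compare, ast.BoolOp,
    ast.Call, ast.Name, ast.Load, ast.Constant, ast.Attribute,
    ast.operator, ast.unaryop, ast.cmpop, ast.boolop,
)
_ALLOWED_ATTRS = frozenset({"real", "imag"})
_ALLOWED_FUNCS = frozenset({"sin", "cos", "exp", "where"})  # illustrative subset


def check_expression(expr):
    """Raise ValueError or SyntaxError unless expr is a plain numerical expression."""
    tree = ast.parse(expr, mode="eval")  # statements fail here with SyntaxError
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Attribute) and node.attr not in _ALLOWED_ATTRS:
            raise ValueError(f"disallowed attribute: {node.attr}")
        if isinstance(node, ast.Call):
            func = node.func
            if not (isinstance(func, ast.Name) and func.id in _ALLOWED_FUNCS):
                raise ValueError("only whitelisted functions may be called")
        if isinstance(node, ast.Name) and node.id.startswith("__") and node.id.endswith("__"):
            raise ValueError(f"disallowed name: {node.id}")
    return tree


check_expression("a__b / 2")                 # accepted
check_expression("where(a > 0, a.real, 0)")  # accepted
```

Because the check runs on the parsed tree, scientific notation, names such as `a__b`, and Unicode look-alike identifiers are handled by Python's own parser instead of by a regular expression.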

rebecca-palmer · 2023-08-17T21:50:45Z

@robbmcleod I've added some comments in the commit - do those notify anyone when it's already on the main branch?

robbmcleod · 2023-08-17T22:16:14Z

It needs to match [eE]?[+-]?. I.e. either 'e' or 'E' can denote scientific notation and then it can be '-' or '+' exponents. It's not a big deal, I just haven't had time to sit down and do it yet. Please be patient.
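As a sketch of the matching rule described above, assuming the leading dot of the attribute pattern is meant to be a literal (escaped) dot. The pattern and the helper name are illustrative and may differ from what ends up in NumExpr:

```python
import re

# A dot that starts a word, unless that word is "real", "imag", or the tail
# of a numeric literal with an optional [eE] exponent and optional sign.
_attr = re.compile(r"\.\b(?!(real|imag|\d*[eE]?[+-]?\d+)\b)")


def looks_like_attribute_access(expr):
    return _attr.search(expr) is not None


for expr in ("x >= 3.1e-05", "x * 3.1E+5", "0.31 * x", "a * 5.", "a.real", "os.remove"):
    print(f"{expr!r}: flagged={looks_like_attribute_access(expr)}")
```

In this sketch only `os.remove` is flagged; the trailing-dot float `5.` passes because no word follows the dot.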

I definitely don't get notifications on commits.

robbmcleod · 2023-08-17T22:17:16Z

@nicoddemus I did use ast.parse for the NumExpr-3.0 branch. This is not a trivial fix to backport. NumExpr 2 has a home-brew AST.

robbmcleod · 2023-09-04T19:25:25Z

FWIW, NumExpr, being a legacy piece of code, seems to be in fairly widespread use in other legacy systems (e.g. the Pandas .query method) that aren't well maintained. I used to commonly get requests to consult on code using NumExpr. Hence my desire to implement an effective sanitizer.

There were definitely some growing pains with writing the sanitizer, but to me that was expected. It does seem to work now. It's very hard for me to see any way to bypass it and execute malicious code. I did try and write a whitelist, but that's considerably harder to regex.

Regarding the choice to default to True on sanitization, it goes back to wanting to make the issue loud to end users who don't even know their code is using NumExpr.

I've been thinking that if we did want to default to not sanitizing the input string, we could instead show a warning to the user. This warning could be suppressed by setting an environment variable, such as NUMEXPR_NO_WARN_SANTIZE. We could then state that sanitize=False is deprecated and that in the future it will default to True.

smorken · 2023-09-06T00:58:35Z

I have a workaround that involves running an external parser based on pyparsing to pre-validate expressions before passing them on to numexpr.

This is not fully tested, but I am considering this for my own use. I realize it might not be possible to add an additional package requirement to numexpr, but maybe a similar approach would be practical as opposed to a sanitation approach? This would admittedly cause a performance hit but maybe not huge for the typical numexpr use cases.

I suppose the fact that python eval is even involved might mean there are edge cases and a pre-parser might break some expected functionality of numexpr? I think it's fine for my use case though.

from pyparsing import (
    infix_notation,
    one_of,
    OpAssoc,
    Literal,
    Forward,
    Group,
    Suppress,
    Optional,
    delimited_list,
    ParserElement,
)
from numexpr.necompiler import vml_functions
from pyparsing.common import pyparsing_common


ParserElement.enablePackrat()


LPAREN, RPAREN = map(Suppress, "()")
NUMEXPR_FUNCS = vml_functions + ["where"]


def get_parser():
    integer = pyparsing_common.integer
    real = pyparsing_common.real | pyparsing_common.sci_real
    imaginary = (real | integer) + one_of("j J")
    arith_expr = Forward()
    fn_call = Group(
        one_of(NUMEXPR_FUNCS)
        + LPAREN
        - Group(Optional(delimited_list(arith_expr)))
        + RPAREN
    )
    operand = (
        fn_call | imaginary | real | integer | pyparsing_common.identifier
    )

    bitwise_operators = one_of("& | ~ ^")
    comparison_operators = one_of("< <= == != >= >")
    unary_arithmetic = one_of("-")
    binary_arithmetic = one_of("+ - * / ** % << >>")

    arith_expr << infix_notation(
        operand,
        [
            (bitwise_operators, 2, OpAssoc.LEFT, None),
            (comparison_operators, 2, OpAssoc.LEFT, None),
            (unary_arithmetic, 1, OpAssoc.RIGHT, None),
            (binary_arithmetic, 2, OpAssoc.LEFT, None),
        ],
    )

    return arith_expr

def test_passing_expressions():
    parser = numexpr_expression_parser.get_parser()
    result, parse_results = parser.runTests([
        "where(a) ==  1+2e6j",
        "1 + 2.0 + _abc + sin(o)",
        "1 + 2.0 + __abc",  # __abc is a valid identifier
        "1 + 2.0 + _abc + sin(o)",
    ])
    assert result

def test_failing_expressions():
    parser = numexpr_expression_parser.get_parser()
    result, parse_results = parser.runTests([
        "eval(123)"
    ])
    assert not result

MichaelTiemannOSC · 2023-09-07T10:52:29Z

> If no one can think of adding any more tests to this I'll prepare another release?
>
> I'll try and test locally against Pandas as well.

That would be very welcome. pip-audit is now failing my builds due to PYSEC-2023-163 (this issue).

robbmcleod · 2023-09-10T23:26:44Z

I added the means to turn the sanitize=True default behavior off, by setting an environment variable,

set NUMEXPR_SANITIZE=0

Generally speaking I think this shouldn't be any more of a security hole than allowing people to pass sanitize=False is.
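One way the precedence between the keyword argument and the environment variable could work is sketched below. The function `resolve_sanitize` and its exact rules are assumptions made for illustration, not NumExpr's implementation; only the variable name NUMEXPR_SANITIZE comes from the comment above. The `set` line shown there is Windows shell syntax; POSIX shells would use `export NUMEXPR_SANITIZE=0`.

```python
import os


def resolve_sanitize(sanitize=None, environ=None):
    """Decide whether to sanitize: explicit argument first, then environment."""
    if sanitize is not None:
        return bool(sanitize)          # an explicit sanitize=... always wins
    env = os.environ if environ is None else environ
    raw = env.get("NUMEXPR_SANITIZE")
    if raw is None:
        return True                    # default: sanitize
    return bool(int(raw))              # "0" disables, "1" enables


print(resolve_sanitize(environ={}))
print(resolve_sanitize(environ={"NUMEXPR_SANITIZE": "0"}))
print(resolve_sanitize(True, environ={"NUMEXPR_SANITIZE": "0"}))
```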

I tested with pandas against the tests I found that referenced numexpr or evaluate and they all passed. I wasn't able to run the full pandas test suite as I had some access violation.

Otherwise everything is good to release 2.8.6. I'll give everyone a day to comment.

robbmcleod · 2023-09-10T23:27:49Z

@smorken we could consider adding that as a code snippet to the documentation? Perhaps some section titled "Using NumExpr for evaluating user inputs?"

smorken · 2023-09-11T19:53:59Z

> @smorken we could consider adding that as a code snippet to the documentation? Perhaps some section titled "Using NumExpr for evaluating user inputs?"

Sure, by all means, it's mostly just cobbled together from pyparsing examples and by looking at the supported numexpr syntax in the user guide, so feel free to make changes as needed if you see something that could be improved. I am pretty sure that not all of the numexpr syntax would be supported, but I am guessing it might work as a pre-filter for a useful subset of the syntax as it is.

robbmcleod · 2023-09-12T21:54:43Z

2.8.6 has been released, we'll see if there are any further troubles. ::crosses fingers::

@smorken if you want to write a gist I can link to it?

MichaelTiemannOSC · 2023-09-13T13:19:22Z

The 2.8.6 version was just flagged by the same org that flagged 2.8.5: https://vulners.com/osv/OSV:PYSEC-2023-163

I suspect that the real problem is that the LangChain code is the real vulnerability and that numexpr is just exposing what Python itself exposes--an eval that can execute arbitrary code. In my view, the library itself should not be tagged unless it can be exploited by means other than using its standard API in normal ways. But any application that exposes eval to random users is vulnerable, whether they go through a library like numexpr or directly to Python. Somebody needs to sort this with the CVE community, however. I don't think I have standing to argue.

newville · 2023-09-13T18:05:35Z

I came across this from a test failure in my X-ray data analysis code, which uses a library (pyFAI) that calls numexpr.NumExpr on an expression like '4.0e-9*x' (see #449).

I am shocked to learn that numexpr uses eval, and somewhat alarmed at the simplistic approaches proposed here to disallow dunder names.

Trying to parse Python expressions yourself is foolhardy, especially since Python exposes its own parser with ast. You might, for example, consider replacing eval() with the asteval module (https://github.com/newville/asteval). It is true that I am the author, but this is far from a shameless plug: I support it so that my other codes can work safely.

With asteval you could certainly throw out many of the "supported nodes" you do not like (loops, conditionals, etc) and use it only for evaluations of expressions. You could join the discussion about what attributes of Python objects are unsafe. For details, see the list of disallowed attributes at https://newville.github.io/asteval/motivation.html#how-safe-is-asteval.

smorken · 2023-09-13T20:20:19Z

I spent a couple of hours packaging and testing that snippet I posted here before. It's a strict infix pre-parser that supports (much of) the numexpr syntax and all of the function names. Not sure if it will be useful but it's now here in case anyone wants to look.

https://github.com/smorken/numexpr_preparser

newville · 2023-09-14T02:56:01Z

@smorken. Well, that would definitely preserve the errant behavior of #449:

>>> numexpr_safe_evaluate('4.0e-9')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in numexpr_safe_evaluate
ValueError: Expected end of text, found 'e'  (at char 3), (line:1, col:4)

numexpr is designed for parsing and evaluating numerical Python expressions. It has been around for years, is widely used, it has a testing mechanism in place and good documentation. Yet numexpr.NumExpr fails to parse a valid literal number. It appears this has been known for several weeks, and the latest code was released knowing about this and not even adding a test for it. I hope I am misreading that.

Look, I am an outsider here, so feel free to cast me as a bad guy; I do not mean any ill-will to anyone. But I am just shocked to learn that numexpr is ever using Python's eval. I am equally shocked to learn of this because the latest release of numexpr.NumExpr cannot parse a valid literal number in scientific notation.

I am dismayed at the attempts discussed here to try to create a pre-parser so that eval can be used "safely". I do hope that such efforts are not being taken seriously.

Please do not use eval.

smorken · 2023-09-14T04:42:44Z

@newville thanks for taking the time to try it out. Added a fix here, to do with the pyparsing objects. I really don't want to distract from the issue here at hand any further with those scripts though, and to be clear I am not proposing that they be added into numexpr as a robust solution.

As a numexpr user who was also surprised to learn eval is being used internally, I welcome progress and concrete solutions on the issue as well!

robbmcleod · 2023-09-14T05:42:47Z

Alright, I'm going to try and provide a bit of history of this project because we have so many people coming here without context.

NumExpr (NE) was originally written by David Cooke in the late 2000s for Python 2.5 as a way to accelerate NumPy calculations. He's no longer involved in open source and no one knows where he is or how to contact him. Because NE predates many things in modern Python, the source has a lot of technical debt and it implements its own Abstract Syntax Tree (AST), because the ast module didn't exist back then (being a Python 2.7 innovation). NE turned 15-years old this year.

Incidentally the ast module documentation is still incomplete dogshit in 2023 and one should look at https://greentreesnakes.readthedocs.io/en/latest/ if you want to understand how it actually works.

Francesc Alted took over maintenance in that void because PyTables used NumExpr to do queries. I was using NumExpr 2015-16 to accelerate some scientific calcs without having to implement custom functions in the C-API all the time. After I quit my Swiss post-doc and moved back to Canada I started on a project to make "NumExpr 3.0" which had the potential to fix all the shortfalls in 2.0. However, it was an (overly) ambitious project, and I got a paying job, and my free time evaporated. The 3.0 branch does use the ast module to parse expressions. However, I completely re-wrote the Python part of NE because it's frankly a mess of mutable arguments going into the NE AST and I added the ability to parse multiple lines with temporary variables (which existed in the virtual machine as just a single 4k block) among numerous other improvements. Francesc asked me if I could take over the maintenance side of things and I agreed, since it was the reasonable thing to do.

My thinking at that time was that people would stop using NE and switch to Numba. I saw NE3 as having a potential niche for "write once; execute on big data" scripts where the user didn't want to unravel their vectorized calculations to use Numba. Little did I know at that time that a lot of people were using NE2 for a purpose it was never (IMO) intended to be used for: parsing user inputs. This is very clear if you look at the myriad of reported issues on this repo where people are using NumExpr to parse singletons (and hence running into issues with the NE AST), whereas in my opinion NE was intended to be a blocked-calculation virtual machine to avoid being calculation limited by memory bandwidth. NE is extremely inefficient for parsing singleton inputs. CPython itself is more efficient. I've tried repeatedly in the Issues tracker to discourage people from using NE in this way, to no avail.

I personally do not have the bandwidth to implement a new AST in the package. In 2017 yes, but not in 2023. Now, if someone wants to take over maintenance of the project, I'd be thrilled to hand it over. It's possible money could be sourced from one of the open-source funding agencies. The adjacent packages: NumPy, Pandas, and PyTables are all funded. Francesc has approached me in the past asking if I wanted to be part of one of the grant applications, and I said no. I've also been asked to consult for companies using NumExpr internally and again I said no (although this has dried up over the past couple of years). For me, it's not about the money, it's about my personal time.

If I ask someone, "can you please write a unit test for this edge case?" and I get a no code response, I'm not going to be able to fix the problem in 15 minutes on my lunch break. To be clear: I don't use NumExpr in a professional context. I wrote my own virtual machine for professional use. There's no benefit to me in continuing to maintain this project.

t20100 · 2023-09-14T09:14:33Z

Thanks @robbmcleod for maintaining this project!

As this issue highlights, numexpr is a very much used piece of code.

Since the effort of refactoring/rewriting numexpr is too huge, a disclaimer in the documentation about security issues as initially proposed sounds more affordable.

Regarding the issue with scientific notation in v2.8.6, PR #451 proposes both a test and a fix for it.
If this fix is suited, a bug fix release would be much appreciated!

newville · 2023-09-14T16:59:34Z

@robbmcleod

> NumExpr (NE) was originally written by David Cooke in the late 2000s for Python 2.5 as a way to accelerate NumPy calculations. He's no longer involved in open source and no one knows where he is or how to contact him. Because NE predates many things in modern Python, the source has a lot of technical debt and it implements its own Abstract Syntax Tree (AST), because the ast module didn't exist back then (being a Python 2.7 innovation). NE turned 15-years old this year.

If memory serves, ast was included with Python 2.6 (and even partially available in 2.5, perhaps as a third-party lib). But with Python 2.7 it became the same source->AST parser used by Python itself.

It is OK to play the "this is a very old codebase" card, but the ast module is hardly new. At many points in the process, the developers apparently decided to stick with "home-built" instead of "standard library". That's OK, if they are able and willing to maintain it.

FWIW, the origins of asteval are from about the same time (2.6 to 2.7 transition). The github repo goes back to 2012, reflecting the transition to using git. Again, not a new project.

> Incidentally the ast module documentation is still incomplete dogshit in 2023 and one should look at https://greentreesnakes.readthedocs.io/en/latest/ if you want to understand how it actually works.

Yes, the ast documentation is incomplete, but the usage within asteval sort of demonstrates that it is not really that hard to work with. For anyone who sort of understands the concepts (surely anyone who would consider using pyparsing), ast.dump(ast.parse(string)) is pretty self-explanatory.
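For readers who have not used the module, this is the kind of inspection meant by ast.dump(ast.parse(string)). The expression is just an example, and the exact dump text varies between Python versions, so no output is shown:

```python
import ast

tree = ast.parse("a__b / 2 + x.real", mode="eval")

# Full structural dump of the parsed expression.
print(ast.dump(tree))

# The distinct node types present, which is what a validator would inspect.
node_types = sorted({type(node).__name__ for node in ast.walk(tree)})
print(node_types)
```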

But also: all of the well-meaning suggestions and bugfixes here about regular expressions, "dunder" names, and using pyparsing to try to make the input for eval "safe" are missing the entire point of the ast module: You do not ever need to do any lexing or parsing of Python statements. Any lexing or parsing of Python statements that you choose to do will add code that has to be maintained and supported. It will almost certainly depend on fragile regular expressions or parsing modules that are non-trivial to understand. In the end, the lexing and parsing will be "correct" if and only if it agrees precisely with the results from the ast module. That is, you can use ast or you can decide to do something worse.

Most of what is dangerous about eval is accessing object attributes. That cannot be avoided by parsing. Many of the worst dangers of eval can be avoided by carefully deciding which attributes can be accessed.

> Francesc Alted took over maintenance in that void because PyTables used NumExpr to do queries. I was using NumExpr 2015-16 to accelerate some scientific calcs without having to implement custom functions in the C-API all the time. After I quit my Swiss post-doc and moved back to Canada I started on a project to make "NumExpr 3.0" which had the potential to fix all the shortfalls in 2.0. However, it was an (overly) ambitious project, and I got a paying job, and my free time evaporated. The 3.0 branch does use the ast module to parse expressions. However, I completely re-wrote the Python part of NE because it's frankly a mess of mutable arguments going into the NE AST and I added the ability to parse multiple lines with temporary variables (which existed in the virtual machine as just a single 4k block) among numerous other improvements. Francesc asked me if I could take over the maintenance side of things and I agreed, since it was the reasonable thing to do.

Well, definitely thanks to you and Francesc (and David Cooke) for doing that -- it is much appreciated.

> My thinking at that time was that people would stop using NE and switch to Numba. I saw NE3 as having a potential niche for "write once; execute on big data" scripts where the user didn't want to unravel their vectorized calculations to use Numba. Little did I know at that time that a lot of people were using NE2 for a purpose it was never (IMO) intended to be used for: parsing user inputs. This is very clear if you look at the myriad of reported issues on this repo where people are using NumExpr to parse singletons (and hence running into issues with the NE AST), whereas in my opinion NE was intended to be a blocked-calculation virtual machine to avoid being calculation limited by memory bandwidth. NE is extremely inefficient for parsing singleton inputs. CPython itself is more efficient. I've tried repeatedly in the Issues tracker to discourage people from using NE in this way, to no avail.

Yeah, I understand that...

> I personally do not have the bandwidth to implement a new AST in the package. In 2017 yes, but not in 2023. Now, if someone wants to take over maintenance of the project, I'd be thrilled to hand it over. It's possible money could be sourced from one of the open-source funding agencies. The adjacent packages: NumPy, Pandas, and PyTables are all funded. Francesc has approached me in the past asking if I wanted to be part of one of the grant applications, and I said no. I've also been asked to consult for companies using NumExpr internally and again I said no (although this has dried up over the past couple of years). For me, it's not about the money, it's about my personal time.

I understand that too. The numexpr devels might just decide that replacing eval with asteval would circumvent the worst security issues, and avoid all of the discussion here about various band-aids for eval. But I know absolutely nothing about the numexpr code base.

> If I ask someone, "can you please write a unit test for this edge case?" and I get a no code response, I'm not going to be able to fix the problem in 15 minutes on my lunch break. To be clear: I don't use NumExpr in a professional context. I wrote my own virtual machine for professional use. There's no benefit to me in continuing to maintain this project.

Well, I think that many of us will understand trying to maintain software, especially on lunch breaks ;). It looks like you were updating and releasing versions until fairly recently, but maybe I am not understanding some things. Is someone else maintaining this?

rebecca-palmer · 2023-09-18T21:05:50Z

As previously noted, dunder_pat is still blocking some things that aren't dunders. Hence, pandas still fails a test.

rebecca-palmer · 2023-09-22T10:53:22Z

@robbmcleod what, if anything, blocks the above two fixes from being applied, to at least fix the known unnecessary breakage? (See #452 if you prefer a proper pull request.)

In the longer term, it looks to me like everyone here agrees that moving to ast would be better, but will take work.

I may be interested in becoming a maintainer and/or contributing to that work, but this is not a promise at this point.

newville · 2023-09-22T15:42:36Z

Yes, please merge #452 and #451 (both fix literals using scientific notation, #452 adds better checking for dunder names, while #451 adds a test for numeric literals using scientific notation).

As it stands, downstream packages must pin a specific, not-the-latest version of numexpr in their requirements, such as numexpr<=2.8.4.

FrancescAlted · 2023-09-22T15:55:36Z

One can also take the opportunity to produce wheels for forthcoming Python 3.12. Although 3.12 is not final yet (Oct, 2nd is the tentative date), Python folks will not be introducing ABI changes after existing 3.12rc2, so extensions built on it should work well with the forthcoming 3.12 final. Also, the NumPy team is already producing wheels for 3.12, so this dependency should be ready too.

@robbmcleod I'm willing to help in doing the release in case you don't have lots of time right now. BTW, thanks for all the time that you have put in the project so far; you have done a most excellent job in maintaining the project.

FrancescAlted · 2023-09-25T16:36:39Z

I am in the process of releasing 2.8.7, with the suggestions here. If you want to test what the candidate looks like, please go to #453 and give it a try. My plan is to do a release as soon as possible (hopefully by tomorrow).

Also, after talking with @robbmcleod, I have added a notice to the README saying that the project is looking for (much needed) new maintainers. If anyone here is ready to tackle that, please speak up. Thanks!

avalentino · 2023-09-26T17:28:30Z

Maybe this issue can be closed now, right?

chipmuenk · 2023-10-10T06:47:26Z

I'm a bit late to the show, but I only noticed yesterday that numexpr also fails on simple complex numbers like 1.0j, which affects my software https://github.com/chipmuenk/pyfda. I think parsing complex numbers should be a legitimate use case for numexpr.

rebecca-palmer · 2023-10-10T17:37:53Z

@chipmuenk: yes, that sounds like a bug, sorry.

Untested fix:
-_attr_pat = r'\.\b(?!(real|imag|\d*[eE]?[+-]?\d+)\b)'
+_attr_pat = r'\.\b(?!(real|imag|\d*[eE]?[+-]?\d+j?)\b)'
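
To make the effect of the proposed change easier to see, here is a minimal sketch that runs both patterns over a few expressions. It is not numexpr's actual validation code: the names OLD_ATTR_PAT, NEW_ATTR_PAT and is_flagged are hypothetical, and the leading dot is written escaped (\.), which is an assumption about the intended pattern. A match is taken to mean "this dot looks like attribute access and would be rejected".

```python
import re

# Hypothetical names for illustration only; not numexpr's real API.
# Assumption: the leading dot is meant to be a literal '.', so it is escaped.
OLD_ATTR_PAT = re.compile(r'\.\b(?!(real|imag|\d*[eE]?[+-]?\d+)\b)')
NEW_ATTR_PAT = re.compile(r'\.\b(?!(real|imag|\d*[eE]?[+-]?\d+j?)\b)')


def is_flagged(pattern, expression):
    """Return True if the pattern finds a '.' that looks like attribute access."""
    return pattern.search(expression) is not None


if __name__ == "__main__":
    samples = ["1.0j", "2.5e3j", "1.5e-3", "a.real", "a * 5.", "os.remove"]
    for expr in samples:
        old = is_flagged(OLD_ATTR_PAT, expr)
        new = is_flagged(NEW_ATTR_PAT, expr)
        print(f"{expr!r}: old pattern flags={old}, new pattern flags={new}")
```

The optional j? inside the lookahead lets the trailing word boundary land after the imaginary suffix, so literals such as 1.0j are no longer mistaken for attribute access, while os.remove is still flagged by both patterns. A literal with no digits after the dot, such as 1.j, would still be flagged by this sketch.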

FrancescAlted · 2023-10-11T07:49:18Z

@rebecca-palmer could you open a PR that also adds a test for the new complex case? Thanks in advance!

Dobatymo · 2024-02-22T02:38:19Z

There has been an issue for this since 2018... #323

jan-kubena mentioned this issue Jul 27, 2023: Arbitrary code execution in LLMMathChain langchain-ai/langchain#8363 (Closed, 14 tasks)

robbmcleod closed this as completed Aug 6, 2023

robbmcleod reopened this Aug 7, 2023

robbmcleod mentioned this issue Aug 7, 2023: 2.8.5 breaks pandas #444 (Closed)

rebecca-palmer mentioned this issue Aug 16, 2023: BUG: df.query error when using local variable substitution syntax with numexpr 2.8.5 (forbidden control characters) pandas-dev/pandas#54449 (Open, 3 tasks)

rebecca-palmer mentioned this issue Aug 17, 2023: tables.test() Fails ERROR: None (tables.tests.test_queries.ScalarTableUsageTestCase.None) PyTables/PyTables#1044 (Closed)

loichuder mentioned this issue Sep 13, 2023: ValueError exception for scientific notation with digits after . #449 (Closed)

t20100 mentioned this issue Sep 13, 2023: Fix scientific notation support #451 (Merged)

jsolbrig mentioned this issue Sep 18, 2023: Bug in numexpr v2.8.5 - Fix in GeoIPS or pin to numexpr 2.8.4 until further notice NRLMMD-GEOIPS/geoips#332 (Closed, 1 task)

FrancescAlted closed this as completed Sep 27, 2023

jnywong mentioned this issue May 27, 2024: Fix typo ScienceCore/climaterisk#63 (Merged)
