Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hang on calculating a large number #538

Closed
LordAro opened this issue May 13, 2014 · 11 comments
Closed

Hang on calculating a large number #538

LordAro opened this issue May 13, 2014 · 11 comments
Labels
Bug Things to squish; generally used for issues High Priority

Comments

@LordAro
Copy link

LordAro commented May 13, 2014

It seems to be trivially easy to cause the bot to hang.

".calc 99999999999**9"

hung for several minutes, and was maxing out a core on my server

@elad661
Copy link
Contributor

elad661 commented May 13, 2014

Can reproduce.

It's threaded, so I guess the reason this freezes is the GIL.

We need to see how we can optimize .calc or prevent calculations with such big numbers.

@lramati
Copy link
Contributor

lramati commented May 14, 2014

Maybe before running the calculation, calculate a computational complexity
based on the operators present and not running if the complexity is too
high?

@ari-koivula
Copy link
Contributor

I suggest adding adding constraints on maximum magnitudes of inputs to exponential operations like multiplication and division. Something like 10**1000 could be reasonable.

Exponentiation needs an additional constraint. Maybe something like exp < 10**1000 and exp * (1+log10(number))**3 < 1000 based on theoretical computational complexity of exponentiation.

@saelo
Copy link

saelo commented Jul 6, 2014

I've prepared a basic fix for this issue which simply spawns a seperate process and kills it
after some time: https://github.com/saelo/willie/commit/c391b13463031e02855b6573cca9a13da557fdf8

I'm sure there's stuff to imporove here. Let me know what you think.

@ari-koivula
Copy link
Contributor

@saelo Have you tested this? How does it fare when a big calculation is spammed? It's no good if the host can be flooded with python processes, with each one trying to alloc gigabytes of memory to do the calculation. Something like (2100)(2**100) can allocate quite a bit of memory in a few seconds.

@saelo
Copy link

saelo commented Jul 7, 2014

@Venti- yeah, I've tested it and it's definately not perfect but at least prevents the bot from disconnecting due to timeout. I've ran ~20 huge calculations at the same time on a raspberry pi, the bot is a lot slower during that time but recovers.
You can probably still DOS the server with 100% CPU load if you wanted to though... Then again you can probably do that without the calc module as well. Memory is not really an issue on the RPI as the CPU is "slow enough", don't know about other systems.
To fix that I would suggest limiting the number of times a specific action/command can be triggered within a couple of seconds inside the framework. Alternatively the calc module could be changed to do that but I think doing it in the framework would be better.

@ari-koivula
Copy link
Contributor

While your solution works, I don't really like the idea of spinning up new python processes just to do a simple computation (something computers are supposed to be pretty good at), so I implemented my own suggestion from earlier.

I made the limits quite liberal so you might be able to break it if you try. I based them on how long the calculations took on my laptop and set the limit on around 0.5s. I figured that should be low enough for something like Raspberry to be able to do them in 5s. You could test that though by running the following script on your Raspberry:

Here is a script to print a table of how long pow takes for various arguments around the 1s mark:

from timeit import timeit
print "   " + "".join("{e:>5}e5".format(e=e) for e in range(1, 10))
for n in range(1, 10):
    print "e{n:<2}".format(n=n),
    for e in range(1, 10):
        print "{time:>6.2f}".format(
                 time=timeit("{n}**({e} * 10**5)".format(n=10**n, e=e), number=1)),
    print ""

Here are the result on my laptop (i7-2670QM): x=exp, y=num

-      1e5    2e5    3e5    4e5    5e5    6e5    7e5    8e5    9e5
e1    0.03   0.09   0.16   0.25   0.35   0.46   0.60   0.73   0.88 
e2    0.08   0.24   0.46   0.73   1.03   1.40   1.80   2.21   2.63 
e3    0.15   0.46   0.87   1.39   1.99   2.63   3.35   4.18   5.15 
e4    0.24   0.73   1.39   2.20   3.11   4.18   5.39   6.59   7.88 
e5    0.34   1.03   2.00   3.12   4.48   5.97   7.56   9.37  11.34 
e6    0.46   1.39   2.62   4.16   5.97   7.86  10.09  12.56  15.39 
e7    0.60   1.79   3.34   5.39   7.60  10.16  13.00  16.23  19.44 
e8    0.73   2.20   4.18   6.60   9.37  12.60  16.26  19.83  23.70 
e9    0.87   2.62   5.15   7.93  11.34  15.44  19.40  23.66  28.58 

@ari-koivula
Copy link
Contributor

I guess we should still ad a overall time limit to the evaluator to avoid cases where some complex expression is chained enough times to cause a significant delay. I first thought that the limited input length would take care of that, but with a slow enough processor and the right expression it probably is still a problem. Setting an overall time limit in the evaluator while the individual operations are limited in time should take care of that though. I will do that later today or tomorrow.

I also noticed that for large expressions the most time can actually go to turning the 10**9 digit number into characters, which is why I set a limit on the magnitude of numbers than can be returned. Another way to handle that might be to print just the magnitude and a few digits for large numbers.

@saelo
Copy link

saelo commented Jul 8, 2014

Sure, works for me.
Apparently (for python2) you also need to check for long though:

>>> isinstance(1000000000000000000000000000, int)
False

For the record here is the output of your script from an RPI (overclocked at 900MHz):

       1e5    2e5    3e5    4e5    5e5    6e5    7e5    8e5    9e5
e1    0.25   0.70   1.45   2.36   2.92   4.80   5.74   6.62   7.37 
e2    0.64   1.94   3.90   6.55   9.17  13.07  15.34  20.14  22.71 
e3    1.48   4.21   7.62  12.98  17.62  23.35  30.51  38.11  43.11 
e4    2.39   6.37  12.67  19.15  28.58  39.07  46.19  58.07  71.57 
e5    3.05   9.55  17.40  28.91  45.25  53.52  68.06  85.06 106.04 
e6    4.22  13.36  23.30  38.51  51.99  70.17  92.41 118.36 134.45 
e7    5.32  15.37  30.89  46.53  67.23  85.88 120.25 140.42 164.13 
e8    5.85  17.60  35.50  53.01  76.90 107.40 128.93 160.87 194.99 
e9    7.09  21.37  41.51  64.69  95.64 123.31 156.29 194.97 246.18

@ari-koivula
Copy link
Contributor

Yeah. Python 3 doesn't have longs, but I think comparing against numbers.Integral should do the job.

ari-koivula added a commit that referenced this issue Jul 8, 2014
- Added default timeout of 5s to expression evaluator, that will throw
  an error before evaluating the next statement if the timeout has passed.
  This should stop an expression consisting of several long running
  statements from taking too long.

- Calc now uses general number formatting from str.format to print the
  number according to it's magnitude.

- Fixes bug in Python 2 from #538, where long ints are let through by using
  abstract base class numbers.Integral.
@ari-koivula
Copy link
Contributor

From looking at your results it's seems like the default limit does indeed limit the runtime to roughly 5s on the pi. I guess you could scale the limit function by timing some calculation at initialization time or something, but I think I'm done with this for now.

maxpowa pushed a commit to maxpowa/Inumuta that referenced this issue Feb 20, 2015
- Fixes sopel-irc#538.

- Removed the drivel from calc when exceptions are raised. Printing the actual
  error might not be pretty, but at least it's not completely useless.
maxpowa pushed a commit to maxpowa/Inumuta that referenced this issue Feb 20, 2015
- Added default timeout of 5s to expression evaluator, that will throw
  an error before evaluating the next statement if the timeout has passed.
  This should stop an expression consisting of several long running
  statements from taking too long.

- Calc now uses general number formatting from str.format to print the
  number according to it's magnitude.

- Fixes bug in Python 2 from sopel-irc#538, where long ints are let through by using
  abstract base class numbers.Integral.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Things to squish; generally used for issues High Priority
Projects
None yet
Development

No branches or pull requests

5 participants