On running untrusted code in AWS Lambda
The core feature of luajit.me is running untrusted Lua code submitted via the Web UI.
AWS Lambda is appealing as it would drive costs down and enable auto scaling without any effort on my part. Last but not least, it supports both amd64 and aarch64!
Sandboxing is a must. Even though it is presumably possible to isolate a Lambda function from the rest of the infrastructure, an exploit could leave it in a weird state. This is concerning as Lambda instances are reused when requests arrive in quick succession.
For the luajit.me use case it is sufficient to sandbox a single process, i.e. being able to filter syscalls would be enough.
As it turns out, in AWS Lambda chroot, seccomp and ptrace all fail with EPERM. This makes sandboxing rather tricky.
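A failure like this is easy to check for from inside the function. Below is a minimal sketch, not the code I actually ran: probe is a hypothetical helper that calls something and reports either "ok" or the symbolic errno name. Only chroot is probed, since Python exposes it directly; seccomp and ptrace would need ctypes and are left out.

```python
import errno
import os


def probe(call):
    """Run `call`; return "ok" or the symbolic errno name (e.g. "EPERM")."""
    try:
        call()
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    # chroot("/") changes nothing when it is permitted, so it is a safe probe.
    print("chroot:", probe(lambda: os.chroot("/")))
```

In an environment that denies the call, this prints the errno name instead of "ok".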
Another option considered was QEMU.
As KVM is not available, QEMU has to run in dynamic binary translation mode. Unlike the KVM-based virtualiser, this mode has not undergone a security audit and is not recommended for isolating untrusted code. Probably still fine for a low-profile project.
QEMU can virtualise the whole system or a single user-space process. The latter is utterly insecure, therefore we went for whole-system virtualisation. Here is some timing data.
+ time luajit m.lua
real 0m 0.04s
user 0m 0.00s
sys 0m 0.00s
+ time luajit m.lua 1000
real 0m 0.35s
user 0m 0.10s
sys 0m 0.00s
Mandelbrot program, rendering 100x100 and 1000x1000 bitmaps.
+ vmwrap --no-kvm sh -c 'time luajit m.lua > /dev/null'
real 0m 2.64s
user 0m 0.17s
sys 0m 1.34s
+ vmwrap --no-kvm sh -c 'time luajit m.lua 1000 > /dev/null'
real 0m 7.61s
user 0m 5.06s
sys 0m 1.46s
Same program, run through vmwrap, a simple tool to run a workload in a QEMU VM.
Approx. 30 seconds.
The only working approach to sandboxing untrusted code in AWS Lambda was QEMU. The overall slowdown compared to running unsandboxed was in the range of 22x to 66x. This is absolutely unacceptable, as an instant response would have been replaced with staring at a spinning wheel. I strongly believe that being blazingly fast is an important feature of luajit.me.
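The slowdown range follows directly from the wall-clock ("real") times listed earlier. A short sketch of the arithmetic; the dictionary names are only for illustration:

```python
# Wall-clock ("real") times in seconds, copied from the measurements above.
native = {"100x100": 0.04, "1000x1000": 0.35}
qemu = {"100x100": 2.64, "1000x1000": 7.61}

# Slowdown factor = time inside the QEMU VM / time when run directly.
slowdown = {size: qemu[size] / native[size] for size in native}

for size, factor in slowdown.items():
    print(f"{size}: {factor:.0f}x slower")
# prints "100x100: 66x slower" and "1000x1000: 22x slower"
```

The small bitmap suffers most because the run is dominated by VM start-up rather than by the Lua code itself.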
Long VM init time is also challenging; restoring from a QEMU snapshot could help here.
Therefore, regretfully, I have to discard AWS Lambda as a prospective option for luajit.me.
People don't normally run untrusted code in AWS Lambda, I get it.
However, other perfectly valid use cases exist. Consider image processing, for example. ImageMagick is still quite popular, and it is not particularly secure. One can easily contain the damage by running ImageMagick sandboxed using a tool like nsjail.
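To give an idea of what this looks like on a regular Linux host, here is a rough sketch that builds an nsjail command line around an ImageMagick invocation. The helper nsjail_argv, the /tmp/job directory, the file names, the uid/gid and the 10 second limit are all assumptions made for illustration; a real deployment would also want a seccomp policy and resource limits tuned to the workload.

```python
import shlex


def nsjail_argv(workdir, command):
    """Build an nsjail command line (illustrative selection of flags)."""
    return [
        "nsjail",
        "-Mo",                   # run the command once, then exit
        "--chroot", "/",         # use the host filesystem as the jail root
        "--user", "99999",       # unprivileged uid inside the jail
        "--group", "99999",      # unprivileged gid inside the jail
        "--time_limit", "10",    # stop the job after 10 seconds
        "--bindmount", workdir,  # writable directory holding input and output
        "--cwd", workdir,
        "--",
    ] + list(command)


argv = nsjail_argv("/tmp/job", ["convert", "in.png", "-resize", "50%", "out.png"])
print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run as is. None of this works inside AWS Lambda, which is exactly the problem.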
To implement defence in depth we stack multiple layers of protection. Even in a greenfield project dealing with complex user inputs it is worthwhile to sandbox the processing logic. What if the code was not written in house? What if we are dealing with something relatively complex and arcane, like PDF?
Do you know another case of complex software dealing with untrusted input? The web browser. If you Google for "web screenshot API", a handful of offerings will surface. This looks like a perfect job for AWS Lambda as it scales effortlessly. Unfortunately, your security is going to be compromised, as the browser has to run there with its sandbox disabled. Would you ever consider running Chrome with the sandbox disabled on your own device? Why then are you willing to put your backends at risk?
Sandboxing on Linux is typically achieved with a combination of namespaces, seccomp and cgroups. All three are Linux kernel features. AWS runs a dedicated micro VM for your Lambda. It runs Linux and it could've exposed the kernel features that are essential for sandboxing. Unfortunately, Amazon decided differently.
If you are an AWS user, you should ask Amazon for a less crippled Linux to power your Lambda.