On running untrusted code in AWS Lambda

Nick Zavaritsky edited this page Jun 22, 2022 · 5 revisions

The appeal of AWS Lambda

The core feature of luajit.me is running untrusted Lua code submitted via the web UI.

AWS Lambda is appealing because it would drive costs down and provide auto scaling without any effort on my part. Last but not least, it supports both amd64 and aarch64!

The need for sandboxing

Sandboxing is a must. Even though it is presumably possible to isolate a Lambda function from the rest of the infrastructure, an exploit could leave it in a weird state. This is concerning because Lambda instances are reused when requests arrive in quick succession.

For the luajit.me use case it is sufficient to sandbox a single process, i.e. being able to filter syscalls would be enough.

Sandboxing in AWS Lambda

As it turns out, chroot, seccomp and ptrace all fail with EPERM in AWS Lambda. This makes sandboxing rather tricky.
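A minimal sketch of how such a check could be reproduced, assuming a Linux environment with Python 3 and a libc that exports `prctl` and `ptrace`. This is not the probe I originally used; the helper names (`probe`, `probe_all`, `try_*`) are made up for illustration, and the choice of an allow-everything seccomp filter and `PTRACE_TRACEME` as test calls is an assumption. Each attempt runs in a forked child so that a call that does succeed cannot affect the parent process.

```python
import ctypes
import errno
import os

# Constants from <linux/prctl.h>, <linux/seccomp.h>, <linux/bpf_common.h>
# and <sys/ptrace.h>.
PR_SET_SECCOMP = 22
PR_SET_NO_NEW_PRIVS = 38
SECCOMP_MODE_FILTER = 2
SECCOMP_RET_ALLOW = 0x7FFF0000
BPF_RET_K = 0x06  # BPF_RET | BPF_K
PTRACE_TRACEME = 0

libc = ctypes.CDLL(None, use_errno=True)


class SockFilter(ctypes.Structure):  # struct sock_filter
    _fields_ = [("code", ctypes.c_uint16), ("jt", ctypes.c_uint8),
                ("jf", ctypes.c_uint8), ("k", ctypes.c_uint32)]


class SockFprog(ctypes.Structure):  # struct sock_fprog
    _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.POINTER(SockFilter))]


def _check(ret):
    """Turn a -1 return value from libc into an OSError carrying errno."""
    if ret == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def _prctl(option, arg2=0, arg3=0):
    ul = ctypes.c_ulong
    _check(libc.prctl(ctypes.c_int(option), ul(arg2), ul(arg3), ul(0), ul(0)))


def try_chroot():
    os.chroot("/")  # requires CAP_SYS_CHROOT


def try_seccomp():
    # An allow-everything filter: harmless to install, but it goes through
    # the same prctl() path that a real syscall filter would use.
    insn = SockFilter(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW)
    prog = SockFprog(1, ctypes.pointer(insn))
    _prctl(PR_SET_NO_NEW_PRIVS, 1)
    _prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ctypes.addressof(prog))


def try_ptrace():
    ptrace = libc.ptrace
    ptrace.restype = ctypes.c_long
    _check(ptrace(ctypes.c_int(PTRACE_TRACEME), ctypes.c_int(0),
                  ctypes.c_void_p(None), ctypes.c_void_p(None)))


def probe(action):
    """Run action() in a forked child; return 'ok' or the errno name."""
    pid = os.fork()
    if pid == 0:
        code = 0
        try:
            action()
        except OSError as exc:
            code = exc.errno or 255
        except BaseException:
            code = 255
        os._exit(code)  # the errno travels back as the exit status
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    if not os.WIFEXITED(status):
        return "unexpected wait status"
    code = os.WEXITSTATUS(status)
    if code == 0:
        return "ok"
    return errno.errorcode.get(code, "unexpected failure")


def probe_all():
    checks = {"chroot": try_chroot, "seccomp": try_seccomp, "ptrace": try_ptrace}
    return {name: probe(action) for name, action in checks.items()}


if __name__ == "__main__":
    for name, outcome in probe_all().items():
        print(f"{name}: {outcome}")
```

The output depends entirely on where it runs. On an ordinary Linux box an unprivileged user would typically see chroot refused while the other two succeed; going by the observation above, all three should report EPERM inside Lambda.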

Another option considered was QEMU.

As KVM is not available in Lambda, QEMU has to run in dynamic binary translation mode. Unlike the KVM-based virtualiser, this mode has not undergone a security audit and is not recommended for isolation. It is probably still fine for a low-profile project.

QEMU can virtualise the whole system or a single user-space process. The latter is utterly insecure, so we went for whole-system virtualisation. Here is some timing data.

Baseline

+ time luajit m.lua
real	0m 0.04s
user	0m 0.00s
sys	0m 0.00s
+ time luajit m.lua 1000
real	0m 0.35s
user	0m 0.10s
sys	0m 0.00s

Mandelbrot program, rendering 100x100 and 1000x1000 bitmaps.

Running inside a VM

+ vmwrap --no-kvm sh -c 'time luajit m.lua > /dev/null'
real	0m 2.64s
user	0m 0.17s
sys	0m 1.34s
+ vmwrap --no-kvm sh -c 'time luajit m.lua 1000 > /dev/null'
real	0m 7.61s
user	0m 5.06s
sys	0m 1.46s

The same program, run through vmwrap, a simple tool for running a workload in a QEMU VM.

Time to init a VM

Approx. 30 seconds.

Conclusion

The only working approach to sandboxing untrusted code in AWS Lambda was QEMU. The overall slowdown compared to running unsandboxed ranged from 22x to 66x. This is unacceptable: an instant response would be replaced with staring at a spinning wheel. I strongly believe that being blazingly fast is an important feature of luajit.me.
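The slowdown range comes straight from the wall-clock ("real") times in the two timing listings. A small worked calculation, with the numbers copied from those listings and the dictionary keys chosen only for illustration:

```python
# Wall-clock ("real") seconds from the Baseline and "Running inside a VM" runs.
baseline = {"100x100": 0.04, "1000x1000": 0.35}
in_vm = {"100x100": 2.64, "1000x1000": 7.61}


def slowdown(baseline, in_vm):
    """Ratio of in-VM time to unsandboxed time for each workload."""
    return {name: in_vm[name] / baseline[name] for name in baseline}


factors = slowdown(baseline, in_vm)
for name, factor in factors.items():
    print(f"{name}: {factor:.0f}x slower")
# prints:
# 100x100: 66x slower
# 1000x1000: 22x slower
```

The short run suffers most because the fixed cost of running inside the emulated system dominates when the workload itself takes only a few hundredths of a second.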

Long VM init time is also challenging; restoring from a QEMU snapshot could help here.

Therefore, regretfully, I am discarding AWS Lambda as an option for luajit.me.

AWS Lambda should absolutely support sandboxing!

People don't normally run untrusted code in AWS Lambda, I get it.

However, other perfectly valid use cases exist. Consider image processing, for example. ImageMagick is still quite popular, and it is not particularly secure. One can easily contain the damage by running ImageMagick sandboxed with a tool like nsjail.

To implement defence in depth we stack multiple layers of protection. Even in a greenfield project dealing with complex user inputs it is worthwhile to sandbox the processing logic. What if the code was not written in house? What if we are dealing with something relatively complex and arcane, like PDF?

Do you know another case of complex software dealing with untrusted input? The web browser. If you Google for "web screenshot API", a handful of offerings will surface. This looks like a perfect job for AWS Lambda as it scales effortlessly. Unfortunately, your security is going to be compromised. Would you ever consider running Chrome with the sandbox disabled on your own device? Why, then, are you willing to put your backends at risk?

Sandboxing on Linux is typically achieved with a combination of namespaces, seccomp and cgroups. All three are Linux kernel features. AWS runs a dedicated micro VM for your Lambda. It runs Linux and it could've exposed the kernel features that are essential for sandboxing. Unfortunately, Amazon decided differently.

If you are an AWS user, you should ask Amazon for a less crippled Linux to power your Lambda.

Resources