docs: Document Firecracker Entropy Approach and Practical Use #663

raduweiss · 2018-11-27T12:30:57Z

We got this very good question on Slack:

> Things to add to the FAQ: What kind of entropy source(s) are available to the guest? Are RdRand &/ rdseed available? VirtIO RNG?

Since we spent some time thinking about this, we should document the answer (probably in the FAQ).

mcastelino · 2019-02-05T22:24:30Z

@raduweiss the entropy currently available in a Firecracker microVM is pretty low and causes issues with some containers that need entropy.

 # cat /proc/sys/kernel/random/entropy_avail
33

So having an approach that provides a higher level of entropy would be good.

/cc @nmeyerhans
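
For anyone who wants to script the check above, here is a minimal Python sketch. It reads the same `/proc/sys/kernel/random/entropy_avail` file and compares the value to a threshold. The threshold of 256 bits and the helper names are assumptions made for illustration, not values recommended anywhere in this thread.

```python
from pathlib import Path

# Same file as in the shell command above; it exists only on Linux.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy_avail(text: str) -> int:
    """Parse the file contents: a single decimal integer (an estimate in bits)."""
    return int(text.strip())


def entropy_is_low(bits: int, threshold: int = 256) -> bool:
    """Return True when the estimate is below `threshold` (an assumed cutoff)."""
    return bits < threshold


if __name__ == "__main__":
    if ENTROPY_AVAIL.exists():
        bits = parse_entropy_avail(ENTROPY_AVAIL.read_text())
        print(f"entropy_avail={bits} low={entropy_is_low(bits)}")
    else:
        print("entropy_avail not found (not running on Linux?)")
```

Run early in guest init, this gives a quick way to log whether a given image boots in the low-entropy state described here.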

jeromegn · 2019-04-08T14:24:29Z

Not sure if this is the right place, but I found a workaround for the low initial entropy. It's not a great fix, as I don't know how good it is long-term... somebody who knows more about this can help fill in the gaps.

I created a Go program, translated from this Python script, which can be statically compiled and run sometime during the init phase: https://gist.github.com/jeromegn/ba3f694412979d21dafc9d625b8fcf04 (the translation might be wrong, I'm not great at Python!)

Example output during init:

[    0.264564] random: crng init done
Finished entropizing with 2272 of bad entropy. took 3.949527ms

Without this, a lot of applications simply won't run right. Hope this helps other people struggling with this.

Edit: This is bad entropy, do not use for anything serious!

raduweiss · 2019-04-09T07:32:19Z

> Finished entropizing with 2272 of bad entropy. took 3.949527ms

@jeromegn taking a look at the script, that message is very real. It's bad entropy, which should not be used for anything related to cryptography.

jeromegn · 2019-04-09T14:47:39Z

~~@raduweiss is it? My gut feeling says it's bad entropy too, but I just copied it (down to the log message) from the python source.~~

~~I'm reading conflicting opinions on the matter. This one seems to say it's fine: https://news.ycombinator.com/item?id=7361694~~

Edit: never mind, this is bad entropy. I'm seeding /dev/random from /dev/urandom with low entropy.

raduweiss · 2019-04-10T09:38:48Z

Yeah, I think we need to do better here. At a minimum, we need a doc with some suggestions on how to seed a fresh microVM with good entropy (which becomes really critical if you start lots of microVMs).

We'll look into the possibility of actually building a feature to address this in a painless way.

raduweiss · 2019-04-10T20:30:11Z

Quick answer in case it unblocks anyone: It looks like starting rngd very early in the guest OS boot sequence is a good solution for regular use cases (e.g., SSH, TLS). This holds if rngd is run just once at guest OS boot, and even if there are lots and lots of microVMs on a host.

This will be expanded upon when the PR associated with this issue comes up.

raduweiss · 2019-04-22T10:50:04Z

Another option is to pass some bytes from the host to the guest during microVM creation (e.g., by writing them to a guest-visible file, or by making them available in the MMDS).
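
As a rough illustration of this idea, the Python sketch below builds a small JSON document on the host that carries fresh random bytes, and decodes it again on the guest side. How the document reaches the guest (a file on a guest-visible drive, or the MMDS) is left out. The `entropy_seed` key and both function names are hypothetical, chosen only for this example.

```python
import base64
import json
import os


def make_seed_document(num_bytes: int = 64) -> str:
    """Host side: return a JSON document carrying fresh random bytes.

    The "entropy_seed" key is hypothetical; any name the guest agrees on works.
    """
    seed = os.urandom(num_bytes)  # bytes from the host's CSPRNG
    return json.dumps({"entropy_seed": base64.b64encode(seed).decode("ascii")})


def read_seed_document(document: str) -> bytes:
    """Guest side: decode the seed bytes from the JSON document."""
    return base64.b64decode(json.loads(document)["entropy_seed"])
```

Each microVM should receive its own freshly generated document. Reusing one seed across microVMs would give them correlated RNG input.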

petreeftime · 2019-04-24T10:54:45Z

Probably the correct way to go about this is to add RANDOM_TRUST_CPU to the default kernel config for Firecracker, or to add random.trust_cpu to the boot parameters, unless you have a better source of entropy to seed the VM from, via a virtio-rng device or something similar. See: https://lwn.net/Articles/760584/.

jeromegn · 2019-04-24T12:00:47Z

@petreeftime adding random.trust_cpu=on to the boot parameters does not appear to help. I'm not sure using CONFIG_RANDOM_TRUST_CPU would help if the boot param did not. Could be worth a shot though.

petreeftime · 2019-04-24T12:17:45Z

random.trust_cpu is pretty new; it was first available in Linux 4.19: torvalds/linux@9b25436. You should see this message if it was activated: random: crng done (trusting CPU's manufacturer).
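
A small Python sketch of both halves of this advice: appending `random.trust_cpu=on` to a kernel command line, and checking a captured kernel log for the activation message quoted above. The helper names and the example boot arguments in the usage note are assumptions for illustration. Only the parameter and the log message come from this thread.

```python
TRUST_CPU_ARG = "random.trust_cpu=on"
# Message quoted above, printed by Linux 4.19+ when the option takes effect.
TRUST_CPU_LOG = "random: crng done (trusting CPU's manufacturer)"


def with_trust_cpu(boot_args: str) -> str:
    """Append random.trust_cpu=on unless a random.trust_cpu setting already exists."""
    args = boot_args.split()
    if any(arg.startswith("random.trust_cpu=") for arg in args):
        return " ".join(args)  # respect an explicit on/off choice
    return " ".join(args + [TRUST_CPU_ARG])


def trust_cpu_activated(kernel_log: str) -> bool:
    """Return True if the kernel log contains the activation message."""
    return TRUST_CPU_LOG in kernel_log
```

For example, `with_trust_cpu("console=ttyS0 reboot=k panic=1")` returns the same string with ` random.trust_cpu=on` appended, and `trust_cpu_activated` can be run over the guest's `dmesg` output after boot.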

jeromegn · 2019-04-24T12:58:38Z

@petreeftime thanks a lot! I just compiled a 4.19.36 kernel with the config in this repo, then booted with the random.trust_cpu=on param, and it worked! Way faster boot.

wahern-splunk · 2019-08-05T22:18:58Z

What's wrong with virtio-rng? https://wiki.qemu.org/Features/VirtIORNG

bbros-dev · 2020-03-14T17:42:04Z

Just a note that having low/zero entropy on a VM can be good.

For instance, this 'feature' of the firecracker VM allowed us to isolate the cause of what were random instances of terraform just hanging (Terraform issue #24375).

When we got Terraform on its own in the VM, the hanging became reproducible: it was caused by waiting for the kernel entropy to reach a certain level, which it never did.

We started rng-tools and the hanging went away.

Just a war story to hopefully persuade you that whatever you add to increase entropy should be easily turned off.

I'd argue for the default behavior to be: no source of increased entropy, with some easy way to turn it on (and back off).

bbros-dev · 2020-03-14T18:00:18Z

I'd also like to cite issue #325 as another data point in favor of having zero/low entropy as a default.
Of course, this must be prominently documented alongside the 5-devices constraint, with suggestions about how to elevate the starting level of entropy.

While @sipsma was unhappy that his agent started to hang out of nowhere, the consistently low entropy in Firecracker is actually what allowed him to isolate the underlying source of his hang events (a 3rd party library that had introduced a reliance on the kernel level of entropy).

This is not uncommon. Once you know that a likely source of very early hanging, or seemingly random hanging, is the level of entropy, you can isolate the cause (strace, gdb).
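
One cheap way to test the "is it entropy?" hypothesis from inside the guest, before reaching for strace or gdb, is to ask the kernel without blocking. The Python sketch below is one such probe, assuming a Linux guest with Python 3.6+. Note that it only detects an uninitialized kernel RNG. It does not observe the `/dev/random` blocking-pool estimate, so a library that reads `/dev/random` on an older kernel can still hang even when this returns True.

```python
import os

# Value of GRND_NONBLOCK from <linux/random.h>; the fallback keeps the
# function importable on platforms where the os module lacks the constant.
GRND_NONBLOCK = getattr(os, "GRND_NONBLOCK", 0x0001)


def crng_ready(getrandom=None) -> bool:
    """Return True if the kernel RNG is initialized, without ever blocking.

    `getrandom` is injectable for testing; it defaults to os.getrandom,
    which is available on Linux only.
    """
    if getrandom is None:
        getrandom = os.getrandom
    try:
        getrandom(1, GRND_NONBLOCK)
    except BlockingIOError:
        # The kernel would have blocked: the RNG is not initialized yet.
        return False
    return True
```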

However, if high levels of entropy are injected by default, you are back to square one: random hanging events (whenever the starting entropy is still too low) that you can never reproduce to prove the cause is application or library xyz.

bbros-dev · 2020-03-14T18:13:39Z

Another data point in favor of consistently low entropy is a comment in firecracker-demo issue #23, which also makes clear that as long as you can reproduce the low-entropy condition, you can get an error message (or strace or debugger output) showing that the source of the behavior is the system's low entropy, and which library or application is responsible.

Remedies are known, but need to be documented prominently.

raduweiss · 2020-03-15T09:25:06Z

@bbros-dev that's an interesting angle. We'll keep it in mind.

bbros-dev · 2020-06-14T12:03:38Z

Just a note that it appears the kernel is moving toward removing the blocking pool of random numbers (currently /dev/random blocks when entropy falls 'too low').
Thankfully it sounds like the kernel authors are going to try to remove the kernel from the random number game - placing onus on user space: LWN 07-Jan-2020 - Removing the Linux /dev/random blocking pool

When this comes to pass it is likely VM guest users are going to look to the VM host to expose any RNG hardware it has.
It is unlikely cloud vendors are going to step where the kernel fears to tread.

Even if the kernel does not push RNG into the user's lap, and even if cloud vendors do provide access to some hardware RNG, it will be useful to document Firecracker best practices for a user to provide a VM instance with random seeds and update the entropy count appropriately - ideally KIS.

petreeftime · 2020-06-15T14:00:59Z

I am reading the article as the opposite: the kernel's CSPRNG is good enough that there's no need to have a special blocking pool and can safely rely on urandom for entropy for all things. I would advise towards adding an optional virtio-rng device to Firecracker that can be used to seed the guest CSPRNG with host entropy on boot or periodically.
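
If such a device were exposed, a guest could check for it through the kernel's hw_random sysfs interface. The Python sketch below assumes the usual `/sys/class/misc/hw_random/rng_available` file and assumes that a virtio-rng device shows up there with a name beginning with `virtio_rng`. Both are assumptions about the guest kernel, not something this thread confirms for Firecracker.

```python
from pathlib import Path

# hw_random sysfs file on Linux guests (assumed to be present).
RNG_AVAILABLE = Path("/sys/class/misc/hw_random/rng_available")


def parse_rng_available(text: str) -> list:
    """Split the whitespace-separated list of hardware RNG names."""
    return text.split()


def has_virtio_rng(names) -> bool:
    """Return True if any RNG name looks like a virtio-rng device (assumed prefix)."""
    return any(name.startswith("virtio_rng") for name in names)


if __name__ == "__main__":
    if RNG_AVAILABLE.exists():
        names = parse_rng_available(RNG_AVAILABLE.read_text())
        print(f"available={names} virtio_rng={has_virtio_rng(names)}")
    else:
        print("hw_random sysfs interface not found")
```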

bbros-dev · 2020-06-15T21:46:35Z

> I am reading the article as the opposite: the kernel's CSPRNG is good enough

From Lutomirski's patch which, at the time of writing, seems headed for mainline (bold added):

> It adds getentropy(..., GRND_INSECURE). This causes getentropy to
> always return something. There is no guarantee whatsoever that
> the result will be cryptographically random or even unique, but the
> kernel will give the best quality random output it can. The name is
> a big hint: the resulting output is INSECURE.

I think that if the kernel core is not guaranteeing a unique, or even a 'cryptographically random', random number, it would be a mistake (caveats follow) to add a layer of indirection to a VM that suggests there is a random source available. The caveats are that the device will return something if:

- It is connected to a hardware device on the host.
- It is provided with seed values by the user, and the user agrees to take responsibility for maintaining a pool of random data with haveged, rng-tools, or some similar utility.

What that something is should be for the hardware provider to account for, or for the user to take responsibility for if they insist on using virtio-rng without a hardware source.

The issue I've flagged is that right now the user can't just provide some random data at startup, and dust their hands, they have to provide code to update the entropy counter or they (more likely some library they use) will still block on /dev/random. That is, the example code @mcastelino provides above will still return 33.
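
To make "code to update the entropy counter" concrete: on Linux this is done with the `RNDADDENTROPY` ioctl on `/dev/random`, which mixes bytes into the pool and credits a caller-chosen number of bits. Plain writes to `/dev/random` or `/dev/urandom` mix the data in but credit nothing. Below is a minimal Python sketch, assuming an x86_64 or aarch64 guest (the ioctl number is computed for those ABIs), root privileges, and seed bytes that genuinely come from a trustworthy source such as the host. The function names are hypothetical.

```python
import fcntl
import os
import struct

# _IOW('R', 0x03, int[2]) from <linux/random.h>, as encoded on x86_64/aarch64.
RNDADDENTROPY = 0x40085203


def build_rand_pool_info(data: bytes, entropy_bits: int) -> bytes:
    """Pack struct rand_pool_info: entropy_count (bits), buf_size (bytes), buf."""
    if not 0 <= entropy_bits <= 8 * len(data):
        raise ValueError("cannot credit more bits than the data contains")
    return struct.pack("ii", entropy_bits, len(data)) + data


def add_entropy(data: bytes, entropy_bits: int, device: str = "/dev/random") -> None:
    """Mix `data` into the kernel pool and credit `entropy_bits`. Needs root."""
    payload = build_rand_pool_info(data, entropy_bits)
    fd = os.open(device, os.O_WRONLY)
    try:
        # Keep payloads small: fcntl.ioctl copies at most 1024 bytes.
        fcntl.ioctl(fd, RNDADDENTROPY, payload)
    finally:
        os.close(fd)
```

Crediting bits for data that is not actually random reproduces the "bad entropy" problem discussed earlier in this thread, so the credited amount should be conservative.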

The current workaround is to install some utility like haveged, but this goes against Firecracker's lightweight/performance objective.
Hopefully it is a temporary state of affairs that, in a Firecracker VM, you need to wait for haveged or similar in order to start up in many real-world use cases. (?)

Yes, this blocking behavior seems destined to change, but it's not because the kernel authors claim to have come up with some way of giving endless random data - rather, it appears they are preparing to step aside (correctly, in my view) and push responsibility to the user side to determine whether the data is random enough for that user's purpose.

Remember that with Firecracker we're in a fairly unique position: the keyboard, if present, has one key (IMO this is good and should not change), and you can run an app from memory, so you give up disk activity as an entropy source, effectively leaving you with device events at 4 unknown bits per event.

Since we're talking about documentation maybe it is worth thinking about removing the word generator since that gives the impression the device/Firecracker is actually creating (generating) something for you.

What I'm suggesting is a device that is some kind of passthrough utility - passing through the hardware vendor data, or the data the user provided at startup.

dianpopa · 2021-04-28T12:27:15Z

In the context of adding snapshot support for Firecracker, we have added documentation containing recommendations related to handling randomness when restoring multiple microVM clones from a single snapshot: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/random-for-clones.md. These recommendations also apply to the general use case (i.e., handling randomness for any microVM).

raduweiss added Documentation: Design labels Nov 27, 2018

mcastelino mentioned this issue Feb 7, 2019

Low entropy level using firecracker kata-containers/runtime#1167

Closed

alexandruag mentioned this issue Feb 15, 2019

Update FAQ #826

Closed

5 tasks

alexandruag added the Priority: Medium label Feb 15, 2019

jeromegn mentioned this issue Apr 8, 2019

rootfs creation documentation is inaccurate #1048

Closed

raduweiss added Priority: High and removed Priority: Medium labels Apr 10, 2019

raduweiss added Priority: Medium and removed Priority: High labels Apr 22, 2019

acatangiu mentioned this issue Aug 12, 2019

Running actual Lambda functions in microVMs firecracker-microvm/firecracker-demo#13

Closed

sipsma mentioned this issue Nov 7, 2019

Low entropy causes VM agent to hang indefinitely during start firecracker-microvm/firecracker-containerd#325

Closed

nmeyerhans mentioned this issue Mar 11, 2020

Support virtio-rng #1671

Closed

twelho mentioned this issue Jun 5, 2020

VMs have too low entropy, causes long delays weaveworks/ignite#613

Open

KarthikNedunchezhiyan mentioned this issue Nov 29, 2020

[Bug] Running NodeJs inside microVM fails #2311

Closed

dianpopa closed this as completed Apr 28, 2021

bchalios mentioned this issue Mar 22, 2023

[Bug] Unexpected high latency on first invocation of python's os.urandom #3549

Closed
