docs: Document Firecracker Entropy Approach and Practical Use #663
Comments
@raduweiss the entropy currently available in a Firecracker microVM is pretty low, which causes issues for some containers that need entropy.
So having an approach that provides a higher level of entropy would be good. /cc @nmeyerhans
Not sure if this is the right place, but I found a workaround for the low initial entropy. It's not a great fix as I don't know how good it is long-term... somebody who knows more about this can help fill in the gaps. I created a Go program, translated from this Python script, which can be statically compiled and run sometime during the init phase: https://gist.github.com/jeromegn/ba3f694412979d21dafc9d625b8fcf04 (the translation might be wrong, I'm not great at Python!) Example output during init:
Without this, a lot of applications simply won't run right. Hope this helps other people struggling with this. Edit: This is bad entropy, do not use for anything serious!
@jeromegn taking a look at the script, that message is very real. It's bad entropy, which should not be used for anything related to cryptography.
Edit: never mind, this is bad entropy. I'm seeding /dev/random from /dev/urandom with low entropy.
Yeah, I think we need to do better here. I think at a minimum we need to have a doc with some suggestions on how to seed a fresh microVM with good entropy (which becomes really critical if you start lots of microVMs). We'll look into the possibility of actually building a feature to address this in a painless way.
Quick answer in case it unblocks anyone: It looks like starting rngd very early in the guest OS boot sequence is a good solution for regular use cases (e.g., SSH, TLS). This holds even if it's run just once at guest OS boot, and even if there are lots and lots of microVMs on a host. This will be expanded upon when the PR associated with this issue comes up.
Another option is to pass some bytes from the host to the guest during microVM creation (e.g., by writing them to a guest-visible file, or by making them available in the MMDS).
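A minimal sketch of the "guest-visible file" variant of that idea, under stated assumptions: the host writes fresh random bytes to a file that ends up visible inside the guest, and an early guest init step writes them to /dev/urandom. The function names, the 64-byte seed size and the file paths are hypothetical, not part of Firecracker. Note that writing to /dev/urandom only mixes the bytes into the kernel pool; it does not raise the entropy estimate, so anything that blocks on the estimate may still block.

```python
import os

SEED_BYTES = 64  # assumption: an arbitrary seed size chosen for illustration


def write_seed_file(path: str, n: int = SEED_BYTES) -> bytes:
    """Host side: write n bytes from the host CSPRNG to a file readable only by its owner."""
    seed = os.urandom(n)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(seed)
    return seed


def mix_seed_into_pool(seed_path: str, device: str = "/dev/urandom") -> int:
    """Guest side: mix the seed into the kernel pool (no entropy is credited)."""
    with open(seed_path, "rb") as src:
        seed = src.read()
    with open(device, "wb") as dst:
        dst.write(seed)
    return len(seed)


if __name__ == "__main__":
    # Hypothetical path; on a real setup the file would live on a drive
    # or rootfs image that the guest can read.
    write_seed_file("/tmp/fc-seed.bin")
    print(mix_seed_into_pool("/tmp/fc-seed.bin"), "bytes mixed in")
```

Each microVM should get its own seed; reusing one seed file across many guests would defeat the purpose.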
Probably, the correct way to go about this is to add |
@petreeftime adding |
@petreeftime thanks a lot! I just compiled a 4.19.36 kernel with the config in this repo and then booted with the random.trust_cpu=on param and it worked! Way faster boot.
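For anyone who wants to confirm from inside the guest that the parameter actually reached the kernel, here is a small sketch that parses /proc/cmdline. The helper names are made up for illustration, the example boot arguments are only illustrative, and only the exact spelling used above (random.trust_cpu=on) is checked; other boolean spellings the kernel may accept are not handled.

```python
from pathlib import Path
from typing import Dict, Optional


def cmdline_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def trust_cpu_enabled(cmdline: str) -> bool:
    """True only for the exact form 'random.trust_cpu=on'."""
    return cmdline_params(cmdline).get("random.trust_cpu") == "on"


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/cmdline exists.
    print(trust_cpu_enabled(Path("/proc/cmdline").read_text()))
```

For example, `trust_cpu_enabled("console=ttyS0 reboot=k panic=1 pci=off random.trust_cpu=on")` evaluates to `True`.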
What's wrong with virtio-rng? https://wiki.qemu.org/Features/VirtIORNG
Just a note that having low/zero entropy on a VM can be good. For instance, this 'feature' of the Firecracker VM allowed us to isolate the cause of what were random instances of Terraform just hanging (Terraform issue #24375). When we got Terraform on its own in the VM, the hanging became reproducible: it was caused by waiting for the kernel entropy to reach a certain level, which it never did.
Just a war story to hopefully persuade you that whatever you add to increase entropy should be easy to turn off. I'd argue for the default behavior to be: no source of increased entropy, with some easy way to turn it on (and back off).
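When chasing this kind of hang, it can help to look at the guest's entropy state directly. A hedged sketch, with hypothetical helper names: it reads the kernel's estimate from /proc/sys/kernel/random/entropy_avail and asks getrandom(2) in non-blocking mode whether the CSPRNG has been initialized. What counts as "too low" depends on the kernel version and the application, so no threshold is assumed here.

```python
import os
from pathlib import Path
from typing import Optional

ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy_avail(path: Path = ENTROPY_AVAIL) -> Optional[int]:
    """Return the kernel's entropy estimate in bits, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def csprng_ready() -> Optional[bool]:
    """True if getrandom(2) would return now, False if it would block,
    None on platforms where os.getrandom is unavailable."""
    if not hasattr(os, "getrandom"):
        return None
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


if __name__ == "__main__":
    print("entropy_avail:", read_entropy_avail())
    print("csprng_ready:", csprng_ready())
```

Running this early in guest boot, before and after any seeding step, gives a quick way to tell whether a hang lines up with a starved pool.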
I'd also like to cite issue #325 as another data point in favor of having zero/low entropy as a default. While @sipsma was unhappy that his agent started to hang out of nowhere, the consistently low entropy in Firecracker is actually what allowed him to isolate the underlying source of his hang events (a 3rd party library that had introduced a reliance on the kernel level of entropy). This is not uncommon. Once you know that a likely source of very early hanging, or seemingly random hanging, is the level of entropy, you can isolate the cause ( However, if high levels of entropy are injected by default, you are back to square one: random hanging events (whenever the starting entropy is still too low) that you can never reproduce to prove it is application/library xyz.
Another data point in favor of consistently low entropy: a comment in firecracker-demo issue #23 also makes clear that as long as you can reproduce the low entropy condition you can get an error msg (or Remedies are known, but need to be documented prominently.
@bbros-dev that's an interesting angle. We'll keep it in mind.
Just a note that it appears the kernel is moving toward removing the blocking pool of random numbers (currently When this comes to pass it is likely VM guest users are going to look to the VM host to expose any RNG hardware it has. Even if the kernel does not push RNG into the user's lap, and even if cloud vendors do provide access to some hardware RNG, it will be useful to document Firecracker best practices for a user to provide a VM instance with random seeds and update the entropy count appropriately - ideally keeping it simple.
I am reading the article as the opposite: the kernel's CSPRNG is good enough that there's no need to have a special blocking pool, and it is safe to rely on urandom for entropy for all things. I would advise adding an optional virtio-rng device to Firecracker that can be used to seed the guest CSPRNG with host entropy on boot or periodically.
From Lutomirski's patch which, at the time of writing, seems headed for mainline (bold added):
I think if the kernel core is not guaranteeing a unique or even a 'cryptographically random' RN, it would be a mistake (caveats follow) to add a layer of indirection to a VM that suggests there is a random source available. The caveats are that the device will return some
What that The issue I've flagged is that right now the user can't just provide some random data at startup and dust their hands: they have to provide code to update the entropy counter, or they (more likely some library they use) will still block on The current workaround is to install some utility like haveged, etc., but this goes against Firecracker's lightweight/performance objective. Yes, this blocking behavior seems destined to change, but it's not because the kernel authors claim to have come up with some way of giving endless random data - rather it appears they are preparing to step aside (correctly in my view) and push responsibility to the user side to determine whether the data is random enough for that user's purpose. Remember that with Firecracker we're in a fairly unique position: the keyboard, if present, has one key (IMO this is good and should not change), and you can run an app from memory, so you give up disk activity as an entropy source, effectively leaving you with device events at 4 unknown bits per event. Since we're talking about documentation maybe it is worth thinking about removing the word What I'm suggesting is a device that is some kind of passthrough utility - passing through the hardware vendor data, or the data the user provided at startup.
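To make "code to update the entropy counter" concrete, here is a minimal sketch of the Linux RNDADDENTROPY ioctl, which both mixes data into the pool and credits a caller-stated number of bits. Assumptions to note: the request value 0x40085203 is the x86-64 / aarch64 encoding and differs on some other architectures; the helper names and the conservative 1024-byte argument limit are choices made for this sketch; the call needs root (CAP_SYS_ADMIN); and the credited bit count is only as honest as the data source, so crediting low-quality bytes reproduces the "bad entropy" problem discussed earlier in this thread.

```python
import fcntl
import struct

# _IOW('R', 0x03, int[2]) as encoded on x86-64 and aarch64 (assumption for other arches).
RNDADDENTROPY = 0x40085203


def pack_rand_pool_info(data: bytes, entropy_bits: int) -> bytes:
    """Build struct rand_pool_info: int entropy_count, int buf_size, then the bytes."""
    if not 0 <= entropy_bits <= 8 * len(data):
        raise ValueError("cannot credit more bits than the data contains")
    if len(data) + 8 > 1024:
        # Conservative limit chosen for this sketch to keep the ioctl argument small.
        raise ValueError("seed too large for a single call in this sketch")
    return struct.pack("ii", entropy_bits, len(data)) + data


def credit_entropy(data: bytes, entropy_bits: int, device: str = "/dev/random") -> None:
    """Mix data into the kernel pool and credit entropy_bits. Requires root."""
    with open(device, "wb") as rng:
        fcntl.ioctl(rng, RNDADDENTROPY, pack_rand_pool_info(data, entropy_bits))
```

A guest init step could call `credit_entropy(seed, 8 * len(seed))` with a seed passed in from the host, assuming the host bytes are genuinely random and unique per microVM.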
In the context of adding snapshot support for Firecracker, we have added documentation containing recommendations related to handling randomness when restoring |
We got this very good question on Slack:
Since we spent some time thinking about this, we should document the answer (probably in the FAQ).