Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Choose stronger encryption by default and/or re-use encryption parameters of LUKS container #1539

Closed
UndeadDevel opened this issue Nov 25, 2023 · 25 comments · Fixed by #1541
Closed

Comments

@UndeadDevel
Copy link
Contributor

UndeadDevel commented Nov 25, 2023

Is your feature request related to a problem? Please describe.

  1. Currently re-encrypting the LUKS container from Heads seems to choose default parameters, which are no longer recommended (argon2i with 1 GB memory cost). E.g. the IRTF recommendation states that argon2id with a 6 GB memory cost and parallelity of 4 is recommended for disk encryption; 6 GB might be problematic due to dom0 having only 4 GB of RAM by default, so it would be difficult to change the passphrase from dom0 later in that case (but maybe the user shouldn't do that anyway), but 2 GB would definitely work, or even 3 or perhaps 4, as there is typically 4 GB of swap in dom0 also. 2 GB is recommended in that RFC for general use cases. Other OSs Heads runs on shouldn't have a problem even with 6GB, but see below under "Additional context".
  2. Another problem with the current, weaker defaults is the theoretical possibility of compromising the container using an old header or copy of the current volume encryption key encrypted with weaker parameters, even after the user has changed to stronger parameters later in dom0, if Heads is encrypting containers on COW media, such as SSDs, which are widely used these days and even recommended by QubesOS. See e.g. the cryptseptup FAQ point 5.19 and others.
  3. There's also an open issue in the cryptsetup gitlab that is related: does Heads first create the new key (for the keyslots, not the volume key) and only then re-encrypt (which always creates a new volume key first)? This is the order that is recommended, because otherwise the chance of the old keyslots lying around is greater and the new volume key would first be encrypted with weaker settings; when it is later re-encrypted in the header upon changing the params and keyslots, on COW media, such as SSDs, the old, more weakly encrypted copy of the new volume key may linger.

Describe the solution you'd like
I think probably the best solution would be to read out the parameters currently used for that container (luksDump) and then either re-use them or prompt the user, telling him the current parameters and whether they are outdated (e.g. pbkdf2 instead of argon2 being used; memory cost too low, e.g. <2 GB if argon2 is used; argon2 being used in weaker variants, i.e. not argon2id). Maybe the user could also be prompted for the iter-time (-i=milliseconds parameter; default is 2000), so that the users can choose their own security / convenience tradeoff, especially since the recommended 6 GB memory are currently not possible with cryptsetup (see "Additional context").

Describe alternatives you've considered
An alternative that is still better than the current situation would be to use stronger defaults, at mimimum argon2id with 2 GB memory cost (and iter time of >=2000, but 2000 is already the default with LUKS2).

Additional context
Do note that currently cryptsetup seems to only support a memory cost of up to 4 GB and parallelity of 4. It will also treat those two parameters (--pbkdf-memory=numBytes and --pbkdf-parallel=numThreads) as a maximum; a parallelity of 4 is used as default by cryptsetup; should the system for some weird reason have e.g. less than 4 GB available while 4 GB are given as the parameter or fewer than 4 cores, then cryptsetup will lower the actual parameters used.

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 25, 2023

@UndeadDevel first things first, thanks for the good challenge on implementation.

I just want to make sure we understand what is done today.

Two cryptsetup modules exist under heads. Old modules/cryptsetup being 1.75 and newer modules/cryptsetup2 currently being 2.4.0

All boards configs are including modules/cryptsetup2 as of today.

Now, the reencrypting functions we are talking about have limited arguments forced by Heads today: https://github.com/linuxboot/heads/blob/master/initrd/etc/luks-functions#L350

As pointed at line 350 :

cryptsetup-reencrypt -B 64 --use-directio "$LUKS" --key-slot 0 --key-file /tmp/luks_current_Disk_Recovery_Key_passphrase

This choices were made based on a tradeoff back then where assumptions made by cryptsetup were wrong and where ETA were otherwise exploding.

Note that the passphrase stored under /tmp/ could be moved to /tmp/secrets to be wiped on recovery shell access even more securely by default codepath or on kexec path as well, but the assumption here again is that tmp is ramfs of initrd and recreated on reboot or overwritten on final os boot in all case, where the owner of the computer is supposed to be behind the computer and alone when doing re-ownership wizard. Also note the shred calls that follows usage in all case. The passphrase is not the key. The passphrase creates/unlocks the key.

Slot 0 is forced here because the goal is to not add another key, but make sure that the installation key is overwritten in the best way possible. As you pointed out with 5.19 documentation of SSD drives from cryptsetup, there is not much more that can be done but overwrite it and leave physical forensic possible with soldering implied. Using SSD firmware secure erase was debated in other issues and under cryptsetup for years, and SSD doesn't offer better option then using encryption for the blocks. As also present on cryptsetup doc, entropy of passphrase is the important part here. So Heads again does the right thing here if a user decides to add a sealed TPM Disk Unlock Key, which passphrase is then a 128 random characters, sealed into TPM with firmware measurements, luks header measurements and runtime (loaded kernel modules, no recovery shell access) to be possible to be unsealed with a user selected passphrase which can then be of lower entropy, rate limited by tpm with a fallback of having final OS prompt for LUKS if heads doesn't know how to deal with that LUKS container. That TPM DUK generated passphrase is then unsealed and injected in kexec reconstructed final initrd which is passed to final Os and used to decrypt LUKS container without the user having to type high entropy passphrase in untrusted environment.

Argon is reused if OS created LUKS container with it in version 2.4.0. Heads doesn't interfere with OS distribution choices as of today. Note, though, that this might need to be revisited (and cryptsetup2 module version bumped) if new platforms not having CPU AES extension are to be added under Heads. If today, OS decides to create LUKS container with Adiantum, then Heads will fail reencrypting it simply because neither Linux kernel nor cryptsetup 2.4.0 supports it. This impacts other architectures then x86/power and X86 i3 older generation CPU and was also discussed in other issues.

Tldr: cryptsetup-reencryption under heads does not replace OS choices nor user selected settings as of today. Heads attempts to replace in place slot0 (directio, sector of 64 bytes) to mitigate OEM deployment bad practices and/or limit its liability to the moment user receives a computing device and decides to stop trusting the OEM/third party deployed keys from the moment of re-ownership. When a user installs an OS with LUKS settings that might not be supported by current version of cryptsetup deployed under Heads, cryptsetup-reencrypt calls will bail/fail. Maybe currently with not enough helpful information though.

Todo: there is still one issue you tackled under another issue under nitrokey repo, that is, those luks functions should not create any file under /boot. This is still a bug because not all boards permit TPM sealed Disk Unlock Key. This will need refactoring and deserve a seperate issue.
The codebase deals with that "encrypted disk presence file" as if TPM DUK was enforced, which is not true in all cases.

Note: new LUKS functions have been deployed recently to create LUKS container on USB thumb drive to store gpg key material backup, enabling+forcing authentication for recovery shell and USB boot. Those functions hardcode the best practices borrowed from qubesos LUKS container creation of their installer media.
As of today, this is at https://github.com/linuxboot/heads/blob/master/initrd/etc/luks-functions#L253:

DO_WITH_DEBUG cryptsetup --batch-mode -c aes-xts-plain64 -h sha256 -s 512 -y luksFormat ${DEVICE}1 --key-file <(echo -n "${PASSPHRASE}") > /dev/null 2>&1 || die "blah blah" 

As pointed by cryptsetup documentation, that LUKS container would not be really secure if the passphrase is left to the OEM factory defaults, which reuses GPG Admin PIN which by default is 12345678.
Therefore the same issues are present here then with a "PleaseChangeMe" original passphrase. The OEM factory Reset tries to give power back to the user, but to do so, the user still needs to understand that accepting defaults might not respond to his threat model and therefore, he might want to invest into going the advanced way, which now surveys him into doing the right thing and setup individual passphrases for each secrets.... Hopefully.

Edit:many.

@UndeadDevel
Copy link
Contributor Author

UndeadDevel commented Nov 25, 2023

@tlaurion Thanks for the detailed reply!

Now, the reencrypting functions we are talking about have limited arguments forced by Heads today.
This choices were made based on a tradeoff back then where assumptions made by cryptsetup were wrong and where ETA were otherwise exploding.

Regarding that line: I don't see a -B option in the man pages...if this is block size then that's not applicable for LUKS2 volumes as it was a LUKS1 option. Otherwise it seems that Heads lets the (old) cryptsetup version choose the parameters, which explains why it defaults to argon2i; note that the current version and all prior versions for the last 2 years use argon2id as the default (see, e.g. here). I'll also quote that RFC again in their final recommendations:

The Argon2id variant with t=1 and 2 GiB memory is the FIRST RECOMMENDED option and is suggested as a default setting for all environments. This setting is secure against side-channel attacks and maximizes adversarial costs on dedicated brute-force hardware. The Argon2id variant with t=3 and 64 MiB memory is the SECOND RECOMMENDED option and is suggested as a default setting for memory-constrained environments.

"t=1" refers to iterations, which cryptsetup binds to 4 as minimum by default, which is even better and exceeds the recommendations; not that it's overkill since, as I've stated, the specific RFC recommendation for disk encryption (different from above general recommendation) is 6 GB memory cost and increasing iterations does to some degree allow compensating for lower memory (though if possible, the memory recommendation should be fulfilled before increasing iterations, which the RFC also states elsewhere).

So to have argon2id used by Heads either the cryptsetup version needs to be bumped to one from the last 2 years or the pbkdf needs to be supplied with an additional parameter (--pbkdf=argon2id). A version bump also makes sense due to CVE-2021-4122 (see cryptsetup FAQ point 5.23), which is fixed in cryptsetup 2.4.3 (Edit: though I think a Heads user might actually at least notice that an attack is being attempted, since IIUC the LUKS header would have to be modified by the attacker and Heads measures the header and thus would warn the user about modifications, right?); that version also makes argon2id the default I think.

As you pointed out with 5.19 documentation of SSD drives from cryptsetup, there is not much more that can be done but overwrite it and leave physical forensic possible with soldering implied.

Honestly I think that this is still not a good approach at all...it would be much better if at least some random, higher entropy password (e.g. 7 words from word list or ~20 random characters) was generated for each user separately by OEM and delivered by mail (physical with notebook / PC or email), but that's a problem for OEM, not Heads itself, per se.
I agree that the feature to have a TPM disk unlock key is a great addition.

Heads doesn't interfere with OS distribution choices as of today.
[...]
cryptsetup-reencryption under heads does not replace OS choices nor user selected settings as of today

Maybe I'm misunderstanding what you're saying, but Heads does use weaker settings than what the user has set in the OS in some cases; e.g. I had put a new key into slot 0 from dom0 with argon2id and 2 GB memory cost and then rebooted and done Heads LUKS re-encryption. Here Heads created a new key for slot 0 with argon2i (much weaker) and 1 GB instead of 2 GB (weaker) instead of reusing the parameters of the existing container; if cryptsetup were bumped to a more current version (>=2.4.3) then it would have chosen argon2id, at least.
I do agree that if using a high entropy passphrase, either with TPM DUK or simply a good disk recovery passphrase, then the existing config should be good enough. However, I think the recommendations should nevertheless be approached, if possible (i.e. use argon2id and higher memory cost).

Todo: there is still one issue you tackled under another issue under nitrokey repo, that is, those luks functions should not create any file under /boot. This is still a bug because not all boards permit TPM sealed Disk Unlock Key. This will need refactoring and deserve a seperate issue.
The codebase deals with that "encrypted disk presence file" as if TPM DUK was enforced, which is not true in all cases.

This is what #1536 is about; I simply described it from my user experience; I didn't check if it was due to any files created, though that sounds plausible.

Note: new LUKS functions have been deployed recently to create LUKS container on USB thumb drive to store gpg key material backup, enabling+forcing authentication for recovery shell and USB boot.

Awesome feature btw., I'm looking forward to using it!

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 26, 2023

@UndeadDevel Not time for a PR yet, but testing things under testing branch https://github.com/tlaurion/heads/tree/cryptsetup_version_bump-reencryption_cleanup and trying to document in commits done there.

Note that argonid is not provided by external argon2 library but internal included one (otherwise heavier footprint) and where crypto backend is kernel, not libgrypt (faster hw acceleration provided, lower rando compared to libgcrypt :/) but where urandom is now fed by intel hw rdrand as per #1478

Look at modules/cryptsetup2


Commit: 53aab6f

With -B 64 removed with 2.4.3 reencrypted:
signal-2023-11-26-133020

Slower reencryption speed: stabilizes at 110 mb/s and ETA is misleading.


With master https://app.circleci.com/pipelines/github/linuxboot/heads/700/workflows/cd7734e5-5b90-47be-a53b-98bcf1eeed6a/jobs/13596/artifacts rom

Commit:

cryptsetup2_version := 2.3.3

cryptsetup 2.3.3, -B 64:
Faster reencryption speed, stable at around 138mb/s and ETA not misleading.
signal-2023-11-26-140106

luksDump:
signal-2023-11-26-140706

@tlaurion
Copy link
Collaborator

@UndeadDevel so confirmed that on master, reencrypting passes to argon2i
signal-2023-11-26-140955

@tlaurion
Copy link
Collaborator

Houla @UndeadDevel

Speeds are way slower under cryptsetup 2.6.1, maybe having to do with new defaults and/or directio
signal-2023-11-26-151628

This is based on tlaurion@61636a5

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 26, 2023

Same result removing --direct-io on Samsung 860 1TB drive I had lying around with tlaurion@9877d46

signal-2023-11-26-151628

Note that speed is slowly increasing, now at 67 MiB/s, but at that speed... But I highly doubt to reach speeds that are supposed to be around 200 MiB/s on those SSD drives normally with master and with TPM DUK and a safe 7 words diceware passphrase.

Reencrypting, at this speed, becomes an anti-feature for re-ownership.

@UndeadDevel Why reencrypt OEM install if it takes the same time to reinstall fresh?

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 26, 2023

This issue is tricky.

Let me explain.

  • Hardware limitations should impact ciphers decision. Eg: gen3 i3 don't support aes acceleration by CPU and Adendium should be used instead.
  • OS should do assessment of what hardware support, align LUKS container and sector size as discussed on QubesOS forum https://forum.qubes-os.org/t/ssd-maximal-performance-native-sector-size-partition-alignment/10189
    • This is complicated.
  • So in the reencryption here, since alignment are corrected if they were invalid, reencryption might take longer and create disapointment. But there is not much we can do here then force proper settings?
    • Should Heads support Adendium, assess with benchmark first and then take decision for the user and maybe break support of the final OS being able to decrypt LUKS from its initrd?

@UndeadDevel those are the thoughts i'm having letting the reencryption of the two devices complete and attempt to reencrypt the first one without --direct-io, and the second one with direct-io. Otherwise, we need to acknowledge that newer versions of cryptsetup are taking radical steps enforcing better defaults that should be applied at OS installation (Newer OSes will follow) but as of now, second laptop (1TB drive) was a test install of Qubes 4.2.

If Qubes 4.2 has not revised their installation defaults/sector alignments/disk geometry alignements and Heads is having more recent version then OS deploys, we might be better then for our own goods :/

  • Qubes OS 4.1: cryptsetup 2.3.5
  • Qubes OS 4.2: 2.6.1
  • Other OSes?
    So. What are we trying to fill as user needs here?

So not sure what to do here but going back to the old thread I opened on QubesOS and try to talk with the experts.
Maybe you can jump into the discussion there as well?

Passed my sunday trying to figure out what to do here.
As always, you can click the sponsor button/give through Insurgo Initiative.
That's all I got on free community time investment.

Will check results on both drives, test with cryptsetup status and cryptsetup benchmark but as of now, I do not have answers to provide other then cryptsetup 2.6.1 PoC permitting to have testers join the effort.

Otherwise, PR welcome!

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 26, 2023

So reencrypting Q4.1 installation resulted into a LUKS container that was still argon2i. I guess that is good. We should not interfere with OS initrd support. This is why reinstallation is needed for non-experienced users once in a while, unfortunately (Q4.0 differs from 4.1 which differs from 4.2 defaults on so many disk layout/pool specs/encryption/trimming behaviors) which affect security/performance/resilience.

But that does not explain speed of Q4.2 reencryption speeds. Stopped reencryption. Will reinstall Q4.2 over MX500 drive I have extended comparative info from the past and stick to that for testing.

I guess --direct-io is still needed. Will re-add that.

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 26, 2023

@UndeadDevel
Copy link
Contributor Author

UndeadDevel commented Nov 26, 2023

Regarding the command options: all I know is that I can't find the -B parameter in the man pages, so I'm just wondering what it even does; it looks like it might be about block size, but the man page states that this is only for LUKS1. As for --use-directio this is also a LUKS1 option only, so it will be ignored when dealing with a LUKS2 volume (which is what happens if creating a new LUKS container). Therefore leaving out those two parameters is not expected to change anything when dealing with LUKS2 volumes, as they will simply be ignored (assuming that man pages are correct). No OEM should ship a LUKS1 volume these days.

Regarding (re-)encryption speed: I don't know why it's that much slower with the new version in your tests, but having the option to re-encrypt still makes sense even if slow, e.g. when receiving the device from OEM and not having a good, secure system to create the installation media on: it would be preferable to have a known good install in that case. Furthermore, the main use case for re-encryption seems to be after receiving the machine from OEM; it's not necessary AFAIU to do this for every factory reset, so I don't think it's a huge issue if it is slower; at least for me personally I much prefer it to be slower than less secure. Bumping the cryptsetup version also has other benefits as we've already discussed here.

Also I'm a bit puzzled, as on my SSD (Samsung 980 Pro 1 TB) it took less than 15 minutes to re-encrypt (with Nitrokey Heads 2.1)...presumably the LUKS container grows as I put more data on it due to the thin provisioning of next layer (LVM)? Even in that case, speed for me would have been > 200 MB/s (I didn't pay attention to it then and I can't re-test right now; late next week I can test re-encryption again).
I've found this regarding measures in the Linux kernel, which are supposed to speed up LUKS encryption and are apparently included since 5.10.9.

Lastly, a note on the warning message informing the user of the re-encryption starting (nitpick):

  1. Since it's a warning it should actually warn the user of something, e.g. not to do a hard reboot; if it's not a warning, then it shouldn't state "Warning". This becomes relevant with:
  2. "current passphrase" is a slight semantic miss (it's possible to take that to mean the old passphrase, especially when combined with the "Warning" statement, as the process engaged in by the user involves changing the passphrase, so proper, non-ambiguous terminology would be, e.g., "new" and "old"); also, the object is incorrectly phrased: "Recovery Disk Key passphrase" suggests that there is a "Recovery Disk", which requires a key and the passphrase is to protect that key.

IMHO the proper phrasing would be "new Disk Recovery Key passphrase" instead of "current Recovery Disk Key passphrase".

@tlaurion
Copy link
Collaborator

@UndeadDevel similar comments from https://forum.qubes-os.org/t/ssd-maximal-performance-native-sector-size-partition-alignment/10189/55

As per latest tests:

cryptsetup reencrypt --force-offline-reencrypt --disable-locks "$LUKS" --debug --key-slot 0 --key-file /tmp/luks_current_Disk_Recovery_Key_passphrase

Results in reencryption still in online mode of 31 MiB/s. Will open issue on cryptsetup gitlab soon with proper output log. This is annoying.

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 27, 2023

I've found this regarding measures in the Linux kernel, which are supposed to speed up LUKS encryption and are apparently included since 5.10.9.

Lastly, a note on the warning message informing the user of the re-encryption starting (nitpick):

1. Since it's a warning it should actually warn the user of something, e.g. not to do a hard reboot; if it's not a warning, then it shouldn't state "Warning". This becomes relevant with:

2. "current passphrase" is a slight semantic miss (it's possible to take that to mean the old passphrase, especially when combined with the "Warning" statement, as the process engaged in by the user involves changing the passphrase, so proper, non-ambiguous terminology would be, e.g., "new" and "old"); also, the object is incorrectly phrased: "Recovery Disk Key passphrase" suggests that there is a "Recovery Disk", which requires a key and the passphrase is to protect that key.

IMHO the proper phrasing would be "new Disk Recovery Key passphrase" instead of "current Recovery Disk Key passphrase".

I will look up on those in next steps. Thanks!

@tlaurion
Copy link
Collaborator

I've found this regarding measures in the Linux kernel, which are supposed to speed up LUKS encryption and are apparently included since 5.10.9

Will dig that down and might explain things since kernel on tested laptop (as for most others outside of librems and nv41/ns50) are based on 5.10.5

@tlaurion
Copy link
Collaborator

@UndeadDevel continuing attempts under https://github.com/tlaurion/heads/tree/cryptsetup_version_bump-reencryption_cleanup where tlaurion@755c392 version bump kernel from 5.10.5 -> 5.10.201 with board config changes and config changes.

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 27, 2023

As for -B or --block-size=MiB, or -b or --size=SECTORS, those are still present under cryptsetup 2.6.1. -data-unit-size replaces -B. Seems like logical size needs to be replaces by physical size.

Recommendations are to check smatctl -i output which goes striahgt back about how to deal with the problem as reported to qubes forum

[user@dom0 ~]$ sudo smartctl -i /dev/sda

smartctl 7.2 2021-01-17 r5171 [x86_64-linux-6.1.43-1.qubes.fc32.x86_64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT500MX500SSD1
Serial Number:    REPLACED_STRING
LU WWN Device Id: REPLACED_STRING
Firmware Version: M3CR023
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Nov 27 00:26:09 2023 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

So we have physical sector size of 4096 where logical is 512 and where 2.6.1 is by default using 512 again.

I'm getting lost and tired of this rabbit hole. Will regroup and try to figure out how to question this correctly to cryptsetup devels. Seems like we have to override detection here otherwise doing the wrong thing.

@UndeadDevel
Copy link
Contributor Author

That command doesn't even show me sector sizes on my nvme, but sudo fdisk -l (from dom0) says:

Disk model: Samsung SSD 980 PRO 1TB                 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos

Might help explain why I've had much better performance...

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 27, 2023

Regarding the command options: all I know is that I can't find the -B parameter in the man pages, so I'm just wondering what it even does; it looks like it might be about block size, but the man page states that this is only for LUKS1. As for --use-directio this is also a LUKS1 option only, so it will be ignored when dealing with a LUKS2 volume (which is what happens if creating a new LUKS container). Therefore leaving out those two parameters is not expected to change anything when dealing with LUKS2 volumes, as they will simply be ignored (assuming that man pages are correct). No OEM should ship a LUKS1 volume these days.

Absolutely right there. The only parameters relevant to 2.6.1 are --sector-size=bytes, where 2.7.0 will reintegrate equivalent of -B --block-size=MiB through --data-unit-size=bytes. --use-directio also only applies to luks1.

When looking at debug output and disk activity, chunks rewritten to disk are way slower to write to disk than before. This makes me think that something is wrong with configured options. Some internals changed, including sse aes support which is now optional. cryptsetup guidelines say that cpu should not impact much since normally reencrypting is tugging on IO more then CPU, but this is clearly not the case here, at least on 3rd gen cpu like for the x230.

Enabling all kernel options for benchmarking cryptsetup as well as adding sse acceleration for aes on argon internal library. Note once again that cryptsetup is compiled to use kernel crypto backend, that internal argon library is used instead of external one (for firmware size footprint) and that libgcrypt is considered slower then the kernel if cpu acceleration is enabled. Also note that cryptsetup uses /dev/urandom here (unless /dev/random specified) where again hw crytpo backed acceleration is activated in kernel config for a while now.

Testing new install of Q4.2rc5 and redoing with latest roms with configure option and all interesting crypto in kernel.

Continuing https://github.com/tlaurion/heads/tree/cryptsetup_version_bump-reencryption_cleanup with
tlaurion@84b7f99
tlaurion@9e231f5

Regarding (re-)encryption speed: I don't know why it's that much slower with the new version in your tests, but having the option to re-encrypt still makes sense even if slow, e.g. when receiving the device from OEM and not having a good, secure system to create the installation media on: it would be preferable to have a known good install in that case. Furthermore, the main use case for re-encryption seems to be after receiving the machine from OEM; it's not necessary AFAIU to do this for every factory reset, so I don't think it's a huge issue if it is slower; at least for me personally I much prefer it to be slower than less secure. Bumping the cryptsetup version also has other benefits as we've already discussed here.

Agreed but as said.... 26 MiB/s is unacceptable still. That is 2.5 hours to reencrypt disk. Like I said, reinstalling would take less time and defeats the purpose. The tolerance point when this feature was created was that the disk should be reencrypted under one hour. It was actually reencrypted on chosen SSD under 30 minutes before for same disk, same cpu same memory.

Also I'm a bit puzzled, as on my SSD (Samsung 980 Pro 1 TB) it took less than 15 minutes to re-encrypt (with Nitrokey Heads 2.1)...presumably the LUKS container grows as I put more data on it due to the thin provisioning of next layer (LVM)? Even in that case, speed for me would have been > 200 MB/s (I didn't pay attention to it then and I can't re-test right now; late next week I can test re-encryption again). I've found this regarding measures in the Linux kernel, which are supposed to speed up LUKS encryption and are apparently included since 5.10.9.

nvme and higher end CPU hides the bottlenecks on newer hardware. Unfortunately that won't fly for most users of Heads.

Lastly, a note on the warning message informing the user of the re-encryption starting (nitpick):

1. Since it's a warning it should actually warn the user of something, e.g. not to do a hard reboot; if it's not a warning, then it shouldn't state "Warning". This becomes relevant with:

2. "current passphrase" is a slight semantic miss (it's possible to take that to mean the old passphrase, especially when combined with the "Warning" statement, as the process engaged in by the user involves changing the passphrase, so proper, non-ambiguous terminology would be, e.g., "new" and "old"); also, the object is incorrectly phrased: "Recovery Disk Key passphrase" suggests that there is a "Recovery Disk", which requires a key and the passphrase is to protect that key.

IMHO the proper phrasing would be "new Disk Recovery Key passphrase" instead of "current Recovery Disk Key passphrase".

I guess you are talking about the wording of the Re-ownership wizard which matches vocabulary outlined here https://osresearch.net/Keys/#luks-disk-encryption-key
signal-2023-11-26-151628

The end user is told the context here (passphrase that was setuped at OS installation).

You can raise another issue on that one if more unifying/simplification is needed, where PR would be more appreciated.

@UndeadDevel
Copy link
Contributor Author

UndeadDevel commented Nov 27, 2023

nvme and higher end CPU hides the bottlenecks on newer hardware. Unfortunately that won't fly for most users of Heads.

Well...as I see it if you don't trust modern hardware at all then your system will soon (next 5-10 years) be unusable anyway; furthermore, if security is the reason for staying with old hardware, which already entails plenty of sacrifices, wouldn't a 2 hour wait be acceptable if it means more security? As I already pointed out, the problem with re-install is that for those who are buying their first secure system it will be difficult to create a trusted USB drive as installation medium, so running re-encrypt from Heads still makes sense here, even if installing were faster; furthermore, re-encryption is not something that normally has to be done very often; in fact, with good opsec practices the main point would be after receiving the device from OEM and setting up the secrets (re-ownership); on further factory resets it can likely be skipped.

I can't re-test under Heads right now, but I ran a cryptsetup reencrypt with version 2.6.1 in a domU for testing purposes with following commands:
dd if=/dev/urandom of=myvol.luks.dat bs=1M count=50k status=progress which started out at ~250 MB/s and climbed to ~370 MB/s
sudo cryptsetup luksFormat myvol.luks.dat
sudo cryptsetup reencrypt myvol.luks.dat which started out at ~450 MB/s and climbed to ~540 MB/s
Due to the version the default settings used argon2id with parallelity 4 and time cost 5 as well as 1 GB memory cost despite me running this in a relatively constrained VM (8 cores, 4 GB RAM):

Data segments:
  0: crypt
	offset: 16777216 [bytes]
	length: (whole device)
	cipher: aes-xts-plain64
	sector: 4096 [bytes]

Keyslots:
  1: luks2
	Key:        512 bits
	Priority:   normal
	Cipher:     aes-xts-plain64
	Cipher key: 512 bits
	PBKDF:      argon2id
	Time cost:  5
	Memory:     1048576
	Threads:    4

Curiously, it reports sector: 4096 [bytes], even though that VM sees 512 bytes physical and logical with sudo fdisk -l just as dom0, but maybe I just don't understand what the significance is. In my overall LUKS container (containing all of QubesOS) it shows a sector size of 512 under that parameter with luksDump.

One might say that this doesn't transfer to Heads as it runs on bare metal, but Heads should really be even faster due to more resources available and no virtualization layers. As stated, a full re-encryption of my ~930 GB LUKS container under Heads took less than 15 mins (albeit with the old cryptsetup version).

You can raise another issue on that one if more unifying/simplification is needed, where PR would be more appreciated.

I can try to do a PR, but I'm not a professional developer and so far I've only contributed to lower complexity projects, so keep that in mind. I'll try to get to it this week.

@tlaurion
Copy link
Collaborator

@UndeadDevel Attempts to reencryption speed and discussions on that specific issue under #1541

@tlaurion
Copy link
Collaborator

tlaurion commented Nov 29, 2023

Fell on cloudflare/linux#6 pointing to torvalds/linux@39d42fa last night. Resulting testing works way faster and stabler.

Idea there that got merged in Linux kernel is to bypass kernel read and write queues.

This best practice needs to be applied to make sure we are applying fastest read/write paths without additional delay and tackle IO limit of bus and SSD only: not testing memory/memcpy/kernel buffers/bus/SSD limits exposed on older platforms. All will benefit.

@tlaurion
Copy link
Collaborator

tlaurion commented Apr 7, 2024

It was decided that Heads should not interfere with OS choices, where pushing for better defaults should unfortunately be done upstream in the OS installer hardcoded options.

If for whatever reason, you believe Heads should force better encryption options down at Re-Ownership reencryption time, please reopen.

Note though that some versions of old hardware don't support hardware accelerated encryption and this is why this choice is not currently enforced. One such example is the x230i. Therefore, detection of hardware specs and hw acceleration features should be done under OSes. At some point I feel I have to draw a line into maintaining things that might break down not properly thought impacts of such improvement requests. I think OSes are doing a proper job at following best practices generally in that matter, where Heads should focus on permitting re-ownership flawlessly, and that enough is becoming complicated.

@UndeadDevel
Copy link
Contributor Author

UndeadDevel commented Apr 9, 2024

@tlaurion The QubesOS 4.2 installer does result in the PBKDF argon2id to be used, however; the problem is that Heads, when doing a container re-encryption, re-encrypts with argon2i due to using an outdated cryptsetup version, though I see that you've added commits to the PR, which I haven't reviewed yet (see my comment there).

I understand that Heads maintainers won't necessarily have the resources to be better than OSs like QubesOS (where I've contributed this comment to argue for improved defaults), but I do think that it's a necessity for Heads to provide a more current cryptsetup version to not use the outdated defaults of an older version (which, again, may be fixed in the PR I haven't reviewed yet).


It might also make sense to provide the user with the opportunity to choose the memory cost and iteration time; i.e. after the user chooses re-encryption in the whiptail menu or during factory reset, do a read for memory cost (but force a minimum of 1 GB; this would be fed into the cryptsetup parameter --pbkdf-memory in kilobytes) as well as a read for the iteration time (force a minimum of 2 seconds; this would be fed into the cryptsetup parameter -i in milliseconds). That way the user can increase the encryption strength if desired, which would also somewhat increase Heads encryption robustness in case cryptsetup bumps the defaults for those parameters in a future version, where Heads would be under less pressure to upgrade in that case. If the user doesn't input anything then the parameters shouldn't be used, i.e. call cryptsetup without those parameters so it uses internal cryptsetup defaults in that case (better for maintainability).

One big advantage of this is that cryptsetup, according to the man-page of 2.6.1, by default caps the memory cost at 1GB and may even choose a lower value depending on its benchmark result. Meanwhile the IRTF recommendation is 6 GB, which is above even the current possible (for cryptsetup) maximum of 4 GB; cryptsetup only allows using 4 GB if the parameter --pbkdf-memory is used and the benchmark "agrees" (this can be "helped" by choosing a higher -i parameter on systems with slower hardware).
(Probably last edit:) On reading the changelog for 2.7.0 again, however, I noticed that 2.7.0 now doesn't do a sanity check on those params any more (relevant for --pbkdf-memory), so probably the Heads code then should do that just in case (or at least warn the user); 2.7.0 also widens the safety margin to protect against OOM when generating parameters via benchmark, i.e. now at most half of available free memory will be used in that case; this could be "copied" by the Heads code. 2.7.1 (the latest version) doesn't add anything relevant here.

If you agree with this I could see about providing a PR to implement it. This, together with a cryptsetup version bump in Heads, and considering that point 3 of the OP was implemented already in #1542, would indeed fix this issue in a sort of second alternative solution.

@tlaurion
Copy link
Collaborator

tlaurion commented Apr 9, 2024

Thanks @UndeadDevel .

Version bump to 2.6.1 occurs in PR as of now.
As for changing settings in Metadata with --persistent, I'm still unsure (otherwise those are applied ulonky to heads, not in booted os). Same for applied queuing bypass as of now : that's something we want to be applied here but not necessarily on booted OS, where gain things might change and not being desired by users. Again, past bug and cve under nix for online reencryption etc is not really an enforced thing where os behavior might diverge from expected one and bug reports would become messy upstream. Im not sure, at all, that we want to change behavior globally outside of intended use cases: change luks key and reencrypt disk to remove oem liability of being able to disclose luks header backup for lifetime usage of shipped laptop with qubesos pre-installed.

Will also not bump to 2.7.1 as of now, too edgy and not used by mostly any distro. I tend to not want to have Heads users as beta testers for upstream. I'm also not convinced pushing for SSD enforced live encryption and would delegate reports to people more qualified, or at least until it becomes mainstream and deployed per OS installers.

@UndeadDevel
Copy link
Contributor Author

UndeadDevel commented Apr 18, 2024

Okay great, @tlaurion, so that (cryptsetup version bump) fixes the most pressing issue!

I totally agree about not bumping up to 2.7.1.

Regarding --persistent: I don't think any metadata stored with --persistent would be an issue as that affects how the volume is mounted regarding configuration normally stored in crypttab. Right now the two slots will almost certainly have different memory and time costs anyway and there's no issue; they can even have different PBKDFs (I tested this).

I'll post your relevant comment portion from the PR thread here to discuss further the encryption parameters:

@UndeadDevel the code under kexec-seal-key is adding a luks key. Therefore, we could enforce better defaults there, if desired. Let's see what is being done by default from cryptetup 2.6.1 here. As stated before, I would prefer staying away and relaying on what cryptsetup enforces as the best defaults there. When comes to DUK, lets remember that the DUK "passphrase" here, more a key itself really, is a 128 chars randomly generated (with now HW backed randomization if available) and therefore I strill believe this to be irrelevant since that keyslot cannot even be filled with weak "passphrase". Nobody will type 128 chars here and no collision is theoritically possible, even?

The thing is, the official recommendations of the people who created Argon2 as well as the IRTF recommendations differ from the personal opinion of the lead cryptsetup maintainer:
According to the former, first parallelity should be increased as much as possible, then memory and only then iterations, all the while keeping the time cost acceptable to the user.
What cryptsetup actually does, however, is:

  1. use as many threads as cores, up to 4 (this is okay, but not great...CPUs with more than 4 cores are becoming more and more common, so hopefully a future cryptsetup version allows more than 4)
  2. memory: here cryptsetup will behave differently depending on whether --pbkdf-memory was given as a parameter:
    • if yes, then use up to the provided value as long as at least 4 iterations are done and the time cost (iteration time) is not exceeded; the maximum possible memory cost is 4 GB (not great at all, as the Argon2 authors recommended 6 GB already 7 years ago for disk encryption use cases (!), but changing this requires a substantial rewrite of certain cryptsetup code IIUC)
    • if no, then use up to 1 GB only, again as long as at least 4 iterations are done and the time cost (iteration time) is not exceeded

So cryptsetup deviates from the recommendations of the people who probably know better by prioritizing iterations over memory, which should not be done for Argon2, especially argon2id and argon2d (even for argon2i it's not recommended, but less bad; overall this doesn't mean we should switch to argon2i, however, as argon2id is still more robust).

Also, the hash function SHA512 is recommended by yet other experts for better quantum resistance for aes-xts-plain64 ciphers (used in LUKS2). I'm not sure that cryptsetup even supports that (Edit: yes, it does), but it's certainly not the default (which is SHA256).

So really the problem is with the cryptsetup people, who stubbornly oppose the official recommendations. Ideally they would wise up and change their defaults + implement the possibility of more than 4 threads and more than 4 GB (+SHA512 as the default hash function for aes-xts-plain64).

I can understand if you want to say "upstream issue, not going to touch it", but IMHO it would be nice if the IMO most severe limitation of current cryptsetup defaults (memory cost capped at 1 GB in default benchmark mode) would be (partially) mitigated by allowing the user to choose the memory cost, which we would feed into the --pbkdf-memory parameter to at least "unlock" the current 4 GB maximum for the memory cost.

And yes, I agree that this isn't really an issue for the TPM DUK (here the question is rather how securely the passphrase is stored in the TPM), but it is an issue for the LUKS DRK, which Heads also changes during OEM Factory Reset. The user will not be remembering 256 random characters for the LUKS DRK, after all.

(Hopefully last edit:) Okay I did some more experiments on file-based LUKS containers with cryptsetup 2.6.1 in a large Qubes Standalone VM with 10 GB of RAM and the whole CPU (12th gen i7); the following command will produce what I would call the currently best possible (with cryptsetup 2.6.1 and that, or better, hardware/vmware setup) settings:
cryptsetup reencrypt --hash sha512 --pbkdf-memory 4000000 -i 5000 luks_container, with the following luksDump (relevant parts):

Keyslots:
  0: luks2
	Key:        512 bits
	Priority:   normal
	Cipher:     aes-xts-plain64
	Cipher key: 512 bits
	PBKDF:      argon2id
	Time cost:  4
	Memory:     4000000
	Threads:    4
	AF stripes: 4000
	AF hash:    sha512

and as you see from the command only 5 seconds unlock time, which is acceptable UX-wise. Of course on older hardware this will not be possible, unless the user accepts a lot more than 5 seconds or lowers the memory cost (cryptsetup does not allow going lower than 4 iterations sadly, even though that would be better for argon2id than lowering the memory cost).
The point is, if we include the possibility for the user to choose the memory cost and iteration time, then users can make these tradeoffs according to their threat model and, most importantly, (partially) mitigate the current issue with cryptsetup defaults and algorithm.

@tlaurion
Copy link
Collaborator

tlaurion commented Aug 17, 2024

@UndeadDevel Everythig needed under #1541 ?
If so we can close this. I will make #1541 close this when merged for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants