Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HSM 3: PKCS#11 walking skeleton #727

Merged
merged 153 commits into from
Nov 23, 2021
Merged

Conversation

ximon18
Copy link
Member

@ximon18 ximon18 commented Nov 15, 2021

This PR builds on PR #688 "Use new de-mut'd rpki-rs Signer trait" and implements a working PKCS#11 signer including a GH Actions job to run the Krill test suite using the new signer against an installed instance of SoftHSMv2.

Known issues

  • A general lack of tests.
  • Comments in code need review/removing/adding/updating.
  • This PR still doesn't introduce configuration file based selection & configuration of signers, so it uses hard-coded feature gated creation and configuration of of signers instead.
  • Warnings from cargo build about unused code, These are due to the current hard-coded signer setup.

Readme first

I updated the doc/development/hsm/architecture.md file, I suggest reading it first. There's still lots more I could probably mention and explain and maybe some of it should be removed or moved to some sort of appendix.

A note about Cargo.toml and the pkcs11 crate

I hit a strange problem where in release builds the pkcs11 crate seems to end up with a key NULL ptr being optimized out such that SoftHSMv2 complains with CKR_ARGUMENTS_BAD and in the logs says that pReserved must be NULL. In debug builds this doesn't happen. So in Cargo.toml there's currently a workaround for this that disables compiler optimizations only for the pkcs11 crate in release mode. That seems to have "solved" the problem for now but this should be raised with the pkcs11 crate owners, there are no issues relating to this in their GitHub project that I can see.

I have raised mheese/rust-pkcs11#49 for this.

Changes that are quick to review

First, let's mention changes that are separate and easy to review and will knock down dramatically the count of files left to review:

  • .github/workflows/ci.yml: An updated GitHub Actions CI workflow that adds testing Krill in PKCS#11 mode against SoftHSMv2 installed in the runner machine.

  • Cargo.lock and Cargo.toml: Additional crate dependencies and Cargo features relating to the new PKCS#11 code.

  • src/commons/crypto/cert.rs, src/commons/crypto/cms.rs, src/commons/crypto/signing/dispatch/krillsigner.rs, src/daemon/ca/certauth.rs, src/daemon/krillserver.rs, src/pubd/manager.rs, src/upgrades/mod.rs, src/upgrades/v0_9_0, /ca_objects_migration.rs: Modified calls to KrillSigner::build() to pass an additional alternate_config boolean flag. This should be false in most cases, it is for use by a subset of the tests that create a second instance of Krill or KrillSigner within the same process and then try and use PKCS#11 signer with SoftHSMv2, as SoftHSMv2 doesn't support multiple users or concurrent login by the same user so when true the second KrillSigner instance (e.g. for "Alice") will use OpenSSL instead of PKCS#11 based signing. The need for this flag will disappear when signers can be configured as tests can then configure the signing setup as needed without this hard-coded workaround.

  • tests/migrate_repository.rs: The migrate_repository test has been disabled when using PKCS#11 testing mode as it fails, I believe relating to the use of SoftHSMv2 as mentioned above, but in this case I have not yet narrowed down the exact failure.

  • src/commons/crypto/signing/signers/error.rs: Multiple files including this one were touched by a SignerError variant rename to make it immediately clear whether this issue is worth retrying or not, something that was confusing me on a regular basis. The variants were also sorted into alphabetical order.

    • SignerError::SignerUnavailable is renamed to SignerError::TemporarilyUnavilable.
    • SignerError::SignerUnusable is renamed to SignerError::PermanentlyUnusable
  • src/commons/crypto/signing/dispatch/signerprovider.rs: The enum based dispatch has been extended by copy-pasting a line in each match block and editing it to refer to the new SignerProvider::Pkcs11 variant. I also then extended it again to support a SignerProvider::Mock variant for testing purposes.

  • src/commons/crypto/signing/signers/kmip/connpool.rs: An Arc was introduced around the ConnectionSettings as it fitted better into the code I wrote for the PKCS#11 signer and to support, maybe, referencing part of the Krill configuration PR directly in a later PR rather than cloning what could be a lot of configuration data.

  • src/commons/crypto/signing/signers/kmip/signer.rs: The data argument of fn rand() was renamed to target to better reflect what it is for and to be more consistent with existing code elsewhere.

  • src/commons/crypto/signing/signers/mod.rs: pub mod XXX for the new pkcs11 and probe modules (probe is a factoring out of code from the KMIP signer to be used also by the PKCS11 signer).

  • src/commons/crypto/signing/signers/pkcs11/mod.rs: pub mod XXX for the new pkcs11/xxx.rs modules.

  • src/commons/crypto/signing/dispatch/signerrouter.rs: Extended to create a PKCS#11 signer if the hsm-tests-pkcs11 feature flag is enabled. Added some initial happy flow tests based around the new mock signer.

  • src/commons/crypto/signing/signers/softsigner.rs: Very minor changes including re-ordering of some functions for consistency across signer implementations, and prefixing of some error messages with "OpenSSL".

Intermediate changes

  • src/commons/crypto/signing/dispatch/signerinfo.rs: Removed an unused function and updated it to generate the signer handle as a UUID and to store the signer backend identity details (public key and private key internal id) more explicitly, not combined and encoded together as a signer handle.

  • src/commons/crypto/signing/signers/kmip/internal.rs:

    • Server probing functionality has been extracted into a new probe.rs module. The new probe_server() function contains the KMIP specific bits that remain from before.
    • A stupid inverted flag mistake was fixed (see IGNORE_MISSING_CAPABILITIES).
    • Some function ordering was changed to make comparison between signer modules easier.
    • The supports_rng_retrieve flag was renamed to a non-KMIP-specific name supports_random_number_generation term so that the same naming can be used in the PKCS#11 signer as well for consistency.
    • The .pool.get() call was abstracted into .get_connection() so that the same pattern could be used in the PKCS#11 signer (for which we do no need to obtain a connection but not from a pool).
    • Some minor changes made along the way (added comments, prefixing some error messages with "KMIP" but I think error messages will need to be reviewed and tweaked again later so I'd ignore this for now).

New modules:

  • src/commons/crypto/signing/signers/mocksigner.rs:

    Initial version of a mock signer. Should perhaps be updated to use hard-coded values instead of actually doing key generation and signing using OpenSSL.

  • src/commons/crypto/signing/signers/probe.rs:

    Functionality extracted from the KMIP signer for use also by the PKCS#11 signer. I added some tests as well while I was refactoring.

    Explanation: In the KMIP signer while not yet initialized we try periodically (but not on every operation) to connect to the signer backend and to then verify that the KMIP server meets our requirements. We need to do the same for the PKCS#11 signer as well as in the case of something like a PKCS#11 library communicating with a remote AWS CloudHSM server it may fail initially if the server is not yet reachable and so we will have to keep trying. I verified this support with the YubiHSM USB key that I have because if the background yubihsm-connector daemon isn't running then the library can be loaded but can't connect with the backend yet.

    Functional changes: The signer specific behaviour is now passed in as a callback function. I did it this way as the alternative was enum based dispatch which seemed like overkill or using a trait which we try and avoid as Rust with traits is more complicated (e.g. lack of official async support if we want to use async later for signers).

  • src/commons/crypto/signing/signers/pkcs11/context.rs

    This new module provides the Pkcs11Context struct which encapsulates

    • Loading of PKCS#11 libraries.
    • Iinvoking C_Initialize() once and only once per process.
    • Delaying the call to C_Initialize() to avoid blocking Krill startup (e.g. if the backend is a remote AWS CloudHSM instance that we're having trouble connecting to and maybe the PKCS#11 library even tries multiple times with some largish timeout).
    • Calling C_Finalize() when all references to the context are dropped.
    • Logging all dispatches to the underlying PKCS#11 interface and any errors that occur.

    Known issues:

    • I'm not sure that the Option in ctx: Option<Ctx> is actually needed, need to review that.
  • src/commons/crypto/signing/signers/pkcs11/session.rs

    This new module provides the Pkcs11Session struct which is a thin wrapper around calls to Pkcs11Context ensuring that a new PKCS#11 session is opened and closed around the call (which I'm not sure is super wise or necessary but I had it in the prototype and haven't been revisited that design yet) and passing the created session handle to the Pkcs11Context when dispatching to the PKCS#11 C_XXX() function. It takes a lock (read or write as appopriate) on calls into the context because some of the functions exposed by the underlying pkcs11 crate are &mut self.

  • src/commons/crypto/signing/signers/pkcs11/signer.rs

    A thin wrapper that implements the rpki crate Signer interface by delegating into the functions in pkcs11/internal.rs, very similar to what is done for the KMIP signer in kmip/signer.rs.

The biggest part: the core of the new PKCS#11 signer

The pkcs11 dependency

First, we should consider whether the Rust pkcs11 is an acceptable crate to use. It uses the Apache 2.0 license which many of the Krill dependencies use. There are a few packages using it, it's been around for several years and has continued to receive updates in that time, it has multiple contributors though primarily just one, it has lots of forks though whether that means it is good to build on or people have found issues with it I do not know, from its description it appears to have been well tested and no issues have been seen with it so far with either SoftHSMv2, YubiHSM or AWS CloudHSM. Lastly, it has by far the most downloads, and the most recent downloads of the pkcs11 and cryptoki labelled crates on Crates.io.

The new Krill code

The structure tries to be consistent with that of kmip/internal.rs. It feels like there is more common functionality that can be extracted across the signer implementations but there are subtle differences so perhaps not, or not without more pain than gain.

The build(), get_name(), set_handle(), get_info(), create_registration_key() and sign_registration_challenge() functions are used by initial creation of the signer and registering/binding from SignerRouter, same as with the KMIP signer.

get_test_connection_settings() is a hard-coded placeholder until configuration from config file is supported.

To use a PKCS#11 signer we need to know which path to load the library from, which slot id to use (or slot label to lookup to determine the slot ID) and any optional user PIN to authenticate with. The PKCS#11 specification allows for PKCS#11 implementations that do not require login, those that require login but do not require a PIN and those the require login with a PIN.

The probe_server() function should probably be split into smaller chunks by extracting helper functions to make the logic clearer and to make the fn smaller. Essentially it just tries to access the library, determine the slot ID if needed, query information about the library. check that we can establish a session and login if needed, and determine whether or not the signer is capable of generating random numbers or not.

The primary differences to the KMIP signer are in the functions that support the Signer trait, e.g. build_key(), sign_with_key(), find_key(), etc.

In build_key() we assign a random ID to the key to create so as we can't change it afterwards (because at least AWS CloudHSM doesn't support C_SetAttribute()) and we need some stable identifier for looking up the key later . This could be a more deterministic value, e.g. with some fixed set of prefix bytes, and doesn't need to use OpenSSL rand (I see that @timbru just added a dependency on the rand crate in a different change so that could be used instead) or something else. After the key is created we attempt to obtain its key material or RSA modulus and public exponent so that we can determine the Krill KeyIdentifier value for the key.

Key IDs are CKA_ID byte values. To pass them back to the SignerMapper we convert them to strings by hex encoding them.

Lastly, there's a big mapping at the bottom of the module to decide whether a given error is retryable or not.

…, and support in principle selecting which signer to use for which purpose. (#539)
- Add a dependency on the backoff crate for retry support.
- Add a dependency on the r2d2 crate for connection pooling support.
- Uses GitHub versions of the bcder and rpki crates for the DER Unsigned Integer support needed by the KMIP signer.
- Refactor signers to crypto::signers and replace the Dummy signer with a KMIP signer.
- Added a "hsmtest" job to the GitHub Actions CI workflow that runs all Krill tests using the KMIP signer against PyKMIP.
- Added a "hsm-tests" Cargo feature flag for configuring Krill to use ONLY KMIP as a signer, not OpenSSL at all.
  Currently building without the "hsm-tests" feature flag set will fail if the "hsm" feature flag is set.
  Krill isn't ready to be used in "hsm" mode yet.
- Changes SignerProvider to implement the Signer trait so that it can be passed to builders so that their invocation of a signer also goes via SignerProvider dispatching to the correct signer.
… the write lock while switching to using the server as part of finishing a successful probe.
…er. Previously some Krill logic when invoked was given the same signer as handling the current purpose to invoke later even if for a different purpose. If the initial purpose required the KMIP signer as the key owning signer but the later purpose was one-off signing then that should be able to be routed if desired to the OpenSslSigner, for example. Introduces another layer of indirection: RouterSigner.
"router", not "dispatcher", and don't lock the entire SignerRouter for create/delete key operations.
…ner and from KeyIdentifier to Signer specific key id. Stores mapping using a new SignerInfo AggregateStore impl backed by a 'signers' subdirectory of the Krill data dir.. Krill can now be built with the `hsm` feature active without also requiring the `hsm-tests` feature to be active. Needs code cleanup and tests and docs.
… in HSM test usage mode (use the HSM as much as possible).
…r operation when using the PyKMIP signer instead of OpenSSL doesn't invoke refresh single too soon.
…asn't intended to securely guarantee that and other mechanisms for that exist such as TLS server certificate verification.
…crate version for new ItemNotFound suberror. Make the signer registration public key non-optional. Remove no longer used get_handle() signer fn. Deduplicate signers added to the pending set. Improvements to the binding process: bind by same name first, then try other signer store public keys; detect fatal failures and abort testing ready signers; detect key not found separately to other KMIP errors; cleanup the logic; don't panic if KMIP signer doesn't yet have a signer handle; just unwrap() locks consist with other Krill code.
@ximon18 ximon18 added the hsm Relates to adding HSM support to Krill label Nov 15, 2021
src/commons/crypto/signing/dispatch/signerinfo.rs Outdated Show resolved Hide resolved
src/commons/crypto/signing/dispatch/signerinfo.rs Outdated Show resolved Hide resolved
src/commons/crypto/signing/signers/probe.rs Show resolved Hide resolved
tests/migrate_repository.rs Show resolved Hide resolved
src/commons/crypto/signing/signers/pkcs11/context.rs Outdated Show resolved Hide resolved
src/commons/crypto/signing/signers/pkcs11/internal.rs Outdated Show resolved Hide resolved
…enum SlotIdOrLabel. Also refactor the `fn probe_server()` code where it is used into smaller private helper fns so that the main logic of the fn is easier to see. This also removes the need for the 'reacquisition' of the readale_ctx` which was a bit ugly.
…move_key()` to `remove_key()` as it IS used by the RTA code. Also make key removal code more consistent across signer impls and always remove the key from the `SignerMapper` too.
… just a consequence of the the fact that the Rust std lib `fn HashMap::or_insert_with_key()` is infallible, but as we don't actually use the std lib impl we can change this behaviour.
@ionut-arm
Copy link

First, we should consider whether the Rust pkcs11 is an acceptable crate to use. It uses the Apache 2.0 license which many of the Krill dependencies use. There are a few packages using it, it's been around for several years and has continued to receive updates in that time, it has multiple contributors though primarily just one, it has lots of forks though whether that means it is good to build on or people have found issues with it I do not know, from its description it appears to have been well tested and no issues have been seen with it so far with either SoftHSMv2, YubiHSM or AWS CloudHSM. Lastly, it has by far the most downloads, and the most recent downloads of the pkcs11 and cryptoki labelled crates on Crates.io.

Hi, to add a bit of context to this (disclaimer upfront - I'm one of the developers/maintainers of the cryptoki crate): We started off using the pkcs11 crate and contributing to it, however at some point last year when we dropped a big PR that tried to fix some of the (security) problems in the crate the maintainer's time dried off and we couldn't progress. cryptoki is actually a fork of pkcs11 on which we have since continued building (a fairly large update to the interface, documentation, etc., is currently in the making, actually).

@ximon18
Copy link
Member Author

ximon18 commented Nov 22, 2021

First, we should consider whether the Rust pkcs11 is an acceptable crate to use. It uses the Apache 2.0 license which many of the Krill dependencies use. There are a few packages using it, it's been around for several years and has continued to receive updates in that time, it has multiple contributors though primarily just one, it has lots of forks though whether that means it is good to build on or people have found issues with it I do not know, from its description it appears to have been well tested and no issues have been seen with it so far with either SoftHSMv2, YubiHSM or AWS CloudHSM. Lastly, it has by far the most downloads, and the most recent downloads of the pkcs11 and cryptoki labelled crates on Crates.io.

Hi, to add a bit of context to this (disclaimer upfront - I'm one of the developers/maintainers of the cryptoki crate): We started off using the pkcs11 crate and contributing to it, however at some point last year when we dropped a big PR that tried to fix some of the (security) problems in the crate the maintainer's time dried off and we couldn't progress. cryptoki is actually a fork of pkcs11 on which we have since continued building (a fairly large update to the interface, documentation, etc., is currently in the making, actually).

Thanks @ionut-arm. I've created #731 as a reminder to come back and review the decision to use the pkcs11 Rust crate dependency in Krill.

Base automatically changed from hsm-persistent-signer-key-mappings to hsm November 23, 2021 10:59
@ximon18 ximon18 force-pushed the issue-547-pkcs11-walking-skeleton branch from 7c4831f to 7bc195e Compare November 23, 2021 13:35
@ximon18 ximon18 merged commit 7054f84 into hsm Nov 23, 2021
@ximon18 ximon18 deleted the issue-547-pkcs11-walking-skeleton branch November 23, 2021 14:16
@ximon18
Copy link
Member Author

ximon18 commented Jan 4, 2022

@ionut-arm: FYI while testing on a Raspberry Pi 4b which is an ARMv7 architecture we encountered an issue with the pkcs11 crate which the cryptoki crate does not seem to suffer from. See mheese/rust-pkcs11#51. As such we may look more actively at the cryptoki crate now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hsm Relates to adding HSM support to Krill
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants