Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HSM 3: PKCS#11 walking skeleton #710

Conversation

ximon18
Copy link
Member

@ximon18 ximon18 commented Nov 3, 2021

This PR builds on PR #688 "Use new de-mut'd rpki-rs Signer trait" and implements a working PKCS#11 signer including a GH Actions job to run the Krill test suite using the new signer against an installed instance of SoftHSMv2.

Known issues

  • A general lack of tests.
  • Comments in code need review/removing/adding/updating.
  • This PR still doesn't introduce configuration file based selection & configuration of signers, so it uses hard-coded feature gated creation and configuration of of signers instead.
  • Warnings from cargo build about unused code, These are due to the current hard-coded signer setup.

Readme first

I updated the doc/development/hsm/architecture.md file, I suggest reading it first. There's still lots more I could probably mention and explain and maybe some of it should be removed or moved to some sort of appendix.

A note about Cargo.toml and the pkcs11 crate

I hit a strange problem where in release builds the pkcs11 crate seems to end up with a key NULL ptr being optimized out such that SoftHSMv2 complains with CKR_ARGUMENTS_NAD and in the logs says that pReserved must be NULL. In debug builds this doesn't happen. So in Cargo.toml there's currently a workaround for this that disables compiler optimizations only for the pkcs11 crate in release mode. That seems to have "solved" the problem for now but this should be raised with the pkcs11 crate owners, there are no issues relating to this in their GitHub project that I can see.

I have raised mheese/rust-pkcs11#49 for this.

Changes that are quick to review

First, let's mention changes that are separate and easy to review and will knock down dramatically the count of files left to review:

  • .github/workflows/ci.yml: An updated GitHub Actions CI workflow that adds testing Krill in PKCS#11 mode against SoftHSMv2 installed in the runner machine.

  • Cargo.lock and Cargo.toml: Additional crate dependencies and Cargo features relating to the new PKCS#11 code.

  • src/commons/crypto/cert.rs, src/commons/crypto/cms.rs, src/commons/crypto/signing/dispatch/krillsigner.rs, src/daemon/ca/certauth.rs, src/daemon/krillserver.rs, src/pubd/manager.rs, src/upgrades/mod.rs, src/upgrades/v0_9_0, /ca_objects_migration.rs: Modified calls to KrillSigner::build() to pass an additional alternate_config boolean flag. This should be false in most cases, it is for use by a subset of the tests that create a second instance of Krill or KrillSigner within the same process and then try and use PKCS#11 signer with SoftHSMv2, as SoftHSMv2 doesn't support multiple users or concurrent login by the same user so when true the second KrillSigner instance (e.g. for "Alice") will use OpenSSL instead of PKCS#11 based signing. The need for this flag will disappear when signers can be configured as tests can then configure the signing setup as needed without this hard-coded workaround.

  • tests/migrate_repository.rs: The migrate_repository test has been disabled when using PKCS#11 testing mode as it fails, I believe relating to the use of SoftHSMv2 as mentioned above, but in this case I have not yet narrowed down the exact failure.

  • src/commons/crypto/signing/signers/error.rs: Multiple files including this one were touched by a SignerError variant rename to make it immediately clear whether this issue is worth retrying or not, something that was confusing me on a regular basis. The variants were also sorted into alphabetical order.

    • SignerError::SignerUnavailable is renamed to SignerError::TemporarilyUnavilable.
    • SignerError::SignerUnusable is renamed to SignerError::PermanentlyUnusable
  • src/commons/crypto/signing/dispatch/signerprovider.rs: The enum based dispatch has been extended by copy-pasting a line in each match block and editing it to refer to the new SignerProvider::Pkcs11 variant. I also then extended it again to support a SignerProvider::Mock variant for testing purposes.

  • src/commons/crypto/signing/signers/kmip/connpool.rs: An Arc was introduced around the ConnectionSettings as it fitted better into the code I wrote for the PKCS#11 signer and to support, maybe, referencing part of the Krill configuration PR directly in a later PR rather than cloning what could be a lot of configuration data.

  • src/commons/crypto/signing/signers/kmip/signer.rs: The data argument of fn rand() was renamed to target to better reflect what it is for and to be more consistent with existing code elsewhere.

  • src/commons/crypto/signing/signers/mod.rs: pub mod XXX for the new pkcs11 and probe modules (probe is a factoring out of code from the KMIP signer to be used also by the PKCS11 signer).

  • src/commons/crypto/signing/signers/pkcs11/mod.rs: pub mod XXX for the new pkcs11/xxx.rs modules.

  • src/commons/crypto/signing/dispatch/signerrouter.rs: Extended to create a PKCS#11 signer if the hsm-tests-pkcs11 feature flag is enabled. Added some initial happy flow tests based around the new mock signer.

  • src/commons/crypto/signing/signers/softsigner.rs: Very minor changes including re-ordering of some functions for consistency across signer implementations, and prefixing of some error messages with "OpenSSL".

Intermediate changes

  • src/commons/crypto/signing/dispatch/signerinfo.rs: Removed an unused function and updated it to generate the signer handle as a UUID and to store the signer backend identity details (public key and private key internal id) more explicitly, not combined and encoded together as a signer handle.

  • src/commons/crypto/signing/signers/kmip/internal.rs:

    • Server probing functionality has been extracted into a new probe.rs module. The new probe_server() function contains the KMIP specific bits that remain from before.
    • A stupid inverted flag mistake was fixed (see IGNORE_MISSING_CAPABILITIES).
    • Some function ordering was changed to make comparison between signer modules easier.
    • The supports_rng_retrieve flag was renamed to a non-KMIP-specific name supports_random_number_generation term so that the same naming can be used in the PKCS#11 signer as well for consistency.
    • The .pool.get() call was abstracted into .get_connection() so that the same pattern could be used in the PKCS#11 signer (for which we do no need to obtain a connection but not from a pool).
    • Some minor changes made along the way (added comments, prefixing some error messages with "KMIP" but I think error messages will need to be reviewed and tweaked again later so I'd ignore this for now).

New modules:

  • src/commons/crypto/signing/signers/mocksigner.rs:

    Initial version of a mock signer. Should perhaps be updated to use hard-coded values instead of actually doing key generation and signing using OpenSSL.

  • src/commons/crypto/signing/signers/probe.rs:

    Functionality extracted from the KMIP signer for use also by the PKCS#11 signer. I added some tests as well while I was refactoring.

    Explanation: In the KMIP signer while not yet initialized we try periodically (but not on every operation) to connect to the signer backend and to then verify that the KMIP server meets our requirements. We need to do the same for the PKCS#11 signer as well as in the case of something like a PKCS#11 library communicating with a remote AWS CloudHSM server it may fail initially if the server is not yet reachable and so we will have to keep trying. I verified this support with the YubiHSM USB key that I have because if the background yubihsm-connector daemon isn't running then the library can be loaded but can't connect with the backend yet.

    Functional changes: The signer specific behaviour is now passed in as a callback function. I did it this way as the alternative was enum based dispatch which seemed like overkill or using a trait which we try and avoid as Rust with traits is more complicated (e.g. lack of official async support if we want to use async later for signers).

  • src/commons/crypto/signing/signers/pkcs11/context.rs

    This new module provides the Pkcs11Context struct which encapsulates

    • Loading of PKCS#11 libraries.
    • Iinvoking C_Initialize() once and only once per process.
    • Delaying the call to C_Initialize() to avoid blocking Krill startup (e.g. if the backend is a remote AWS CloudHSM instance that we're having trouble connecting to and maybe the PKCS#11 library even tries multiple times with some largish timeout).
    • Calling C_Finalize() when all references to the context are dropped.
    • Logging all dispatches to the underlying PKCS#11 interface and any errors that occur.

    Known issues:

    • I'm not sure that the Option in ctx: Option<Ctx> is actually needed, need to review that.
  • src/commons/crypto/signing/signers/pkcs11/session.rs

    This new module provides the Pkcs11Session struct which is a thin wrapper around calls to Pkcs11Context ensuring that a new PKCS#11 session is opened and closed around the call (which I'm not sure is super wise or necessary but I had it in the prototype and haven't been revisited that design yet) and passing the created session handle to the Pkcs11Context when dispatching to the PKCS#11 C_XXX() function. It takes a lock (read or write as appopriate) on calls into the context because some of the functions exposed by the underlying pkcs11 crate are &mut self.

  • src/commons/crypto/signing/signers/pkcs11/signer.rs

    A thin wrapper that implements the rpki crate Signer interface by delegating into the functions in pkcs11/internal.rs, very similar to what is done for the KMIP signer in kmip/signer.rs.

The biggest part: the core of the new PKCS#11 signer

The pkcs11 dependency

First, we should consider whether the Rust pkcs11 is an acceptable crate to use. It uses the Apache 2.0 license which many of the Krill dependencies use. There are a few packages using it, it's been around for several years and has continued to receive updates in that time, it has multiple contributors though primarily just one, it has lots of forks though whether that means it is good to build on or people have found issues with it I do not know, from its description it appears to have been well tested and no issues have been seen with it so far with either SoftHSMv2, YubiHSM or AWS CloudHSM. Lastly, it has by far the most downloads, and the most recent downloads of the pkcs11 and cryptoki labelled crates on Crates.io.

The new Krill code

The structure tries to be consistent with that of kmip/internal.rs. It feels like there is more common functionality that can be extracted across the signer implementations but there are subtle differences so perhaps not, or not without more pain than gain.

The build(), get_name(), set_handle(), get_info(), create_registration_key() and sign_registration_challenge() functions are used by initial creation of the signer and registering/binding from SignerRouter, same as with the KMIP signer.

get_test_connection_settings() is a hard-coded placeholder until configuration from config file is supported.

To use a PKCS#11 signer we need to know which path to load the library from, which slot id to use (or slot label to lookup to determine the slot ID) and any optional user PIN to authenticate with. The PKCS#11 specification allows for PKCS#11 implementations that do not require login, those that require login but do not require a PIN and those the require login with a PIN.

The probe_server() function should probably be split into smaller chunks by extracting helper functions to make the logic clearer and to make the fn smaller. Essentially it just tries to access the library, determine the slot ID if needed, query information about the library. check that we can establish a session and login if needed, and determine whether or not the signer is capable of generating random numbers or not.

The primary differences to the KMIP signer are in the functions that support the Signer trait, e.g. build_key(), sign_with_key(), find_key(), etc.

In build_key() we assign a random ID to the key to create so as we can't change it afterwards (because at least AWS CloudHSM doesn't support C_SetAttribute()) and we need some stable identifier for looking up the key later . This could be a more deterministic value, e.g. with some fixed set of prefix bytes, and doesn't need to use OpenSSL rand (I see that @timbru just added a dependency on the rand crate in a different change so that could be used instead) or something else. After the key is created we attempt to obtain its key material or RSA modulus and public exponent so that we can determine the Krill KeyIdentifier value for the key.

Key IDs are CKA_ID byte values. To pass them back to the SignerMapper we convert them to strings by hex encoding them.

Lastly, there's a big mapping at the bottom of the module to decide whether a given error is retryable or not.

…tinier RwLock around the signer handle held by each SignerProvider.
…nd() is an impl of a trait fn which is not supported.

I also cannot further modify KrillSigner fns to use the new Sign and SignWithKey rpki-rs traits as SignedObjectBuilder::finalize() needs a Signer that we don't want to pass to it and it cannot be replicated as it uses rpki-rs private internals.
… to (a) drop the redundant Signer prefix in the variant name and (b) to indicate the duration/severity of the issue, particularly that Unavailable is a transient error as it is used to guide retry logic.
…rating random numbers to something non-KMIP specific as the concept also applies to other signers and this will make code across signers more consistent and more amenable to factoring out later.
…s what it does and for consistency across signers.
… for using some of the same logic/code in the PKCS#11 signer but without the pool. Hopefully later the common code can be factored out.
…t KMIP code and prototype PKCS#11 code. Uses some of the ideas and code from the KMIP signer to prevent Krill startup blocking or failing if the PKCS#11 signer is unavailable or unusable and for retrying requests if the cause of failure appears to be transient. Also contains initial support for multiple active PKCS#11 signers using the same and/or different libraries (by filesystem path) unlike the prototype which could only load one library at a time. Configuration is hard-coded at present. Also unlike the prototype the Pkcs11Session type handles passing the session handle to the PKCS#11 library instead of requiring the caller to do so. Hopefully lots of code in common with KmipSigner can be factored out later.
- Log all Cryptoki calls at trace level and log all Cryptoki errors.
- Implement all Signer functionality except random number generation and key deletion.
- Wire up integration with the SignerMapper.
- Correct KMIP copy-pasted references that should say PKCS#11.
- Support locating theslot ID by label.
- Support reporting Cryptoki and token details.
- Support logging in (required to make many Cryptoki calls).
…reating the second Alice publisher and for any other tests needing two signer different configurations (as configuration is hard-coded at present). Disable the migrate repository test which fails for a similar reason. This is all caused by SoftHSMv2 not supporting more than one user or that one user to be logged in more than once at a time. Cargo test --features hsm,hsm-tests-pkcs11 passes with these temporary hacks.
- Rename temporary second_signer_hack flag to alternate_config flag. Document the reason for the flag.
- Added some code comments.
- Support login without user pin.
- Support token use without login.
- Include signer name in log messages.
- Dump full Cryptoki, slot and token info at trace level.
@ximon18 ximon18 added the hsm Relates to adding HSM support to Krill label Nov 3, 2021
@ximon18 ximon18 requested a review from timbru November 3, 2021 11:27
@ximon18 ximon18 requested a review from partim November 10, 2021 09:26
@ximon18 ximon18 changed the title PKCS#11 walking skeleton HSM 3: PKCS#11 walking skeleton Nov 10, 2021
@ximon18 ximon18 changed the base branch from hsm-persistent-signer-key-mappings to main November 15, 2021 10:57
@ximon18 ximon18 changed the base branch from main to hsm-persistent-signer-key-mappings November 15, 2021 10:57
@ximon18
Copy link
Member Author

ximon18 commented Nov 15, 2021

GitHub is currently failing to see new commit 19cede0 that was pushed to the PR branch. It is therefore wrongly showing that there are merge conflicts. I assume this is a temporary issue at GH and will resolve itself shortly.

@ximon18
Copy link
Member Author

ximon18 commented Nov 15, 2021

I will close and re-create this PR as GH is still not showing the most recent commit.

@ximon18 ximon18 closed this Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hsm Relates to adding HSM support to Krill
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant