-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PKI - Fix order of chain building writes #17772
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
When troubleshooting a recent migration failure from 1.10->1.11, it was noted that some PKI mounts had bad chain construction despite having valid, chaining issuers. Due to the cluster's leadership trashing between nodes, the migration logic was re-executed several times, partially succeeding each time. While the legacy CA bundle migration logic was written with this in mind, one shortcoming in the chain building code lead us to truncate the ca_chain: by sorting the list of issuers after including non-written issuers (with random IDs), these issuers would occasionally be persisted prior to storage _prior_ to existing CAs with modified chains. The migration code carefully imported the active issuer prior to its parents. However, due to this bug, there was a chance that, if write to the pending parent succeeded but updating the active issuer didn't, the active issuer's ca_chain field would only contain the self-reference and not the parent's reference as well. Ultimately, a workaround of setting and subsequently unsetting a manual chain would force a chain regeneration. In this patch, we simply fix the write ordering: because we need to ensure a stable chain sorting, we leave the sort location in the same place, but delay writing the provided referenceCert to the last position. This is because the reference is meant to be the user-facing action: without transactional write capabilities, other chains may succeed, but if the last user-facing action fails, the user will hopefully retry the action. This will also correct migration, by ensuring the subsequent issuer import will be attempted again, triggering another chain build and only persisting this issuer when all other issuers have also been updated. Signed-off-by: Alexander Scheel <[email protected]>
In the previous commit, we identified an issue that would occur on legacy issuer migration to the new storage format. This is easy enough to detect for any given mount (by an operator), but automating scanning and remediating all PKI mounts in large deployments might be difficult. Write a new storage migration version to regenerate all chains on upgrade, once. Signed-off-by: Alexander Scheel <[email protected]>
Signed-off-by: Alexander Scheel <[email protected]>
e0acf81
to
6c0a7de
Compare
Signed-off-by: Alexander Scheel <[email protected]>
Signed-off-by: Alexander Scheel <[email protected]>
6c0a7de
to
b964e77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really great.
@@ -43,7 +43,7 @@ func (sc *storageContext) rebuildIssuersChains(referenceCert *issuerEntry /* opt | |||
// To begin, we fetch all known issuers from disk. | |||
issuers, err := sc.listIssuers() | |||
if err != nil { | |||
return fmt.Errorf("unable to list issuers to build chain: %v", err) | |||
return fmt.Errorf("unable to list issuers to build chain: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice fix, thank you
// Ensure the chains are correct on the intermediate. By using the | ||
// issuerId saved above, this ensures we didn't change any issuerIds, | ||
// we merely updated the existing issuers. | ||
intIssuer, err := sc.fetchIssuerById(intIssuerId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really well done, thank you!
Thank you both! Merging... |
Probably best to defer to the commit messages here:
As noted above, patching the issuer to set
manual_chain=self
followed bymanual_chain=
will be an adequate workaround.