-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force Rotation: local authority scaffolding and implements X.509 authority endpoints #4020
Conversation
go.mod
Outdated
@@ -2,6 +2,8 @@ module github.com/spiffe/spire | |||
|
|||
go 1.19 | |||
|
|||
replace github.com/spiffe/spire-api-sdk => github.com/MarcosDY/spire-api-sdk v1.0.0-pre.0.20230321150416-5642a0450459 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR depends of spire-api-sdk PR
} | ||
|
||
func (s *Service) TaintX509Authority(ctx context.Context, req *localauthorityv1.TaintX509AuthorityRequest) (*localauthorityv1.TaintX509AuthorityResponse, error) { | ||
parseRequest := func() logrus.Fields { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcosDY What do you think about deanonymizing this function? It is duplicated in the method below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func getRequestLogFields(publicKey []byte) logrus.Fields {
fields := logrus.Fields{}
if len(publicKey) > 0 {
fields[telemetry.X509AuthorityPublicKeySHA256] = api.HashByte(publicKey)
}
return fields
}
}, | ||
}, | ||
{ | ||
name: "slot has no public key", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: "slot has no public key", | |
name: "current slot has no public key", |
if next.GetPublicKey() != nil { | ||
status := localauthorityv1.AuthorityState_PREPARED | ||
// CA is removed from slot on rotation | ||
if next.IsEmpty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcosDY, why do we need this check here? If the "next.GetPublicKey()" method returns something different than nil, wouldn't this method always return false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, when we reset a slot, the x509 field is set to be nil
but we keep the public key, so with that we know than that slot contains an OLD authority (you can take a look to RotateX509
localauthorityv1 "github.com/spiffe/spire-api-sdk/proto/spire/api/server/localauthority/v1" | ||
"github.com/spiffe/spire/pkg/common/telemetry" | ||
"github.com/spiffe/spire/pkg/server/api" | ||
localauthority "github.com/spiffe/spire/pkg/server/api/localauthority/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant alias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not, if we does not set alias it will try to use v1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to run the tests without this alias, and they are all passing; the package name being imported from github.com/spiffe/spire/pkg/server/api/localauthority/v1 is already "localauthority".
expectResp *localauthorityv1.RevokeX509AuthorityResponse | ||
}{ | ||
{ | ||
name: "taint old authority", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: "taint old authority", | |
name: "revoke old authority", |
bab498c
to
358989e
Compare
93300c7
to
4d9e494
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work @MarcosDY thank you so much! Code quality is high and it was easy to read through ❤️ 🙏
|
||
var states []*localauthorityv1.AuthorityState | ||
current := s.ca.GetCurrentX509CASlot() | ||
if !current.IsEmpty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what conditions could current be empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically this must never happens... we can just not validate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then perhaps we can have this conditional just immediately return an error if it's true? This logic was a little harder to follow than the rest and removing some layers will help
next := s.ca.GetNextX509CASlot() | ||
// when next has a key indicates that it was initialized | ||
if next.AuthorityID() != "" { | ||
status := localauthorityv1.AuthorityState_PREPARED | ||
// CA is removed from slot on rotation | ||
if next.IsEmpty() { | ||
status = localauthorityv1.AuthorityState_OLD | ||
} | ||
|
||
states = append(states, &localauthorityv1.AuthorityState{ | ||
AuthorityId: next.AuthorityID(), | ||
Status: status, | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all of this stuff is good enough for now, but in the long term, we will probably want to move state tracking into the CA package itself somewhere ... these kind of detection that depends on internal behavior of CA package might break someday. Also could be a good time to introduce a third "previous" key slot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we have an State field on journal now, and we can just parse that status into a localauthorityv1.AuthorityState
I was thinking on do that on this PR but I choose to keep it out for now,
About the third slot
approach, are you ok with creating another issue for that?
I can start working on that and we can adapt this code if required, I think
since it is an ID we'll not affect key managers but I will need to review that,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you ok with creating another issue for that?
Yes for sure! Didn't intend for all that to be in scope for this PR, just that it seems like a good future direction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #4271
rpccontext.AuditRPC(ctx) | ||
|
||
return &localauthorityv1.GetX509AuthorityStateResponse{ | ||
States: states, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I see this here, I wonder if it would be more ergonomic to have a field for each of the valid states, and a state value for each one ... which would prevent weird things like multiple active keys ... and may also make it easier to consume (e.g. don't have to tear through a slice looking for the active key)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, basically refactor local authority interface to have something like:
message GetX509AuthorityStateResponse {
// In use Authority
AuthorityState Active = 1;
// Ready to use
AuthorityState Prepared = 2;
// Previously used
AuthorityState Old = 3;
// WE MUST NEVER GET THIS
AuthorityState Unknown = 4;
}
For sure easier to consume and prevents multiple uses of the same type, I think we can avoid Unknown we may never get into that one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels much better 😁 but I also know that we already reviewed the API etc and I missed this, so I don't think we should block on it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is always expected to have changes on implementation, based on how things looks,
I like to have improvements as soon as possible,
I created a PR on spire-api-sdk,
as soon it is merged I'll apply changes here
// Prepare is going to use current slot when it is empty | ||
if slot.IsEmpty() { | ||
slot = s.ca.GetCurrentX509CASlot() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question here, in what cases is current slot empty? Mostly wondering if it's limited to startup or not, and if it is worth handling such edge case in this API (as opposed to returning some kind of error like "initializing")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree it must never happen, I'm happy to return an error instead, but we must never get into this case, even when initializing, we must load authorities before open APIs, so I dont expect it to happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's unexpected, feels like a good idea to return an error
// TODO: implement a way to activate an OLD authority | ||
if req.AuthorityId != "" { | ||
log = log.WithField(telemetry.LocalAuthorityID, req.AuthorityId) | ||
return nil, api.MakeErr(log, codes.InvalidArgument, "activating an old authority is not supported yet", nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be a bad request (probably more likely bad request than attempt to activate old key?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my intention here is that we can activate the Previous activated one, may as part of adding that new slot on journal?
however, what do you suggest to do here? validate if the ID really exists before failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, activating a previous key is a big use case, but we should not support it until CA has changes to know about a previous slot, we can directly address it, etc. I worry about overloading the meaning of empty string because it could signal an error condition that we would then take a sensitive action on. So maybe for now, we don't support activating previous key .. and if/when we get CA changes to support a previous slot, we can also add the functionality here?
} | ||
log = log.WithField(telemetry.LocalAuthorityID, authorityID) | ||
|
||
if err := s.ds.TaintX509CA(ctx, s.td.IDString(), publicKey); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of neat that all we need to do is write this bit into the datastore
What happens if I try to taint the active authority? Same question for Revoke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in getX509PublicKey
I validate that the provided authority is not the active one, so this must not happens, another option is to update datastore to store states too
}, nil | ||
} | ||
|
||
// getX509PublicKey validates provided authority ID, and return OLD associated publick key, in case of authority ID is not provided, use the current OLD authority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// getX509PublicKey validates provided authority ID, and return OLD associated publick key, in case of authority ID is not provided, use the current OLD authority | |
// getX509PublicKey validates provided authority ID, and return OLD associated public key, in case of authority ID is not provided, use the current OLD authority |
Relying on empty authority id to mean the old authority feels kind of dangerous (related to above comment about missing authority id in req)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so do you suggest here to make authorityID required, but it must be the old one?
bundle, err := s.ds.FetchBundle(ctx, s.td.IDString()) | ||
if err != nil { | ||
return "", nil, err | ||
} | ||
|
||
for _, ca := range bundle.RootCas { | ||
cert, err := x509.ParseCertificate(ca.DerBytes) | ||
if err != nil { | ||
return "", nil, err | ||
} | ||
|
||
subjectKeyID := x509util.SubjectKeyIDToString(cert.SubjectKeyId) | ||
if authorityID == subjectKeyID { | ||
return subjectKeyID, cert.PublicKey, nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be searching the bundle for these keys? I think one side effect of this is I can taint a key in the bundle that isn't managed locally ... Since it's the Local Authority API, kind of feels like you should be interacting with the server instance which owns the key? Or are there use cases that require this kind of thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
searching be publickey was my initial idea too, as you said we may be able to taint a no local authority, I dont have a use case about that "yet",
So are you ok with getting authorityID and validate if it is used in the Old authority and reject if that is not the case? (we can have the Old one once we add the third Slot)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that sounds good .. probably better to start with local only, and extend it if/when we have a good use case for that
} | ||
|
||
if !taintedKeyFound { | ||
return status.Error(codes.InvalidArgument, "it is not possible to revoke an untainted root CA") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good check! I think it may be better to handle in local authority api though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure about that, I'm ok in moving it to service layer, but revoking on datastore without this validation worry me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok .. it felt weird to me for datastore to be knowing about this kind of thing, but I think it's fine to leave as a defensive measure 👍
|
||
/** Whether or not the key used to sign the certificate is tainted */ | ||
bool tainted_key = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move this to the top level? Feels natural to have on a per-cert and per-key basis ... it remains set on PublicKey message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the issue here is that we does not have intermediate certificates on bundle when UpstreamAuthority is used,
So we keep "only" bundle here in that case, but intermediates lives on Journal,
So in order to taint a key in that case we must store the public key in some place
@@ -27,3 +28,21 @@ func GetSubjectKeyID(pubKey interface{}) ([]byte, error) { | |||
keyID := sha1.Sum(subjectKeyInfo.SubjectPublicKey.Bytes) //nolint: gosec // usage of SHA1 is according to specification | |||
return keyID[:], nil | |||
} | |||
|
|||
// SubjectKeyIDToString parse Subject Key ID into string | |||
func SubjectKeyIDToString(ski []byte) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ity service Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
…alized, and update authorityID format for x.509 local authorities Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
d59dc81
to
430cc5a
Compare
Signed-off-by: Marcos Yacob <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple more comments, I think this is good to go otherwise. Thanks again @MarcosDY 🙏
case req.AuthorityId == "": | ||
return nil, api.MakeErr(log, codes.InvalidArgument, "no authority ID provided", nil) | ||
|
||
// It is not possible to taint Active authority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking nit: indentation error
// Only next local authority can be tainted | ||
case req.AuthorityId != nextSlot.AuthorityID(): | ||
return nil, api.MakeErr(log, codes.InvalidArgument, "unexpected authority ID", nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only want to taint "old" authorities, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap, the next case validates that the next is not in prepared status (so only old ones can be tainted)
bundle, err := s.ds.FetchBundle(ctx, s.td.IDString()) | ||
if err != nil { | ||
return "", nil, err | ||
} | ||
|
||
for _, ca := range bundle.RootCas { | ||
cert, err := x509.ParseCertificate(ca.DerBytes) | ||
if err != nil { | ||
return "", nil, err | ||
} | ||
|
||
subjectKeyID := x509util.SubjectKeyIDToString(cert.SubjectKeyId) | ||
if authorityID == subjectKeyID { | ||
return subjectKeyID, cert.PublicKey, nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that sounds good .. probably better to start with local only, and extend it if/when we have a good use case for that
Signed-off-by: Marcos Yacob <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work @MarcosDY thank you!! 🎸
…ority endpoints (spiffe#4020) * Create local authority scaffolding, and implements X.509 local authority service Signed-off-by: Marcos Yacob <[email protected]> Signed-off-by: Neniel <[email protected]>
Force Rotation: local authority scaffolding and implements X.509 authority endpoints
Which issue this PR fixes
fixes #3895 and fixes #3897