Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force Rotation: local authority scaffolding and implements X.509 authority endpoints #4020

Merged
merged 21 commits into from
Jul 21, 2023

Conversation

MarcosDY
Copy link
Collaborator

Force Rotation: local authority scaffolding and implements X.509 authority endpoints

Which issue this PR fixes
fixes #3895 and fixes #3897

go.mod Outdated
@@ -2,6 +2,8 @@ module github.com/spiffe/spire

go 1.19

replace github.com/spiffe/spire-api-sdk => github.com/MarcosDY/spire-api-sdk v1.0.0-pre.0.20230321150416-5642a0450459
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR depends of spire-api-sdk PR

}

func (s *Service) TaintX509Authority(ctx context.Context, req *localauthorityv1.TaintX509AuthorityRequest) (*localauthorityv1.TaintX509AuthorityResponse, error) {
parseRequest := func() logrus.Fields {
Copy link
Contributor

@guilhermocc guilhermocc Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcosDY What do you think about deanonymizing this function? It is duplicated in the method below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 func getRequestLogFields(publicKey []byte) logrus.Fields {
	fields := logrus.Fields{}
	if len(publicKey) > 0 {
		fields[telemetry.X509AuthorityPublicKeySHA256] = api.HashByte(publicKey)
	}
	return fields
}

},
},
{
name: "slot has no public key",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: "slot has no public key",
name: "current slot has no public key",

if next.GetPublicKey() != nil {
status := localauthorityv1.AuthorityState_PREPARED
// CA is removed from slot on rotation
if next.IsEmpty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcosDY, why do we need this check here? If the "next.GetPublicKey()" method returns something different than nil, wouldn't this method always return false?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, when we reset a slot, the x509 field is set to be nil but we keep the public key, so with that we know than that slot contains an OLD authority (you can take a look to RotateX509

localauthorityv1 "github.com/spiffe/spire-api-sdk/proto/spire/api/server/localauthority/v1"
"github.com/spiffe/spire/pkg/common/telemetry"
"github.com/spiffe/spire/pkg/server/api"
localauthority "github.com/spiffe/spire/pkg/server/api/localauthority/v1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant alias

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not, if we does not set alias it will try to use v1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to run the tests without this alias, and they are all passing; the package name being imported from github.com/spiffe/spire/pkg/server/api/localauthority/v1 is already "localauthority".

expectResp *localauthorityv1.RevokeX509AuthorityResponse
}{
{
name: "taint old authority",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: "taint old authority",
name: "revoke old authority",

@MarcosDY MarcosDY force-pushed the implement-localauthority-api branch from bab498c to 358989e Compare May 2, 2023 13:40
@MarcosDY MarcosDY marked this pull request as ready for review May 2, 2023 18:02
@MarcosDY MarcosDY force-pushed the implement-localauthority-api branch from 93300c7 to 4d9e494 Compare June 13, 2023 17:41
Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work @MarcosDY thank you so much! Code quality is high and it was easy to read through ❤️ 🙏


var states []*localauthorityv1.AuthorityState
current := s.ca.GetCurrentX509CASlot()
if !current.IsEmpty() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what conditions could current be empty?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this must never happens... we can just not validate it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then perhaps we can have this conditional just immediately return an error if it's true? This logic was a little harder to follow than the rest and removing some layers will help

Comment on lines 97 to 108
next := s.ca.GetNextX509CASlot()
// when next has a key indicates that it was initialized
if next.AuthorityID() != "" {
status := localauthorityv1.AuthorityState_PREPARED
// CA is removed from slot on rotation
if next.IsEmpty() {
status = localauthorityv1.AuthorityState_OLD
}

states = append(states, &localauthorityv1.AuthorityState{
AuthorityId: next.AuthorityID(),
Status: status,
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of this stuff is good enough for now, but in the long term, we will probably want to move state tracking into the CA package itself somewhere ... these kind of detection that depends on internal behavior of CA package might break someday. Also could be a good time to introduce a third "previous" key slot

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we have an State field on journal now, and we can just parse that status into a localauthorityv1.AuthorityState I was thinking on do that on this PR but I choose to keep it out for now,
About the third slot approach, are you ok with creating another issue for that?
I can start working on that and we can adapt this code if required, I think since it is an ID we'll not affect key managers but I will need to review that,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you ok with creating another issue for that?

Yes for sure! Didn't intend for all that to be in scope for this PR, just that it seems like a good future direction

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #4271

rpccontext.AuditRPC(ctx)

return &localauthorityv1.GetX509AuthorityStateResponse{
States: states,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I see this here, I wonder if it would be more ergonomic to have a field for each of the valid states, and a state value for each one ... which would prevent weird things like multiple active keys ... and may also make it easier to consume (e.g. don't have to tear through a slice looking for the active key)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, basically refactor local authority interface to have something like:

message GetX509AuthorityStateResponse {
        // In use Authority
        AuthorityState Active = 1;
        // Ready to use
        AuthorityState Prepared = 2;
        // Previously used
        AuthorityState Old = 3;
        // WE MUST NEVER GET THIS
        AuthorityState Unknown = 4;
}

For sure easier to consume and prevents multiple uses of the same type, I think we can avoid Unknown we may never get into that one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels much better 😁 but I also know that we already reviewed the API etc and I missed this, so I don't think we should block on it here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is always expected to have changes on implementation, based on how things looks,
I like to have improvements as soon as possible,
I created a PR on spire-api-sdk,
as soon it is merged I'll apply changes here

Comment on lines 127 to 130
// Prepare is going to use current slot when it is empty
if slot.IsEmpty() {
slot = s.ca.GetCurrentX509CASlot()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here, in what cases is current slot empty? Mostly wondering if it's limited to startup or not, and if it is worth handling such edge case in this API (as opposed to returning some kind of error like "initializing")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree it must never happen, I'm happy to return an error instead, but we must never get into this case, even when initializing, we must load authorities before open APIs, so I dont expect it to happen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's unexpected, feels like a good idea to return an error

// TODO: implement a way to activate an OLD authority
if req.AuthorityId != "" {
log = log.WithField(telemetry.LocalAuthorityID, req.AuthorityId)
return nil, api.MakeErr(log, codes.InvalidArgument, "activating an old authority is not supported yet", nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be a bad request (probably more likely bad request than attempt to activate old key?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my intention here is that we can activate the Previous activated one, may as part of adding that new slot on journal?
however, what do you suggest to do here? validate if the ID really exists before failing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, activating a previous key is a big use case, but we should not support it until CA has changes to know about a previous slot, we can directly address it, etc. I worry about overloading the meaning of empty string because it could signal an error condition that we would then take a sensitive action on. So maybe for now, we don't support activating previous key .. and if/when we get CA changes to support a previous slot, we can also add the functionality here?

}
log = log.WithField(telemetry.LocalAuthorityID, authorityID)

if err := s.ds.TaintX509CA(ctx, s.td.IDString(), publicKey); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of neat that all we need to do is write this bit into the datastore

What happens if I try to taint the active authority? Same question for Revoke

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in getX509PublicKey I validate that the provided authority is not the active one, so this must not happens, another option is to update datastore to store states too

}, nil
}

// getX509PublicKey validates provided authority ID, and return OLD associated publick key, in case of authority ID is not provided, use the current OLD authority
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// getX509PublicKey validates provided authority ID, and return OLD associated publick key, in case of authority ID is not provided, use the current OLD authority
// getX509PublicKey validates provided authority ID, and return OLD associated public key, in case of authority ID is not provided, use the current OLD authority

Relying on empty authority id to mean the old authority feels kind of dangerous (related to above comment about missing authority id in req)

Copy link
Collaborator Author

@MarcosDY MarcosDY Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so do you suggest here to make authorityID required, but it must be the old one?

Comment on lines +253 to +284
bundle, err := s.ds.FetchBundle(ctx, s.td.IDString())
if err != nil {
return "", nil, err
}

for _, ca := range bundle.RootCas {
cert, err := x509.ParseCertificate(ca.DerBytes)
if err != nil {
return "", nil, err
}

subjectKeyID := x509util.SubjectKeyIDToString(cert.SubjectKeyId)
if authorityID == subjectKeyID {
return subjectKeyID, cert.PublicKey, nil
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be searching the bundle for these keys? I think one side effect of this is I can taint a key in the bundle that isn't managed locally ... Since it's the Local Authority API, kind of feels like you should be interacting with the server instance which owns the key? Or are there use cases that require this kind of thing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

searching be publickey was my initial idea too, as you said we may be able to taint a no local authority, I dont have a use case about that "yet",
So are you ok with getting authorityID and validate if it is used in the Old authority and reject if that is not the case? (we can have the Old one once we add the third Slot)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that sounds good .. probably better to start with local only, and extend it if/when we have a good use case for that

}

if !taintedKeyFound {
return status.Error(codes.InvalidArgument, "it is not possible to revoke an untainted root CA")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good check! I think it may be better to handle in local authority api though?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure about that, I'm ok in moving it to service layer, but revoking on datastore without this validation worry me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok .. it felt weird to me for datastore to be knowing about this kind of thing, but I think it's fine to leave as a defensive measure 👍

Comment on lines -128 to -130

/** Whether or not the key used to sign the certificate is tainted */
bool tainted_key = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this to the top level? Feels natural to have on a per-cert and per-key basis ... it remains set on PublicKey message

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue here is that we does not have intermediate certificates on bundle when UpstreamAuthority is used,
So we keep "only" bundle here in that case, but intermediates lives on Journal,
So in order to taint a key in that case we must store the public key in some place

@@ -27,3 +28,21 @@ func GetSubjectKeyID(pubKey interface{}) ([]byte, error) {
keyID := sha1.Sum(subjectKeyInfo.SubjectPublicKey.Bytes) //nolint: gosec // usage of SHA1 is according to specification
return keyID[:], nil
}

// SubjectKeyIDToString parse Subject Key ID into string
func SubjectKeyIDToString(ski []byte) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #4216 @rturner3 landed on lowercase hex, no colon

Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
…alized, and update authorityID format for x.509 local authorities

Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Marcos Yacob <[email protected]>
@MarcosDY MarcosDY force-pushed the implement-localauthority-api branch from d59dc81 to 430cc5a Compare July 3, 2023 15:50
Signed-off-by: Marcos Yacob <[email protected]>
@MarcosDY MarcosDY requested a review from evan2645 July 17, 2023 14:54
Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple more comments, I think this is good to go otherwise. Thanks again @MarcosDY 🙏

case req.AuthorityId == "":
return nil, api.MakeErr(log, codes.InvalidArgument, "no authority ID provided", nil)

// It is not possible to taint Active authority
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking nit: indentation error

Comment on lines +195 to +197
// Only next local authority can be tainted
case req.AuthorityId != nextSlot.AuthorityID():
return nil, api.MakeErr(log, codes.InvalidArgument, "unexpected authority ID", nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want to taint "old" authorities, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, the next case validates that the next is not in prepared status (so only old ones can be tainted)

Comment on lines +253 to +284
bundle, err := s.ds.FetchBundle(ctx, s.td.IDString())
if err != nil {
return "", nil, err
}

for _, ca := range bundle.RootCas {
cert, err := x509.ParseCertificate(ca.DerBytes)
if err != nil {
return "", nil, err
}

subjectKeyID := x509util.SubjectKeyIDToString(cert.SubjectKeyId)
if authorityID == subjectKeyID {
return subjectKeyID, cert.PublicKey, nil
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that sounds good .. probably better to start with local only, and extend it if/when we have a good use case for that

@evan2645 evan2645 added this to the 1.7.2 milestone Jul 20, 2023
Signed-off-by: Marcos Yacob <[email protected]>
Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work @MarcosDY thank you!! 🎸

@MarcosDY MarcosDY merged commit fcf921e into spiffe:main Jul 21, 2023
@MarcosDY MarcosDY deleted the implement-localauthority-api branch July 21, 2023 20:50
Neniel pushed a commit to Neniel/spire that referenced this pull request Aug 24, 2023
…ority endpoints (spiffe#4020)

* Create local authority scaffolding, and implements X.509 local authority service

Signed-off-by: Marcos Yacob <[email protected]>
Signed-off-by: Neniel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement X.509 Local Authority RPCs Create LocalAuthority Scaffolding
3 participants