-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[refresh-credential] rotate the AWS ECR credential by schedule #15313
Conversation
According to Kubernetes doc, a container using a Secret as a subPath volume mount will not receive Secret updates. Signed-off-by: JenTing Hsiao <[email protected]>
Signed-off-by: JenTing Hsiao <[email protected]>
Signed-off-by: JenTing Hsiao <[email protected]>
Signed-off-by: JenTing Hsiao <[email protected]>
Signed-off-by: JenTing Hsiao <[email protected]>
Signed-off-by: JenTing Hsiao <[email protected]>
9481d86
to
0c0918c
Compare
fd1920e
to
2993448
Compare
started the job as gitpod-build-jenting-aws-ecr-14914.68 because the annotations in the pull request description changed |
c962856
to
37a3115
Compare
ee58365
to
fe1542d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks broadly ok, but I've put a couple of questions/comments in there. Around on Slack if you want to chat about them.
install/installer/pkg/components/registry-credential/cronjob.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nits, but looks good to me!
/hold just in case. Feel free to remove
install/installer/pkg/components/registry-credential/objects.go
Outdated
Show resolved
Hide resolved
@@ -81,6 +81,7 @@ func deployment(ctx *common.RenderContext) ([]runtime.Object, error) { | |||
VolumeSource: corev1.VolumeSource{ | |||
Secret: &corev1.SecretVolumeSource{ | |||
SecretName: secretName, | |||
Items: []corev1.KeyToPath{{Key: ".dockerconfigjson", Path: "pull-secret.json"}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is pull-secret
the right name? as we would also use it for pushes here. Could be tackled with https://github.com/gitpod-io/security/issues/89
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is pull-secret.json
now, ref code.
I updated the https://github.com/gitpod-io/security/issues/89 to track it.
- Provide AWS doc/code link - Check IsAWSECRURL in Object once - Check credential AWS access/secret key pair exists Signed-off-by: JenTing Hsiao <[email protected]>
10bc7a3
to
ae4f39d
Compare
@mrsimonemms PTAL 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just one additional question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jenting , wow, solid work!
Can you share an updated Loom video showing the test?
Also, I think we'll need follow-on PRs:
- We should be using IRSA, instead of persisting client_id and client_secret as a Kubernetes secret
- What do you think of renaming registry-credential to refresh-ecr-credential? Naming things is hard. 😉
- What is the plan for creating an alert for when this job fails? If it attempts 10 times and still fails, how much time would that leave for us to react, before the cluster is unable to interact with the registry?
accessKey := string(credSecret.Data[accessKeyIdName]) | ||
secretKey := string(credSecret.Data[secretAccessKeyName]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jenting we should not be persisting an access and secret key to Kubernetes to interface with AWS. Instead, we should be leveraging an IAM role for a service account, aka IRSA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, we should avoid persisting long lived secrets to workspace clusters, and negotiate a temporary AWS role credential instead, to do the GetAuthorizationToken
work.
After validating the token's signature, IAM exchanges the Kubernetes issued token for a temporary AWS role credential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you'll need to use awsconfig.WithAssumeRoleCredentialOptions
as an alternative to awsconfig.WithCredentialsProvider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see an example from @mrzarquon here via ecr-helper
service account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kylos101 I saw for S3 object storage, we use the shared config.
@Furisto Could you tell me how did you prepare the credential file you developed for the S3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Furisto Looks like you were using the shared config for development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I update the doc, same as how the S3 load the credentials file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jenting I created a Kubernetes secret that contains the AWS shared config file (which gives access only to a specific bucket) and then mounted the file into the container. Then I reference that file when I am creating the client.. The file contains the credentials for a user that has no permissions except for the ones assigned by a bucket policy.
Certificate ObjectRef `json:"certificate" validate:"required"` | ||
URL string `json:"url" validate:"required"` | ||
Certificate ObjectRef `json:"certificate" validate:"required"` | ||
Credential *ObjectRef `json:"credential,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am skeptical we need this Credential field, please see my IRSA feedback for example.
|
||
// CredentialSecret points to a Kubernetes secret which contains the credential to rotate | ||
// the container registry credential . | ||
CredentialSecret string `json:"credentialSecret"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CredentialSecret
should not be needed, I think, if we rely on IRSA.
log.Infof("Secret %s/%s updated with new ECR credentials", cfg.SecretToUpdate, cfg.Namespace) | ||
} | ||
|
||
func newAWSConfig(region, accessKeyId, secretAccessKey, session string) (aws.Config, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to be changed, for example, to configure AWS given the service account associated with this new component.
@@ -101,7 +101,7 @@ func configmap(ctx *common.RenderContext) ([]runtime.Object, error) { | |||
MaxSize: MaxSizeBytes, | |||
}, | |||
}, | |||
AuthCfg: "/mnt/pull-secret.json", | |||
AuthCfg: "/mnt/pull-secret/pull-secret.json", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget, but, nesting the secret within a sub-directory was necessary, so that we can watch it for changes, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
credentialSecretName, err := credentialSecretName(ctx) | ||
if err != nil { | ||
return nil, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to remove, given IRSA feedback
install/installer/pkg/components/registry-credential/cronjob.go
Outdated
Show resolved
Hide resolved
We could rename it to
According to the Kubernetes doc, 10-time retry means the alert triggered after 10s+20s+40s+80s+160s+320s+360s+360s+360s+360s=34.5minutes. |
3998706
to
796b970
Compare
I talked with @kylos101, and Kyle said since S3 requires the same secret, don't worry about my feedback for IRSA. Correct me if I am wrong, but if I remember correctly, the IRSA (the AWs STS assume-role) is workable when the Kubernetes cluster lands on the AWS EC2 because the STS assume-role relies on the AWS EC2 information to grant the temporary credential. |
d4f8cbe
to
9d61e17
Compare
Signed-off-by: JenTing Hsiao <[email protected]>
Signed-off-by: JenTing Hsiao <[email protected]>
2697c6a
to
c09d2f9
Compare
When using concurrentPolicy=Replace and the job failed but haven't reach the backoff limit, the new job will replace the original one if the schedule time is less than the sum of the backoff time. It causes a problem that the job alert `kube_job_status_failed{job_name=~"refresh-credential.*",reason="BackoffLimitExceeded"}` can't be fired. Signed-off-by: JenTing Hsiao <[email protected]>
/unhold All the comments are addressed. Thanks all the reviewers' comments. We could have follow up PRs
|
Description
Add a new component
refresh-credential
to rotate the AWS ECR authorization token by schedule.The refresh-credential component is installed once the
containerRegistry.external.url
is an AWS ECR URL.Remember to configure the
containerRegistry.inCluster = false
to indicate to use external container registry.containerRegistry.external.certificate
indicates the secret name the AWS ECR authorization token write to.containerRegistry.external.credentials
which points to the AWS secret that with the AWS access/secret key pair.Example yaml manifest:
The refresh-credential run as Kubernetes CronJob. The AWS ECR authorization token expires every 12 hours. The CronJob runs every 6 hours. Therefore, the on-caller have time to mitigate it once the ECR authorization token rotation fails.
We can add an alert rule when the job hits the backoffLimit. The trigger criteria is to query
kube_job_status_failed{job_name=~"refresh-credential.*",reason="BackoffLimitExceeded"} > 0
.Reference to kube_job_status_failed.
https://www.loom.com/share/6d4b9dedc2df4a32bea87c12e3533230
Related Issue(s)
Fixes #12104
How to test
Try https://g704b4152c75da6e18a1605.workspace-preview.gitpod-io-dev.com/workspaces
Release Notes
Documentation
None
Werft options:
If enabled this will build
install/preview
Valid options are
all
,workspace
,webapp
,ide
,jetbrains
,vscode
,ssh