-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding private registry auth as task resource #1427
Conversation
This change adds a terminalReason message to the task struct. The intended use is for the field in the task to be set when the agent explicitly transitions a task to stopped
WIP COMMENT: 1. pra resource still needs to implement the marshalling logic + add some concept of durable/not durable resource type for resources that have to be recreated on agent restart. 2. the asm package, needs some refactoring. needs to add some kind of create client factor (for testing) since we're building regional clients for these calls 3. the STSC "reason" field changes have to be consumed by this feature since we're using that to communicate async pra failiures
Added a new factory interface in asm package, which creates ASM clients based on the region in ASMAuthConfig. Wired this into the ResourceFields struct in the taskresource package as well.
Move ContainerStatus and TaskStatus into container/status and task/status respectively. This is required for TaskStatus to be used by the TaskResource struct, to avoid circular dependecies.
Implemented the Initialize() method for ASM Auth resource, which helps with retrieving registry creds on agent restart. Also, added a factory for creating new ASM SDK clients.
This commit add tests for ASMAuthStatus and also engine test for failiure cases in private registry over ASM code path. This also adds the agent capability for private-registry-authentication.secretsmanager Also includes lint edits.
This commit branches the docker clients auth data handling logic, ecr and asm auth are handled seperatly. Also includes changes to propagate resource creation failiure reason up to task terminal reason field.
This commit adds a code to check the asm resource cache before making a call to asm. Also adds changes to populate STSC reason field when a stopped containers force a task stop.
resourceFields *taskresource.ResourceFields) { | ||
asmAuthResource := asmauth.NewASMAuthResource(task.Arn, task.getAllASMAuthDataRequirements(), | ||
task.ExecutionCredentialsID, credentialsManager, resourceFields.ASMClientCreator) | ||
task.AddResource(asmauth.ResourceName, asmAuthResource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestMarshalUnmarshalJSON of asmauth doesn't test the marshal/unmarshal from state file
okay that makes sense. adding test.
|
||
// ShouldPullWithASMAuth returns true if this container needs to retrieve | ||
// private registry authentication data from ASM | ||
func (c *Container) ShouldPullWithASMAuth() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a unit test for this?
@@ -202,6 +212,11 @@ func (task *Task) PostUnmarshalTask(cfg *config.Config, | |||
return apierrors.NewResourceInitError(task.Arn, err) | |||
} | |||
} | |||
|
|||
if task.requiresASMDockerAuthData() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this covered by a unit test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added TestPostUnmarshalTaskASMDockerAuth
in task_test.go
@@ -1214,6 +1263,50 @@ func (task *Task) AddResource(resourceType string, resource taskresource.TaskRes | |||
task.ResourcesMapUnsafe[resourceType] = append(task.ResourcesMapUnsafe[resourceType], resource) | |||
} | |||
|
|||
// SetTerminalReason sets the terminalReason string and this can only be set | |||
// once per the task's lifecycle. This field does not accept updates. | |||
func (task *Task) SetTerminalReason(reason string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a unit test.
} | ||
|
||
// PopulateASMAuthData sets docker auth credentials for a container | ||
func (task *Task) PopulateASMAuthData(container *apicontainer.Container) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this covered by an existing test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's captured by image pull tests, but added unit tests specifically for this.
|
||
// GetDockerAuthFromASM makes the api call to the AWS Secrets Manager service to | ||
// retrieve the docker auth data | ||
func GetDockerAuthFromASM(secretID string, client secretsmanageriface.SecretsManagerAPI) (docker.AuthConfiguration, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a unit test for both happy path and failure case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added asm_test.go
, somehow was missed during merge 🤔
} | ||
|
||
func TestMarshalASMAuthStatus(t *testing.T) { | ||
status := ASMAuthStatusNone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also test "ASMAuthStatusCreated"? And this can be combined together with below using table test.
SomeStatus ASMAuthStatus `json:"status"` | ||
} | ||
|
||
func TestUnmarshalASMAuthStatus(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these tests combined together with table tests?
}) | ||
} | ||
|
||
func (auth *ASMAuthResource) UnmarshalJSON(b []byte) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comments
ExecutionCredentialsID string `json:"executionCredentialsID"` | ||
} | ||
|
||
func (auth *ASMAuthResource) MarshalJSON() ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lint: please add comments.
@richardpen added tests you called out, this was very helpful thanks 😁 |
This commit adds unit tests to code paths that are exercised using the private registry auth over asm feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code mostly LGTM, did you have the end to end test pass?
return authConfig, CannotPullECRContainerError{err} | ||
|
||
switch authData.Type { | ||
case "ecr": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the constant defined above, same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
making those constants exportable then.
agent/engine/docker_task_engine.go
Outdated
seelog.Errorf("Task engine [%s]: unable to acquire Docker registry credentials for container [%s]", | ||
task.Arn, container.Name) | ||
return dockerapi.DockerContainerMetadata{ | ||
Error: dockerapi.CannotPullECRContainerError{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using CannotPullECRContainerError
seems confusion here, as this is not necessary pulling from ECR, can you create a new error type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea you're right, i'm going to add some distinct error types here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a CannotPullContainerAuthError
type
c.lock.RLock() | ||
defer c.lock.RUnlock() | ||
|
||
return c.RegistryAuthentication != nil && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this, but would prefer it to be expressed more explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would prefer it to be expressed more explicitly.
how do you mean exactly? i dont follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to make it so that it isn't a multi-line boolean expression
// express or implied. See the License for the specific language governing | ||
// permissions and limitations under the License. | ||
|
||
package asm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does asm mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws secrets manager https://aws.amazon.com/secrets-manager/
Password *string | ||
} | ||
|
||
// GetDockerAuthFromASM makes the api call to the AWS Secrets Manager service to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i get it now. but add a package doc note maybe?
agent/asm/asm.go
Outdated
err := json.Unmarshal([]byte(secretValue), &authDataValue) | ||
if err != nil { | ||
// could not unmarshal, incorrect secret value schema | ||
return docker.AuthConfiguration{}, errors.Wrapf(err, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it always safe to print this err? Does the json unmarshal error sometimes return parts of the original string? I'd lean toward not printing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd lean toward not printing it.
why exactly? this is an error that would be passed back up to CP as reason for why the the auth data wasn't acquired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might print some of the secret text, which we don't want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might print some of the secret text, which we don't want.
hm i can't seem to find anything that confirms that errors will not contain partials. probably best to leave it out then?
ShouldError: false, | ||
}, | ||
{ | ||
Name: "MissingUsername", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is username always required for all repos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it needs authentication, then it needs username and password.
ShouldError: true, | ||
}, | ||
{ | ||
Name: "MalformedJson", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about empty json like {}
or []
? Or is that covered by the other empty cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be captured under "...malformed empty field", but added an explicit case w/ {}
@@ -0,0 +1,143 @@ | |||
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//+build unit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you miss this, I didn't see this was added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea missed it, added now.
} | ||
gomock.InOrder( | ||
credentialsManager.EXPECT().GetTaskCredentials(executionCredentialsID).Return(creds, true), | ||
asmClientCreator.EXPECT().NewASMClient(region, iamRoleCreds).Return(mockASMClient), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a solution for you here, but we need to be careful with how many layers we go especially with new code. Remember:
Every time a mock returns a mock a fairy dies *
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea ended up adding this to be able to test more complete workflows like post unmarshall or the image pull workflows.
asmResOut := &ASMAuthResource{} | ||
err = json.Unmarshal(bytes, asmResOut) | ||
require.NoError(t, err) | ||
assert.Equal(t, asmResIn.taskARN, asmResOut.taskARN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert does deep equal so assert.Equal(t,asmResIn,asmResOut) should work for you (I think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh why not check field by field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its fine as is -- its just more concise the other way.
/// this...
assert.Equal(t, expectedStruct, actualStruct)
/// instead of ...
assert.Equal(t, expectedStruct.field1, actualStruct.field1)
assert.Equal(t, expectedStruct.field2, actualStruct.field2)
assert.Equal(t, expectedStruct.field3, actualStruct.field3)
// ...
assert.Equal(t, asmResIn.requiredASMResources[0], asmResOut.requiredASMResources[0]) | ||
} | ||
|
||
func TestInitialize(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we test what happens when we initialize twice? Might be more appropriate for integ test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IE: what happens when we initialize, agent gets shut down, and then agent starts back up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when we initialize, agent gets shut down, and then agent starts back up
so initialize is part of resource provisioning, and if the agent is restarted the resources will be reinitialized as part of the task engine's synchronizeState()
workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
questions and nits :)
asmAuthResource := asmauth.NewASMAuthResource(task.Arn, task.getAllASMAuthDataRequirements(), | ||
task.ExecutionCredentialsID, credentialsManager, resourceFields.ASMClientCreator) | ||
task.AddResource(asmauth.ResourceName, asmAuthResource) | ||
for _, container := range task.Containers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to add asm resource dependency for even containers that don't need to be pulled with ASM auth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's scoped as a task resource and gated behind if task.requiresASMDockerAuthData() ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree this is a task resource, but other containers(that don't need asm auth) need not wait for this resource's steady state to transition to PULLED. Isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see what you mean, this bit should be gated behind container.ShouldPullWithASMAuth() { ...
good catch, fixed.
IOUtil: ioutilwrapper.NewIOUtil(), | ||
ResourceFieldsCommon: &taskresource.ResourceFieldsCommon{ | ||
IOUtil: ioutilwrapper.NewIOUtil(), | ||
ASMClientCreator: factory.NewClientCreator(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is ASMClientCreator
inside ResourceFieldsCommon
? Do other resources use this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mainly to be able to test the codepaths that need these clients. it's the regionalised asm clients requirement thats driving this, the resource will carry the creator with it, but the creation of the clients is still gated behind checks that ensure the task actually needs asm clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResourceFields
seemed oddly OS specific when I created this. ResourceFieldsCommon
here is referring to common fields across OS's not to any "common" resources. Factories, managers etc shouldn't be OS specific imo and in the long term, there shouldn't be a need for the ResourceFieldsCommon
struct. Everything should just be in one struct. But, this is a workaround for now.
agent/asm/genereate_mocks.go
Outdated
@@ -0,0 +1,16 @@ | |||
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: generate_mocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yikes, fixed.
// GetDockerAuthFromASM makes the api call to the AWS Secrets Manager service to | ||
// retrieve the docker auth data | ||
func GetDockerAuthFromASM(secretID string, client secretsmanageriface.SecretsManagerAPI) (docker.AuthConfiguration, error) { | ||
in := &secretsmanager.GetSecretValueInput{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are timeouts handled within the API call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea the sdk does
executionCredentials, ok := auth.credentialsManager.GetTaskCredentials(auth.GetExecutionCredentialsID()) | ||
if !ok { | ||
// No need to log here. managedTask.applyResourceState already does that | ||
return errors.Errorf("asm resource: unable find execution role credentials") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: This should be errors.New
and 'unable to find
'.
@richardpen: yea end to end tests are passing |
|
||
// MarshalJSON overrides the logic for JSON-encoding the ResourceStatus type | ||
func (as *ASMAuthStatus) MarshalJSON() ([]byte, error) { | ||
if as == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in taskresource/asmauth, this is treated as an error. Here, it's not. Any reason for the differing behavior here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that's a miss, will also treat as error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code lgtm. Won't give it a shitpit as i wrote some of this code and it'd be incredibly narcissistic of me 😄 to do so!
lol fairrrrr enough |
Previously we were gating creation of factories and managers required by task resources behind the task cpu/memory limits flag. Now we gate the usage behind feature specific flags that require these dependencies.
Summary
CI tests passing here: https://github.com/adnxn/amazon-ecs-agent/commits/pra-dev
Implementation details
E2E Manual Testing
Testing
make release
)go build -out amazon-ecs-agent.exe ./agent
)make test
) passgo test -timeout=25s ./agent/...
) passmake run-integ-tests
) pass.\scripts\run-integ-tests.ps1
) passmake run-functional-tests
) pass.\scripts\run-functional-tests.ps1
) passNew tests cover the changes:
Description for the changelog
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.