Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding private registry auth as task resource #1427

Merged
merged 12 commits into from
Jul 10, 2018
Merged

adding private registry auth as task resource #1427

merged 12 commits into from
Jul 10, 2018

Conversation

adnxn
Copy link
Contributor

@adnxn adnxn commented Jul 2, 2018

Summary

CI tests passing here: https://github.com/adnxn/amazon-ecs-agent/commits/pra-dev

Implementation details

E2E Manual Testing

  1. execution role does not have appropriate permissions → task is stopped, task reason error msg is set
  2. asm resource does not exist → task is stopped, task reason error msg is set
  3. asm resource has malformed key names → task is stopped, task reason error msg is set
  4. asm resource has empty values for username/password → task is stopped, task reason error msg is set
  5. the auth data is incorrect, wrong username, password → agent will attempt to start container, but auth will fail, task will be stopped
  6. ensure the cache is working. a task with multiple containers using the same asm resource to authenticate will only make a single service call and cache the auth data

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -out amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

New tests cover the changes:

Description for the changelog

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

adnxn and others added 8 commits June 7, 2018 09:12
This change adds a terminalReason message to the task struct. The
intended use is for the field in the task to be set when the agent
explicitly transitions a task to stopped
WIP COMMENT:
1. pra resource still needs to implement the marshalling logic + add some
concept of durable/not durable resource type for resources that have to
be recreated on agent restart.

2. the asm package, needs some refactoring. needs to add some kind of
create client factor (for testing) since we're building regional clients
for these calls

3. the STSC "reason" field changes have to be consumed by this feature
since we're using that to communicate async pra failiures
Added a new factory interface in asm package, which creates ASM clients
based on the region in ASMAuthConfig. Wired this into the ResourceFields
struct in the taskresource package as well.
Move ContainerStatus and TaskStatus into container/status and
task/status respectively. This is required for TaskStatus to be used by
the TaskResource struct, to avoid circular dependecies.
Implemented the Initialize() method for ASM Auth resource, which helps
with retrieving registry creds on agent restart.

Also, added a factory for creating new ASM SDK clients.
This commit add tests for ASMAuthStatus and also engine test for
failiure cases in private registry over ASM code path.

This also adds the agent capability
for private-registry-authentication.secretsmanager

Also includes lint edits.
This commit branches the docker clients auth data handling logic, ecr
and asm auth are handled seperatly.

Also includes changes to propagate resource creation failiure reason up
to task terminal reason field.
This commit adds a code to check the asm resource cache before making a
call to asm.

Also adds changes to populate STSC reason field when a stopped
containers force a task stop.
@adnxn adnxn requested a review from petderek July 3, 2018 18:20
resourceFields *taskresource.ResourceFields) {
asmAuthResource := asmauth.NewASMAuthResource(task.Arn, task.getAllASMAuthDataRequirements(),
task.ExecutionCredentialsID, credentialsManager, resourceFields.ASMClientCreator)
task.AddResource(asmauth.ResourceName, asmAuthResource)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestMarshalUnmarshalJSON of asmauth doesn't test the marshal/unmarshal from state file, as resources are stored as an interface in the task, you will need to make sure correct struct was used during unmarshal, like this and this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestMarshalUnmarshalJSON of asmauth doesn't test the marshal/unmarshal from state file

okay that makes sense. adding test.


// ShouldPullWithASMAuth returns true if this container needs to retrieve
// private registry authentication data from ASM
func (c *Container) ShouldPullWithASMAuth() bool {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a unit test for this?

@@ -202,6 +212,11 @@ func (task *Task) PostUnmarshalTask(cfg *config.Config,
return apierrors.NewResourceInitError(task.Arn, err)
}
}

if task.requiresASMDockerAuthData() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this covered by a unit test?

Copy link
Contributor Author

@adnxn adnxn Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added TestPostUnmarshalTaskASMDockerAuth in task_test.go

@@ -1214,6 +1263,50 @@ func (task *Task) AddResource(resourceType string, resource taskresource.TaskRes
task.ResourcesMapUnsafe[resourceType] = append(task.ResourcesMapUnsafe[resourceType], resource)
}

// SetTerminalReason sets the terminalReason string and this can only be set
// once per the task's lifecycle. This field does not accept updates.
func (task *Task) SetTerminalReason(reason string) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a unit test.

}

// PopulateASMAuthData sets docker auth credentials for a container
func (task *Task) PopulateASMAuthData(container *apicontainer.Container) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this covered by an existing test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's captured by image pull tests, but added unit tests specifically for this.


// GetDockerAuthFromASM makes the api call to the AWS Secrets Manager service to
// retrieve the docker auth data
func GetDockerAuthFromASM(secretID string, client secretsmanageriface.SecretsManagerAPI) (docker.AuthConfiguration, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a unit test for both happy path and failure case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added asm_test.go, somehow was missed during merge 🤔

}

func TestMarshalASMAuthStatus(t *testing.T) {
status := ASMAuthStatusNone

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also test "ASMAuthStatusCreated"? And this can be combined together with below using table test.

SomeStatus ASMAuthStatus `json:"status"`
}

func TestUnmarshalASMAuthStatus(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these tests combined together with table tests?

})
}

func (auth *ASMAuthResource) UnmarshalJSON(b []byte) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comments

ExecutionCredentialsID string `json:"executionCredentialsID"`
}

func (auth *ASMAuthResource) MarshalJSON() ([]byte, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lint: please add comments.

@adnxn
Copy link
Contributor Author

adnxn commented Jul 6, 2018

@richardpen added tests you called out, this was very helpful thanks 😁

This commit adds unit tests to code paths that are exercised using the
private registry auth over asm feature.
Copy link

@richardpen richardpen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code mostly LGTM, did you have the end to end test pass?

return authConfig, CannotPullECRContainerError{err}

switch authData.Type {
case "ecr":

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the constant defined above, same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making those constants exportable then.

seelog.Errorf("Task engine [%s]: unable to acquire Docker registry credentials for container [%s]",
task.Arn, container.Name)
return dockerapi.DockerContainerMetadata{
Error: dockerapi.CannotPullECRContainerError{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using CannotPullECRContainerError seems confusion here, as this is not necessary pulling from ECR, can you create a new error type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea you're right, i'm going to add some distinct error types here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a CannotPullContainerAuthError type

c.lock.RLock()
defer c.lock.RUnlock()

return c.RegistryAuthentication != nil &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this, but would prefer it to be expressed more explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer it to be expressed more explicitly.

how do you mean exactly? i dont follow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to make it so that it isn't a multi-line boolean expression

// express or implied. See the License for the specific language governing
// permissions and limitations under the License.

package asm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does asm mean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Password *string
}

// GetDockerAuthFromASM makes the api call to the AWS Secrets Manager service to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i get it now. but add a package doc note maybe?

agent/asm/asm.go Outdated
err := json.Unmarshal([]byte(secretValue), &authDataValue)
if err != nil {
// could not unmarshal, incorrect secret value schema
return docker.AuthConfiguration{}, errors.Wrapf(err,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it always safe to print this err? Does the json unmarshal error sometimes return parts of the original string? I'd lean toward not printing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean toward not printing it.

why exactly? this is an error that would be passed back up to CP as reason for why the the auth data wasn't acquired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might print some of the secret text, which we don't want.

Copy link
Contributor Author

@adnxn adnxn Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might print some of the secret text, which we don't want.

hm i can't seem to find anything that confirms that errors will not contain partials. probably best to leave it out then?

ShouldError: false,
},
{
Name: "MissingUsername",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is username always required for all repos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it needs authentication, then it needs username and password.

ShouldError: true,
},
{
Name: "MalformedJson",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about empty json like {} or [] ? Or is that covered by the other empty cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be captured under "...malformed empty field", but added an explicit case w/ {}

@@ -0,0 +1,143 @@
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//+build unit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you miss this, I didn't see this was added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea missed it, added now.

}
gomock.InOrder(
credentialsManager.EXPECT().GetTaskCredentials(executionCredentialsID).Return(creds, true),
asmClientCreator.EXPECT().NewASMClient(region, iamRoleCreds).Return(mockASMClient),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a solution for you here, but we need to be careful with how many layers we go especially with new code. Remember:

Every time a mock returns a mock a fairy dies *

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea ended up adding this to be able to test more complete workflows like post unmarshall or the image pull workflows.

asmResOut := &ASMAuthResource{}
err = json.Unmarshal(bytes, asmResOut)
require.NoError(t, err)
assert.Equal(t, asmResIn.taskARN, asmResOut.taskARN)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert does deep equal so assert.Equal(t,asmResIn,asmResOut) should work for you (I think?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh why not check field by field?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its fine as is -- its just more concise the other way.

/// this... 
assert.Equal(t, expectedStruct, actualStruct)

/// instead of ...
assert.Equal(t, expectedStruct.field1, actualStruct.field1)
assert.Equal(t, expectedStruct.field2, actualStruct.field2)
assert.Equal(t, expectedStruct.field3, actualStruct.field3)
// ...

assert.Equal(t, asmResIn.requiredASMResources[0], asmResOut.requiredASMResources[0])
}

func TestInitialize(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we test what happens when we initialize twice? Might be more appropriate for integ test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE: what happens when we initialize, agent gets shut down, and then agent starts back up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when we initialize, agent gets shut down, and then agent starts back up

so initialize is part of resource provisioning, and if the agent is restarted the resources will be reinitialized as part of the task engine's synchronizeState() workflow.

Copy link
Contributor

@sharanyad sharanyad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

questions and nits :)

asmAuthResource := asmauth.NewASMAuthResource(task.Arn, task.getAllASMAuthDataRequirements(),
task.ExecutionCredentialsID, credentialsManager, resourceFields.ASMClientCreator)
task.AddResource(asmauth.ResourceName, asmAuthResource)
for _, container := range task.Containers {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to add asm resource dependency for even containers that don't need to be pulled with ASM auth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's scoped as a task resource and gated behind if task.requiresASMDockerAuthData() ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this is a task resource, but other containers(that don't need asm auth) need not wait for this resource's steady state to transition to PULLED. Isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see what you mean, this bit should be gated behind container.ShouldPullWithASMAuth() { ...

good catch, fixed.

IOUtil: ioutilwrapper.NewIOUtil(),
ResourceFieldsCommon: &taskresource.ResourceFieldsCommon{
IOUtil: ioutilwrapper.NewIOUtil(),
ASMClientCreator: factory.NewClientCreator(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is ASMClientCreator inside ResourceFieldsCommon? Do other resources use this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mainly to be able to test the codepaths that need these clients. it's the regionalised asm clients requirement thats driving this, the resource will carry the creator with it, but the creation of the clients is still gated behind checks that ensure the task actually needs asm clients.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceFields seemed oddly OS specific when I created this. ResourceFieldsCommon here is referring to common fields across OS's not to any "common" resources. Factories, managers etc shouldn't be OS specific imo and in the long term, there shouldn't be a need for the ResourceFieldsCommon struct. Everything should just be in one struct. But, this is a workaround for now.

@@ -0,0 +1,16 @@
// Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: generate_mocks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yikes, fixed.

// GetDockerAuthFromASM makes the api call to the AWS Secrets Manager service to
// retrieve the docker auth data
func GetDockerAuthFromASM(secretID string, client secretsmanageriface.SecretsManagerAPI) (docker.AuthConfiguration, error) {
in := &secretsmanager.GetSecretValueInput{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are timeouts handled within the API call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea the sdk does

executionCredentials, ok := auth.credentialsManager.GetTaskCredentials(auth.GetExecutionCredentialsID())
if !ok {
// No need to log here. managedTask.applyResourceState already does that
return errors.Errorf("asm resource: unable find execution role credentials")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: This should be errors.New and 'unable to find'.

@adnxn
Copy link
Contributor Author

adnxn commented Jul 6, 2018

Code mostly LGTM, did you have the end to end test pass?

@richardpen: yea end to end tests are passing


// MarshalJSON overrides the logic for JSON-encoding the ResourceStatus type
func (as *ASMAuthStatus) MarshalJSON() ([]byte, error) {
if as == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in taskresource/asmauth, this is treated as an error. Here, it's not. Any reason for the differing behavior here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's a miss, will also treat as error.

Copy link
Contributor

@aaithal aaithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code lgtm. Won't give it a shitpit as i wrote some of this code and it'd be incredibly narcissistic of me 😄 to do so!

@adnxn
Copy link
Contributor Author

adnxn commented Jul 9, 2018

Code lgtm. Won't give it a shitpit as i wrote some of this code and it'd be incredibly narcissistic of me 😄 to do so!

lol fairrrrr enough

Previously we were gating creation of factories and managers required by
task resources behind the task cpu/memory limits flag. Now we gate the
usage behind feature specific flags that require these dependencies.
@adnxn adnxn merged commit 4388e84 into aws:dev Jul 10, 2018
adnxn added a commit that referenced this pull request Jul 10, 2018
@adnxn adnxn deleted the pra-dev branch October 19, 2018 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants