gpu functional testing #1744

Merged 1 commit on Dec 27, 2018

Contributor

@sharanyad sharanyad commented Dec 14, 2018

Summary

A simple GPU functional test that verifies that the right number of GPU devices is assigned to the task's container.
NOTE: this cannot be merged until the backend changes are in prod (the test passes in gamma).

Implementation details

If the instance type is p2/p3/g3, set the config variable ECS_ENABLE_GPU_SUPPORT and bind mount the GPU info file, which init creates on the instance, into the functional test's agent container.
Verify that two GPUs are assigned to an NVIDIA CUDA container. **

** For the test, use a GPU instance that has at least 2 NVIDIA GPUs.
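
For illustration, the instance type check described above could look roughly like the sketch below. It is not the code from this PR: the names `gpuInstancePrefixes` and `isGPUInstanceType` are hypothetical, and the instance type is passed in as a plain string instead of being read from the EC2 metadata service.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuInstancePrefixes lists the instance families named in this PR.
// The variable name is hypothetical.
var gpuInstancePrefixes = []string{"p2", "p3", "g3"}

// isGPUInstanceType reports whether instanceType (for example "p3.8xlarge")
// starts with one of the given family prefixes.
func isGPUInstanceType(instanceType string, prefixes []string) bool {
	for _, prefix := range prefixes {
		if strings.HasPrefix(instanceType, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, it := range []string{"p2.xlarge", "p3.8xlarge", "g3.4xlarge", "m5.large"} {
		fmt.Printf("%s -> %v\n", it, isGPUInstanceType(it, gpuInstancePrefixes))
	}
}
```

A test using such a helper would presumably skip itself when the check returns false, so the suite can still run on non-GPU instances.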

Testing

  • Builds on Linux (make release)
  • Builds on Windows (go build -o amazon-ecs-agent.exe ./agent)
  • Unit tests on Linux (make test) pass
  • Unit tests on Windows (go test -timeout=25s ./agent/...) pass
  • Integration tests on Linux (make run-integ-tests) pass
  • Integration tests on Windows (.\scripts\run-integ-tests.ps1) pass
  • Functional tests on Linux (make run-functional-tests) pass
  • Functional tests on Windows (.\scripts\run-functional-tests.ps1) pass

Ran the test manually on p2, p3, and g3 instances, and it passes.

Test output:

$ go test -v -tags functional -timeout 5m ./agent/functional_tests/... -run=TestRunGPUTask
?   	github.com/aws/amazon-ecs-agent/agent/functional_tests/generators	[no test files]
=== RUN   TestRunGPUTask
--- PASS: TestRunGPUTask (17.16s)
	utils_unix.go:130: Created directory /tmp/ecs_integ_testdata287691962 to store test data in
	utils_unix.go:146: Launching agent with image: amazon/amazon-ecs-agent:make
	utils_unix.go:233: Agent started as docker container: e759828da041456bbe2e9a02136a8eca47b490577b35a96306b3aeb65fe8a4ef
	utils.go:165: Found agent metadata: {Cluster:ecs-functional-tests ContainerInstanceArn:0xc420241630 Version:Amazon ECS Agent - v1.23.0 (*4451de4b)}
	utils.go:186: Task definition: ecsftest-nvidia-gpu-a56e78224672d662fdd4b5afea8cda9d:1
	utils.go:206: Started task: arn:aws:ecs:us-west-2:task/7bac22fc-ff75-41cf-a5e4-79fb577bec88
	utils.go:175: Removing test dir for passed test /tmp/ecs_integ_testdata287691962
PASS
ok  	github.com/aws/amazon-ecs-agent/agent/functional_tests/tests	17.272s

New tests cover the changes: yes

Description for the changelog

N/A

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sharanyad sharanyad changed the title [wip] gpu functional testing gpu functional testing Dec 25, 2018
}],
"command": ["sh", "-c", "nvidia-smi -L | wc -l | grep \"2\" && exit 42 || exit 1"]
}]
}
Contributor

new line at the end of file

@@ -1245,3 +1246,42 @@ func TestSSMSecretsEncryptedASMSecrets(t *testing.T) {
exitCode, _ := task.ContainerExitcode("ssmsecrets-environment-variables")
assert.Equal(t, 42, exitCode, fmt.Sprintf("Expected exit code of 42; got %d", exitCode))
}

// Note: This functional test requires an ECS GPU instance that has at least 4 GPUs
// Please use instance like p3.8xlarge for running this test
Contributor

We need to confirm this with Justin/Jake to see if this will work well with the release tests.

"type":"GPU",
"value": "2"
}],
"command": ["sh", "-c", "nvidia-smi -L | wc -l | grep \"2\" && exit 42 || exit 1"]
Contributor

grep \"2\" seems to be too broad: 20/22/12 will also pass the test. Is there a way to check that it's exactly 2?

Contributor Author

I think we can use grep -w for an exact (whole-word) match.
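
To make the concern concrete, here is a small Go sketch contrasting the two behaviours. It does not run grep: `looseMatch` is a hypothetical stand-in for the substring matching that `grep "2"` performs on the GPU count, and `exactMatch` stands in for a check that accepts only the value 2.

```go
package main

import (
	"fmt"
	"strings"
)

// looseMatch mimics `grep "2"`: true whenever the count contains the digit 2.
func looseMatch(count string) bool {
	return strings.Contains(count, "2")
}

// exactMatch accepts only a count that is exactly "2" once surrounding
// whitespace (such as the trailing newline from `wc -l`) is trimmed.
func exactMatch(count string) bool {
	return strings.TrimSpace(count) == "2"
}

func main() {
	for _, c := range []string{"2", "12", "20", "22"} {
		fmt.Printf("count=%-2s loose=%-5v exact=%v\n", c, looseMatch(c), exactMatch(c))
	}
}
```

Because `wc -l` emits a single number, a whole-word match on that line behaves like the exact comparison here.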

iid, _ := ec2.NewEC2MetadataClient(nil).InstanceIdentityDocument()
for _, gpuInstance := range gpuInstances {
if strings.HasPrefix(iid.InstanceType, gpuInstance) {
// GPU test should only run on p2/p3 ECS instances
Contributor

p2/p3/g3

@sharanyad
Contributor Author

TestOOMContainer is failing; it is being tracked in #1763.
Merging.

@sharanyad sharanyad merged commit 5cd7723 into aws:gpu-support Dec 27, 2018