Attempting AWS ECR integration #59

Closed
microbioticajon opened this issue Feb 17, 2021 · 4 comments

Comments

@microbioticajon

microbioticajon commented Feb 17, 2021

Hi Guys,

Thanks for your help over on NVIDIA/pyxis#34.

I'm attempting to configure enroot on an EC2 instance within my VPC to pull from our ECR, but I'm running into an issue I'm struggling to debug. I'm assuming it is something I have misconfigured, but I cannot see what...

I have confirmed the following:

  • that the aws-cli is installed.
  • that the instance has sufficient IAM role privileges to access our registry.
  • that aws ecr get-login-password returns a token.
  • that I can pull an image via the docker CLI using the ecr-credential-helper.

(Note: I have obfuscated the ECR URL.)

I have created a readable credentials file at /etc/enroot/.credentials:

machine 1234.dkr.ecr.eu-west-2.amazonaws.com login AWS password $(aws ecr get-login-password)
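Because the password field is a $(...) command substitution that enroot evaluates when it reads the file (docker::_authenticate passes it through common::evalnetrc), the line has to be written without expanding the substitution. A minimal sketch, assuming bash; write_ecr_credentials is a hypothetical helper, and passing --region explicitly is an assumption (the line above relies on the CLI's default region):

```shell
#!/usr/bin/env bash
# Sketch: write an enroot credentials file for a single ECR registry.
# write_ecr_credentials is a hypothetical helper; the directory, registry
# and region are supplied by the caller.
set -euo pipefail

write_ecr_credentials() {
    local -r dir="$1" registry="$2" region="$3"
    mkdir -p "${dir}"
    # The format string is single-quoted on purpose: "$(aws ...)" is written
    # literally and only evaluated later, when enroot reads the file.
    # shellcheck disable=SC2016
    printf 'machine %s login AWS password $(aws ecr get-login-password --region %s)\n' \
        "${registry}" "${region}" > "${dir}/.credentials"
}
```

For example, write_ecr_credentials /etc/enroot 1234.dkr.ecr.eu-west-2.amazonaws.com eu-west-2 reproduces the line above with an explicit region.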

When I run the following command, I get a 404 Not Found error:

ENROOT_CONFIG_PATH=/etc/enroot enroot import --output bob.sqsh 'docker://1234.dkr.ecr.eu-west-2.amazonaws.com/my-repo'
[INFO] Querying registry for permission grant
[INFO] Authenticating with user: AWS
[ERROR] URL https://1234.dkr.ecr.eu-west-2.amazonaws.com/ returned error code: 404 Not Found

I hacked in a bit of additional logging to see what was going on:

docker::_authenticate() {
    local -r user="$1" registry="$2" url="$3"
    local realm= token= req_params=() resp_headers=

    # Query the registry to see if we're authorized.
    common::log INFO "Querying registry for permission grant"

    resp_headers=$(CURL_IGNORE=401 common::curl "${curl_opts[@]}" -I ${req_params[@]+"${req_params[@]}"} -- "${url}")
    common::log INFO "curl command opts: ${curl_opts[*]} ${url}"
    common::log INFO "resp_headers: ${resp_headers}"

    # If we don't need to authenticate, we're done.
    if ! grep -qi '^www-authenticate:' <<< "${resp_headers}"; then
        common::log INFO "Permission granted"
        return
    fi

    # Otherwise, craft a new token request from the WWW-Authenticate header.
    printf "%s" "${resp_headers}" | awk -F '="|",' '(tolower($1) ~ "^www-authenticate:"){
        sub(/"\r/, "", $0)
        print $2
        for (i=3; i<=NF; i+=2) print "--data-urlencode\n" $i"="$(i+1)
    }' | { common::read -r realm; readarray -t req_params; }

    if [ -z "${realm}" ]; then
        common::err "Could not parse authentication realm from ${url}"
    fi

    # If a user was specified, lookup his credentials.
    common::log INFO "Authenticating with user: ${user:-<anonymous>}"
    if [ -n "${user}" ]; then
        if grep -qs "machine[[:space:]]\+${registry}[[:space:]]\+login[[:space:]]\+${user}" "${creds_file}"; then
            common::log INFO "Using credentials from file: ${creds_file}"
            exec {fd}< <(common::evalnetrc "${creds_file}" 2> /dev/null)
            req_params+=("--netrc-file" "/proc/self/fd/${fd}")
        else
            req_params+=("-u" "${user}")
        fi
    fi

    # Request a new token.
    common::log INFO "Fetching new token"
    common::curl "${curl_opts[@]}" -G ${req_params[@]+"${req_params[@]}"} -- "${realm}" \
      | common::jq -r '.token? // .access_token? // empty' \
      | common::read -r token
    common::log INFO "Fetching new token - complete"

    [ -v fd ] && exec {fd}>&-

    # Store the new token.
    if [ -n "${token}" ]; then
        mkdir -m 0700 -p "${token_dir}"
        (umask 077 && printf 'header "Authorization: Bearer %s"' "${token}" > "${token_dir}/${registry}.$$")
        common::log INFO "Authentication succeeded"
    fi
}

and got the following output:

[INFO] Querying registry for permission grant
[INFO] curl command opts: --proto =https --retry 0 --connect-timeout 30 --max-time 0 -SsL https://1234.dkr.ecr.eu-west-2.amazonaws.com/v2/my-repo/manifests/latest
[INFO] resp_headers: HTTP/1.1 401 Unauthorized
Docker-Distribution-Api-Version: registry/2.0
Www-Authenticate: Basic realm="https://1234.dkr.ecr.eu-west-2.amazonaws.com/",service="ecr.amazonaws.com"
Date: Wed, 17 Feb 2021 14:14:11 GMT
Content-Length: 15
Content-Type: text/plain; charset=utf-8

[INFO] Authenticating with user: AWS
[INFO] Using credentials from file: /etc/enroot/.credentials
[INFO] Fetching new token
[ERROR] Could not process JSON input -r .token? // .access_token? // empty
[ERROR] URL --proto =https --retry 0 --connect-timeout 30 --max-time 0 -SsL -G --data-urlencode service=ecr.amazonaws.com --netrc-file /proc/self/fd/10 -- https://1234.dkr.ecr.eu-west-2.amazonaws.com/ returned error code: 404 Not Found
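The telling line is the Www-Authenticate header: ECR answers with a Basic challenge whose realm is the registry root, while enroot's token flow expects a Bearer challenge pointing at a separate token service. Requesting a token from that realm is what appears to produce the 404. A minimal sketch for checking which scheme a registry uses; parse_challenge is a hypothetical helper that reads response headers on stdin:

```shell
#!/usr/bin/env bash
# Sketch: print the scheme and realm of a WWW-Authenticate challenge.
# parse_challenge is a hypothetical helper; it reads headers on stdin.
set -euo pipefail

parse_challenge() {
    awk 'tolower($1) == "www-authenticate:" {
        sub(/\r$/, "")                 # drop the CR of the CRLF line ending
        scheme = $2                    # e.g. "Basic" or "Bearer"
        realm = ""
        # realm="..." : skip the 7-char prefix and the closing quote
        if (match($0, /realm="[^"]*"/))
            realm = substr($0, RSTART + 7, RLENGTH - 8)
        print scheme, realm
        exit
    }'
}
```

Fed the headers above, it prints Basic https://1234.dkr.ecr.eu-west-2.amazonaws.com/; a Docker Hub challenge would instead report Bearer with a separate token URL.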

If I use the Docker daemon instead, the following command works as expected:

enroot import --output bob.sqsh 'dockerd://1234.dkr.ecr.eu-west-2.amazonaws.com/my-repo'

I feel like I am missing something simple...

I appreciate that AWS is not your primary target, but any suggestions would be well received!
Best
Jon

@3XX0
Member

3XX0 commented Feb 18, 2021

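Actually, I looked into it a little more, and it's kind of a mess.
It looks like ECR doesn't follow the same token-authentication flow as Docker Hub (and other registries): its Www-Authenticate challenge is Basic, with the registry root as the realm, so there is no token endpoint to query.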

I put together a quick hack; see if the patch below works for you.
I'm not exactly sure how to handle this cleanly yet, so this will have to do for the time being.

--- docker.sh   2021-02-18 03:04:09.439331125 -0800
+++ /usr/lib/enroot/docker.sh   2021-02-18 03:07:30.389338296 -0800
@@ -62,17 +62,26 @@ docker::_authenticate() {
         fi
     fi

-    # Request a new token.
-    common::curl "${curl_opts[@]}" -G ${req_params[@]+"${req_params[@]}"} -- "${realm}" \
-      | common::jq -r '.token? // .access_token? // empty' \
-      | common::read -r token
+    if [[ "${registry}" =~ \.amazonaws\.com$ ]] && [ -v fd ]; then
+        grep "machine[[:space:]]\+${registry}[[:space:]]\+login[[:space:]]\+${user}" <&${fd} \
+         | awk '{print "AWS:"$6}' \
+         | base64 -w 0 \
+         | common::read -r token
+        auth="Basic"
+    else
+        # Request a new token.
+        common::curl "${curl_opts[@]}" -G ${req_params[@]+"${req_params[@]}"} -- "${realm}" \
+          | common::jq -r '.token? // .access_token? // empty' \
+          | common::read -r token
+        auth="Bearer"
+    fi

     [ -v fd ] && exec {fd}>&-

     # Store the new token.
     if [ -n "${token}" ]; then
         mkdir -m 0700 -p "${token_dir}"
-        (umask 077 && printf 'header "Authorization: Bearer %s"' "${token}" > "${token_dir}/${registry}.$$")
+        (umask 077 && printf 'header "Authorization: %s %s"' "${auth}" "${token}" > "${token_dir}/${registry}.$$")
         common::log INFO "Authentication succeeded"
     fi
 }
@@ -111,7 +120,7 @@ docker::_download() {
     local -r user="$1" registry="${2:-registry-1.docker.io}" tag="${4:-latest}" arch="$5"
     local image="$3"

-    if  [[ "${image}" != */* ]]; then
+    if  [[ "${image}" != */* ]] && [[ ! "${registry}" =~ \.amazonaws\.com$ ]]; then
         image="library/${image}"
     fi
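For Basic challenges the patch skips the token request and sends the ECR password itself as HTTP Basic credentials: the Authorization header carries base64 of AWS:<password>. The same transformation as a standalone sketch; basic_auth_header is a hypothetical helper that takes an already-evaluated credentials line and, unlike the patch (which hard-codes AWS), reads the user from the login field. It uses base64 | tr -d '\n' in place of the GNU-only base64 -w 0:

```shell
#!/usr/bin/env bash
# Sketch: turn an evaluated credentials line into the Basic header line
# the patch stores. basic_auth_header is a hypothetical helper.
set -euo pipefail

basic_auth_header() {
    # $1: "machine <registry> login <user> password <secret>"
    local user secret
    read -r _ _ _ user _ secret <<< "$1"
    # HTTP Basic credentials are base64("<user>:<secret>"), on one line.
    printf 'header "Authorization: Basic %s"\n' \
        "$(printf '%s:%s' "${user}" "${secret}" | base64 | tr -d '\n')"
}
```

For instance, the line machine 1234.dkr.ecr.eu-west-2.amazonaws.com login AWS password secret yields header "Authorization: Basic QVdTOnNlY3JldA==", the same header line format the patch writes to the token file.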

I tried with the following credentials:

machine 763104351884.dkr.ecr.us-east-1.amazonaws.com login AWS password $(enroot start -e AWS_ACCESS_KEY_ID=<ID> -e AWS_SECRET_ACCESS_KEY=<KEY> aws ecr get-login-password --region us-east-1)

Here the aws container used above was created with:
enroot import -o aws.sqsh docker://amazon/aws-cli && enroot create aws.sqsh

And I could pull the following (from here):
enroot import docker://763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:1.13-cpu-py36-ubuntu16.04

@microbioticajon
Author

microbioticajon commented Feb 22, 2021

Hi @3XX0

Many thanks for the patch. I have applied it and can confirm that it successfully authenticates against our ECR.

I'm assuming that the .credentials file should look like:

machine 1234.dkr.ecr.eu-west-2.amazonaws.com login AWS password $(aws ecr get-login-password --region eu-west-2)

and should not include enroot start in the login command?
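As far as the lookup in docker::_authenticate is concerned, either form should do: enroot only greps the file for a machine <registry> login <user> entry and then evaluates the file, so any command that prints the token can follow password. A minimal sketch that reuses that grep pattern to check a credentials file before importing; has_credentials is a hypothetical helper, not part of enroot:

```shell
#!/usr/bin/env bash
# Sketch: check whether a credentials file has an entry that enroot's
# lookup would match. has_credentials is a hypothetical helper; the
# pattern is the one used in docker::_authenticate (GNU grep syntax).
set -euo pipefail

has_credentials() {
    local -r file="$1" registry="$2" user="$3"
    grep -qs "machine[[:space:]]\+${registry}[[:space:]]\+login[[:space:]]\+${user}" "${file}"
}
```

For example: has_credentials /etc/enroot/.credentials 1234.dkr.ecr.eu-west-2.amazonaws.com AWS && echo found.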

srun --container-image=debian grep PRETTY /etc/os-release
pyxis: importing docker image ...
PRETTY_NAME="Debian GNU/Linux 10 (buster)"

srun --container-image=1234.dkr.ecr.eu-west-2.amazonaws.com/baseimages/python:3.7 grep PRETTY /etc/os-release
pyxis: importing docker image ...
PRETTY_NAME="Ubuntu 18.04.4 LTS"

Frustratingly, I did run into another issue with a few of our images:

srun --container-image=1234.dkr.ecr.eu-west-2.amazonaws.com/my/repo:2.2 grep PRETTY /etc/os-release
pyxis: importing docker image ...
slurmstepd: error: pyxis: child 410888 failed with error code: 1
slurmstepd: error: pyxis: failed to import docker image
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis:     [INFO] Querying registry for permission grant
slurmstepd: error: pyxis:     [INFO] Authenticating with user: AWS
slurmstepd: error: pyxis:     [INFO] Using credentials from file: /etc/enroot/.credentials
slurmstepd: error: pyxis:     [INFO] Authentication succeeded
slurmstepd: error: pyxis:     [INFO] Fetching image manifest list
slurmstepd: error: pyxis:     [INFO] Fetching image manifest
slurmstepd: error: pyxis:     [INFO] Found all layers in cache
slurmstepd: error: pyxis:     [INFO] Extracting image layers...
slurmstepd: error: pyxis:     tar: ./dev/full: Cannot mknod: Operation not permitted
slurmstepd: error: pyxis:     tar: ./dev/zero: Cannot mknod: Operation not permitted
slurmstepd: error: pyxis:     tar: ./dev/tty: Cannot mknod: Operation not permitted
slurmstepd: error: pyxis:     tar: ./dev/null: Cannot mknod: Operation not permitted
slurmstepd: error: pyxis:     tar: ./dev/random: Cannot mknod: Operation not permitted
slurmstepd: error: pyxis:     tar: ./dev/urandom: Cannot mknod: Operation not permitted
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: pyxis: if the image has an unusual entrypoint, try using --no-container-entrypoint
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack

However, I think this is probably related to my enroot configuration, and I'm still digging into it...

Edit: OK, it looks like the 'Cannot mknod' issue might be a problem with the images themselves: they ship device nodes under ./dev, which cannot be created when extracting unprivileged...

Thanks again.

@3XX0
Member

3XX0 commented Feb 23, 2021

Good to know, I will clean up the patch and merge it when I have time.

Regarding your error, /dev should be excluded during extraction (see here). Somehow the --exclude pattern doesn't match on your image; you can try tweaking it and let me know what does the trick.

@microbioticajon
Author

microbioticajon commented Feb 23, 2021

Thanks @3XX0

I had a quick look and hacked in the following line:

parallel --plain ${TTY_ON+--bar} -j "${ENROOT_MAX_PROCESSORS}" mkdir {\#}\; tar -C {\#} --warning=no-timestamp --anchored --exclude='dev/*' --exclude='./dev/*' \

This is working for me, though to be honest I'm not 100% sure there will be no side effects!
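The extra pattern matters because, with --anchored, a pattern must match from the very start of the member name, so dev/* does not match a member stored as ./dev/full (presumably how the problem image's layers were packed). A minimal sketch illustrating this, assuming GNU tar; make_dot_slash_layer and list_layer are hypothetical helpers, and regular files stand in for device nodes because creating those needs privileges:

```shell
#!/usr/bin/env bash
# Sketch: show that an anchored --exclude='dev/*' misses "./dev/..." members.
# Assumes GNU tar (--anchored). Regular files stand in for device nodes.
set -euo pipefail

make_dot_slash_layer() {
    # $1: scratch directory. Prints the path of a tarball whose member
    # names start with "./" (e.g. "./dev/null").
    local -r work="$1"
    mkdir -p "${work}/root/dev" "${work}/root/etc"
    touch "${work}/root/dev/null" "${work}/root/etc/os-release"
    tar -C "${work}/root" -cf "${work}/layer.tar" .
    printf '%s\n' "${work}/layer.tar"
}

list_layer() {
    # $1: tarball; remaining args: extra options such as --exclude patterns.
    # --anchored comes first so it applies to the patterns that follow.
    local -r tarball="$1"; shift
    tar --anchored "$@" -tf "${tarball}"
}
```

Listing the generated layer with only --exclude='dev/*' still shows ./dev/null; adding --exclude='./dev/*' drops it while ./etc/os-release is kept.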

I had a look at the following Server Fault post, which suggests a convoluted workaround; I assumed you could just supply tar with multiple patterns.
https://serverfault.com/questions/803831/tar-extract-a-member-reliably-with-possible-leading-dot-slash
Let me know if this seems appropriate.

Edit: this was the base image that was causing us issues, if you want to test it yourself:

bitnami/minideb:buster
https://hub.docker.com/r/bitnami/minideb/

Best,
Jon

3XX0 closed this as completed in c54edbe on Apr 6, 2021
lukeyeager changed the title from "Attempting AWS ERC integration" to "Attempting AWS ECR integration" on Apr 24, 2024