ssm connection plugin fails if s3 transfer bucket is server-side encrypted via KMS #127

Closed
bdellegrazie opened this issue Jul 7, 2020 · 8 comments
Labels
affects_2.10, bug, needs_triage, python3

Comments

@bdellegrazie

bdellegrazie commented Jul 7, 2020

SUMMARY

When using an ssm connection where the S3 bucket has KMS server-side encryption enabled, the request made with the existing code
fails with an InvalidArgument response (400 status code) and the message body:

Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.
ISSUE TYPE
  • Bug Report
COMPONENT NAME

aws_ssm connection plugin

ANSIBLE VERSION
ansible 2.9.10
  config file = elided/ansible/ansible.cfg
  configured module search path = ['elided/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /elided/.direnv/python-3.8.2-venv/lib/python3.8/site-packages/ansible
  executable location = /elided/.direnv/python-3.8.2-venv/bin/ansible
  python version = 3.8.2 (default, May  9 2020, 20:43:08) [GCC 9.3.0]
CONFIGURATION
CACHE_PLUGIN(/elided/ansible/ansible.cfg) = jsonfile
CACHE_PLUGIN_CONNECTION(/elided/ansible/ansible.cfg) = .cache/facts
COLLECTIONS_PATHS(/elided/ansible/ansible.cfg) = ['/elided/ansible/collections', '/elided/ansible/galaxy_collections']
DEFAULT_GATHERING(/elided/ansible/ansible.cfg) = smart
DEFAULT_KEEP_REMOTE_FILES(env: ANSIBLE_KEEP_REMOTE_FILES) = True
DEFAULT_ROLES_PATH(/elided/ansible/ansible.cfg) = ['/elided/ansible/roles', '/elided/ansible/galaxy_roles']
INJECT_FACTS_AS_VARS(/elided/ansible/ansible.cfg) = False
INVENTORY_CACHE_ENABLED(/elided/ansible/ansible.cfg) = True
INVENTORY_CACHE_PLUGIN_CONNECTION(/elided/ansible/ansible.cfg) = .cache/inventory
RETRY_FILES_ENABLED(/elided/ansible/ansible.cfg) = False
OS / ENVIRONMENT

Controller: Ubuntu 20.04 running ansible inside a python venv with boto3 1.14.16, botocore 1.17.16
Target: Amazon Linux 2 with SSM.

STEPS TO REPRODUCE
  1. Create S3 bucket, enable server side encryption using the default account S3 key alias
    Note: it may be necessary to wait up to 24 hours for the bucket name to propagate via DNS before continuing.
  2. Create the target Amazon Linux 2 system, install curl, then enable and start SSM
  3. Confirm SSM working by connecting locally using aws ssm start-session --target <id>
  4. Run ansible playbook as below
- hosts: all
  collections:
    - community.aws
  vars:
    ansible_connection: community.aws.aws_ssm
    ansible_aws_ssm_region: your-region
    ansible_aws_ssm_bucket_name: 'your-bucket-name'
  tasks:
    - shell: echo "Hello World"
EXPECTED RESULTS

Expecting "Hello World" to be reported by Ansible.

ACTUAL RESULTS

Ansible uploads the AnsiballZ_setup.py file successfully, but when curl retrieves it on the target system, the file content is replaced by the body of the failed request.
In this case the following content is returned:

<i-065641ea2afa0e2b8> EXEC /usr/bin/python /home/ssm-user/.ansible/tmp/ansible-tmp-1594113678.8116896-25457-211691010476604/AnsiballZ_setup.py
<i-065641ea2afa0e2b8> _wrap_command: 'echo mZupBuwTQWDOoanBihZsCvLbXx
sudo /usr/bin/python /home/ssm-user/.ansible/tmp/ansible-tmp-1594113678.8116896-25457-211691010476604/AnsiballZ_setup.py
echo $'\n'$?
echo DEVDTjikvphuhKknRHPKGhAOkS
'
<i-065641ea2afa0e2b8> EXEC stdout line: mZupBuwTQWDOoanBihZsCvLbXx
<i-065641ea2afa0e2b8> EXEC stdout line:   File "/home/ssm-user/.ansible/tmp/ansible-tmp-1594113678.8116896-25457-211691010476604/AnsiballZ_setup.py", line 1
<i-065641ea2afa0e2b8> EXEC stdout line:     <?xml version="1.0" encoding="UTF-8"?>
<i-065641ea2afa0e2b8> EXEC stdout line:     ^
<i-065641ea2afa0e2b8> EXEC stdout line: SyntaxError: invalid syntax
<i-065641ea2afa0e2b8> EXEC stdout line: 
<i-065641ea2afa0e2b8> EXEC stdout line: 1
<i-065641ea2afa0e2b8> EXEC stdout line: DEVDTjikvphuhKknRHPKGhAOkS
<i-065641ea2afa0e2b8> POST_PROCESS:   File "/home/ssm-user/.ansible/tmp/ansible-tmp-1594113678.8116896-25457-211691010476604/AnsiballZ_setup.py", line 1
    <?xml version="1.0" encoding="UTF-8"?>
    ^
SyntaxError: invalid syntax

1
<i-065641ea2afa0e2b8> (1, '  File "/home/ssm-user/.ansible/tmp/ansible-tmp-1594113678.8116896-25457-211691010476604/AnsiballZ_setup.py", line 1\r\r\n    <?xml version="1.0" encoding="UTF-8"?>\r\r\n    ^\r\r\nSyntaxError: invalid syntax\r\r', '')
<i-065641ea2afa0e2b8> CLOSING SSM CONNECTION TO: i-065641ea2afa0e2b8
<i-065641ea2afa0e2b8> TERMINATE SSM SESSION: me@test
fatal: [i-065641ea2afa0e2b8]: FAILED! => {
    "ansible_facts": {},
    "changed": false,
    "failed_modules": {
        "setup": {
            "ansible_facts": {
                "discovered_interpreter_python": "/usr/bin/python"
            },
            "failed": true,
            "module_stderr": "",
            "module_stdout": "  File \"/home/ssm-user/.ansible/tmp/ansible-tmp-1594113678.8116896-25457-211691010476604/AnsiballZ_setup.py\", line 1\r\r\n    <?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\r\n    ^\r\r\nSyntaxError: invalid syntax\r\r",
            "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
            "rc": 1,
            "warnings": [
                "Platform linux on host i-065641ea2afa0e2b8 is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change this. See https://docs.ansible.com/ansible/2.9/reference_appendices/interpreter_discovery.html for more information."
            ]
        }
    },
    "msg": "The following modules failed to execute: setup\n"
}

Contents of AnsiballZ_setup.py are:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidArgument</Code><Message>Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>null</ArgumentValue><RequestId>198A9FFBC724E9CE</RequestId><HostId>elided</HostId></Error>
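
A quick way to confirm this failure mode is to check whether the transferred file is an S3 error document rather than a Python module. The sketch below is illustrative only: `parse_s3_error` is a hypothetical helper, not part of the aws_ssm plugin, and it assumes nothing beyond the `<Error>` / `<Code>` / `<Message>` shape shown above.

```python
import xml.etree.ElementTree as ET


def parse_s3_error(body):
    """Return (code, message) if body looks like an S3 error document, else None.

    Hypothetical helper for illustration; not part of the aws_ssm plugin.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML at all, e.g. a genuine AnsiballZ Python payload.
        return None
    if root.tag != "Error":
        return None
    return root.findtext("Code"), root.findtext("Message")


# Shortened copy of the error body from this report (bytes, as read from disk).
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<Error><Code>InvalidArgument</Code>"
    b"<Message>Requests specifying Server Side Encryption with AWS KMS "
    b"managed keys require AWS Signature Version 4.</Message></Error>"
)

print(parse_s3_error(sample))
print(parse_s3_error(b"#!/usr/bin/python\nimport sys\n"))
```

Running this against the file left behind by `ANSIBLE_KEEP_REMOTE_FILES` would show the error code and message without having to open the file by hand.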

Root causes and work-around:

  1. The curl commands do not specify --silent --show-error --fail, so curl's exit code does not reflect the failed HTTP status code (400 in this case) and Ansible mistakenly goes on to execute AnsiballZ_setup.py as if it were a Python script.
  2. The function _get_url uses Boto3's client.generate_presigned_url, but for this to work with KMS-encrypted content the client must be created with a config object that sets signature_version='s3v4', as follows:
try:
    import boto3
    HAS_BOTO_3 = True
    from botocore.config import Config
except ImportError as e:
    HAS_BOTO_3_ERROR = str(e)
    HAS_BOTO_3 = False

...

    def _get_url(self, client_method, bucket_name, out_path, http_method):
        ''' Generate URL for get_object / put_object '''
        config = Config(signature_version='s3v4')
        client = boto3.client('s3', config=config)
        return client.generate_presigned_url(client_method, Params={'Bucket': bucket_name, 'Key': out_path}, ExpiresIn=3600, HttpMethod=http_method)
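
For the first root cause, here is a minimal sketch of building the download command so that an HTTP error surfaces as a non-zero curl exit code. `build_get_command` is a hypothetical helper rather than the plugin's code, the URL is a placeholder, and quoting with `shlex.quote` is an assumption of this sketch (the plugin wraps the arguments in plain single quotes).

```python
import shlex


def build_get_command(url, out_path):
    """Build a curl download command that fails on HTTP errors.

    Hypothetical helper for illustration; not the plugin's actual code.
      --fail        exit non-zero on HTTP errors instead of saving the error body
      --silent      hide the progress meter
      --show-error  still print an error message when --silent is set
    """
    return "curl --silent --show-error --fail {0} -o {1}".format(
        shlex.quote(url), shlex.quote(out_path)
    )


# Placeholder pre-signed URL; a real one carries a full set of signing parameters.
url = "https://your-bucket-name.s3.amazonaws.com/key?X-Amz-Signature=abc&X-Amz-Expires=3600"
cmd = build_get_command(url, "/tmp/AnsiballZ_setup.py")
print(cmd)
```

With `--fail`, a 400 response makes curl exit with an error, so the caller can stop instead of executing the error body as a module.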
@bdellegrazie bdellegrazie changed the title ssm connection plugin fails if s3 transfer bucket is server-side encrypted ssm connection plugin fails if s3 transfer bucket is server-side encrypted via KMS Jul 7, 2020
@bdellegrazie
Author

bdellegrazie commented Jul 7, 2020

Further, it is possible to get the connection to work out-of-the-box by modifying the AWS profile that boto3 uses as follows:

[profile profile-name]
s3 =
    signature_version = s3v4

What surprises me is that this is supposed to be the default configuration for boto3, as per Boto Configuration

Digging even further into the codebase, the botocore documentation does not mention s3.signature_version as an option:
Botocore config

@bdellegrazie
Author

bdellegrazie commented Jul 7, 2020

Digging further into AWS documentation:
boto/boto3#1982 and
boto/boto3#1644 (comment)

For up to 24 hours after a bucket is first created, the virtual-hosted pre-signed URL won't work because the DNS name needs to be propagated globally and the resulting redirect causes a CORS problem.
This can be worked around by forcing the endpoint to be the region where the bucket is created (config property endpoint_url) but that assumes the bucket URL is accessible in the target region.

In essence, for this to work reliably (except for up to 24 hours after bucket creation):

  • region_name needs to be supplied to the S3 client, possibly with an endpoint_url override.
  • the signature version needs to be explicitly set to s3v4
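
To check which signature version a generated pre-signed URL actually uses, its query parameters can be inspected. The sketch below is a hypothetical diagnostic helper, not part of boto3 or the plugin. It relies on Signature Version 4 URLs carrying `X-Amz-Algorithm=AWS4-HMAC-SHA256` and on older Signature Version 2 URLs carrying `AWSAccessKeyId` and `Signature`. The sample URLs use placeholder values.

```python
from urllib.parse import parse_qs, urlsplit


def presigned_url_signature_version(url):
    """Guess the signature version of a pre-signed S3 URL from its query string.

    Hypothetical diagnostic helper; it only looks at parameter names and values.
    """
    params = parse_qs(urlsplit(url).query)
    if params.get("X-Amz-Algorithm") == ["AWS4-HMAC-SHA256"]:
        return "s3v4"
    if "AWSAccessKeyId" in params and "Signature" in params:
        return "s3 (v2)"
    return "unknown"


# Placeholder URLs for illustration only.
v4_url = (
    "https://your-bucket-name.s3.eu-central-1.amazonaws.com/key"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600&X-Amz-Signature=abc"
)
v2_url = (
    "https://your-bucket-name.s3.amazonaws.com/key"
    "?AWSAccessKeyId=EXAMPLE&Signature=abc&Expires=1594117278"
)

print(presigned_url_signature_version(v4_url))
print(presigned_url_signature_version(v2_url))
```

If the URL returned by `_get_url` is reported as `s3 (v2)`, the client was not created with `signature_version='s3v4'`.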

@abeluck

abeluck commented Sep 3, 2020

Not only does it fail when the bucket is server-side encrypted, but some regions require signature version s3v4 and just don't accept anything else (such as eu-central-1) (ref).

Here is a patch that works for me.

with endpoint override

assumes bucket is in the same region as the instance

--- a/plugins/connection/aws_ssm.py	2020-09-03 18:36:37.309000000 +0200
+++ b/plugins/connection/aws_ssm.py	2020-09-03 18:37:57.748000000 +0200
@@ -158,6 +158,7 @@
 
 try:
     import boto3
+    from botocore.client import Config
     HAS_BOTO_3 = True
 except ImportError as e:
     HAS_BOTO_3_ERROR = str(e)
@@ -483,7 +484,13 @@
 
     def _get_url(self, client_method, bucket_name, out_path, http_method):
         ''' Generate URL for get_object / put_object '''
-        client = boto3.client('s3')
+        config = Config(signature_version='s3v4',
+                region_name=self.get_option('region'),
+                s3={'addressing_style': 'virtual'}
+            )
+        client = boto3.client('s3',
+                endpoint_url='https://s3.{0}.amazonaws.com'.format(self.get_option('region')),
+                config=config)
         return client.generate_presigned_url(client_method, Params={'Bucket': bucket_name, 'Key': out_path}, ExpiresIn=3600, HttpMethod=http_method)
 
     @_ssm_retry
@@ -499,9 +506,9 @@
             get_command = "Invoke-WebRequest '%s' -OutFile '%s'" % (
                 self._get_url('get_object', self.get_option('bucket_name'), s3_path, 'GET'), out_path)
         else:
-            put_command = "curl --request PUT --upload-file '%s' '%s'" % (
+            put_command = "curl --show-error --silent --fail --request PUT --upload-file '%s' '%s'" % (
                 in_path, self._get_url('put_object', self.get_option('bucket_name'), s3_path, 'PUT'))
-            get_command = "curl '%s' -o '%s'" % (
+            get_command = "curl --show-error --silent --fail '%s' -o '%s'" % (
                 self._get_url('get_object', self.get_option('bucket_name'), s3_path, 'GET'), out_path)
 
         client = boto3.client('s3')

without endpoint override

--- a/plugins/connection/aws_ssm.py	2020-09-03 18:36:37.309000000 +0200
+++ b/plugins/connection/aws_ssm.py	2020-09-03 18:37:57.748000000 +0200
@@ -158,6 +158,7 @@
 
 try:
     import boto3
+    from botocore.client import Config
     HAS_BOTO_3 = True
 except ImportError as e:
     HAS_BOTO_3_ERROR = str(e)
@@ -483,7 +484,11 @@
 
     def _get_url(self, client_method, bucket_name, out_path, http_method):
         ''' Generate URL for get_object / put_object '''
-        client = boto3.client('s3')
+        config = Config(signature_version='s3v4',
+                region_name=self.get_option('region')
+            )
+        client = boto3.client('s3',
+                config=config)
         return client.generate_presigned_url(client_method, Params={'Bucket': bucket_name, 'Key': out_path}, ExpiresIn=3600, HttpMethod=http_method)
 
     @_ssm_retry
@@ -499,9 +504,9 @@
             get_command = "Invoke-WebRequest '%s' -OutFile '%s'" % (
                 self._get_url('get_object', self.get_option('bucket_name'), s3_path, 'GET'), out_path)
         else:
-            put_command = "curl --request PUT --upload-file '%s' '%s'" % (
+            put_command = "curl --show-error --silent --fail --request PUT --upload-file '%s' '%s'" % (
                 in_path, self._get_url('put_object', self.get_option('bucket_name'), s3_path, 'PUT'))
-            get_command = "curl '%s' -o '%s'" % (
+            get_command = "curl --show-error --silent --fail '%s' -o '%s'" % (
                 self._get_url('get_object', self.get_option('bucket_name'), s3_path, 'GET'), out_path)
 
         client = boto3.client('s3')

abeluck added a commit to abeluck/community.aws that referenced this issue Sep 3, 2020
…ections#127)

* always use signature version 4
* pass region to the bucket client
* detect when curl fails and abort appropriately

Some regions only support signature v4, and any bucket that is encrypted
also requires v4 signatures. Likewise some regions require the
region_name passed.
@abeluck

abeluck commented Sep 6, 2020

I have preliminary patches for this issue (and other aws_ssm connection plugin issues) in my fork.

@piotrplenik

Hi @abeluck,
thanks for your work on this.
Did you create a PR with the fix?

@abeluck

abeluck commented Jan 21, 2021

@jupeter no, I did not create a PR. The patches I posted and linked above worked at the time. We moved away from the ssm connection plugin because of this and other stability issues; it just wasn't ready for production (in our experience).

@nikolai-derzhak-distillery

nikolai-derzhak-distillery commented Jan 26, 2021

So what are we going to do? I am deploying a cluster in eu-central-1. What would be the most elegant fix so far?

Oh I see it in repo: https://github.com/ansible-collections/community.aws/blob/main/plugins/connection/aws_ssm.py#L520

Not sure why galaxy collection is so old though ...

"version": "1.3.0",

@goneri
Member

goneri commented Feb 10, 2021

Hi all,

The problem has been fixed by #352 and it will be part of the 1.4.0 release. Feel free to reopen if you think we've missed something.
