
s3api --sse-customer-key cannot accept binary data #815

Closed
quiver opened this issue Jun 16, 2014 · 5 comments
Labels: bug (This issue is a bug.), s3

Comments

quiver (Contributor) commented Jun 16, 2014

This report relates to the following new feature:
http://aws.amazon.com/about-aws/whats-new/2014/06/12/amazon-s3-now-supports-server-side-encryption-with-customer-provided-keys-sse-c/

When you use S3 server-side encryption with customer-provided keys (SSE-C), the AWS CLI's --sse-customer-key parameter only accepts data that is decodable as UTF-8.

For AES-256 you need a 256-bit key. When you generate a key from /dev/urandom (as described here: http://docs.oracle.com/cd/E19253-01/816-4557/scftask-10/index.html), you end up with a UnicodeDecodeError.

My understanding is that an AES key can consist of arbitrary bytes as long as its length is correct (which is why the x-amz-copy-source-server-side-encryption-customer-key request header is base64-encoded).
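
A quick standalone Python illustration of the point (not aws-cli code): 32 bytes from a random source are usually not valid UTF-8, while their base64 encoding always is:

import base64
import os

key = os.urandom(32)   # a 256-bit AES key: arbitrary bytes

try:
    key.decode('utf-8')            # what the CLI effectively attempts today
except UnicodeDecodeError as err:
    print('raw key is not decodable as text: %s' % err)

print('base64-encoded key: %s' % base64.b64encode(key))  # always plain ASCII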

How to reproduce

$ dd if=/dev/urandom bs=1 count=32 > sse_c.key
32+0 records in
32+0 records out
32 bytes (32 B) copied, 0.000150536 s, 213 kB/s
$ touch foo.txt
$ aws s3api put-object --bucket my-bucket-name --key foo.txt --body foo.txt --sse-customer-algorithm AES256 --sse-customer-key file://sse_c.key

'utf8' codec can't decode byte 0xe7 in position 0: invalid continuation byte

AWS Version

$ aws --version
aws-cli/1.3.17 Python/2.7.3 Linux/3.2.0-23-generic


jamesls self-assigned this Jun 16, 2014
jamesls (Member) commented Jun 17, 2014

So the problem here is that file:// is primarily meant for text types: by default we decode the file using the user's configured locale encoding when we read it.

However, there are situations where a user may want to indicate that the data is binary and should not be decoded, as this issue shows. I think there are two general approaches:

  1. We try to figure this out automatically. We update the code so that any parameter of type "blob" always has its file read in binary mode, with no decoding. If we do this, the example above should just work (see the sketch after this list).
  2. We allow the user to tell us the encoding. This could be either global across all args in a single invocation (i.e. --file-encoding=utf-8, --file-encoding=binary), or we could allow it to be specified in the file:// syntax somehow, maybe file://sse_c.key?encoding=binary, file://user-data.txt?encoding=utf8.
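
Here's a minimal sketch of what option 1 could look like; the function name and signature are hypothetical, not the actual awscli internals:

import locale

def get_paramfile(uri, param_type):
    """Resolve a file:// argument according to the parameter's model type (sketch)."""
    assert uri.startswith('file://')
    filename = uri[len('file://'):]
    if param_type == 'blob':
        # Binary payloads (e.g. SSE-C keys): pass the bytes through untouched.
        with open(filename, 'rb') as f:
            return f.read()
    # Text payloads: decode with the user's locale encoding, as today.
    with open(filename, 'rb') as f:
        return f.read().decode(locale.getpreferredencoding())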

Option 2 is additive, so there shouldn't be any breaking changes, but it puts the burden on the user to specify their encodings. Option 1 has potential impact on things like EC2's user-data: today, even if the local encoding != utf-8, we decode using the local encoding and then encode to utf-8, whereas under option 1 we'd pass the file through in its local encoding, untouched. You could make the case that that is what it should have been doing all along, though.
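
To make that concrete, a tiny illustration of the difference in bytes sent (assumed behavior, with latin-1 standing in for the local encoding):

data = b'caf\xe9'                                    # "café" as latin-1 bytes on disk
sent_today = data.decode('latin-1').encode('utf-8')  # decode + re-encode: b'caf\xc3\xa9'
sent_option1 = data                                  # passed through untouched: b'caf\xe9'
assert sent_today != sent_option1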

Any thoughts?

quiver (Contributor, Author) commented Jun 19, 2014

@jamesls thanks for the feedback. I'll sleep on it and get back to you tomorrow.

quiver (Contributor, Author) commented Jun 23, 2014

As for #2: as you described, you'd have to provide a way to specify the encoding per file, and the file://sse_c.key?encoding=binary syntax looks too tedious and ugly to me. Are there any programs that support this kind of syntax?

> Option 1 has potential impact on things like EC2's user-data: today, even if the local encoding != utf-8, we decode using the local encoding and then encode to utf-8.

If we switch to #1, changing this current behavior could cause trouble. But the behavior has only existed for a month or so (since 1.3.11 / May 15, #765), so we don't need to worry too much about breaking backward compatibility.

The Mercurial project had the same encoding issues and has a good write-up of its encoding strategy:

http://mercurial.selenic.com/wiki/EncodingStrategy

For user inputs:

> All user input in the form of command line arguments, configuration files, etc. are assumed to be in the local encoding.

This is the same as aws-cli.

For files:

> we do not attempt to convert file contents to match user locales and instead preserve files intact.

This is your #1 plan.


I'm +1 for the #1 (blob) approach.

jamesls (Member) commented Nov 18, 2014

Related tickets:

#890
#814

kyleknap (Contributor) commented:

Pull Request: #1010 should fix this issue. Closing issue.
