
s3api --sse-customer-key cannot accept binary data #815

Closed
quiver opened this issue Jun 16, 2014 · 5 comments
Labels: bug (This issue is a bug.), s3

Comments

quiver (Contributor) commented Jun 16, 2014

This report relates to the following new feature:
http://aws.amazon.com/about-aws/whats-new/2014/06/12/amazon-s3-now-supports-server-side-encryption-with-customer-provided-keys-sse-c/

When you use S3 server-side encryption with customer-provided keys (SSE-C), the AWS CLI's --sse-customer-key parameter only accepts data that is decodable as UTF-8.

For AES-256 you need a 256-bit key. When you generate a key from /dev/urandom (as described here: http://docs.oracle.com/cd/E19253-01/816-4557/scftask-10/index.html), you end up with a UnicodeDecodeError.

My understanding is that an AES key can consist of arbitrary bytes as long as its length is correct (which is why the x-amz-copy-source-server-side-encryption-customer-key request header is base64-encoded).
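
A quick standalone Python illustration of the point (not aws-cli code): 32 bytes from a random source are usually not valid UTF-8, while their base64 encoding always is:

import base64
import os

key = os.urandom(32)   # a 256-bit AES key: arbitrary bytes

try:
    key.decode('utf-8')            # what the CLI effectively attempts today
except UnicodeDecodeError as err:
    print('raw key is not decodable as text: %s' % err)

print('base64-encoded key: %s' % base64.b64encode(key))  # always plain ASCII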

How to reproduce

$ dd if=/dev/urandom bs=1 count=32 > sse_c.key
32+0 records in
32+0 records out
32 bytes (32 B) copied, 0.000150536 s, 213 kB/s
$ touch foo.txt
$ aws s3api put-object --bucket my-bucket-name --key foo.txt --body foo.txt --sse-customer-algorithm AES256 --sse-customer-key file://sse_c.key

'utf8' codec can't decode byte 0xe7 in position 0: invalid continuation byte

AWS Version

$ aws --version
aws-cli/1.3.17 Python/2.7.3 Linux/3.2.0-23-generic


jamesls self-assigned this Jun 16, 2014
jamesls (Member) commented Jun 17, 2014

So the problem here is that file:// is primarily meant for text types: by default we decode the file using the user's configured locale encoding when we read it.

However, there are situations where a user may want to indicate that the data is binary and should not be decoded, as this issue shows. I think there are two general approaches:

  1. We try to figure this out automatically. We update the code so that any parameter of type "blob" always has its file read in binary mode, with no decoding. If we do this, the example above should just work (see the sketch after this list).
  2. We allow the user to tell us the encoding. This could be either global across all args in a single invocation (i.e. --file-encoding=utf-8, --file-encoding=binary), or we could allow it to be specified in the file:// syntax somehow, maybe file://sse_c.key?encoding=binary, file://user-data.txt?encoding=utf8.
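
Here's a minimal sketch of what option 1 could look like; the function name and signature are hypothetical, not the actual awscli internals:

import locale

def get_paramfile(uri, param_type):
    """Resolve a file:// argument according to the parameter's model type (sketch)."""
    assert uri.startswith('file://')
    filename = uri[len('file://'):]
    if param_type == 'blob':
        # Binary payloads (e.g. SSE-C keys): pass the bytes through untouched.
        with open(filename, 'rb') as f:
            return f.read()
    # Text payloads: decode with the user's locale encoding, as today.
    with open(filename, 'rb') as f:
        return f.read().decode(locale.getpreferredencoding())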

Option 2 is additive, so there shouldn't be any breaking changes, but it puts the burden on the user to specify their encodings. Option 1 has potential impact on things like EC2's user-data: today, even if the local encoding != utf-8, we decode using the local encoding and then encode to utf-8, whereas under option 1 we'd pass the file through in its local encoding, untouched. You could make the case that that is what it should have been doing all along, though.
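
To make that concrete, a tiny illustration of the difference in bytes sent (assumed behavior, with latin-1 standing in for the local encoding):

data = b'caf\xe9'                                    # "café" as latin-1 bytes on disk
sent_today = data.decode('latin-1').encode('utf-8')  # decode + re-encode: b'caf\xc3\xa9'
sent_option1 = data                                  # passed through untouched: b'caf\xe9'
assert sent_today != sent_option1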

Any thoughts?

quiver (Contributor, Author) commented Jun 19, 2014

@jamesls thanks for the feedback. I'll sleep on it and get back to you tomorrow.

quiver (Contributor, Author) commented Jun 23, 2014

As for #2: as you described, you'd have to provide a way to specify the encoding per file, and the file://sse_c.key?encoding=binary syntax looks too tedious and ugly to me. Are there any programs that support this kind of syntax?

> Option 1 has potential impact on things like EC2's user-data: today, even if the local encoding != utf-8, we decode using the local encoding and then encode to utf-8.

If we switch to #1, changing this current behavior could cause trouble. But the behavior has only existed for a month or so (since 1.3.11 / May 15, #765), so we don't need to worry too much about breaking backward compatibility.

The Mercurial project had the same encoding issues and has a good write-up of its encoding strategy:

http://mercurial.selenic.com/wiki/EncodingStrategy

For user inputs:

> All user input in the form of command line arguments, configuration files, etc. are assumed to be in the local encoding.

This is the same as aws-cli.

For files:

> we do not attempt to convert file contents to match user locales and instead preserve files intact.

This is your #1 plan.


I'm +1 for the #1 (blob) approach.

jamesls (Member) commented Nov 18, 2014

Related tickets:

#890
#814

kyleknap (Contributor) commented:

Pull Request: #1010 should fix this issue. Closing issue.
