Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace chars we can't decode with placeholder char #953

Merged
merged 2 commits into from
Jun 20, 2016

Conversation

jamesls
Copy link
Member

@jamesls jamesls commented Jun 16, 2016

It's possible that the console output from EC2 contains
non string chars we can't utf-8 decode. When that happens
we were previously just leaving the input untouched, i.e
base64 encoded.

This made for an awkward interface for customers. We couldn't guarantee
whether or not the content was base64 decoded because it would depend
on the actual ec2 console output content.

This change attempts to make things more consistent by
replacing any chars we can't decode with the unicode
\ufffd char (the unicode replace char, i.e the question mark).

The one downside with this approach is that it is possible
to sometimes not get the unaltered contents of the console
output because chars have been replaced with \ufffd.
We'll need to weigh whether or not we think that's worth
the benefit. Given this content is usually text content,
this seems like a reasonable compromise.

cc @kyleknap @JordonPhillips

It's possible that the console output from EC2 contains
non string chars we can't utf-8 decode.  When that happens
we were previously just leaving the input untouched, i.e
base64 encoded.

This made for an awkward interface for customers.  We couldn't guarantee
whether or not the content was base64 decoded because it would depend
on the actual ec2 console output content.

This change attempts to make things more consistent by
replacing any chars we can't decode with the unicode
`\ufffd` char (the unicode replace char, i.e the question mark).

The one downside with this approach is that it is possible
to sometimes not get the unaltered contents of the console
output because chars have been replaced with `\ufffd`.
We'll need to weigh whether or not we think that's worth
the benefit.  Given this content is usually text content,
this seems like a reasonable compromise.
@codecov-io
Copy link

codecov-io commented Jun 16, 2016

Current coverage is 97.27%

Merging #953 into develop will increase coverage by 0.01%

@@            develop       #953   diff @@
==========================================
  Files            43         43          
  Lines          6909       6909          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           6720       6721     +1   
+ Misses          189        188     -1   
  Partials          0          0          

Powered by Codecov. Last updated by ee37bf7...742bd2e

@kyleknap
Copy link
Contributor

🚢 The only backwards incompatibility I see is if a user consistently got base64 encoded data back they would have base64 decoded it, but that seems very unlikely since it is a console output. It would also be pretty difficult and a weird workflow to accurately differentiate between base64 encoded and utf-8 encoded.

@jamesls
Copy link
Member Author

jamesls commented Jun 20, 2016

I agree. I'll add a little more info in the changelog description for this change to explain this. I do think that not having a consistent return type is reasonably in the "bug fix" bucket.

@jamesls jamesls added the pr/ready-to-merge This PR is ready to be merged. label Jun 20, 2016
@jamesls jamesls force-pushed the unicode-decode-error branch from 742bd2e to 9781a71 Compare June 20, 2016 16:55
@jamesls jamesls merged commit 9781a71 into boto:develop Jun 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/ready-to-merge This PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants