Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dogstatsd] support unicode tags 🔣 #152

Merged
merged 1 commit into from
Aug 24, 2016
Merged

Conversation

yannmh
Copy link
Member

@yannmh yannmh commented Aug 24, 2016

Using unicode with DogStatsD tags causes UnicodeDecodeError
exceptions.

Do not raise on unicode tags. Fix #132.

@yannmh yannmh added this to the 0.13.0 milestone Aug 24, 2016
@@ -103,8 +103,8 @@ def test_tagged_gauge(self):
t.assert_equal('gt:123.4|g|#country:china,age:45,blue', self.recv())

def test_tagged_counter(self):
self.statsd.increment('ct', tags=['country:canada', 'red'])
t.assert_equal('ct:1|c|#country:canada,red', self.recv())
self.statsd.increment('ct', tags=[u'country:españa', 'red'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right 💯

@degemer
Copy link
Member

degemer commented Aug 24, 2016

👍

Using unicode with DogStatsD tags causes `UnicodeDecodeError`
exceptions.

Do not raise on unicode tags. Fix #132.
@yannmh yannmh force-pushed the yann/dogstatsd-unicode-tags branch from e2ae0ff to ccbd94d Compare August 24, 2016 20:14
@yannmh yannmh merged commit 8b91432 into master Aug 24, 2016
@yannmh yannmh deleted the yann/dogstatsd-unicode-tags branch August 24, 2016 20:25
@Julian
Copy link

Julian commented Sep 13, 2016

This doesn't seem correct to me. I'm trying to diagnose a related issue (I'll update when I do), but from a very quick read, IMHO #132 should be wontfixed and this backed out. tags should either be unicode or bytes, and the current behavior asks for bytes. This changeset will break existing tags.

@Julian
Copy link

Julian commented Sep 13, 2016

OK, it looks like the current behavior actually isn't consistent about which it's after, so I'm not sure what's intended. I can survey what the behavior is on different versions I guess after finding my own issue. This patchset is certainly assuming that everything passed to dogstatsd is unicode though (or accidentally-ASCII-encoded-bytes-on-py2).

@yannmh
Copy link
Member Author

yannmh commented Sep 13, 2016

Hi @Julian,
I am not sure to understand exactly what issue you are facing. Could you give us more details about it please ? For instance

  • Steps to reproduce
  • Results you received versus the ones you would expect

Thanks!

@Julian
Copy link

Julian commented Sep 13, 2016

@yannmh I've had to switch to debugging something else, I'll try to circle back and provide more detail, but this patchset changed something from calling str to calling unicode. That's not really safe -- the former fails when you call it on non-ASCII representable unicode, the latter fails in the inverse case, on non-ASCII representable bytes. So either the string literal, and the surrounding code here, are misleading, and should be unicode objects (and probably the documentation updated to say that tags should always be unicode objects now), or something has changed here.

You can reproduce by e.g. passing in a tag of "foo:שלום", i.e. bytes, instead of u"foo:שלום". It did look like however that the former is already broken because the DogStatsd object for some reason wants to call packet.encode("utf-8") on everything going over the socket, instead of just asking for bytes -- I'm not really sure what the intention is -- it seems likely that it's to accept unicode everywhere (that's not the decision I would have made, but maybe it's the one that exists already). I'll try to provide some more details if I get some free time to read through what's here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants