Better handle different encodings #316

p-lambert · 2018-01-16T20:09:54Z

This PR adds a Datadog::Utils.utf8_encode utility method.
It also refactors the parts of the code that previously performed their own encoding operations.

delner

Looking pretty good. A couple of minor comments, and I would suggest adding some unit tests with a variety of inputs for #utf8_encode.

delner · 2018-01-16T20:18:47Z

lib/ddtrace/utils.rb

+      elsif options[:binary]
+        # This option is useful for "gracefully" displaying binary data that
+        # often contains text such as marshalled objects
+        str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')


Non-blocking comment.

I think as it's written, this is readable and acceptable.

However, given the different combinations of encodings and options, I would suggest that the responsibilities of #utf8_encode are actually broader than they look, and that binary encoding might warrant its own function to simplify #utf8_encode's responsibilities.

Breaking binary encoding out into its own function would let us test binary encoding directly via unit tests. I do realize the proposed function would be one line, and would likely not add anything to the readability of this already satisfactory code, so I defer to your judgment on whether the benefits justify this separation of responsibility.

One of the reasons we're running into encoding issues is that we have solved this problem in different ways in different places. I don't think we should have more than one method in the public API doing it.

Considering that, I don't see what we gain from having another method for testing purposes, as opposed to assertions that use the binary: true option. Both provide the same test granularity.
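To make the behavior under discussion concrete, here is a minimal sketch of what the one-line binary conversion from the diff does to a sample input. The helper name binary_to_utf8 is hypothetical (it is the kind of one-line function proposed above, not something in the PR); only the encode call itself comes from the diff.

```ruby
# Hypothetical one-line helper wrapping the conversion shown in the diff.
# Bytes that have no UTF-8 mapping when read as binary (0x80..0xFF)
# are replaced with the empty string, i.e. dropped.
def binary_to_utf8(str)
  str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Sample input: two control bytes, some text, and two high bytes.
raw = "\x04\bhello\xFF\xFE".b

clean = binary_to_utf8(raw)

puts clean.encoding        # UTF-8
puts clean.valid_encoding? # true
puts clean.inspect         # high bytes are gone, control bytes remain
```

Note that only the high bytes are stripped; ASCII control characters such as \x04 survive, so the result is valid UTF-8 but not necessarily printable text.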

delner · 2018-01-16T20:21:36Z

lib/ddtrace/utils.rb


-    reset!
+    STRING_PLACEHOLDER = ''.freeze


Minor, but I think maybe this could be defined at the top of the file? (As is more conventional.)

palazzem · 2018-01-16T23:14:29Z

lib/ddtrace/contrib/dalli/quantize.rb

-        rescue ::Encoding::CompatibilityError
-          "#{operation} BLOB (OMITTED)"
+        rescue => e
+          Tracer.log.error("Error sanitizing Dalli operation: #{e}")


Can you set this to debug? Otherwise we may flood the logs with entries when this runs on a critical path.
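A rough sketch of the difference this makes, using Ruby's standard Logger as a stand-in for Tracer.log (that substitution, and the helper name encode_or_blank, are assumptions for illustration). With the logger at INFO, a debug call writes nothing, and the block form avoids even building the message string.

```ruby
require 'logger'
require 'stringio'

# Stand-in logger writing to memory so the output can be inspected.
log_output = StringIO.new
logger = Logger.new(log_output)
logger.level = Logger::INFO

# Hypothetical helper: encode, and on failure log at debug level.
def encode_or_blank(str, logger)
  str.encode(::Encoding::UTF_8)
rescue StandardError => e
  # The block is only evaluated when the debug level is enabled.
  logger.debug { "Error sanitizing operation: #{e}" }
  ''
end

bad = "\xFF".b # binary byte with no UTF-8 mapping, so encode raises

encode_or_blank(bad, logger)
quiet = log_output.string.empty? # nothing logged at INFO

logger.level = Logger::DEBUG
encode_or_blank(bad, logger)
verbose = log_output.string.include?('Error sanitizing operation')
```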

palazzem · 2018-01-16T23:14:40Z

lib/ddtrace/utils.rb

+        str.encode(::Encoding::UTF_8)
+      end
+    rescue => e
+      Tracer.log.error("Error encoding string in UTF-8: #{e}")
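For readers following the fragments quoted in this review, here is one way they could fit together. This is a reconstruction, not the merged code: the module name EncodingSketch, the :placeholder option, and the order of the branches are assumptions. Only the binary conversion line, the plain encode call, the rescue, and STRING_PLACEHOLDER appear in the diff.

```ruby
module EncodingSketch
  STRING_PLACEHOLDER = ''.freeze

  # Returns a UTF-8 version of str, or a placeholder if conversion fails.
  def self.utf8_encode(str, options = {})
    if options[:binary]
      # Treat the bytes as binary and drop anything without a UTF-8 mapping.
      str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
    elsif str.encoding == ::Encoding::UTF_8
      # Assumed fast path: hand back the same object for UTF-8 input.
      str
    else
      str.encode(::Encoding::UTF_8)
    end
  rescue StandardError
    # The real code logs here; logging is omitted in this sketch.
    options.fetch(:placeholder, STRING_PLACEHOLDER)
  end
end

latin1 = "caf\xE9".b.force_encoding('ISO-8859-1')
puts EncodingSketch.utf8_encode(latin1)                 # transcoded to UTF-8
puts EncodingSketch.utf8_encode("\xFF".b).inspect       # falls back to placeholder
puts EncodingSketch.utf8_encode("\xFFhi".b, binary: true)
```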


palazzem

Looks good to me! We're going to merge this one as a part of our 0.11.0 release!

palazzem · 2018-01-17T14:44:08Z

lib/ddtrace/error.rb

-      value.encode(::Encoding::UTF_8)
+      @type = Utils.utf8_encode(type)
+      @message = Utils.utf8_encode(message)
+      @backtrace = Utils.utf8_encode(backtrace)


palazzem · 2018-01-17T14:46:39Z

test/utils_test.rb

+
+    assert_equal(::Encoding::UTF_8, Datadog::Utils.utf8_encode(str).encoding)
+
+    # we don't allocate new objects when a valid UTF-8 string is provided


Great test! This ensures that we don't introduce a performance slowdown for the common case.
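The no-allocation property can be checked with object identity rather than equality. The sketch below is an illustration only: utf8_passthrough is a hypothetical helper, not the method from the PR. It relies on the fact that String#encode returns a new String even when the target encoding matches the source.

```ruby
# Hypothetical helper: return the input untouched when it is already
# valid UTF-8, and only transcode (allocating a new String) otherwise.
def utf8_passthrough(str)
  return str if str.encoding == ::Encoding::UTF_8 && str.valid_encoding?

  str.encode(::Encoding::UTF_8)
end

str = 'plain text'

same_object = utf8_passthrough(str).equal?(str)    # identity check
copy = str.encode(::Encoding::UTF_8)               # always a new object
copied_object = !copy.equal?(str) && copy == str   # equal content, different object

puts same_object
puts copied_object
```

Using equal? here is what distinguishes "returns an equivalent string" from "returns the very same object", which is the property the test comment describes.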

p-lambert added 2 commits January 16, 2018 11:38

Handle non UTF-8 compatible values for Error object

8211ede

Add Datadog::Utils.utf8_encode method

ad437cc

delner reviewed Jan 16, 2018


p-lambert added 2 commits January 16, 2018 15:44

Use Utils.utf8_encode for sanitization and quantization

f912cba

Add test cases for Utils.utf8_encode

d8385df

p-lambert force-pushed the pedro/handle-enconding-problem branch from df3b6e3 to d8385df Compare January 16, 2018 20:44

Move constant to top of the file

0a4b7b9

p-lambert force-pushed the pedro/handle-enconding-problem branch 2 times, most recently from 505301b to ffc8613 Compare January 16, 2018 22:34

delner added this to the 0.11.0 milestone Jan 16, 2018

delner added the bug and core labels Jan 16, 2018

palazzem suggested changes Jan 16, 2018


Fix tests

cda0d3d

p-lambert force-pushed the pedro/handle-enconding-problem branch from ffc8613 to cda0d3d Compare January 16, 2018 23:17

Use debug log level for encoding issues

76bfea9

p-lambert force-pushed the pedro/handle-enconding-problem branch from 7684d73 to 76bfea9 Compare January 16, 2018 23:41

palazzem approved these changes Jan 17, 2018


palazzem merged commit 79bfe4b into master Jan 17, 2018

palazzem deleted the pedro/handle-enconding-problem branch January 17, 2018 14:52
