KafkaError.str() raises UnicodeDecodeError #129

johnrgregg3 · 2017-02-17T16:49:55Z

I am running confluent_kafka.version ('0.9.2', 590080), confluent_kafka.libversion ('0.9.3', 590847)
In my on_delivery callback:

def fe_analytics_kafka_cb(err, msg):
if err:
LOG.error("fe_analytics_kafka_cb: error: code %d, name %s",
err.code(), str(err.name()))
fred = err.str()
LOG.error("fe_analytics_kafka_cb: error: code %d, name %s: %s",
err.code(), str(err.name()), str(err.str()))

Note the first LOG message does not have the err.str(), so I can see the code and name. They are:

fe_analytics_kafka_cb: error: code -192, name _MSG_TIMED_OUT

Interestingly, I get hit with the exception merely by calling err.str(), not from trying to log it. That is, the fred = err.str() gets it.

I get this exception:

File "/a/bin/vocfe_app/vocfe.py", line 195, in fe_analytics_kafka_cb
fred = err.str()
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 76: invalid start byte

Of course I'd like to be able to see the string, otherwise I only have the name to go on.

I think what is going on is that for _MSG_TIMED_OUT, str() just echos the message that was sent, but the length is off, so random memory beyond the message is attempted to be displayed. I'm sending 76 bytes, and the exceptions are in position 76 or 77, beyond the end of my message. Moreover, sometimes it does not raise the exception, but displays weird stuff after the text of my message (which was a JSON blob):

fe_analytics_kafka_cb: error: code -192, name _MSG_TIMED_OUT: {"test_message": "Fri Feb 17 16:43:53 2017 INITIAL TEST FROM 23.79.234.201"}^Y

See the ^Y at the end? I guess sometimes it gets lucky and the random memory looks like valid unicode, and sometimes it doesn't.

johnrgregg3 · 2017-02-17T16:54:28Z

It strikes me that even if the message length issue is sorted out, str() should not just echo the content of the message. What if I'm sending binary data over Kafka? Won't I still get this exception? It seems like str() should do some sanitizing in general.

johnrgregg3 · 2017-02-17T17:08:57Z

rd_kafka_message_errstr(const rd_kafka_message_t *rkmessage) {
if (!rkmessage->err)
return NULL;

    if (rkmessage->payload)
            return (const char *)rkmessage->payload;

    return rd_kafka_err2str(rkmessage->err);

}

Just returns a pointer to the payload, no NUL-termination, no guarantee that the payload is a trging or unicode. Later, it is passed to KafkaError_new_or_None as argument str, who does this:

PyObject *KafkaError_new_or_None (rd_kafka_resp_err_t err, const char *str) {
if (!err)
Py_RETURN_NONE;
return KafkaError_new0(err, "%s", str);
}

edenhill · 2017-02-17T17:44:43Z

@johnrgregg3 Great work finding the root cause of this, I'm impressed!

The key here is this small comment in rdkafka.h:

void   *payload;           /**< Producer: original message payload.
				    * Consumer: Depends on the value of \c err :
				    * - \c err==0: Message payload.
				    * - \c err!=0: Error string */

https://github.com/edenhill/librdkafka/blob/master/src/rdkafka.h#L832

What this means in practice is that rd_kafka_message_errstr() should never be used on a producer message, and that misuse is a bug in the Python client.

Proper error string handling in Producer, issue #129

johnrgregg3 · 2017-03-03T01:59:25Z

Do you know which release this fix will be in? Will it be the upcoming 0.9.4, or the one after that?

edenhill · 2017-03-03T13:49:34Z

This just missed 0.9.4, so it'll be available in the next release (no timeline yet)

edenhill · 2017-03-03T13:49:50Z

If this turns out to be a blocker we'll ship a maintenance release

johnrgregg3 · 2017-03-15T15:54:23Z

I would like to emphasize that in the cases where I could get the str(), it was quite useful, so it would be nice to simply not have that function on the producer side. In particular, "198.18.130.13:9092/bootstrap: Connect to ipv4#198.18.130.13:9092 failed: Connection refused" tells me a lot more than simply the code/name, "_TRANSPORT". Similarly, the detail in _RESOLVE helps a lot.

edenhill · 2017-03-20T13:44:09Z

The per-message delivery report errors are sparse - they only have the error code (which can be humanized) that only provides a static error indication.

But I think that is okay in this specific case because the error codes you mention, _TRANSPORT, _RESOLVE, etc, will not be propagated to the delivery report handler since they are temporary errors that librdkafka will handle.
The error code for a failed message will be more abstracted, e.g., _TIMED_OUT if the message could not be delivered in message.timeout.ms (for whatever underlying reasons), or a direct error code from the broker such as REPLICA_NOT_AVAILABLE.

edenhill added the bug label Feb 17, 2017

edenhill added a commit that referenced this issue Feb 19, 2017

Proper use of message error on Producer (#129)

b535590

edenhill mentioned this issue Feb 19, 2017

Proper error string handling in Producer, issue #129 #131

Merged

edenhill added a commit that referenced this issue Feb 27, 2017

Merge pull request #131 from confluentinc/errors_129

ece3df4

Proper error string handling in Producer, issue #129

edenhill closed this as completed Mar 2, 2017

edenhill mentioned this issue May 30, 2017

KafkaError.str() fails if trying to produce binary data #199

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KafkaError.str() raises UnicodeDecodeError #129

KafkaError.str() raises UnicodeDecodeError #129

johnrgregg3 commented Feb 17, 2017

johnrgregg3 commented Feb 17, 2017

johnrgregg3 commented Feb 17, 2017

edenhill commented Feb 17, 2017 •

edited

Loading

johnrgregg3 commented Mar 3, 2017

edenhill commented Mar 3, 2017

edenhill commented Mar 3, 2017

johnrgregg3 commented Mar 15, 2017

edenhill commented Mar 20, 2017

KafkaError.str() raises UnicodeDecodeError #129

KafkaError.str() raises UnicodeDecodeError #129

Comments

johnrgregg3 commented Feb 17, 2017

johnrgregg3 commented Feb 17, 2017

johnrgregg3 commented Feb 17, 2017

edenhill commented Feb 17, 2017 • edited Loading

johnrgregg3 commented Mar 3, 2017

edenhill commented Mar 3, 2017

edenhill commented Mar 3, 2017

johnrgregg3 commented Mar 15, 2017

edenhill commented Mar 20, 2017

edenhill commented Feb 17, 2017 •

edited

Loading