[WIP] [RFC] Correctly handle pymongo error messages with unicode data under Python 2 #2147
After upgrading to the latest versions of `pymongo` and mongoengine, I noticed that our tests which rely on unicode document keys started to fail.

It turns out the root cause is related to this change introduced in pymongo (mongodb/mongo-python-driver@e711408#diff-83f588285a24ab0d1ee8a57f88312a6aR54) and how mongoengine handles pymongo error messages as strings (#1428 (comment)).
mongoengine calls `six.text_type()` on the error class instance, which results in a `unicode(err)` call in Python 2.

This call will fail for messages containing non-ASCII data if the encoding used by Python (`sys.getdefaultencoding()`) is not `utf-8`. Keep in mind that it is fairly common for the default encoding to be `ascii`.
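To make the failure mode concrete, here is a minimal sketch that mimics the implicit decode step. It does not use pymongo or mongoengine; the message value and the `implicit_decode` helper are hypothetical and only imitate what happens when Python 2 converts a UTF-8 byte string to `unicode` using the default encoding.

```python
# -*- coding: utf-8 -*-
# Illustrative UTF-8 bytes, standing in for what str(err) might return
# under Python 2 when the error message contains a non-ASCII document key.
message_bytes = u"duplicate key: caf\u00e9".encode("utf-8")


def implicit_decode(raw, default_encoding="ascii"):
    """Hypothetical stand-in for Python 2's implicit bytes-to-unicode step.

    Under Python 2 the real conversion uses sys.getdefaultencoding();
    here the encoding is passed explicitly so the sketch runs anywhere.
    """
    return raw.decode(default_encoding)


try:
    implicit_decode(message_bytes)  # "ascii" cannot decode the UTF-8 bytes
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with an explicit utf-8 encoding succeeds.
decoded = implicit_decode(message_bytes, "utf-8")
```

The point of the sketch is that the outcome depends entirely on which encoding is used for the decode, which is why relying on the interpreter default is fragile.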
I believe the best solution is to update the `mongoengine` code to correctly handle unicode strings even if the default Python encoding is `ascii`, instead of "blindly" calling `six.text_type()` on the error class instances.

I believe this is the correct approach, since Python apps should still handle unicode strings correctly even if the default encoding is `ascii`; in other words, we can't rely on the default encoding always being `utf-8` (in fact, that's how I handle it in other libraries, and I have also seen other projects handle it this way).

This means that instead of doing:
We should do:
This means the returned value will always be a correct unicode type.
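As a rough sketch of what explicit decoding could look like, the helper below converts a value to text without consulting the default encoding. The name `to_unicode`, its signature, and the choice of `"replace"` error handling are assumptions made for illustration; this is not existing mongoengine or pymongo code, and it uses a plain `ValueError` rather than a real pymongo error.

```python
# -*- coding: utf-8 -*-
# Text type on both major versions: unicode on Python 2, str on Python 3.
text_type = type(u"")


def to_unicode(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as text with explicit decoding.

    Assumes that byte strings (including what str(value) returns under
    Python 2) are encoded as `encoding`; undecodable bytes are replaced
    rather than raising.
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    # Exceptions and other objects: str() gives bytes on Python 2
    # and text on Python 3, so decode only when bytes come back.
    text = str(value)
    if isinstance(text, bytes):
        return text.decode(encoding, "replace")
    return text


# Usage with a generic exception standing in for a pymongo error.
err = ValueError(u"duplicate key: caf\u00e9")
message = to_unicode(err)
```

One caveat with this sketch: under Python 2, `str(value)` can itself raise for exceptions that hold unicode arguments and do not override `__str__`, so a production version would likely need a fallback for that case.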
I propose adding a new `to_unicode` (or similar) utility function which performs that step under Python 2, and updating the code to use it in all the places where we cast pymongo error class instances to a string and use the result with a unicode type.

TODO
`six.text_type` on the pymongo error class instance