Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] [RFC] Correctly handle pymongo error messages with unicode data under Python 2 #2147

Closed
wants to merge 1 commit into from

Conversation

Kami
Copy link

@Kami Kami commented Aug 16, 2019

After upgrading to latest version of pymongo and mongoengine, I noticed our tests which rely on unicode document keys started to fail.

Traceback (most recent call last):
  File "/data/stanley/st2common/st2common/router.py", line 515, in __call__
    resp = func(**kw)
  File "/data/stanley/st2api/st2api/controllers/v1/actions.py", line 129, in post
    action_db = Action.add_or_update(action_model)
  File "/data/stanley/st2common/st2common/persistence/base.py", line 173, in add_or_update
    model_object = cls._get_impl().add_or_update(model_object, validate=True)
  File "/data/stanley/st2common/st2common/models/db/__init__.py", line 451, in add_or_update
    instance.save(validate=validate)
  File "/data/stanley/virtualenv/local/lib/python2.7/site-packages/mongoengine/document.py", line 412, in save
    raise NotUniqueError(message % six.text_type(err))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 129: ordinal not in range(128)

It turns out the root cause is related to this change introduced in pymongo (mongodb/mongo-python-driver@e711408#diff-83f588285a24ab0d1ee8a57f88312a6aR54) and how mongoengine handles pymongo error messages as strings (#1428 (comment)).

mongoengine calls six.text_type() on the error class instance which will result in unicode(err) call in Python 2.

This call will fail if encoding used by Python (sys.getdefaultencoding()) is not utf-8. Keep in mind that is fairly common for default encoding to be ascii.

I believe the best solution to solve that issue is to update mongoengine code to correctly handle unicode strings even if default Python encoding is ascii instead of "blindly" calling six.text_type() on the error class instances.

I believe this is a correct approach since Python apps should still handle unicode strings correctly even if default encoding which is used is ascii aka we can't rely on default encoding to always be utf-8 (in fact, that's how I handle it in other libraries and I also seen other projects handle it).

This means that instead of doing:

six.text_type(err)

We should do:

str(err).decode('utf-8')

This means the returning value will always be a correct unicode type.

I propose adding a new to_unicode or similar utility function which performs that step under Python 2 and update code to use this in all the places where we cast pymongo error class instances to string and use it with a unicode type.

TODO

  • Tests
  • Update rest of the code where we call six.text_type on the pymongo error class instance

@Kami
Copy link
Author

Kami commented Aug 16, 2019

/cc @wojcikstefan^ since you originally contributed six.text_type changes quite a while ago.

Kami added a commit to StackStorm/st2 that referenced this pull request Aug 16, 2019
@Kami
Copy link
Author

Kami commented Aug 16, 2019

Closing this in favor of a fix in pymongo library - https://jira.mongodb.org/browse/PYTHON-1966.

@Kami Kami closed this Aug 16, 2019
blag pushed a commit to StackStorm/st2 that referenced this pull request Aug 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant