Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update JSON serialization and deserialization code to use orjson library #5153

Merged
merged 45 commits into from
Mar 15, 2021

Conversation

Kami
Copy link
Member

@Kami Kami commented Feb 14, 2021

This pull request updates code base to use orjson library for serialization and de-serialization everywhere.

Background, Context

Right now StackStorm code-base uses native python json library for serializing / de-serialiazing JSON. Only exception to that is our version of copy.deepcopy function which uses ujson since it's up to 10x faster than native copy.deepcopy.

Now that we have dropped support for Python 2, we can look at using orjson (https://github.com/ijl/orjson) everywhere. orjson should be more or less backward compatible with json, but also quite a bit faster (even faster than ujson and unlike ujson, it's also actively maintainer).

Proposed Change

I will update / add new json_encode() and json_decode() function which should be used everywhere in the code where we need to deal with JSON.

Those functions will utilize new feature flag / config option which will allow user to specify which json library to use. This feature flag should mostly serve as a safe-guard so user can revert back to native json library in case issues arise once the code is out.

TODO

  • Add new json_encode and json_decode functions which handle json encoding and decoding using diffrent backends (json, ujson, orjson)
  • Add config option / feature flag which dictates which JSON library to use
  • Update API layer code (requests, responses) to use json_encode and decode
  • Update rest of the code to use json_encode and decode
  • Update fast_deepcopy to use orjson
  • Move ujson, simplejson from requirements to tests-requirements.txt
  • Decide if / when orjson should be default for the new config option value

@Kami Kami changed the title webtest.app.AppError: Bad response: 500 Internal Server Error (not 200 OK or 3xx redirect for http://localhost/v1/webhooks/git) b'{"faultstring":"Internal Server Error"}' st2common.json_implementation = orjson|json|ujson [WIP] Update JSON serialization and deserialization code to use orjson library. ## TODO - [ ] Add new json_encode and json_decode functions which handle json encoding and decoding using diffrent backends (json, ujson, orjson) - [ ] Add config option / feature flag which dictates which JSON library to use - [ ] Update API layer code (requests, responses) to use json_encode and decode - [ ] Update rest of the code to use json_encode and decode - [ ] Update fast_deepcopy to use orjson - [ ] Decide if / when orjson should be default for the new config option valu [WIP] Update JSON serialization and deserialization code to use orjson library. Feb 14, 2021
@Kami Kami changed the title [WIP] Update JSON serialization and deserialization code to use orjson library. [WIP] Update JSON serialization and deserialization code to use orjson library Feb 14, 2021
library will be using for encoding and decoding.

Default it to orjson and add some tests for it.
@Kami Kami changed the title [WIP] Update JSON serialization and deserialization code to use orjson library [WIP] [RFC] Update JSON serialization and deserialization code to use orjson library Feb 14, 2021
@blag
Copy link
Contributor

blag commented Feb 14, 2021

A few thoughts:

  • We should try to avoid custom magic methods, as they are subject to break without notice. See this SO post and the Python documentation itself
  • I don't like the idea of configuring a JSON parser backend, because this is something that is used so frequently within StackStorm that every future non-negligible bug report will need to post the config value for this, and users have a difficult enough time following our current directions for reporting issues - they aren't going to post their configs by default, so reported issues are going to have much more of a back-and-forth to get anywhere.
  • That being said, the orjson library emphasizes "correctness" and does not emphasize "compatibility". JSON is so simple that I hope there isn't a lot of invalid-but-still-widely-parsable JSON out there, but if there is, then switching to orjson could expose JSON encoding issues with other systems that get reported as regressions in StackStorm because all a user will see is that "it worked in the previous version of StackStorm". So if we switch to orjson, we need to be prepared to handle issues like that.

I think this is not a configuration option we should give to users. We should trust our tests and testing procedures to catch and uncover JSON parsing issues and switch to the backend that we (developers) think gives the most benefit to our users. I do think that orjson is probably the best library for this, and I'm happy to switch to it, but I want us to be prepared for any issues that crop up, and I don't think we should make this user-configurable.

@Kami
Copy link
Member Author

Kami commented Feb 14, 2021

@blag

We should try to avoid custom magic methods, as they are subject to break without notice. See this SO post and the Python documentation itself

Which magic method are you referring to? If you mean __json__, that's an existing behavior (https://github.com/StackStorm/st2/pull/5153/files#diff-1e4f931a9b8df7fb6a0ff106666a98807f94d0f09c777089ae9ddf540d2010c4R43) and nothing I'm changing (and also nothing I want to change at this point to keep the scope down).

That being said, the orjson library emphasizes "correctness" and does not emphasize "compatibility". JSON is so simple that I hope there isn't a lot of invalid-but-still-widely-parsable JSON out there, but if there is, then switching to orjson could expose JSON encoding issues with other systems that get reported as regressions in StackStorm because all a user will see is that "it worked in the previous version of StackStorm". So if we switch to orjson, we need to be prepared to handle issues like that.

That's exactly the reason why that configuration option / feature flag is there.

In general, as mentioned in the discussions thread, this change already exposed some issues with incorrect code in some places which mostly results from incorrect and inconsistent conversion between bytes <-> unicode.

Incorrect in a sense that right now the code doesn't throw, but the response will be "incorrect" - incorrect in a sense that some string data in responses will contain b"" prefix which it shouldn't.

I think this is not a configuration option we should give to users. We should trust our tests and testing procedures to catch and uncover JSON parsing issues and switch to the backend that we (developers) think gives the most benefit to our users

I think that's living in an ideal and realistic world which simply doesn't exist in real-life (aka there will always be edge cases which tests won't catch).

I do agree though that we shouldn't publicly advertise that feature and treat it more as a feature flag which should be used in the worst case scenario (which hopefully won't happen).

I believe we utilized configuration options in similar situations in the past.

@Kami
Copy link
Member Author

Kami commented Feb 14, 2021

And in general I also think orjson is the best choice from the strictness perspective.

In the past, we have been bit so many times by non-strict nature of various casting and other code. Problem with using best effort, non strict, etc. approach is that it may work for a lot of scenarios, but when it breaks, it's very hard to troubleshoot and fix it.

@Kami
Copy link
Member Author

Kami commented Feb 14, 2021

I believe packages step is failing because we are using older pip version which doesn't include support for manywheel wheel format orjson uses so it tries to compile it from scratch instead of using a wheel (https://github.com/pypa/manylinux).

Will look into fixing that.

@Kami
Copy link
Member Author

Kami commented Feb 14, 2021

Related PR to get st2-packages build to pass - StackStorm/st2packaging-dockerfiles#103.

Copy link
Member

@cognifloyd cognifloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

I noticed one minor docstring typo, but whatever. This looks awesome.

st2common/benchmarks/micro/test_fast_deepcopy.py Outdated Show resolved Hide resolved
Copy link
Member

@cognifloyd cognifloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop the changes in tools/config_gen.py now that json_library is a module constant instead of a config option.

@Kami
Copy link
Member Author

Kami commented Mar 6, 2021

@blag Can I please get a review when you get a chance. I would like to wrap this on then move to wrapping up DB one now that I'm unblocked by pip stuff.

I removed the config option, changed it to a module level constant so we can still exercise it in compatibility tests.

@blag
Copy link
Contributor

blag commented Mar 6, 2021

On mobile right now, will review later today. 👍

@Kami
Copy link
Member Author

Kami commented Mar 6, 2021

@blag Thanks.

I'm somewhat confused by u16 e2e test failure. Is this a known issue, or a race or smth?

Looking at the test output, I'm quite confused - https://gist.github.com/Kami/0d83da44842daa3cbb9dcef670299df6. That alias doesn't have representation (https://github.com/StackStorm-Exchange/stackstorm-st2/blob/master/aliases/actions_list.yaml) so that assertion looks wrong to me. But how did it pass before?

And my code touched none of that.

I would assume it was race or smth, but assertion doesn't seem to make sense since it doesn't add up with the actual alias defined in stackstorm-st2 pack.

To me it seems like that bats test should check for st2 list {{ limit=10 }} actions - List available StackStorm actions., but I could be missing something (aka we are trying to assert on the actual hubot response and not the start up string where it lists the loaded commands).

Same tests also pass on other distros which makes me think it's actually some kind of race - if it was a bigger issue related to this change, that would already be caught by other tests.

@Kami
Copy link
Member Author

Kami commented Mar 6, 2021

OK, ignore my comment above - I'm still digging into end to end tests output and the instance itself...

Made some progress:

2021-03-06 22:51:19,147 139824208471152 DEBUG shell [-] Returning.
2021-03-06 22:51:19,148 139824208471152 DEBUG python_runner [-] Returning values: 1, %%%%%~=~=~=************=~=~=~%%%%, Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/python_runner/python_action_wrapper.py", line 381, in <module>
    obj.run()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/python_runner/python_action_wrapper.py", line 252, in run
    sys.stdout.write(print_output + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character '\u2022' in position 58: ordinal not in range(128)
, False

Looks like it's some default sys.encoding issue on u16, looks like it's set to ascii and not utf8 and that's why it's failing.

EDIT: I believe the issue is that locate on u16 is not set correctly for action runner process.

@Kami
Copy link
Member Author

Kami commented Mar 6, 2021

OK, yeah I was able to reproduce the issue - it's indeed locale related on Ubuntu 16.04.

If it's set to utf8 it works fine, but it's set to ascii it blows up due to change in how orjson serialized unicodes (it doesn't require serialized it to ascii escape sequences, but utilizes unicode directly which is preferred).

I'm having troubles dynamically setting PYTHONIOENCODING=utf8 for action runner wrapper inside the Python runner. It doesn't play nicely with the eventlet patched subprocess module, but I'm still looking into other possible solutions.

Previously it would work because json uses ascii escape sequences.

@blag I believe this is the same issue you noticed with regards to very large audit logs and a bunch of ``\\\```. This issue results in that as well - it basically happens when server is not using utf-8 locale (which is the case here_ and something tries to log a unicode message. This would cause endless loop when trying to format logging value and constantly failing. The issue is a combination of Python 3 (using unicode instead of string with ascii escape sequence) and server / action runner not having set correct locale.

In short, it looks like this issue predates my PR, but my PR made it more likely to happen and got exposes since u16 doesn't seem to have locale set up correctly for action runner.

Kami added 4 commits March 7, 2021 11:10
sys.tdout.buffer works with bytes where as sys.stdout works with
unicode.

orjson returns bytes and this way we avoid unncessary conversation back
and forth between bytes and unicode.

In addition to that, using bytes means it will also work correctly if
the system locale is not set to utf-8.
This will make it easier to troubleshoot locale / encoding related
issue.

Also make sure we print those version related messages under INFO log
level instead of DEBUG since they may be material when troubleshooting
various issues so we should use INFO.
@Kami
Copy link
Member Author

Kami commented Mar 7, 2021

OK, finally the whole build is green 🎉 🍾

@@ -41,6 +41,7 @@
]

SKIP_GROUPS = ["api_pecan", "rbac", "results_tracker"]
SKIP_OPTIONS = ["json_library"]
Copy link
Member

@cognifloyd cognifloyd Mar 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that json_library is not a config option, do you still want to leave this here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We talked about not having user config options for json lib.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the config was removed but the this flag was left here so we can also support ignoring specific config options (previously this script didn't support that).

I will change SKIP_OPTIONS to an empty list.

Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly OK but some clean up requests.

@@ -48,7 +48,7 @@
from st2common.util import jinja as jinja_utils
from st2common.util import param as param_utils
from st2common.util.config_loader import get_config
from st2common.util.ujson import fast_deepcopy
from st2common.util.deep_copy import fast_deepcopy
Copy link
Contributor

@m4dcoder m4dcoder Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this module json util or something so to be clear this is json specific operation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally renamed it to deep_copy since I think previous name was a bad one (I know, I picked it originally :D) - previous name was leaking implementation details which should not matter to the end user.

I think deep_copy is a better name since it conveys what the module is used for - deep copying dictionaries.

And the module is not JSON specific either, it's can deep copy arbitrary dictionaries with simple types. orjson is just an implementation detail of that function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want this to be confused with copy.deepcopy which is not json or dict specific.

Copy link
Member Author

@Kami Kami Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can rename it to fast_deepcopy_dict then? (the function name that is)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I guess I need to be more explicitly - it doesn't just support dicts either (that's just how we use it).

It supports any kind of value as long as it only contains simple types (so no class instances, etc.).

Copy link
Member Author

@Kami Kami Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion on Slack, I pushed a couple of changes - add some more tests, update function docstring comment, rename it to fast_deepcopy_dict (bd4eb01, c80d07a, 9ba8d45).

As discussed, it can also be used on other simple values (think lists) and not just dictionaries, but using it on dictionaries with simple value types is our primary use case for it so I think that name is an OK compromise for now.

@@ -41,6 +41,7 @@
]

SKIP_GROUPS = ["api_pecan", "rbac", "results_tracker"]
SKIP_OPTIONS = ["json_library"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. We talked about not having user config options for json lib.

Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kami Thank you! This is going to significantly improve st2.

@Kami Kami merged commit b113704 into master Mar 15, 2021
@Kami Kami deleted the orjson_prototype branch March 15, 2021 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants