Update JSON serialization and deserialization code to use `orjson` library #5153

Kami · 2021-02-14T16:38:42Z

This pull request updates the code base to use the orjson library for serialization and deserialization everywhere.

Background, Context

Right now the StackStorm code base uses the native Python json library for serializing / deserializing JSON. The only exception is our version of the copy.deepcopy function, which uses ujson since it's up to 10x faster than native copy.deepcopy.

Now that we have dropped support for Python 2, we can look at using orjson (https://github.com/ijl/orjson) everywhere. orjson should be more or less backward compatible with json, but also quite a bit faster (even faster than ujson and, unlike ujson, it's also actively maintained).

Proposed Change

I will update / add new json_encode() and json_decode() functions which should be used everywhere in the code where we need to deal with JSON.

Those functions will utilize a new feature flag / config option which will allow the user to specify which JSON library to use. This feature flag should mostly serve as a safeguard so the user can revert back to the native json library in case issues arise once the code is out.

TODO

Add new json_encode and json_decode functions which handle JSON encoding and decoding using different backends (json, ujson, orjson)
Add config option / feature flag which dictates which JSON library to use
Update API layer code (requests, responses) to use json_encode and json_decode
Update rest of the code to use json_encode and json_decode
Update fast_deepcopy to use orjson
Move ujson, simplejson from requirements to tests-requirements.txt
Decide if / when orjson should be default for the new config option value
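
As a rough illustration of the wrapper functions described above, here is a minimal sketch. The constant name `DEFAULT_JSON_LIBRARY`, the `library` parameter, and the fallback to `json` when orjson is not installed are assumptions made for this example, not necessarily what the PR implements; the ujson backend is left out for brevity.

```python
import json

try:
    import orjson  # third-party; treated as optional in this sketch
except ImportError:
    orjson = None

# Hypothetical switch naming the backend to use ("json" or "orjson").
DEFAULT_JSON_LIBRARY = "orjson" if orjson is not None else "json"


def json_encode(obj, library=None):
    """Serialize obj to a JSON string using the selected backend."""
    library = library or DEFAULT_JSON_LIBRARY
    if library == "orjson":
        if orjson is None:
            raise ValueError("orjson is not installed")
        # orjson.dumps() returns bytes; decode so both backends return str.
        return orjson.dumps(obj).decode("utf-8")
    if library == "json":
        return json.dumps(obj)
    raise ValueError("Unsupported JSON library: %s" % library)


def json_decode(data, library=None):
    """Parse a JSON string (or bytes) using the selected backend."""
    library = library or DEFAULT_JSON_LIBRARY
    if library == "orjson":
        if orjson is None:
            raise ValueError("orjson is not installed")
        return orjson.loads(data)
    if library == "json":
        return json.loads(data)
    raise ValueError("Unsupported JSON library: %s" % library)
```

Routing every call through two functions like these keeps the backend choice in one place, which is what makes it possible to revert to the native library without touching call sites.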

blag · 2021-02-14T18:17:43Z

A few thoughts:

We should try to avoid custom magic methods, as they are subject to break without notice. See this SO post and the Python documentation itself
I don't like the idea of configuring a JSON parser backend, because this is something that is used so frequently within StackStorm that every future non-negligible bug report will need to post the config value for this, and users have a difficult enough time following our current directions for reporting issues - they aren't going to post their configs by default, so reported issues are going to have much more of a back-and-forth to get anywhere.
That being said, the orjson library emphasizes "correctness" and does not emphasize "compatibility". JSON is so simple that I hope there isn't a lot of invalid-but-still-widely-parsable JSON out there, but if there is, then switching to orjson could expose JSON encoding issues with other systems that get reported as regressions in StackStorm because all a user will see is that "it worked in the previous version of StackStorm". So if we switch to orjson, we need to be prepared to handle issues like that.

I think this is not a configuration option we should give to users. We should trust our tests and testing procedures to catch and uncover JSON parsing issues and switch to the backend that we (developers) think gives the most benefit to our users. I do think that orjson is probably the best library for this, and I'm happy to switch to it, but I want us to be prepared for any issues that crop up, and I don't think we should make this user-configurable.

Kami · 2021-02-14T18:23:04Z

@blag

We should try to avoid custom magic methods, as they are subject to break without notice. See this SO post and the Python documentation itself

Which magic method are you referring to? If you mean __json__, that's an existing behavior (https://github.com/StackStorm/st2/pull/5153/files#diff-1e4f931a9b8df7fb6a0ff106666a98807f94d0f09c777089ae9ddf540d2010c4R43) and nothing I'm changing (and also nothing I want to change at this point to keep the scope down).

That being said, the orjson library emphasizes "correctness" and does not emphasize "compatibility". JSON is so simple that I hope there isn't a lot of invalid-but-still-widely-parsable JSON out there, but if there is, then switching to orjson could expose JSON encoding issues with other systems that get reported as regressions in StackStorm because all a user will see is that "it worked in the previous version of StackStorm". So if we switch to orjson, we need to be prepared to handle issues like that.

That's exactly the reason why that configuration option / feature flag is there.

In general, as mentioned in the discussion thread, this change has already exposed incorrect code in some places, mostly resulting from incorrect and inconsistent conversion between bytes <-> unicode.

Incorrect in the sense that right now the code doesn't throw, but the response is wrong: some string data in responses contains a b"" prefix which shouldn't be there.
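
To illustrate the kind of silent bug described here (this is a generic Python example, not code from the PR): calling `str()` on a bytes value does not raise, it just embeds the bytes repr, prefix included, into the resulting string.

```python
import json

raw = b"hello"

# str() on bytes succeeds but produces the repr, including the b'' prefix.
wrong = json.dumps({"stdout": str(raw)})

# Decoding explicitly yields the intended text.
right = json.dumps({"stdout": raw.decode("utf-8")})

print(wrong)
print(right)
```

A stricter serializer that refuses to guess tends to surface this class of mistake as an error instead of a subtly wrong response.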

I think this is not a configuration option we should give to users. We should trust our tests and testing procedures to catch and uncover JSON parsing issues and switch to the backend that we (developers) think gives the most benefit to our users

I think that's living in an ideal world which simply doesn't exist in real life (aka there will always be edge cases which tests won't catch).

I do agree though that we shouldn't publicly advertise that feature and treat it more as a feature flag which should be used in the worst case scenario (which hopefully won't happen).

I believe we utilized configuration options in similar situations in the past.

Kami · 2021-02-14T18:29:38Z

And in general I also think orjson is the best choice from the strictness perspective.

In the past, we have been bitten so many times by the non-strict nature of various casting and other code. The problem with a best-effort, non-strict approach is that it may work for a lot of scenarios, but when it breaks, it's very hard to troubleshoot and fix.

Kami · 2021-02-14T19:33:13Z

I believe the packages step is failing because we are using an older pip version which doesn't support the manylinux wheel format orjson uses, so it tries to compile it from scratch instead of using a wheel (https://github.com/pypa/manylinux).

Will look into fixing that.

Kami · 2021-02-14T19:46:56Z

Related PR to get st2-packages build to pass - StackStorm/st2packaging-dockerfiles#103.

cognifloyd

🎉

I noticed one minor docstring typo, but whatever. This looks awesome.

cognifloyd

You can drop the changes in tools/config_gen.py now that json_library is a module constant instead of a config option.

Kami · 2021-03-06T20:07:58Z

@blag Can I please get a review when you get a chance? I would like to wrap this one up and then move on to wrapping up the DB one now that I'm unblocked by the pip stuff.

I removed the config option, changed it to a module level constant so we can still exercise it in compatibility tests.
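
A minimal sketch of how a module-level constant can be swapped in a test. The module name `jsonify` and the constant name `DEFAULT_JSON_LIBRARY` are hypothetical stand-ins, not names confirmed by this thread.

```python
import types
from unittest import mock

# Hypothetical stand-in for the module that holds the backend constant.
jsonify = types.ModuleType("jsonify")
jsonify.DEFAULT_JSON_LIBRARY = "orjson"


def current_library():
    # Read the constant at call time so that a patch is visible.
    return jsonify.DEFAULT_JSON_LIBRARY


def run_compat_check():
    # Temporarily switch the backend; the original value is restored on exit.
    with mock.patch.object(jsonify, "DEFAULT_JSON_LIBRARY", "json"):
        inside = current_library()
    return inside, current_library()
```

Because `mock.patch.object` restores the original value when the block exits, one test cannot leak its backend choice into the next.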

blag · 2021-03-06T20:51:47Z

On mobile right now, will review later today. 👍

Kami · 2021-03-06T21:34:13Z

@blag Thanks.

I'm somewhat confused by u16 e2e test failure. Is this a known issue, or a race or smth?

Looking at the test output, I'm quite confused - https://gist.github.com/Kami/0d83da44842daa3cbb9dcef670299df6. That alias doesn't have representation (https://github.com/StackStorm-Exchange/stackstorm-st2/blob/master/aliases/actions_list.yaml) so that assertion looks wrong to me. But how did it pass before?

And my code touched none of that.

I would assume it was race or smth, but assertion doesn't seem to make sense since it doesn't add up with the actual alias defined in stackstorm-st2 pack.

To me it seems like that bats test should check for st2 list {{ limit=10 }} actions - List available StackStorm actions., but I could be missing something (aka we are trying to assert on the actual hubot response and not the start up string where it lists the loaded commands).

Same tests also pass on other distros which makes me think it's actually some kind of race - if it was a bigger issue related to this change, that would already be caught by other tests.

Kami · 2021-03-06T22:43:50Z

OK, ignore my comment above - I'm still digging into end to end tests output and the instance itself...

Made some progress:

2021-03-06 22:51:19,147 139824208471152 DEBUG shell [-] Returning.
2021-03-06 22:51:19,148 139824208471152 DEBUG python_runner [-] Returning values: 1, %%%%%~=~=~=************=~=~=~%%%%, Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/python_runner/python_action_wrapper.py", line 381, in <module>
    obj.run()
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/python_runner/python_action_wrapper.py", line 252, in run
    sys.stdout.write(print_output + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character '\u2022' in position 58: ordinal not in range(128)
, False

Looks like it's some default sys.encoding issue on u16, looks like it's set to ascii and not utf8 and that's why it's failing.

EDIT: I believe the issue is that the locale on u16 is not set correctly for the action runner process.

Kami · 2021-03-06T23:45:51Z

OK, yeah I was able to reproduce the issue - it's indeed locale related on Ubuntu 16.04.

If it's set to utf8 it works fine, but if it's set to ascii it blows up due to a change in how orjson serializes unicode (it doesn't serialize it to ASCII escape sequences, but emits the unicode characters directly, which is preferred).

I'm having troubles dynamically setting PYTHONIOENCODING=utf8 for action runner wrapper inside the Python runner. It doesn't play nicely with the eventlet patched subprocess module, but I'm still looking into other possible solutions.

Previously it would work because json uses ascii escape sequences.
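
The difference can be reproduced with the standard library alone. In this sketch `json.dumps(..., ensure_ascii=False)` is used as a stand-in for orjson's unescaped output (orjson itself is not imported here), and `can_encode_ascii` is a helper written for this example.

```python
import json

message = {"text": "item \u2022 one"}

# Standard library default: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(message)

# Stand-in for unescaped output: the bullet character is emitted as-is.
unescaped = json.dumps(message, ensure_ascii=False)


def can_encode_ascii(text):
    """Return True if text survives encoding with the ascii codec."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

The escaped form is pure ASCII and passes through an ASCII-only stream; the unescaped form fails with the same `UnicodeEncodeError` seen in the traceback above.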

@blag I believe this is the same issue you noticed with regards to very large audit logs and a bunch of `\\\`. This issue results in that as well - it basically happens when the server is not using a utf-8 locale (which is the case here) and something tries to log a unicode message. This would cause an endless loop of trying to format the logging value and constantly failing. The issue is a combination of Python 3 (using unicode instead of a string with ascii escape sequences) and the server / action runner not having the correct locale set.

In short, it looks like this issue predates my PR, but my PR made it more likely to happen and it got exposed since u16 doesn't seem to have the locale set up correctly for the action runner.

sys.stdout.buffer works with bytes whereas sys.stdout works with unicode. orjson returns bytes and this way we avoid unnecessary conversion back and forth between bytes and unicode. In addition to that, using bytes means it will also work correctly if the system locale is not set to utf-8.
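
A small simulation of the two write paths, assuming an ASCII-encoded text stream. An `io.TextIOWrapper` around `io.BytesIO` stands in for `sys.stdout` under an ASCII locale, and the helper names are made up for this example.

```python
import io

payload = '{"text": "item \u2022 one"}'


def make_ascii_stdout():
    # Simulates sys.stdout when the process locale resolves to ASCII.
    return io.TextIOWrapper(io.BytesIO(), encoding="ascii")


def write_text(stream, text):
    # Text path: the stream must encode the str, which can fail.
    stream.write(text + "\n")
    stream.flush()


def write_bytes(stream, data):
    # Bytes path: already-encoded data bypasses the stream's encoder.
    stream.buffer.write(data + b"\n")
    stream.buffer.flush()
```

With a real `sys.stdout`, any pending text writes should be flushed before writing to `sys.stdout.buffer`, otherwise the output can appear out of order.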

This will make it easier to troubleshoot locale / encoding related issues. Also make sure we print those version related messages under the INFO log level instead of DEBUG, since they may be material when troubleshooting various issues.

Kami · 2021-03-07T13:03:06Z

OK, finally the whole build is green 🎉 🍾

cognifloyd · 2021-03-08T03:12:15Z

tools/config_gen.py

@@ -41,6 +41,7 @@
 ]

 SKIP_GROUPS = ["api_pecan", "rbac", "results_tracker"]
+SKIP_OPTIONS = ["json_library"]


Now that json_library is not a config option, do you still want to leave this here?

I agree. We talked about not having user config options for json lib.

Yeah, the config option was removed, but this flag was left here so we can also support ignoring specific config options (previously this script didn't support that).

I will change SKIP_OPTIONS to an empty list.

m4dcoder

Mostly OK but some clean up requests.

m4dcoder · 2021-03-15T17:22:17Z

contrib/runners/action_chain_runner/action_chain_runner/action_chain_runner.py

@@ -48,7 +48,7 @@
 from st2common.util import jinja as jinja_utils
 from st2common.util import param as param_utils
 from st2common.util.config_loader import get_config
-from st2common.util.ujson import fast_deepcopy
+from st2common.util.deep_copy import fast_deepcopy


Can we call this module json util or something, so it's clear this is a JSON-specific operation?

I intentionally renamed it to deep_copy since I think previous name was a bad one (I know, I picked it originally :D) - previous name was leaking implementation details which should not matter to the end user.

I think deep_copy is a better name since it conveys what the module is used for - deep copying dictionaries.

And the module is not JSON specific either, it can deep copy arbitrary dictionaries with simple types. orjson is just an implementation detail of that function.

I don't want this to be confused with copy.deepcopy which is not json or dict specific.

Maybe I can rename it to fast_deepcopy_dict then? (the function name that is)

And I guess I need to be more explicit - it doesn't just support dicts either (that's just how we use it).

It supports any kind of value as long as it only contains simple types (so no class instances, etc.).

Per discussion on Slack, I pushed a couple of changes - add some more tests, update function docstring comment, rename it to fast_deepcopy_dict (bd4eb01, c80d07a, 9ba8d45).

As discussed, it can also be used on other simple values (think lists) and not just dictionaries, but using it on dictionaries with simple value types is our primary use case for it so I think that name is an OK compromise for now.
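
For readers unfamiliar with the technique, here is a minimal sketch of a deep copy via a serialize / deserialize round trip. It uses the standard library `json` module as a stand-in for orjson, and the `fall_back_on_error` parameter is an assumption made for this example rather than the actual signature in st2.

```python
import copy
import json


def fast_deepcopy_dict(value, fall_back_on_error=True):
    """Deep copy a value made of simple (JSON-like) types via a round trip."""
    try:
        # Serializing and parsing again yields fresh, unshared objects.
        return json.loads(json.dumps(value))
    except (TypeError, ValueError):
        # Values the serializer can't handle (sets, class instances, ...).
        if not fall_back_on_error:
            raise
        return copy.deepcopy(value)
```

The round trip only preserves what JSON can represent: tuples come back as lists and non-string dictionary keys come back as strings, which is why the function is only suitable for values built from simple types.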

m4dcoder · 2021-03-15T17:27:22Z

I agree. We talked about not having user config options for json lib.

m4dcoder

@Kami Thank you! This is going to significantly improve st2.

Kami added 4 commits February 14, 2021 14:21

Add WIP code changes to utilize orjson instead of json library.

19d27cd

Add WIP micro-benchmarks for fast_deepcopy() implementations.

4ce29a7

Update more code.

a05cbc2

Add another micro benchmarks for fast_deepcopy which utilizes actual JSON fixture file.

6db1daf

Kami changed the title ~~[WIP] Update JSON serialization and deserialization code to use orjson library.~~ [WIP] Update JSON serialization and deserialization code to use orjson library Feb 14, 2021

Kami added 2 commits February 14, 2021 18:02

Add new system.json_library config option which specifies which json library will be using for encoding and decoding. Default it to orjson and add some tests for it.

4970944

Also include json and simplejson in the benchmark.

e10295e

Kami changed the title ~~[WIP] Update JSON serialization and deserialization code to use orjson library~~ [WIP] [RFC] Update JSON serialization and deserialization code to use orjson library Feb 14, 2021

Kami added 4 commits February 14, 2021 18:41

More tests and changes for cross compatibility.

f910be1

More compatibility fixes.

19357c4

Update more code to use json_encode / json_decode wrapper functions.

2d38d2c

Fix lint.

f6aa540

Use better module name.

205beb3

Kami added 7 commits February 14, 2021 19:36

Hook it up to CI.

d1b370d

Fix lint.

7a2a1fb

Update more affected code, transparently handle ObjectIds.

6847fb9

Also handle ObjectId instances transparently.

cf319d2

Update affected code.

0277f9b

Generate requirements files.

f3daf73

Update comment.

aa937a1

Fix lint.

dd26179

Kami mentioned this pull request Feb 14, 2021

Pin pip to 20.0.2 which is the same version used in StackStorm/st2 repo (support for manylinux2014 wheel format) StackStorm/st2packaging-dockerfiles#103

Merged

Kami added 2 commits February 14, 2021 20:48

Update affected tests.

b68ae13

Fix lint.

69fa45f

cognifloyd approved these changes Mar 6, 2021

Kami and others added 5 commits March 6, 2021 20:08

Add a workaround for tests.

59f2c49

Merge branch 'master' of github.com:StackStorm/st2 into orjson_prototype

1062426

Remove config option we don't want to expose to the end users anyway and just rely on module level constant which we can patch during tests to perform compatibility tests.

bd202f4

Make sure we clear up after each test to avoid cross test pollution.

ee0bd1f

Update st2common/benchmarks/micro/test_fast_deepcopy.py

Co-authored-by: Jacob Floyd <[email protected]>

e918d23

cognifloyd reviewed Mar 6, 2021

Kami added 4 commits March 7, 2021 11:10

Merge branch 'orjson_prototype' of github.com:StackStorm/st2 into orjson_prototype

8208625

Update affected tests, add test for verifying service startup messages.

5e08942

cognifloyd reviewed Mar 8, 2021

Kami mentioned this pull request Mar 14, 2021

[WIP] [DONT MERGE] Performance improvements changes #5190

Closed

m4dcoder requested changes Mar 15, 2021

Kami added 5 commits March 15, 2021 20:37

Default SKIP_OPTIONS to an empty list.

af9228c

Merge branch 'master' of github.com:StackStorm/st2 into orjson_prototype

dfd533c

Update function docstring / comment.

bd4eb01

Add additional tests for it.

c80d07a

Rename function to fast_deepcopy_dict() since it's primarily used on dictionaries with simple (think JSON) types.

9ba8d45

m4dcoder approved these changes Mar 15, 2021

Kami merged commit b113704 into master Mar 15, 2021

Kami deleted the orjson_prototype branch March 15, 2021 22:06
