
[RFC] Switching from json + ujson to orjson JSON serialization and de-serialization library #65

Closed · 3 tasks
Kami opened this issue Feb 14, 2021 · 6 comments

@Kami (Member) commented Feb 14, 2021

Right now the StackStorm code uses the stdlib json library for JSON serialization / deserialization, and ujson for the fast_deepcopy function.
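
For context, fast_deepcopy is built on a serialize / deserialize round trip. A minimal sketch of that pattern, assuming JSON-safe input (this is not the exact st2 implementation):

```python
# Serialization round-trip deep copy: for JSON-safe values,
# dumps + loads is typically much faster than copy.deepcopy.
import copy

import ujson


def fast_deepcopy(value):
    # Only works for JSON-serializable values (dicts, lists, strings, numbers, ...)
    return ujson.loads(ujson.dumps(value))


data = {"result": {"stdout": "x" * 1000, "items": list(range(100))}}
assert fast_deepcopy(data) == copy.deepcopy(data)
```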

Now that we don't support Python 2 anymore, we should evaluate using orjson - https://github.com/ijl/orjson. Based on micro benchmarks in another project, it offers substantial improvements even over ujson.
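
A minimal micro-benchmark sketch along those lines (the payload shape and sizes here are made-up assumptions, not our actual data sets):

```python
# Compare dumps/loads throughput of json, ujson, and orjson on a synthetic
# payload. Note that orjson.dumps() returns bytes, not str.
import json
import timeit

import orjson
import ujson

payload = {
    "executions": [
        {"id": i, "status": "succeeded", "result": "x" * 512} for i in range(1000)
    ]
}
encoded = json.dumps(payload)

for name, dumps in [("json", json.dumps), ("ujson", ujson.dumps), ("orjson", orjson.dumps)]:
    print(name, "dumps:", timeit.timeit(lambda: dumps(payload), number=100))

for name, loads in [("json", json.loads), ("ujson", ujson.loads), ("orjson", orjson.loads)]:
    # orjson.loads() accepts both str and bytes input
    print(name, "loads:", timeit.timeit(lambda: loads(encoded), number=100))
```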

In the past there were some compatibility issues with the stdlib json library, but I believe most of them have been resolved.

Having said that, we should still do the research specifically for this project (update the code, verify everything works, and run micro benchmarks).

TODO

  • Write some micro benchmarks for our typical data sets (also add some large executions for use with fast_deepcopy)
  • Implement it in StackStorm/st2 and ensure all tests pass
  • Add a feature flag for disabling this functionality and falling back to the standard json library (see the sketch after this list)
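
A sketch of what such a feature flag could look like - a thin wrapper module that prefers orjson but can fall back to the stdlib (the ST2_USE_ORJSON variable name is a hypothetical example, not an existing st2 option):

```python
# Hypothetical json wrapper module with a kill switch back to stdlib json.
import json
import os

try:
    import orjson
except ImportError:
    orjson = None

# Feature flag: disabled if orjson is missing or ST2_USE_ORJSON=0 is set.
_USE_ORJSON = orjson is not None and os.environ.get("ST2_USE_ORJSON", "1") == "1"


def json_dumps(value) -> str:
    if _USE_ORJSON:
        # orjson.dumps() returns bytes; decode to keep the stdlib str contract
        return orjson.dumps(value).decode("utf-8")
    return json.dumps(value)


def json_loads(data):
    if _USE_ORJSON:
        return orjson.loads(data)  # accepts str or bytes
    return json.loads(data)
```
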
Kami added the enhancement, design, and brainstorming labels on Feb 14, 2021
@Kami (Member, Author) commented Feb 14, 2021

While quickly looking at it and trying to get the tests to pass with orjson, I noticed some issues with unnecessary (and, in the case of large requests, slower) bytes -> unicode -> bytes conversions.

Some of those were likely needed when we still supported both Python 2 and Python 3, but now that we only support Python 3, we should be able to get rid of some of them and work directly with bytes / unicode (depending on what the specific code expects).

This change also exposed more places where we don't correctly encode / decode incoming and outgoing data, so the request / response contains strings with a b'' prefix.
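
For illustration, the general pitfall looks like this (a sketch, not a specific st2 code path) - orjson.dumps() returns bytes while json.dumps() returns str, so a naive swap either leaks the b'' prefix into responses or adds pointless decode / encode round trips:

```python
import json

import orjson

body = {"status": "succeeded"}

json.dumps(body)    # '{"status": "succeeded"}'  (str)
orjson.dumps(body)  # b'{"status":"succeeded"}'  (bytes, compact output)

# Bug pattern: formatting bytes into a str response leaks the b'' prefix
response = "result: %s" % orjson.dumps(body)  # 'result: b\'{"status":"succeeded"}\''

# If the transport wants bytes anyway (e.g. a WSGI response body), pass the
# bytes straight through instead of decoding and re-encoding them:
fast = orjson.dumps(body)                                  # no round trip
slow = orjson.dumps(body).decode("utf-8").encode("utf-8")  # wasteful
```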

On a related note - does anyone happen to have some executions with a large result field lying around (maybe some from a CI/CD server)? I'd like to use them in the micro benchmarks.

@arm4b (Member) commented Feb 14, 2021

Awesome stuff, +100! Python 3 should definitely open some new doors for us with perf optimizations.
Thanks @Kami for the research and new energy here 👍

You can find a few building blocks to use as a starting example in StackStorm/st2#4798. Someone did a good job preparing it for us.

@Kami (Member, Author) commented Feb 14, 2021

@armab That issue is actually a different one - it's related to bad mongoengine performance when working with large datasets.

Using a different json library would likely still speed up that action, since the operation involves parsing JSON, but it wouldn't help with the mongoengine-related issue (i.e. storing the result in the database).

For the mongoengine-related issue, work would need to continue on StackStorm/st2#4838 and StackStorm/st2#4846.

@Kami (Member, Author) commented Feb 14, 2021

Re: mongoengine performance issues with large documents - one option we should also consider experimenting and benchmarking with is compression (this would likely require some changes to store the result as a string and handle that transparently inside the to_mongo and from_mongo functions).

Compression is not free (CPU cycles), but executions with large results usually contain textual data which compresses well, so it's possible that trading some CPU cycles for compression would still result in overall better throughput and a shorter duration for execution-related DB operations, because we would be passing fewer bytes to the mongoengine layer.

We should start with benchmarking (perhaps using the zstandard algorithm) and then decide if it's worth pursuing those efforts.
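
A minimal sketch of the idea, assuming the zstandard Python bindings and orjson; the helper names are hypothetical, and in st2 this logic would live inside the to_mongo / from_mongo conversion layer:

```python
# Transparently compress an execution result before it is handed to the DB
# layer and decompress it on the way back out.
import orjson
import zstandard

_compressor = zstandard.ZstdCompressor(level=3)
_decompressor = zstandard.ZstdDecompressor()


def compress_result(result: dict) -> bytes:
    # dict -> JSON bytes -> zstd frame (stored as a binary field in MongoDB)
    return _compressor.compress(orjson.dumps(result))


def decompress_result(blob: bytes) -> dict:
    return orjson.loads(_decompressor.decompress(blob))


result = {"stdout": "the same log line over and over\n" * 10000, "succeeded": True}
blob = compress_result(result)
assert decompress_result(blob) == result
print("raw:", len(orjson.dumps(result)), "compressed:", len(blob))
```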

@arm4b (Member) commented Feb 15, 2021

@Kami I just meant that there is an example in that issue which may help with composing a workflow that passes some larger data along.

arm4b added this to the 3.5.0 milestone on Feb 15, 2021
@arm4b (Member) commented Oct 11, 2021

Implemented in StackStorm/st2#4846

arm4b closed this as completed on Oct 11, 2021