
[RFC] Switching from json + ujson to orjson JSON serialization and de-serialization library #65

Closed · 3 tasks
Kami opened this issue Feb 14, 2021 · 6 comments

@Kami (Member) commented Feb 14, 2021

Right now the StackStorm code uses the stdlib json library for JSON serialization / deserialization, and ujson for the fast_deepcopy function.
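
For context, fast_deepcopy is built on a serialize / deserialize round trip. A minimal sketch of that pattern, assuming JSON-safe input (this is not the exact st2 implementation):

```python
# Serialization round-trip deep copy: for JSON-safe values,
# dumps + loads is typically much faster than copy.deepcopy.
import copy

import ujson


def fast_deepcopy(value):
    # Only works for JSON-serializable values (dicts, lists, strings, numbers, ...)
    return ujson.loads(ujson.dumps(value))


data = {"result": {"stdout": "x" * 1000, "items": list(range(100))}}
assert fast_deepcopy(data) == copy.deepcopy(data)
```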

Now that we don't support Python 2 anymore, we should evaluate using orjson - https://github.com/ijl/orjson. Based on micro benchmarks in another project, it offers substantial improvements even over ujson.
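
A minimal micro-benchmark sketch along those lines (the payload shape and sizes here are made-up assumptions, not our actual data sets):

```python
# Compare dumps/loads throughput of json, ujson, and orjson on a synthetic
# payload. Note that orjson.dumps() returns bytes, not str.
import json
import timeit

import orjson
import ujson

payload = {
    "executions": [
        {"id": i, "status": "succeeded", "result": "x" * 512} for i in range(1000)
    ]
}
encoded = json.dumps(payload)

for name, dumps in [("json", json.dumps), ("ujson", ujson.dumps), ("orjson", orjson.dumps)]:
    print(name, "dumps:", timeit.timeit(lambda: dumps(payload), number=100))

for name, loads in [("json", json.loads), ("ujson", ujson.loads), ("orjson", orjson.loads)]:
    # orjson.loads() accepts both str and bytes input
    print(name, "loads:", timeit.timeit(lambda: loads(encoded), number=100))
```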

In the past there were some compatibility issues with the stdlib json library, but I believe most of them have been resolved.

Having said that, we should still do the research specifically for this project (update the code, verify everything works, and run micro benchmarks).

TODO

  • Write some micro benchmarks for our typical data sets (also add some large executions for use with fast_deepcopy)
  • Implement it in StackStorm/st2 and ensure all tests pass
  • Add a feature flag for disabling this functionality and falling back to the standard json library (see the sketch after this list)
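
A sketch of what such a feature flag could look like - a thin wrapper module that prefers orjson but can fall back to the stdlib (the ST2_USE_ORJSON variable name is a hypothetical example, not an existing st2 option):

```python
# Hypothetical json wrapper module with a kill switch back to stdlib json.
import json
import os

try:
    import orjson
except ImportError:
    orjson = None

# Feature flag: disabled if orjson is missing or ST2_USE_ORJSON=0 is set.
_USE_ORJSON = orjson is not None and os.environ.get("ST2_USE_ORJSON", "1") == "1"


def json_dumps(value) -> str:
    if _USE_ORJSON:
        # orjson.dumps() returns bytes; decode to keep the stdlib str contract
        return orjson.dumps(value).decode("utf-8")
    return json.dumps(value)


def json_loads(data):
    if _USE_ORJSON:
        return orjson.loads(data)  # accepts str or bytes
    return json.loads(data)
```
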
Kami added the enhancement, design, and brainstorming labels on Feb 14, 2021
@Kami (Member, Author) commented Feb 14, 2021

While quickly looking at it and trying to get the tests to pass with orjson, I noticed some issues with unnecessary (and, in the case of large requests, slower) bytes -> unicode -> bytes conversions.

Some of those were likely needed when we still supported both Python 2 and Python 3, but now that we only support Python 3, we should be able to get rid of some of them and work directly with bytes / unicode (depending on what the specific code expects).

This change also exposed more places where we don't correctly encode / decode incoming and outgoing data, so the request / response contains strings with a b'' prefix.
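
For illustration, the general pitfall looks like this (a sketch, not a specific st2 code path) - orjson.dumps() returns bytes while json.dumps() returns str, so a naive swap either leaks the b'' prefix into responses or adds pointless decode / encode round trips:

```python
import json

import orjson

body = {"status": "succeeded"}

json.dumps(body)    # '{"status": "succeeded"}'  (str)
orjson.dumps(body)  # b'{"status":"succeeded"}'  (bytes, compact output)

# Bug pattern: formatting bytes into a str response leaks the b'' prefix
response = "result: %s" % orjson.dumps(body)  # 'result: b\'{"status":"succeeded"}\''

# If the transport wants bytes anyway (e.g. a WSGI response body), pass the
# bytes straight through instead of decoding and re-encoding them:
fast = orjson.dumps(body)                                  # no round trip
slow = orjson.dumps(body).decode("utf-8").encode("utf-8")  # wasteful
```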

On a related note - does anyone happen to have some executions with a large result field lying around (maybe some from a CI/CD server)? I'd like to use them in the micro benchmarks.

@arm4b (Member) commented Feb 14, 2021

Awesome stuff, +100! Python 3 should definitely open some new doors for us with perf optimizations.
Thanks @Kami for the research and new energy here 👍

You can find a few building blocks to use as a starting example in StackStorm/st2#4798. Someone did a good job preparing it for us.

@Kami (Member, Author) commented Feb 14, 2021

@armab That issue is actually a different one - it's related to bad mongoengine performance when working with large datasets.

Using a different json library would likely still speed up that action, since the operation involves parsing JSON, but it wouldn't help with the mongoengine-related issue (i.e. storing the result in the database).

For the mongoengine-related issue, work would need to continue on StackStorm/st2#4838 and StackStorm/st2#4846.

@Kami (Member, Author) commented Feb 14, 2021

Re: mongoengine performance issues with large documents - one option we should also consider experimenting and benchmarking with is compression (this would likely require some changes to store the result as a string and handle that transparently inside the to_mongo and from_mongo functions).

Compression is not free (CPU cycles), but executions with large results usually contain textual data which compresses well, so it's possible that trading some CPU cycles for compression would still result in overall better throughput and a shorter duration for execution-related DB operations, because we would be passing fewer bytes to the mongoengine layer.

We should start with benchmarking (perhaps using the zstandard algorithm) and then decide if it's worth pursuing those efforts.
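
A minimal sketch of the idea, assuming the zstandard Python bindings and orjson; the helper names are hypothetical, and in st2 this logic would live inside the to_mongo / from_mongo conversion layer:

```python
# Transparently compress an execution result before it is handed to the DB
# layer and decompress it on the way back out.
import orjson
import zstandard

_compressor = zstandard.ZstdCompressor(level=3)
_decompressor = zstandard.ZstdDecompressor()


def compress_result(result: dict) -> bytes:
    # dict -> JSON bytes -> zstd frame (stored as a binary field in MongoDB)
    return _compressor.compress(orjson.dumps(result))


def decompress_result(blob: bytes) -> dict:
    return orjson.loads(_decompressor.decompress(blob))


result = {"stdout": "the same log line over and over\n" * 10000, "succeeded": True}
blob = compress_result(result)
assert decompress_result(blob) == result
print("raw:", len(orjson.dumps(result)), "compressed:", len(blob))
```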

@arm4b (Member) commented Feb 15, 2021

@Kami I just meant that there is an example in that issue which may help with composing a workflow that passes some larger data along.

arm4b added this to the 3.5.0 milestone on Feb 15, 2021
@arm4b (Member) commented Oct 11, 2021

Implemented in StackStorm/st2#4846

arm4b closed this as completed on Oct 11, 2021