[RFC] Switching from json + ujson to orjson JSON serialization and de-serialization library #65
Comments
While quickly looking at it and trying to get tests to pass with orjson, I ran into various explicit encode / decode calls. Likely some of those were needed when we still had to support both Python 2 and Python 3, but now that we only support Python 3, we should be able to get rid of some of them and work directly with bytes / unicode (depending on what the specific code expects). This change also exposed more places where we don't correctly encode / decode incoming and outgoing data, so the request / response ends up containing incorrectly encoded strings.

On a related note - does anyone happen to have some executions with a large result field lying around (maybe some from a CI/CD server) that I could use in the micro benchmarks?
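For illustration, a minimal sketch (not actual StackStorm code; the wrapper names are made up) of the bytes vs. str difference that tends to surface these encode / decode issues: `orjson.dumps()` returns `bytes`, while the stdlib `json.dumps()` returns `str`.

```python
import json

import orjson  # third-party: pip install orjson

data = {"result": "some value", "count": 42}

print(type(json.dumps(data)))    # <class 'str'>
print(type(orjson.dumps(data)))  # <class 'bytes'>


# Keeping the str/bytes decision in one place makes the call sites easier
# to audit; these wrapper names are hypothetical, not existing st2 helpers.
def json_encode(obj) -> str:
    return orjson.dumps(obj).decode("utf-8")


def json_decode(data):
    # orjson.loads() accepts both bytes and str, so no special-casing here.
    return orjson.loads(data)
```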
Awesome stuff, +100! Python 3 should definitely open some new doors for us on the perf optimization front. You can find a few building blocks here as a starting example: StackStorm/st2#4798. Someone did a good job preparing it for us.
@armab That issue is actually a different one - it's related to bad mongoengine performance when working with large datasets. Using a different json library would likely still speed that action up, since the operation involves parsing json, but it wouldn't help with the mongoengine-related issue (aka storing the result in the database). For the mongoengine-related issue, work would need to continue on StackStorm/st2#4838 and StackStorm/st2#4846.
Re mongoengine performance issues with large documents - one option we should also consider experimenting and benchmarking with is compression (it would likely require some changes to store the result as a string and handle that transparently inside the to_mongo and from_mongo functions). Compression is not free (CPU cycles), but executions with large results usually contain textual data which compresses well, so it's possible that trading some CPU cycles for compression would still result in overall better throughput and a shorter duration for execution-related DB operations, because we would be passing fewer bytes to mongoengine later. We should start with benchmarking (perhaps using ...).
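For illustration, a rough sketch of the compression idea under the assumption that the result is serialized to JSON first and then compressed before being handed to mongoengine; the helper names, the use of `zlib`, and the compression level are illustrative, not an agreed design.

```python
import zlib

import orjson  # third-party: pip install orjson


def compress_result(result: dict) -> bytes:
    # Would be called from a to_mongo-style hook before saving the document.
    return zlib.compress(orjson.dumps(result), level=6)


def decompress_result(blob: bytes) -> dict:
    # Would be called from a from_mongo-style hook after loading the document.
    return orjson.loads(zlib.decompress(blob))


# Textual results (stdout, logs, command output) usually compress very well,
# so the extra CPU cycles may be repaid by pushing far fewer bytes through
# mongoengine and the MongoDB driver.
```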
@Kami I just meant there was an example in that issue that may help with composing a workflow which would pass some larger data along.
Implemented in StackStorm/st2#4846 |
Right now StackStorm code utilizes the native `json` library for JSON serialization / deserialization and also `ujson` for the `fast_deepcopy` function.

Now that we don't support Python 2 anymore, we should evaluate using `orjson` - https://github.com/ijl/orjson. Based on micro benchmarks in another project it offers substantial improvements, even over `ujson`.

In the past there were some issues with compatibility with the native `json` library, but I believe most of them have been resolved. Having said that, we should still do research specifically for this project (update the code, verify everything works + micro benchmarks).
TODO

- ... (`fast_deepcopy`)
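As a starting point for the micro benchmarks mentioned above, a minimal sketch; the payload is made up, and `ujson` / `orjson` are third-party packages (`pip install ujson orjson`). It times `dumps` + `loads` round trips, which is roughly what a serialization-based `fast_deepcopy` does, with `copy.deepcopy` included for comparison.

```python
import copy
import json
import timeit

import orjson
import ujson

# Stand-in for a large execution result; real data from a CI/CD server
# would make the numbers more representative.
payload = {"result": {"stdout": "x" * 10_000, "items": list(range(1_000))}}


def bench(name, fn, number=1_000):
    print(f"{name}: {timeit.timeit(fn, number=number):.3f}s")


bench("json round trip", lambda: json.loads(json.dumps(payload)))
bench("ujson round trip", lambda: ujson.loads(ujson.dumps(payload)))
bench("orjson round trip", lambda: orjson.loads(orjson.dumps(payload)))
bench("copy.deepcopy", lambda: copy.deepcopy(payload))
```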