Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update JSON serialization and deserialization code to use orjson library #5153

Merged
merged 45 commits into from
Mar 15, 2021
Merged
Show file tree
Hide file tree
Changes from 31 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
19d27cd
Add WIP code changes to utilize orjson instead of json library.
Kami Feb 14, 2021
4ce29a7
Add WIP micro-benchmarks for fast_deepcopy() implementations.
Kami Feb 14, 2021
a05cbc2
Update more code.
Kami Feb 14, 2021
6db1daf
Add another micro benchmarks for fast_deepcopy which utilizes actual
Kami Feb 14, 2021
4970944
Add new system.json_library config option which specifies which json
Kami Feb 14, 2021
e10295e
Also include json and simplejson in the benchmark.
Kami Feb 14, 2021
f910be1
More tests and changes for cross compatibility.
Kami Feb 14, 2021
19357c4
More compatibility fixes.
Kami Feb 14, 2021
2d38d2c
Update more code to use json_encode / json_decode wrapper functions.
Kami Feb 14, 2021
f6aa540
Fix lint.
Kami Feb 14, 2021
205beb3
Use better module name.
Kami Feb 14, 2021
d1b370d
Hook it up to CI.
Kami Feb 14, 2021
7a2a1fb
Fix lint.
Kami Feb 14, 2021
6847fb9
Update more affected code, transparently handle ObjectIds.
Kami Feb 14, 2021
cf319d2
Also handle ObjectId instances transparently.
Kami Feb 14, 2021
0277f9b
Update affected code.
Kami Feb 14, 2021
f3daf73
Generate requirements files.
Kami Feb 14, 2021
aa937a1
Update comment.
Kami Feb 14, 2021
dd26179
Fix lint.
Kami Feb 14, 2021
b68ae13
Update affected tests.
Kami Feb 14, 2021
69fa45f
Fix lint.
Kami Feb 14, 2021
6c2756f
Add changelog entry.
Kami Feb 14, 2021
90c088a
Also add micro benchmarks for json serialization an deserialization.
Kami Feb 14, 2021
a3581c0
Merge branch 'master' of github.com:StackStorm/st2 into orjson_prototype
Kami Feb 20, 2021
9bb9739
Update more code to use more efficient json encode / decode library.
Kami Feb 20, 2021
3282d0a
Merge branch 'master' into orjson_prototype
Kami Mar 6, 2021
b6a965f
Use latest version of orjson.
Kami Mar 6, 2021
5f75210
Update help string.
Kami Mar 6, 2021
8a6d8ae
Don't publicly document the config option which is really just used as a
Kami Mar 6, 2021
55b92b4
Merge branch 'master' into orjson_prototype
Kami Mar 6, 2021
d0b9952
Add back file which was accidentaly removed.
Kami Mar 6, 2021
59f2c49
Add a workaround for tests.
Kami Mar 6, 2021
1062426
Merge branch 'master' of github.com:StackStorm/st2 into orjson_prototype
Kami Mar 6, 2021
bd202f4
Remove config option we don't want to expose to the end users anyway and
Kami Mar 6, 2021
ee0bd1f
Make sure we clear up after each test to avoid cross test pollution.
Kami Mar 6, 2021
e918d23
Update st2common/benchmarks/micro/test_fast_deepcopy.py
Kami Mar 6, 2021
a2ef1e6
Use sys.stdout.buffer instead of sys.stdout.
Kami Mar 7, 2021
f2bd250
Update service setup code so also print used locale and fs encoding.
Kami Mar 7, 2021
8208625
Merge branch 'orjson_prototype' of github.com:StackStorm/st2 into orj…
Kami Mar 7, 2021
5e08942
Update affected tests, add test for verifying service startup messages.
Kami Mar 7, 2021
af9228c
Default SKIP_OPTIONS to an empty list.
Kami Mar 15, 2021
dfd533c
Merge branch 'master' of github.com:StackStorm/st2 into orjson_prototype
Kami Mar 15, 2021
bd4eb01
Update function docstring / comment.
Kami Mar 15, 2021
c80d07a
Add additional tests for it.
Kami Mar 15, 2021
9ba8d45
Rename function to fast_deepcopy_dict() since it's primarily used on
Kami Mar 15, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,9 @@ jobs:
- name: 'Unit Tests'
task: 'ci-unit'
python-version: '3.6'
- name: 'Micro Benchmarks'
task: 'micro-benchmarks'
python-version: '3.6'
# Integration tests are not working yet, still done in Travis
# - name: 'Integration Tests'
# task: 'ci-integration'
Expand Down
12 changes: 12 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,18 @@ Changed

* Updated cryptography dependency to version 3.3.2 to avoid CVE-2020-36242 (security) #5151

* Update most of the code in the StackStorm API and services layer to utilize ``orjson`` library
for serializing and de-serializing json.

That should result in better json serialization and deserialization performance.

The change should be fully backward compatible, only difference is that API JSON responses now
won't be indented using 4 spaces by default (indenting adds unnecessary overhead and if needed,
the response can be pretty formatted on the client side using ``jq`` or similar). (improvement)
#5153

Contributed by @Kami

Fixed
~~~~~

Expand Down
13 changes: 13 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -516,6 +516,19 @@ compilepy3:
find ${ROOT_DIR}/st2common/st2common/ \( -name \*.py ! -name router\.py -name \*.py \) -type f -print0 | xargs -0 cat | grep st2stream; test $$? -eq 1
find ${ROOT_DIR}/st2common/st2common/ -name \*.py -type f -print0 | xargs -0 cat | grep st2exporter; test $$? -eq 1

.PHONY: micro-benchmarks
micro-benchmarks: requirements .micro-benchmarks

.PHONY: .micro-benchmarks
.micro-benchmarks:
@echo
@echo "==================== micro-benchmarks ===================="
@echo
. $(VIRTUALENV_DIR)/bin/activate; pytest --benchmark-only --benchmark-name=short --benchmark-columns=min,max,mean,stddev,median,ops,rounds --benchmark-group-by=group,param:dict_keys_count_and_depth -s -v st2common/benchmarks/micro/test_fast_deepcopy.py -k "test_fast_deepcopy_with_dict_values"
. $(VIRTUALENV_DIR)/bin/activate; pytest --benchmark-only --benchmark-name=short --benchmark-columns=min,max,mean,stddev,median,ops,rounds --benchmark-group-by=group,param:fixture_file -s -v st2common/benchmarks/micro/test_fast_deepcopy.py -k "test_fast_deepcopy_with_json_fixture_file"
. $(VIRTUALENV_DIR)/bin/activate; pytest --benchmark-only --benchmark-name=short --benchmark-columns=min,max,mean,stddev,median,ops,rounds --benchmark-group-by=group,param:fixture_file,param:indent_sort_keys_tuple -s -v st2common/benchmarks/micro/test_json_serialization_and_deserialization.py -k "test_json_dumps"
. $(VIRTUALENV_DIR)/bin/activate; pytest --benchmark-only --benchmark-name=short --benchmark-columns=min,max,mean,stddev,median,ops,rounds --benchmark-group-by=group,param:fixture_file -s -v st2common/benchmarks/micro/test_json_serialization_and_deserialization.py -k "test_json_loads"

.PHONY: .cleanmongodb
.cleanmongodb:
@echo "==================== cleanmongodb ===================="
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@
from st2common.util import jinja as jinja_utils
from st2common.util import param as param_utils
from st2common.util.config_loader import get_config
from st2common.util.ujson import fast_deepcopy
from st2common.util.deep_copy import fast_deepcopy
Copy link
Contributor

@m4dcoder m4dcoder Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this module json util or something so to be clear this is json specific operation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally renamed it to deep_copy since I think previous name was a bad one (I know, I picked it originally :D) - previous name was leaking implementation details which should not matter to the end user.

I think deep_copy is a better name since it conveys what the module is used for - deep copying dictionaries.

And the module is not JSON specific either, it's can deep copy arbitrary dictionaries with simple types. orjson is just an implementation detail of that function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want this to be confused with copy.deepcopy which is not json or dict specific.

Copy link
Member Author

@Kami Kami Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can rename it to fast_deepcopy_dict then? (the function name that is)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I guess I need to be more explicitly - it doesn't just support dicts either (that's just how we use it).

It supports any kind of value as long as it only contains simple types (so no class instances, etc.).

Copy link
Member Author

@Kami Kami Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion on Slack, I pushed a couple of changes - add some more tests, update function docstring comment, rename it to fast_deepcopy_dict (bd4eb01, c80d07a, 9ba8d45).

As discussed, it can also be used on other simple values (think lists) and not just dictionaries, but using it on dictionaries with simple value types is our primary use case for it so I think that name is an OK compromise for now.


__all__ = ["ActionChainRunner", "ChainHolder", "get_runner", "get_metadata"]

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -453,7 +453,7 @@ def test_script_with_parameters_parameter_serialization(self):
self.assertIn("PARAM_FLOAT=2.55", result["stdout"])
self.assertIn("PARAM_BOOLEAN=1", result["stdout"])
self.assertIn("PARAM_LIST=a,b,c", result["stdout"])
self.assertIn('PARAM_OBJECT={"foo": "bar"}', result["stdout"])
self.assertIn('PARAM_OBJECT={"foo":"bar"}', result["stdout"])

action_parameters = {
"param_string": "test string",
Expand Down Expand Up @@ -515,12 +515,12 @@ def test_script_with_parameters_parameter_serialization(self):
self.assertIn("PARAM_FLOAT=2.55", result["stdout"])
self.assertIn("PARAM_BOOLEAN=1", result["stdout"])
self.assertIn("PARAM_LIST=a,b,c", result["stdout"])
self.assertIn('PARAM_OBJECT={"foo": "bar"}', result["stdout"])
self.assertIn('PARAM_OBJECT={"foo":"bar"}', result["stdout"])

output_dbs = ActionExecutionOutput.query(output_type="stdout")
self.assertEqual(len(output_dbs), 6)
self.assertEqual(output_dbs[0].data, "PARAM_STRING=test string\n")
self.assertEqual(output_dbs[5].data, 'PARAM_OBJECT={"foo": "bar"}\n')
self.assertEqual(output_dbs[5].data, 'PARAM_OBJECT={"foo":"bar"}\n')

output_dbs = ActionExecutionOutput.query(output_type="stderr")
self.assertEqual(len(output_dbs), 0)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
from st2common.services import action as ac_svc
from st2common.services import workflows as wf_svc
from st2common.util import api as api_util
from st2common.util import ujson
from st2common.util import deep_copy

__all__ = ["OrquestaRunner", "get_runner", "get_metadata"]

Expand All @@ -59,7 +59,7 @@ def _get_notify_config(self):
)

def _construct_context(self, wf_ex):
ctx = ujson.fast_deepcopy(self.context)
ctx = deep_copy.fast_deepcopy(self.context)
ctx["workflow_execution"] = str(wf_ex.id)

return ctx
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,12 @@

import distutils.sysconfig

# NOTE: We intentionally use orjson directly here instead of json_encode - orjson.dumps relies
# on config option which we don't parse for the action wrapper since it speeds things down - action
# wrapper should rely on as little imports as possible to make Python runner executions fast -
# that's very important.
import orjson
m4dcoder marked this conversation as resolved.
Show resolved Hide resolved

# Note: This work-around is required to fix the issue with other Python modules which live
# inside this directory polluting and masking sys.path for Python runner actions.
# Since this module is ran as a Python script inside a subprocess, directory where the script
Expand All @@ -46,7 +52,6 @@
sys.path.insert(0, distutils.sysconfig.get_python_lib())

import sys
import json
import argparse

import six
Expand Down Expand Up @@ -233,12 +238,12 @@ def run(self):
# Special case if result object is not JSON serializable - aka user wanted to return a
# non-simple type (e.g. class instance or other non-JSON serializable type)
try:
json.dumps(action_output["result"])
except TypeError:
orjson.dumps(action_output["result"]).decode("utf-8")
except (TypeError, orjson.JSONDecodeError):
action_output["result"] = str(action_output["result"])

try:
print_output = json.dumps(action_output)
print_output = orjson.dumps(action_output).decode("utf-8")
except Exception:
print_output = str(action_output)

Expand Down Expand Up @@ -317,9 +322,9 @@ def _get_action_instance(self):
)
args = parser.parse_args()

config = json.loads(args.config) if args.config else {}
config = orjson.loads(args.config) if args.config else {}
user = args.user
parent_args = json.loads(args.parent_args) if args.parent_args else []
parent_args = orjson.loads(args.parent_args) if args.parent_args else []
log_level = args.log_level

if not isinstance(config, dict):
Expand All @@ -330,7 +335,7 @@ def _get_action_instance(self):
if args.parameters:
LOG.debug("Getting parameters from argument")
args_parameters = args.parameters
args_parameters = json.loads(args_parameters) if args_parameters else {}
args_parameters = orjson.loads(args_parameters) if args_parameters else {}
parameters.update(args_parameters)

if args.stdin_parameters:
Expand All @@ -349,7 +354,7 @@ def _get_action_instance(self):
stdin_data = sys.stdin.readline().strip()

try:
stdin_parameters = json.loads(stdin_data)
stdin_parameters = orjson.loads(stdin_data)
stdin_parameters = stdin_parameters.get("parameters", {})
except Exception as e:
msg = (
Expand Down
13 changes: 7 additions & 6 deletions contrib/runners/python_runner/python_runner/python_runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@
import os
import re
import sys
import json
import uuid
import functools
from subprocess import list2cmdline
Expand Down Expand Up @@ -54,6 +53,8 @@
from st2common.util.shell import quote_unix
from st2common.services.action import store_execution_output_data
from st2common.runners.utils import make_read_and_store_stream_func
from st2common.util.jsonify import json_decode
from st2common.util.jsonify import json_encode

from python_runner import python_action_wrapper

Expand Down Expand Up @@ -134,7 +135,7 @@ def run(self, action_parameters):
LOG.debug("Getting user.")
user = self.get_user()
LOG.debug("Serializing parameters.")
serialized_parameters = json.dumps(
serialized_parameters = json_encode(
action_parameters if action_parameters else {}
)
LOG.debug("Getting virtualenv_path.")
Expand Down Expand Up @@ -166,9 +167,9 @@ def run(self, action_parameters):
LOG.debug("Setting args.")

if self._use_parent_args:
parent_args = json.dumps(sys.argv[1:])
parent_args = json_encode(sys.argv[1:])
else:
parent_args = json.dumps([])
parent_args = json_encode([])

args = [
python_path,
Expand Down Expand Up @@ -197,7 +198,7 @@ def run(self, action_parameters):
args.append("--parameters=%s" % (serialized_parameters))

if self._config:
args.append("--config=%s" % (json.dumps(self._config)))
args.append("--config=%s" % (json_encode(self._config)))

if self._log_level != PYTHON_RUNNER_DEFAULT_LOG_LEVEL:
# We only pass --log-level parameter if non default log level value is specified
Expand Down Expand Up @@ -361,7 +362,7 @@ def _get_output_values(self, exit_code, stdout, stderr, timed_out):
# Parse the serialized action result object (if available)
if action_result:
try:
action_result = json.loads(action_result)
action_result = json_decode(action_result)
except Exception as e:
# Failed to de-serialize the result, probably it contains non-simple type or similar
LOG.warning(
Expand Down
1 change: 1 addition & 0 deletions fixed-requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -62,4 +62,5 @@ python-dateutil==2.8.0
python-statsd==2.1.0
prometheus_client==0.1.1
ujson==1.35
orjson==3.5.0
zipp>=0.5,<=1.0.0
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ networkx==1.11
nose
nose-parallel==0.3.1
nose-timer==0.7.5
orjson==3.5.0
oslo.config<1.13,>=1.12.1
oslo.utils<5.0,>=4.0.0
paramiko==2.7.1
Expand Down
4 changes: 2 additions & 2 deletions st2api/st2api/controllers/v1/inquiries.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@
# limitations under the License.

import copy
import json

import six
from oslo_config import cfg
Expand All @@ -32,6 +31,7 @@
from st2common.rbac.backends import get_rbac_backend
from st2common import router as api_router
from st2common.services import inquiry as inquiry_service
from st2common.util.jsonify import json_decode


__all__ = ["InquiriesController"]
Expand Down Expand Up @@ -89,7 +89,7 @@ def get_all(
# a list of dicts, and then individually convert these to InquiryResponseAPI instances
inquiries = [
inqy_api_models.InquiryResponseAPI.from_model(raw_inquiry, skip_db=True)
for raw_inquiry in json.loads(raw_inquiries.body)
for raw_inquiry in json_decode(raw_inquiries.body)
]

# Repackage into Response with correct headers
Expand Down
3 changes: 3 additions & 0 deletions st2api/st2api/controllers/v1/webhooks.py
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,9 @@ def post(self, hook, webhook_body_api, headers, requester_user):
throw_on_validation_error=True,
)

# NOTE: For url encoded request bodies, values will be bytes instead of unicode and this
# doesn't work with orjson so we first need to "cast" all the values from bytes to unicode

return Response(json=body, status=http_client.ACCEPTED)

def _is_valid_hook(self, hook):
Expand Down
13 changes: 7 additions & 6 deletions st2api/tests/unit/controllers/v1/test_webhooks.py
Original file line number Diff line number Diff line change
Expand Up @@ -262,14 +262,15 @@ def test_json_request_body(self, dispatch_mock):
)
@mock.patch("st2common.transport.reactor.TriggerDispatcher.dispatch")
def test_form_encoded_request_body(self, dispatch_mock):
return
# TODO: Fix on deserialization on API side, body dict values being decoded as bytes
# instead of unicode which breakgs things. Likely issue / bug with form urlencoding
# parsing or perhaps in the test client when sending data
# Send request body as form urlencoded data
if six.PY3:
data = {b"form": [b"test"]}
else:
data = {"form": ["test"]}
data = {"form": ["test"]}

headers = {
"Content-Type": "application/x-www-form-urlencoded",
"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
"St2-Trace-Tag": "tag1",
}

Expand Down Expand Up @@ -378,7 +379,7 @@ def test_authentication_headers_should_be_removed(self, dispatch_mock):
"Cookie": "foo=bar",
}

self.app.post("/v1/webhooks/git", WEBHOOK_1, headers=headers)
self.app.post_json("/v1/webhooks/git", WEBHOOK_1, headers=headers)
self.assertNotIn(
"St2-Api-Key", dispatch_mock.call_args[1]["payload"]["headers"]
)
Expand Down
Loading