Optimize storage (serialization and de-serialization) of very large dictionaries inside MongoDB #4846
Commits on Dec 20, 2019
-
Add new JSONDictField which allows us to more efficiently store,
serialize and unserialize large dictionary data (such as result field, etc.).
Commit a93f9c2
Commits on Feb 21, 2020
-
Add new JSONDictField which allows us to more efficiently store,
serialize and unserialize large dictionary data (such as result field, etc.).
Commit a89d658
-
Add a feature flag for using new json dict field, set it to false
(opt-in) by default.
Commit fe5e33d
-
Use new JSON dict field for dictionaries which can be very large where
escaping the values adds tons of overhead.
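To illustrate where the overhead comes from, here is a minimal sketch. It assumes that "escaping" means recursively rewriting dict keys that contain `.` or `$` before the document is handed to MongoDB; the replacement characters and function names below are hypothetical. Storing the dictionary as one serialized JSON blob avoids that walk entirely.

```python
import json

# Hypothetical replacement characters, chosen only for illustration.
ESCAPED = {".": "\uff0e", "$": "\uff04"}


def _escape_key(key):
    """Replace characters MongoDB treats specially in field names."""
    for char, replacement in ESCAPED.items():
        key = key.replace(char, replacement)
    return key


def escape_keys(value):
    """Recursively copy value, escaping every dict key on the way.

    This touches every nested dict and list, which is what gets
    expensive for very large results.
    """
    if isinstance(value, dict):
        return {_escape_key(k): escape_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [escape_keys(item) for item in value]
    return value


def to_json_bytes(value):
    """Alternative: one serialization pass, no per-key escaping needed."""
    return json.dumps(value).encode("utf-8")
```

With the second approach the database only sees an opaque byte string, so key names inside the dictionary no longer matter to it.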
Commit f0919c9
Commits on Feb 22, 2020
-
Commit 59e87f9
Commits on Feb 18, 2021
-
Commit 2024a1e
-
Add a micro-benchmark which compares execution save + read times for
using two different approaches for serializing execution / live action result.
Commit 2f969f8
-
Add another micro benchmark fixture which represents a dictionary with a
single key with a large value.
Commit 5971302
-
Add micro-benchmark for escape_chars() and unescape_chars() and update
all JSON fixture files so they contain at least one key with a character which needs to be escaped.
Commit f19f0fd
-
Commit 81672f2
-
Merge branch 'optimize_escaped_dict_fields' of github.com:StackStorm/…
…st2 into optimize_escaped_dict_fields
Commit 2a15e8b
-
Commit 44bdbad
-
Commit 052fde7
-
Commit dbc5f3d
Commits on Feb 19, 2021
-
Commit 424f3d7
-
Commit ff485ca
-
Commit 6b1abf0
-
Add new "finalized_timestamp" field to the Execution and LiveAction
object. This will provide us with better visibility into how long the action runner needs to process the execution completely - this means not just the runner running the action, but also the action runner container persisting the result and corresponding objects to the database.
Commit 89617ff
-
Commit fa03d2f
-
Update more affected and broken tests to correctly specify a dict value
for a dict field.
Commit 68feb47
-
Commit 4cb7a1d
-
Commit e1d085d
-
Add python runner action which can be used for testing and timing large
execution result save times.
Commit d9ad62b
-
Commit e20242f
-
Commit 95de8ff
-
Update the field and implement another approach which uses an additional
header for the binary field value. This header tells us which serialization format and compression (if any) is used for a specific field value. Using a header format gives us more flexibility, makes it more future proof (e.g. ability to change the format in the future) and also gives us the ability to implement things such as per-field compression.
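A minimal sketch of the header idea follows. The three-byte layout, the one-byte codes and the function names are assumptions made for illustration and are not taken from the actual field implementation; `zlib` stands in for whichever compression library is used.

```python
import json
import zlib

# Hypothetical header layout: <compression byte><format byte>":"<payload>
COMPRESSION_NONE = b"n"
COMPRESSION_ZLIB = b"z"
FORMAT_JSON = b"j"
SEPARATOR = b":"


def pack(value, compress=False):
    """Serialize value and prefix it with a header describing how."""
    payload = json.dumps(value).encode("utf-8")
    compression = COMPRESSION_NONE
    if compress:
        payload = zlib.compress(payload)
        compression = COMPRESSION_ZLIB
    return compression + FORMAT_JSON + SEPARATOR + payload


def unpack(data):
    """Read the header, then undo compression and serialization."""
    compression, fmt, separator = data[0:1], data[1:2], data[2:3]
    if separator != SEPARATOR or fmt != FORMAT_JSON:
        raise ValueError("unsupported header: %r" % data[:3])
    payload = data[3:]
    if compression == COMPRESSION_ZLIB:
        payload = zlib.decompress(payload)
    elif compression != COMPRESSION_NONE:
        raise ValueError("unsupported compression: %r" % compression)
    return json.loads(payload.decode("utf-8"))
```

Because every stored value describes itself, values written with different formats or compression settings can coexist in the same collection, and a new format only needs a new header code.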
Commit 42f70e7
-
Commit 8017e2a
-
Commit cbb4cb1
Commits on Feb 20, 2021
-
Commit 49b4033
-
Commit d93bd9d
Commits on Feb 21, 2021
-
Also apply the same field optimization changes to all the workflow
related models. Based on end-to-end testing, this results in massive speed-ups for workflows which pass larger data sets around. See #4846 (comment) for some numbers and details.
Commit c8f4022
-
Commit 88151da
-
For now, only utilize JSONDictField for fields which are for all
purposes already "immutable" and make sure we always write them out to the database, even on partial dict update. Also add tests for it.
Commit a239061
-
Implement dict value change tracking for our custom JSONDictField.
This dict value tracking allows us to track when a dict item value has changed and only write the value to the database on existing document / model update in case it has changed. This is a very important property since it allows us to implement efficient partial document updates. With that change, JSONDictField now also works in exactly the same manner as existing mongoengine DictField field type. Also add tests for various edge cases which would fail if value change tracking was not correctly implemented or working.
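The idea of change tracking can be sketched as follows. This is an illustrative stand-in, not the field's actual code: `TrackedDict` and `fields_to_write` are hypothetical names, only top-level item assignment and deletion are tracked, and other mutators such as `pop()` or changes inside nested values are deliberately left out.

```python
class TrackedDict(dict):
    """dict which remembers whether any item changed after creation."""

    def __init__(self, *args, **kwargs):
        # dict.__init__ does not go through __setitem__, so the initial
        # content does not count as a change.
        super().__init__(*args, **kwargs)
        self.changed = False

    def __setitem__(self, key, value):
        # Assigning an identical value is not treated as a change.
        if key not in self or self[key] != value:
            self.changed = True
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.changed = True

    def update(self, *args, **kwargs):
        # Route updates through __setitem__ so they are tracked too.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


def fields_to_write(document):
    """Return only the fields which need to be sent in a partial update."""
    return {name: value for name, value in document.items() if value.changed}
```

On a partial update, only the fields returned by `fields_to_write()` would need to be serialized and written, which is what keeps updates of documents with large untouched dictionaries cheap.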
Commit 7cd3ec4
-
Commit 82965ef
-
Add orquesta workflow action which can be used to test passing large
data around (both returning it as a result and also as a next task context).
Commit eaccea2
-
Commit 9682fac
-
Commit bc9e9c2
-
Apply same optimizations to trigger_instance.payload field.
This way we also get better throughput and lower CPU utilization for rules engine when working with larger trigger instances.
Commit 2ac3fda
-
Commit 49d6134
-
Commit 242c676
Commits on Feb 23, 2021
-
Also add benchmark for model with multiple fields of the same type and
also for the native dict field type.
Commit 75ab254
-
Commit 4933573
-
Update the new field type and make sure we also correctly track changes
in dict list items and mark parent dict field as changed if any dict list item has changed.
Commit e8745d8
-
Commit 560d616
Commits on Feb 24, 2021
-
Simplify the code - instead of having another finalized_timestamp
attribute, update end_timestamp instead at the very end. This way execution duration will be more accurately reported.
Commit 147a02b
-
Update st2 execution get command to also display log attribute by
default. This should make it easier to infer actual execution run time duration and state transitions.
Commit 405e039
-
Commit 78f89ab
Commits on Feb 25, 2021
-
Commit 7810aa8
-
Update affected tests - live action and action execution timestamp may
now be a bit different, depending on how long it takes to persist each corresponding object in the database. Also fix tests to utilize correct dict type for the result.
Commit b31c006
-
Commit 5988b5c
Commits on Feb 26, 2021
-
The micro-benchmarks task is very slow on CI so for now, only run it on
a nightly scheduled basis.
Commit 71791b3
-
Commit 1d178df
Commits on Feb 27, 2021
-
Include the following changes which make action registration 15-20%
faster (especially visible with packs which have many actions such as the aws one):
* Utilize ``fast_deepcopy`` for making deep copies of dicts in json schema code (that code only works with simple native JSON types so this function can be used without any issues).
* Update registrator code to use runner db cache. This means that instead of doing N queries where N is the number of actions to be registered, we now only do M queries where M is the number of unique runners the actions utilize (in most cases that's < 4).
* Update existing action retrieval code to only retrieve fields we need (id, pack, ref). We really only need ID to check if the object already exists and perform upsert. Retrieving all the fields we don't use is wasteful and slow for actions with many parameters.
* Use C version of the YAML safe loader when loading YAML metadata. C version is a lot faster.
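One common way to build a fast deep copy for values made up only of native JSON types is a serialize / parse round trip. The sketch below uses the standard library `json` module and is an assumption about the general technique, not the project's actual ``fast_deepcopy`` implementation; `compare()` is a hypothetical helper for timing it against `copy.deepcopy()`.

```python
import copy
import json
import timeit


def fast_deepcopy(value):
    """Deep copy a value which consists only of native JSON types.

    Not a general replacement for copy.deepcopy(): tuples come back as
    lists, non-string dict keys come back as strings and arbitrary
    objects are not supported.
    """
    return json.loads(json.dumps(value))


def compare(value, number=20):
    """Return (deepcopy_seconds, fast_deepcopy_seconds) for value."""
    slow = timeit.timeit(lambda: copy.deepcopy(value), number=number)
    fast = timeit.timeit(lambda: fast_deepcopy(value), number=number)
    return slow, fast
```

The round trip is only safe where the data is known to be plain JSON, which is why the commit restricts its use to code paths such as the json schema handling.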
Commit 053bd93
-
Commit 824c2ea
-
Commit 1a99eee
-
Commit ca49b10
-
Commit 4289b9e
-
Update more places in the code where we only work with simple / native
JSON types to utilize fast_deepcopy() instead of copy.deepcopy(). This should result in faster copy times and as such faster secret masking, etc.
Commit 9f0a6ba
-
Update nose tests target to exclude resource registrar debug log
messages by default. This should make troubleshooting failures a lot easier - before that change, those log messages would add tons of noise (we load resource fixtures for each single test) and make actual test failures hard to troubleshoot.
Commit 4793ba3
-
Commit dbc1460
Commits on Mar 6, 2021
-
Merge branch 'master' of github.com:StackStorm/st2 into optimize_esca…
…ped_dict_fields. Also format new code with black.
Commit af961fb
-
Use lazy import since right now zstandard is only used for tests and
benchmarks and it's a testing dependency.
Commit 64dbe5a
-
Commit 167ca3f
Commits on Mar 7, 2021
-
Commit 8902d06
-
Commit 6eabd5b
-
Make sure we don't call unescape_chars() on the JSONDictField field
values since it's not required and may break things by decoding bytes to string and adding trailing character.
Commit 2c2cb74
-
Commit 93d859c
-
Commit 2ea37db
Commits on Mar 12, 2021
-
Add additional timer metrics to the action runner which will provide
better operational visibility into some steps of the action runner.
Commit 0f293ee
-
Commit a46831e
Commits on Mar 14, 2021
-
Commit c8c3b91
-
Remove incorrect log message which was causing unnecessary log churn in
action runner. That exception does not represent a fatal error so we should not log anything. In fact, it's quite a common and expected scenario that a key doesn't contain a JSON string.
Commit b2ed03b
Commits on Mar 15, 2021
-
Also use json instead of orjson so the action can also be used with older
versions of StackStorm.
Commit 9feb81e
-
Store "result_size" field on the ActionExecutionDB.
This field is populated lazily on model save. It will allow us to implement more efficient data retrieval in the web ui and other clients since we will be able to avoid retrieving the whole result for executions with very large results.
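As a rough sketch of how such a field could be used, the example below stores the size of the serialized result next to it and lets a client decide whether to fetch the result inline. The plain-dict "model", the function names and the size threshold are hypothetical and are not taken from the ActionExecutionDB code.

```python
import json

# Hypothetical threshold above which a client would not inline the result.
MAX_INLINE_RESULT_SIZE = 1024


def with_result_size(execution):
    """Return a copy of execution with result_size set (in bytes).

    The size is the length of the JSON-serialized result, computed at
    the point where the object would be saved.
    """
    serialized = json.dumps(execution.get("result", {})).encode("utf-8")
    return dict(execution, result_size=len(serialized))


def result_for_display(execution):
    """Return the result only when it is small enough to show inline."""
    if execution["result_size"] > MAX_INLINE_RESULT_SIZE:
        # A client would fetch the raw result separately in this case.
        return None
    return execution["result"]
```

A client can then request only the cheap `result_size` value first and skip downloading multi-megabyte results unless the user asks for them.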
Commit 9f4f523
-
Add new WIP API endpoint for returning / downloading raw action
execution result. This endpoint is to be used with webui for executions with large results.
Commit d0f0d78
Commits on Mar 16, 2021
-
Commit b0dea78
-
Commit 756b916
-
Update "result_size" field for action execution and live action DB model
inside action runner at the end after save. I was hoping we will be able to avoid one additional serialization, but sadly we can't if we don't want to massively hack and monkey patch mongoengine. And that monkey patching is not worth it since serialization is fast enough. To put things into perspective - it takes 7ms for a 4 MB result, which is nothing compared to other DB operation durations. And for smaller results it even gets to the sub-millisecond range.
Commit a47461b
-
Move calculation and setting of the result_size field to the
update_execution() service method and don't update end timestamp for liveaction and execution DB model at the end. Technically with new perf optimizations code, DB operations are very fast already and this way we avoid 2 additional queries and save up to 500ms when storing very large executions. And doing it inside that function also means we can correctly update it for workflow executions when they finish.
Commit 2005126
-
Commit 086be02
-
Commit 8e0c312
-
Commit cd9eba7
-
Commit 1a932ca
-
Commit e72215f
-
Commit 9e336d8
-
Commit d373cf5
Commits on Mar 18, 2021
-
Commit 224dfba
-
Add micro benchmark which times saving and reading large string value
from a database using string and binary field type.
Commit 3cc71ef
-
Merge branch 'optimize_escaped_dict_fields' of github.com:StackStorm/…
…st2 into optimize_escaped_dict_fields
Commit ac4efbd
Commits on Mar 19, 2021
-
Commit 051a691
-
Update CLI to use C version of the YAML safe dumper when pretty
formatting execution result for display and orjson when parsing API response. This should make "st2 execution get" and other commands finish faster, especially when working with large executions. For example, locally running st2 execution get on an execution with an 8 MB result took 18 seconds before this change and takes less than 6 seconds with it.
Commit 94b6298
-
Commit d1df1cd
-
Commit b13c195
-
Log a warning message if pyyaml C bindings are not available since it
means YAML loading and serialization will be significantly slower.
Commit 51f811c
-
Commit 4672495
-
Commit a25efa6
-
Commit bdd8e3c
-
Commit a098315
-
Commit e818158
-
For performance reasons, use udatetime library for parsing rfc3339 /
iso8601 date strings where possible.
Commit 48d612d
-
ujson is now only used for tests / benchmarks so move it to
tests-requirements.
Commit 71ffb1a
-
Commit 46ba2c9
-
Commit a27245f
-
Commit 1a91394
Commits on Mar 20, 2021
-
Apply suggestions from code review
Co-authored-by: blag <[email protected]>
Commit cbd0259
-
Commit 3b47856