-
-
Notifications
You must be signed in to change notification settings - Fork 746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize storage (serialization and de-serilization) of very large dictionaries inside MongoDB #4846
Optimize storage (serialization and de-serilization) of very large dictionaries inside MongoDB #4846
Changes from 64 commits
a93f9c2
a89d658
fe5e33d
f0919c9
59e87f9
2024a1e
2f969f8
5971302
f19f0fd
81672f2
2a15e8b
44bdbad
052fde7
dbc5f3d
424f3d7
ff485ca
6b1abf0
89617ff
fa03d2f
68feb47
4cb7a1d
e1d085d
d9ad62b
e20242f
95de8ff
42f70e7
8017e2a
cbb4cb1
49b4033
d93bd9d
c8f4022
88151da
a239061
7cd3ec4
82965ef
eaccea2
9682fac
bc9e9c2
2ac3fda
49d6134
242c676
75ab254
4933573
e8745d8
560d616
147a02b
405e039
78f89ab
7810aa8
b31c006
5988b5c
71791b3
1d178df
053bd93
824c2ea
1a99eee
ca49b10
4289b9e
9f0a6ba
4793ba3
dbc1460
af961fb
64dbe5a
167ca3f
8902d06
6eabd5b
2c2cb74
93d859c
2ea37db
0f293ee
a46831e
c8c3b91
b2ed03b
9feb81e
9f4f523
d0f0d78
b0dea78
756b916
a47461b
2005126
086be02
8e0c312
cd9eba7
1a932ca
e72215f
9e336d8
d373cf5
224dfba
3cc71ef
ac4efbd
051a691
94b6298
d1df1cd
b13c195
51f811c
4672495
a25efa6
bdd8e3c
a098315
e818158
48d612d
71ffb1a
46ba2c9
a27245f
1a91394
cbd0259
3b47856
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
mail-parser>=3.9.1,<3.10.0 | ||
mail-parser==3.15.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,7 +20,6 @@ | |
import tempfile | ||
import socket | ||
|
||
import six | ||
import mock | ||
import mailparser | ||
|
||
|
@@ -126,20 +125,12 @@ def test_sendmail_utf8_subject_and_body(self): | |
"attachments": "", | ||
} | ||
|
||
if six.PY2: | ||
expected_body = ( | ||
"Hello there 😃😃.\n" | ||
"<br><br>\n" | ||
"This message was generated by StackStorm action " | ||
"send_mail running on %s" % (HOSTNAME) | ||
) | ||
else: | ||
expected_body = ( | ||
"Hello there \\U0001f603\\U0001f603.\n" | ||
"<br><br>\n" | ||
"This message was generated by StackStorm action " | ||
"send_mail running on %s" % (HOSTNAME) | ||
) | ||
expected_body = ( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like we've kept the code from the if - which if I read it was only if we were running on python2, so should we not have kept the else logic? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above. |
||
"Hello there 😃😃.\n" | ||
"<br><br>\n" | ||
"This message was generated by StackStorm action " | ||
"send_mail running on %s" % (HOSTNAME) | ||
) | ||
|
||
status, _, email_data, message = self._run_action( | ||
action_parameters=action_parameters | ||
|
@@ -167,18 +158,11 @@ def test_sendmail_utf8_subject_and_body(self): | |
"attachments": "", | ||
} | ||
|
||
if six.PY2: | ||
expected_body = ( | ||
"Hello there 😃😃.\n\n" | ||
"This message was generated by StackStorm action " | ||
"send_mail running on %s" % (HOSTNAME) | ||
) | ||
else: | ||
expected_body = ( | ||
"Hello there \\U0001f603\\U0001f603.\n\n" | ||
"This message was generated by StackStorm action " | ||
"send_mail running on %s" % (HOSTNAME) | ||
) | ||
expected_body = ( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like we've kept the code from the if - which if I read it was only if we were running on python2, so should we not have kept the else logic? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The version of the used library has also been changed / upgraded so now the behavior is different / more correct. To clarify - we don't want to use escape sequences since that is Python specific and would not display correctly in the email clients, etc. I assume we just did that change when working on Python 3 support a long time ago since it was the easiest, but technically not the most correct one :) |
||
"Hello there 😃😃.\n\n" | ||
"This message was generated by StackStorm action " | ||
"send_mail running on %s" % (HOSTNAME) | ||
) | ||
|
||
status, _, email_data, message = self._run_action( | ||
action_parameters=action_parameters | ||
|
@@ -271,10 +255,6 @@ def _run_action(self, action_parameters): | |
email_data = result["stdout"] | ||
email_data = email_data.split("\n")[:-2] | ||
email_data = "\n".join(email_data) | ||
|
||
if six.PY2 and isinstance(email_data, six.text_type): | ||
email_data = email_data.encode("utf-8") | ||
|
||
message = mailparser.parse_from_string(email_data) | ||
else: | ||
email_data = None | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
name: orquesta-data-flow-large-data | ||
description: A basic workflow which passes large JSON data around. | ||
runner_type: orquesta | ||
entry_point: workflows/orquesta-data-flow-large-data.yaml | ||
enabled: true | ||
parameters: | ||
file_path: | ||
type: string | ||
required: true | ||
description: "Path to the JSON fixture file to use." |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
--- | ||
name: python_runner_load_and_print_fixture | ||
description: Action which loads provided JSON fixture file, parses it and returns it as an action result. Useful when testing and benchmarking execution save timing. | ||
runner_type: "python-script" | ||
enabled: true | ||
entry_point: pythonactions/load_and_print_fixture.py | ||
parameters: | ||
file_path: | ||
type: "string" | ||
required: true | ||
description: "Path to the JSON fixture file to use." |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
import orjson | ||
|
||
from st2common.runners.base_action import Action | ||
|
||
|
||
class LoadAndPrintFixtureAction(Action): | ||
def run(self, file_path: str): | ||
with open(file_path, "r") as fp: | ||
content = fp.read() | ||
|
||
data = orjson.loads(content) | ||
return data |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
version: 1.0 | ||
|
||
description: A basic workflow which passes large data around. | ||
|
||
input: | ||
- file_path | ||
- b1: <% ctx().file_path %> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Really? You can define vars in input that are not defined in the input parameters metadata? 🤯 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I actually just used an existing sample workflow for reference :D |
||
|
||
vars: | ||
- a2: <% ctx().b1 %> | ||
- b2: <% ctx().a2 %> | ||
|
||
output: | ||
- a5: <% ctx().b4 %> | ||
- b5: <% ctx().a5 %> | ||
|
||
tasks: | ||
task1: | ||
action: core.echo | ||
input: | ||
message: <% ctx().b2 %> | ||
next: | ||
- when: <% succeeded() %> | ||
publish: | ||
- a3: <% result().stdout %> | ||
- b3: <% ctx().a3 %> | ||
do: task2 | ||
task2: | ||
action: examples.load_and_print_fixture | ||
input: | ||
file_path: <% ctx().file_path %> | ||
next: | ||
- when: <% succeeded() %> | ||
publish: a4=<% result().result %> b4=<% ctx().a4 %> | ||
do: task3 | ||
task3: | ||
action: core.noop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github actions works much better if we use more workflows because you can only restart jobs at the Workflow level, not per job.
So, when we add the benchmark job, it should be in a separate Workflow file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack. I will likely add it in the future to run once a week or something since the job is very slow on CI (~30 minutes).