ref(replay): More efficient deserialization #1782
Conversation
relay-replays/src/recording.rs
Outdated
```rust
struct Helper<'a> {
    #[serde(rename = "type")]
    ty: u8,
    timestamp: f64,
```
@cmanallen What confused me in the existing impl: some event types have timestamp `u64`, some have `f64`. Shouldn't they all have the same timestamp format?
@jjbayer RRWeb timestamps are u64. Sentry timestamps are f64.
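For illustration, the two formats side by side. This is a hedged sketch with simplified structs, not the PR's actual types: rrweb event timestamps are integer milliseconds since the Unix epoch, while Sentry timestamps are fractional seconds.

```rust
// Hedged sketch, not the actual Relay types: rrweb events carry integer
// millisecond timestamps, Sentry payloads carry fractional seconds.
#[derive(serde::Deserialize)]
struct RrwebEventHeader {
    #[serde(rename = "type")]
    ty: u8,
    timestamp: u64, // e.g. 1672531200000 (milliseconds since the Unix epoch)
}

#[derive(serde::Deserialize)]
struct SentryEventHeader {
    timestamp: f64, // e.g. 1672531200.123 (seconds since the Unix epoch)
}
```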
```rust
    T2(Box<FullSnapshotEvent>),
    T3(Box<IncrementalSnapshotEvent>),
    T4(Box<MetaEvent>),
    T5(Box<CustomEvent>),
```
Boxing variants reduces the size of the enum itself, which should help against memory fragmentation.
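A quick way to see the effect is a standalone sketch (not the PR's types): an enum's inline size is driven by its largest variant, so boxing the large payload keeps every variant roughly pointer-sized and leaves less unused space per value.

```rust
// Standalone sketch: compare the inline size of an enum with and without
// boxing its large variant.
#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Large([u8; 1024]), // inflates every Unboxed value to ~1 KiB
}

#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Large(Box<[u8; 1024]>), // payload moves to the heap
}

fn main() {
    println!("unboxed: {} bytes", std::mem::size_of::<Unboxed>()); // ~1025 bytes
    println!("boxed:   {} bytes", std::mem::size_of::<Boxed>()); // ~16 bytes
}
```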
relay-replays/src/serialization.rs
Outdated
```rust
use crate::recording::{DocumentNode, DocumentTypeNode, ElementNode, NodeVariant, TextNode};

impl<'de> Deserialize<'de> for NodeVariant {
```
This implementation was copy-pasted and adapted from the expanded `derive(Deserialize)` of a simple, internally tagged enum.
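For readers who want the gist without expanding the derive, below is a minimal public-API sketch of the same idea: dispatch on an integer `type` tag without building a full `serde_json::Value`. It is not the PR's implementation (which adapts the macro-expanded code and serde's internal buffering), it assumes serde_json's `raw_value` feature and borrowed in-memory input, and the node types are placeholders.

```rust
use serde::de::{self, Deserialize, Deserializer};
use serde_json::value::RawValue;

#[derive(Debug, serde::Deserialize)]
struct ElementNode {} // fields elided for brevity

#[derive(Debug, serde::Deserialize)]
struct TextNode {} // fields elided for brevity

#[derive(Debug)]
enum NodeVariant {
    T2(Box<ElementNode>),
    T3(Box<TextNode>),
}

impl<'de> Deserialize<'de> for NodeVariant {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        // Borrow the raw JSON of this node, peek only at the integer tag,
        // then parse the matching variant from the same raw slice.
        let raw: &RawValue = Deserialize::deserialize(deserializer)?;

        #[derive(serde::Deserialize)]
        struct Tag {
            #[serde(rename = "type")]
            ty: u8,
        }

        let tag: Tag = serde_json::from_str(raw.get()).map_err(de::Error::custom)?;
        match tag.ty {
            2 => Ok(NodeVariant::T2(Box::new(
                serde_json::from_str(raw.get()).map_err(de::Error::custom)?,
            ))),
            3 => Ok(NodeVariant::T3(Box::new(
                serde_json::from_str(raw.get()).map_err(de::Error::custom)?,
            ))),
            other => Err(de::Error::custom(format!("unknown node type {other}"))),
        }
    }
}
```

The trade-off in this sketch is that the raw slice is scanned twice (once for the tag, once for the chosen variant); the expanded derive instead buffers field contents while it looks for the tag and then deserializes the variant from that buffer, which is roughly what the PR adapts.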
```rust
    T2(Box<ElementNode>),
    T3(Box<TextNode>), // text
    T4(Box<TextNode>), // cdata
    T5(Box<TextNode>), // comment
```
We should probably pick better names for these variants, but I'll leave that to a follow-up PR.
```diff
-// XXX: Temporarily, only the Sentry org will be allowed to parse replays while
-// we measure the impact of this change.
-if replays_enabled && state.project_state.organization_id == Some(1) {
+if replays_enabled {
```
This was the change of the original PR. I cherry-picked it in to see the impact on a Canary deploy.
This pull seems to solve the high memory issue (more time will tell, but at this point it looks very promising). How did this branch fix it? Is heap-allocated memory freed more easily than stack-allocated? Was it boxing the enum variants that fixed it, or the other serialization improvements in serialization.rs? In the future, is there a way for me to know when to heap-allocate vs stack-allocate memory? As always, thank you for your help. Your expertise is appreciated.
edit: @jjbayer tagging you
One more general question: the parsed replay format uses a lot of `serde_json::Value` and `String` where we could have boolean flags or custom enums. Could we replace those with the intended types? I'm pretty sure that would allow us to further reduce memory utilization and speed up parsing (though that would need to be verified). Potentially we could even revert some of the boxing because of that (though we'll need to keep it where there are large size differences between variants).
It'll also contribute to a stricter schema that should be easier to work with in the later parts of the pipeline.
Obviously, this would be a functional change with the potential to introduce regressions, so I'm happy for that to go in a follow-up PR. cc @cmanallen
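For illustration, here is a hedged sketch of the kind of tightening suggested above; the field name is hypothetical and not taken from the actual replay schema.

```rust
use serde::Deserialize;

// Loosely typed: the flag survives as a generic JSON value, so every consumer
// has to re-check its shape at runtime, and it costs a full serde_json::Value
// (tens of bytes) instead of one byte.
#[derive(Deserialize)]
struct LooseNode {
    #[serde(rename = "isShadow", default)]
    is_shadow: serde_json::Value,
}

// Strictly typed: smaller in memory, cheaper to parse, and the shape is
// enforced at deserialization time.
#[derive(Deserialize)]
struct StrictNode {
    #[serde(rename = "isShadow", default)]
    is_shadow: bool,
}
```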
@cmanallen There are actually two separate fixes in this PR:

1. The size of enums. An enum is a discriminator plus one of its values. That means the static size of the type is the size of its largest variant plus (usually) one byte. If you have a variant that (recursively) contains a lot of data, it blows up the size of your enum even if all the other variants are tiny. For example, Joris' change reduces the size of Node to exactly one pointer size (8 bytes) by moving data to the heap. That heap allocation can now also be smaller depending on which exact variant is being allocated. Overall, this means less wasteful memory usage and less empty space in between.

2. Recursive parsing. Sadly, serde doesn't make this straightforward: there's a built-in derive for enums with "string" tags (discriminator fields), but not if those fields are numbers. We now need to use serde's internal APIs to accomplish the same.
I can make those optimizations in another PR. My goal is to have a strict schema, but unfortunately the RRWeb library has quirks which can lead to deserialization errors when I don't account for the full spectrum of types a field can occupy. I'd like to be more permissive with …
👍 Great explanation, thanks!
I'm surprised and quite a bit disappointed that clippy did not scream that the enum is huge. I wonder if this is something we can report upstream.
@mitsuhiko If I'm not mistaken, that clippy lint only checks for large differences between variant sizes: https://github.com/rust-lang/rust-clippy/blob/96c28d1f69a120de7fcdbc40fb17610a407a4900/clippy_lints/src/large_enum_variant.rs#L18-L19
After deploying #1678, we saw a rise in memory consumption. We narrowed down the reason to deserialization of replay recordings, so this PR attempts to replace those deserializers with more efficient versions that do not parse an entire `serde_json::Value` to get the tag (`type`, `source`) of the enum.

A custom deserializer is necessary because serde does not support integer tags for internally tagged enums:

- `NodeVariant`, based on serde's own `derive(Deserialize)` of internally tagged enums.
- `recording::Event`, based on serde's own `derive(Deserialize)` of internally tagged enums.
- `IncrementalSourceDataVariant`, based on serde's own `derive(Deserialize)` of internally tagged enums.

Benchmark comparison
Ran a criterion benchmark on `rrweb.json`. It does not tell us anything about memory consumption, but the reduced CPU usage points to simpler deserialization:

Before
After
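For context, a criterion benchmark for this kind of comparison might look roughly like the sketch below; the bench name, fixture path, and the use of `serde_json::Value` as a stand-in parser are all hypothetical, not the repository's actual benchmark code.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn bench_recording_parse(c: &mut Criterion) {
    // Load the fixture once so only deserialization is measured.
    let payload = std::fs::read("tests/fixtures/rrweb.json").expect("fixture exists");

    c.bench_function("parse rrweb.json", |b| {
        b.iter(|| {
            // Stand-in for the recording parser under test; the PR's point is
            // precisely to avoid this full-Value parse, so a real benchmark
            // would call the crate's recording parser here instead.
            let events: serde_json::Value =
                serde_json::from_slice(black_box(payload.as_slice())).expect("valid json");
            black_box(events)
        })
    });
}

criterion_group!(benches, bench_recording_parse);
criterion_main!(benches);
```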
#skip-changelog