Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(replay): More efficient deserialization #1782

Merged
merged 20 commits into from
Jan 27, 2023

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Jan 25, 2023

After deploying #1678, we saw a rise in memory consumption. We narrowed down the reason to deserialization of replay recordings, so this PR attempts to replace those deserializers with more efficient versions that do not parse an entire serde_json::Value to get the tag (type, source) of the enum.

A custom deserializer is necessary because serde does not support integer tags for internally tagged enums.

  • Custom deserializer for NodeVariant, based on serde's own derive(Deserialize) of internally tagged enums.
  • Custom deserializer for recording::Event, based on serde's own derive(Deserialize) of internally tagged enums.
  • Custom deserializer for IncrementalSourceDataVariant, based on serde's own derive(Deserialize) of internally tagged enums.
  • Box all enum variants.

Benchmark comparison

Ran a criterion benchmark on rrweb.json. It does not tell us anything about memory consumption, but the reduced cpu usage points to simpler deserialization:

Before

rrweb/1                 time:   [142.37 ms 148.17 ms 155.61 ms]

After

rrweb/1                 time:   [31.474 ms 31.801 ms 32.137 ms]

#skip-changelog

relay-replays/src/recording.rs Outdated Show resolved Hide resolved
struct Helper<'a> {
#[serde(rename = "type")]
ty: u8,
timestamp: f64,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmanallen what confused me in the existing impl: Some event types have timestamp u64, some have f64. Shouldn't they all have the same timestamp format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbayer RRWeb timestamps are u64. Sentry timestamps are f64.

@jjbayer jjbayer force-pushed the replay-recording-custom-serialize branch from e90f327 to e8718b1 Compare January 26, 2023 13:03
Comment on lines +241 to +244
T2(Box<FullSnapshotEvent>),
T3(Box<IncrementalSnapshotEvent>),
T4(Box<MetaEvent>),
T5(Box<CustomEvent>),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boxing variants reduces the size of the enum itself -> should help against memory fragmentation.

relay-replays/src/serialization.rs Outdated Show resolved Hide resolved

use crate::recording::{DocumentNode, DocumentTypeNode, ElementNode, NodeVariant, TextNode};

impl<'de> Deserialize<'de> for NodeVariant {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation was copy-pasted and adapted from the expanded derive(Deserialize) of a simple, internally tagged enum.

T2(Box<ElementNode>),
T3(Box<TextNode>), // text
T4(Box<TextNode>), // cdata
T5(Box<TextNode>), // comment
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably pick better names for these variants, but I'll leave that to a follow-up PR.

@jjbayer jjbayer marked this pull request as ready for review January 26, 2023 15:32
@jjbayer jjbayer requested a review from a team January 26, 2023 15:32
// XXX: Temporarily, only the Sentry org will be allowed to parse replays while
// we measure the impact of this change.
if replays_enabled && state.project_state.organization_id == Some(1) {
if replays_enabled {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the change of the original PR. I cherry-picked it in to see the impact on a Canary deploy.

@jjbayer jjbayer requested a review from a team January 26, 2023 15:35
Copy link
Member

@cmanallen cmanallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull seems to solve the high memory issue (more time will tell but at this point it looks very promising). How did this branch fix it? Is heap allocated memory freed more easily than stack allocated? Was it boxing the enum variants that fixed it or the other serialization improvements in serialization.rs? In the future is there a way for me to know when to heap allocate vs stack allocate memory? As always, thank you for your help. Your expertise is appreciated.

edit: @jjbayer tagging you

relay-replays/Cargo.toml Show resolved Hide resolved
relay-replays/benches/benchmarks.rs Outdated Show resolved Hide resolved
relay-replays/src/recording.rs Outdated Show resolved Hide resolved
relay-replays/src/recording.rs Outdated Show resolved Hide resolved
relay-replays/src/recording.rs Outdated Show resolved Hide resolved
relay-replays/src/serialization.rs Outdated Show resolved Hide resolved
relay-replays/src/serialization.rs Outdated Show resolved Hide resolved
relay-replays/src/serialization.rs Outdated Show resolved Hide resolved
relay-replays/src/serialization.rs Outdated Show resolved Hide resolved
relay-replays/src/serialization.rs Outdated Show resolved Hide resolved
Copy link
Member

@jan-auer jan-auer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more general question: The parsed replay format uses a lot of serde_json::Value and String where we could have boolean flags or custom enums. Could we replace those with the intended types? I'm pretty sure that allows us to further reduce memory utilization and speed up parsing (though that would need to be verified). Potentially we could even revert some of the boxing because of that (though we'll need to keep it where there's large size differences between variants).

It'll also contribute to a stricter schema that should be easier to work with in the later parts of the pipeline.

Obviously, this would be a functional change with the potential to introduce regressions, so I'm happy for that to go in a follow-up PR. cc @cmanallen

@jan-auer
Copy link
Member

jan-auer commented Jan 27, 2023

How did this branch fix it? Is heap allocated memory freed more easily than stack allocated?

@cmanallen There are actually two separate fixes in this PR:

1. The size of enums.

An enum is a discriminator + one of the values. That means, the static size of this type is the size of its largest variant + usually one byte. If you now have a variant that (recursively) contains a lot of data, this blows up the size of your enum even if all the other variants are tiny.

For example, the Node was 248 bytes large, regardless of whether all that data was needed or not. This always had to be allocated. And since Node is a recursive tree type, this quickly accumulates to a lot of memory needed.

Joris' change reduces the size of Node to exactly one pointer size (8 bytes) by moving data to the heap. However, that heap allocation can now be smaller depending on which exact variant is being allocated. Overall, this means less wasteful memory usage and fewer empty space in between.

2. Recursive Parsing

The Deserialize impl for Node first parsed the entire tree into a Value. Then you match on the top level and recurse. This means at every level of parsing a Node, you parse the tree recursively. This lead to exponential behavior wrt the depth of the tree. With Joris' change, we now only deserialize the top level.

Sadly, serde doesn't make this straight-forward. There's a built-in derive for enums with "string" tags (discriminator fields), but not if those fields are numbers. We now need to use serde's internal APIs to accomplish the same.

@cmanallen
Copy link
Member

@jan-auer

Could we replace those with the intended types? [...] It'll also contribute to a stricter schema that should be easier to work with in the later parts of the pipeline.

I can make those optimizations in another PR. My goal is to have a strict schema but unfortunately the RRWeb library has quirks which can lead to deserialization errors when I don't account for the full spectrum of types a field can occupy. I'd like to be more permissive with Value for now and introduce stricter typing as we understand the full range of payload variants.

@cmanallen
Copy link
Member

@jan-auer

Joris' change reduces the size of Node to exactly one pointer size (8 bytes) by moving data to the heap. However, that heap allocation can now be smaller depending on which exact variant is being allocated. Overall, this means less wasteful memory usage and fewer empty space in between.

👍 Great explanation, thanks!

@jjbayer jjbayer requested a review from jan-auer January 27, 2023 15:18
@jjbayer jjbayer merged commit 16a0d44 into master Jan 27, 2023
@jjbayer jjbayer deleted the replay-recording-custom-serialize branch January 27, 2023 16:24
@mitsuhiko
Copy link
Member

I'm surprised and quite a bit disappointed that clippy did not scream that the enum is huge. I wonder if this is something that is something that we can report upstream.

@jjbayer
Copy link
Member Author

jjbayer commented Jan 30, 2023

I'm surprised and quite a bit disappointed that clippy did not scream that the enum is huge. I wonder if this is something that is something that we can report upstream.

@mitsuhiko if I'm not mistaken, that clippy only checks large differences between variant sizes: https://github.com/rust-lang/rust-clippy/blob/96c28d1f69a120de7fcdbc40fb17610a407a4900/clippy_lints/src/large_enum_variant.rs#L18-L19

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants