
ref(replay): More efficient deserialization #1782

Merged
merged 20 commits into from
Jan 27, 2023

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Jan 25, 2023

After deploying #1678, we saw a rise in memory consumption. We narrowed down the reason to deserialization of replay recordings, so this PR attempts to replace those deserializers with more efficient versions that do not parse an entire serde_json::Value to get the tag (type, source) of the enum.

A custom deserializer is necessary because serde does not support integer tags for internally tagged enums.

  • Custom deserializer for NodeVariant, based on serde's own derive(Deserialize) of internally tagged enums.
  • Custom deserializer for recording::Event, based on serde's own derive(Deserialize) of internally tagged enums.
  • Custom deserializer for IncrementalSourceDataVariant, based on serde's own derive(Deserialize) of internally tagged enums.
  • Box all enum variants.

Benchmark comparison

Ran a criterion benchmark on rrweb.json. It does not tell us anything about memory consumption, but the reduced CPU time points to simpler deserialization:

Before

rrweb/1                 time:   [142.37 ms 148.17 ms 155.61 ms]

After

rrweb/1                 time:   [31.474 ms 31.801 ms 32.137 ms]

#skip-changelog

relay-replays/src/recording.rs
struct Helper<'a> {
    #[serde(rename = "type")]
    ty: u8,
    timestamp: f64,
Member Author

@cmanallen what confused me in the existing impl: some event types have a u64 timestamp, some an f64. Shouldn't they all have the same timestamp format?

Member

@jjbayer RRWeb timestamps are u64. Sentry timestamps are f64.

@jjbayer jjbayer force-pushed the replay-recording-custom-serialize branch from e90f327 to e8718b1 Compare January 26, 2023 13:03
Comment on lines +241 to +244
    T2(Box<FullSnapshotEvent>),
    T3(Box<IncrementalSnapshotEvent>),
    T4(Box<MetaEvent>),
    T5(Box<CustomEvent>),
Member Author

Boxing variants reduces the size of the enum itself, which should help against memory fragmentation.

relay-replays/src/serialization.rs

use crate::recording::{DocumentNode, DocumentTypeNode, ElementNode, NodeVariant, TextNode};

impl<'de> Deserialize<'de> for NodeVariant {
Member Author

This implementation was copy-pasted and adapted from the expanded derive(Deserialize) of a simple, internally tagged enum.

    T2(Box<ElementNode>),
    T3(Box<TextNode>), // text
    T4(Box<TextNode>), // cdata
    T5(Box<TextNode>), // comment
Member Author

We should probably pick better names for these variants, but I'll leave that to a follow-up PR.

@jjbayer jjbayer marked this pull request as ready for review January 26, 2023 15:32
@jjbayer jjbayer requested a review from a team January 26, 2023 15:32
- // XXX: Temporarily, only the Sentry org will be allowed to parse replays while
- // we measure the impact of this change.
- if replays_enabled && state.project_state.organization_id == Some(1) {
+ if replays_enabled {
Member Author

This was the change of the original PR. I cherry-picked it in to see the impact on a Canary deploy.

@jjbayer jjbayer requested a review from a team January 26, 2023 15:35
Member

@cmanallen cmanallen left a comment

This pull seems to solve the high memory issue (more time will tell but at this point it looks very promising). How did this branch fix it? Is heap allocated memory freed more easily than stack allocated? Was it boxing the enum variants that fixed it or the other serialization improvements in serialization.rs? In the future is there a way for me to know when to heap allocate vs stack allocate memory? As always, thank you for your help. Your expertise is appreciated.

edit: @jjbayer tagging you

Member

@jan-auer jan-auer left a comment

One more general question: The parsed replay format uses a lot of serde_json::Value and String where we could have boolean flags or custom enums. Could we replace those with the intended types? I'm pretty sure that allows us to further reduce memory utilization and speed up parsing (though that would need to be verified). Potentially we could even revert some of the boxing because of that (though we'll need to keep it where there are large size differences between variants).

It'll also contribute to a stricter schema that should be easier to work with in the later parts of the pipeline.

Obviously, this would be a functional change with the potential to introduce regressions, so I'm happy for that to go in a follow-up PR. cc @cmanallen

@jan-auer
Member

jan-auer commented Jan 27, 2023

How did this branch fix it? Is heap allocated memory freed more easily than stack allocated?

@cmanallen There are actually two separate fixes in this PR:

1. The size of enums.

An enum is a discriminator plus one of its values. That means the static size of the type is the size of its largest variant plus the discriminator (usually one byte, plus padding). If one variant (recursively) contains a lot of data, it blows up the size of the enum even if all the other variants are tiny.

For example, Node was 248 bytes large, regardless of whether all that data was needed. It always had to be allocated in full, and since Node is a recursive tree type, this quickly accumulates to a lot of memory.

Joris' change reduces the size of Node to exactly one pointer (8 bytes) by moving the data to the heap. The heap allocation can now be sized to the exact variant being allocated. Overall, this means less wasteful memory usage and less empty space in between.
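The effect of boxing on enum size can be checked with std::mem::size_of. A minimal sketch; the 240-byte payload is an illustrative stand-in, not the real Node layout:

```rust
use std::mem::size_of;

// Illustrative stand-ins; the real NodeVariant fields differ.
enum NodeUnboxed {
    Element([u8; 240]), // the large variant dominates the enum size
    Text(u8),           // the tiny variant still pays for the large one
}

enum NodeBoxed {
    Element(Box<[u8; 240]>), // data moved to the heap
    Text(Box<u8>),           // the enum itself is now pointer-sized (+ tag)
}

fn main() {
    // The unboxed enum is at least as large as its biggest variant.
    assert!(size_of::<NodeUnboxed>() >= 240);
    // The boxed enum shrinks to a pointer plus discriminator/padding.
    assert!(size_of::<NodeBoxed>() <= 16);
    println!(
        "unboxed: {} bytes, boxed: {} bytes",
        size_of::<NodeUnboxed>(),
        size_of::<NodeBoxed>()
    );
}
```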

2. Recursive Parsing

The Deserialize impl for Node first parsed the entire tree into a serde_json::Value. Then you match on the top level and recurse. This means that at every level of parsing a Node, you parse its subtree recursively, which led to exponential behavior with respect to the depth of the tree. With Joris' change, we now only deserialize the top level.

Sadly, serde doesn't make this straightforward. There's a built-in derive for enums with string tags (discriminator fields), but not if those fields are integers. We now need to use serde's internal APIs to accomplish the same.

@cmanallen
Member

@jan-auer

Could we replace those with the intended types? [...] It'll also contribute to a stricter schema that should be easier to work with in the later parts of the pipeline.

I can make those optimizations in another PR. My goal is to have a strict schema but unfortunately the RRWeb library has quirks which can lead to deserialization errors when I don't account for the full spectrum of types a field can occupy. I'd like to be more permissive with Value for now and introduce stricter typing as we understand the full range of payload variants.

@cmanallen
Member

@jan-auer

Joris' change reduces the size of Node to exactly one pointer (8 bytes) by moving the data to the heap. The heap allocation can now be sized to the exact variant being allocated. Overall, this means less wasteful memory usage and less empty space in between.

👍 Great explanation, thanks!

@jjbayer jjbayer requested a review from jan-auer January 27, 2023 15:18
@jjbayer jjbayer merged commit 16a0d44 into master Jan 27, 2023
@jjbayer jjbayer deleted the replay-recording-custom-serialize branch January 27, 2023 16:24
@mitsuhiko
Member

I'm surprised and quite a bit disappointed that clippy did not scream that the enum is huge. I wonder if this is something we can report upstream.

@jjbayer
Member Author

jjbayer commented Jan 30, 2023

I'm surprised and quite a bit disappointed that clippy did not scream that the enum is huge. I wonder if this is something we can report upstream.

@mitsuhiko if I'm not mistaken, that clippy lint only checks for large differences between variant sizes: https://github.com/rust-lang/rust-clippy/blob/96c28d1f69a120de7fcdbc40fb17610a407a4900/clippy_lints/src/large_enum_variant.rs#L18-L19
