(best way to) estimate event size #19781
srstrickland asked this question in Q&A (unanswered)
Replies: 1 comment
-
Hey! That would definitely be one way to do it, though, as you note, it would be expensive. There is actually a trait on events in Vector that exposes a function to estimate the in-memory size of the event:

vector/lib/vector-common/src/byte_size_of.rs
Lines 12 to 32 in 0dce776

There is also another trait to estimate the encoded JSON size of the event:

vector/lib/vector-core/src/event/estimated_json_encoded_size_of.rs
Lines 43 to 45 in 0dce776

I think these would just need to be exposed in VRL by adding these functions to the |
-
Recently we had a rogue process start spamming megabyte-sized log lines and, needless to say, bad things started happening. We installed a quick fix at the agent level to drop anything whose `.message` field exceeds some threshold (via strlen), and it got us out of the woods. Maybe in practice checking the message field is sufficient, since all log messages start there no matter the source. But we have a distributed topology and don't control all the agents (or their configs) in our ecosystem, so I need something similar enforced at the "central" layer, which receives data from all the various agents. And technically, by that point, some upstream agent may already have parsed a gigantic message into other fields, so checking only the message field may be insufficient.

The simplest way I've found to estimate payload size is to encode the event as JSON and take a strlen. Obviously this is an overestimate, but since a lot of data is serialized as JSON on the way out, it's not unreasonable. Still, adding a full serialization just to count the bytes feels like overkill, and I'm wondering if there's a better way. I couldn't find anything in the VRL docs, so at the moment this is all I have:
Is there another option? I figure that since Vector is doing a lot of monitoring around bytes moving through the system, this might be an easy opportunity to expose that information via a function like `estimate_size(<any>)`. It feels like this would be far more efficient than allocating and using memory just for a string to be counted.

Thanks in advance for any pointers!
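To make the idea concrete, here is a rough Rust sketch of estimating a compact JSON size by walking a value instead of serializing it. Everything in it is an assumption for illustration: the `Value` enum, `estimated_json_size`, `exceeds_limit`, and `sample_event` are hypothetical names, not Vector's actual types or the traits in the files referenced in this thread. It also ignores string escaping, so it can undercount.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an event value; NOT Vector's real `Value` type.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Value {
    Null,
    Boolean(bool),
    Integer(i64),
    Text(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Number of characters needed to print `n` in decimal, including a minus sign.
fn decimal_len(n: i64) -> usize {
    let mut len = if n < 0 { 1 } else { 0 };
    let mut rest = n.unsigned_abs();
    loop {
        len += 1;
        rest /= 10;
        if rest == 0 {
            break;
        }
    }
    len
}

/// Two quotes plus the raw byte length. Escape sequences are ignored,
/// so strings containing quotes, backslashes, or control characters undercount.
fn estimated_str_size(s: &str) -> usize {
    s.len() + 2
}

/// Sum an estimated compact JSON size (no whitespace) without building the string.
fn estimated_json_size(value: &Value) -> usize {
    match value {
        Value::Null => 4,           // null
        Value::Boolean(true) => 4,  // true
        Value::Boolean(false) => 5, // false
        Value::Integer(n) => decimal_len(*n),
        Value::Text(s) => estimated_str_size(s),
        Value::Array(items) => {
            // Two brackets, one comma between each pair of items, plus the items.
            let commas = items.len().saturating_sub(1);
            2 + commas + items.iter().map(estimated_json_size).sum::<usize>()
        }
        Value::Object(fields) => {
            // Two braces, commas between fields, and key + colon + value per field.
            let commas = fields.len().saturating_sub(1);
            let body: usize = fields
                .iter()
                .map(|(k, v)| estimated_str_size(k) + 1 + estimated_json_size(v))
                .sum();
            2 + commas + body
        }
    }
}

/// Hypothetical policy check: true if the estimate is above `max_bytes`.
fn exceeds_limit(value: &Value, max_bytes: usize) -> bool {
    estimated_json_size(value) > max_bytes
}

/// Small sample event, equivalent to
/// {"count":3,"message":"hello","tags":["a","b"]}
fn sample_event() -> Value {
    let mut fields = BTreeMap::new();
    fields.insert("count".to_string(), Value::Integer(3));
    fields.insert("message".to_string(), Value::Text("hello".to_string()));
    fields.insert(
        "tags".to_string(),
        Value::Array(vec![Value::Text("a".to_string()), Value::Text("b".to_string())]),
    );
    Value::Object(fields)
}
```

With the sample event, `estimated_json_size(&sample_event())` gives 46, which matches the length of its compact JSON form, and no intermediate string is allocated. Whether a VRL function could be built on the existing traits in the same way is a question for the Vector maintainers.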