
enhancement(throttle transform): Allow throttling by bytes #14280

Draft · wants to merge 10 commits into master
Conversation


@jutley jutley commented Sep 4, 2022

Implements #11854.

This is my first time programming in Rust, so some things may be strange. Feedback is greatly appreciated.

I started looking into updating the .cue files for documentation. I saw there are reusable components there, such as the encoding configuration I would like to use; however, that configuration is currently scoped only to sinks, and using it in a transform is entirely new. I'm not sure whether this is a sign that I am doing something in a very strange way, or whether I should just update the necessary pieces to support the encoding feature within transforms. If someone can help me answer this, it would be a big help.


bits-bot commented Sep 4, 2022

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the domain: transforms Anything related to Vector's transform components label Sep 4, 2022

netlify bot commented Sep 4, 2022

Deploy Preview for vector-project ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 4399f35 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/vector-project/deploys/63e3d80922e98d000897211f |
| 😎 Deploy Preview | https://deploy-preview-14280--vector-project.netlify.app |

@bruceg bruceg added type: enhancement A value-adding code change that enhances its existing functionality. transform: throttle Anything `throttle` transform related labels Sep 7, 2022

github-actions bot commented Sep 7, 2022

Soak Test Results

Baseline: 2586b52
Comparison: cb691eb
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for Vector in a repeatable rig, with varying configurations for Vector. What follows is a statistical summary of a brief Vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether Vector's performance has changed as a result of a pull request, and to what degree. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 90.0% confidence, OR that have been detected as newly erratic. Negative values mean the baseline is faster; positive values mean the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table is omitted if no interesting changes are observed.

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| syslog_splunk_hec_logs | 608.32KiB | 3.77 | 100.00% | 15.75MiB | 1.03MiB | 21.38KiB | 0 | 0.0651153 | 16.34MiB | 930.16KiB | 18.96KiB | 0 | 0.0555667 | False | False |
| datadog_agent_remap_blackhole | 2.15MiB | 3.51 | 100.00% | 61.35MiB | 4.05MiB | 84.38KiB | 0 | 0.0659798 | 63.5MiB | 3.23MiB | 67.36KiB | 0 | 0.0508445 | False | False |
| datadog_agent_remap_blackhole_acks | 1.83MiB | 2.98 | 100.00% | 61.19MiB | 3.95MiB | 82.27KiB | 0 | 0.064524 | 63.02MiB | 3.83MiB | 79.82KiB | 0 | 0.0607075 | False | False |
| syslog_loki | 321.13KiB | 2.17 | 100.00% | 14.47MiB | 613.04KiB | 12.56KiB | 0 | 0.0413662 | 14.78MiB | 918.5KiB | 18.67KiB | 0 | 0.0606632 | False | False |
| syslog_log2metric_humio_metrics | 248.02KiB | 1.97 | 100.00% | 12.3MiB | 757.88KiB | 15.46KiB | 0 | 0.0601492 | 12.54MiB | 759.87KiB | 15.49KiB | 0 | 0.059143 | False | False |
| http_pipelines_blackhole_acks | 22.79KiB | 1.88 | 100.00% | 1.18MiB | 131.4KiB | 2.67KiB | 0 | 0.108566 | 1.2MiB | 103.97KiB | 2.12KiB | 0 | 0.0843155 | False | False |
| syslog_regex_logs2metric_ddmetrics | 229.05KiB | 1.8 | 100.00% | 12.42MiB | 728.46KiB | 14.84KiB | 0 | 0.0572878 | 12.64MiB | 643.74KiB | 13.12KiB | 0 | 0.0497294 | False | False |
| syslog_humio_logs | 186.8KiB | 1.14 | 100.00% | 16.05MiB | 833.03KiB | 17.01KiB | 0 | 0.0506664 | 16.24MiB | 767.09KiB | 15.71KiB | 0 | 0.0461313 | False | False |
| splunk_hec_route_s3 | 150.87KiB | 0.78 | 98.20% | 18.81MiB | 2.17MiB | 45.26KiB | 0 | 0.115517 | 18.96MiB | 2.15MiB | 44.91KiB | 0 | 0.113142 | False | False |
| datadog_agent_remap_datadog_logs_acks | 395.18KiB | 0.63 | 99.95% | 61.0MiB | 3.34MiB | 69.66KiB | 0 | 0.0546646 | 61.39MiB | 4.32MiB | 89.87KiB | 0 | 0.0703187 | False | False |
| http_to_http_acks | 92.46KiB | 0.52 | 30.65% | 17.32MiB | 8.26MiB | 172.63KiB | 0 | 0.476605 | 17.41MiB | 7.61MiB | 158.9KiB | 0 | 0.437065 | True | True |
| http_pipelines_no_grok_blackhole | 46.35KiB | 0.41 | 96.34% | 10.93MiB | 55.81KiB | 1.14KiB | 0 | 0.00498346 | 10.98MiB | 1.06MiB | 22.13KiB | 0 | 0.0967773 | False | False |
| datadog_agent_remap_datadog_logs | 78.02KiB | 0.13 | 60.47% | 59.95MiB | 1.31MiB | 27.45KiB | 0 | 0.0218366 | 60.03MiB | 4.21MiB | 87.57KiB | 0 | 0.0700408 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 6.65KiB | 0.03 | 47.83% | 23.83MiB | 380.68KiB | 7.78KiB | 0 | 0.015596 | 23.84MiB | 336.72KiB | 6.87KiB | 0 | 0.0137913 | False | False |
| enterprise_http_to_http | -1.06KiB | -0 | 11.55% | 23.85MiB | 251.57KiB | 5.14KiB | 0 | 0.0103004 | 23.84MiB | 254.99KiB | 5.22KiB | 0 | 0.010441 | False | False |
| splunk_hec_indexer_ack_blackhole | 517.05B | 0 | 1.47% | 23.74MiB | 956.97KiB | 19.46KiB | 0 | 0.0393575 | 23.74MiB | 951.56KiB | 19.35KiB | 0 | 0.0391342 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -6.48KiB | -0.03 | 20.32% | 23.75MiB | 863.01KiB | 17.55KiB | 0 | 0.0354779 | 23.74MiB | 886.87KiB | 18.04KiB | 0 | 0.0364683 | False | False |
| file_to_blackhole | -53.15KiB | -0.05 | 49.64% | 95.37MiB | 2.59MiB | 53.59KiB | 0 | 0.0271021 | 95.31MiB | 2.82MiB | 58.67KiB | 0 | 0.0295701 | False | False |
| http_to_http_json | -49.9KiB | -0.2 | 99.97% | 23.85MiB | 328.73KiB | 6.71KiB | 0 | 0.013459 | 23.8MiB | 585.38KiB | 11.94KiB | 0 | 0.0240158 | False | False |
| fluent_elasticsearch | -209.01KiB | -0.26 | 100.00% | 79.47MiB | 53.75KiB | 1.09KiB | 0 | 0.00066033 | 79.27MiB | 1.74MiB | 35.84KiB | 0 | 0.0219808 | False | False |
| http_to_http_noack | -135.69KiB | -0.56 | 100.00% | 23.84MiB | 265.4KiB | 5.43KiB | 0 | 0.0108673 | 23.71MiB | 1.24MiB | 25.87KiB | 0 | 0.052353 | False | False |
| http_text_to_http_json | -317.18KiB | -0.79 | 100.00% | 39.11MiB | 1.4MiB | 29.31KiB | 0 | 0.0358426 | 38.8MiB | 1.01MiB | 21.12KiB | 0 | 0.0260347 | False | False |
| http_pipelines_blackhole | -14.65KiB | -0.87 | 100.00% | 1.65MiB | 10.51KiB | 219.88B | 0 | 0.00621472 | 1.64MiB | 113.0KiB | 2.3KiB | 0 | 0.0674235 | False | False |
| syslog_log2metric_splunk_hec_metrics | -257.92KiB | -1.42 | 100.00% | 17.69MiB | 884.08KiB | 18.01KiB | 0 | 0.0487961 | 17.44MiB | 1.16MiB | 24.17KiB | 0 | 0.0664929 | False | False |
| socket_to_socket_blackhole | -445.29KiB | -1.91 | 100.00% | 22.82MiB | 701.42KiB | 14.32KiB | 0 | 0.0300077 | 22.39MiB | 642.29KiB | 13.11KiB | 0 | 0.0280121 | False | False |


github-actions bot commented Sep 9, 2022

Soak Test Results

Baseline: c745d25
Comparison: f32ee88
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for Vector in a repeatable rig, with varying configurations for Vector. What follows is a statistical summary of a brief Vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether Vector's performance has changed as a result of a pull request, and to what degree. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 90.0% confidence, OR that have been detected as newly erratic. Negative values mean the baseline is faster; positive values mean the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table is omitted if no interesting changes are observed.

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| datadog_agent_remap_blackhole | 3.04MiB | 5.1 | 100.00% | 59.6MiB | 4.05MiB | 84.44KiB | 0 | 0.0679976 | 62.64MiB | 3.2MiB | 66.73KiB | 0 | 0.0510647 | False | False |
| datadog_agent_remap_blackhole_acks | 2.58MiB | 4.22 | 100.00% | 61.18MiB | 4.25MiB | 88.55KiB | 0 | 0.0695017 | 63.76MiB | 3.67MiB | 76.57KiB | 0 | 0.0575279 | False | False |
| syslog_loki | 435.66KiB | 2.91 | 100.00% | 14.61MiB | 279.27KiB | 5.72KiB | 0 | 0.0186607 | 15.04MiB | 757.43KiB | 15.4KiB | 0 | 0.0491792 | False | False |
| syslog_log2metric_humio_metrics | 317.2KiB | 2.45 | 100.00% | 12.66MiB | 189.34KiB | 3.86KiB | 0 | 0.0146012 | 12.97MiB | 514.69KiB | 10.48KiB | 0 | 0.038742 | False | False |
| datadog_agent_remap_datadog_logs_acks | 1.18MiB | 2 | 100.00% | 58.84MiB | 3.54MiB | 73.82KiB | 0 | 0.0600825 | 60.02MiB | 4.36MiB | 90.7KiB | 0 | 0.0725841 | False | False |
| http_pipelines_no_grok_blackhole | 155.56KiB | 1.42 | 100.00% | 10.72MiB | 113.04KiB | 2.31KiB | 0 | 0.0102972 | 10.87MiB | 1.06MiB | 22.11KiB | 0 | 0.0976729 | False | False |
| datadog_agent_remap_datadog_logs | 872.47KiB | 1.4 | 100.00% | 60.85MiB | 935.09KiB | 19.15KiB | 0 | 0.0150034 | 61.7MiB | 4.19MiB | 87.31KiB | 0 | 0.0679362 | False | False |
| syslog_splunk_hec_logs | 185.8KiB | 1.11 | 100.00% | 16.33MiB | 750.24KiB | 15.27KiB | 0 | 0.0448626 | 16.51MiB | 506.66KiB | 10.35KiB | 0 | 0.0299641 | False | False |
| http_text_to_http_json | 433.98KiB | 1.1 | 100.00% | 38.45MiB | 852.36KiB | 17.4KiB | 0 | 0.0216424 | 38.88MiB | 798.22KiB | 16.29KiB | 0 | 0.0200466 | False | False |
| syslog_humio_logs | 83.28KiB | 0.5 | 100.00% | 16.42MiB | 424.24KiB | 8.67KiB | 0 | 0.0252312 | 16.5MiB | 434.34KiB | 8.9KiB | 0 | 0.0257045 | False | False |
| syslog_log2metric_splunk_hec_metrics | 87.73KiB | 0.48 | 100.00% | 17.91MiB | 615.53KiB | 12.54KiB | 0 | 0.0335531 | 18.0MiB | 860.02KiB | 17.49KiB | 0 | 0.0466573 | False | False |
| splunk_hec_route_s3 | 65.97KiB | 0.34 | 67.42% | 18.72MiB | 2.29MiB | 47.62KiB | 0 | 0.1222 | 18.78MiB | 2.26MiB | 47.32KiB | 0 | 0.120461 | False | False |
| http_to_http_acks | 53.46KiB | 0.3 | 18.37% | 17.39MiB | 7.92MiB | 165.53KiB | 0 | 0.455213 | 17.45MiB | 7.67MiB | 159.94KiB | 0 | 0.439708 | True | True |
| http_pipelines_blackhole_acks | 1.07KiB | 0.09 | 23.84% | 1.18MiB | 133.71KiB | 2.72KiB | 0 | 0.110355 | 1.18MiB | 109.26KiB | 2.23KiB | 0 | 0.0900974 | False | False |
| splunk_hec_indexer_ack_blackhole | 17.78KiB | 0.07 | 49.42% | 23.74MiB | 959.33KiB | 19.51KiB | 0 | 0.039455 | 23.76MiB | 896.93KiB | 18.25KiB | 0 | 0.0368617 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 4.12KiB | 0.02 | 30.70% | 23.83MiB | 388.99KiB | 7.94KiB | 0 | 0.0159355 | 23.84MiB | 331.58KiB | 6.77KiB | 0 | 0.0135813 | False | False |
| enterprise_http_to_http | -1.72KiB | -0.01 | 18.38% | 23.85MiB | 253.93KiB | 5.18KiB | 0 | 0.0103973 | 23.84MiB | 257.76KiB | 5.27KiB | 0 | 0.0105547 | False | False |
| file_to_blackhole | -42.48KiB | -0.04 | 34.82% | 95.34MiB | 2.94MiB | 61.01KiB | 0 | 0.0308625 | 95.3MiB | 3.44MiB | 71.69KiB | 0 | 0.0361286 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -18.74KiB | -0.08 | 55.95% | 23.76MiB | 810.07KiB | 16.48KiB | 0 | 0.0332828 | 23.75MiB | 877.27KiB | 17.84KiB | 0 | 0.0360716 | False | False |
| http_to_http_json | -54.01KiB | -0.22 | 99.99% | 23.85MiB | 333.64KiB | 6.81KiB | 0 | 0.0136599 | 23.79MiB | 605.15KiB | 12.34KiB | 0 | 0.024831 | False | False |
| fluent_elasticsearch | -209.61KiB | -0.26 | 100.00% | 79.47MiB | 53.82KiB | 1.09KiB | 0 | 0.000661234 | 79.27MiB | 1.89MiB | 38.99KiB | 0 | 0.0238994 | False | False |
| http_to_http_noack | -69.04KiB | -0.28 | 99.67% | 23.83MiB | 522.25KiB | 10.68KiB | 0 | 0.0213981 | 23.76MiB | 1.0MiB | 20.92KiB | 0 | 0.04219 | False | False |
| http_pipelines_blackhole | -5.98KiB | -0.35 | 99.28% | 1.68MiB | 36.26KiB | 758.45B | 0 | 0.0210701 | 1.67MiB | 102.86KiB | 2.1KiB | 0 | 0.0599896 | False | False |
| syslog_regex_logs2metric_ddmetrics | -133.19KiB | -1.03 | 100.00% | 12.59MiB | 639.01KiB | 13.02KiB | 0 | 0.0495621 | 12.46MiB | 727.28KiB | 14.82KiB | 0 | 0.0569971 | False | False |
| socket_to_socket_blackhole | -739.41KiB | -3.18 | 100.00% | 22.7MiB | 895.33KiB | 18.28KiB | 0 | 0.038516 | 21.97MiB | 1.08MiB | 22.54KiB | 0 | 0.0490571 | False | False |

Member

@bruceg bruceg left a comment


In addition to my comments on the unit/mode enums below, this needs to update the documentation in website/cue/reference/components/transforms/throttle.cue, along with a big warning about the increased CPU requirements the extra encoding step will cause.

Comment on lines 61 to 62
#[serde(default = "default_unit")]
unit: ThrottleUnit,
Member


I think mode would describe this setting better, as it is changing between different modes of operation, with the "bytes" mode turning on encoding.

Author


This is how I originally named it, but I decided against it because I can see "mode" becoming overloaded. There is another ticket about throttling having different modes for when capacities are hit (dropping data vs. applying backpressure). I moved to "unit" because it is a bit more precise about what the threshold ends up measuring.

I'm totally okay with making the change to unit if you still prefer that, but I wanted to offer the counterargument.

Comment on lines 60 to 65
/// The throttling unit to use.
#[serde(default = "default_unit")]
unit: ThrottleUnit,

/// The encoding to use if throttling by bytes
encoding: Option<EncodingConfig>,
Member


I think this could be done with an enum using serde's internally tagged mode, which would make it impossible to configure the "bytes" mode without an encoding:

enum ThrottleMode {
    Events,
    Bytes {
        encoding: EncodingConfig,
    },
}
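
For reference, a minimal sketch of what that internally tagged form could look like (a sketch only, not Vector's actual code; the stub EncodingConfig and the exact serde attributes are illustrative):

use serde::Deserialize;

// Stand-in for Vector's real EncodingConfig type.
#[derive(Debug, Deserialize)]
struct EncodingConfig {
    codec: String,
}

// Internally tagged: a `mode` key in the config selects the variant, so
// the "bytes" mode cannot be deserialized without an `encoding` field.
#[derive(Debug, Deserialize)]
#[serde(tag = "mode", rename_all = "snake_case")]
enum ThrottleMode {
    Events,
    Bytes { encoding: EncodingConfig },
}

With this shape, mode = "events" needs nothing else, while mode = "bytes" fails to deserialize unless an encoding is also present.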

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, this is possible, with a caveat. If we want these fields at the top level, then I believe we need to use serde's flatten attribute. However, flatten cannot be used in conjunction with deny_unknown_fields, and removing deny_unknown_fields seems to spoil the certainty that this change is supposed to introduce. It is very possible I am missing something, though.

Alternatively, we can avoid flattening. There is nothing wrong with that, but the naming gets a little tricky because we may end up with configuration that looks like the snippet below. I can try to experiment with options, but any further suggestions are appreciated; I am learning as I go.

mode:
  mode: events
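
For illustration, a minimal sketch of the flattened form under discussion, assuming a ThrottleMode enum like the one sketched above (field names are illustrative). The relevant serde limitation is that #[serde(flatten)] cannot be combined with #[serde(deny_unknown_fields)]:

use serde::Deserialize;

// NOTE: adding #[serde(deny_unknown_fields)] here would not work, because
// serde does not support it in combination with #[serde(flatten)].
#[derive(Debug, Deserialize)]
struct ThrottleConfig {
    window_secs: u64,
    threshold: u32,
    key_field: Option<String>,
    // Flattening lifts the `mode` tag (and any variant fields, such as
    // `encoding`) to the top level of the transform's config table.
    #[serde(flatten)]
    mode: ThrottleMode,
}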

Member

@bruceg bruceg Sep 13, 2022


Yes, serde enum encoding is tricky and, yes, dropping deny_unknown_fields is undesirable. I don't have a good suggestion here other than a custom deserializer, so probably leaving the encoding as a separate parameter is the most reasonable path forward. Edit: Since you have already made the change and we have other components with deny_unknown_fields, I'd say leave it here and we'll see what others think.

Author


I happened to come across a similar scenario in the syslog source. It has a mode field, which is an enum with a lot of configuration options. It uses flattening and internal tagging while accepting unknown fields, just as this PR currently does. I think this is a sign that this is the preferred approach.
https://github.com/vectordotdev/vector/blob/master/src/sources/syslog.rs#L36-L37

Two further review threads on src/transforms/throttle.rs were marked outdated and resolved.
bruceg previously approved these changes Sep 13, 2022
@github-actions

Soak Test Results

Baseline: 2c58589
Comparison: b65250d
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for Vector in a repeatable rig, with varying configurations for Vector. What follows is a statistical summary of a brief Vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether Vector's performance has changed as a result of a pull request, and to what degree. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 90.0% confidence, OR that have been detected as newly erratic. Negative values mean the baseline is faster; positive values mean the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table is omitted if no interesting changes are observed.

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean ≥ ±8.87%.

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| datadog_agent_remap_blackhole_acks | 2.98MiB | 4.84 | 100.00% | 61.57MiB | 4.13MiB | 86.04KiB | 0 | 0.067073 | 64.56MiB | 2.13MiB | 44.52KiB | 0 | 0.0329122 | False | False |
| datadog_agent_remap_blackhole | 2.52MiB | 4.14 | 100.00% | 60.74MiB | 4.28MiB | 89.05KiB | 0 | 0.0703917 | 63.25MiB | 2.91MiB | 60.8KiB | 0 | 0.0460502 | False | False |
| syslog_loki | 529.09KiB | 3.77 | 100.00% | 13.72MiB | 558.21KiB | 11.43KiB | 0 | 0.0397193 | 14.24MiB | 882.18KiB | 17.93KiB | 0 | 0.0604933 | False | False |
| http_text_to_http_json | 1.35MiB | 3.58 | 100.00% | 37.62MiB | 848.68KiB | 17.32KiB | 0 | 0.0220248 | 38.97MiB | 835.33KiB | 17.05KiB | 0 | 0.0209285 | False | False |
| datadog_agent_remap_datadog_logs | 2.11MiB | 3.46 | 100.00% | 60.85MiB | 3.22MiB | 67.49KiB | 0 | 0.0529424 | 62.95MiB | 4.23MiB | 88.18KiB | 0 | 0.0672508 | False | False |
| http_to_http_acks | 559.44KiB | 3.24 | 98.06% | 16.84MiB | 8.04MiB | 168.12KiB | 0 | 0.477456 | 17.38MiB | 8.15MiB | 170.15KiB | 0 | 0.468428 | True | True |
| syslog_splunk_hec_logs | 458.66KiB | 2.82 | 100.00% | 15.88MiB | 891.7KiB | 18.16KiB | 0 | 0.0548111 | 16.33MiB | 708.77KiB | 14.46KiB | 0 | 0.042372 | False | False |
| syslog_log2metric_splunk_hec_metrics | 310.98KiB | 1.75 | 100.00% | 17.33MiB | 722.75KiB | 14.73KiB | 0 | 0.0407113 | 17.64MiB | 870.69KiB | 17.73KiB | 0 | 0.0482003 | False | False |
| syslog_regex_logs2metric_ddmetrics | 197.01KiB | 1.55 | 100.00% | 12.45MiB | 581.01KiB | 11.84KiB | 0 | 0.0455601 | 12.64MiB | 500.1KiB | 10.2KiB | 0 | 0.038619 | False | False |
| syslog_humio_logs | 254.0KiB | 1.52 | 100.00% | 16.27MiB | 239.44KiB | 4.89KiB | 0 | 0.0143686 | 16.52MiB | 153.4KiB | 3.14KiB | 0 | 0.00906698 | False | False |
| splunk_hec_route_s3 | 183.93KiB | 0.96 | 99.49% | 18.64MiB | 2.27MiB | 47.19KiB | 0 | 0.121524 | 18.82MiB | 2.18MiB | 45.55KiB | 0 | 0.115646 | False | False |
| http_pipelines_blackhole_acks | 9.2KiB | 0.74 | 99.97% | 1.21MiB | 103.33KiB | 2.1KiB | 0 | 0.0833401 | 1.22MiB | 68.97KiB | 1.41KiB | 0 | 0.0552168 | False | False |
| http_pipelines_no_grok_blackhole | 46.47KiB | 0.42 | 95.84% | 10.77MiB | 116.23KiB | 2.37KiB | 0 | 0.0105348 | 10.82MiB | 1.09MiB | 22.67KiB | 0 | 0.10065 | False | False |
| syslog_log2metric_humio_metrics | 44.64KiB | 0.35 | 98.76% | 12.52MiB | 565.53KiB | 11.54KiB | 0 | 0.0440999 | 12.56MiB | 668.34KiB | 13.62KiB | 0 | 0.0519357 | False | False |
| splunk_hec_indexer_ack_blackhole | 23.33KiB | 0.1 | 64.39% | 23.74MiB | 925.39KiB | 18.82KiB | 0 | 0.0380603 | 23.76MiB | 829.82KiB | 16.89KiB | 0 | 0.0340967 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 10.4KiB | 0.04 | 64.41% | 23.83MiB | 436.09KiB | 8.9KiB | 0 | 0.0178696 | 23.84MiB | 337.74KiB | 6.89KiB | 0 | 0.0138336 | False | False |
| http_pipelines_blackhole | -2.01B | -0 | 0.08% | 1.66MiB | 22.85KiB | 478.44B | 0 | 0.0134438 | 1.66MiB | 87.96KiB | 1.79KiB | 0 | 0.0517506 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -5.2KiB | -0.02 | 17.21% | 23.76MiB | 829.22KiB | 16.87KiB | 0 | 0.0340686 | 23.76MiB | 834.22KiB | 16.97KiB | 0 | 0.0342815 | False | False |
| enterprise_http_to_http | -3.71KiB | -0.02 | 38.83% | 23.85MiB | 251.4KiB | 5.13KiB | 0 | 0.010293 | 23.84MiB | 253.88KiB | 5.19KiB | 0 | 0.0103961 | False | False |
| file_to_blackhole | -35.65KiB | -0.04 | 31.40% | 95.34MiB | 3.18MiB | 66.0KiB | 0 | 0.0333852 | 95.3MiB | 2.81MiB | 58.5KiB | 0 | 0.0295081 | False | False |
| http_to_http_json | -41.32KiB | -0.17 | 99.80% | 23.84MiB | 345.71KiB | 7.06KiB | 0 | 0.0141567 | 23.8MiB | 556.12KiB | 11.34KiB | 0 | 0.0228111 | False | False |
| fluent_elasticsearch | -220.99KiB | -0.27 | 99.77% | 79.47MiB | 52.45KiB | 1.06KiB | 0 | 0.000644356 | 79.26MiB | 3.52MiB | 72.48KiB | 0 | 0.0444091 | False | False |
| http_to_http_noack | -86.2KiB | -0.35 | 99.94% | 23.83MiB | 515.08KiB | 10.52KiB | 0 | 0.0211056 | 23.74MiB | 1.1MiB | 22.88KiB | 0 | 0.0462045 | False | False |
| socket_to_socket_blackhole | -609.89KiB | -2.54 | 100.00% | 23.42MiB | 381.89KiB | 7.8KiB | 0 | 0.015919 | 22.83MiB | 350.6KiB | 7.16KiB | 0 | 0.0149959 | False | False |
| datadog_agent_remap_datadog_logs_acks | -4.59MiB | -7.44 | 100.00% | 61.72MiB | 3.55MiB | 74.18KiB | 0 | 0.0575641 | 57.13MiB | 6.92MiB | 144.08KiB | 0 | 0.121131 | False | False |

@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Sep 15, 2022
Author

jutley commented Sep 16, 2022

I believe this PR is now ready for a proper review. I have updated the cue docs and swapped the naming from unit to mode.

@JeanMertz
Contributor

Hi @jutley,

We discussed this contribution during our team check-in. We are excited for this transform to receive this feature. One concern we have with this PR is that the requirement to serialize the data in order to count its bytes can introduce a significant and unexpected overhead to this transform.

While our concern is based on assumptions, those assumptions are based on our experience dealing with, and past benchmarking of, event serialization in the hot path.

One potential solution we could see is using work that is currently in flight (and should land this week) to estimate the JSON-encoded size of an event with minimal overhead.

There are two downsides to this:

  1. As mentioned, it is an estimation (for performance reasons, this implementation does not count added escape characters or invalid UTF-8 byte sequences replaced during serialization).
  2. It only works for estimating the JSON-encoded size of an event; no other format is supported.

Would either of these downsides be an issue for your use case? If not, we would probably prefer that direction for this PR, while keeping the possibility open (by baking this into the config API) of adding exact byte-size counting in the future, if enough users need it and accept the trade-off of reduced throughput.

Additionally, we could introduce a new soak test in this PR that takes an existing test and adds the throttle transform in the middle, using byte-size throttling, to measure the impact on overall throughput.
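
As a rough illustration of the estimation idea (a sketch only, not the in-flight Vector implementation), the JSON-encoded size of an event can be approximated by walking its values without serializing them, deliberately ignoring escape characters and invalid-UTF-8 replacement, which is exactly the first trade-off listed above:

use serde_json::Value;

// Approximate the JSON-encoded byte size of a value without serializing it.
// String escapes and invalid-UTF-8 replacement are ignored on purpose, so
// this is a fast estimate rather than an exact count.
fn estimated_json_size(value: &Value) -> usize {
    match value {
        Value::Null => 4,                         // "null"
        Value::Bool(b) => if *b { 4 } else { 5 }, // "true" / "false"
        Value::Number(n) => n.to_string().len(),
        Value::String(s) => s.len() + 2,          // surrounding quotes
        Value::Array(a) => {
            // brackets, commas, and the elements themselves
            2 + a.len().saturating_sub(1)
                + a.iter().map(estimated_json_size).sum::<usize>()
        }
        Value::Object(o) => {
            // braces, commas, and `"key":value` pairs
            2 + o.len().saturating_sub(1)
                + o.iter()
                    .map(|(k, v)| k.len() + 3 + estimated_json_size(v))
                    .sum::<usize>()
        }
    }
}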

Author

jutley commented Sep 19, 2022

@JeanMertz These ideas sound totally reasonable. Using a true encoder has always felt simultaneously elegant and incredibly clumsy to me. I'll wait for this encoding estimation work to land and then continue from there.

I think the direction you are suggesting will work. That said, there is a bit of flexibility that the encoding provides that I was planning on exploring. In particular, I think it will be useful to be able to choose the fields to use in the encoding, or to choose the text encoding to get just the message field. I'm not sure what options the JSON encoding estimation will provide.

If this feature doesn't offer much configuration, I may look to implement different methods for getting a byte count for an event. If nothing else, I think it'll be useful (at least for me) to support the full JSON and just the message.


netlify bot commented Nov 6, 2022

Deploy Preview for vrl-playground ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 4399f35 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/vrl-playground/deploys/63e3d80909541f0008eabe98 |
| 😎 Deploy Preview | https://deploy-preview-14280--vrl-playground.netlify.app |

@github-actions github-actions bot removed the domain: external docs Anything related to Vector's external, public documentation label Nov 6, 2022
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Nov 6, 2022
Author

jutley commented Nov 6, 2022

@JeanMertz I managed to find a little time to get back to this PR. I rewrote it to use the estimated JSON size, and it also supports counting the bytes of the message field. I haven't written a soak test, but I can figure that out if you think it is necessary.

Author

jutley commented Nov 17, 2022

@JeanMertz anything I can do to help move this along?


github-actions bot commented Feb 6, 2023

Regression Test Results

Run ID: 9c2dc1af-ed9f-4eff-b0cf-ef77391eb92f
Baseline: 50933e9
Comparison: f0d84f0
Total vector CPUs: 7

Explanation

A regression test is an integrated performance test for Vector in a repeatable rig, with varying configurations for Vector. What follows is a statistical summary of a brief Vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether Vector's performance has changed as a result of a pull request, and to what degree. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their bytes_written_per_cpu_second performance between the baseline and comparison SHAs, with 90.0% confidence, OR that have been detected as newly erratic. Negative values mean the baseline is faster; positive values mean the comparison is faster. Results that do not exhibit more than a ±5% change in mean bytes_written_per_cpu_second are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table is omitted if no interesting changes are observed.

No interesting changes in bytes_written_per_cpu_second with confidence ≥ 90.00% and absolute Δ mean ≥ ±5%.

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| syslog_regex_logs2metric_ddmetrics | 108.83KiB/CPU-s | 2.96 | 100.00% | 3.59MiB/CPU-s | 487.13KiB/CPU-s | 6.29KiB/CPU-s | 0.0 | 0.132447 | 3.7MiB/CPU-s | 461.85KiB/CPU-s | 5.96KiB/CPU-s | 0.0 | 0.121964 | True | False |
| syslog_log2metric_humio_metrics | 175.77KiB/CPU-s | 2.8 | 100.00% | 6.13MiB/CPU-s | 191.95KiB/CPU-s | 2.48KiB/CPU-s | 0.0 | 0.030564 | 6.3MiB/CPU-s | 101.11KiB/CPU-s | 1.31KiB/CPU-s | 0.0 | 0.015661 | False | False |
| socket_to_socket_blackhole | 298.14KiB/CPU-s | 2.25 | 100.00% | 12.91MiB/CPU-s | 426.64KiB/CPU-s | 5.51KiB/CPU-s | 0.0 | 0.032264 | 13.2MiB/CPU-s | 354.21KiB/CPU-s | 4.58KiB/CPU-s | 0.0 | 0.026196 | False | False |
| http_text_to_http_json | 366.06KiB/CPU-s | 1.54 | 100.00% | 23.18MiB/CPU-s | 946.07KiB/CPU-s | 12.21KiB/CPU-s | 0.0 | 0.039855 | 23.54MiB/CPU-s | 772.67KiB/CPU-s | 9.97KiB/CPU-s | 0.0 | 0.032056 | False | False |
| http_to_http_acks | 72.62KiB/CPU-s | 1.36 | 84.23% | 5.23MiB/CPU-s | 2.73MiB/CPU-s | 36.07KiB/CPU-s | 0.0 | 0.52148 | 5.3MiB/CPU-s | 2.77MiB/CPU-s | 36.62KiB/CPU-s | 0.0 | 0.522565 | True | False |
| splunk_hec_route_s3 | 158.25KiB/CPU-s | 1.36 | 100.00% | 11.4MiB/CPU-s | 619.44KiB/CPU-s | 7.99KiB/CPU-s | 0.0 | 0.053067 | 11.55MiB/CPU-s | 671.22KiB/CPU-s | 8.66KiB/CPU-s | 0.0 | 0.056734 | False | False |
| otlp_grpc_to_blackhole | 13.41KiB/CPU-s | 1.29 | 100.00% | 1.02MiB/CPU-s | 45.41KiB/CPU-s | 600.2B/CPU-s | 0.0 | 0.043545 | 1.03MiB/CPU-s | 49.67KiB/CPU-s | 656.36B/CPU-s | 0.0 | 0.047027 | False | False |
| syslog_humio_logs | 118.22KiB/CPU-s | 1.28 | 100.00% | 9.03MiB/CPU-s | 303.06KiB/CPU-s | 3.91KiB/CPU-s | 0.0 | 0.032762 | 9.15MiB/CPU-s | 160.11KiB/CPU-s | 2.07KiB/CPU-s | 0.0 | 0.01709 | False | False |
| syslog_splunk_hec_logs | 73.06KiB/CPU-s | 0.78 | 100.00% | 9.09MiB/CPU-s | 116.54KiB/CPU-s | 1.5KiB/CPU-s | 0.0 | 0.012518 | 9.16MiB/CPU-s | 229.71KiB/CPU-s | 2.96KiB/CPU-s | 0.0 | 0.02448 | False | False |
| datadog_agent_remap_datadog_logs | 183.01KiB/CPU-s | 0.53 | 100.00% | 33.91MiB/CPU-s | 1.35MiB/CPU-s | 17.83KiB/CPU-s | 0.0 | 0.039805 | 34.09MiB/CPU-s | 1.22MiB/CPU-s | 16.12KiB/CPU-s | 0.0 | 0.035792 | False | False |
| syslog_log2metric_splunk_hec_metrics | 45.71KiB/CPU-s | 0.47 | 100.00% | 9.41MiB/CPU-s | 137.63KiB/CPU-s | 1.78KiB/CPU-s | 0.0 | 0.01428 | 9.46MiB/CPU-s | 275.34KiB/CPU-s | 3.56KiB/CPU-s | 0.0 | 0.028434 | False | False |
| http_to_http_json | 6.98KiB/CPU-s | 0.05 | 90.39% | 13.61MiB/CPU-s | 239.51KiB/CPU-s | 3.09KiB/CPU-s | 0.0 | 0.017182 | 13.62MiB/CPU-s | 219.95KiB/CPU-s | 2.84KiB/CPU-s | 0.0 | 0.015771 | False | False |
| splunk_hec_indexer_ack_blackhole | 3.9KiB/CPU-s | 0.03 | 60.87% | 13.62MiB/CPU-s | 262.23KiB/CPU-s | 3.38KiB/CPU-s | 0.0 | 0.018806 | 13.62MiB/CPU-s | 235.2KiB/CPU-s | 3.03KiB/CPU-s | 0.0 | 0.016863 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | 3.58KiB/CPU-s | 0.03 | 65.68% | 13.62MiB/CPU-s | 220.87KiB/CPU-s | 2.85KiB/CPU-s | 0.0 | 0.015833 | 13.63MiB/CPU-s | 191.71KiB/CPU-s | 2.48KiB/CPU-s | 0.0 | 0.01374 | False | False |
| file_to_blackhole | 8.6KiB/CPU-s | 0.02 | 31.77% | 54.49MiB/CPU-s | 1.12MiB/CPU-s | 14.85KiB/CPU-s | 0.0 | 0.020623 | 54.5MiB/CPU-s | 1.13MiB/CPU-s | 14.87KiB/CPU-s | 0.0 | 0.020662 | False | False |
| enterprise_http_to_http | 2.02KiB/CPU-s | 0.01 | 48.03% | 13.62MiB/CPU-s | 190.24KiB/CPU-s | 2.46KiB/CPU-s | 0.0 | 0.013637 | 13.62MiB/CPU-s | 151.78KiB/CPU-s | 1.96KiB/CPU-s | 0.0 | 0.010879 | False | False |
| fluent_elasticsearch | -100.71B/CPU-s | -0.0 | 14.31% | 45.41MiB/CPU-s | 30.03KiB/CPU-s | 392.35B/CPU-s | 0.0 | 0.000646 | 45.41MiB/CPU-s | 30.42KiB/CPU-s | 397.54B/CPU-s | 0.0 | 0.000654 | False | False |
| http_to_http_noack | 118.85B/CPU-s | 0.0 | 1.65% | 13.61MiB/CPU-s | 309.13KiB/CPU-s | 3.99KiB/CPU-s | 0.0 | 0.022173 | 13.61MiB/CPU-s | 306.43KiB/CPU-s | 3.95KiB/CPU-s | 0.0 | 0.021979 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -1.02KiB/CPU-s | -0.01 | 12.85% | 13.62MiB/CPU-s | 343.33KiB/CPU-s | 4.43KiB/CPU-s | 0.0 | 0.024621 | 13.62MiB/CPU-s | 348.41KiB/CPU-s | 4.5KiB/CPU-s | 0.0 | 0.024987 | False | False |
| datadog_agent_remap_blackhole | -78.45KiB/CPU-s | -0.25 | 99.85% | 30.55MiB/CPU-s | 1.12MiB/CPU-s | 14.78KiB/CPU-s | 0.0 | 0.036576 | 30.47MiB/CPU-s | 1.5MiB/CPU-s | 19.82KiB/CPU-s | 0.0 | 0.049214 | False | False |
| datadog_agent_remap_datadog_logs_acks | -127.75KiB/CPU-s | -0.37 | 100.00% | 33.58MiB/CPU-s | 1.28MiB/CPU-s | 16.97KiB/CPU-s | 0.0 | 0.038229 | 33.46MiB/CPU-s | 1019.24KiB/CPU-s | 13.16KiB/CPU-s | 0.0 | 0.029748 | False | False |
| datadog_agent_remap_blackhole_acks | -428.67KiB/CPU-s | -1.35 | 100.00% | 30.95MiB/CPU-s | 1.18MiB/CPU-s | 15.58KiB/CPU-s | 0.0 | 0.03807 | 30.53MiB/CPU-s | 1.26MiB/CPU-s | 16.59KiB/CPU-s | 0.0 | 0.041103 | False | False |
| syslog_loki | -127.98KiB/CPU-s | -1.41 | 100.00% | 8.87MiB/CPU-s | 236.6KiB/CPU-s | 3.05KiB/CPU-s | 0.0 | 0.026059 | 8.74MiB/CPU-s | 324.56KiB/CPU-s | 4.19KiB/CPU-s | 0.0 | 0.036258 | False | False |
| otlp_http_to_blackhole | -23.69KiB/CPU-s | -1.51 | 100.00% | 1.53MiB/CPU-s | 115.02KiB/CPU-s | 1.48KiB/CPU-s | 0.0 | 0.073357 | 1.51MiB/CPU-s | 120.06KiB/CPU-s | 1.55KiB/CPU-s | 0.0 | 0.077744 | False | False |

@jszwedko jszwedko assigned bruceg and unassigned JeanMertz Feb 7, 2023
@jszwedko jszwedko requested review from jszwedko and bruceg and removed request for JeanMertz and fuchsnj February 7, 2023 21:30
Member

@jszwedko jszwedko left a comment


Hi @jutley !

Thanks for this contribution! Picking this up since @JeanMertz is out on leave.

I think this is a great feature, but I think we could model it a bit better by allowing all three limits to be specified at once, rather than using a mode switch. This would let users, for example, throttle by both the number of events and their JSON size, which otherwise wouldn't be possible. I also think it makes the threshold configuration field a little less confusing to understand.

My suggested configuration UX would be something like:

[transforms.foo]
type = "throttle"
key_field = "service"
window_secs = 60
threshold.events = 1000 # default to unlimited
threshold.json_bytes = 100000 # default to unlimited
threshold.message_bytes = 1000000 # default to unlimited

The first threshold to be hit would start throttling the incoming messages. This would mean maintaining three quotas rather than just one.

What do you think of this UX? I realize you are new to Rust and so adding this functionality may be difficult but we can help guide.
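
For illustration, the threshold table above could be modeled as a small config struct along these lines (a sketch; the names mirror the suggested TOML, with every limit defaulting to unlimited):

use serde::Deserialize;

// Each limit is optional; an absent field means "unlimited". The first
// quota to be exhausted starts throttling events for that key.
#[derive(Debug, Default, Deserialize)]
#[serde(deny_unknown_fields)]
struct ThresholdConfig {
    events: Option<u32>,
    json_bytes: Option<u64>,
    message_bytes: Option<u64>,
}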

@jutley jutley requested a review from a team February 8, 2023 17:12
Author

jutley commented Feb 8, 2023

@jszwedko That's a great idea. It seems like an obvious choice now that it's been said! I think I can figure out the implementation by referencing what already exists. I'll reach out if I need help!

Author

jutley commented Feb 8, 2023

@jszwedko I started working on this (slowly), and I am running into a limitation with the RateLimiters. If one RateLimiter allows a message but the next RateLimiter does not, then the message should get dropped. However, when this check happens, the message still counts against the first RateLimiter.

Looking at the RateLimiter docs, I'm not seeing any kind of dry-run functionality. Any suggestions?

Member

jszwedko commented Feb 10, 2023

> @jszwedko I started working on this (slowly), and I am running into a limitation with the RateLimiters. If one RateLimiter allows a message but the next RateLimiter does not, then the message should get dropped. However, when this check happens, the message still counts against the first RateLimiter.
>
> Looking at the RateLimiter docs, I'm not seeing any kind of dry-run functionality. Any suggestions?

Aha, yes, you appear to be right. There isn't a way to check the quota without consuming it or to replace tokens outside of the refresh interval 🤔 Given that, I'm not seeing an easy way to maintain multiple quotas for the same input stream. I opened boinkor-net/governor#167 on the upstream crate to see if there is something we are missing.

Absent that support, I think the best we can do is something like the current implementation where you can only choose one of the characteristics to throttle on.

However! I was also thinking of a potentially more flexible interface for this, which would be to allow providing a VRL expression that expresses how many tokens are used by a given incoming event. The interface for this would be something like:

[transforms.foo]
type = "throttle"
key_field = "service"
window_secs = 60
threshold = 1000

tokens = "1" # the default, take one token for each event

# or

tokens = "len(.message)" # take N tokens based on the length of the message

# or

tokens = "len(encode_json(.))" # take N tokens based on the JSON-encoded length of the message

# or

tokens = """
if .status == "500" {
  0
} else {
  1
}
""" # for 500s take a different amount of tokens

The JSON one would be rather expensive since it involves encoding, and I know that was the motivation for using the estimated JSON encoding in this PR. I think we could resolve that by adding a new estimated_json_size() method to VRL that returns the estimated size.

This again has the advantage of having threshold make sense as just a "number of tokens", rather than having it represent different units depending on the mode.
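
For illustration, a sketch of how a per-event token cost could be charged against governor's keyed rate limiter (the API shape follows recent governor releases, where check_key_n exists; exact return types vary by version, and the fixed cost below stands in for the VRL-computed value):

use std::num::NonZeroU32;

use governor::{Quota, RateLimiter};

fn main() {
    // 1000 tokens per second per key, mirroring `threshold = 1000`.
    let quota = Quota::per_second(NonZeroU32::new(1000).unwrap());
    let limiter = RateLimiter::keyed(quota);

    let key = "service-a".to_string();
    // Stand-in for the VRL-computed cost, e.g. len(.message) == 250.
    let cost = NonZeroU32::new(250).unwrap();

    match limiter.check_key_n(&key, cost) {
        // Tokens were available and are now consumed: forward the event.
        Ok(Ok(())) => println!("event passes"),
        // Not enough tokens at the moment: throttle the event.
        Ok(Err(_not_until)) => println!("event throttled"),
        // The cost exceeds the burst size and could never succeed.
        Err(_insufficient_capacity) => println!("event exceeds the quota"),
    }
}

Note that a successful check consumes its tokens immediately, which is the same property that makes combining several independent limiters (as in the multi-threshold idea above) difficult.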

@jszwedko jszwedko assigned fuchsnj and unassigned bruceg Feb 13, 2023
Author

jutley commented Feb 13, 2023

That approach makes sense. I suspect that if someone wants the "multi-mode" approach, they can chain throttle transforms together. That should allow us to guarantee something like "no more than x events and y bytes per second per key". There are probably some subtle mathematical differences between the chained approach and the multi-mode approach, but I suspect they won't be particularly important for most real-world scenarios. We're dropping data intentionally either way.

It'll take me a little longer to get around to exploring this approach. I think the estimated_json_size method is completely orthogonal to this throttle tokens effort, so I'll not worry about that piece.

@jszwedko
Member

> It'll take me a little longer to get around to exploring this approach. I think the estimated_json_size method is completely orthogonal to this throttle tokens effort, so I'll not worry about that piece.

Yeah, agreed, we can split that bit off into its own PR since it'll involve an addition to VRL.

@jszwedko
Member

Looks like there was a governor feature request that I missed that discusses our exact issue: boinkor-net/governor#156

@jszwedko jszwedko added the meta: awaiting author Pull requests that are awaiting their author. label Mar 24, 2023
Author

jutley commented May 9, 2023

Just leaving a note to say that I took a look at the recommended approach, and while I do believe it should be possible, I am just not proficient enough in Rust to handle it in a timely manner. Small things take me a long time, and I just don't have the time for this.

If anyone else is interested in picking this up, feel free!
