scheduler: make backoff base and cap configurable #2870

Open
wants to merge 1 commit into
base: master

Conversation

@abiliojr abiliojr commented Dec 16, 2020

Currently the backoff limits are hard-coded to produce a random delay between 5 and 2000 seconds (+1). These values are fine for a broad range of applications, but different ones can sometimes give a quicker recovery after a long outage, or help an under-powered server survive a larger number of clients. For reference, Google IoT recommends a backoff cap of 32 or 64 seconds, well under the 2000 s.

These changes allow the user to customize those times.
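
To make the role of the two settings concrete, here is a minimal sketch of a jittered exponential backoff with a configurable base and cap. It is not the code in this PR: the function names are hypothetical, and the growth rule (the upper bound doubles per attempt until it reaches the cap) is an assumption based on the name `backoff_full_jitter`. It draws from [base, upper] to match the "between 5 and 2000" description above, whereas the usual full-jitter formulation draws from [0, upper].

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: integer in [lo, hi]. rand() keeps the sketch short;
 * it has a slight modulo bias and a limited range (RAND_MAX). */
static int random_between(int lo, int hi)
{
    return lo + rand() % (hi - lo + 1);
}

/* Sketch only: the upper bound starts at base, doubles once per attempt,
 * and is clamped to cap. The wait time is drawn from [base, upper]. */
int backoff_full_jitter_sketch(int base, int cap, int tries)
{
    long long upper = base;
    int i;

    assert(base > 0 && cap >= base && tries >= 0);

    for (i = 0; i < tries && upper < cap; i++) {
        upper *= 2;
    }
    if (upper > cap) {
        upper = cap;
    }
    return random_between(base, (int) upper);
}
```

Under these assumptions, a base of 1 and a cap of 3 can only ever yield waits of 1, 2, or 3 seconds, however many attempts have been made.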


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Documentation

  • Documentation required for this feature

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

abiliojr commented Dec 16, 2020

Demo configuration

[SERVICE]
    Backoff_Base 1
    Backoff_Cap 3

[INPUT]
    Name dummy
    Tag dummy.in

[OUTPUT]
    Name http
    Match *
    host localhost
    port 8080
    Retry_Limit False

@abiliojr
Author

Logs

$ ./build/bin/fluent-bit -v -c demo.cfg
Fluent Bit v1.7.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/12/16 22:09:24] [ info] Configuration:
[2020/12/16 22:09:24] [ info]  flush time     | 5.000000 seconds
[2020/12/16 22:09:24] [ info]  grace          | 5 seconds
[2020/12/16 22:09:24] [ info]  daemon         | 0
[2020/12/16 22:09:24] [ info]  backoff base   | 1 seconds
[2020/12/16 22:09:24] [ info]  backoff cap    | 3 seconds
[2020/12/16 22:09:24] [ info] ___________
[2020/12/16 22:09:24] [ info]  inputs:
[2020/12/16 22:09:24] [ info]      dummy
[2020/12/16 22:09:24] [ info] ___________
[2020/12/16 22:09:24] [ info]  filters:
[2020/12/16 22:09:24] [ info] ___________
[2020/12/16 22:09:24] [ info]  outputs:
[2020/12/16 22:09:24] [ info]      http.0
[2020/12/16 22:09:24] [ info] ___________
[2020/12/16 22:09:24] [ info]  collectors:
[2020/12/16 22:09:24] [ info] [engine] started (pid=30675)
[2020/12/16 22:09:24] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/12/16 22:09:24] [debug] [storage] [cio stream] new stream registered: dummy.0
[2020/12/16 22:09:24] [ info] [storage] version=1.1.0, initializing...
[2020/12/16 22:09:24] [ info] [storage] in-memory
[2020/12/16 22:09:24] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/12/16 22:09:24] [debug] [http:http.0] created event channels: read=20 write=21
[2020/12/16 22:09:24] [debug] [router] match rule dummy.0:http.0
[2020/12/16 22:09:24] [ info] [sp] stream processor started
[2020/12/16 22:09:28] [debug] [task] created task=0x7ff234006460 id=0 OK
[2020/12/16 22:09:28] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:28] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 104
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:28] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:28] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:28] [debug] [retry] new retry created for task_id=0 attempts=1
[2020/12/16 22:09:28] [ warn] [engine] failed to flush chunk '30675-1608152964.661220233.flb', retry in 1 seconds: task_id=0, input=dummy.0 > output=http.0
[2020/12/16 22:09:29] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:29] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 104
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:29] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:29] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:29] [debug] [retry] re-using retry for task_id=0 attempts=2
[2020/12/16 22:09:29] [ warn] [engine] failed to flush chunk '30675-1608152964.661220233.flb', retry in 2 seconds: task_id=0, input=dummy.0 > output=http.0
[2020/12/16 22:09:31] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:31] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 104
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:31] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:31] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:31] [debug] [retry] re-using retry for task_id=0 attempts=3
[2020/12/16 22:09:31] [ warn] [engine] failed to flush chunk '30675-1608152964.661220233.flb', retry in 1 seconds: task_id=0, input=dummy.0 > output=http.0
[2020/12/16 22:09:32] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:32] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 104
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:32] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:32] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:32] [debug] [retry] re-using retry for task_id=0 attempts=4
[2020/12/16 22:09:32] [ warn] [engine] failed to flush chunk '30675-1608152964.661220233.flb', retry in 2 seconds: task_id=0, input=dummy.0 > output=http.0
[2020/12/16 22:09:33] [debug] [task] created task=0x7ff234011000 id=1 OK
[2020/12/16 22:09:33] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:33] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 130
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:33] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:33] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:33] [debug] [retry] new retry created for task_id=1 attempts=1
[2020/12/16 22:09:33] [ warn] [engine] failed to flush chunk '30675-1608152968.661751826.flb', retry in 2 seconds: task_id=1, input=dummy.0 > output=http.0
[2020/12/16 22:09:34] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:34] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 104
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:34] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:34] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:34] [debug] [retry] re-using retry for task_id=0 attempts=5
[2020/12/16 22:09:34] [ warn] [engine] failed to flush chunk '30675-1608152964.661220233.flb', retry in 3 seconds: task_id=0, input=dummy.0 > output=http.0
^C[2020/12/16 22:09:35] [engine] caught signal (SIGINT)
[2020/12/16 22:09:35] [debug] [task] created task=0x7ff2340106f0 id=2 OK
[2020/12/16 22:09:35] [ warn] [engine] service will stop in 5 seconds
[2020/12/16 22:09:35] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:35] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 52
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:35] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:35] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:35] [debug] [retry] new retry created for task_id=2 attempts=1
[2020/12/16 22:09:35] [ warn] [engine] failed to flush chunk '30675-1608152973.661172829.flb', retry in 2 seconds: task_id=2, input=dummy.0 > output=http.0
[2020/12/16 22:09:35] [debug] [input chunk] dummy.0 is paused, cannot append records
[2020/12/16 22:09:35] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:35] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 130
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:35] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:35] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:35] [debug] [retry] re-using retry for task_id=1 attempts=2
[2020/12/16 22:09:35] [ warn] [engine] failed to flush chunk '30675-1608152968.661751826.flb', retry in 2 seconds: task_id=1, input=dummy.0 > output=http.0
[2020/12/16 22:09:36] [debug] [input chunk] dummy.0 is paused, cannot append records
[2020/12/16 22:09:36] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:36] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 52
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:36] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:36] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:36] [debug] [retry] re-using retry for task_id=2 attempts=2
[2020/12/16 22:09:36] [ warn] [engine] failed to flush chunk '30675-1608152973.661172829.flb', retry in 2 seconds: task_id=2, input=dummy.0 > output=http.0
[2020/12/16 22:09:37] [debug] [input chunk] dummy.0 is paused, cannot append records
[2020/12/16 22:09:37] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:37] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 104
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:37] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:37] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:37] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:37] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 130
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:37] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:37] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:37] [debug] [retry] re-using retry for task_id=0 attempts=6
[2020/12/16 22:09:37] [ warn] [engine] failed to flush chunk '30675-1608152964.661220233.flb', retry in 1 seconds: task_id=0, input=dummy.0 > output=http.0
[2020/12/16 22:09:37] [debug] [retry] re-using retry for task_id=1 attempts=3
[2020/12/16 22:09:37] [ warn] [engine] failed to flush chunk '30675-1608152968.661751826.flb', retry in 2 seconds: task_id=1, input=dummy.0 > output=http.0
[2020/12/16 22:09:38] [debug] [input chunk] dummy.0 is paused, cannot append records
[2020/12/16 22:09:38] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:38] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 52
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:38] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:38] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:38] [debug] [http_client] not using http_proxy for header
[2020/12/16 22:09:38] [debug] [http_client] header=POST / HTTP/1.1
Host: localhost:8080
Content-Length: 104
Content-Type: application/msgpack
User-Agent: Fluent-Bit


[2020/12/16 22:09:38] [error] [src/flb_http_client.c:1133 errno=32] Broken pipe
[2020/12/16 22:09:38] [error] [output:http:http.0] could not flush records to localhost:8080 (http_do=-1)
[2020/12/16 22:09:38] [debug] [retry] re-using retry for task_id=2 attempts=3
[2020/12/16 22:09:38] [ warn] [engine] failed to flush chunk '30675-1608152973.661172829.flb', retry in 3 seconds: task_id=2, input=dummy.0 > output=http.0
[2020/12/16 22:09:38] [debug] [retry] re-using retry for task_id=0 attempts=7
[2020/12/16 22:09:38] [ warn] [engine] failed to flush chunk '30675-1608152964.661220233.flb', retry in 2 seconds: task_id=0, input=dummy.0 > output=http.0
[2020/12/16 22:09:39] [ info] [engine] service stopped
[2020/12/16 22:09:39] [debug] [task] destroy task=0x7ff234006460 (task_id=0)
[2020/12/16 22:09:39] [debug] [retry] task retry=0x7ff234011690, invalidated from the scheduler
[2020/12/16 22:09:39] [debug] [task] destroy task=0x7ff234011000 (task_id=1)
[2020/12/16 22:09:39] [debug] [retry] task retry=0x7ff23400fee0, invalidated from the scheduler
[2020/12/16 22:09:39] [debug] [task] destroy task=0x7ff2340106f0 (task_id=2)
[2020/12/16 22:09:39] [debug] [retry] task retry=0x7ff234011660, invalidated from the scheduler

@abiliojr
Author

Valgrind report:

==32246== 
==32246== HEAP SUMMARY:
==32246==     in use at exit: 0 bytes in 0 blocks
==32246==   total heap usage: 607 allocs, 607 frees, 1,141,385 bytes allocated
==32246== 
==32246== All heap blocks were freed -- no leaks are possible
==32246== 
==32246== For counts of detected and suppressed errors, rerun with: -v
==32246== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@abiliojr force-pushed the configurable_backoff branch from 629f331 to 66c77a4 on December 16, 2020 21:47
@abiliojr
Author

Documentation pull request is at: fluent/fluent-bit-docs#435

Member

@edsiper edsiper left a comment

Thanks for opening this PR.

Please rework the PR with the suggested changes.

@@ -266,8 +266,12 @@ int flb_sched_request_create(struct flb_config *config, void *data, int tries)
timer->event.mask = MK_EVENT_EMPTY;

/* Get suggested wait_time for this request */
seconds = backoff_full_jitter(FLB_SCHED_BASE, FLB_SCHED_CAP, tries);
seconds += 1;
if (config->backoff_base <= 0 || config->backoff_cap <= 0 || config->backoff_base > config->backoff_cap) {

This config check should not be done at runtime; it should be done at startup, since the values never change after the service has started.
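
One way to read this suggestion is sketched below: the same condition from the diff is evaluated once, right after the configuration is loaded, instead of on every scheduled retry. The struct and function names are hypothetical stand-ins and are not part of Fluent Bit; only the `backoff_base` and `backoff_cap` fields and the condition itself come from the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two new fields on struct flb_config. */
struct backoff_settings {
    int backoff_base;
    int backoff_cap;
};

/* Intended to be called once at startup, after the config file is parsed.
 * Returns 0 if the settings are usable, -1 otherwise. */
int backoff_settings_validate(const struct backoff_settings *s)
{
    assert(s != NULL);

    if (s->backoff_base <= 0 || s->backoff_cap <= 0 ||
        s->backoff_base > s->backoff_cap) {
        fprintf(stderr, "invalid backoff settings: base=%d cap=%d\n",
                s->backoff_base, s->backoff_cap);
        return -1;
    }
    return 0;
}
```

With a check like this in the startup path, the scheduler could use the two values on each retry without re-validating them.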

@@ -56,6 +56,14 @@ struct flb_service_config service_configs[] = {
FLB_CONF_TYPE_INT,
offsetof(struct flb_config, grace)},

{FLB_CONF_STR_BACKOFF_BASE,

The first value of the config map entry is wrong: that field is meant to define the "value type", and this value will lead to undefined behavior.

@github-actions
Contributor

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Nov 26, 2024