Telegraf reloads URL-based/remote config on a specified interval #8730

Closed
sjwang90 opened this issue Jan 21, 2021 · 10 comments · Fixed by #15388
Labels: area/agent, area/configuration, feature request, platform/windows

Comments

@sjwang90
Contributor

sjwang90 commented Jan 21, 2021

Feature Request

Proposal:

Reload configuration via URL

Current behavior:

The only time the Telegraf config is (re)loaded is at agent startup. The agent does not currently poll for new configuration when using a URL-based config.

Desired behavior:

Reload the configuration at a designated interval, and add a jitter to each reload so that agents do not all read the config from the hosting server at the same moment (a thundering herd).
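
As a rough illustration of the interval-plus-jitter idea, here is a minimal Python sketch. The function name `next_reload_delay` and its parameters are hypothetical and are not Telegraf options. It assumes the jitter is a uniformly random extra delay added on top of the base interval.

```python
import random
from typing import Optional


def next_reload_delay(interval_s: float, jitter_s: float,
                      rng: Optional[random.Random] = None) -> float:
    """Return how many seconds to wait before the next config reload.

    Hypothetical helper: the base interval plus a uniformly random
    jitter in [0, jitter_s], so that agents started at the same time
    drift apart instead of hitting the config server together.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if jitter_s < 0:
        raise ValueError("jitter_s must not be negative")
    rng = rng or random.Random()
    return interval_s + rng.uniform(0.0, jitter_s)
```

With a 300-second interval and 30 seconds of jitter, each agent would wait somewhere between 300 and 330 seconds between reloads.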

Use case:

As it stands, the agent requests its configuration at startup based on the supplied URL, but users can modify the configuration at any point afterwards. Such a configuration change requires a signaling mechanism to trigger Telegraf to reload the config -- which doesn't currently exist. A simple time-based mechanism can ensure that all active agents will refresh within some reasonable time window.

One workaround on Linux is to use the Exec plugin to send a “HUP” to the agent every hour, which triggers Telegraf to reload. But there is no equivalent of HUP on Windows, so the agent has to be killed, and while the Windows Service Control Manager restarts it there is a short blackout period (as well as an error in the Windows Event Log). Neither is ideal.

sjwang90 added the feature request, area/configuration, and area/agent labels Jan 21, 2021
sjwang90 changed the title from "Reload config based on a certain amount of time" to "Reload URL-based config on a specified time" Jan 27, 2021
@sjwang90
Contributor Author

Previous issue on this same topic: #5502. Keeping this issue open since it addresses discussion with @schmorgs

From @danielnelson

Thanks for the feedback, we have some plans to allow event triggered updates based on this prototype I worked on some time back: https://github.com/danielnelson/tgconfig. For now though I don't want to add any new cli options since I don't want to commit to keeping support for them.

reimda changed the title from "Reload URL-based config on a specified time" to "Reload URL-based config on a specified interval" Feb 25, 2021
@sjwang90
Contributor Author

Continuing the conversation from #8529 (comment) here:

As there is no guarantee that the server holding the remote config supports HEAD (e.g. the InfluxDB server doesn't), and also no guarantee that the server will return information about the resource state (e.g. an ETag or Last-Modified header), the easiest solution could be to store a hash (MD5) of the current config, periodically download a copy of the remote config (at a configurable interval), compute its hash, and compare. If it has changed, restart.

The question is, what should the default check interval be so as not to burden the server too much? 1 minute?

Based on this comment, the implementation on the Telegraf side may not be too complicated, but there could be issues on the remote config server side (e.g. a Telegraf config stored in InfluxDB Cloud).
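
The hash-and-compare approach quoted above could look roughly like the following Python sketch. The class and function names are hypothetical. The quoted comment suggests MD5; SHA-256 is swapped in here, on the assumption that any stable digest is enough for change detection. Downloading the config is left to the caller.

```python
import hashlib


def config_digest(config_bytes: bytes) -> str:
    """Digest of the raw config bytes (SHA-256 here; the comment suggests MD5)."""
    return hashlib.sha256(config_bytes).hexdigest()


class ConfigChangeDetector:
    """Hypothetical helper that remembers the digest of the active config."""

    def __init__(self, initial_config: bytes) -> None:
        self._digest = config_digest(initial_config)

    def has_changed(self, downloaded: bytes) -> bool:
        """Compare a freshly downloaded copy against the remembered digest.

        Returns True (and remembers the new digest) only when the content
        differs, which is the point where an agent would reload or restart.
        """
        new_digest = config_digest(downloaded)
        if new_digest == self._digest:
            return False
        self._digest = new_digest
        return True
```

This needs only a plain GET on the server side, at the cost of transferring the full config on every check.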

@sjwang90
Contributor Author

We'll look to implement this after a Config API is implemented with Telegraf since it should cover this functionality.

We want to make sure we frame this feature not as a fix for users being limited to reloading their configuration through a SIGHUP, but as a change to the user experience: you update your configuration, and the Telegraf agent detects and applies those changes.

sjwang90 changed the title from "Reload URL-based config on a specified interval" to "Telegraf reloads URL-based/remote config on a specified interval" Aug 16, 2021
@DavidBoman

+1

I'm using a scheduled task in Windows now to restart the Telegraf service, but that's quite an ugly workaround. Being able to trigger a reload server-side, or having Telegraf poll for a new config and reload on a change, would be a great solution.

@clever-trevor

Agree with @DavidBoman

An inbuilt process to retry a config load on a predetermined schedule would help simplify config deployment in a large environment

Caveat being that if the config endpoint is down, the agent does not crash and carries on with cached config

The strategic agent API config load is good, but in an enterprise setting it adds complexity in environments such as a DMZ, where the endpoint may not be immediately contactable
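
The caveat above (keep running on the cached config when the endpoint is down) can be sketched in Python as follows. This is only an illustration under stated assumptions: `refresh_config` and the injected `fetch` callable are hypothetical names, and network failures are assumed to surface as `OSError`, which covers `urllib.error.URLError`, connection errors, and timeouts in the standard library.

```python
from typing import Callable, Optional, Tuple


def refresh_config(fetch: Callable[[], bytes],
                   cached: bytes) -> Tuple[bytes, Optional[str]]:
    """Try to fetch a new config; fall back to the cached one on failure.

    Returns (config_to_use, error_message). The error message is None when
    the fetch succeeded, so the caller can log or report failures without
    the agent crashing.
    """
    try:
        fresh = fetch()
    except OSError as err:
        # Endpoint down or unreachable: carry on with the cached config.
        return cached, str(err)
    if not fresh.strip():
        # Treat an empty response as a failure rather than wiping the config.
        return cached, "empty config received"
    return fresh, None
```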

toni-moreno pushed a commit to toni-moreno/telegraf that referenced this issue Nov 16, 2021
toni-moreno pushed a commit to toni-moreno/telegraf that referenced this issue Nov 17, 2021
@powersj
Contributor

powersj commented Nov 18, 2021

Caveat being that if the config endpoint is down, the agent does not crash and carries on with cached config

The above and several other scenarios are essential to figure out before landing this feature. For example: what happens if the file is unreachable, or if there are errors in the configuration itself? We also need a way for the user to get feedback on whether an update occurred, and some mechanism of security.

The next steps are to design and flesh out the above and other cases. Then we can work to add the CLI support for this feature.

@Trovalo
Collaborator

Trovalo commented Nov 24, 2021

a way for the user to get feedback on whether an update occurred or some mechanism of security.

What about adding some more metrics to inputs.internal?
This would allow users to monitor and set up whatever alerts they want.

Not sure what can be provided, since that might depend on the implementation, but something like:
"LastConfigUpdate" → a date that would be saved as a string in InfluxDB (since it has no "date" data type); not sure how handy a string is for building alert rules in Grafana/Kapacitor
"IsUpdated" → a boolean that shows whether the config endpoint was reachable
"Message" → a text field with the error itself, whatever it is (timeout, not reachable, unauthorized)
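
To make the proposed fields concrete, here is a small Python sketch of a status record. The class `ConfigReloadStatus` is hypothetical and nothing like it is confirmed to exist in Telegraf. As one possible answer to the string concern, it assumes the timestamp is stored as integer epoch seconds, which is easier to compare in alert rules than a formatted date.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class ConfigReloadStatus:
    """Hypothetical record of the last remote-config check."""

    last_config_update_unix: int  # epoch seconds of the last successful update
    is_updated: bool              # whether the last check succeeded
    message: str = ""             # error text, empty when there was none

    def as_fields(self) -> Dict[str, Union[int, bool, str]]:
        """Map the record onto the field names proposed in the comment."""
        return {
            "LastConfigUpdate": self.last_config_update_unix,
            "IsUpdated": self.is_updated,
            "Message": self.message,
        }
```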

@pdrivom

pdrivom commented Feb 17, 2024

+1

powersj self-assigned this Apr 29, 2024
@paulojmdias
Contributor

Is there any plan to add this feature @powersj ?

@powersj
Contributor

powersj commented May 7, 2024

@paulojmdias - I started working on a spec for this: #15321. Please take a look and comment.

powersj added a commit to powersj/telegraf that referenced this issue May 21, 2024
This introduces a new config-url-watch-interval option, which when set
will, at each interval, check the Last-Modified header of the file to
determine if telegraf should reload.

If the header is not available then the watcher is disabled for the
file.

fixes: influxdata#8730
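
The decision described in the commit message above can be sketched in Python. This is not the Go code from the commit, only an illustration under stated assumptions: `should_reload` is a hypothetical name, the headers are passed in as a plain mapping keyed by `Last-Modified`, and `None` is used to signal that the watcher should be disabled for the file.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def should_reload(headers: Mapping[str, str],
                  last_seen: datetime) -> Optional[bool]:
    """Decide whether to reload, based on the Last-Modified header.

    Returns True if the remote file is newer than last_seen, False if it
    is not, and None if the header is missing or unparseable (in which
    case the caller would stop watching this file).
    """
    raw = headers.get("Last-Modified")
    if raw is None:
        return None
    try:
        modified = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None
    if modified.tzinfo is None:
        # Assume UTC when the header carries no usable zone information.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > last_seen
```

Here `last_seen` is expected to be a timezone-aware `datetime` recording the modification time observed at the previous check.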
powersj added a commit to powersj/telegraf that referenced this issue May 22, 2024
pvlltvk pushed a commit to devopsext/telegraf that referenced this issue May 31, 2024
8 participants