added support for file undeleted when tails input is paused #2254

sachinmsft · 2020-06-12T23:03:05Z

When the tail input is paused for any reason, we destroy the timer that fires tail_fs_check(), and as a result we no longer check whether any file has been deleted.
When Docker deletes a pod, it tries to delete the log file associated with that pod. Fluent Bit still holds an open handle to that log file, and because the tail_fs_check() timer was destroyed on pause, nothing closes the log file FD. Docker cannot finish deleting the pod's log file, and the pod gets stuck in the Terminating state.
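
To illustrate the mechanism described above, here is a minimal sketch of how a periodic check can notice that a tailed file was deleted and release its descriptor. It is not the actual in_tail code: fd_file_is_deleted() and release_if_deleted() are hypothetical names, and the sketch assumes a POSIX system where an unlinked file that is still open reports a link count of zero through fstat().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Fluent Bit): returns 1 if the file
 * behind fd has been unlinked, 0 if it still has a name, -1 on error.
 */
int fd_file_is_deleted(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1) {
        perror("fstat");
        return -1;
    }
    return st.st_nlink == 0 ? 1 : 0;
}

/*
 * Hypothetical periodic check: close the descriptor once the file is
 * gone so the kernel can release it. Returns 1 if the descriptor was
 * closed (and sets *fd to -1), 0 otherwise.
 */
int release_if_deleted(int *fd)
{
    assert(fd != NULL);

    if (*fd >= 0 && fd_file_is_deleted(*fd) == 1) {
        close(*fd);
        *fd = -1;
        return 1;
    }
    return 0;
}
```

If a check like this only runs from a timer, and that timer is destroyed while the input is paused, the descriptor stays open for as long as the pause lasts.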

sachinmsft · 2020-06-19T05:23:55Z

@fujimotos Can you please look into this PR?

fujimotos · 2020-07-07T05:24:49Z

@sachinmsft I was thinking about this patch last night, but I'm not
convinced by this modification.

The basic problem is that this patch attempts to tune in_tail
to a very special situation where:

Fluent Bit is running on Kubernetes, and
its output plugin is not working at all.

In other cases, this modification makes little sense. Especially,
if your output plugin is working, you do not want this behaviour
at all, because:

If Fluent Bit lets a deleted file go while in_tail is paused,
it results in an unrecoverable data loss.
In this situation, "locking the file until the output plugin
catches up" is totally legitimate behaviour, since it is better
to defer the termination than losing a chunk of data in an
unrecoverable manner.

The problem occurs (as you describe) when the output plugin is broken,
and we expect it to never catch up. But in that case, we really
should resolve the problem by fixing that broken output plugin,
not by making in_tail lossy.
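
The data-loss argument above can be sketched in a few lines. This is an illustration only, not in_tail code: drain_pending() is a hypothetical helper, and it assumes POSIX semantics where the contents of an unlinked file stay readable through a descriptor that is still open.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read whatever lies at or after `offset` through
 * a descriptor that is still open. This keeps working after the file
 * has been unlinked, because the kernel retains the data until the
 * last descriptor is closed. Returns bytes read, or -1 on error.
 */
ssize_t drain_pending(int fd, off_t offset, char *buf, size_t size)
{
    size_t total = 0;
    ssize_t n;

    assert(buf != NULL);

    while (total < size) {
        n = pread(fd, buf + total, size - total, offset + (off_t) total);
        if (n == -1) {
            perror("pread");
            return -1;
        }
        if (n == 0) {
            break; /* end of file */
        }
        total += (size_t) n;
    }
    return (ssize_t) total;
}
```

Once the descriptor is closed, the unread bytes behind `offset` can no longer be reached, which is the loss that holding the file is meant to avoid.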

sachinmsft · 2020-07-14T01:41:00Z

Hi @fujimotos, thanks for taking a look at it.
Yes, I agree with you on this, though in my opinion we should still look for a more appropriate way to fix it.
The problem is that the target service can go down in a cluster for any number of reasons, and as a result fluent-bit will pause the input plugin. We should not keep the handles to log files indefinitely, because that breaks other scenarios: for example, we cannot delete pods whose log file handles are held by fluent-bit while the tail input is paused.

We have seen this happen multiple times in our testing when the Elasticsearch instance was unreachable or down.

added support for file undeleted when tails input is paused

84a933e

sachinmsft requested review from edsiper, fujimotos and koleini as code owners June 12, 2020 23:03

edsiper assigned fujimotos Jun 30, 2020
