fix flower should not use logic clock to manage events #831

lovemyliwu · 2018-08-10T12:35:37Z

fixed #613

Since Celery v3.1, event state has been ordered by a logical clock; see the changelog:

http://docs.celeryproject.org/en/3.0/whatsnew-3.1.html#events-are-now-ordered-using-logical-time

For the detailed logic, see:

https://github.com/celery/celery/blob/master/celery/events/state.py#L609-L618

Two variables hold the events: tasks and taskheap. taskheap contains only weak references to the items in tasks.

The logical clock is synchronized automatically across a worker cluster, but it is not synchronized with flower. That is the root cause.

Let's assume the State is:

limit=2 
tasks=[A,B] 
taskheap=[X.receive(dead), X.success(dead), A.create, A.receive, A.success, B.create, B.receive, B.success]

If a worker cluster is launched without the --state-db option and then restarted, the logical clock resets to 0.

When flower then receives a new event (C.create) with a smaller clock from a worker, and the number of tasks has already reached the limit, the State class removes the first event in the heap and inserts the new, smaller-clock event at the front. The State is now:

limit=2 
tasks=[B,C] <-- A removed from memory, so the weak reference is dead
taskheap=[C.create, X.success(dead), A.create(dead), A.receive(dead), A.success(dead), B.create, B.receive, B.success]

As more new events arrive, the same thing happens again and again, and the State ends up like this:

limit=2 
tasks=[C,D] 
taskheap=[D.create, X.success(dead), A.create(dead), A.receive(dead), A.success(dead), B.create(dead), B.receive(dead), B.success(dead)]

Now the heap contains only one live task; all the others are dead. Until the logical clock advances past the value it had when the workers restarted, the flower task list view can only show the latest task.
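The walkthrough above can be reproduced with a small toy model. This is only a sketch: `ToyState`, `Task`, `event` and `dump` are hypothetical names made up for illustration, not Celery's or flower's API, and the model assumes CPython, where a task object is freed as soon as its last strong reference is dropped.

```python
import weakref
from bisect import insort


class Task:
    """Stand-in for a task record; plain classes support weak references."""

    def __init__(self, name):
        self.name = name


class ToyState:
    """Toy model only: NOT Celery's State class."""

    def __init__(self, max_tasks, max_heap):
        self.max_tasks = max_tasks
        self.max_heap = max_heap
        self.tasks = {}   # name -> Task (strong references, oldest first)
        self.heap = []    # sorted (clock, seq, label, weakref) tuples
        self._seq = 0     # arrival counter, so weakrefs are never compared

    def event(self, clock, name, kind):
        task = self.tasks.get(name)
        if task is None:
            task = self.tasks[name] = Task(name)
            while len(self.tasks) > self.max_tasks:
                # Evict the oldest task; its weak references in the heap die.
                del self.tasks[next(iter(self.tasks))]
        if len(self.heap) + 1 > self.max_heap:
            self.heap.pop(0)  # drop the entry that sorts first
        self._seq += 1
        entry = (clock, self._seq, f"{name}.{kind}", weakref.ref(task))
        insort(self.heap, entry)  # a small clock sorts to the front

    def dump(self):
        """Render the heap, marking entries whose task is gone."""
        return [label if ref() is not None else label + "(dead)"
                for _, _, label, ref in self.heap]


state = ToyState(max_tasks=2, max_heap=8)
clock = 100
for name in "XAB":
    for kind in ("create", "receive", "success"):
        state.event(clock, name, kind)
        clock += 1
before = state.dump()

# Workers restart: the clock starts over from a small value.
state.event(1, "C", "create")
after_c = state.dump()
state.event(2, "D", "create")
after_d = state.dump()
```

In this model each post-restart event lands at the front of the heap and is the next entry to be evicted, so `after_d` holds a single live entry (`D.create`) even though `tasks` still holds both C and D.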

As I said, the root cause is that flower cannot automatically synchronize its logical clock with the worker cluster, so I monkey patched the calculate method to make the clock always 0.
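To illustrate why a constant clock helps, here is a minimal sketch. It is not the patch in this PR: `replay` and its `force_zero_clock` parameter are hypothetical, and the sketch assumes entries compare as plain `(clock, timestamp, label)` tuples, which may differ from how Celery actually compares them.

```python
from bisect import insort


def replay(events, max_heap, force_zero_clock):
    """Hypothetical helper: feed events into a bounded, sorted list."""
    heap = []
    for clock, timestamp, label in events:
        if force_zero_clock:
            clock = 0  # every entry ties on clock, so the timestamp decides
        if len(heap) + 1 > max_heap:
            heap.pop(0)  # drop the entry that sorts first
        insort(heap, (clock, timestamp, label))
    return [label for _, _, label in heap]


events = [
    (100, 1.0, "A"), (101, 2.0, "B"), (102, 3.0, "C"),  # before the restart
    (1, 4.0, "D"), (2, 5.0, "E"),                        # after the restart
]
with_clock = replay(events, max_heap=3, force_zero_clock=False)
zero_clock = replay(events, max_heap=3, force_zero_clock=True)
```

With the real clock values the new events keep replacing each other at the front (`with_clock` is `['E', 'B', 'C']`), while with the clock pinned to 0 the list behaves like a FIFO ordered by arrival time (`zero_clock` is `['C', 'D', 'E']`).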

johnarnold · 2018-08-27T17:35:28Z

Have you considered upstreaming a change into Celery, so that the State clock is configurable, rather than patching it? Otherwise, flower could be affected by a change in celery behavior in the future.

elafontaine · 2018-09-11T18:03:23Z

@johnarnold is there any fix or action being taken for this issue? It would be nice to see this issue resolved.

@lovemyliwu nice analysis!

lovemyliwu · 2018-09-14T09:35:14Z

@johnarnold Sorry, there is currently no plan to submit a patch upstream.

@elafontaine Thank you.

mher · 2020-07-17T14:45:16Z

I'm getting RecursionError: maximum recursion depth exceeded with this change

fix flower should not use logic clock to manage events

fd77d0e
