fix flower should not use logic clock to manage events #831

Open
wants to merge 1 commit into
base: master

Conversation

lovemyliwu

fixed #613

Since Celery v3.1, event state is ordered by a logical clock. See the changelog:

http://docs.celeryproject.org/en/3.0/whatsnew-3.1.html#events-are-now-ordered-using-logical-time

For the detailed logic, see:

https://github.com/celery/celery/blob/master/celery/events/state.py#L609-L618

Two variables hold the events: tasks and taskheap. taskheap contains only weak references to the items in tasks.

The logical clock syncs automatically across a worker cluster, but it does not sync with flower. That is the root cause.

Let's assume the State is:

limit=2 
tasks=[A,B] 
taskheap=[X.receive(dead), X.success(dead), A.create, A.receive, A.success, B.create, B.receive, B.success]

If a worker cluster was launched without the --state-db option and is then restarted, the logical clock resets to 0.

When flower then receives a new event (C.create) with a smaller clock from a worker, and the number of tasks has already reached the limit, the State class removes the first event from the heap and inserts the new, smaller-clock event at the front. The State is now:

limit=2 
tasks=[B,C] <-- A removed from memory, so the weak reference is dead
taskheap=[C.create, X.success(dead), A.create(dead), A.receive(dead), A.success(dead), B.create, B.receive, B.success]

As more new events arrive, the same thing happens again and again, and the State ends up like this:

limit=2 
tasks=[C,D] 
taskheap=[D.create, X.success(dead), A.create(dead), A.receive(dead), A.success(dead), B.create(dead), B.receive(dead), B.success(dead)]

Now you can see that the heap contains only one live task; all the others are dead. Until the logical clock advances past the value it had at the moment the workers restarted, the flower task list view can only show the latest task.
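The walkthrough above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Celery's actual code: `ToyState`, `Task`, and `alive` are hypothetical names, the heap capacity is assumed to be 4 × limit so that it matches the 8-entry heap in the example, task eviction is plain FIFO, and dead weak references rely on CPython's reference counting.

```python
from bisect import insort
from collections import OrderedDict
import weakref


class Task:
    """Hypothetical stand-in for a task state object."""

    def __init__(self, name):
        self.name = name


class ToyState:
    """Toy model of the tasks/taskheap pair (not Celery's State class)."""

    def __init__(self, limit, heap_multiplier=4):
        self.limit = limit
        self.max_heap = limit * heap_multiplier  # assumed capacity: 8 for limit=2
        self.tasks = OrderedDict()               # strong references, oldest first
        self.taskheap = []                       # sorted (clock, seq, label, weakref)
        self._seq = 0                            # tie-breaker so tuples always compare

    def event(self, clock, name, kind):
        task = self.tasks.get(name)
        if task is None:
            if len(self.tasks) >= self.limit:
                self.tasks.popitem(last=False)   # evict the oldest task
            task = self.tasks[name] = Task(name)
        if len(self.taskheap) + 1 > self.max_heap:
            self.taskheap.pop(0)                 # drop the smallest-clock entry
        self._seq += 1
        insort(self.taskheap,
               (clock, self._seq, f"{name}.{kind}", weakref.ref(task)))

    def alive(self):
        """Names of tasks still reachable through the heap's weak references."""
        seen = []
        for _, _, _, ref in self.taskheap:
            task = ref()
            if task is not None and task.name not in seen:
                seen.append(task.name)
        return seen


state = ToyState(limit=2)

# Before the restart: the worker clock is already large.
clock = 100
for name in ("X", "A", "B"):
    for kind in ("create", "receive", "success"):
        clock += 1
        state.event(clock, name, kind)
before_restart = state.alive()   # A and B are visible through the heap

# Workers restart without persisted state: the clock starts again from 0.
clock = 0
for name in ("C", "D"):
    for kind in ("create", "receive", "success"):
        clock += 1
        state.event(clock, name, kind)
after_restart = state.alive()    # only D is left, although tasks holds C and D

print(before_restart, list(state.tasks), after_restart)
```

In this model every post-restart event sorts to the front of the heap and is the first entry to be popped by the next one, so the heap never holds more than one live entry while the old, dead entries with large clocks stay behind it.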


As I said, the root cause is that flower cannot automatically sync its logical clock with the worker cluster, so I monkey patched the calculate method to make the clock always equal to 0.
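To illustrate the general idea only (this is not the diff in this PR), here is a minimal sketch of forcing the clock to 0 by wrapping an event handler. `StandInState` and `force_zero_clock` are hypothetical names used for illustration; the stand-in class is not Celery's API, and the sketch assumes events are plain dicts with a `clock` key.

```python
import functools


class StandInState:
    """Hypothetical stand-in for an events state class (not Celery's API)."""

    def __init__(self):
        self.seen = []

    def event(self, event):
        # Record the clock and type exactly as the handler received them.
        self.seen.append((event.get("clock"), event["type"]))


def force_zero_clock(cls):
    """Wrap cls.event so that every event is processed with clock=0."""
    original = cls.event
    if getattr(original, "_zero_clock", False):
        return cls  # already patched; do not stack a second wrapper

    @functools.wraps(original)
    def event(self, event):
        # Copy the event so the caller's dict is left untouched.
        return original(self, dict(event, clock=0))

    event._zero_clock = True
    cls.event = event
    return cls


force_zero_clock(StandInState)
force_zero_clock(StandInState)  # second call is a no-op thanks to the guard

state = StandInState()
incoming = {"clock": 57, "type": "task-received"}
state.event(incoming)
print(state.seen, incoming["clock"])
```

With the clock held constant, the ordering of events has to come from whatever the state class compares next, so this approach depends on that secondary ordering being meaningful.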

@johnarnold
Contributor

Have you considered upstreaming a change into Celery, so that the State clock is configurable, rather than patching it? Otherwise, flower could be affected by a change in celery behavior in the future.

@elafontaine

@johnarnold Is there any fix or action being taken for this issue? It would be nice to see this issue resolved.

@lovemyliwu Nice analysis!

@lovemyliwu
Author

@johnarnold Sorry, there is currently no plan to submit a patch upstream.

@elafontaine Thank you.

@mher
Owner

mher commented Jul 17, 2020

I'm getting RecursionError: maximum recursion depth exceeded with this change

Successfully merging this pull request may close these issues.

Only one task listed on the tasks dashboard
4 participants