Add optional time-to-live to subscription lock #213
Hiho! I am also facing the problem that only one node in the cluster processes events, which makes things harder to manage and scale horizontally. The solution you proposed would allow spreading load by subscription name, but it still isn't ideal. We wouldn't be able to spin up another subscriber for an existing persistent subscription (for instance, when the consumer has constantly growing lag). I am just wondering whether we could use another scaling pattern here. From what I inferred from the codebase, the problem is that EventStore.Subscriptions.Supervisor is unaware of the clustered environment. It spins up one process for each subscription on each node, but only one can really process events (just as you described). What if we made that supervisor cluster-aware using a tool like swarm or horde? Subscription processes would be registered "globally" (well, almost :) ) and would be able to spread event handling across nodes via a given partition_by callback. Last week I hacked together a small PoC from that idea; if you think it's a viable concept I can share a draft :)
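To make the partitioning idea concrete, here is a minimal sketch (in Python, purely illustrative — the names `partition_by`, `route_to_node`, and the event shape are assumptions, not the EventStore API) of how a `partition_by`-style callback could deterministically assign events to nodes:

```python
import hashlib

# Illustrative sketch: a partition_by callback picks a partition key per
# event, and a stable hash of that key routes the event to one node.
# All names and the event shape here are hypothetical.

def partition_by(event):
    """Partition key: events for the same stream stay together (ordered)."""
    return event["stream_uuid"]

def route_to_node(event, nodes):
    """Deterministically map an event's partition key to one node."""
    key = partition_by(event).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return nodes[digest % len(nodes)]

nodes = ["node1", "node2", "node3"]
events = [{"stream_uuid": f"stream-{i}", "event_number": i} for i in range(10)]
assignments = {e["stream_uuid"]: route_to_node(e, nodes) for e in events}
```

Because the hash is stable, every node computes the same assignment without coordination, while events from the same stream are always handled by the same node.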
@HarenBroog This particular issue relates to scenarios where distributed Erlang is not being used. However, allowing subscribers to a single subscription to be distributed amongst multiple nodes in a distributed Erlang cluster would be a useful feature to add. I can create a separate issue for horizontal subscription scaling.
@slashdotdash makes sense. Then we can continue the discussion there.
This lock on the subscription actually causes a problem with deploying new versions of the app. |
Postgres advisory locks are used to ensure only a single subscription process can subscribe to each uniquely named EventStore subscription. This ensures events are only processed by a single subscription, regardless of how many nodes are running in a multi-node deployment.
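The try-lock semantics can be illustrated with a simplified in-memory analogue (this is not the actual Postgres `pg_try_advisory_lock` call, just a sketch of its first-wins behaviour):

```python
# Simplified in-memory analogue of Postgres session advisory locks:
# the first session to try a given key wins, and later attempts fail
# until the holder releases it. Lock keys and node names are made up.

class AdvisoryLocks:
    def __init__(self):
        self._held = {}  # lock key -> holder

    def try_lock(self, key, holder):
        if key in self._held:
            return False
        self._held[key] = holder
        return True

    def unlock(self, key, holder):
        if self._held.get(key) == holder:
            del self._held[key]
            return True
        return False

locks = AdvisoryLocks()
# Two nodes race to subscribe to the same uniquely named subscription;
# in practice the lock key would be derived from the subscription name.
assert locks.try_lock(42, "node1") is True   # node1 acquires the lock
assert locks.try_lock(42, "node2") is False  # node2 cannot subscribe
locks.unlock(42, "node1")
assert locks.try_lock(42, "node2") is True   # node2 can now take over
```

This is why only one subscription process per name is ever active: the others repeatedly fail the try-lock until the holder's session releases it.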
One consequence of this design is that when multiple nodes are started, it will likely be the first node to start that acquires the locks for all of the started subscriptions. This means that load won't be evenly distributed amongst the available nodes in the cluster. You could use distributed Erlang to evenly distribute subscriber processes amongst the available nodes.
For scenarios where distributed Erlang is not used, the subscription lock could be released after a configurable interval (e.g. hourly, with random jitter) to help balance load more evenly. This would allow a subscription process running on another node to connect and resume processing. It may be necessary to broadcast a "lock released" message to connected nodes, triggering lock acquisition on another node, to reduce latency. Eventually subscription processes should end up randomly distributed amongst all running nodes in the cluster.
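A minimal sketch of the proposed time-to-live, assuming a base interval plus symmetric random jitter (the interval values here are illustrative, not part of any EventStore configuration):

```python
import random

# Sketch of the proposed lock time-to-live: each holder voluntarily
# releases the subscription lock after a base interval plus random
# jitter, so over many cycles locks migrate randomly between nodes.
# BASE_TTL_SECONDS and JITTER_SECONDS are illustrative values.

BASE_TTL_SECONDS = 3600   # e.g. release roughly hourly
JITTER_SECONDS = 600      # +/- up to 10 minutes of jitter

def next_release_delay(base=BASE_TTL_SECONDS, jitter=JITTER_SECONDS):
    """Seconds until this node voluntarily releases the lock."""
    return base + random.uniform(-jitter, jitter)

delay = next_release_delay()
```

The jitter matters: without it, all nodes would release and re-contend for their locks at the same moment, and the same node would tend to win repeatedly.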
The `pg_locks` table can be used to identify the advisory locks acquired on the EventStore `subscriptions` table from any connected node. This could be used to determine whether or not locks are fairly distributed, by grouping and counting by PID.