[Proposal] Grain directory leases #9225

Open
ReubenBond opened this issue Nov 12, 2024 · 3 comments

@ReubenBond
Member

Fixes #2428
Fixes #5687
Fixes #8242

In #9103, we introduced a strongly consistent directory, leveraging the strong guarantees which Orleans' powerful membership provides, as discussed in #1323. This proposal is for a mechanism to go the last mile and offer strong single-activation guarantees by means of leases. The new grain directory is already strongly consistent, but strong single-activation guarantees rely on evicted silos ceasing operation whenever there is a potential for a grain to be activated elsewhere. Leases are the only practical way to implement this kind of guarantee (see this comment).

The proposal is to add an implicit leasing mechanism based on membership. Silos will use it to self-terminate or deactivate activations, and the directory will use it to prevent registrations. The proposed mechanism is this:

  1. Instead of evicting registrations from the directory when a silo is evicted, leave a tombstone entry indicating the latest possible membership version the silo was evicted in.
  2. Disallow deregistration of those tombstone entries until at least a certain time has passed since the silo was evicted. This involves keeping track of which membership updates have been seen locally, and when. Skipped updates are OK: the directory pessimistically chooses the newer update as the start time for lease expiration.
  3. If a silo does not manage to refresh its membership within the leasing period, it self-terminates.
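
The directory side of the steps above could be sketched roughly as follows. This is a minimal illustration in Python, not the Orleans implementation: `DirectoryPartition`, its fields, and the `lease_period` value are all hypothetical names, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class DirectoryPartition:
    lease_period: float  # seconds; hypothetical value chosen by the caller
    # membership version -> local time at which this partition first saw it
    seen_at: Dict[int, float] = field(default_factory=dict)
    # grain id -> silo registered as hosting it
    registrations: Dict[str, str] = field(default_factory=dict)
    # evicted silo -> latest possible membership version it was evicted in
    tombstones: Dict[str, int] = field(default_factory=dict)

    def on_membership_update(self, version: int, evicted: Iterable[str], now: float) -> None:
        self.seen_at.setdefault(version, now)
        for silo in evicted:
            # Step 1: keep the silo's registrations and record a tombstone.
            # If earlier versions were skipped, this (newer) version is used,
            # which pessimistically delays the start of lease expiration.
            self.tombstones.setdefault(silo, version)

    def lease_expired(self, silo: str, now: float) -> bool:
        started = self.seen_at[self.tombstones[silo]]
        return now - started >= self.lease_period

    def try_register(self, grain: str, silo: str, now: float) -> Optional[str]:
        """Return the registered host after the call, or None if refused."""
        if silo in self.tombstones:
            return None  # evicted silos may not register anything
        owner = self.registrations.get(grain)
        if owner is not None and owner != silo:
            if owner not in self.tombstones:
                return owner  # already active on a live silo
            if not self.lease_expired(owner, now):
                return None  # Step 2: the old host may still be running
        self.registrations[grain] = silo
        return silo
```

In this sketch a registration held by an evicted silo blocks re-registration until `lease_period` has elapsed since the partition first saw the evicting membership version. After that, the entry can be replaced.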

The valid leasing period must be calculated based on the membership refresh interval. Leases are extended whenever a new membership version is received by a silo.
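
The silo side can be illustrated with a small sketch, again in Python and under stated assumptions. The proposal does not give a formula for the leasing period, so the `safety_factor` multiplier here is purely hypothetical, as are the `SiloLease` name and its methods. The sketch also assumes that any successful refresh returning a version at least as new as the last one extends the lease.

```python
from dataclasses import dataclass


@dataclass
class SiloLease:
    refresh_interval: float      # seconds between membership refreshes
    safety_factor: float = 3.0   # assumption: lease = interval * factor
    last_version: int = -1
    last_refresh: float = 0.0

    @property
    def lease_period(self) -> float:
        return self.refresh_interval * self.safety_factor

    def on_membership_refreshed(self, version: int, now: float) -> None:
        # Ignore stale reads; anything current or newer extends the lease.
        if version >= self.last_version:
            self.last_version = version
            self.last_refresh = now

    def must_terminate(self, now: float) -> bool:
        # Step 3: no refresh within the leasing period means the silo
        # can no longer assume it is still a cluster member.
        return now - self.last_refresh >= self.lease_period
```

A periodic timer on the silo would call `must_terminate` and stop the process when it returns true. For the guarantee to hold, the silo's period would need to be no longer than the one the directory waits out before releasing tombstones.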

@nkosi23
nkosi23 commented Nov 12, 2024

This is probably a layman question, but would this proposal have meaningful negative implications on throughput if an expired lease has to be checked / confirmed before activating a new grain?

My understanding is that this would not have any negative impact for grains being already active since the lookup process would be mostly unaffected.

@rkargMsft
Contributor

Is the tradeoff here that there's a stronger guarantee that there won't be duplicate activations during the lease period (and ideally no duplicates at all, since the old silo will terminate itself if it can't renew its lease), but there's a longer period where an old, unreachable silo will still be seen to hold the lease, so activations won't be placed elsewhere until that lease is given up?

@ReubenBond
Member Author

> would this proposal have meaningful negative implications on throughput if an expired lease has to be checked / confirmed before activating a new grain?

No, this does not impact performance. It slightly affects directory hand-off & crash recovery just because we aren't omitting activations hosted on crashed silos, but that is not meaningful.

> My understanding is that this would not have any negative impact for grains being already active since the lookup process would be mostly unaffected.

That is correct. Leases are checked centrally, periodically, not at the per-grain level.

> But there's a longer period where an old, unreachable silo will still be seen to hold the lease so activations won't be placed elsewhere until that lease is given up?

Yes, that's right: this feature necessarily decreases availability of some subset of grains after a crash.

Specifically: grains known to be hosted on a crashed silo (i.e., registered in other partitions), and grains which were potentially hosted on the crashed silo (i.e., grains belonging to the directory ranges owned by the crashed silo which are not known to be hosted elsewhere).
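
As a rough illustration of those two categories, the check might look like the Python sketch below. All names are hypothetical: `known_host` stands for registrations that surviving partitions and silos can still report, and `range_owner` stands for whichever function maps a grain to the silo that owned its directory range before the crash.

```python
from typing import Callable, Dict, Optional


def unavailable_reason(
    grain: str,
    crashed: str,
    known_host: Dict[str, str],
    range_owner: Callable[[str], str],
) -> Optional[str]:
    """Classify why a grain cannot be re-activated until the lease expires."""
    host = known_host.get(grain)
    if host == crashed:
        # Registered in a surviving partition as hosted on the crashed silo.
        return "known"
    if host is None and range_owner(grain) == crashed:
        # Its directory range was lost with the crashed silo, and nobody
        # else reports hosting it, so it may have been active there.
        return "potential"
    return None  # hosted elsewhere, or unaffected by this crash
```

Grains for which this returns `None` remain available. The others stay blocked only until the crashed silo's lease is known to have expired.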
