
local rate limit: add cross local cluster rate limit support #34276

Open · wants to merge 16 commits into base: main

Conversation

@wbpcode (Member) commented May 21, 2024

Commit Message: local rate limit: add cross local cluster rate limit support
Additional Description:

Envoy provides many rate-limit-related filters/features. The global rate limit is the most powerful one, but it also introduces additional dependencies (a rate limit server, Redis, etc.) and latency (an extra call to the rate limit server).

The local rate limit is more stable, has better performance, and has no dependency on an external server. But the local rate limit works at single-instance or single-connection scope. That means that if the local rate limit is used, we cannot get a stable total limit for an Envoy cluster: the total limit of the local rate limit filter is the single-instance limit multiplied by the number of Envoy instances, and the instance number may change when the cluster scales.

This PR adds an interesting new feature. It makes the local rate limit filter aware of the membership of the local cluster (the cluster that contains the current Envoy itself). That means the local rate limit can compute its tokens based on the membership of the local cluster and thereby share the total limit across multiple Envoy instances (an Envoy cluster).

See #34230 for more discussion.

Risk Level: low. (nothing will be changed if we don't enable the feature explicitly)
Testing: unit.
Docs Changes: n/a.
Release Notes: n/a.
Platform Specific Features: n/a.
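
Not the exact code in this PR, just a sketch of the idea under stated assumptions: if the local cluster has N instances, each instance takes a 1/N share of the configured budget, so the effective per-instance tokens per fill shrink as the cluster scales out. All class and function names below are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the core idea: scale the configured tokens_per_fill
// by this instance's share of the local Envoy cluster. Names are illustrative,
// not the actual classes/methods added by this PR.
class MembershipShareProvider {
public:
  explicit MembershipShareProvider(uint32_t membership_total)
      : membership_total_(membership_total) {}

  // Share of the global budget owned by this instance. With N healthy
  // instances in the local cluster, each instance gets 1/N of the budget.
  double getShare() const {
    return 1.0 / std::max<uint32_t>(membership_total_, 1);
  }

private:
  uint32_t membership_total_;
};

// Tokens added on each fill interval for this single instance.
uint64_t tokensPerFill(uint64_t configured_tokens_per_fill,
                       const MembershipShareProvider& provider) {
  const double share = provider.getShare();
  // Round, and keep at least one token so a large cluster never starves a
  // single instance completely.
  return std::max<uint64_t>(
      1, static_cast<uint64_t>(configured_tokens_per_fill * share + 0.5));
}
```

For example, with tokens_per_fill = 100 and a local cluster of 4 instances, each instance would refill 25 tokens per interval, keeping the cluster-wide total near 100 regardless of scaling.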


CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).


Signed-off-by: wbpcode <[email protected]>
Signed-off-by: wbpcode <[email protected]>
@jmarantz (Contributor) commented:

Can we have Tianyu take a pass and then I will send to a maintainer?

@wbpcode (Member, Author) commented May 30, 2024

Seems @markdroth is busy recently, re-assign this to @adisuissa for API review.

@wbpcode wbpcode assigned adisuissa and unassigned markdroth May 30, 2024
@adisuissa (Contributor) left a comment

Interesting idea, thanks!
Left high-level comments/questions.

@wbpcode (Member, Author) commented Jun 3, 2024

/retest

@alyssawilk (Contributor) commented:

@adisuissa looks like this is ready for review - the main merge is just for review notes

@adisuissa (Contributor) left a comment

Thanks!
I've left a few high-level comments.

One other idea: would it be possible to redesign this so that each instance owns a reference to the endpointStats() object or the membership_total gauge of the static local cluster, and when tokensPerFill is called, it fetches the current number of endpoints?
I think this would still need to be guarded (by a mutex) if membership_total is not atomic. However, it would remove the additional share monitor and the member-update callback (which may simplify the solution).
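
Roughly what that alternative might look like, as a hedged sketch only (all names are hypothetical; it assumes membership_total must be read under a mutex when it is not atomic):

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>

// Sketch of the suggested alternative: the limiter keeps a reference to the
// local cluster's membership counter and reads it lazily on each fill,
// instead of registering a member-update callback. Names are illustrative.
struct LocalClusterMembership {
  std::mutex mu;                  // Guards membership_total if it is not atomic.
  uint32_t membership_total{1};   // Number of endpoints in the local cluster.
};

class LazyMembershipRateLimiter {
public:
  LazyMembershipRateLimiter(uint64_t tokens_per_fill, LocalClusterMembership& membership)
      : tokens_per_fill_(tokens_per_fill), membership_(membership) {}

  uint64_t tokensPerFill() {
    uint32_t total;
    {
      std::lock_guard<std::mutex> lock(membership_.mu);
      total = membership_.membership_total;
    }
    // Equal split of the configured budget across the current membership.
    return std::max<uint64_t>(1, tokens_per_fill_ / std::max<uint32_t>(total, 1));
  }

private:
  const uint64_t tokens_per_fill_;
  LocalClusterMembership& membership_;
};
```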

@wbpcode (Member, Author) commented Jun 12, 2024

> One other idea: would it be possible to redesign this so that each instance owns a reference to the endpointStats() object or the membership_total gauge of the static local cluster, and when tokensPerFill is called, it fetches the current number of endpoints?

That was my initial implementation when doing the POC. It's actually simpler, but it makes it hard to use different algorithms to calculate the ratio. For example, we may want to take the weight into account in the future. The current design provides a well-defined interface and abstraction for more complex share calculation.
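
To illustrate that point, here is a hypothetical interface sketch (not the PR's actual API, although the diff further down does reference a ShareProviderSharedPtr alias): a small share-provider abstraction lets the ratio calculation be swapped, e.g. an equal-share implementation now and a weight-aware one later:

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>

// Illustrative sketch only: a small interface that hides how the per-instance
// share is computed, so new algorithms (e.g. weight-aware) can be added later.
class ShareProvider {
public:
  virtual ~ShareProvider() = default;
  // Fraction of the total budget this instance may use, in (0, 1].
  virtual double getShare() const = 0;
};
using ShareProviderSharedPtr = std::shared_ptr<ShareProvider>;

// Equal share across all instances of the local cluster.
class EqualShareProvider : public ShareProvider {
public:
  explicit EqualShareProvider(uint32_t membership_total)
      : membership_total_(std::max<uint32_t>(membership_total, 1)) {}
  double getShare() const override { return 1.0 / membership_total_; }

private:
  uint32_t membership_total_;
};

// Possible future variant: share proportional to this instance's weight.
class WeightedShareProvider : public ShareProvider {
public:
  WeightedShareProvider(double local_weight, double total_weight)
      : local_weight_(local_weight), total_weight_(std::max(total_weight, 1.0)) {}
  double getShare() const override { return local_weight_ / total_weight_; }

private:
  double local_weight_;
  double total_weight_;
};
```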

Signed-off-by: wbpcode <[email protected]>
@adisuissa (Contributor) left a comment

Thanks, overall LGTM.
Left some minor comments.

@@ -84,6 +124,9 @@ class LocalRateLimiterImpl {
TokenState tokens_;
absl::flat_hash_set<LocalDescriptorImpl, LocalDescriptorHash, LocalDescriptorEqual> descriptors_;
std::vector<LocalDescriptorImpl> sorted_descriptors_;

ShareProviderSharedPtr share_provider_;
Contributor commented on the diff:
OOC, does this really need to be a shared_ptr and not just a reference? AFAIK the singleton should outlive the filter, but I may be mistaken.

@wbpcode (Member, Author) replied Jun 14, 2024:

Yeah, the singleton has a longer lifetime than the filter. But note that getShareProvider() does not guarantee that the singleton will keep a copy of the returned shared pointer.

For example, if we support new algorithms in the future, the singleton may only keep a map from the proto config to a weak pointer to the share provider.

So, I think a shared pointer here would be better.
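
A minimal sketch of that ownership pattern, reusing the hypothetical ShareProvider/EqualShareProvider types from the sketch above (the manager name and signature here are illustrative, not the PR's exact code): the singleton caches only a weak pointer per configuration, so each filter must hold its own shared pointer to keep the provider alive:

```cpp
#include <map>
#include <memory>
#include <string>

// Illustrative only: the singleton remembers providers by a config key but
// holds them weakly; callers keep the returned shared_ptr alive themselves,
// which is why the filter member is a shared pointer rather than a reference.
class ShareProviderManager {
public:
  ShareProviderSharedPtr getShareProvider(const std::string& config_key,
                                          uint32_t membership_total) {
    std::weak_ptr<ShareProvider>& weak = providers_[config_key];
    if (ShareProviderSharedPtr existing = weak.lock()) {
      return existing;  // Reuse the live provider still owned by other filters.
    }
    ShareProviderSharedPtr created = std::make_shared<EqualShareProvider>(membership_total);
    weak = created;  // The singleton remembers only a weak reference.
    return created;
  }

private:
  std::map<std::string, std::weak_ptr<ShareProvider>> providers_;
};
```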

Review thread on api/envoy/extensions/common/ratelimit/v3/ratelimit.proto (outdated, resolved)
wbpcode and others added 2 commits June 14, 2024 09:41
Signed-off-by: wbpcode <[email protected]>
@adisuissa (Contributor) left a comment

/lgtm api

@@ -315,5 +431,44 @@ TEST_P(LocalRateLimitFilterIntegrationTest, BasicTestPerRouteAndRds) {
cleanUpXdsConnection();
}

#ifdef NDEBUG
Contributor commented on the diff:

Sorry for the late observation, but IMHO this is a red flag when it comes to tests (especially integration tests). Which assertion in the singleton manager is triggered?

Specifically I would like to see a validation of the MAIN_THREAD assertions that were added being exercised in an integration test.

@wbpcode (Member, Author) replied Jun 14, 2024:

The get method of the singleton manager. The singleton manager can only be accessed from the thread where it was created (the server main thread in an integration test).

But these tests access the singleton manager from the test thread, which is not allowed and will trigger the assertion.
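
For context, this is the general shape of such a thread-affinity check (a generic C++ sketch, not Envoy's actual singleton manager or assertion macros): the object records the thread it was created on and asserts, in debug builds only, that every access happens on that thread. Because the check compiles away under NDEBUG, the guarded tests would only pass in release builds.

```cpp
#include <cassert>
#include <thread>

// Generic sketch of a thread-affinity assertion. assert() is a no-op when
// NDEBUG is defined, so the check only fires in debug builds.
class MainThreadChecker {
public:
  MainThreadChecker() : owner_(std::this_thread::get_id()) {}

  void assertOnOwningThread() const {
    assert(std::this_thread::get_id() == owner_ &&
           "must be accessed from the thread that created it");
  }

private:
  std::thread::id owner_;
};
```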

@wbpcode (Member, Author) added:

I have removed this macro because I created another PR to resolve this problem. See #34766

@wbpcode (Member, Author) commented Jun 14, 2024

/retest

@adisuissa (Contributor) left a comment

LGTM, thanks!
/lgtm api

@tyxia can you PTAL?
