Support ramping up endpoints for `XdsEndpointGroup` #5688

jrhee17 · 2024-05-20T15:34:51Z

Consider reviewing #5693 before this PR.

Motivation:

This changeset attempts to support [slow start mode](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/slow_start).
While simply supporting this is trivial with the existing WeightRampingUpStrategy, ramping up needs to work regardless of whether 1) a cluster is updated or 2) the set of endpoints is updated.

The current implementation creates a new EndpointGroup whenever one of the two updates above is triggered.

The `ClusterEntry` itself is reconstructed

armeria/xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/ClusterManager.java

Line 128 in 806556e

clusterEntry = new ClusterEntry(clusterSnapshot, this);

The set of endpoints is updated

armeria/xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/ClusterEntry.java

Lines 60 to 62 in 806556e

 this.endpoints = ImmutableList.copyOf(endpoints); 

 final PrioritySet prioritySet = new PrioritySet(endpoints); 

 loadBalancer.prioritySetUpdated(prioritySet);

Consequently, with the current implementation endpoints are always ramped up from the beginning. For example, if endpointA is added, the new EndpointGroup triggers all endpoints to be ramped up together. As a result, endpointA won't receive less traffic than the other endpoints.

I propose the following changes to handle this.

1. The `ClusterEntry` itself is reconstructed

I propose that ClusterEntry is maintained per cluster name. If a ClusterEntry already exists for a ClusterSnapshot, then ClusterEntry#updateClusterSnapshot will be called instead of creating a new ClusterEntry. SubsetLoadBalancer also cannot keep ClusterSnapshot as a final field, so the related logic has also been refactored.
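To illustrate the idea of keeping one long-lived entry per cluster name, here is a minimal sketch. `ClusterRegistrySketch`, `Entry`, and `onSnapshot` are hypothetical stand-ins, not the actual `ClusterManager` or `ClusterEntry` code, and a plain version string stands in for a `ClusterSnapshot`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; not Armeria's real ClusterManager / ClusterEntry.
final class ClusterRegistrySketch {

    /** Long-lived per-cluster state that survives snapshot updates. */
    static final class Entry {
        final String clusterName;
        String snapshotVersion; // stands in for the latest ClusterSnapshot
        int updateCount;

        Entry(String clusterName) {
            this.clusterName = clusterName;
        }

        void updateSnapshot(String version) {
            snapshotVersion = version;
            updateCount++;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Reuses the entry for a known cluster name instead of rebuilding it. */
    Entry onSnapshot(String clusterName, String version) {
        final Entry entry = entries.computeIfAbsent(clusterName, Entry::new);
        entry.updateSnapshot(version);
        return entry;
    }
}
```

Because the entry object is reused, any ramp-up state held by it (or by objects it owns) is not thrown away when only the snapshot changes.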

This change is probably also more in sync with Envoy's implementation.
ref: https://github.com/envoyproxy/envoy/blob/2e055e40bb93e4bbd97a10b7fedee9c50c5ba8af/source/common/upstream/cluster_manager_impl.cc#L1318

2. The set of endpoints is updated

In order to handle this, I propose that we keep a pool of endpoints in a class named DelegatingEndpointGroup. The EndpointGroup will store the createdAtNanos timestamp of previous endpoints, and set these values if a new Endpoint is added. These timestamps will be used to calculate which ramping up step an Endpoint was in previously. Note that DelegatingEndpointGroup is at the outermost layer of EndpointGroups. This is done to ramp up an Endpoint if it was filtered out by a HealthCheckedEndpointGroup and then added back in.
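A rough sketch of the timestamp bookkeeping described above, assuming endpoints can be identified by a plain `host:port` string. `CreatedAtPoolSketch` and `update` are hypothetical names used only for illustration; the real class works with `Endpoint` attributes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: endpoints are plain "host:port" strings here.
final class CreatedAtPoolSketch {

    private Map<String, Long> createdAtNanos = new HashMap<>();

    /**
     * Returns the creation timestamp for each endpoint in the new list.
     * Endpoints seen in the previous update keep their old timestamp,
     * new ones get nowNanos, and endpoints missing from the new list
     * are forgotten so that they ramp up again if they come back.
     */
    Map<String, Long> update(List<String> endpoints, long nowNanos) {
        final Map<String, Long> next = new HashMap<>();
        for (String endpoint : endpoints) {
            next.put(endpoint, createdAtNanos.getOrDefault(endpoint, nowNanos));
        }
        createdAtNanos = next;
        return Collections.unmodifiableMap(next);
    }
}
```

Forgetting endpoints that drop out of the list is what lets an endpoint removed by a health check restart its ramp-up when it is added back.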

Once a ClusterSnapshot is updated, DelegatingEndpointGroup#updateClusterSnapshot will be called. DelegatingEndpointGroup will listen for updates to endpoints, set the createdAtNanos attribute for each endpoint, and call ClusterEntry#accept. It is now possible for ClusterEntry#accept to be called from multiple threads (xDS event loop, health check event loop, etc.), so I've also added logic to reschedule.
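One common way to handle callbacks arriving from several threads is to hand every call off to a single executor, so the state is only ever touched from one thread. The sketch below shows that pattern under the assumption that the supplied `Executor` is single-threaded; `SerializedAcceptSketch` is a hypothetical name, and the actual rescheduling logic in this PR may differ.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical sketch of serializing callbacks onto one executor.
final class SerializedAcceptSketch<T> {

    // Assumed to run tasks one at a time, in submission order (e.g. an event loop).
    private final Executor eventLoop;
    private final Consumer<T> delegate;

    SerializedAcceptSketch(Executor eventLoop, Consumer<T> delegate) {
        this.eventLoop = eventLoop;
        this.delegate = delegate;
    }

    /** Safe to call from any thread; the delegate only runs on the executor. */
    void accept(T value) {
        eventLoop.execute(() -> delegate.accept(value));
    }
}
```

With a single-threaded executor, updates from the xDS thread and the health check thread are applied one after another, so the delegate needs no additional locking.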

Lastly, in order to fully support xDS's slow-start I've added EndpointWeightTransition#aggression. If a SlowStartConfig is provided, the EndpointGroup is constructed with a WeightRampingUpStrategy.
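For reference, Envoy's slow start documentation scales an endpoint's weight by `max(min_weight_percent, time_factor ^ (1 / aggression))`. The sketch below applies that formula to discrete ramp-up steps, assuming `time_factor` can be approximated by `step / totalSteps`. `AggressionSketch` and `rampedWeight` are hypothetical names, and the exact signature and rounding of `EndpointWeightTransition#aggression` may differ.

```java
// Hypothetical sketch of a non-linear weight transition; not the actual
// EndpointWeightTransition#aggression implementation.
final class AggressionSketch {

    private AggressionSketch() {}

    /**
     * Computes the ramped weight at the given step.
     * aggression == 1.0 is linear, > 1.0 ramps up faster at the start,
     * and < 1.0 ramps up slower at the start.
     */
    static int rampedWeight(int weight, int step, int totalSteps,
                            double aggression, double minWeightPercent) {
        if (aggression <= 0) {
            throw new IllegalArgumentException("aggression must be positive");
        }
        if (totalSteps <= 0) {
            throw new IllegalArgumentException("totalSteps must be positive");
        }
        // Clamp the step so the factor stays within [0, 1].
        final int clamped = Math.max(0, Math.min(step, totalSteps));
        final double timeFactor = (double) clamped / totalSteps;
        final double factor = Math.max(minWeightPercent,
                                       Math.pow(timeFactor, 1.0 / aggression));
        return (int) Math.round(weight * factor);
    }
}
```

For a weight of 1000 halfway through the ramp-up, an aggression of 1.0 gives 500, while an aggression of 2.0 gives a larger weight and 0.5 a smaller one.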

Modifications:

Modified ClusterManager to not create a new ClusterEntry as long as the clusterName is the same. To support this change, ClusterEntry#updateClusterSnapshot is added.
Added a DelegatingEndpointGroup which keeps a reference to the EndpointGroup associated with the ClusterSnapshot. DelegatingEndpointGroup also sets createdAtNanos on endpoints.
Added EndpointWeightTransition#aggression, which allows non-linear weight transition.
WeightRampingUpStrategy is now set when the SlowStartConfig parameter is set.

Result:

XdsEndpointGroup now supports slow start.

github-actions · 2024-05-20T16:58:25Z

🔍 Build Scan® (commit: `30644c0`)


minwoox

Basically looks great! Left a few comments. 😉

core/src/main/java/com/linecorp/armeria/client/endpoint/EndpointWeightTransition.java

xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/DelegatingEndpointGroup.java

xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/ClusterEntry.java

xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/DelegatingEndpointGroup.java

xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/EndpointUtil.java

minwoox

Looks fantastic now. Thanks! 😄

minwoox · 2024-05-23T05:18:13Z

xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/EndpointsPool.java

+ timestamp = createdAtNanos(endpoint);
+ } else {
+ timestamp = createdTimestamps.getOrDefault(endpoint, defaultTimestamp);
+ };


Suggested change

- };
+ }

xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/EndpointsPool.java

ikhoon

Very nice. 👍👍

xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/EndpointUtil.java

ikhoon · 2024-05-31T12:09:30Z

xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/ClusterEntry.java

+ void updateClusterSnapshot(ClusterSnapshot clusterSnapshot) {
+ final EndpointSnapshot endpointSnapshot = clusterSnapshot.endpointSnapshot();
+ assert endpointSnapshot != null;
+ endpointsPool.updateClusterSnapshot(clusterSnapshot);


(Optional) The cross-coupling between EndpointsPool and ClusterEntry could be eliminated by passing a callback to endpointsPool.updateClusterSnapshot().

Suggested change

- endpointsPool.updateClusterSnapshot(clusterSnapshot);
+ endpointsPool.updateClusterSnapshot(clusterSnapshot, endpoints -> {
+     accept(clusterSnapshot, endpoints);
+ });

Good point, done 👍

Motivation:

The motivation for this PR stems from #5688.

The current `WeightRampingUpStrategy` internally maintains the ramping up status of endpoints. However, it is possible that although an `EndpointGroup` has just been created, certain endpoints must be considered already ramped up. For instance, in xDS we create a new `EndpointGroup` every time a `ClusterSnapshot` is updated. However, even if a `ClusterSnapshot` has changed, we need to retain an endpoint's ramping up status.

To resolve this, I propose a timestamp-based solution. Specifically, I believe an `Endpoint` can maintain a state `createdAtNanos`, and `WeightRampingUpStrategy` can refer to this value to determine the ramp-up status. We can also allow users to specify their own timestamp via an attribute if they are maintaining their own `Endpoint` pool, as is done in xDS.

While implementing a timestamp-based solution, I found that the previous logic would probably be simpler to manage if it were refactored. Specifically, I propose that a `rampingUpInterval` is divided into `rampingUpTaskWindow`s. Each `rampingUpTaskWindow` will be assigned a scheduler. If a scheduler doesn't have any more endpoints to ramp up, the scheduler is stopped. I've also removed the endpoint deduplication logic to make the implementation simpler.

Modifications:

- Defined an attribute `createdAtNanos` which signifies when an `Endpoint` has been created.
- Refactored `WeightRampingUpStrategy` to refer to `createdAtNanos` when computing the initial ramping up step.
- Refactored `WeightRampingUpStrategy` overall to always use the created timestamp when 1) determining the initial ramping up step and 2) determining which ramping up scheduler to use.
- Removed the deduplication logic to simplify the implementation.
- Removed the weight update logic to simplify the implementation.

Result:

- We can prepare to support `WeightRampingUpStrategy` for xDS.
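As an illustration of how a ramp-up step could be derived from a creation timestamp alone, here is a minimal sketch. It assumes a new endpoint starts at step 1 and advances one step per interval; `RampingStepSketch` and `stepFor` are hypothetical, and the actual computation in `WeightRampingUpStrategy` may differ.

```java
// Hypothetical sketch; not the actual WeightRampingUpStrategy logic.
final class RampingStepSketch {

    private RampingStepSketch() {}

    /**
     * Derives the current ramp-up step purely from timestamps, so the
     * result is the same no matter how often the EndpointGroup is rebuilt.
     * An endpoint is fully ramped up once the returned step equals totalSteps.
     */
    static int stepFor(long createdAtNanos, long nowNanos,
                       long intervalNanos, int totalSteps) {
        if (intervalNanos <= 0 || totalSteps <= 0) {
            throw new IllegalArgumentException("intervalNanos and totalSteps must be positive");
        }
        // Guard against a creation timestamp that lies in the future.
        final long elapsedNanos = Math.max(0L, nowNanos - createdAtNanos);
        final long step = elapsedNanos / intervalNanos + 1;
        return (int) Math.min(step, (long) totalSteps);
    }
}
```

Since the step depends only on `createdAtNanos` and the current time, a strategy that is recreated can resume an endpoint at the step it had already reached instead of restarting from the beginning.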

minwoox · 2024-06-04T11:13:36Z

Oops, could you resolve the conflict?

jrhee17 · 2024-06-05T02:07:16Z

> Oops, could you resolve the conflict?

Did an "Accept Theirs" and confirmed that this PR contains no diff related to WeightRampingUpStrategy or WeightRampingUpStrategyTest.


jrhee17 added the new feature label May 20, 2024

minwoox added this to the 1.29.0 milestone May 22, 2024

jrhee17 force-pushed the feature/xds-ramping-up branch from 1ca79d3 to 0c83dc9 Compare May 22, 2024 02:47

jrhee17 added 3 commits May 22, 2024 11:50

minimal impl

e4a3797

rename

addd2bf

minor fix

bec0a9c

jrhee17 mentioned this pull request May 22, 2024

Refactor WeightRampingUpStrategy to be timestamp based #5693

Merged

minimal impl

90f35f6

jrhee17 force-pushed the feature/xds-ramping-up branch from 0c83dc9 to 90f35f6 Compare May 22, 2024 03:50

remove public

42a8fc0

jrhee17 marked this pull request as ready for review May 22, 2024 05:47

jrhee17 requested review from ikhoon, minwoox and trustin as code owners May 22, 2024 05:47

minwoox reviewed May 23, 2024


jrhee17 added 3 commits May 23, 2024 13:06

address comments by @minwoox

e3ce07a

endpointPool -> endpointsPool

f4031ea

just use a object -> long map

150ea58

minwoox approved these changes May 23, 2024


address comments by @minwoox

2035486

ikhoon approved these changes May 31, 2024


address comments by @ikhoon

db19f41

Merge remote-tracking branch 'origin/main' into feature/xds-ramping-up

729fb9e

Merge branch 'main' into feature/xds-ramping-up

30644c0

jrhee17 merged commit 3117ad8 into line:main Jun 5, 2024
14 of 15 checks passed
