Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ramping up endpoints for XdsEndpointGroup #5688

Merged
merged 12 commits into from
Jun 5, 2024

Conversation

jrhee17
Copy link
Contributor

@jrhee17 jrhee17 commented May 20, 2024

Consider reviewing #5693 before this PR.

Motivation:

This changeset attempts to support slow start mode.
While simply supporting this is trivial with the existing WeightRampingUpStrategy, ramping up needs to work irregardless of whether a 1) cluster is updated 2) or the set of endpoints is updated.

The current implementation creates a new EndpointGroup whenever one of the two updates above is triggered.

The ClusterEntry itself is reconstructed

clusterEntry = new ClusterEntry(clusterSnapshot, this);

The set of endpoints is updated

this.endpoints = ImmutableList.copyOf(endpoints);
final PrioritySet prioritySet = new PrioritySet(endpoints);
loadBalancer.prioritySetUpdated(prioritySet);

Consequentially, endpoints will always be ramped up from the beginning with the current implementation. e.g. if endpointA is added, the new EndpointGroup will trigger all endpoints to be ramped up all together. As a result endpointA won't receive less traffic compared to the other endpoints.

I propose the following changes to handle this.

1. The ClusterEntry itself is reconstructed

I propose that ClusterEntry is maintained per cluster name. If a ClusterEntry already exists for a ClusterSnapshot, then ClusterEntry#updateClusterSnapshot will be called instead of creating a new ClusterEntry. SubsetLoadBalancer also cannot keep ClusterSnapshot as a final field, so the related logic has also been refactored.

This change is actually probably more in-sync with envoy's implementation as well.
ref: https://github.com/envoyproxy/envoy/blob/2e055e40bb93e4bbd97a10b7fedee9c50c5ba8af/source/common/upstream/cluster_manager_impl.cc#L1318

2. The set of endpoints is updated

In order to handle this, I propose that we keep a pool of endpoints in a class named DelegatingEndpointGroup. The EndpointGroup will store the createdAtNanos timestamp of previous endpoints, and set these values if a new Endpoint is added. These timestamps will be used to calculate which ramping up step an Endpoint was in previously. Note that DelegatingEndpointGroup is at the outermost layer of EndpointGroups. This is done to ramp up an Endpoint if it was filtered out by a HealthCheckedEndpointGroup and then added back in.

Once a ClusterSnapshot is updated, DelegatingEndpointGroup#updateClusterSnapshot will be called. DelegatingEndpointGroup will listen for updates to endpoints, set the createdAtNanos attribute for each endpoint, and call ClusterEntry#accept. It is now possible for ClusterEntry#accept to be called from multiple threads (xDS event loop, health check event loop, etc..) so I've also added logic to reschedule.

Lastly, in order to fully support xDS's slow-start I've added EndpointWeightTransition#aggression. If a SlowStartConfig is provided, the EndpointGroup is constructed with a WeightRampingUpStrategy.

Modifications:

  • Modified ClusterManager not create a new ClusterEntry as long as the clusterName is the same. To support this change, ClusterEntry#clusterSnapshotUpdated is added.
  • Added a DelegatingEndpointGroup which keeps a reference to the EndpointGroup associated with the ClusterSnapshot. DelegatingEndpointGroup also sets createdAtNanos to endpoints.
  • Added EndpointWeightTransition#aggression which allows non-linear weight transition.
  • WeightRampingUpStrategy is not set when the SlowStartConfig parameter is set.

Result:

  • XdsEndpointGroup now supports slow start.

Copy link
Contributor

github-actions bot commented May 20, 2024

🔍 Build Scan® (commit: 30644c0)

Job name Status Build Scan®

@minwoox minwoox added this to the 1.29.0 milestone May 22, 2024
@jrhee17 jrhee17 marked this pull request as ready for review May 22, 2024 05:47
Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically looks great! Left a few comments. 😉

Copy link
Member

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic now. Thanks! 😄

timestamp = createdAtNanos(endpoint);
} else {
timestamp = createdTimestamps.getOrDefault(endpoint, defaultTimestamp);
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
};
}

Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. 👍👍

void updateClusterSnapshot(ClusterSnapshot clusterSnapshot) {
final EndpointSnapshot endpointSnapshot = clusterSnapshot.endpointSnapshot();
assert endpointSnapshot != null;
endpointsPool.updateClusterSnapshot(clusterSnapshot);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional) The cross-coupling between EndpointsPool and ClusterEntry could be eliminated by passing a callback to endpointsPool.updateClusterSnapshot().

Suggested change
endpointsPool.updateClusterSnapshot(clusterSnapshot);
endpointsPool.updateClusterSnapshot(clusterSnapshot, endpoints -> {
accept(clusterSnapshot, endpoints);
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, done 👍

minwoox pushed a commit that referenced this pull request Jun 4, 2024
Motivation:

The motivation for this PR stems from #5688.

The current `WeightRampingUpStrategy` internally maintains the ramping up status of endpoints. However, it is possible that although an `EndpointGroup` has just been created certain endpoints must be considered already ramped up. For instance, in xDS we newly create an `EndpointGroup` every time a `ClusterSnapshot` is updated. However, even if a `ClusterSnapshot` has changed, we need to retain an endpoint's ramping up status.

To resolve this, I propose a timestamp based solution. Specifically, I believe an `Endpoint` can maintain a state `createdAtNanos` and `WeightRampingUpStrategy` can refer to this value to determine the ramp-up status. We can also allow users to specify their own timestamp via an attribute if they are maintaining their own `Endpoint` pool like done in xDS.

While implementing a timestamp based solution, I found that the previous logic is probably simpler to manage if it were refactored. Specifically, I propose that a `rampingUpInterval` is divided into `rampingUpTaskWindow`s. Each `rampingUpTaskWindow` will be assigned a scheduler. If a scheduler doesn't have any more endpoints to ramp up, the scheduler is stopped. I've also removed the endpoint deduplication related logic to make the implementation simpler.

Modifications:

- Defined an attribute `createdAtNanos` which signifies when an `Endpoint` has been created.
- Refactored `WeightRampingUpStrategy` to refer to `createdAtNanos` when computing the initial ramping up step
- Refactored `WeightRampingUpStrategy` overall to always use the created timestamp when 1) determining the initial ramping up step 2) determining which ramping up scheduler to use
  - Removed deduplication logic for simplifying logic
  - Removed weight update logic for simplifying logic.

Result:

- We can prepare to support `WeightRampingUpStrategy` for xDS.
<!--
Visit this URL to learn more about how to write a pull request description:
https://armeria.dev/community/developer-guide#how-to-write-pull-request-description
-->
@minwoox
Copy link
Member

minwoox commented Jun 4, 2024

Oops, could you resolve the conflict?

@jrhee17
Copy link
Contributor Author

jrhee17 commented Jun 5, 2024

Oops, could you resolve the conflict?

Did an Accept theirs and confirmed that there is no diffset related to WeightRampingUpStrategy and WeightRampingUpStrategyTest in this PR

@jrhee17 jrhee17 merged commit 3117ad8 into line:main Jun 5, 2024
14 of 15 checks passed
Dogacel pushed a commit to Dogacel/armeria that referenced this pull request Jun 8, 2024
Motivation:

The motivation for this PR stems from line#5688.

The current `WeightRampingUpStrategy` internally maintains the ramping up status of endpoints. However, it is possible that although an `EndpointGroup` has just been created certain endpoints must be considered already ramped up. For instance, in xDS we newly create an `EndpointGroup` every time a `ClusterSnapshot` is updated. However, even if a `ClusterSnapshot` has changed, we need to retain an endpoint's ramping up status.

To resolve this, I propose a timestamp based solution. Specifically, I believe an `Endpoint` can maintain a state `createdAtNanos` and `WeightRampingUpStrategy` can refer to this value to determine the ramp-up status. We can also allow users to specify their own timestamp via an attribute if they are maintaining their own `Endpoint` pool like done in xDS.

While implementing a timestamp based solution, I found that the previous logic is probably simpler to manage if it were refactored. Specifically, I propose that a `rampingUpInterval` is divided into `rampingUpTaskWindow`s. Each `rampingUpTaskWindow` will be assigned a scheduler. If a scheduler doesn't have any more endpoints to ramp up, the scheduler is stopped. I've also removed the endpoint deduplication related logic to make the implementation simpler.

Modifications:

- Defined an attribute `createdAtNanos` which signifies when an `Endpoint` has been created.
- Refactored `WeightRampingUpStrategy` to refer to `createdAtNanos` when computing the initial ramping up step
- Refactored `WeightRampingUpStrategy` overall to always use the created timestamp when 1) determining the initial ramping up step 2) determining which ramping up scheduler to use
  - Removed deduplication logic for simplifying logic
  - Removed weight update logic for simplifying logic.

Result:

- We can prepare to support `WeightRampingUpStrategy` for xDS.
<!--
Visit this URL to learn more about how to write a pull request description:
https://armeria.dev/community/developer-guide#how-to-write-pull-request-description
-->
Dogacel pushed a commit to Dogacel/armeria that referenced this pull request Jun 8, 2024
Consider reviewing line#5693 before this PR.

Motivation:

This changeset attempts to support [slow start
mode](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/slow_start).
While simply supporting this is trivial with the existing
`WeightRampingUpStrategy`, ramping up needs to work irregardless of
whether a 1) cluster is updated 2) or the set of endpoints is updated.

The current implementation creates a new `EndpointGroup` whenever one of
the two updates above is triggered.

##### The `ClusterEntry` itself is reconstructed


https://github.com/line/armeria/blob/806556e5a4d27b549c4c6c24f75d96a1447d85d1/xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/ClusterManager.java#L128

##### The set of endpoints is updated


https://github.com/line/armeria/blob/806556e5a4d27b549c4c6c24f75d96a1447d85d1/xds/src/main/java/com/linecorp/armeria/xds/client/endpoint/ClusterEntry.java#L60-L62

Consequentially, endpoints will always be ramped up from the beginning
with the current implementation. e.g. if `endpointA` is added, the new
`EndpointGroup` will trigger all endpoints to be ramped up all together.
As a result `endpointA` won't receive less traffic compared to the other
endpoints.

I propose the following changes to handle this.

##### 1. The `ClusterEntry` itself is reconstructed

I propose that `ClusterEntry` is maintained per cluster name. If a
`ClusterEntry` already exists for a `ClusterSnapshot`, then
`ClusterEntry#updateClusterSnapshot` will be called instead of creating
a new `ClusterEntry`. `SubsetLoadBalancer` also cannot keep
`ClusterSnapshot` as a final field, so the related logic has also been
refactored.

This change is actually probably more in-sync with envoy's
implementation as well.
ref:
https://github.com/envoyproxy/envoy/blob/2e055e40bb93e4bbd97a10b7fedee9c50c5ba8af/source/common/upstream/cluster_manager_impl.cc#L1318

##### 2. The set of endpoints is updated

In order to handle this, I propose that we keep a pool of endpoints in a
class named `DelegatingEndpointGroup`. The `EndpointGroup` will store
the `createdAtNanos` timestamp of previous endpoints, and set these
values if a new `Endpoint` is added. These timestamps will be used to
calculate which ramping up step an `Endpoint` was in previously. Note
that `DelegatingEndpointGroup` is at the outermost layer of
`EndpointGroup`s. This is done to ramp up an `Endpoint` if it was
filtered out by a `HealthCheckedEndpointGroup` and then added back in.

Once a `ClusterSnapshot` is updated,
`DelegatingEndpointGroup#updateClusterSnapshot` will be called.
`DelegatingEndpointGroup` will listen for updates to endpoints, set the
`createdAtNanos` attribute for each endpoint, and call
`ClusterEntry#accept`. It is now possible for `ClusterEntry#accept` to
be called from multiple threads (xDS event loop, health check event
loop, etc..) so I've also added logic to reschedule.

Lastly, in order to fully support xDS's slow-start I've added
`EndpointWeightTransition#aggression`. If a `SlowStartConfig` is
provided, the `EndpointGroup` is constructed with a
`WeightRampingUpStrategy`.

Modifications:

- Modified `ClusterManager` not create a new `ClusterEntry` as long as
the `clusterName` is the same. To support this change,
`ClusterEntry#clusterSnapshotUpdated` is added.
- Added a `DelegatingEndpointGroup` which keeps a reference to the
`EndpointGroup` associated with the `ClusterSnapshot`.
`DelegatingEndpointGroup` also sets `createdAtNanos` to endpoints.
- Added `EndpointWeightTransition#aggression` which allows non-linear
weight transition.
- `WeightRampingUpStrategy` is not set when the `SlowStartConfig`
parameter is set.

Result:

- `XdsEndpointGroup` now supports slow start.

<!--
Visit this URL to learn more about how to write a pull request
description:

https://armeria.dev/community/developer-guide#how-to-write-pull-request-description
-->

---------

Co-authored-by: minux <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants