Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix envoy initial fetch time out when CDS updated #1034

Closed
wants to merge 1 commit into from

Conversation

haorenfsa
Copy link

@haorenfsa haorenfsa commented Oct 25, 2024

@haorenfsa haorenfsa changed the title Fix race when server update resourceVersion WIP: Fix race when server update resourceVersion Oct 25, 2024
@haorenfsa haorenfsa changed the title WIP: Fix race when server update resourceVersion Fix envoy initial fetch time out when CDS updated Oct 29, 2024
@haorenfsa
Copy link
Author

@arkodg Please help review, thank you in advance

@haorenfsa haorenfsa changed the title Fix envoy initial fetch time out when CDS updated WIP: Fix envoy initial fetch time out when CDS updated Oct 29, 2024
@haorenfsa haorenfsa force-pushed the bugfix branch 2 times, most recently from fda1676 to b76edf9 Compare October 29, 2024 07:02
@haorenfsa haorenfsa changed the title WIP: Fix envoy initial fetch time out when CDS updated Fix envoy initial fetch time out when CDS updated Oct 29, 2024
@lukidzi
Copy link
Contributor

lukidzi commented Oct 30, 2024

I encountered the same issue. Here is my PR, but unfortunately, it hasn't received a review #981

// otherwise, envoy won't complete initialization
if len(resp.Resources) > 0 || len(resp.RemovedResources) > 0 || (state.IsWildcard() && state.IsFirst()) {
if len(resp.Resources) > 0 ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make more sense as a switch{} now where we can fall through cases? Having this many || seems a bit unreadable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -137,3 +142,11 @@ func (s *StreamState) SetKnownResourceNamesAsList(url string, names []string) {
}
s.knownResourceNames[url] = m
}

func (s *StreamState) HasWarmingCluster() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be more generic given that envoy also supports listener warming? Maybe HasResourceWarming()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems only Cluster warming needs to be considered, see below comments.

if typ == resource.EndpointType {
watch.state.SetClusterWarming(false)
}
if typ == resource.ClusterType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the cases where we get a warming listener?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! When I wrote this patch, I only had dynamic Cluster, so only ClusterType was considered. If we're to merge this patch, I'll add the rest resources need warming.

Copy link
Author

@haorenfsa haorenfsa Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alecholmez Thank you for taking time reviewing this.
I just read the envoy xDS protocol document again https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#resource-warming, and found that only Cluster warming needs an additional EDS response. No additional RDS response is required for Listener warming. So I think we only need consider about the ClusterType in this patch, what do you think?

image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lukidzi, I checked your PR. As I stated above, seems only CDS & EDS needs to be taking care of ? Anything I missed, please help point it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, if both LDS and RDS are in use, you may encounter an issue when a Listener enters the warming phase and depends on RDS data to proceed. My PR only addressed the issue for CDS/EDS, so it doesn’t currently resolve the LDS/RDS dependency during the warming phase.

@@ -54,6 +58,7 @@ func NewStreamState(wildcard bool, initialResourceVersions map[string]string) St
first: true,
knownResourceNames: map[string]map[string]struct{}{},
ordered: false, // Ordered comes from the first request since that's when we discover if they want ADS
hasWarmingCluster: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @valerian-roche suggested in my PR envoyproxy/go-control-plane#981, it would be beneficial to push only the endpoints of updated clusters. When your Envoy instance is watching over 1000 clusters, forcing a full update for all of them can be excessive if only one has changed. I could reopen my PR and finish it up if it makes sense to do it. Envoy has a new feature to load cached EDS but it works ONLY when initial_fetch_timeout is not disabled.

Copy link
Author

@haorenfsa haorenfsa Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me, only 1 extra empty EDS response is needed no matter how many Clusters were updated.

So my patch will only send 1 empty EDS response, if nothing changes.

Copy link

github-actions bot commented Dec 7, 2024

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label Dec 7, 2024
Copy link

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this Dec 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: envoy initial fetch time out when CDS updated.
3 participants