[SPARK-50735][CONNECT] Failure in ExecuteResponseObserver results in infinite reattaching requests #49370

Open
wants to merge 4 commits into
base: master

changgyoopark-db
Contributor

What changes were proposed in this pull request?

ExecuteGrpcResponseSender now checks whether the associated ExecuteThreadRunner has completed, and returns an error to the client if the ExecuteThreadRunner finished without recording its outcome.
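The check described above can be illustrated with a small toy model. This is only a sketch in Python, not the Scala code in this PR: `Observer`, `Runner`, and `next_action` are hypothetical stand-ins for ExecuteResponseObserver, ExecuteThreadRunner, and the sender's decision logic, and the real classes have more state than shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observer:
    """Toy stand-in for ExecuteResponseObserver: buffered responses plus a terminal outcome."""
    responses: List[str] = field(default_factory=list)
    completed: bool = False
    error: Optional[str] = None


@dataclass
class Runner:
    """Toy stand-in for ExecuteThreadRunner: only tracks whether the thread has finished."""
    finished: bool = False


def next_action(observer: Observer, runner: Runner, sent: int) -> str:
    """Decide what the sender does next (hypothetical helper, not a Spark API).

    `sent` is the number of buffered responses already delivered to the client.
    """
    if sent < len(observer.responses):
        return "send_response"
    if observer.error is not None:
        return "send_error"
    if observer.completed:
        return "send_completed"
    if runner.finished:
        # The runner has finished but never recorded an outcome: fail the
        # stream instead of leaving the client to reattach indefinitely.
        return "send_error"
    # The runner is still working; keep waiting for more responses.
    return "wait"
```

In this model, the only new branch is the `runner.finished` case: without it, a finished runner with no recorded outcome would fall through to `"wait"` on every attempt.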

Why are the changes needed?

ExecuteResponseObserver.{onError, onComplete} can fail, and they are not retried. As a result, the ExecuteThreadRunner can complete without having successfully responded to the client, and the client then keeps retrying indefinitely by reattaching to the execution.
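To make the failure mode concrete, here is a minimal toy simulation in Python. It is an illustration under stated assumptions, not Spark code: `FlakyObserver`, `run_execution`, and `client_reattach_attempts` are hypothetical names, the failure is simulated with `MemoryError`, and the reattach loop is capped by `max_attempts` only so the sketch terminates (the situation described above has no such cap).

```python
class FlakyObserver:
    """Toy observer whose terminal callback can fail (simulating e.g. OOM)."""

    def __init__(self, fail_terminal: bool) -> None:
        self.fail_terminal = fail_terminal
        self.terminal_recorded = False

    def on_completed(self) -> None:
        if self.fail_terminal:
            raise MemoryError("simulated failure while recording completion")
        self.terminal_recorded = True


def run_execution(observer: FlakyObserver) -> None:
    """Toy runner: tries to record completion exactly once, with no retry."""
    try:
        observer.on_completed()
    except MemoryError:
        # The outcome is lost, but the runner still finishes.
        pass


def client_reattach_attempts(observer: FlakyObserver, max_attempts: int) -> int:
    """Count reattach requests until a terminal response is visible or the cap is hit."""
    attempts = 0
    while not observer.terminal_recorded and attempts < max_attempts:
        attempts += 1  # each iteration models one reattach request
    return attempts
```

With `FlakyObserver(fail_terminal=True)`, the runner finishes without a recorded outcome, so the simulated client uses up every allowed attempt. With `fail_terminal=False` it needs no reattach at all.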

Does this PR introduce any user-facing change?

No.

How was this patch tested?

testOnly org.apache.spark.sql.connect.service.SparkConnectServiceE2ESuite

Was this patch authored or co-authored using generative AI tooling?

No.

@changgyoopark-db changgyoopark-db force-pushed the SPARK-50735 branch 4 times, most recently from e495228 to d3391c7 Compare January 6, 2025 13:49
@changgyoopark-db
Contributor Author

Hey @juliuszsompolski, I hope you are doing well. Could you please review this change?
-> Short description: if ExecuteThreadRunner fails to record the completion/error to the observer (e.g., due to OOM), the client keeps trying to reattach to the execution indefinitely.
-> The fix is to let the stream sender send an error if ExecuteThreadRunner has finished without recording anything.
-> This does not cover streaming queries (in case they have a similar problem).
