Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41794][SQL] Add try_remainder function and re-enable column tests #46434

Closed

Conversation

grundprinzip
Copy link
Contributor

What changes were proposed in this pull request?

As part of re-enabling the ANSI mode tests for Spark Connect, we discovered that we don't have an equivalent for try_* for the remainder of operations. This patch adds the try_remainder function in Scala, Python, and Spark Connect and adds the required testing.

Why are the changes needed?

ANSI and Spark 4

Does this PR introduce any user-facing change?

Yes, it adds the try_remainder function that behaves according to ANSI for division by zero.

How was this patch tested?

Added new UT and E2E tests.

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon
Copy link
Member

cc @gengliangwang

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41794] Add try_remainder function and re-enable column tests [SPARK-41794][SQL] Add try_remainder function and re-enable column tests May 7, 2024
@dongjoon-hyun
Copy link
Member

Could you fix Python linter failure, @grundprinzip ?

./python/pyspark/sql/tests/connect/test_connect_column.py:1029:5: F401 'os' imported but unused
    import os
    ^
1     F401 'os' imported but unused

@grundprinzip
Copy link
Contributor Author

yes, done.

@dongjoon-hyun
Copy link
Member

Thank you.

BTW, the recent test failure is relevant? For me, it looks irrelevant. Could you take a look at?

[info] *** 1 TEST FAILED ***
[error] Failed: Total 10578, Failed 1, Errors 0, Passed 10577, Ignored 29
[error] Failed tests:
[error] 	org.apache.spark.sql.execution.python.PythonStreamingDataSourceSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 3020 s (50:20), completed May 8, 2024, 1:48:01 PM

@gengliangwang
Copy link
Member

@grundprinzip thanks for the work!
Let's also mention the new function in https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#useful-functions-for-ansi-mode

@github-actions github-actions bot added the DOCS label May 12, 2024
@grundprinzip
Copy link
Contributor Author

@gengliangwang @dongjoon-hyun I addressed all comments and added the additional test, PTAL. Thanks!

Comment on lines +124 to +127
logWarning(
"StreamingQueryListenerBus Handler thread received exception, all client" +
" side listeners are removed and handler thread is terminated.",
e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logWarning(
"StreamingQueryListenerBus Handler thread received exception, all client" +
" side listeners are removed and handler thread is terminated.",
e)
logWarning("StreamingQueryListenerBus Handler thread received exception, all client" +
" side listeners are removed and handler thread is terminated.", e)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the spark connect auto format and it produced these changes. I'm ok with reverting them in the worst case but at the same time your suggestions introduce na manual style adjustment.

LMK

@@ -451,8 +451,7 @@ object CheckConnectJvmClientCompatibility {
"org.apache.spark.sql.streaming.RemoteStreamingQuery$"),
// Skip client side listener specific class
ProblemFilters.exclude[MissingClassProblem](
"org.apache.spark.sql.streaming.StreamingQueryListenerBus"
),
"org.apache.spark.sql.streaming.StreamingQueryListenerBus"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"org.apache.spark.sql.streaming.StreamingQueryListenerBus"),
"org.apache.spark.sql.streaming.StreamingQueryListenerBus"
),

Copy link
Contributor Author

@grundprinzip grundprinzip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix the wrong version typo in two places.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine but I defer to @gengliangwang who drove this.

Fixing versions.

Co-authored-by: Gengliang Wang <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
@gengliangwang
Copy link
Member

Thanks for adding the new function! Merging to master.

@gengliangwang
Copy link
Member

@grundprinzip Shall we create a new jira for this instead of using https://issues.apache.org/jira/browse/SPARK-41794?

@zhengruifeng
Copy link
Contributor

add it to python API references in #46566

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants