Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Livy 899] The state of interactive session is always idle when using thrift protocol. #377

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jianzhenwu
Copy link
Contributor

What changes were proposed in this pull request?

  1. Should broadcast ReplState to the RSCClient in RSCDriver or the session state is always idle when using thrift protocol.
  2. Add unit test testSessionState in ThriftSessionTest to verify that the state change is as expected.

https://issues.apache.org/jira/browse/LIVY-899

How was this patch tested?

This patch is tested by unit tests. The the state change is recorded in replStateChangedHistoryInTest when testing.

@codecov-commenter
Copy link

codecov-commenter commented Dec 18, 2022

Codecov Report

Merging #377 (d062281) into master (314f2de) will decrease coverage by 36.98%.
The diff coverage is 0.00%.

❗ Current head d062281 differs from pull request most recent head 322af8e. Consider uploading reports for the commit 322af8e to get more accurate results

@@              Coverage Diff              @@
##             master     #377       +/-   ##
=============================================
- Coverage     65.58%   28.61%   -36.98%     
+ Complexity      952      379      -573     
=============================================
  Files           103      103               
  Lines          6044     6057       +13     
  Branches        911      912        +1     
=============================================
- Hits           3964     1733     -2231     
- Misses         1536     3977     +2441     
+ Partials        544      347      -197     
Impacted Files Coverage Δ
...rc/main/java/org/apache/livy/rsc/BaseProtocol.java 0.00% <ø> (-52.13%) ⬇️
...c/src/main/java/org/apache/livy/rsc/RSCClient.java 0.00% <0.00%> (-72.79%) ⬇️
...ava/org/apache/livy/rsc/driver/JobContextImpl.java 0.00% <0.00%> (-86.67%) ⬇️
...in/java/org/apache/livy/rsc/driver/JobWrapper.java 0.00% <0.00%> (-88.58%) ⬇️
...ain/java/org/apache/livy/rsc/driver/RSCDriver.java 0.00% <0.00%> (-80.92%) ⬇️

... and 82 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@mgaido91 mgaido91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the change, as this does not involve only the thriftserver module, but all interactive sessions. If the problem involves all interactive sessions, then it is ok, otherwise the fix is probably not the correct one. Can you check if the same issue affects all interactive jobs? If so, I think we can go ahead with this PR, but I would recommend a different reviewer more familiar with the code in rsc, otherwise, the fix needs to be done only on the thriftserver part and we need more investigation to understand why normal interactive sessions do not have the issue while thriftserver has. Thanks.

import java.util.List;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import java.util.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid importing everything, just add what we need

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -192,6 +192,27 @@ public void testThriftSession() throws Exception {
waitFor(new UnregisterSessionJob(s3));
}

@Test
public void testSessionState() throws Exception {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove unneeded blank line

@@ -214,7 +235,7 @@ private SqlJob newSqlJob(String session, String stId, String statement) {
*/
private Exception expectError(Job<?> job, String expected) throws TimeoutException {
try {
livy.submit(job).get(10, TimeUnit.SECONDS);
livy.submit(job).get(10, SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are unneeded changes, please avoid them and use TimeUnit.SECONDS in the added code as well

@@ -192,6 +192,27 @@ public void testThriftSession() throws Exception {
waitFor(new UnregisterSessionJob(s3));
}

@Test
public void testSessionState() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we testing the RSCDriver in thriftserver? All changes are outside the thriftserver, so the UTs should be in rsc as well. This will also avoid adding dependencies to the thriftserver module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I need to verify that the state is correct at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the end of what? Here we are not going through the thriftserver, so this test should be moved to rsc

broadcast(new ReplState(SessionState.Busy$.MODULE$.state()));
}

public void idleIfActiveJobsEmpty(){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public void idleIfActiveJobsEmpty(){ -> public void idleIfActiveJobsEmpty() {

import java.util.concurrent.TimeoutException;
import static java.util.concurrent.TimeUnit.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@jianzhenwu
Copy link
Contributor Author

I am not sure about the change, as this does not involve only the thriftserver module, but all interactive sessions. If the problem involves all interactive sessions, then it is ok, otherwise the fix is probably not the correct one. Can you check if the same issue affects all interactive jobs? If so, I think we can go ahead with this PR, but I would recommend a different reviewer more familiar with the code in rsc, otherwise, the fix needs to be done only on the thriftserver part and we need more investigation to understand why normal interactive sessions do not have the issue while thriftserver has. Thanks.

The same issue doesn't affect interactive jobs which created by REST API. The is because the handle functions are different in the driver. The REST API uses handle(ctx: ChannelHandlerContext, msg: BaseProtocol.ReplJobRequest) in the ReplDriver while thrift API uses handle(ChannelHandlerContext ctx, JobRequest<?> msg) in the RSCDRiver.
ReplDriver changes state when execute code by invoking stateChangedCallback which broadcast the state to all RPC clients.
RSCDriver does not broadcast state to RPC clients.
I think this patch would not affect other interactive jobs except using thrift API.

@mgaido91
Copy link
Contributor

This fix is causing many changes. The submit method is used for many purposes, such as adding jar, files, so I do not think is a good idea to have a fix that has such a large impact on other cases that should not be affected. The current fix would also set to busy when fetching the results, and executing schema operations, and even while unregistering a session...

I think a better place for the fix would be SqlJob, although it is not a perfect solution either. So I would be very happy if someone comes with a better alternative.

@jianzhenwu
Copy link
Contributor Author

This fix is causing many changes. The submit method is used for many purposes, such as adding jar, files, so I do not think is a good idea to have a fix that has such a large impact on other cases that should not be affected. The current fix would also set to busy when fetching the results, and executing schema operations, and even while unregistering a session...

I think a better place for the fix would be SqlJob, although it is not a perfect solution either. So I would be very happy if someone comes with a better alternative.

sorry for the late reply. I have migrated the changes to SqlJob, can you @mgaido91 review the code again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants