
[SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS #49440

Closed

Conversation

@pan3793 pan3793 commented Jan 10, 2025

What changes were proposed in this pull request?

This PR canonicalizes the name and layout of the profiling result files that the JVM profiler (added in SPARK-46094) writes to DFS:

dfsDir/{{APP_ID}}/profile-exec-{{EXECUTOR_ID}}.jfr

which largely follows the event log file name pattern and layout.
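For illustration, a minimal sketch (not the PR's exact code) of assembling that path; `dfsDir`, `appId`, and `executorId` are assumed to be available from the executor plugin context:

import org.apache.hadoop.fs.Path

// Builds dfsDir/{{APP_ID}}/profile-exec-{{EXECUTOR_ID}}.jfr
def resultPath(dfsDir: String, appId: String, executorId: String): Path =
  new Path(new Path(dfsDir, appId), s"profile-exec-$executorId.jfr")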

Why are the changes needed?

According to #44021 (comment), we can integrate the profiling results with Spark UI (both live and history) in the future, so it's good to follow the event logs file name pattern and layout as much as possible.

Does this PR introduce any user-facing change?

No, it's an unreleased feature.

How was this patch tested?

$ bin/spark-submit run-example \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.plugins=org.apache.spark.executor.profiler.ExecutorProfilerPlugin \
  --conf spark.executor.profiling.enabled=true \
  --conf spark.executor.profiling.dfsDir=hdfs:///spark-profiling \
  --conf spark.executor.profiling.fraction=1 \
  SparkPi 100000
hadoop@spark-dev1:~/spark$ hadoop fs -ls /spark-profiling/
Found 1 items
drwxrwx---   - hadoop supergroup          0 2025-01-13 10:29 /spark-profiling/application_1736320707252_0023_1
hadoop@spark-dev1:~/spark$ hadoop fs -ls /spark-profiling/application_1736320707252_0023_1
Found 48 items
-rw-rw----   3 hadoop supergroup    5255028 2025-01-13 10:29 /spark-profiling/application_1736320707252_0023_1/profile-exec-1.jfr
-rw-rw----   3 hadoop supergroup    3840775 2025-01-13 10:29 /spark-profiling/application_1736320707252_0023_1/profile-exec-10.jfr
-rw-rw----   3 hadoop supergroup    3889002 2025-01-13 10:29 /spark-profiling/application_1736320707252_0023_1/profile-exec-11.jfr
-rw-rw----   3 hadoop supergroup    3570697 2025-01-13 10:29 /spark-profiling/application_1736320707252_0023_1/profile-exec-12.jfr
...

Was this patch authored or co-authored using generative AI tooling?

No.

private val appId = try {
  conf.getAppId
} catch {
  case _: NoSuchElementException => "local-" + System.currentTimeMillis
}
pan3793 (Member Author):

curiosity, is this possible?

parthchandra (Contributor):

I remember getting this error when developing this feature. The app id had not been generated yet when the profiler was being initialized. I don't know if we might still get this, but it's safer this way.


private val PROFILER_FOLDER_PERMISSIONS = new FsPermission(Integer.parseInt("770", 8).toShort)
private val PROFILER_FILE_PERMISSIONS = new FsPermission(Integer.parseInt("660", 8).toShort)
pan3793 (Member Author):

This follows the event log behavior, to allow the SHS process to read the files. We may do the following things in the future (see the sketch after this list):

  • support downloading profiling results from SHS, like we have done for event logs
  • support integration with History UI
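
For readers unfamiliar with the octal literals in the snippet above, a minimal decoding (same values, illustrative variable names):

import org.apache.hadoop.fs.permission.FsPermission

// 770 -> rwxrwx---: owner and group get full access to the per-app directory,
// so an SHS process running under the same group can list and delete it.
val folderPermissions = new FsPermission(Integer.parseInt("770", 8).toShort)
// 660 -> rw-rw----: owner and group can read/write each .jfr result file.
val filePermissions = new FsPermission(Integer.parseInt("660", 8).toShort)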

parthchandra (Contributor):

+1, thank you.
It would be really nice if you take up the integration with the History UI.

@@ -2954,6 +2954,15 @@ private[spark] object Utils
     str.replaceAll("[ :/]", "-").replaceAll("[.${}'\"]", "_").toLowerCase(Locale.ROOT)
   }

+  def nameForAppAndAttempt(appId: String, appAttemptId: Option[String]): String = {
pan3793 (Member Author):

this can be reused in several places, for example #42575 (comment)
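
For context, a sketch of what such a helper plausibly looks like, modeled on the event log naming convention and the `sanitizeDirName` shown in the hunk above (illustrative, not necessarily the exact merged code):

def nameForAppAndAttempt(appId: String, appAttemptId: Option[String]): String = {
  // Sanitize both ids the way event log file names do, joining the app id
  // and the attempt id (present in YARN cluster mode) with an underscore.
  val base = sanitizeDirName(appId)
  appAttemptId match {
    case Some(attemptId) => base + "_" + sanitizeDirName(attemptId)
    case None => base
  }
}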

@@ -54,7 +54,7 @@ Then enable the profiling in the configuration.
   <td><code>spark.executor.profiling.dfsDir</code></td>
   <td>(none)</td>
   <td>
-    An HDFS compatible path to which the profiler's output files are copied. The output files will be written as <i>dfsDir/application_id/profile-appname-exec-executor_id.jfr</i> <br/>
+    An HDFS compatible path to which the profiler's output files are copied. The output files will be written as <i>dfsDir/{{APP_ID}}/profile-{{APP_ID}}-exec-{{EXECUTOR_ID}}.jfr</i> <br/>
pan3793 (Member Author):

actually, {{APP_ID}} is nameForAppAndAttempt (see code) for YARN cluster mode, but writing behavior details in the configuration description is a little bit verbose ...

please let me know if you have a better suggestion to polish this sentence.

parthchandra (Contributor):

I think this is fine.

@@ -72,7 +72,7 @@ Then enable the profiling in the configuration.
   <td>event=wall,interval=10ms,alloc=2m,lock=10ms,chunktime=300s</td>
   <td>
     Options to pass to the profiler. Detailed options are documented in the comments here:
-    <a href="https://github.com/async-profiler/async-profiler/blob/32601bccd9e49adda9510a2ed79d142ac6ef0ff9/src/arguments.cpp#L52">Profiler arguments</a>.
+    <a href="https://github.com/async-profiler/async-profiler/blob/v3.0/src/arguments.cpp#L44">Profiler arguments</a>.
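
As an aside, the default option string in this row decodes roughly as follows (semantics paraphrased from async-profiler's documented arguments; treat this as a summary, not an authoritative reference):

// event=wall        sample wall-clock time instead of CPU time
// interval=10ms     sampling interval
// alloc=2m          also sample allocations, roughly every 2 MB allocated
// lock=10ms         also record locks contended for longer than 10 ms
// chunktime=300s    start a new JFR chunk every 300 seconds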
pan3793 (Member Author):

prefer using a tag (rather than a commit hash) for consistency

@parthchandra parthchandra (Contributor) left a comment:

lgtm

@dongjoon-hyun dongjoon-hyun (Member) left a comment:

If we don't use APP_NAME in the new file name, can we avoid the repetition like the following?

- dfsDir/{{APP_ID}}/profile-{{APP_ID}}-exec-{{EXECUTOR_ID}}.jfr
+ dfsDir/{{APP_ID}}/profile-exec-{{EXECUTOR_ID}}.jfr

pan3793 commented Jan 13, 2025

@dongjoon-hyun thanks for the suggestion, addressed in e5aa660; I re-verified and updated the PR description.

val profilerDirForAppPath = new Path(profilerDirForApp)
if (!fs.exists(profilerDirForAppPath)) {
  // SPARK-30860: use the class method to avoid the umask causing permission issues
  FileSystem.mkdirs(fs, profilerDirForAppPath, PROFILER_FOLDER_PERMISSIONS)
}
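
Background on the SPARK-30860 reference (a property of the Hadoop FileSystem API, stated here as a reader aid): the instance method `fs.mkdirs(path, perm)` filters the requested permission through the process umask, whereas the static helper sets the permission explicitly afterwards, roughly:

// What FileSystem.mkdirs(fs, dir, permission) does, in effect: create the
// directory first, then set the requested permission explicitly, so the
// umask cannot mask away the group bits that the SHS relies on.
val created = fs.mkdirs(dir)
fs.setPermission(dir, permission)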
dongjoon-hyun (Member):

Thank you. This is a new improvement, isn't it, @pan3793 ?

pan3793 (Member Author):

yes, the change grants the SHS permission to read/delete the folder and files

@dongjoon-hyun dongjoon-hyun (Member) left a comment:

Thank you, @pan3793 .

The PR looks good to me. As a final piece, let's spin off the changes to the following two files, because I agree with you that it's a useful utility function and it's irrelevant to the JVM Profiler stuff. So we should do that refactoring first, before this PR, @pan3793.

core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala
core/src/main/scala/org/apache/spark/util/Utils.scala

@dongjoon-hyun (Member):

Please ping me on your spin-off PR. I can merge your new PR swiftly.

dongjoon-hyun pushed a commit that referenced this pull request Jan 14, 2025
…ils`

### What changes were proposed in this pull request?

Pure refactor, move method `nameForAppAndAttempt` from `EventLogFileWriter` to `o.a.s.u.Utils`.

### Why are the changes needed?

The method could be reused in several other places, e.g. #49440

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GHA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49476 from pan3793/SPARK-50805.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun (Member):

I merged the spun-off PR, @pan3793. Could you rebase this onto master?

@pan3793 pan3793 changed the title [SPARK-50783][CORE] Canonicalize JVM profiler results file name and layout on DFS [SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS Jan 14, 2025

pan3793 commented Jan 14, 2025

@dongjoon-hyun thanks, rebased

@dongjoon-hyun (Member):

Thank you!

@dongjoon-hyun (Member):

Since this is a subset of the previously reviewed state, I manually tested the compilation.

Merged to master for Apache Spark 4.0.0.

Thank you, @pan3793 and @parthchandra .

SteNicholas added a commit to apache/celeborn that referenced this pull request Jan 21, 2025
### What changes were proposed in this pull request?

Bump ap-loader version from 3.0-8 to 3.0-9.

### Why are the changes needed?

ap-loader has already released v3.0-9, so the version used for `JVMProfiler` should be bumped from 3.0-8.

Backport:

1. apache/spark#46402
2. apache/spark#49440

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #3072 from SteNicholas/CELEBORN-1842.

Authored-by: SteNicholas <[email protected]>
Signed-off-by: SteNicholas <[email protected]>