[SPARK-50783] Canonicalize JVM profiler results file name and layout on DFS #49440
Conversation
private val appId = try {
  conf.getAppId
} catch {
  case _: NoSuchElementException => "local-" + System.currentTimeMillis
}
Out of curiosity, is this case possible?
I remember hitting this error while developing the feature: the app id had not been generated yet when the profiler was being initialized. I don't know whether we can still hit it, but it's safer this way.
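The fallback under discussion can be sketched in isolation. This is a standalone sketch, not Spark's actual code: `getAppId` here is a by-name stand-in for `SparkConf.getAppId`, which throws `NoSuchElementException` when `spark.app.id` has not been set yet.

```scala
// Sketch of the fallback: the profiler may initialize before the app id is
// assigned, in which case retrieving it throws NoSuchElementException and
// we synthesize a "local-<timestamp>" id so the output path stays usable.
def appIdOrFallback(getAppId: => String): String =
  try getAppId
  catch { case _: NoSuchElementException => "local-" + System.currentTimeMillis }
```

Call sites would then use `appIdOrFallback(conf.getAppId)` instead of `conf.getAppId` directly.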
private val PROFILER_FOLDER_PERMISSIONS = new FsPermission(Integer.parseInt("770", 8).toShort)
private val PROFILER_FILE_PERMISSIONS = new FsPermission(Integer.parseInt("660", 8).toShort)
This follows the event log behavior, so the SHS process can read the files. We may do the following things in the future:
- support downloading profiling results from SHS, like we have done for event logs
- support integration with History UI
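For reference, the octal strings above map to the familiar POSIX permission bits; `FsPermission` itself is Hadoop's wrapper, but the conversion can be sketched without a Hadoop dependency:

```scala
// "770" => rwxrwx--- on the per-app folder (owner plus group get full access,
// so an SHS process sharing the group can read and delete it);
// "660" => rw-rw---- on the profiling result files themselves.
def octalToShort(octal: String): Short = Integer.parseInt(octal, 8).toShort
```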
+1, thank you.
It would be really nice if you took up the History UI integration.
@@ -2954,6 +2954,15 @@ private[spark] object Utils
     str.replaceAll("[ :/]", "-").replaceAll("[.${}'\"]", "_").toLowerCase(Locale.ROOT)
   }

+  def nameForAppAndAttempt(appId: String, appAttemptId: Option[String]): String = {
This can be reused in several places, for example #42575 (comment).
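The naming scheme can be sketched as below. The sanitizer is taken from the `Utils` diff above; the way the app id and attempt id are combined is an assumption based on the event log naming convention (app id, optionally suffixed with `_<attemptId>`), not a verbatim copy of Spark's method.

```scala
import java.util.Locale

// Sanitizer from the Utils diff above: path separators and spaces become "-",
// shell/HTML-hostile characters become "_", and the result is lowercased.
def sanitizeDirName(str: String): String =
  str.replaceAll("[ :/]", "-").replaceAll("[.${}'\"]", "_").toLowerCase(Locale.ROOT)

// Assumed combination, mirroring event log file naming.
def nameForAppAndAttempt(appId: String, appAttemptId: Option[String]): String = {
  val base = sanitizeDirName(appId)
  appAttemptId.map(a => s"${base}_${sanitizeDirName(a)}").getOrElse(base)
}
```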
connector/profiler/README.md
Outdated
@@ -54,7 +54,7 @@ Then enable the profiling in the configuration.
   <td><code>spark.executor.profiling.dfsDir</code></td>
   <td>(none)</td>
   <td>
-    An HDFS compatible path to which the profiler's output files are copied. The output files will be written as <i>dfsDir/application_id/profile-appname-exec-executor_id.jfr</i> <br/>
+    An HDFS compatible path to which the profiler's output files are copied. The output files will be written as <i>dfsDir/{{APP_ID}}/profile-{{APP_ID}}-exec-{{EXECUTOR_ID}}.jfr</i> <br/>
Actually, {{APP_ID}} is nameForAppAndAttempt (see code) in YARN cluster mode, but spelling out that behavior detail in the configuration description is a little bit verbose ... please let me know if you have a better suggestion to polish this sentence.
I think this is fine.
@@ -72,7 +72,7 @@ Then enable the profiling in the configuration.
   <td>event=wall,interval=10ms,alloc=2m,lock=10ms,chunktime=300s</td>
   <td>
     Options to pass to the profiler. Detailed options are documented in the comments here:
-    <a href="https://github.com/async-profiler/async-profiler/blob/32601bccd9e49adda9510a2ed79d142ac6ef0ff9/src/arguments.cpp#L52">Profiler arguments</a>.
+    <a href="https://github.com/async-profiler/async-profiler/blob/v3.0/src/arguments.cpp#L44">Profiler arguments</a>.
Prefer using a tag (rather than a commit hash) for consistency.
lgtm
If we don't use APP_NAME in the new file name, can we avoid the repetition, like the following?
- dfsDir/{{APP_ID}}/profile-{{APP_ID}}-exec-{{EXECUTOR_ID}}.jfr
+ dfsDir/{{APP_ID}}/profile-exec-{{EXECUTOR_ID}}.jfr
@dongjoon-hyun thanks for the suggestion, addressed in e5aa660; I re-verified and updated the PR description.
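The agreed-upon layout can be illustrated with a small helper. The function name and parameters here are illustrative, not Spark's API; it simply shows the final path shape after dropping the repeated app id:

```scala
// Resulting DFS layout after the change: one folder per application,
// with the app id no longer repeated in the file name itself.
def profileResultPath(dfsDir: String, appDirName: String, executorId: String): String =
  s"$dfsDir/$appDirName/profile-exec-$executorId.jfr"
```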
val profilerDirForAppPath = new Path(profilerDirForApp)
if (!fs.exists(profilerDirForAppPath)) {
  // SPARK-30860: use the class method to avoid the umask causing permission issues
  FileSystem.mkdirs(fs, profilerDirForAppPath, PROFILER_FOLDER_PERMISSIONS)
}
Thank you. This is a new improvement, isn't it, @pan3793 ?
Yes, the change grants the SHS permission to read and delete the folder and files.
Thank you, @pan3793 .
The PR looks good to me. As a final piece, let's spin off the change touching the following two files, because I agree with you that it's a useful utility function and it's irrelevant to the JVM Profiler stuff. So we should do that refactor first, before this PR, @pan3793.
core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala
core/src/main/scala/org/apache/spark/util/Utils.scala
Please ping me on your spin-off PR. I can merge your new PR swiftly.
…ils`
### What changes were proposed in this pull request?
Pure refactor: move the method `nameForAppAndAttempt` from `EventLogFileWriter` to `o.a.s.u.Utils`.
### Why are the changes needed?
The method could be reused in several other places, e.g. #49440
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass GHA.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #49476 from pan3793/SPARK-50805.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
I merged the spun-off PR, @pan3793. Could you rebase this onto master?
@dongjoon-hyun thanks, rebased
Thank you!
Since this is a subset of the previous status, I manually tested the compilation. Merged to master for Apache Spark 4.0.0. Thank you, @pan3793 and @parthchandra.
### What changes were proposed in this pull request?
Bump the ap-loader version from 3.0-8 to 3.0-9.
### Why are the changes needed?
ap-loader has already released v3.0-9, so the version should be bumped from 3.0-8 for `JVMProfiler`. Backport:
1. apache/spark#46402
2. apache/spark#49440
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI.
Closes #3072 from SteNicholas/CELEBORN-1842.
Authored-by: SteNicholas <[email protected]>
Signed-off-by: SteNicholas <[email protected]>
What changes were proposed in this pull request?
This PR canonicalizes the file name and layout on DFS of the profiling results produced by the JVM profiler added in SPARK-46094, largely following the event log file name pattern and layout.
Why are the changes needed?
According to #44021 (comment), we can integrate the profiling results with Spark UI (both live and history) in the future, so it's good to follow the event logs file name pattern and layout as much as possible.
Does this PR introduce any user-facing change?
No, it's an unreleased feature.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.