[Feature Discussion] Make tb_plugin support larger traces #760

Open
lh-ycx opened this issue May 18, 2023 · 2 comments
Labels: enhancement (New feature or request), plugin (PyTorch Profiler TensorBoard Plugin related)

Comments

@lh-ycx
Contributor

lh-ycx commented May 18, 2023

Hi guys, I've recently been working on profiling LLM (e.g., GPT-3) workloads using the PyTorch profiler. When I tried to visualize the trace in TensorBoard, the major problem was that the trace is too large (typically > 1 GB for a single step), causing the browser to become unresponsive.
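
For context, a typical setup that produces such traces profiles a training step with torch.profiler and writes the result via tensorboard_trace_handler so tb_plugin can pick it up. The sketch below is only an illustrative placeholder (the model, inputs, and log directory are not from this issue), not the actual workload:

```python
import torch
from torch.profiler import (ProfilerActivity, profile, schedule,
                            tensorboard_trace_handler)

# Placeholder model and inputs; a real LLM training step emits far more events.
model = torch.nn.Linear(1024, 1024)
inputs = torch.randn(8, 1024)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=1),
    on_trace_ready=tensorboard_trace_handler("./log/profile"),  # read by tb_plugin
    with_stack=True,  # stack capture inflates the trace size considerably
) as prof:
    for _ in range(3):
        model(inputs).sum().backward()
        prof.step()
```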

I would like to work on this problem to make the tb_plugin support larger traces. Below are a few of my concerns:

  • I cannot find documentation for tb_plugin (covering the code structure and the plugin's architecture), which makes it difficult for me to get started.
  • Could you provide some insight into this problem? For example, where is the likely bottleneck (the original TensorBoard, the trace analysis server, the frontend browser, or something else)?
  • I also noticed that "monitoring daemon for larger scale deployments" is in progress. Is this solving the same problem? If so, can I get involved?

Thanks : )

@aaronenyeshi aaronenyeshi added the plugin (PyTorch Profiler TensorBoard Plugin related) label Jun 23, 2023
@aaronenyeshi
Member

Hi @lh-ycx, please feel free to contribute to the tb_plugin directory (the code is outdated and in maintenance mode only).

  • As far as I know, the documentation can be found here: https://github.com/pytorch/kineto/blob/main/tb_plugin/README.md.
  • This is also a problem for Chrome traces produced with the export_chrome_trace API. One workaround is to move the traceEvents into a root JSON dict and then open the file in Perfetto. Another workaround is to set with_stack=False, which reduces the number of events significantly (see the sketch after this list).
  • My intuition is that there are too many events for the frontend browser to render. One potential solution is to sample events when zoomed out. This idea comes from the observation that if you delete a portion of the events from a Chrome trace, it starts loading in the frontend.
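
As a rough illustration of those two workarounds, here is a minimal sketch; the file names and the profiled region are placeholders, and the traceEvents-wrapping step is only one reading of the first workaround:

```python
import json

import torch
from torch.profiler import ProfilerActivity, profile

# Workaround: disable stack capture so far fewer events are recorded.
with profile(activities=[ProfilerActivity.CPU], with_stack=False) as prof:
    torch.randn(1024, 1024) @ torch.randn(1024, 1024)  # placeholder workload
prof.export_chrome_trace("trace.json")

# Workaround: if the exported trace is a bare list of events, wrap it in a
# root JSON dict under "traceEvents" so Perfetto can open it.
with open("trace.json") as f:
    data = json.load(f)
if isinstance(data, list):
    with open("trace_perfetto.json", "w") as f:
        json.dump({"traceEvents": data}, f)
```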

@aaronenyeshi aaronenyeshi added the bug (Something isn't working) and enhancement (New feature or request) labels and removed the bug (Something isn't working) label Jun 23, 2023
@lmelinda

I also wonder how much work it would be to incorporate the Perfetto UI into the trace tab. All of my large trace files load quite fast in the Perfetto UI and do not crash the way the TensorBoard profiler tab does. If it isn't too much work, this could resolve a lot of problems.

facebook-github-bot pushed a commit that referenced this issue Jul 31, 2023
Summary:
Enhancement for Issue #760.

Hey guys, I've optimized the speed of the memory view using LTTB sampling (Largest-Triangle-Three-Buckets sampling, which downsamples time-series-like data while retaining the overall shape and variability of the data); a sketch of the idea follows after this commit summary.

I've tested this with a 2 GB PyTorch profiler trace: the memory view page no longer crashes, and scaling/zooming is smooth and acceptable.

Pull Request resolved: #776

Reviewed By: chaekit

Differential Revision: D47850048

Pulled By: aaronenyeshi

fbshipit-source-id: 4d32666f972c7f1b5d18817f69c3266bcb619d92
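
For readers unfamiliar with LTTB, here is a minimal, self-contained sketch of the downsampling idea described in that commit summary; it is not the code from the referenced pull request, and the point format ((timestamp, value) pairs sorted by timestamp) is an assumption for illustration:

```python
def lttb(points, threshold):
    """Largest-Triangle-Three-Buckets downsampling.

    points: list of (x, y) pairs sorted by x (e.g., (timestamp, bytes)).
    threshold: number of points to keep (>= 3).
    Returns a subset of points that preserves the overall shape of the series.
    """
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)

    sampled = [points[0]]                    # always keep the first point
    bucket_size = (n - 2) / (threshold - 2)  # spread interior points over equal buckets
    prev = points[0]

    for i in range(threshold - 2):
        # Boundaries of the current bucket.
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1

        # Average point of the *next* bucket (the last point for the final bucket).
        nxt = points[end:min(int((i + 2) * bucket_size) + 1, n)] or [points[-1]]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        # Keep the point in the current bucket that forms the largest triangle
        # with the previously kept point and the next bucket's average.
        best, best_area = None, -1.0
        for p in points[start:end]:
            area = abs((prev[0] - avg_x) * (p[1] - prev[1])
                       - (prev[0] - p[0]) * (avg_y - prev[1])) / 2.0
            if area > best_area:
                best, best_area = p, area
        sampled.append(best)
        prev = best

    sampled.append(points[-1])               # always keep the last point
    return sampled


# Example: reduce millions of memory samples to ~1,000 points for plotting.
# downsampled = lttb(memory_samples, 1000)
```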