Trace Support #8572

Merged
merged 8 commits into from
May 17, 2024

tt-asaigal
Contributor

fyi @tt-aho tracking all our trace commits here.

kmabeeTT and others added 8 commits May 17, 2024 17:48
 - Use noc_semaphore_inc(neg_val) + noc_async_atomic_barrier(), per
   Paul's feedback, instead of the unsafe direct write to the ptr value
…tchQ MSB)

 - Host sets the MSB of the FetchQ entry on an ExecBuf cmd to tell the
   prefetcher to stall and not fetch any more cmds, since ExecBuf
   reads TraceBuffer data and writes it to CmdDataQ, which would
   clobber subsequently fetched cmds. Remove the previous
   "ugly hack" that was doing a similar thing.

 - On STALL_NEXT, barrier/wait for the fetched cmd that requested the
   stall to return, and increase the fence, before moving to the STALLED
   state; fetch_q_get_cmds() exits early when STALLED
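
A minimal sketch of the stall-flag scheme described in this commit message, for readers who have not seen the prefetcher code. Only the idea (MSB of a 32-bit FetchQ entry marks a stall request) and the state names NOT_STALLED, STALL_NEXT and STALLED come from the commit text; every identifier below (`FetchQEntry`, `kStallFlag`, `make_entry`, `after_fetch`, and so on) is hypothetical and is not the actual tt-metal code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the 32-bit FetchQ entry type named in the PR.
using FetchQEntry = std::uint32_t;

// Assumption: the top bit carries the stall request, the low 31 bits the size.
constexpr FetchQEntry kStallFlag = 1u << 31;

// Host side: pack a command size and an optional stall request into one entry.
inline FetchQEntry make_entry(std::uint32_t cmd_size, bool stall) {
    assert((cmd_size & kStallFlag) == 0 && "size must fit in the low 31 bits");
    return stall ? (cmd_size | kStallFlag) : cmd_size;
}

// Prefetcher side: unpack the two fields again.
inline bool wants_stall(FetchQEntry e) { return (e & kStallFlag) != 0; }
inline std::uint32_t cmd_size_of(FetchQEntry e) { return e & ~kStallFlag; }

// State names taken from the commit message; the transitions are a guess.
enum class StallState { NOT_STALLED, STALL_NEXT, STALLED };

// A flagged entry arms the stall when it is fetched.
inline StallState after_fetch(StallState s, FetchQEntry e) {
    if (s == StallState::NOT_STALLED && wants_stall(e)) {
        return StallState::STALL_NEXT;
    }
    return s;
}

// Once the read for the flagged cmd has returned, the stall takes effect.
inline StallState after_stalling_cmd_returned(StallState s) {
    return s == StallState::STALL_NEXT ? StallState::STALLED : s;
}

// Mirrors the "early exit when STALLED" rule for the fetch routine.
inline bool should_exit_fetch_early(StallState s) {
    return s == StallState::STALLED;
}
```

Keeping the flag inside the entry itself means the host needs no side channel to request a stall, at the cost of one bit of size range.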

 - PR feedback and fix for NOT_STALLED setting
 - Add assert to make sure ExecBuf comes with stall_flag=true; it's
   required now that the ugly hack is removed, otherwise hang.
 - Update test_prefetcher.cpp to set the stall flag (FetchQ MSB) for
   ExecBuf, otherwise hang. Needed to change cmd_sizes to uint32_t
   instead of uint16_t throughout the code, since the FetchQ entry type
   dispatch_constants::prefetch_q_entry_type is currently uint32_t, to be
   able to carry the MSB through properly.
 - Hang (assert with watcher) that STALL state wasn't seen when
   handling ExecBuf

 - Don't know if this is the correct fix, but it seems to work...
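
The uint16_t to uint32_t change for cmd_sizes mentioned above can be illustrated with a small sketch: if an entry whose bit 31 is set passes through a 16-bit variable, the flag is lost. The names `kEntryStallBit`, `stall_bit_survives_u16` and `stall_bit_survives_u32` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Assumption: the stall flag is bit 31 of a 32-bit FetchQ entry.
constexpr std::uint32_t kEntryStallBit = 1u << 31;

// Pass the entry through a 16-bit variable, as a uint16_t cmd_sizes
// container would, then widen it again and check for the flag.
inline bool stall_bit_survives_u16(std::uint32_t entry) {
    std::uint16_t narrowed = static_cast<std::uint16_t>(entry);  // keeps low 16 bits only
    std::uint32_t widened = narrowed;
    return (widened & kEntryStallBit) != 0;
}

// Same round trip through a 32-bit variable keeps every bit.
inline bool stall_bit_survives_u32(std::uint32_t entry) {
    std::uint32_t stored = entry;
    return (stored & kEntryStallBit) != 0;
}
```

With a flagged entry such as `64u | kEntryStallBit`, the 16-bit round trip returns false and the 32-bit one returns true, which matches the hang described when the test did not carry the flag through.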
Update device trace cmds to take in cq_id, remove multi-device apis
Add tracing tests for metal Resnet50. TODO: Cleanup/reuse code
Disable allocations after capturing trace
Update trace APIs to return/take in a trace id. Make device own the TraceBuffer mapping. Remove trace APIs that let users create Trace objects
#8383: End any active traces during device close and assert tracing is not enabled for terminate cmd
  - Add async safe ttnn and tt_lib trace APIs
  - Single and multi-chip trace tests added to ttnn
    post commit
  - Resnet50 Async Trace tests added (after porting the model
    over to async)
  - Certain multichip tests with all-gather currently disabled
    since they hang with trace
@tt-asaigal tt-asaigal merged commit b557422 into main May 17, 2024
5 checks passed
@tt-aho tt-aho deleted the asaigal/ttnn_trace_rebased branch June 3, 2024 02:13