Shouldn't wrapLDSBufferForStore be agnostic to the global vectorDim? #1158

Answered by giuseros
manupak asked this question in Q&A

So my question is: why do we change the thread layout (i.e., how tid maps to indices) based on the vectorization dimension?

I know (I think) how to answer that one. It's about coalescing. If your physical layout is (say) MxK, then K is the contiguous dimension, and you want thread 0 on element (0,0), thread 1 on element (0,1), and so on. If you look here: https://github.com/ROCmSoftwarePlatform/rocMLIR/pull/996/files, you'll see we were originally always distributing the thread IDs along the K dimension. But if the matrix was KxM (or KxN), this would create non-coalesced access, i.e., consecutive threads would access the matrix in a strided fashion. So I think that doing:

splitId.merge({"k_thread", dThreadName}, {4, 5}, "tid…
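To make the coalescing argument concrete, here is a minimal standalone C++ sketch. It is not rocMLIR code: `rowMajorOffset`, `threadOffsets`, and `isCoalesced` are hypothetical helpers, and the sketch assumes a row-major tile (last index contiguous) in which consecutive thread IDs are laid along one chosen dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: offset of element (r, c) in a row-major buffer with
// `cols` columns, so the column index is the contiguous one in memory.
std::int64_t rowMajorOffset(std::int64_t r, std::int64_t c, std::int64_t cols) {
  return r * cols + c;
}

// Assign thread ids 0..nThreads-1 to elements of a rows x cols row-major tile.
// With alongCols == true, consecutive tids walk along the columns (contiguous);
// with alongCols == false, they walk down the rows (strided by `cols`).
// Returns the element offset each thread touches.
std::vector<std::int64_t> threadOffsets(std::int64_t rows, std::int64_t cols,
                                        std::int64_t nThreads, bool alongCols) {
  assert(nThreads <= rows * cols && "more threads than elements");
  std::vector<std::int64_t> offsets;
  offsets.reserve(static_cast<std::size_t>(nThreads));
  for (std::int64_t tid = 0; tid < nThreads; ++tid) {
    // The chosen dimension is the fastest-varying part of tid.
    std::int64_t r = alongCols ? tid / cols : tid % rows;
    std::int64_t c = alongCols ? tid % cols : tid / rows;
    offsets.push_back(rowMajorOffset(r, c, cols));
  }
  return offsets;
}

// Coalesced pattern: each thread touches the element right after the
// previous thread's element.
bool isCoalesced(const std::vector<std::int64_t>& offsets) {
  for (std::size_t i = 1; i < offsets.size(); ++i)
    if (offsets[i] != offsets[i - 1] + 1) return false;
  return true;
}
```

For an MxK tile with M = 4 and K = 8, `threadOffsets(4, 8, 4, true)` (threads along K, the contiguous dimension) yields offsets 0, 1, 2, 3. For a KxM tile with K = 8 and M = 4, keeping the threads on K means walking the rows: `threadOffsets(8, 4, 4, false)` yields 0, 4, 8, 12, a stride of M. Moving the threads onto M with `threadOffsets(8, 4, 4, true)` makes the access contiguous again, which is why the thread layout follows the contiguous (vectorization) dimension.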

Replies: 3 comments, 7 replies

manupak
Jul 17, 2023
Collaborator Author

0 replies

5 replies
@manupak

manupak Jul 17, 2023
Collaborator Author

@manupak

manupak Jul 17, 2023
Collaborator Author

@giuseros

@manupak

manupak Jul 17, 2023
Collaborator Author

@giuseros

Answer selected by manupak

2 replies
@manupak

manupak Jul 17, 2023
Collaborator Author

@krzysz00

Category
Q&A
Labels
None yet
3 participants