[[ these are an initial draft that was edited during a meeting ]]
# Places we'd need to account for dynamic shapes

## TransformAttr, TransformMapAttr
The good news is that affine maps let us have "symbols", which act
like constants for analysis purposes but are bound at runtime. We
should be able to reuse this mechanism for transform maps.
In an individual coordinate transform (the `TransformAttr`), a symbol
can appear in the parameter lists to transformations (like `Merge`,
`Unmerge`, or `Pad`). Symbols can't appear in the dimension list.
On a technical level, this means that the `params` component of a
`TransformAttr` will shift from `ArrayRef<int64_t>` to some flavor of
`ArrayRef<ConstantOrSymbolIndex>`. Those will be used when constructing
the affine map.
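As a hypothetical sketch of what that parameter type could look like (the name `ConstantOrSymbolIndex` is from the text above; everything else here is invented for illustration), a parameter is either a compile-time constant or a reference to the i'th affine symbol, bound at runtime:

```cpp
#include <cstdint>
#include <variant>
#include <vector>

// A transform parameter: either a static constant or a reference to the
// i'th affine symbol, whose value is only known at runtime.
struct SymbolRef {
  unsigned index;
};
using ConstantOrSymbolIndex = std::variant<int64_t, SymbolRef>;

// Resolve a parameter list against concrete runtime symbol values, as one
// would when materializing the affine map at a call site.
std::vector<int64_t> resolve(const std::vector<ConstantOrSymbolIndex> &params,
                             const std::vector<int64_t> &symbolValues) {
  std::vector<int64_t> out;
  for (const ConstantOrSymbolIndex &p : params) {
    if (const int64_t *c = std::get_if<int64_t>(&p))
      out.push_back(*c); // static constant: use it directly
    else
      out.push_back(symbolValues.at(std::get<SymbolRef>(p).index));
  }
  return out;
}
```

For example, a `Merge{Symbol{0}, 3, 3}` would carry params `{SymbolRef{0}, 3, 3}` and resolve them once the symbol's value is bound.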
TransformMapAttr and its builders will also need a few extensions.
Firstly, in the computation of lower or upper bounds, we'll follow
the tensor and memref convention of using -1 (printed as ?) for
bounds of unknown size.
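A minimal sketch of that convention, assuming the -1 sentinel: any bound arithmetic has to propagate the sentinel, so (for example) the upper bound of a merged dimension is only static when all of its factors are.

```cpp
#include <cstdint>

// Dynamic sizes use the sentinel -1, printed as '?'.
constexpr int64_t kDynamic = -1;

// Upper bound of a dimension merged from two others: any unknown factor
// makes the product unknown.
int64_t mergedUpperBound(int64_t a, int64_t b) {
  if (a == kDynamic || b == kDynamic)
    return kDynamic;
  return a * b;
}
```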
Because, for example, we may need to add symbols when dealing with
broadcasts, it may be a good idea to keep `numSymbols` as an
attribute on `TransformMapAttr`s and their builders, so as to ensure
uniformity within a transform stack. However, I'm not entirely sure
that this'll be required.
Note: my `PadTo{}` (below) is actually `Align{}` (ex. `Align{64}`).
## Transform map validity (Pad{} and Embed{})
The current Pad{l1, r1, l2, r2, ...} operator in transform maps
comes from convolutions. Its validity check rests on being able
to pull a lower bound from the transform map to see if the
input is in the padded region. We also have similar checks to
prevent the results of Embed{}s that go negative or out of
bounds from being propagated further.
The first problem we have is that `Pad{}` is specified in terms of
how much padding to add to the left and right of the given dimension.
When padding a dynamic dimension out to a block size, we fundamentally
don't know those quantities. We could compute them and make each a
symbol, but that'll get irritating, lead to a proliferation of symbols,
and generally be a pain ... but it's not off the table.
Therefore, I propose `PadTo{N1, N2, ...}`, where the `N_k` are block sizes
or other such constants. The semantics of `PadTo{}` is that the k'th lower
dimension is padded out to the next multiple of `N_k`. We can then impose
the restriction that `N_k` be a constant, and make a similar requirement
on the inputs to `Pad{}`, just to stay sane.
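A sketch of those proposed `PadTo{N}` semantics (the helper name is invented):

```cpp
#include <cstdint>

constexpr int64_t kDynamic = -1; // the '?' convention from above

// PadTo{N}: pad a dimension of length `len` out to the next multiple of
// the static constant N. `len` may be dynamic, in which case the padded
// length is dynamic too.
int64_t padToLength(int64_t len, int64_t n) {
  if (len == kDynamic)
    return kDynamic;
  return ((len + n - 1) / n) * n; // round up to a multiple of n
}
```

Note that the right padding a `Pad{l, r}` would need here, `padToLength(len, n) - len`, is a runtime value when `len` is dynamic, which is exactly why specifying the target multiple instead of the padding amounts avoids a symbol.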
However, the higher-level problem of validity checking remains.
I claim that there's a reasonably straightforward way to fix this, namely
defining

```cpp
OpFoldResult getLengthOfDim(unsigned dim, ArrayRef<TransformMapAttr> remainingMaps,
                            Value underlyingBuffer);
```

which returns the length `dim` can have as an input to `remainingMaps`
before reaching `underlyingBuffer`. If the dimension isn't static there,
this'll eventually lead to a call to `tensor.dim`/`memref.dim` or some
other similar construct.
The method in question will only really be needed during validity checks.
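A rough model of that query (the signature above is from the proposal; the machinery below is invented, and the walk through `remainingMaps` is elided). The result is `OpFoldResult`-like: either a static length, or a placeholder for the IR a real implementation would emit, e.g. a `memref.dim` on the underlying buffer:

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Either a static length or a stand-in for a runtime dim query.
using LengthOrValue = std::variant<int64_t, std::string>;

constexpr int64_t kDynamic = -1;

LengthOrValue getLengthOfDim(unsigned dim,
                             const std::vector<int64_t> &underlyingShape) {
  int64_t len = underlyingShape.at(dim);
  if (len != kDynamic)
    return len; // static: the validity check can proceed at compile time
  // Not static: the validity check must fall back to a runtime query.
  return std::string("memref.dim %buf, " + std::to_string(dim));
}
```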
## `rock.transform` and `rock.transforming_for`
There are places in our code where we could end up computing
dimension lengths from other lengths - e.g.
... Ok, no, this needs to be contemplated overall; how and where the size
expressions go is an open question.
... Darn tempted to move it all off of `transforming_for` and
make that thing just take a transformed `Value` and some initial
coordinates.
## Sizes, v2
So, since Embed is a general linear combination with user-provided size
semantics, we may want to move to storing sizes in general.
My proposal here is

```mlir
rock.transform #transform_map(%val)
    symbols [%...] // for Merge{Symbol{0}, 3, 3}, ex.
    // in_dims [%dynDim1, %dynDim2, ...] out_dims [%dynDim1]
    // not needed per below
    : memref<2x?xf32> to memref<2x?x?xf32>
```
Or (per Simon): `Embed` is changed to take extra parameters

```
Embed{lowerLen, coeff1, size1, coeff2, size2, ...}
```

and those are symbols. Now size computation works and we just need symbols.

Action, me: check Broadcast

Example:

```
Embed{1, 1} ["y", "ho"] -> ["hi"]
len(y) = dim(%filter[y])
len(ho) = dim(%output[h])
len(hi) = dim(%input[h]) // assume no padding on input
```
```
gemmN <- Merge("no", "ho", "wo")
gemmNPadded <- PadTo{BlockSize}(gemmN) // must check validity here
// so must know size of gemmN, so must know size of ho
// but size of ho isn't a function of the size of hi
```
and that `rock.transforming_for` just takes a pointer to the transform stack:

```mlir
rock.transforming_for (%lower1, ...) = %transformedValue[%coord0, %coord1], ...
```

New API:

```cpp
ArrayRef<rock::TransformOp>, Value, ... untransform(Value transformed);
optional<ArrayRef, ...> staticUntransform(Value transformed);
```
## Computing launch sizes
Good news: block size computations proceed as normal, since
those are functions of the tuning parameters.
Bad news: grid size is a function of dynamic inputs!
So we don't actually get a known grid_size.
I suggest that we allow the grid size to be an affine expression
of the input dimensions and attach that to the function.
Per IREE, this can just be an `@func` reference to call at runtime.
Simon: get docs on the IREE interface in case we want to follow it.
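A sketch of the split described above: block size depends only on static tuning parameters, while grid size also depends on dynamic problem sizes, so it must be an expression the host evaluates at launch time. All names here (`gemmM`, `mPerBlock`, ...) are illustrative, not actual rock attributes:

```cpp
#include <cstdint>

int64_t ceilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

// This is the kind of affine expression of the input dimensions that
// could be attached to the function (or wrapped in an @func to call at
// runtime): tiles along each gemm dimension, one workgroup per tile.
int64_t computeGridSize(int64_t gemmM, int64_t gemmN, int64_t mPerBlock,
                        int64_t nPerBlock) {
  return ceilDiv(gemmM, mPerBlock) * ceilDiv(gemmN, nPerBlock);
}
```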
## `blockwise_fill`
Will need to be adjusted to assume the blockwise-ness, or otherwise
not to rely on `getNumElements()` working, in its validation.
## The utility kernels in general
These kernels will, in the dynamic case, end up with a non-constant
trip count in their outer loops. This is fine.
## Vectorization
Easy answer: a dynamic dimension's vector length and alignment
requirement are 1. Non-static-ness, just like padding by 1,
makes for unvectorizability.
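That easy answer can be sketched as follows (assumed model, invented helper names): the vector length along a dimension must divide its length so a vector never straddles the dimension boundary, and a dynamic length guarantees no divisibility, so it contributes 1, just as padding by 1 would.

```cpp
#include <cstdint>
#include <numeric>

constexpr int64_t kDynamic = -1;

// Largest usable vector length along a dimension, capped at maxWidth:
// a divisor of the dimension's length, or 1 when the length is dynamic.
int64_t maxVectorLen(int64_t dimLen, int64_t maxWidth) {
  if (dimLen == kDynamic)
    return 1; // no divisibility guarantee: scalar access only
  return std::gcd(dimLen, maxWidth);
}
```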
## Index size / the buffer trick
The various "can this be 64-bit indexing" queries will have to return
"yes" when there's a dynamic shape involved.
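A sketch of why (the 2^31 limit below is an assumed placeholder for whatever the real 32-bit-indexing criterion is): once any dimension is dynamic, the element count can no longer be bounded statically, so the query must assume the worst.

```cpp
#include <cstdint>
#include <vector>

constexpr int64_t kDynamic = -1;

// "Might this need 64-bit indexing?" Any dynamic extent forces "yes".
bool mightNeed64BitIndexing(const std::vector<int64_t> &shape) {
  int64_t numElements = 1;
  for (int64_t d : shape) {
    if (d == kDynamic)
      return true; // unknown extent: assume the worst
    numElements *= d;
  }
  return numElements > (int64_t{1} << 31);
}
```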