Move tensor_set/scale/cast/copy to gtest #3416
Conversation
unary: 2.1-3.8x binary: 1.8-2.7x ternary: 1.3-1.7x
This PR blocks #3424
```diff
@@ -2135,8 +2135,10 @@ void CastTensor(const Handle& handle,
         MIOPEN_THROW(miopenStatusBadParm, "Tensor dimension sizes unsupported.");
     }

     auto miopen_alpha = *(static_cast<const float*>(alpha));
```
Is it possible for `alpha` to be `nullptr` here? Or did I miss this check?
It looks like it's set everywhere it's used, but it's also surprising that we pass it as a pointer.
It seems like it could be a plain `float` in this case.
I just moved it earlier because previously it was even worse: it did CopyTensor for any alpha. Since I extended the coverage, I found that case and fixed it.
You are right, but I have no time for another review round. I can probably make a subsequent PR for the `nullptr` check (and the I8 Copy case).
> It seems like it could be a float in this case.

For all types except double it must be `float`, and for double it must be `double`, but MIOpen barely supports doubles almost anywhere... Anyway, it can be done later.

Btw, that's why I asked to add an explicit double check for all the `isApplicable` methods here: https://github.com/ROCm/MIOpen/pull/3346/files#diff-1af35db437d55c5d90c9186b7db9736706d96549c0f20ea3156113db8fdc16e4
```diff
@@ -35,7 +35,7 @@ struct conv2d_bias_driver : public conv_bias_driver<T>
         tensor_elem_gen_checkboard_sign{}(is...);
     };

-    this->add(this->output, "output", this->get_tensor(get_inputs, gen_value));
+    this->add(this->output, "output", this->get_tensor(get_inputs<int>, gen_value));
```
Is this change related to the purpose of this PR?
Yep, without it this won't compile, because the compiler can't infer the template argument even though there is a default one.
```cpp
if(clamp)
{
    operate_over_subtensor(
        [alpha, clampVal = static_cast<float>(std::numeric_limits<DstType>::max())](
            auto& dst, auto src) {
            dst = std::min(static_cast<float>(src) * alpha, clampVal);
        },
        dstSuperCpu,
        srcSuperCpu,
        dstDesc,
        srcDesc,
        dstOffset,
        srcOffset);
}
else
{
    operate_over_subtensor(
        [alpha](auto& dst, auto src) { dst = static_cast<float>(src) * alpha; },
        dstSuperCpu,
        srcSuperCpu,
        dstDesc,
        srcDesc,
        dstOffset,
        srcOffset);
}
```
Looks like we gained coverage here.
I think the old tests only checked the clamp case.
Nice!
```diff
-    std::string network_config = "cast " + std::to_string(dstDesc_flat.GetType());
+    // TODO: make proper network config
+    std::string network_config = "cast " + std::to_string(srcDesc_flat.GetType()) +
+                                 std::to_string(dstDesc_flat.GetType());
```
Was this wrong before?
It looks like the caching key was based on the dst type only, while the src type is part of the compilation params.
Before, it did not use `srcType` for the network config, so it failed if you called an `int -> float` conversion and then a `float -> int` one (nobody ever tested mixed calls, and we had the same problem in other algorithms). In both cases it called the `int -> float` kernel because it had been added first. Now it calls the appropriate kernels.
Btw, that's one of the "gtest single binary" problems: we may get stuck with the first added kernel.
The proper fix is to include all the compile-time and performance-critical parameters in the network config.
In the first case it will then choose a properly compiled kernel, and in the second it will choose the most performant one (if there are any). I haven't analyzed it deeply; I just fixed the part that was found with the wider coverage.
```cpp
X_INSTANTIATE_COPY(FP32, float);
X_INSTANTIATE_COPY(FP16, float16);
X_INSTANTIATE_COPY(BFP16, bfloat16);
```
Why are there no I8 tests for Copy?
It was forgotten.
Looks good to me.
I think this expands the coverage in our CI, and my comments are not blocking.
This PR moves tensor_set/scale/cast/copy to the gtest framework.

It introduces fast and flexible CPU verification routines for unary/binary/ternary operations (@Vsevolod1983 please pay attention to this, it's very useful for the ternary tensor_ops conversion).
Those routines are up to 4x faster for unary operations, 3x faster for binary, and almost 2x faster for ternary operations (probably more; the gain grows for bigger tensors, and I checked only the sizes that are in the testsuites).

It introduces more `size_t`-friendly configs for #3393 (@Vsevolod1983 please take a look). It also fixes a few obvious bugs in the Cast operations.