The transpose scheduler has its own requirements on how transformations should be propagated, so the check for whether a reference tv is valid should be different from the pointwise scheduler's.
Here are the two examples I was playing with.
TEST_F(PointwiseTest, TransposeSchedulerShouldAccept) {
  auto fusion_ptr = std::make_unique<Fusion>();
  auto fusion = fusion_ptr.get();
  FusionGuard fg(fusion);

  // tv0 {i0, i1}
  TensorView* tv0 = makeContigTensor(2);
  fusion->addInput(tv0);
  // tv1 {i0, i1}
  TensorView* tv1 = makeContigTensor(2);
  tv1->setAllocationDomain({tv1->axis(1), tv1->axis(0)}, true);
  fusion->addInput(tv1);
  // tv2 {b2, i0, i1}
  auto tv2 = broadcast(tv1, {true, false, false});
  // tv3 {b3, i0, i1}
  auto tv3 = broadcast(tv0, {true, false, false});
  // tv4 {b2{1 ex 32}, i0, i1}
  auto tv4 = expand(
      tv2,
      {IrBuilder::create<Val>(32),
       tv2->axis(1)->extent(),
       tv2->axis(2)->extent()});
  // tv5 {b2{1 ex 32}, i0, i1}
  auto tv5 = add(tv4, tv3);
  // tv6 {i4{32} * i0, i1}
  auto tv6 = reshape(tv5, {32, 1024, 128}, {32 * 1024, 128});
  fusion->addOutput(tv6);

  // This one should be scheduled by the transpose scheduler.
  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor input0 = at::empty_strided({1024, 128}, {128, 1}, options);
  at::Tensor input1 = at::empty_strided({1024, 128}, {1, 1024}, options);
  auto cg_outputs = executor_cache.runFusionWithInputs({input0, input1});
  testValidate(fusion, cg_outputs, {input0, input1}, __LINE__, __FILE__);
}
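As an aside, when iterating on these repros it helps to assert which scheduler actually took the fusion. Below is a minimal sketch of that pattern; getMostRecentKernelRuntime() and isSegmented() exist on FusionExecutorCache / FusionKernelRuntime, but the heuristics accessor and the SchedulerType::Transpose spelling have changed across nvFuser versions, so treat the last assertion as an assumption:

// Sketch only: confirm the fusion was not segmented and was handed to the
// transpose scheduler. The heuristics accessor below is an assumption and
// may be named differently in your version.
auto runtime = executor_cache.getMostRecentKernelRuntime();
EXPECT_FALSE(runtime->isSegmented());
EXPECT_EQ(
    runtime->schedulerHeuristics()->heuristicsList().at(0)->scheduler_type,
    SchedulerType::Transpose);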
TEST_F(PointwiseTest, TransposeSchedulerShouldReject) {
  auto fusion_ptr = std::make_unique<Fusion>();
  auto fusion = fusion_ptr.get();
  FusionGuard fg(fusion);

  TensorView* tv0_0 = makeContigTensor(2);
  fusion->addInput(tv0_0);
  TensorView* tv0_1 = makeContigTensor(2);
  tv0_1->setAllocationDomain({tv0_1->axis(1), tv0_1->axis(0)}, true);
  fusion->addInput(tv0_1);
  // tv0 {i0, i1}
  TensorView* tv0 = add(tv0_0, tv0_1);
  // tv1 {i0, i1}
  auto tv1 = relu(tv0);
  fusion->addOutput(tv1);
  // tv2 {i0, b2, i1}
  auto tv2 = broadcast(tv1, {false, true, false});
  // tv3 {i0, b3{1 ex 4}, i1}
  auto tv3 = expand(
      tv2,
      {tv2->axis(0)->extent(),
       IrBuilder::create<Val>(4),
       tv2->axis(2)->extent()});
  // Note that currently expand doesn't introduce an iter domain operation, so
  // we don't see that i4 is produced by realizing the expanded extent of
  // b3{1 ex 4}.
  // tv4 {i0, i4*i1}
  auto tv4 = reshape(tv3, {1024, 4, 128}, {1024, 4 * 128});
  fusion->addOutput(tv4);

  // This one should be rejected by the transpose scheduler.
  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor input0 = at::empty_strided({1024, 128}, {128, 1}, options);
  at::Tensor input1 = at::empty_strided({1024, 128}, {1, 1024}, options);
  auto cg_outputs = executor_cache.runFusionWithInputs({input0, input1});
  testValidate(fusion, cg_outputs, {input0, input1}, __LINE__, __FILE__);
}
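Analogously, once the reference_tv validation rejects this fusion, the same runtime inspection as above can assert that the transpose scheduler was not chosen (again, the accessor names are an assumption):

// Sketch only: after rejection, the fusion should fall back to another
// scheduler (or be segmented), so the first heuristic should not be
// SchedulerType::Transpose (hypothetical accessor, as above).
auto runtime = executor_cache.getMostRecentKernelRuntime();
EXPECT_NE(
    runtime->schedulerHeuristics()->heuristicsList().at(0)->scheduler_type,
    SchedulerType::Transpose);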
Right now on TOT, the first example does go through the transpose scheduler, but the second example hits issue #3512.
While working on that issue, we added extra checks to the pointwise scheduler to ensure that the reference tensor is able to replay its transformations onto every I/O TensorView.
This fixes the functional issue with the transpose scheduler in the second example, but since the transpose scheduler shares the same validation check, it now rejects the first example.
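Conceptually, the check we added is a replay test over all I/O tensors. A simplified sketch of the idea (not the actual PR #3513 code; canReplayTransformsOn is a hypothetical helper standing in for the real replay check):

// Simplified sketch of the pointwise-scheduler validation idea: the
// reference is valid only if its transformations can be replayed onto
// every fusion input and output.
bool isValidReference(Fusion* fusion, TensorView* reference_tv) {
  for (TensorView* tv : ir_utils::allTvs(fusion)) {
    if (!tv->isFusionInput() && !tv->isFusionOutput()) {
      continue;
    }
    // canReplayTransformsOn is hypothetical; the real check replays the
    // reference's transforms and verifies no iter domain is left behind.
    if (!canReplayTransformsOn(reference_tv, tv)) {
      return false;
    }
  }
  return true;
}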
The thread here shows some of the discussion we had on this topic: #3513 (comment)
We decided that, for the time being, a performance regression is a better choice than an assert, so we'll be moving forward with PR #3513. But we do want to revisit the reference_tv validation for the transpose scheduler.
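For context on why one check doesn't fit both schedulers: the pointwise scheduler propagates a single reference's transformations to the entire fusion, while the transpose scheduler works with two reference tensors, one per group of tensors sharing an innermost dimension, and tiles between the groups. A transpose-aware validation would therefore check replayability per group rather than against one global reference. A hypothetical shape of such a check (names are illustrative only, not the actual API):

// Hypothetical sketch: validate a transpose-scheduler reference against
// its own inner-dimension group instead of against all I/O tensors.
bool isValidTransposeReference(
    TensorView* reference_tv,
    const std::vector<TensorView*>& inner_dim_group) {
  for (TensorView* tv : inner_dim_group) {
    // canReplayTransformsOn is the same hypothetical helper as above.
    if (!canReplayTransformsOn(reference_tv, tv)) {
      return false;
    }
  }
  return true;
}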