
transpose scheduler should have its own reference_tv validation logic #3570

jjsjann123 opened this issue Dec 11, 2024
The transpose scheduler has its own requirements on how transformations should be propagated, so its check on whether a reference TensorView is valid should differ from the pointwise scheduler's.

Here are the two examples I was playing with:

```cpp
TEST_F(PointwiseTest, TransposeSchedulerShouldAccept) {
  auto fusion_ptr = std::make_unique<Fusion>();
  auto fusion = fusion_ptr.get();
  FusionGuard fg(fusion);

  // tv0 {i0, i1}
  TensorView* tv0 = makeContigTensor(2);
  fusion->addInput(tv0);
  // tv1 {i0, i1}
  TensorView* tv1 = makeContigTensor(2);
  tv1->setAllocationDomain({tv1->axis(1), tv1->axis(0)}, true);
  fusion->addInput(tv1);

  // tv2 {b2, i0, i1}
  auto tv2 = broadcast(tv1, {true, false, false});
  // tv3 {b3, i0, i1}
  auto tv3 = broadcast(tv0, {true, false, false});
  // tv4 {b2{1 ex 32}, i0, i1}
  auto tv4 = expand(
      tv2,
      {IrBuilder::create<Val>(32),
       tv2->axis(1)->extent(),
       tv2->axis(2)->extent()});

  // tv5 {b2{1 ex 32}, i0, i1}
  auto tv5 = add(tv4, tv3);
  // tv6 {i4{32} * i0, i1}
  auto tv6 = reshape(tv5, {32, 1024, 128}, {32 * 1024, 128});
  fusion->addOutput(tv6);

  // This one should be scheduled by the transpose scheduler.
  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor input0 = at::empty_strided({1024, 128}, {128, 1}, options);
  at::Tensor input1 = at::empty_strided({1024, 128}, {1, 1024}, options);
  auto cg_outputs = executor_cache.runFusionWithInputs({input0, input1});
  testValidate(fusion, cg_outputs, {input0, input1}, __LINE__, __FILE__);
}
```

```cpp
TEST_F(PointwiseTest, TransposeSchedulerShouldReject) {
  auto fusion_ptr = std::make_unique<Fusion>();
  auto fusion = fusion_ptr.get();
  FusionGuard fg(fusion);

  TensorView* tv0_0 = makeContigTensor(2);
  fusion->addInput(tv0_0);
  TensorView* tv0_1 = makeContigTensor(2);
  tv0_1->setAllocationDomain({tv0_1->axis(1), tv0_1->axis(0)}, true);
  fusion->addInput(tv0_1);
  // tv0 {i0, i1}
  TensorView* tv0 = add(tv0_0, tv0_1);
  // tv1 {i0, i1}
  auto tv1 = relu(tv0);
  fusion->addOutput(tv1);
  // tv2 {i0, b2, i1}
  auto tv2 = broadcast(tv1, {false, true, false});
  // tv3 {i0, b3{1 ex 4}, i1}
  auto tv3 = expand(
      tv2,
      {tv2->axis(0)->extent(),
       IrBuilder::create<Val>(4),
       tv2->axis(2)->extent()});
  // Note that currently expand doesn't introduce an iter domain operation, so
  // we don't see that i4 is produced by realizing the expanded extent of
  // b3{1 ex 4}.
  // tv4 {i0, i4*i1}
  auto tv4 = reshape(tv3, {1024, 4, 128}, {1024, 4 * 128});
  fusion->addOutput(tv4);

  // This one should be rejected by the transpose scheduler.
  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor input0 = at::empty_strided({1024, 128}, {128, 1}, options);
  at::Tensor input1 = at::empty_strided({1024, 128}, {1, 1024}, options);
  auto cg_outputs = executor_cache.runFusionWithInputs({input0, input1});
  testValidate(fusion, cg_outputs, {input0, input1}, __LINE__, __FILE__);
}
```

Right now on ToT, the first example does go through the transpose scheduler, but the second example hits issue #3512.

While working on that issue, we added extra checks for the pointwise scheduler to ensure that the reference tensor can replay its transformations onto every I/O TensorView.
This fixes the functional issue with the transpose scheduler in the second example. But since the transpose scheduler shares the same validation check, it now also rejects the first example.

The thread here shows some discussion we had on this topic: #3513 (comment)
We decided that, for the time being, a performance regression is a better choice than an assert, so we'll move forward with PR #3513. But we do want to revisit the reference_tv validation for the transpose scheduler.
