
transpose scheduler should have its own reference_tv validation logic #3570

jjsjann123 opened this issue Dec 11, 2024
The transpose scheduler has its own requirements on how transformations should be propagated, so its check on whether a reference TensorView is valid should differ from the pointwise scheduler's.

Here are the two examples I was playing with:

```cpp
TEST_F(PointwiseTest, TransposeSchedulerShouldAccept) {
  auto fusion_ptr = std::make_unique<Fusion>();
  auto fusion = fusion_ptr.get();
  FusionGuard fg(fusion);

  // tv0 {i0, i1}
  TensorView* tv0 = makeContigTensor(2);
  fusion->addInput(tv0);
  // tv1 {i0, i1}
  TensorView* tv1 = makeContigTensor(2);
  tv1->setAllocationDomain({tv1->axis(1), tv1->axis(0)}, true);
  fusion->addInput(tv1);

  // tv2 {b2, i0, i1}
  auto tv2 = broadcast(tv1, {true, false, false});
  // tv3 {b3, i0, i1}
  auto tv3 = broadcast(tv0, {true, false, false});
  // tv4 {b2{1 ex 32}, i0, i1}
  auto tv4 = expand(
      tv2,
      {IrBuilder::create<Val>(32),
       tv2->axis(1)->extent(),
       tv2->axis(2)->extent()});

  // tv5 {b2{1 ex 32}, i0, i1}
  auto tv5 = add(tv4, tv3);
  // tv6 {i4{32} * i0, i1}
  auto tv6 = reshape(tv5, {32, 1024, 128}, {32 * 1024, 128});
  fusion->addOutput(tv6);

  // This one should be scheduled by the transpose scheduler.
  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor input0 = at::empty_strided({1024, 128}, {128, 1}, options);
  at::Tensor input1 = at::empty_strided({1024, 128}, {1, 1024}, options);
  auto cg_outputs = executor_cache.runFusionWithInputs({input0, input1});
  testValidate(fusion, cg_outputs, {input0, input1}, __LINE__, __FILE__);
}
```

```cpp
TEST_F(PointwiseTest, TransposeSchedulerShouldReject) {
  auto fusion_ptr = std::make_unique<Fusion>();
  auto fusion = fusion_ptr.get();
  FusionGuard fg(fusion);

  TensorView* tv0_0 = makeContigTensor(2);
  fusion->addInput(tv0_0);
  TensorView* tv0_1 = makeContigTensor(2);
  tv0_1->setAllocationDomain({tv0_1->axis(1), tv0_1->axis(0)}, true);
  fusion->addInput(tv0_1);
  // tv0 {i0, i1}
  TensorView* tv0 = add(tv0_0, tv0_1);
  // tv1 {i0, i1}
  auto tv1 = relu(tv0);
  fusion->addOutput(tv1);
  // tv2 {i0, b2, i1}
  auto tv2 = broadcast(tv1, {false, true, false});
  // tv3 {i0, b3{1 ex 4}, i1}
  auto tv3 = expand(
      tv2,
      {tv2->axis(0)->extent(),
       IrBuilder::create<Val>(4),
       tv2->axis(2)->extent()});
  // Note that currently expand doesn't introduce an iter domain operation, so
  // we don't see that i4 is produced by realizing the expanded extent of
  // b3{1 ex 4}.
  // tv4 {i0, i4*i1}
  auto tv4 = reshape(tv3, {1024, 4, 128}, {1024, 4 * 128});
  fusion->addOutput(tv4);

  // This one should be rejected by the transpose scheduler.
  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor input0 = at::empty_strided({1024, 128}, {128, 1}, options);
  at::Tensor input1 = at::empty_strided({1024, 128}, {1, 1024}, options);
  auto cg_outputs = executor_cache.runFusionWithInputs({input0, input1});
  testValidate(fusion, cg_outputs, {input0, input1}, __LINE__, __FILE__);
}
```

Right now on ToT, the first example does go through the transpose scheduler, but the second example hits issue #3512.

While working on that issue, we added extra checks for the pointwise scheduler to ensure that the reference tensor can replay its transformations onto every I/O TensorView.
This fixes the functional issue with the transpose scheduler in the second example. But since the transpose scheduler shares the same validation check, it now also rejects the first example.

The thread here shows some discussion we had on this topic: #3513 (comment)
We decided that, for the time being, a performance regression is a better choice than an assert, so we'll move forward with PR #3513. But we do want to revisit the reference_tv validation for the transpose scheduler.
