[UT][BugFix] fix PullUpScanPredicateRule (backport #53740) #53838

Merged
merged 2 commits into branch-3.3 from mergify/bp/branch-3.3/pr-53740
Dec 16, 2024

Contributor

mergify bot commented Dec 11, 2024

Why I'm doing:

What I'm doing:

Fixes https://github.com/StarRocks/StarRocksTest/issues/8896

This PR fixes two bugs in PullUpScanPredicateRule:

  1. Limit is not handled correctly.

When the ScanOperator carries a limit, the limit must be moved from the ScanOperator to the FilterOperator.

  2. Semi-structured data is not handled correctly, which causes the subfield column pruning optimization to fail.

For this problem, we cannot simply give up extracting the related expressions from scan predicates, because that would lose many opportunities to reuse expressions.
My solution: after extracting the reserved predicate in the FilterOperator, we also collect the expressions that can be used for subfield column pruning, add them to the scan projection, and replace them with column refs in the final predicate.
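
The limit bug (item 1) can be illustrated with a toy model. The sketch below is not StarRocks code: `run_operator` and the sample rows are hypothetical, and it only assumes that an operator applies its predicate before its limit. It shows that leaving the limit on the scan after the predicate has been pulled up into a filter can drop matching rows, while moving the limit to the filter keeps the original result.

```python
from itertools import islice

def run_operator(rows, predicate=None, limit=None):
    """Toy operator: keep rows matching the predicate, then apply the limit."""
    kept = (r for r in rows if predicate is None or predicate(r))
    # islice(..., None) yields everything, so limit=None means "no limit".
    return list(islice(kept, limit))

rows = [1, 2, 3, 4, 5, 6]          # hypothetical k1 values
is_even = lambda k1: k1 % 2 == 0   # stands in for the scan predicate

# Original plan: the scan holds both the predicate and the limit.
original = run_operator(rows, predicate=is_even, limit=2)

# Buggy pull-up: the predicate moves to a filter, the limit stays on the scan.
buggy = run_operator(run_operator(rows, limit=2), predicate=is_even)

# Fixed pull-up: the limit moves to the filter together with the predicate.
fixed = run_operator(run_operator(rows), predicate=is_even, limit=2)

print(original, buggy, fixed)
```

In this model the buggy plan truncates the input before filtering, so it returns fewer rows than the original plan, whereas the fixed plan matches it.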

Take this query as an example. Before the fix, we need to read the whole JSON column, because the JSON columns themselves appear in the Project node.

mysql> desc t1;
+----------------------+------+------+-------+---------+-------+
| Field                | Type | Null | Key   | Default | Extra |
+----------------------+------+------+-------+---------+-------+
| k1                   | int  | YES  | true  | NULL    |       |
| no_match_flat_json   | json | YES  | false | NULL    |       |
| one_layer_flat_json  | json | YES  | false | NULL    |       |
| many_layer_flat_json | json | YES  | false | NULL    |       |
+----------------------+------+------+-------+---------+-------+
4 rows in set (0.00 sec)
mysql> explain select k1 from t1 where no_match_flat_json->'$.k9.k0.k3' = one_layer_flat_json->'$.k5';
+---------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                |
+---------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                               |
|  OUTPUT EXPRS:1: k1                                                                                           |
|   PARTITION: UNPARTITIONED                                                                                    |
|                                                                                                               |
|   RESULT SINK                                                                                                 |
|                                                                                                               |
|   3:EXCHANGE                                                                                                  |
|                                                                                                               |
| PLAN FRAGMENT 1                                                                                               |
|  OUTPUT EXPRS:                                                                                                |
|   PARTITION: RANDOM                                                                                           |
|                                                                                                               |
|   STREAM DATA SINK                                                                                            |
|     EXCHANGE ID: 03                                                                                           |
|     UNPARTITIONED                                                                                             |
|                                                                                                               |
|   2:SELECT                                                                                                    |
|   |  predicates: json_query(2: no_match_flat_json, '$.k9.k0.k3') = json_query(3: one_layer_flat_json, '$.k5') |
|   |                                                                                                           |
|   1:Project                                                                                                   |
|   |  <slot 1> : 1: k1                                                                                         |
|   |  <slot 2> : 2: no_match_flat_json                                                                         |
|   |  <slot 3> : 3: one_layer_flat_json                                                                        |
|   |                                                                                                           |
|   0:OlapScanNode                                                                                              |
|      TABLE: t1                                                                                                |
|      PREAGGREGATION: ON                                                                                       |
|      partitions=1/4                                                                                           |
|      rollup: t1                                                                                               |
|      tabletRatio=2/2                                                                                          |
|      tabletList=48051,48053                                                                                   |
|      cardinality=7                                                                                            |
|      avgRowSize=2052.0                                                                                        |
+---------------------------------------------------------------------------------------------------------------+
33 rows in set (0.01 sec)

After the fix, only the json_query(xx) expressions appear in the Project node, so we no longer need to read the whole column.

mysql> explain verbose select k1 from t1 where no_match_flat_json->'$.k9.k0.k3' = one_layer_flat_json->'$.k5';
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                           |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| RESOURCE GROUP: default_wg                                                                                                                               |
|                                                                                                                                                          |
| PLAN COST                                                                                                                                                |
|   CPU: 20500.0                                                                                                                                           |
|   Memory: 0.0                                                                                                                                            |
|                                                                                                                                                          |
| PLAN FRAGMENT 0(F01)                                                                                                                                     |
|   Fragment Cost: 0.0                                                                                                                                     |
|   Output Exprs:1: k1                                                                                                                                     |
|   Input Partition: UNPARTITIONED                                                                                                                         |
|   RESULT SINK                                                                                                                                            |
|                                                                                                                                                          |
|   3:EXCHANGE                                                                                                                                             |
|      cardinality: 5                                                                                                                                      |
|                                                                                                                                                          |
| PLAN FRAGMENT 1(F00)                                                                                                                                     |
|   Fragment Cost: 10250.0                                                                                                                                 |
|                                                                                                                                                          |
|   Input Partition: RANDOM                                                                                                                                |
|   OutPut Partition: UNPARTITIONED                                                                                                                        |
|   OutPut Exchange Id: 03                                                                                                                                 |
|                                                                                                                                                          |
|   2:SELECT                                                                                                                                               |
|   |  predicates: 5: json_query = 6: json_query                                                                                                           |
|   |  cardinality: 5                                                                                                                                      |
|   |                                                                                                                                                      |
|   1:Project                                                                                                                                              |
|   |  output columns:                                                                                                                                     |
|   |  1 <-> [1: k1, INT, true]                                                                                                                            |
|   |  5 <-> json_query[([2: no_match_flat_json, JSON, true], '$.k9.k0.k3'); args: JSON,VARCHAR; result: JSON; args nullable: true; result nullable: true] |
|   |  6 <-> json_query[([3: one_layer_flat_json, JSON, true], '$.k5'); args: JSON,VARCHAR; result: JSON; args nullable: true; result nullable: true]      |
|   |  cardinality: 5                                                                                                                                      |
|   |                                                                                                                                                      |
|   0:OlapScanNode                                                                                                                                         |
|      table: t1, rollup: t1                                                                                                                               |
|      preAggregation: on                                                                                                                                  |
|      partitionsRatio=1/4, tabletsRatio=2/2                                                                                                               |
|      tabletList=48051,48053                                                                                                                              |
|      actualRows=7, avgRowSize=2054.0                                                                                                                     |
|      ColumnAccessPath: [/no_match_flat_json/k9/k0/k3(json), /one_layer_flat_json/k5(json)]                                                               |
|      cardinality: 5                                                                                                                                      |
+----------------------------------------------------------------------------------------------------------------------------------------------------------+
41 rows in set (0.01 sec)
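
The rewrite visible in the plan above can be sketched with a toy expression tree. This is a simplified model, not the actual PullUpScanPredicateRule implementation: the tuple encoding, `rewrite_predicate`, and the starting slot id 5 are assumptions chosen to mirror the plan output. Each `json_query` call in the predicate is moved into a projection map and replaced by a column ref, and identical calls share one slot.

```python
def rewrite_predicate(predicate, first_slot):
    """Replace json_query(...) nodes with slot refs and collect them in a projection.

    Expressions are tuples (hypothetical encoding):
      ("col", name), ("const", value), ("ref", slot),
      ("json_query", column_expr, path), or (operator, child, child, ...).
    """
    projection = {}  # slot id -> json_query expression

    def visit(expr):
        kind = expr[0]
        if kind == "json_query":
            # Reuse the slot if the same expression was already collected.
            for slot, existing in projection.items():
                if existing == expr:
                    return ("ref", slot)
            slot = first_slot + len(projection)
            projection[slot] = expr
            return ("ref", slot)
        if kind in ("col", "const", "ref"):
            return expr
        # Any other node is an operator whose remaining items are children.
        return (kind, *(visit(child) for child in expr[1:]))

    return visit(predicate), projection

left = ("json_query", ("col", "no_match_flat_json"), "$.k9.k0.k3")
right = ("json_query", ("col", "one_layer_flat_json"), "$.k5")

new_predicate, projection = rewrite_predicate(("eq", left, right), first_slot=5)
print(new_predicate)
print(projection)
```

With the predicate from the example query, the rewritten predicate compares two refs, which corresponds to `5: json_query = 6: json_query` in the SELECT node, and the projection holds the two `json_query` expressions shown in the Project node.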

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Contributor Author

mergify bot commented Dec 11, 2024

Cherry-pick of 8eff033 has failed:

On branch mergify/bp/branch-3.3/pr-53740
Your branch is up to date with 'origin/branch-3.3'.

You are currently cherry-picking commit 8eff0335b3.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   test/sql/test_expr_reuese/R/test_scan_predicate_expr_reuse
	new file:   test/sql/test_expr_reuese/T/test_scan_predicate_expr_reuse

Unmerged paths:
  (use "git add/rm <file>..." as appropriate to mark resolution)
	both modified:   fe/fe-core/src/main/java/com/starrocks/sql/optimizer/Optimizer.java
	deleted by us:   fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rewrite/TableScanPredicateExtractor.java
	deleted by us:   fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/transformation/PullUpScanPredicateRule.java
	deleted by us:   fe/fe-core/src/test/java/com/starrocks/sql/optimizer/ScanPredicateExprReuseTest.java

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

Contributor Author

mergify bot commented Dec 11, 2024

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR

@silverbullet233
Contributor

@mergify rebase

Contributor Author

mergify bot commented Dec 16, 2024

rebase

☑️ Nothing to do

  • -closed [📌 rebase requirement]
  • -conflict [📌 rebase requirement]
  • queue-position = -1 [📌 rebase requirement]
  • any of:
    • #commits-behind > 0 [📌 rebase requirement]
    • #commits > 1 [📌 rebase requirement]
    • -linear-history [📌 rebase requirement]

@wanpengfei-git wanpengfei-git enabled auto-merge (squash) December 16, 2024 00:50
Signed-off-by: silverbullet233 <[email protected]>
(cherry picked from commit 8eff033)
@silverbullet233 silverbullet233 force-pushed the mergify/bp/branch-3.3/pr-53740 branch from af2bbc8 to 7a2b551 Compare December 16, 2024 00:58
Signed-off-by: silverbullet233 <[email protected]>
@wanpengfei-git wanpengfei-git merged commit be8c9b5 into branch-3.3 Dec 16, 2024
29 checks passed
@wanpengfei-git wanpengfei-git deleted the mergify/bp/branch-3.3/pr-53740 branch December 16, 2024 12:21
2 participants