Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: push_down_filter causes incorrect results #16779

Open
1 of 2 tasks
forsaken628 opened this issue Nov 6, 2024 · 1 comment
Open
1 of 2 tasks

bug: push_down_filter causes incorrect results #16779

forsaken628 opened this issue Nov 6, 2024 · 1 comment
Labels
C-bug Category: something isn't working

Comments

@forsaken628
Copy link
Collaborator

forsaken628 commented Nov 6, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Version

9c2c3dd

What's Wrong?

SELECT *
FROM
  (SELECT i_category,
          i_class,
          i_brand,
          i_product_name,
          sumsales,
          rank() OVER (PARTITION BY i_category
                       ORDER BY sumsales DESC) rk
   FROM
     (SELECT i_category,
             i_class,
             i_brand,
             i_product_name,
             d_year,
             d_qoy,
             d_moy,
             s_store_id,
             sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
      FROM store_sales,
           date_dim,
           store,
           item
      WHERE ss_sold_date_sk=d_date_sk
        AND ss_item_sk=i_item_sk
        AND ss_store_sk = s_store_sk
        AND d_month_seq BETWEEN 1200 AND 1200+11
      GROUP BY rollup(i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy,s_store_id))dw1) dw2
WHERE rk <= 100 and i_category = 'Books' order by i_category nulls first;

The outermost level of this query contains the predicate i_category = 'Books', but the result contains rows where i_category is null.

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│    i_category    │      i_class     │        i_brand        │  i_product_name  │         sumsales         │   rk   │
│ Nullable(String) │ Nullable(String) │    Nullable(String)   │ Nullable(String) │ Nullable(Decimal(38, 2)) │ UInt64 │
├──────────────────┼──────────────────┼───────────────────────┼──────────────────┼──────────────────────────┼────────┤
│ NULL             │ NULL             │ NULL                  │ NULL             │ 1480754.12               │      1 │
│ Books            │ NULL             │ NULL                  │ NULL             │ 1480754.12               │      1 │
│ Books            │ fiction          │ NULL                  │ NULL             │ 356542.34                │      2 │
│ Books            │ self-help        │ NULL                  │ NULL             │ 345808.39                │      3 │
│ Books            │ fiction          │ scholarunivamalg #8   │ NULL             │ 269225.19                │      4 │

How to Reproduce?

run tests/sqllogictests/scripts/prepare_tpcds_data.sh to construct the dataset

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@forsaken628 forsaken628 added the C-bug Category: something isn't working label Nov 6, 2024
@inviscid
Copy link

It also appears that the ROLLUP() method cannot be combined with other GROUP BY columns to reduce the number of generated combinations. This is an example:

DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
    brand VARCHAR NOT NULL,
    segment VARCHAR NOT NULL,
    quantity INT NOT NULL
);
INSERT INTO sales (brand, segment, quantity)
VALUES
    ('ABC', 'Premium', 100),
    ('ABC', 'Basic', 200),
    ('XYZ', 'Premium', 100),
    ('XYZ', 'Basic', 300);
    
-- This Works
SELECT
    brand,
    segment,
    SUM (quantity)
FROM
    sales
GROUP BY
    ROLLUP (brand, segment)
ORDER BY
    brand,
    segment;
    
-- This does not work
SELECT
    brand,
    segment,
    SUM (quantity)
FROM
    sales
GROUP BY
    brand,
    ROLLUP (segment)
ORDER BY
    brand,
    segment;

It returns the following error:

unexpected `segment`, expecting `SELECT`, `EXCEPT`, `INTERSECT`, `IGNORE_RESULT`, `LIMIT`, `OFFSET`, `FROM`, `UNION`, `ORDER`, `VALUES`, `(`, or `WITH`

The expected behavior is that the any number of GROUP BY columns can be specified prior to specifying the ROLLUP() columns in order to reduce the calculated columns.

If this is a separate issue from the one above I would be happy to create a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants