Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column reference error for loop inside functions #4717

Open
2 tasks done
snth opened this issue Jul 10, 2024 · 6 comments
Open
2 tasks done

Column reference error for loop inside functions #4717

snth opened this issue Jul 10, 2024 · 6 comments
Labels
bug Invalid compiler output or panic

Comments

@snth
Copy link
Member

snth commented Jul 10, 2024

What happened?

There's a regression in PRQL version:0.12.2 in that some loop examples no longer compile correctly.

The problems are:

  • References to _expr_0, _expr_1, _expr_2, _expr_3, and _expr_4 instead of t, x, and ewma,
  • Recursive reference to table_2.t in ) AS table_5 ON table_5.t = table_5.t + 1.

The expected output runs correctly in shell.duckdb.org.

PRQL input

let calc_ewma = func span rel -> (
  from rel
  filter t==1
  select {t, x, ewma=x}
  loop (
    join rel (that.t==this.t+1)
    select {t, x, ewma = 2/(span+1)*x + (1-2/(span+1))*ewma}
  )
)

let data = [
  {ts=1, value=1},
  {ts=2, value=12},
  {ts=3, value=13},
  {ts=4, value=5},
  {ts=5, value=1},
  ]

from data
select {t=ts, x=value}
calc_ewma 3
select {ts=t, value=x, value_ewma3=ewma}

SQL output

WITH RECURSIVE table_0 AS (
  SELECT
    1 AS ts,
    1 AS value
  UNION
  ALL
  SELECT
    2 AS ts,
    12 AS value
  UNION
  ALL
  SELECT
    3 AS ts,
    13 AS value
  UNION
  ALL
  SELECT
    4 AS ts,
    5 AS value
  UNION
  ALL
  SELECT
    5 AS ts,
    1 AS value
),
data AS (
  SELECT
    ts,
    value
  FROM
    table_0
),
table_3 AS (
  SELECT
    ts AS t,
    value AS x
  FROM
    data
),
table_2 AS (
  SELECT
    _expr_0,
    _expr_1,
    _expr_1 AS _expr_2
  FROM
    table_3
  WHERE
    _expr_0 = 1
  UNION
  ALL
  SELECT
    table_5.t AS _expr_3,
    table_5.x AS _expr_4,
    2 / (3 + 1) * table_5.x + (1 - 2 / (3 + 1)) * table_2._expr_2
  FROM
    table_2
    JOIN (
      SELECT
        ts AS t,
        value AS x
      FROM
        data
    ) AS table_5 ON table_5.t = table_5.t + 1
)
SELECT
  t AS ts,
  x AS value,
  _expr_2 AS value_ewma3
FROM
  table_2 AS table_4

-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)

Expected SQL output

WITH RECURSIVE table_0 AS (
  SELECT
    1 AS ts,
    1 AS value
  UNION
  ALL
  SELECT
    2 AS ts,
    12 AS value
  UNION
  ALL
  SELECT
    3 AS ts,
    13 AS value
  UNION
  ALL
  SELECT
    4 AS ts,
    5 AS value
  UNION
  ALL
  SELECT
    5 AS ts,
    1 AS value
),
data AS (
  SELECT
    ts,
    value
  FROM
    table_0
),
table_3 AS (
  SELECT
    ts AS t,
    value AS x
  FROM
    data
),
table_2 AS (
  SELECT
    t,
    x,
    x AS ewma
  FROM
    table_3
  WHERE
    t = 1
  UNION
  ALL
  SELECT
    table_5.t,
    table_5.x,
    2 / (3 + 1) * table_5.x + (1 - 2 / (3 + 1)) * table_2.ewma as ewma
  FROM
    table_2
    JOIN (
      SELECT
        ts AS t,
        value AS x
      FROM
        data
    ) AS table_5 ON table_5.t = table_2.t + 1
)
SELECT
  t AS ts,
  x AS value,
  ewma AS value_ewma3
FROM
  table_2 AS table_4

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

This might have been introduced in version 0.10 or 0.11 already as I noticed my examples failing to compile then. I've just been pinning to 0.9.5 since then.

@snth snth added the bug Invalid compiler output or panic label Jul 10, 2024
@kgutwin
Copy link
Collaborator

kgutwin commented Jul 10, 2024

I wanted to try to bisect this, but I can't reproduce a successful compile of your PRQL query going back to version 0.9.0.

SQL output from 0.9.0:

WITH RECURSIVE table_1 AS (
  SELECT
    1 AS ts,
    1 AS value
  UNION
  ALL
  SELECT
    2 AS ts,
    12 AS value
  UNION
  ALL
  SELECT
    3 AS ts,
    13 AS value
  UNION
  ALL
  SELECT
    4 AS ts,
    5 AS value
  UNION
  ALL
  SELECT
    5 AS ts,
    1 AS value
),
data AS (
  SELECT
    ts,
    value
  FROM
    table_1
),
table_3 AS (
  SELECT
    ts AS t,
    value AS x
  FROM
    data
),
table_0 AS (
  SELECT
    _expr_0,
    _expr_1,
    _expr_1 AS _expr_2
  FROM
    table_3
  WHERE
    _expr_0 = 1
  UNION
  ALL
  SELECT
    table_4.t AS _expr_3,
    table_4.x AS _expr_4,
    (2 / (3 + 1)) * table_4.x + (1 - (2 / (3 + 1))) * table_0._expr_2
  FROM
    table_0
    JOIN (
      SELECT
        ts AS t,
        value AS x
      FROM
        data
    ) AS table_4 ON table_4.t = table_4.t + 1
)
SELECT
  t AS ts,
  x AS value,
  _expr_2 AS value_ewma3
FROM
  table_0

-- Generated by PRQL compiler version:0.9.0 (https://prql-lang.org)

I didn't go back further than that, as the syntax changes introduced in 0.9.0 would require rewriting your query.

Can you confirm a version of PRQL that does successfully compile?

@snth
Copy link
Member Author

snth commented Jul 11, 2024

Thanks @kgutwin and sorry about the broken repro. I tried to create a self-contained repro from the stock price ewma example from this Google Colab notebook I use in presentations: https://colab.research.google.com/drive/146ohCI_WlnOKpQQ6sPMWu4uKGqZvwaIZ?usp=sharing

Looks like the column aliasing in select {t=ts, x=value} breaks it even in 0.9.5. It works if you remove that part, in 0.9.5 and 0.12.2 🤦‍♂️. So it looks like that's not the regression bug part but something else is because my actual example did stop working at some point which is why I've got it pinned to 0.9.5. Let me see if I can isolate that.

Here's the working example in the meantime (same calc_ewma function as above but different data):

let data = [
  {t=1, x=1},
  {t=2, x=12},
  {t=3, x=13},
  {t=4, x=5},
  {t=5, x=1},
  ]

from data
calc_ewma 3

@snth
Copy link
Member Author

snth commented Jul 11, 2024

The following also works (in both versions):

let data = [
  {t=1, x=1},
  {t=2, x=12},
  {t=3, x=13},
  {t=4, x=5},
  {t=5, x=1},
  ]

from data
calc_ewma 3
derive x_ewma3=ewma
select !{ewma}

but this doesn't (in both):

let data = [
  {ts=1, y=1},
  {ts=2, y=12},
  {ts=3, y=13},
  {ts=4, y=5},
  {ts=5, y=1},
]

from data
derive {t=ts, x=y}
calc_ewma 3
derive x_ewma3=ewma
select !{t, x, ewma}

Btw, this last version is really the example we want (using derive {t=...} ... select !{...}) because that does the necessary aliasing to use the function and then leaves the original relation with the original columns. (Actually, that function drops other columns in the source relation so perhaps the result should be joined onto the original but that's a separate issue.)

Still trying to track down the original regression bug.

@snth
Copy link
Member Author

snth commented Jul 11, 2024

Sorry, looks like it's actually one of the other examples that breaks

let list_subaccounts = root_key accounts -> (
  from parent=accounts
  filter parent_account_key==root_key
  select {parent.account_key, parent.account_name, account_path=parent.account_name, level=0}
  loop (
    join child=accounts (child.parent_account_key==parent.account_key)
    select {child.account_key, child.account_name, account_path=f"{account_path} / {child.account_name}", level=level+1}
    )
  select {level, account_name, account_path}
)

from balance_sheet
list_subaccounts 0          # Balance Sheet
# list_subaccounts 1272     # Assets

In 0.9.5 this produces:

WITH RECURSIVE table_0 AS (
  SELECT
    account_key,
    account_name,
    account_name AS account_path,
    0 AS level
  FROM
    balance_sheet AS parent
  WHERE
    parent_account_key = 0
  UNION
  ALL
  SELECT
    child.account_key AS _expr_0,
    child.account_name AS _expr_1,
    CONCAT(table_0.account_path, ' / ', child.account_name),
    table_0.level + 1
  FROM
    table_0
    JOIN balance_sheet AS child ON child.parent_account_key = table_0.account_key
)
SELECT
  level,
  account_name,
  account_path
FROM
  table_0

-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org/)

From 0.10 onwards it produces the following error:

Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code
    async_ :  Bool (Experimental)

  File ["<ipython-input-2-acfd2b139886>"](https://localhost:8080/#), line 2, in <cell line: 2>
    print(prql.compile("""

  File "<string>", line unknown
SyntaxError: Error:
   ╭─[:8:13]
   │
 8 │     select {child.account_key, child.account_name, account_path=f"{account_path} / {child.account_name}", level=level+1}
   │             ────────┬────────
   │                     ╰────────── Unknown name `child.account_key`
───╯

It took me a long time to track this down because when you actually include some sample data (as you would for a repro) then it works and the error doesn't show up, e.g.

let balance_sheet = from_text format:csv """
account_key,parent_account_key,account_name
1221,0000,Balance Sheet
1272,1221,Assets
1300,1272,Current Assets
1301,1300,Cash & Cash Equivalent
1302,1301,Bank Accounts
1307,1300,Inventories
"""

from balance_sheet
list_subaccounts 0      # Balance Sheet

works.

@snth
Copy link
Member Author

snth commented Jul 11, 2024

Interestingly, without the function definition it compiles on 0.12.2:

from parent=accounts
filter parent_account_key==root_key
select {parent.account_key, parent.account_name, account_path=parent.account_name, level=0}
loop (
  join child=accounts (child.parent_account_key==parent.account_key)
  select {child.account_key, child.account_name, account_path=f"{account_path} / {child.account_name}", level=level+1}
  )
select {level, account_name, account_path}

produces

WITH RECURSIVE table_0 AS (
  SELECT
    account_key,
    account_name,
    account_name AS account_path,
    0 AS level
  FROM
    accounts AS parent
  WHERE
    parent_account_key = root_key
  UNION
  ALL
  SELECT
    child.account_key AS _expr_0,
    child.account_name AS _expr_1,
    CONCAT(table_0.account_path, ' / ', child.account_name),
    table_0.level + 1
  FROM
    table_0
    JOIN accounts AS child ON child.parent_account_key = table_0.account_key
)
SELECT
  level,
  account_name,
  account_path
FROM
  table_0 AS table_1

-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)

I'd argue that the column aliases in the recursive part of the query are wrong but DuckDB doesn't seem to mind and it produces the correct result set if you run the whole example:

let accounts = from_text format:csv """
account_key,parent_account_key,account_name
1221,0000,Balance Sheet
1272,1221,Assets
1300,1272,Current Assets
1301,1300,Cash & Cash Equivalent
1302,1301,Bank Accounts
1307,1300,Inventories
"""

let root_key=0

from parent=accounts
filter parent_account_key==root_key
select {parent.account_key, parent.account_name, account_path=parent.account_name, level=0}
loop (
  join child=accounts (child.parent_account_key==parent.account_key)
  select {child.account_key, child.account_name, account_path=f"{account_path} / {child.account_name}", level=level+1}
  )
select {level, account_name, account_path}

For the record, the correct SQL for the recursive section should probably be something like the following:

...
  UNION
  ALL
  SELECT
    child.account_key,
    child.account_name,
    CONCAT(table_0.account_path, ' / ', child.account_name) AS account_path,
    table_0.level + 1 AS level
  FROM
    table_0
    JOIN accounts AS child ON child.parent_account_key = table_0.account_key
...

@snth snth changed the title Regression bug for loop in 0.12.2 Column reference bug for loop inside functions Jul 11, 2024
@snth snth changed the title Column reference bug for loop inside functions Column reference error for loop inside functions Jul 11, 2024
@snth
Copy link
Member Author

snth commented Jul 11, 2024

I've changed the name of the issue.

Should I delete the first few EWMA examples since they actually work fine and are a lot to read which doesn't add to the actual problem description?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Invalid compiler output or panic
Projects
None yet
Development

No branches or pull requests

2 participants