Column named `timestamp` escapes backticks #2155

neelance · 2023-03-13T09:49:20Z

What's up?

Version 0.6.0 seems to be more restrictive regarding column names. I have two examples that stopped working:

from test
select [type]

gives error

function call expected one of !, (, +, -, ==, [, ], an identifier or keyword case, but found keyword type

and

from test
select [timestamp]

gives error

function std.select, param `columns` expected type `column`, but found type `set`

Is there an easy way to fix the PRQL code? If yes, then maybe the error message should include a hint for the fix, since the column names type and timestamp are probably common?

The text was updated successfully, but these errors were encountered:

max-sixty · 2023-03-13T09:54:36Z

Thanks a lot for the report. Agree that the error messages should be much better.

Does surrounding the name with backticks work? (Can look more tomorrow)

neelance · 2023-03-13T09:57:27Z

For type backticks seem to work, for timestamp they don't.

max-sixty · 2023-03-13T21:30:06Z

Great, confirmed on timestamp. For the moment, we can use an s-string, as a standard escape hatch:

from test
select [s"timestamp"]

But it's important for consistency that the backticks always work; they should take priority over any keyword.

max-sixty · 2023-03-13T21:32:00Z

Re the solution — this is happening because after parsing, we match on keywords:

prql/prql-compiler/src/semantic/context.rs

Line 388 in 2d6e986

"timestamp" => TyLit::Timestamp,

Possibly we should be matching for keyword during parsing, because that's when we know whether a term is escaped with backticks or not.

If you have thoughts @aljazerzen ...

max-sixty · 2023-03-13T21:45:59Z

And possible https://prql-lang.org/book/syntax.html#keywords is not accurate, and type should be there? Or this is a bug?

aljazerzen · 2023-03-14T07:37:39Z

Yes I added type keyword. Forgot to update the docs.

Re the timestamp: the dot prefix from #1619 would make this unambitious here: .timestamp would mean the column, timestamp would mean std.timestamp.

Also, you can use test.timestamp instead of just timestamp to refer to your column.

priithaamer · 2023-03-14T08:37:15Z

Another thing that broke in 0.6.0 and looks related to this

from test
derive [
  date = '2023-01-01'
]
select [
  date
]

Gives an error

Error: 
   ╭─[:6:3]
   │
 6 │   date
   ·   ──┬──  
   ·     ╰──── Ambiguous name. Could be from any of _frame.date, date
───╯

Same goes for column named time.

Is this just a bug that gets fixed or shall we escape all these fields going forward? I'd say that date and time are just too common words for column names. Would be super nice if there was no need to escape these.

Thanks for the confirmation on type keyword. Also managed to run into this.

aljazerzen · 2023-03-14T09:08:15Z

I wouldn't promise that we will just fix this, but we do need to find a way for adding things to std without breaking existing queries.

max-sixty · 2023-03-14T18:11:21Z

I wouldn't promise that we will just fix this

WDYM by this? (not disagreeing at all, just trying to understand the literal meaning)

aljazerzen · 2023-03-14T20:32:40Z

I meant that we want to have std.timestamp declared and we want it to be available without using std. prefix. But there is also a possibility that we introduce some new name resolution rules or structure module std differently so name timestamp can unambiguously be used for columns.

As I wrote, the overarching problem we will have to solve is the fact that currently any new declarations in std. are a breaking change. And I don't have any good ideas on how to tackle that.

max-sixty · 2023-03-15T03:28:14Z

As I wrote, the overarching problem we will have to solve is the fact that currently any new declarations in std. are a breaking change. And I don't have any good ideas on how to tackle that.

I'm less worried about this. I think for the moment they'll be small breaking changes, and then when we get to 1.0, we'll have versioning; same as rust... It depends on how widespread this is — if we're adding lots of keywords like date then that will be disruptive. If we add mod once, that's probably fine.

Also

I think we can avoid error-ing when the keyword is in backticks — do you agree?
Do we actually need to error when these are used as column names? This is an area of the code I still don't understand well, I can look more.

(to some extent these priorities are opposing — the first is easier if we parse keywords in the parser, the second is easier if we wait to resolve keywords at resolve time. Though atm we get neither correct :) )

Re the timestamp: the dot prefix from #1619 would make this unambitious here: .timestamp would mean the column, timestamp would mean std.timestamp.

FWIW I agree that this is one of the costs of using bare words like we do. It has benefits (I'd argue large ones, on net), but this is a cost.

aljazerzen · 2023-03-15T08:24:04Z

As I wrote, the overarching problem we will have to solve is the fact that currently any new declarations in std. are a breaking change. And I don't have any good ideas on how to tackle that.

I'm less worried about this. I think for the moment they'll be small breaking changes, and then when we get to 1.0, we'll have versioning; same as rust... It depends on how widespread this is — if we're adding lots of keywords like date then that will be disruptive. If we add mod once, that's probably fine.

Rust does not make breaking changes. That's why all releases are 1.x and not 2.0, 3.0 and so on. If we also want that (I strongly feel that we do), we won't be able to add, for example, std.string_starts_with or std.corr. Ever.

That's my concern.

* I think we can avoid error-ing when the keyword is in backticks — do you agree?

I agree, but this only works for type. Backticks essentially say "don't treat this as a keyword, but as an ident". But timestamp is an ident already, not a keyword. The problem is that when that ident is resolved, it may match either with std.timestamp of with test.timestamp (if test is the name of your relation).

max-sixty · 2023-03-15T18:37:06Z

I agree, but this only works for type. Backticks essentially say "don't treat this as a keyword, but as an ident". But timestamp is an ident already, not a keyword.

OK yes, I hadn't understood that distinction; I do now.

One possibility is to resolve timestamp to a type only when can be a type. But then we have more complicated scoping rules... I will think more.
Somehow we have to let date be a column, yet I'm also fairly convinced that SQL's use of bare words is a big advantage in approachability & usability, even if it's a bit less robust.

Rust does not make breaking changes. That's why all releases are 1.x and not 2.0, 3.0 and so on. If we also want that (I strongly feel that we do), we won't be able to add, for example, std.string_starts_with or std.corr. Ever.

It does add new keywords though. I get there's a subtle difference from adding things to std, but it has a similar effect — upgrading edition means a previously allowed word is no longer allowed.

aljazerzen · 2023-03-16T08:38:35Z

It does add new keywords though. I get there's a subtle difference from adding things to std, but it has a similar effect — upgrading edition means a previously allowed word is no longer allowed.

That's true, but editions are not the same as versions. A new version or rustc must support all editions from 1.0 onwards - i'm not sure exactly how we will handle versioning post 1.0, so I guess we don't have to bother with this problem now.

max-sixty · 2023-04-28T21:54:36Z

How do we think most SQL compilers handle this? Because we can write:

(edit: updated from below)

SELECT 
  date,
  CAST(date_int AS DATE)
FROM foo

...i.e. it works in both places (and with any caps).

Maybe they resolve the idents later in the compilation pipeline? (Not saying this is better or worse, just trying to understand the options)

aljazerzen · 2023-05-01T11:22:46Z

You probably mean:

SELECT 
  date,
  CAST(date_int AS DATE)
FROM foo

or

SELECT 
  date,
  date_int::DATE -- postgres cast syntax
FROM foo

The answer is that SQL has context-dependent name resolution rules - the thing that I try to avoid. I advocate that ident date cannot mean both "the column named date" and "the type named date", depending on syntactic structure it is used in.

max-sixty · 2023-05-01T19:08:58Z

The answer is that SQL has context-dependent name resolution rules - the thing that I try to avoid.

Yeah, I def agree with this impulse...

This issue boosts the status of #1619, or something similar to it.

Possibly it's a trilemma:

Bare words — i.e. refer to columns as date not "date"
Reasonable parsing & resolving — i.e. don't resolve whether something is a type or an ident at the end
Flexible naming — i.e. we can name columns date!

..pick two

max-sixty · 2023-06-10T03:35:57Z

I'm coming back to this. I wonder whether we can handle `date` as a column — i.e. backticks doesn't just mean "ident which might be a column or a type" but instead "definitely a column".

From the perspective of the language, I think this would work, and would provide a way of forcing an item to be interpreted as a column.

From the perspective of the compiler, I think It would require having a new PL type.

An alternative would be some form of #1619. I found the semantics in the original proposal quite confusing. I do find the jq approach (outlined at #1619 (comment)) quite reasonable. It would be another huge change though.

aljazerzen · 2023-06-12T05:59:08Z

Would this also apply to other things in std? Like sum? Or select? Right now, from perspective of name resolution, they are equivalent to date.

This would be similar to how Rust also has special name resolution rules for handling types (i32 instead of std::i32, Result instead of std::Result).

Alternatively, we could formalize the _frame namespace, rename it to relation or something and make it a part of the language.

max-sixty · 2023-06-13T00:04:41Z

Would this also apply to other things in std? Like sum? Or select? Right now, from perspective of name resolution, they are equivalent to date.

No, it would be tagged as a column.

Same as my comments in #1619 — the difference between a) columns vs. b) function&types is the difficulty, rather than between stdlib & others.

aljazerzen · 2023-06-13T07:10:41Z

Ok, so let me get this cleared up. This is what you propose:

from x
derive {date = 1, derive = 2, sum = 3}
derive {`date` + 1, `derive` + 2, `sum` + 3}
aggregate {std.sum `date`, std.sum `sum`}

I think it would work. A bit unconventional to tangle name resolution with quoting, but it would work.

What if I have:

let `my variable` = 5

from x
select {y + `my variable`}

max-sixty · 2023-06-13T20:54:02Z

let my variable = 5

from x
select {y + my variable}

These are good cases...

I'm thinking about `foo` as referring to a column / field in the same way that jq uses . (though jq requires it).

jq also has variables, which it doesn't prefix with .. So this would replicate that — `foo` always refers to a column, never to a variable.

let `my variable` = 5 would be an error; that's not a valid PRQL variable.

derive `x 2` = f"{x}_{x}" would be valid, since it's defining a column.

The cases are good, I'm slowing my enthusiasm a tiny bit on this, particularly when I think about name resolution. Maybe this is the price we pay for the life we choose (trilemma above).

But I'm hesitant to jump all the way to using .s ref:

I do find the jq approach (outlined at #1619 (comment)) quite reasonable. It would be another huge change though.

aljazerzen · 2023-06-14T08:28:00Z

let `my variable` = 5 would be an error; that's not a valid PRQL variable.

That's very limiting... imagine someone is using PRQL as a target for their UI, which allows defining tables. You would not be able to define tables with spaces in their names.

How do you feel about this?

from x
derive date = created_at
select {date}
        ----
         \____ Error: ambiguous name. Did you mean std.date or relation.date?

max-sixty · 2023-06-14T19:46:52Z

That's very limiting... imagine someone is using PRQL as a target for their UI, which allows defining tables. You would not be able to define tables with spaces in their names.

Yes, fair point. I've been thinking about it more as the query lang — same as jq — but you're right it's limiting once we expand from there.

How do you feel about this?

from x
derive date = created_at
select {date}
        ----
         \____ Error: ambiguous name. Did you mean std.date or relation.date?

I think it's a nice error message! It's maybe better than the `date` concept, given the holes in that.

It's still not great. But possibly that's the price we pay for the life we choose!

Do you have thoughts on the modified .foo idea (here)? I'm not convinced, partly because of the size of the change, but it does maaaybe synthesize the tradeoffs...

aljazerzen · 2023-06-15T15:54:53Z

Do you have thoughts on the modified .foo idea (#1619 (comment))? I'm not convinced, partly because of the size of the change, but it does maaaybe synthesize the tradeoffs...

Hmmm. So when referring to types and functions, there is no leading ., but when referring to data (i.e. everything else) there is. This would clash with our idea that function decls are no different than variable decls.

I don't like that very much, on top of the reasons for which we abolished #1619 in the first place.

It's still not great.

I do feel the same way here - it's quite verbose.

All in all, I'm:

-1 on `date`,
-0.5 on .date (modified Relative names #1619) and
-0 on relation.date.

max-sixty · 2023-06-16T20:03:14Z

Hmmm. So when referring to types and functions, there is no leading ., but when referring to data (i.e. everything else) there is. This would clash with our idea that function decls are no different than variable decls.

I see what you mean about treating these differently from the perspective of the compiler.

I do think it's an simple mental model for users, though — that data goes into a namespace which is queried with . (it would be the default_db namespace, or the _frame / relation namespace for columns). That was the primary issue I had with the previous proposal — that the mental model for users was awkward, even if easy for the compiler.

(Possibly I should either really propose it or drop it — rather than being on the fence but continuing discussion...)

aljazerzen · 2023-06-19T07:10:55Z

I do think it's an simple mental model for users, though

Apparently we have a bit different mental models and the feeling of what people find "simple". My reasoning may be influenced by compiler's perspective, but this is how I think of our semantics - and I'm sure I'm not the only one.

But this does show that such feature would divide PRQL users depending on how they think about variables. And we don't want that.

aljazerzen · 2023-06-19T07:11:21Z

Sooo... relation.timestamp?

max-sixty · 2023-06-19T08:25:29Z

Apparently we have a bit different mental models and the feeling of what people find "simple". My reasoning may be influenced by compiler's perspective, but this is how I think of our semantics - and I'm sure I'm not the only one.

I hear that. Arguably jq is some precedent though!

Sooo... relation.timestamp?

Yes I like it!

😁

aljazerzen · 2023-07-10T16:12:42Z

This is done. Timestamp can now be referenced with this.timestamp.

max-sixty added the bug Invalid compiler output or panic label Mar 13, 2023

max-sixty changed the title ~~Trouble with column names~~ Column named timestamp escapes backticks Mar 13, 2023

max-sixty mentioned this issue Mar 21, 2023

Columns of certain names like "date" can not be directly referred to? #2273

Closed

2 tasks

max-sixty added the priority label Mar 21, 2023

max-sixty mentioned this issue Apr 28, 2023

Can't reference date within a function on a relation #2503

Closed

2 tasks

aljazerzen closed this as completed Jul 10, 2023

max-sixty mentioned this issue Aug 7, 2023

Table name and transform name collisions #3271

Open

2 tasks

max-sixty mentioned this issue Dec 18, 2023

weird error when using a column that is a std function name #3912

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Column named `timestamp` escapes backticks #2155

Column named `timestamp` escapes backticks #2155

neelance commented Mar 13, 2023

max-sixty commented Mar 13, 2023

neelance commented Mar 13, 2023

max-sixty commented Mar 13, 2023

max-sixty commented Mar 13, 2023

max-sixty commented Mar 13, 2023

aljazerzen commented Mar 14, 2023

priithaamer commented Mar 14, 2023

aljazerzen commented Mar 14, 2023

max-sixty commented Mar 14, 2023

aljazerzen commented Mar 14, 2023

max-sixty commented Mar 15, 2023

aljazerzen commented Mar 15, 2023

max-sixty commented Mar 15, 2023

aljazerzen commented Mar 16, 2023

max-sixty commented Apr 28, 2023 •

edited

Loading

aljazerzen commented May 1, 2023

max-sixty commented May 1, 2023

max-sixty commented Jun 10, 2023

aljazerzen commented Jun 12, 2023

max-sixty commented Jun 13, 2023

aljazerzen commented Jun 13, 2023

max-sixty commented Jun 13, 2023

aljazerzen commented Jun 14, 2023

max-sixty commented Jun 14, 2023

aljazerzen commented Jun 15, 2023

max-sixty commented Jun 16, 2023

aljazerzen commented Jun 19, 2023

aljazerzen commented Jun 19, 2023

max-sixty commented Jun 19, 2023

aljazerzen commented Jul 10, 2023

Column named timestamp escapes backticks #2155

Column named timestamp escapes backticks #2155

Comments

neelance commented Mar 13, 2023

What's up?

max-sixty commented Mar 13, 2023

neelance commented Mar 13, 2023

max-sixty commented Mar 13, 2023

max-sixty commented Mar 13, 2023

max-sixty commented Mar 13, 2023

aljazerzen commented Mar 14, 2023

priithaamer commented Mar 14, 2023

aljazerzen commented Mar 14, 2023

max-sixty commented Mar 14, 2023

aljazerzen commented Mar 14, 2023

max-sixty commented Mar 15, 2023

aljazerzen commented Mar 15, 2023

max-sixty commented Mar 15, 2023

aljazerzen commented Mar 16, 2023

max-sixty commented Apr 28, 2023 • edited Loading

aljazerzen commented May 1, 2023

max-sixty commented May 1, 2023

max-sixty commented Jun 10, 2023

aljazerzen commented Jun 12, 2023

max-sixty commented Jun 13, 2023

aljazerzen commented Jun 13, 2023

max-sixty commented Jun 13, 2023

aljazerzen commented Jun 14, 2023

max-sixty commented Jun 14, 2023

aljazerzen commented Jun 15, 2023

max-sixty commented Jun 16, 2023

aljazerzen commented Jun 19, 2023

aljazerzen commented Jun 19, 2023

max-sixty commented Jun 19, 2023

aljazerzen commented Jul 10, 2023

Column named `timestamp` escapes backticks #2155

Column named `timestamp` escapes backticks #2155

max-sixty commented Apr 28, 2023 •

edited

Loading