Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support expression in limit clause #19834

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

feat: support expression in limit clause #19834

wants to merge 10 commits into from

Conversation

lmatz
Copy link
Contributor

@lmatz lmatz commented Dec 17, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

fix #19830

required by SQLMesh's CI test

The fix is to evaluate the expression during binding stage so that we either accept the result (it is a number >=0 after casting to bigint) or reject it (all the other cases)

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@@ -655,7 +655,7 @@ impl fmt::Display for OrderByExpr {
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct Fetch {
pub with_ties: bool,
pub quantity: Option<String>,
pub quantity: Option<Expr>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as simple as it looks like. A parsing conflict can happen:

https://github.com/postgres/postgres/blob/REL_17_2/src/backend/parser/gram.y#L13233

* Allowing full expressions without parentheses causes various parsing
* problems with the trailing ROW/ROWS key words.  SQL spec only calls for
* <simple value specification>, which is either a literal or a parameter (but
* an <SQL parameter reference> could be an identifier, bringing up conflicts
* with ROW/ROWS). We solve this by leveraging the presence of ONLY (see above)
* to determine whether the expression is missing rather than trying to make it
* optional in this rule.

* c_expr covers almost all the spec-required cases (and more), but it doesn't
* cover signed numeric literals, which are allowed by the spec. So we include
* those here explicitly. We need FCONST as well as ICONST because values that
* don't fit in the platform's "long", but do fit in bigint, should still be
* accepted here. (This is possible in 64-bit Windows as well as all 32-bit
* builds.)

For example,

select 1 offset 1 row fetch first row only;
                                  ^

This is valid in PostgreSQL. When we see this row keyword, it should be part of the fetch first [n] row syntax where n is omitted, but it may be mistakenly treated as part of the row(field0, field1) expression because we will attempt to parse an expression (instead of a numerical constant) after first.

What is the actual parsing behavior when this commit sees this SQL?

Copy link
Contributor Author

@lmatz lmatz Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about let me revert the changes for "fetch" but keep the changes for "limit",
the CI test required by SQLMesh integration needs limit simple expression only.

let fetch still stick to a literal number

I will open a new issue for the reasoning posted above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we force parentheses around the expression in Fetch just like PG

SELECT id
FROM generate_series(1, 10) AS id
ORDER BY id
OFFSET 5 ROWS
FETCH NEXT (1+4) ROWS ONLY;

id 
----
  6
  7
  8
  9
 10
(5 rows)

SELECT id
FROM generate_series(1, 10) AS id
ORDER BY id
OFFSET 5 ROWS
FETCH NEXT 1+4 ROWS ONLY;

psql:commands.sql:11: ERROR:  syntax error at or near "+"
LINE 5: FETCH NEXT 1+4 ROWS ONLY;

Copy link
Contributor

@xiangjinwu xiangjinwu Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we force parentheses around the expression in Fetch just like PG

PostgreSQL is more complex than this, but yes parenthesized expr is a subset of what PostgreSQL allows (c_expr as defined in the linked gram.y above, while limit allows a_expr. A includes C includes parenthesized expr.).

Comment on lines 55 to 70
let limit = limit.unwrap_or(ExprImpl::literal_bigint(LIMIT_ALL_COUNT as i64));
if !limit.is_const() {
return Err(ErrorCode::ExprError(
"expects an integer or expression that can be evaluated to an integer after LIMIT"
.into(),
)
.into());
}
let limit_cast_to_bigint = limit.cast_explicit(DataType::Int64).map_err(|_| {
RwError::from(ErrorCode::ExprError(
"expects an integer or expression that can be evaluated to an integer after LIMIT"
.into(),
))
})?;
let limit = match limit_cast_to_bigint.fold_const() {
Ok(Some(datum)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This validation logic is better placed in binder, and BoundQuery::limit remains an Option<u64>. Planner is in charge of transforming SQL into a plan graph.
  • We can do cast first and then try_fold_const, rather than using is_const and fold_const separately.
  • The allowed casts in PostgreSQL is assign rather than explicit. (int requires implicit, decimal requires assign, jsonb requires explicit. I tested in PostgreSQL and LIMIT '1'::jsonb is not allowed.)
  • When the evaluated output is None, which means SQL NULL, PostgreSQL treats it as no limit. Let's add a test case for this.
  • Test error cases: wrong type, not const, eval error, negative

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test error cases: wrong type, not const, eval error, negative

wrong type, negative can be constructed, I added them

but "eval error and not const" seems to be avoided by the parsing stage already

})?;
let limit = match limit_cast_to_bigint.try_fold_const() {
Some(Ok(Some(datum))) => {
let value = datum.as_integral();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: as_i64 as we already did the cast above.

Comment on lines +207 to +211
// wrong type, not const, eval error all belongs to this branch
_ => return Err(ErrorCode::ExprError(
"expects an integer or expression that can be evaluated to an integer after LIMIT"
.into(),
).into()),
Copy link
Contributor

@xiangjinwu xiangjinwu Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Wrong type already handled as cast_assign error above.
  • Not const corresponds to None. The test case can be select 1 limit generate_series(1, 2);
  • Eval error corresponds to Some(Err(_)). The test case can be select 1 limit 2/0;

Preferably _ shall be avoided when possible.

Comment on lines +228 to +229
statement error
SELECT * FROM generate_series(0,3,1) as t(v) order by v limit NULL + '1';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's include the error message for all statement error cases below, to make sure they are not failing with an unexpected reason. For example, this query actually reports function add(unknown, unknown) is not unique

statement error
SELECT * FROM generate_series(0,3,1) as t(v) order by v limit NULL + '1';
----
db error: ERROR:  Failed to run the query

Caused by these errors (recent errors listed first):
  1: Failed to bind expression: NULL + '1'
  2: Bind error: function add(unknown, unknown) is not unique

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need to support expression evaluation after LIMIT
2 participants