Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nuances of the SELECT syntax: WITH, UNION, subqueries etc. #106

Open
adamziel opened this issue Apr 30, 2024 · 2 comments
Open

Support nuances of the SELECT syntax: WITH, UNION, subqueries etc. #106

adamziel opened this issue Apr 30, 2024 · 2 comments
Labels
enhancement New feature or request
Milestone

Comments

@adamziel
Copy link
Collaborator

adamziel commented Apr 30, 2024

Let's go through the MySQL documentation pages and make sure even the complex SELECT queries are supported by the SQLite integration plugin:

This likely means rewriting execute_select as more of a grammar parser or a state machine and reason about each encountered token. In contrast, the current approach is to consume all the tokens unless a tactical adjustment applies. This way we could reuse the SELECT logic for WITH, UNIONs, subqueries, etc. Currently we cannot, because the execute_select method assumes it acts on an entire query, not on a part of it.

The implementation could look like this:

// Parse WITH
if($next_token->is_operator('WITH')) {
    $this->consume_with_clause();
}

/**
 * Processes the WITH clause (https://dev.mysql.com/doc/refman/8.0/en/with.html):
 *      WITH [RECURSIVE]
 *          cte_name [(col_name [, col_name] ...)] AS (subquery)
 *          [, cte_name [(col_name [, col_name] ...)] AS (subquery)] ...
 */
protected function consume_with_clause() {
    $token = $this->rewriter->consume();
    if($token->is_operator('RECURSIVE')) {
        $token = $this->rewriter->consume();
    }
    
    while(true) {
        $table_alias = $this->rewriter->consume();
        $token = $this->rewriter->consume();
        $column_aliases = null;
        if($token->is_operator('(')) {
            $column_aliases = [];
            // ...parse column aliases...
        }

        $token = $this->rewriter->consume_assert_is_keyword( 'AS' );
        $this->consume_sub_query();
        $comma_maybe = $this->rewriter->peek();
        if(!$comma_maybe->is_operator(',')) {
            break;
        }
    }
}

/**
 * Processes the SELECT statement (https://dev.mysql.com/doc/refman/8.0/en/select.html)
 *    SELECT
 *       [ALL | DISTINCT | DISTINCTROW ]
 *       [HIGH_PRIORITY]
 *       [STRAIGHT_JOIN]
 *       [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
 *       [SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
 *        select_expr [, select_expr] ...
 */
protected function consume_select_query() {
    $this->rewriter->consume_assert_is_keyword( 'SELECT' );
    $token = $this->rewriter->peek();
    if($token->is_keyword(['ALL', 'DISTINCT', 'DISTINCTROW'])) {
         $this->rewriter->consume();
         $token = $this->rewriter->peek();
    }
    if($token->is_keyword('HIGH_PRIORITY')) {
         $this->rewriter->skip();
         $token = $this->rewriter->peek();
    }
    // ... keep going token by token, don't just skip over things like we do now
    //     with a while loop ...
    if($is_subquery) {
        $this->consume_sub_query();
    }
    // inevitably at some point:
    if($token->is_keyword('UNION')) {
        $this->consume_select_query();
    }
}

protected function consume_sub_query() {
    // ... consume a nested query ...
    // ... can it be just a SELECT query? Or can it also be something else? ...
    // ... can it have a WITH clause? ...
    // ...
    // inevitably at some point:
    $this->consume_select_query();
}

For starters, just migrating to a state machine approach would be more than enough as it would unlock support for UNIONs and easy ignoring of tokens like HIGH_PRIORITY or SQL_SMALL_RESULT.

@adamziel adamziel added the enhancement New feature or request label Apr 30, 2024
@adamziel adamziel changed the title Support full SELECT syntax Support SELECT syntax via a state machine Apr 30, 2024
@adamziel
Copy link
Collaborator Author

Perhaps the Rewriter model is no longer useful for this approach. Here's an alternative idea:

/**
 * Processes the SELECT statement (https://dev.mysql.com/doc/refman/8.0/en/select.html)
 *    SELECT
 *       [ALL | DISTINCT | DISTINCTROW ]
 *       [HIGH_PRIORITY]
 *       [STRAIGHT_JOIN]
 *       [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
 *       [SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
 *        select_expr [, select_expr] ...
 */
protected function consume_select_query() {
    $token = $this->next_token();
    $token->assert_is_keyword('SELECT');

    $token = $this->next_token();
    if($token->is_keyword(['ALL', 'DISTINCT', 'DISTINCTROW'])) {
         $this->append( $token );
         $token = $this->rewriter->next_token();
    }
    if($token->is_keyword('HIGH_PRIORITY')) {
         $token = $this->rewriter->next_token();
    }
    // ... keep going token by token, don't just skip over things like we do now
    //     with a while loop ...
    if($is_subquery) {
        $this->consume_sub_query();
    }
    // inevitably at some point:
    if($token->is_keyword('UNION')) {
        $this->consume_select_query();
    }
}

Importantly, we cannot just operate on strings and translate the MySQL query into the SQLite query. Some transformations require acting on the SQLite database and executing SELECTs, INSERTs etc. Therefore, the input may be a MySQL query string, but the output should be the query result, not the corresponding SQL query for SQLite.

@adamziel adamziel changed the title Support SELECT syntax via a state machine Support nuances of the SELECT syntax: WITH, UNION, subqueries etc. Apr 30, 2024
@adamziel adamziel added this to the Zero Crashes milestone Apr 30, 2024
@brandonpayton
Copy link
Member

Intuitively, it feels like we need a thorough but lightweight syntax tree and an interpreter that explicitly handles each case.

I've started poking at that idea but don't yet have a working example to share.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants