Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan is used, not Query on index #185

Open
gregmsanderson opened this issue Feb 6, 2019 · 9 comments
Open

Scan is used, not Query on index #185

gregmsanderson opened this issue Feb 6, 2019 · 9 comments
Labels
bug enhancement help wanted missing tests Feature or bug that needs failing tests to replicate it

Comments

@gregmsanderson
Copy link

Describe the bug

I have a table with a GSI that has a hash and a range.

The GSI is specified in the model, as described.

If I do a 'where' on the hash key used in the GSI, the index is not used. The raw output shows 'Scan' is picked, not 'Query'.

Schema

In the model I defined the table name etc and added the index:

* Indexes.
     * [
     *     '<simple_index_name>' => [
     *          'hash' => '<index_key>'
     *     ],
     *     '<composite_index_name>' => [
     *          'hash' => '<index_hash_key>',
     *          'range' => '<index_range_key>'
     *     ],
     * ]
     *
     * @var array
     */
    protected $dynamoDbIndexKeys = [
        'account-created_at-index' => [
            'hash' => 'account',
            'range' => 'created_at',
        ],
    ];

... the query works and it does indeed return the correct item, but when I look into how it got there with the raw query ...

Item::where('account', 'abcde')->toDynamoDbQuery()

... it shows a Scan was done. But 'account' is the hash of an index, so Query should have been used.

To try and debug myself, looking into how it works, I see the decision to use 'Query' appears to be taken in here:

/**
     * Return the raw DynamoDb query
     *
     * @param array $columns
     * @param int $limit
     * @return RawDynamoDbQuery
     */
    public function toDynamoDbQuery(
        $columns = [],
        $limit = DynamoDbQueryBuilder::MAX_LIMIT
    ) {
        $this->applyScopes();

        $this->resetExpressions();

        $op = 'Scan';
        $queryBuilder = DynamoDb::table($this->model->getTable());

        if (!empty($this->wheres)) {
            $analyzer = $this->getConditionAnalyzer();

            if ($keyConditions = $analyzer->keyConditions()) {
                $op = 'Query';
                $queryBuilder->setKeyConditionExpression($this->keyConditionExpression->parse($keyConditions));
            } ...

... and following back from there, in the Analyzer.php, the issue seems to be this line:

if (count(array_intersect($conditionKeys, $keys)) === count($keys)) {

... since - from what I see - that results in 1 === 2, which is of course not true and so the index is not used. And that explains why the query builder defaults to Scan. I'm not quite sure why it's checking the same number is in both since as I understand it, you can query a GSI using just one of the two (ie using just the hash). In the AWS example, they show doing that - using just the hash: (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.Indexes.QueryAndScan.html)

Version info

  • Laravel: 5.7
  • laravel-dynamodb: latest
@gregmsanderson
Copy link
Author

To continue this a bit further, I hacked in some lines to see what was going on in that key bit in the Analyzer.php that decides whether to use the index:

foreach ($this->model->getDynamoDbIndexKeys() as $name => $keysInfo) {
            $conditionKeys = array_pluck($this->conditions, 'column');
            $keys = array_values($keysInfo);
            
            ///// hacked in /////
            print_r($conditionKeys);
            print_r($keys);
            print_r(array_intersect($conditionKeys, $keys));
            echo count(array_intersect($conditionKeys, $keys));
            echo count($keys);
            ///// hacked in /////

            if (count(array_intersect($conditionKeys, $keys)) === count($keys)) {

... and thus showing what decision was being made internally. Since from what I can see, if that line above does not return true, the index is not used. The raw output of those various print_r and echo ...

Array
(
    [0] => account
)
Array
(
    [0] => account
    [1] => created_at
)
Array
(
    [0] => account
)
12

... so again, showing with the final "12" that the value on the left of the === is the 1, and the value on the right of the === is the 2.

So that indicates to me that wherever you have an index that has a hash and a range (so 2), but the where only uses the hash (so 1), that index won't be used. As 1 !== 2

Is that correct?

@phileon
Copy link

phileon commented Feb 11, 2019

Hi,
I often have a similar problem, in my case helped using the function withIndex().

$item = new Item();
$item->where('account', 'abcde')->withIndex('account-created_at-index')->get();

Basically the information should be unnecessary, but it helped me to solve the problem sufficiently.

@gregmsanderson
Copy link
Author

@phileon Ok, thanks for the idea. I can try that.

@baopham
Copy link
Owner

baopham commented Feb 12, 2019

@gregmsanderson love the detailed analysis! looks like the decision logic could be tweaked a bit. This would require a bit of digging in the docs again to make sure we don't cause regressions. Feel free to contribute!

@nelson6e65
Copy link
Collaborator

This is maybe related with #254 and #253

@nelson6e65 nelson6e65 added the missing tests Feature or bug that needs failing tests to replicate it label Feb 24, 2023
@excitexcite
Copy link

@phileon Greetings, I have the same problem, but withIndex() function call doesn't perform query, but scan. Could you (or anybody else) provide some information how to query table, when you have compositeKey and use only PK in query (as it shown in your example above).

@nelson6e65
Copy link
Collaborator

@excitexcite Are you using the latest version (v6.4.0)? Version 6.2.0 introduced bugfix related with this issue.

@excitexcite
Copy link

@nelson6e65 Yes, I double checked it, it's 6.4.0
image

@excitexcite
Copy link

I'd like to inform that I made some mistake while trying to make query with GSI. I configured the request more precisely with decorate() and it did work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug enhancement help wanted missing tests Feature or bug that needs failing tests to replicate it
Projects
None yet
Development

No branches or pull requests

5 participants