Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't fulltext search against reference column of varchar vector #674

Closed
komainu8 opened this issue Mar 1, 2024 · 1 comment · Fixed by groonga/groonga#1780
Closed
Labels

Comments

@komainu8
Copy link
Member

komainu8 commented Mar 1, 2024

What happend?

Mroonga returns no record when we send the following query.
https://github.com/mroonga/mroonga/actions/runs/8107720441/job/22160239499#step:5:1983

CREATE TABLE tags (
  name VARCHAR(64) PRIMARY KEY
) DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_bin
  COMMENT='default_tokenizer "TokenDelimit"';

CREATE TABLE bugs (
  id INT UNSIGNED PRIMARY KEY,
  tags VARCHAR(40) COMMENT 'type "tags", flags "COLUMN_VECTOR"',
  FULLTEXT INDEX bugs_tags_index (tags) COMMENT 'table "tags"'
) DEFAULT CHARSET=utf8mb4;

INSERT INTO bugs (id, tags) VALUES (1, "Linux MySQL");
INSERT INTO bugs (id, tags) VALUES (2, "MySQL groonga");
INSERT INTO bugs (id, tags) VALUES (3, "mroonga");

SELECT *
       FROM bugs
       WHERE MATCH (tags) AGAINST ("MySQL" IN BOOLEAN MODE);
id	tags

How to reproduce it

mkdir build-dir/mysql-8.0.36
cd build-dir/mysql-8.0.36
cmake ../../mysql-8.0.36-source \
  -DCMAKE_INSTALL_PREFIX=/tmp/local \
  -DWITH_DEBUG=yes \
  -DWITH_BOOST=../../mysql-8.0.36-source/boost
make -j$(nproc); make install
cd mroonga
./autgen.sh
./configure \
  --prefix=/tmp/local \
  --with-debug \
  --with-cutter-source-path=../cutter \
  --with-groonga-source-path=../groonga \
  --with-mysql-source=../mysql-8.0.36-source \
  --with-mysql-build=../build-dir/mysql-8.0.36 \
  --with-mysql-config=/tmp/local/bin/mysql_config \
  PKG_CONFIG_PATH=/tmp/local/lib/pkgconfig
make; make install

./test/run-sql-test.sh mysql-test/mroonga/storage/fulltext/groonga/t/varchar_vector.test

Expected behavior

CREATE TABLE tags (
  name VARCHAR(64) PRIMARY KEY
) DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_bin
  COMMENT='default_tokenizer "TokenDelimit"';

CREATE TABLE bugs (
  id INT UNSIGNED PRIMARY KEY,
  tags VARCHAR(40) COMMENT 'type "tags", flags "COLUMN_VECTOR"',
  FULLTEXT INDEX bugs_tags_index (tags) COMMENT 'table "tags"'
) DEFAULT CHARSET=utf8mb4;

INSERT INTO bugs (id, tags) VALUES (1, "Linux MySQL");
INSERT INTO bugs (id, tags) VALUES (2, "MySQL groonga");
INSERT INTO bugs (id, tags) VALUES (3, "mroonga");

SELECT *
       FROM bugs
       WHERE MATCH (tags) AGAINST ("MySQL" IN BOOLEAN MODE);
 id	tags
 1	Linux MySQL
 2	MySQL groonga

Environment

  • OS: Debian
  • OS Version: Debian 12
  • DB: MySQL
  • DB Version: 8.0.36
  • Mroonga Version: main
  • libgroonga Version: main

Additional context

No response

@komainu8 komainu8 added the Bug label Mar 1, 2024
@komainu8
Copy link
Member Author

komainu8 commented Mar 1, 2024

Groonga's table

table_create bugs TABLE_PAT_KEY UInt32
column_create bugs id COLUMN_SCALAR UInt32

table_create mroonga_operations TABLE_NO_KEY
column_create mroonga_operations record COLUMN_SCALAR UInt32
column_create mroonga_operations table COLUMN_SCALAR ShortText
column_create mroonga_operations type COLUMN_SCALAR ShortText

table_create tags TABLE_PAT_KEY ShortText --default_tokenizer TokenDelimit
column_create tags name COLUMN_SCALAR ShortText

column_create bugs tags COLUMN_VECTOR tags

load --table tags
[
["_key","name"],
["Linux",""],
["MySQL",""],
["groonga",""],
["mroonga",""]
]

load --table bugs
[
["_key","id","tags"],
[1,1,["Linux","MySQL"]],
[2,2,["MySQL","groonga"]],
[3,3,["mroonga"]]
]

column_create tags bugs_tags_index COLUMN_INDEX|WITH_POSITION bugs tags

select --table bugs --query tags:@MySQL
[[0,1709283617.656619,0.02607202529907227],[[[0],[["_id","UInt32"],["_key","UInt32"],["id","UInt32"],["tags","tags"]]]]]

select --table bugs --filter 'tags @ "MySQL"'
[[0,1709283928.190391,0.0006036758422851562],[[[0],[["_id","UInt32"],["_key","UInt32"],["id","UInt32"],["tags","tags"]]]]]

kou added a commit to groonga/groonga that referenced this issue May 21, 2024
Fix mroonga/mroonga#674.

## What happend?

Groonga always returns that the number of hits is 0 when we execute
full-text search against reference column of vector as below. Note that
a lexicon (`tags` in this case) must have a tokenizer.

```
table_create bugs TABLE_PAT_KEY UInt32

table_create tags TABLE_PAT_KEY ShortText --default_tokenizer TokenDelimit
column_create tags name COLUMN_SCALAR ShortText

column_create bugs tags COLUMN_VECTOR tags

load --table bugs
[
["_key", "tags"],
[1, "Linux MySQL"],
[2, "MySQL Groonga"],
[3, "Mroonga"]
]

column_create tags bugs_tags_index COLUMN_INDEX bugs tags

select --table bugs --filter 'tags @ "MySQL"'
[[0,0.0,0.0],[[[0],[["_id","UInt32"],["_key","UInt32"],["tags","tags"]]]]]
```

## Solution

In this PR, If the source of index was reference column, Groonga get
`key_type` from a table that the reference column refer.

This problem occurs from 535844b.
If the source of index was reference column, Groonga couldn't get `key`
and `key_type` correctly in the above modification.

---------

Co-authored-by: Sutou Kouhei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant