Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible fix to 245 issue. Using search.in query when having multiple values for the same Tag #262

Merged
merged 14 commits into from
Feb 7, 2024

Conversation

luismanez
Copy link
Contributor

Motivation and Context (Why the change? What's the scenario?)

Solving scenario described in issue #245

High level description (Approach, Design)

We check if there are multiple values for the same Tag, and if so, a search.in() filter is composed.

@dluc
Copy link
Collaborator

dluc commented Jan 23, 2024

There's a problem with special chars allowed inside tag values, e.g. with comma ,.

For instance given this scenario:

var filters = new List<MemoryFilter>
{
    new MemoryFilter().ByTag("colors", "red,white"),
    new MemoryFilter().ByTag("colors", "green,yellow"),
};

the old code produces this filter:

# tags == 'color:red,white' OR tags == 'color:green,yellow'

(tags/any(s: s eq 'color:red,white')) or (tags/any(s: s eq 'color:green,yellow'))

while the new code produces this filter which is not equivalent

# tags == 'color:red' OR tags == 'white' OR tags == 'color:green' OR tags == 'yellow'

tags/any(s: search.in(s, 'color:red,white,color:green,yellow'))

With ' (single quote) the code throws Invalid expression: Found an unbalanced bracket expression.. There are probably a bunch of special chars to escape.

@luismanez
Copy link
Contributor Author

Thanks @dluc for the testing. Let us have a look and will back to you.

@luismanez
Copy link
Contributor Author

hi @dluc
I've updated the PR. Basically, the issue happens because the default delimiter in the search.in method is ' ,' which means that any values with spaces and/or commas between them will be separated.

In the update, we are passing a different delimiter: |

As per our tests, all these special chars are working fine:

"Crazy Charz Inc. '' + - = && ! ( ) { } [ ] ^ \"\" ~ * ? : ; @ ` \\ /"

and only fails if the value has the delimiter:

"Crazy Charz Inc. '' + - = && ! ( ) { } [ ] ^ \"\" ~ * ? : ; @ ` \\ / |"

We have passed KM unit tests and everything worked. Let me know if you find any other issue.

Thanks!

@luismanez
Copy link
Contributor Author

Forgot to mention the single quote is also being escaped.

@dluc
Copy link
Collaborator

dluc commented Jan 29, 2024

unfortunately changing char doesn't address the problem, because | is a valid symbol too, allowed inside tag values. I think the solution is not about finding a new special char, but rather about escaping the tag values.

Something like

tags/any(s: search.in(s, 'color:red\,white,color:green\,yellow'))

(note the backslash)

@luismanez
Copy link
Contributor Author

I see. Thing is that we tried some escaping actions considering all these special chars:

+ - & | ! ( ) { } [ ] ^ " ~ * ? : \ /

but we had some issues with some of them. We'll give it another try and let you know.

Thanks again for your help.

@luismanez
Copy link
Contributor Author

@dluc we are not able to find any escaping syntax that work fine with the search.in. I'm pretty sure the search.in function does not allow to escape the delimiter. This is what doc says about it (https://learn.microsoft.com/en-us/azure/search/search-query-odata-search-in-function):

A string where each character is treated as a separator when parsing the valueList parameter. The default value of this parameter is ' ,' which means that any values with spaces and/or commas between them will be separated. If you need to use separators other than spaces and commas because your values include those characters, you can specify alternate delimiters such as '|' in this parameter.

And there is this issue on GH / SO

Azure/azure-sdk-for-net#9828
https://stackoverflow.com/questions/60066616/azure-cognitive-search-how-to-filter-by-fields-containing-special-characters

As you can see, the response from a MS guy is considering a bunch of special chars, except the one passed as delimiter:

@"search.in(hotelName, 'Crazy Charz Inc. '' + - && ! ( ) { } [ ] ^ "" ~ * ? : \ /', '|')"

If you can find out what delimiter to use, and how to escape it, we're happy to update our code.

Thanks again.

@dluc
Copy link
Collaborator

dluc commented Jan 30, 2024

@dluc we are not able to find any escaping syntax that work fine with the search.in. I'm pretty sure the search.in function does not allow to escape the delimiter. This is what doc says about it (https://learn.microsoft.com/en-us/azure/search/search-query-odata-search-in-function):

A string where each character is treated as a separator when parsing the valueList parameter. The default value of this parameter is ' ,' which means that any values with spaces and/or commas between them will be separated. If you need to use separators other than spaces and commas because your values include those characters, you can specify alternate delimiters such as '|' in this parameter.

And there is this issue on GH / SO

Azure/azure-sdk-for-net#9828 https://stackoverflow.com/questions/60066616/azure-cognitive-search-how-to-filter-by-fields-containing-special-characters

As you can see, the response from a MS guy is considering a bunch of special chars, except the one passed as delimiter:

@"search.in(hotelName, 'Crazy Charz Inc. '' + - && ! ( ) { } [ ] ^ "" ~ * ? : \ /', '|')"

If you can find out what delimiter to use, and how to escape it, we're happy to update our code.

Thanks again.

what about a compromise?

  • if tag values contain special chars, use the old syntax with OR
  • if tag values don't contain special chars, then use the new syntax

@dluc
Copy link
Collaborator

dluc commented Jan 30, 2024

search.in allows to pass a custom separator, so we should be able to cover most scenarios searching for a valid separator at runtime, scanning all the filters. The logic would look like this:

  • define a list of valid separators
  • choose the first separator symbol, e.g. ,
  • scan all given filters, if any tag value contains , then take the next separator symbol and try again
  • if run out of options, throw exception, or fall back to old logic, splitting the query in multiple queries

There are various optimizations that can be applied, e.g. making symbol(s) configurable would be the easiest

@luismanez
Copy link
Contributor Author

hey @dluc
new update implemented following your advice. I haven't found a "not too intrusive" way of passing the desired delimiter, so I've created an array of chars in the same AzureAISearchMemory class with some special chars, and if none of them are available, fallback to the "old" way. Feel free to add any other char to the array.
Functional tests are running fine, and I've done some testing with our product and works fine too.

Thanks.

@dluc
Copy link
Collaborator

dluc commented Feb 6, 2024

There are still some bugs, I added some unit tests you can run to reproduce.

image

Two issues in particular:

  1. Special chars can occur also in tag names. The logic choosing SearchInDelimiter is checking only tag values. E.g. in test ItHandlesEdgeCase1:

    • tags/any(s: search.in(s, 'col|or:blue|col|or:white', '|')) or tags/any(s: search.in(s, 'si,ze:small|si,ze:medium', '|')) should be
    • tags/any(s: search.in(s, 'col|or:blue,col|or:white', ',')) or tags/any(s: search.in(s, 'si,ze:small|si,ze:medium', '|'))
  2. Edge cases like MemoryFilters.ByTag("color", "blue").ByTag("color", "blue") are handled incorrectly, using OR instead of AND. E.g. in test ItHandlesEdgeCase3:

    • tags/any(s: search.in(s, 'color:blue|color:blue', '|')) should be
    • (tags/any(s: s eq 'color:blue') and tags/any(s: s eq 'color:blue'))

Copy link
Collaborator

@dluc dluc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a few fixes before merge.

@luismanez
Copy link
Contributor Author

Thanks @dluc
Fixes submitted. Now tests are passing:

image

Note that I've done a change to test ItHandlesEdgeCase2 as the expected value was wrong.

Hopefully all is good and we can finally merged it 😄

@dluc
Copy link
Collaborator

dluc commented Feb 7, 2024

Thank you @luismanez !

@dluc dluc merged commit 10d928b into microsoft:main Feb 7, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants