[Feature Request]: Add exclusive And/Or search options #314

Open
3 tasks done
coolesding opened this issue Jul 11, 2024 · 5 comments
Labels
TagStudio: Search The TagStudio search engine Type: Enhancement New feature or request

Comments

@coolesding

Checklist

  • I am using an up-to-date version.
  • I have read the documentation.
  • I have searched existing issues.

Description

Currently, the search can only be set to either And (Includes all Tags) or Or (Includes any Tag). I just ran into the problem of trying to find an image that has two specific tags and no others; as far as I can tell, I have to do this manually by searching for all the tags I want and filtering out the ones I don't want myself.

Solution

I think adding the following options would be great additions to the versatility of TagStudio:

  • Ex. Or (Exclusively includes any Tags)
  • Ex. And (Exclusively includes all Tags)

EXAMPLE

I have a database of many images from and about the Bocchi the Rock! anime, which includes, among others, the tags:

Kita
Bocchi
Nijika
Ryou

Images are tagged in every combination: some with only one character, some with several, and some with all four characters tagged.

Searching for "Kita, Bocchi" in Exclusive Or mode would return all images that have only the Kita tag, only the Bocchi tag, or only the Kita and Bocchi tags, and no others.

Searching for "Nijika, Ryou" in Exclusive And mode would return all images that have only the Nijika and Ryou tags, and no others.
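One way to pin down the two proposed modes is to treat each image's tags as a set: "Ex. Or" then means the image's tags are a non-empty subset of the searched tags, and "Ex. And" means the two sets are equal. The Python sketch below assumes exactly that; the function names and sample file names are hypothetical, not part of TagStudio, and the non-empty check follows the description above, where untagged images are not expected in the results.

```python
# Hypothetical sketch: these names are illustrative, not TagStudio's API.


def exclusive_or_match(file_tags: set[str], query: set[str]) -> bool:
    """Ex. Or: the file has at least one queried tag and no other tags."""
    return bool(file_tags) and file_tags <= query


def exclusive_and_match(file_tags: set[str], query: set[str]) -> bool:
    """Ex. And: the file has exactly the queried tags and no others."""
    return file_tags == query


images = {
    "a.png": {"Kita"},
    "b.png": {"Bocchi"},
    "c.png": {"Kita", "Bocchi"},
    "d.png": {"Kita", "Nijika"},
    "e.png": {"Nijika", "Ryou"},
    "f.png": {"Nijika", "Ryou", "Bocchi"},
}

ex_or = sorted(n for n, t in images.items() if exclusive_or_match(t, {"Kita", "Bocchi"}))
ex_and = sorted(n for n, t in images.items() if exclusive_and_match(t, {"Nijika", "Ryou"}))
print(ex_or)   # ['a.png', 'b.png', 'c.png']
print(ex_and)  # ['e.png']
```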

Alternatives

The naming "Ex. Or" and "Ex. And" could probably be improved, but I can't think of a better option right now (in my defense, I am writing this at 1 in the morning after a horrible night's sleep).

@coolesding coolesding added the Type: Enhancement New feature or request label Jul 11, 2024
@KillyMXI

KillyMXI commented Jul 12, 2024

This is related to #112 (as part of a larger discussion about search queries).
It is also useful as confirmation of user demand for more Boolean operators.

Exclusive OR is commonly written as XOR: https://en.wikipedia.org/wiki/Exclusive_or
There is no such thing as Exclusive AND, but there are operators such as XNOR and NAND.

Truth tables:

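A | B | AND | OR | XOR | XNOR | NAND | NOR
❌ | ❌ | ❌ | ❌ | ❌ | ✔ | ✔ | ✔
❌ | ✔ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌
✔ | ❌ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌
✔ | ✔ | ✔ | ✔ | ❌ | ✔ | ❌ | ❌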
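The truth table can be reproduced with a short Python sketch that evaluates each operator on every combination of A and B; the `OPERATORS` mapping and `mark` helper are just illustrative names.

```python
from itertools import product

# Each operator from the table as a function of two Boolean inputs.
OPERATORS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
    "XNOR": lambda a, b: a == b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
}


def mark(value: bool) -> str:
    # Render a truth value with the same symbols as the table.
    return "✔" if value else "❌"


print(" | ".join(["A", "B", *OPERATORS]))
# product() yields (False, False), (False, True), (True, False), (True, True).
for a, b in product([False, True], repeat=2):
    cells = [mark(a), mark(b)] + [mark(op(a, b)) for op in OPERATORS.values()]
    print(" | ".join(cells))
```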

@CyanVoxel CyanVoxel added the TagStudio: Search The TagStudio search engine label Jul 20, 2024
@samuellieberman
Contributor

samuellieberman commented Jul 20, 2024

This is related to #112 (as a part of larger discussion about search queries) Also useful as a confirmation of user demand for more Boolean operators.

Exclusive OR is commonly written as XOR: https://en.wikipedia.org/wiki/Exclusive_or There is no such thing as Exclusive AND, but there are such operators as XNOR, NAND.

Truth tables:
A | B | AND | OR | XOR | XNOR | NAND | NOR
❌ | ❌ | ❌ | ❌ | ❌ | ✔ | ✔ | ✔
❌ | ✔ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌
✔ | ❌ | ❌ | ✔ | ✔ | ❌ | ✔ | ❌
✔ | ✔ | ✔ | ✔ | ❌ | ✔ | ❌ | ❌

@KillyMXI, I don't believe that @coolesding was referring to exclusive OR, the Boolean operation. As I understand it, Coolesding was hoping to exclude entries from their searches without explicitly typing out the tags they want to exclude. The premise is that if the only tags Coolesding explicitly types out are Kita and Bocchi, then Coolesding doesn't want entries tagged Nijika or Ryou appearing in the search.
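The difference is easy to see on two sample images. In this hypothetical Python sketch, `boolean_xor` is the textbook XOR over the presence of two tags, while `ex_or` is one possible reading of the requested mode (the image's tags form a non-empty subset of the query); neither is an existing TagStudio function.

```python
def boolean_xor(file_tags: set[str], a: str, b: str) -> bool:
    # True when exactly one of the two tags is present; other tags are ignored.
    return (a in file_tags) != (b in file_tags)


def ex_or(file_tags: set[str], query: set[str]) -> bool:
    # True when the file has at least one queried tag and nothing else.
    return bool(file_tags) and file_tags <= query


both = {"Kita", "Bocchi"}
with_extra = {"Kita", "Nijika"}

print(boolean_xor(both, "Kita", "Bocchi"), ex_or(both, {"Kita", "Bocchi"}))  # False True
print(boolean_xor(with_extra, "Kita", "Bocchi"), ex_or(with_extra, {"Kita", "Bocchi"}))  # True False
```

The two disagree on exactly the cases the request is about: an image with both Kita and Bocchi, and an image with Kita plus another character.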

Personally, I strongly believe that Coolesding should be able to make the library work that way, but I really dislike the idea of adding features that only work if you only have a single category of tags. Coolesding ought to be able to use non-character tags without breaking the search. For example, if Coolesding adds a "text" tag to some of the entries, then there will be no way to include both entries with text and entries without text in an "Ex. And" search.
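A small sketch of that failure mode, assuming "Ex. And" is implemented as strict equality between an entry's whole tag set and the query (the function name is hypothetical):

```python
# Hypothetical "Ex. And": the entry's whole tag set must equal the query.
def exclusive_and_match(file_tags: set[str], query: set[str]) -> bool:
    return file_tags == query


query = {"Nijika", "Ryou"}
plain = {"Nijika", "Ryou"}
with_text = {"Nijika", "Ryou", "text"}

print(exclusive_and_match(plain, query))  # True
print(exclusive_and_match(with_text, query))  # False: "text" counts as an extra tag
# Adding "text" to the query just flips which entry is excluded.
print(exclusive_and_match(plain, query | {"text"}))  # False
print(exclusive_and_match(with_text, query | {"text"}))  # True
```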

The solution I would suggest is adding tags to each file for the number of characters, e.g. 1_character, 2_characters, 3_or_more_characters... This is actually something I do in my own library. Then Coolesding can perform the example "Ex. And" search with this:
Nijika Ryou 2_characters
Unfortunately, I can't think of a concise way of getting "Ex. Or" to work, even if full Boolean syntax is implemented. With the example given, at least, Boolean syntax would allow the "Ex. Or" search to be done with this:
( ( Kita OR Bocchi ) AND 1_character ) OR ( Kita AND Bocchi AND 2_characters )
But with three characters...
( ( Kita OR Bocchi OR Nijika ) AND 1_character ) OR ( Kita AND Bocchi AND 2_characters ) OR ( Kita AND Nijika AND 2_characters ) OR ( Bocchi AND Nijika AND 2_characters ) OR ( Kita AND Bocchi AND Nijika AND 3_characters )
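Because a hand-written query like this is easy to get wrong, one option is to brute-force it against the set definition. This sketch assumes one count tag per exact number of characters (1_character, 2_characters, 3_characters, 4_characters) rather than a 3_or_more_characters bucket, and reads the query with (Kita OR Bocchi OR Nijika) grouped before the AND; every helper name is hypothetical.

```python
from itertools import combinations

CHARACTERS = ["Kita", "Bocchi", "Nijika", "Ryou"]
QUERY = {"Kita", "Bocchi", "Nijika"}


def with_count_tag(chars: frozenset[str]) -> set[str]:
    # Hypothetical helper: add the manually maintained count tag to an entry.
    n = len(chars)
    return set(chars) | {"1_character" if n == 1 else f"{n}_characters"}


def three_character_ex_or(t: set[str]) -> bool:
    # The three-character "Ex. Or" query, with (Kita OR Bocchi OR Nijika) grouped.
    return (
        (("Kita" in t or "Bocchi" in t or "Nijika" in t) and "1_character" in t)
        or ("Kita" in t and "Bocchi" in t and "2_characters" in t)
        or ("Kita" in t and "Nijika" in t and "2_characters" in t)
        or ("Bocchi" in t and "Nijika" in t and "2_characters" in t)
        or ("Kita" in t and "Bocchi" in t and "Nijika" in t and "3_characters" in t)
    )


# Check every non-empty combination of characters against the set definition.
for k in range(1, len(CHARACTERS) + 1):
    for combo in combinations(CHARACTERS, k):
        chars = frozenset(combo)
        expected = chars <= QUERY  # non-empty because k >= 1
        assert three_character_ex_or(with_count_tag(chars)) == expected, combo
print("query matches the subset definition for all", 2 ** len(CHARACTERS) - 1, "combinations")
```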
Depending on the number of character tags, it would probably be easier to just manually exclude every other character, e.g.:
( Kita OR Bocchi OR Nijika ) AND NOT ( Ryou OR Gotou_Futari OR Gotou_Michiyo OR Gotou_Naoki OR [...] OR untagged_characters )
Or, depending on the number of entries in the library, it may be easier to just perform a preliminary search and ignore any entries that aren't relevant:
( Kita OR Bocchi OR Nijika ) AND ( 1_character OR 2_characters OR 3_characters )

Does anyone else have any thoughts on this issue?

@KillyMXI

Right, I misinterpreted the issue text because I had a strong and different interpretation of those terms in my head.

Booru-like systems such as Hydrus offer a few features that would allow a search query like this:

  • namespaces, such as character:
  • wildcards, such as character:*
  • system meta tags to limit the number of tags, such as system:number_of_tags=2

Limiting the number of tags within a certain namespace is tricky, though. 2_characters is definitely a working workaround, but it might get annoying to maintain. There are certain problems in boorus with those numbering tags...

If we ignore the possibility of other tags, the examples from the OP could look like this:

(Kita OR Bocchi) AND system:number_of_tags<=2
Nijika AND Ryou AND system:number_of_tags=2

But that's not a very practical assumption - it's natural to expect more tags besides characters.

Limiting the number of tags within a namespace or wildcard can be an interesting design challenge.
My first thought is that set theory might be helpful alongside Boolean algebra to describe this. I'll try to explain it later, along with some other relevant suggestions I had previously.

@samuellieberman
Contributor

samuellieberman commented Jul 23, 2024

That's really interesting @KillyMXI. CyanVoxel actually has tag categories as a planned feature: https://github.com/TagStudioDev/TagStudio/blob/main/doc/library/tag_categories.md I don't know how namespaces work in Hydrus, but the concept of tag categories may be similar.

Also, your first example of (Kita OR Bocchi) AND system:number_of_tags<=2 doesn't do exactly what Coolesding asked for, since an entry with Kita and Nijika would match that search as well. That's not a unique problem, though. If you search for black_clothes shirt in a different library, that will match entries with black shirts, but it will also match non-black shirts if there are black clothes elsewhere in the image. There isn't really a solution besides creating tags for every possible combination, doing hardcore Boolean reasoning, or just ignoring irrelevant entries with one's own mind.

@KillyMXI

KillyMXI commented Jul 23, 2024

Dang, I goofed twice in one thread...

So, within the same constraints, the first example can be fixed like this:
((Kita OR Bocchi) AND system:number_of_tags=1) OR ((Kita AND Bocchi) AND system:number_of_tags=2)
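Under the same "only character tags" assumption, `system:number_of_tags` can be modeled as the size of the file's tag set. This hypothetical Python sketch compares the original query with the corrected one, including the problematic Kita and Nijika entry:

```python
# Assumes every tag on a file is a character tag, as in the examples above;
# len(t) stands in for the hypothetical system:number_of_tags.
def original_query(t: set[str]) -> bool:
    # (Kita OR Bocchi) AND system:number_of_tags<=2
    return ("Kita" in t or "Bocchi" in t) and len(t) <= 2


def corrected_query(t: set[str]) -> bool:
    # ((Kita OR Bocchi) AND number_of_tags=1) OR ((Kita AND Bocchi) AND number_of_tags=2)
    return (("Kita" in t or "Bocchi" in t) and len(t) == 1) or (
        "Kita" in t and "Bocchi" in t and len(t) == 2
    )


for tags in [{"Kita"}, {"Kita", "Bocchi"}, {"Kita", "Nijika"}]:
    print(sorted(tags), original_query(tags), corrected_query(tags))
```

Only the {Kita, Nijika} entry separates the two: the original query accepts it, while the corrected one rejects it.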

I think this workaround creates a stronger case for set theory.
I'm not aware of it being used this way elsewhere, so it might become a strong competitive advantage for TagStudio. But it also means low familiarity and the need to invent a syntax for it.

The OP examples can be formulated as follows:
the set of file tags is a subset of {Kita, Bocchi}
the set of file tags is equal to {Nijika, Ryou}

This can then be improved by limiting to character tags:
the set of file tags in character namespace is a subset of {character:Kita, character:Bocchi}
the set of file tags in character namespace is equal to {character:Nijika, character:Ryou}

To make this possible, a few features are needed:

  • being able to define sets of literal values (such as {Kita, Bocchi} or {1, 2})
  • being able to define sets of queried values (such as a set of "all file tags" or "tags in a wildcard or a namespace" or satisfying any other predicate)
  • being able to use set operations (such as "is a subset/superset" (⊆, ⊇), "is a proper subset/superset" (⊂, ⊃), equivalence, union of sets (⋃), intersection of sets (⋂), difference of sets (\), symmetric difference (△), size of set)
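As a reference for the intended semantics (not for the query syntax), Python's built-in `set` type already implements every operation on this list; the sketch below simply maps each one to its operator on sample sets.

```python
A = {"Kita", "Bocchi"}
B = {"Kita", "Bocchi", "Nijika"}
C = {"Nijika", "Ryou"}

checks = {
    "A ⊆ B": A <= B,  # subset
    "B ⊇ A": B >= A,  # superset
    "A ⊂ B": A < B,   # proper (strict) subset
    "B ⊃ A": B > A,   # proper (strict) superset
    "A = B": A == B,  # equivalence
}
results = {
    "A ∪ C": A | C,   # union
    "B ∩ C": B & C,   # intersection
    "B \\ A": B - A,  # difference
    "B △ C": B ^ C,   # symmetric difference
}

for name, value in checks.items():
    print(name, value)
for name, value in results.items():
    print(name, sorted(value))
print("|B| =", len(B))  # size of set
```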

What the syntax could look like:

  • curly braces are universal syntax for sets
  • literals: {Kita, Bocchi} or {Kita Bocchi}
    • avoiding punctuation might be desirable, but raises the concern about allowing operator-less syntax elsewhere - AND/OR interpretation is kind of murky
  • empty set: {}
  • set symbols need plain English words or equivalent Boolean operators instead, because the mathematical symbols are not present on a common keyboard
    • "{A} is a subset of {B}" = {A} in {B}
    • "{A} is a superset of {B}" = {A} includes {B} or {A} contains {B}
    • "{A} equals {B}" = {A} = {B} or {A} is {B}
    • proper (strict) subset/superset can probably be ignored as less useful in practice
    • union is equivalent to OR operation but applied to sets
    • intersection is equivalent to AND operation but applied to sets
    • difference is equivalent to subtraction, except we don't have a subtraction operator; the closest Boolean equivalent is the combination {A} and not {B}, which we can probably live with
    • symmetric difference is equivalent to XOR operation but applied to sets
    • note: if one of the operands is a literal rather than a set, it can be made into a set (lifted) implicitly
      • need to check for possible implications of this; there might be pros and cons
  • syntax for set size might be tricky
    • (here, num_op is any of supported numeric comparators)
    • function-like syntax: size({A}) num_op 2 - would work better for prefix grammar, not so natural for infix grammar, but more function-like syntax may appear later for other features
    • property-like syntax: {A}.size num_op 2 or {A}:size num_op 2 - not like anything on the table for the grammar, so creates many questions
    • extending on reserved keywords: size_of:{A} num_op 2 - I don't like it, I'd prefer any reserved keywords be gated after their own namespace like in Hydrus, but this might look not so bad with other current proposals
    • implicit: {A} num_op 2 - no new syntax, the cleanest option but can be somewhat obscure; makes it impossible to do set operations with size
  • queried sets might be tricky
    • {character:*} - wildcard inside curly braces is what comes to mind first
    • {character:*, Ryou} - can potentially mix with literals
    • {all_tags} or {tag:*} or something else - not sure how to go about this, depends on other considerations that are outside the scope of this issue

Our examples may look like this:
{all_tags} in {Kita, Bocchi}
{all_tags} = {Nijika, Ryou}
{character:*} in {character:Kita, character:Bocchi}
{character:*} is {character:Nijika, character:Ryou}
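A rough model of how these example queries could be evaluated, assuming a queried set such as `{character:*}` is the subset of a file's tags that match a glob pattern and that namespaced tags are plain `namespace:name` strings. `queried_set` is hypothetical, and `fnmatchcase` from Python's standard library stands in for whatever wildcard matching TagStudio would actually use:

```python
from fnmatch import fnmatchcase


def queried_set(file_tags: set[str], pattern: str) -> set[str]:
    # Hypothetical {pattern} set: the file's tags that match a glob such as "character:*".
    return {tag for tag in file_tags if fnmatchcase(tag, pattern)}


file_tags = {"character:Nijika", "character:Ryou", "text"}
all_tags = queried_set(file_tags, "*")  # stands in for {all_tags}
characters = queried_set(file_tags, "character:*")  # {character:*}

print(all_tags <= {"Kita", "Bocchi"})  # {all_tags} in {Kita, Bocchi} -> False
print(all_tags == {"Nijika", "Ryou"})  # {all_tags} = {Nijika, Ryou} -> False
print(characters <= {"character:Kita", "character:Bocchi"})  # -> False
print(characters == {"character:Nijika", "character:Ryou"})  # -> True
```

The whole-set comparisons fail because of the extra `text` tag, while the namespace-limited `{character:*} is {...}` query still matches, which addresses the earlier objection about non-character tags.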

And I overlooked one more thing:
The empty set (no tags) is a subset of every other set, but that is often not practical.
Here, it means the queries above will also match files without tags.
This can be fixed in the query like this:

{all_tags} in {Kita, Bocchi} and {all_tags} != {}
{character:*} in {character:Kita, character:Bocchi} and {character:*} != {}

But this will be a common inconvenience.
Empty set might be handy in different situations, and prohibiting it also makes the system unsound, so I don't think it is an option.
Instead, it might be practical to introduce some kind of shorthand for non-emptiness of a queried set.

The definitions of proper (strict) subset/superset do not fit this issue exactly; they work at the wrong end of it.

What is needed are variations on subset/superset operator:

  • "{A} is a non-empty subset of {B}"
  • "{A} is a superset of non-empty {B}"

I asked ChatGPT whether there is a common notation for this; there seems to be none, and ChatGPT suggests introducing a custom notation, so:

{all_tags} in! {Kita, Bocchi}
{character:*} in! {character:Kita, character:Bocchi}

This is probably the most unambiguous way to introduce the non-emptiness clause at the right place.
I have no idea which single English words could be used instead while keeping the distinction clear.
This assumes there is no conflict with proper (strict) subset/superset. Even if they are not needed, it may be worth thinking about how they might be distinguished. Maybe p_in, p_includes, or using different suffix symbols for non-emptiness and strictness.
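The proposed `in!` operator and its superset counterpart can be expressed as ordinary subset checks with an added non-emptiness condition on the appropriate side. Both function names below are hypothetical stand-ins for the suggested notation:

```python
def in_nonempty(a: set[str], b: set[str]) -> bool:
    # Hypothetical "in!": A is a non-empty subset of B.
    return bool(a) and a <= b


def includes_nonempty(a: set[str], b: set[str]) -> bool:
    # Hypothetical counterpart: A is a superset of a non-empty B.
    return bool(b) and a >= b


query = {"Kita", "Bocchi"}
print(set() <= query)  # True: plain "in" also matches untagged files
print(in_nonempty(set(), query))  # False: untagged files are excluded
print(in_nonempty({"Kita"}, query))  # True
print(in_nonempty({"Kita", "Ryou"}, query))  # False: Ryou is outside the query
print(includes_nonempty({"Kita", "Ryou"}, set()))  # False: empty right-hand side
```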

I'm not really considering {A} < {B}, {A} <= {B}, {A} >= {B}, and {A} > {B}, since it might be confusing what is being compared. Users would more likely expect a size comparison, so we can't repurpose the same symbols.

Attaching the non-emptiness condition to the queried set rather than to the operator would create different problems; it doesn't behave well there.


I can't comment on Tag Categories. A one-sentence description doesn't give me enough understanding, since I'm not currently an active user of TagStudio.

Projects
None yet
Development

No branches or pull requests

4 participants