Replies: 2 comments
-
hey Matt - this is a really good question. The switches come with a bunch of clues defining neighbour words/tags to tip the scales. We'd need to define some way to curate those. The good news is that you can produce the same effect with a series of match/tag statements. I would do something like this, for now:

// collect all the ambiguous terms
let ambig = doc.match('(lavender|ginger)')
// lookahead/behind for whatever clues
if (ambig.before('(fresh|diced|#Plant)$').found) { // e.g. 'diced lavender'
  ambig.tag('Plant')
} else {
  // ...
}

clever idea to also use the Noun/Adjective tags - 'a lavender couch' should be an adjective, and '2 tbs of lavender' should be a noun, but who knows. haha. happy to go back/forth on this, sounds like a fun problem
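
The neighbour-clue idea in this reply can be sketched in plain JavaScript, without the library, to show the per-word decision it implies. This is only an illustration of the logic and not compromise's API: the function names `tokenize` and `disambiguate`, the clue lists, and the 'Unknown' fallback are all hypothetical, and the clue words are just the ones mentioned in this thread.

```javascript
// Hypothetical clue lists for illustration; curate these for your own data.
const AMBIGUOUS = new Set(['lavender', 'ginger', 'lemon']);
const PLANT_CLUES_BEFORE = new Set(['fresh', 'diced']); // "fresh ginger"
const COLOR_CLUES_AFTER = new Set(['mums', 'couch']);   // "lavender mums"

// Split a sentence into lowercase words, dropping punctuation and digits.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Decide Plant vs Color for each ambiguous word from its immediate neighbours.
function disambiguate(text) {
  const words = tokenize(text);
  const result = {};
  words.forEach((word, i) => {
    if (!AMBIGUOUS.has(word)) return;
    const prev = words[i - 1];
    const next = words[i + 1];
    if (PLANT_CLUES_BEFORE.has(prev)) {
      result[word] = 'Plant';
    } else if (COLOR_CLUES_AFTER.has(next)) {
      result[word] = 'Color';
    } else {
      result[word] = 'Unknown'; // no clue matched; leave it for a later rule
    }
  });
  return result;
}

const tags = disambiguate(
  'This bouquet includes fresh ginger, lavender mums, and tulips.'
);
console.log(tags);
```

For the bouquet sentence, this sketch marks "ginger" as a Plant (preceded by "fresh") and "lavender" as a Color (followed by "mums"). Each ambiguous word is judged on its own neighbours, so one word's clue does not decide the tag for another.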
-
Hey Spencer, thanks for replying! I think my understanding of applying multiple tags to a word may have been wrong, which led to part of my confusion. When applying multiple tags to a word, it seems like they should be complementary (i.e. ['Noun', 'Singular']) as opposed to something like ['Noun', 'Adjective']. Is that correct? In stepping through the code, I see that I thought the "Adj|Noun" switch was a little more magical than it currently is, hence "lavender" switching POS above when the tags were defined in a different order. I'll implement your suggestion and see how it goes. Thanks again!
-
Hi,
I'm working to extract plants and colors from sentences. There are situations where a color is named after a plant, such as lavender, ginger, lemon, etc., and I am trying to tell the two senses apart. I've dug through the code, and it seems like switches are what I'm looking for, particularly "Adj|Noun". When I apply that to the words that can be colors or plants, the resulting POS tag is correct most of the time. The problem I am running into is that I have created custom tags and applied them to these words, which I have added to the lexicon.
Here's an example:
If I run this against the following sentence:
This bouquet includes fresh ginger, lavender mums, and tulips.
"ginger" is tagged as:
[ "Noun", "Singular" ]
"lavender" is tagged as:
[ "Adjective", "Color", "Purple" ]
The base POS tags (noun/adjective) are absolutely correct, and that's awesome. The problem is that when Noun was detected for "ginger", the tags "Ginger" and "Plant" were not applied as well, whereas "lavender" did get all of its correct tags.
Now, if I reverse the order of tags defined for the words and run this against the same sentence:
"ginger" is tagged as:
[ "Noun", "Plant", "Ginger", "Singular" ]
"lavender" is tagged as:
[ "Noun", "Plant", "Lavender", "Singular" ]
"ginger" has all the correct tags, but now "lavender" is detected as a noun.
It seems like that's because when setTag encounters multiple tags for a word, the last one wins and the others are lost: https://github.com/spencermountain/compromise/blob/master/src/1-one/tag/methods/setTag.js#L93
Is there some other way I can handle what I'm doing in order to get compromise to know that a word can be different tags based on the context?