Handling words that can be different POS #958

msjonker · 2022-10-01T13:07:05Z

Hi,

I'm working on extracting plants and colors from sentences. Some colors are named after plants (lavender, ginger, lemon, etc.), and I'm trying to tell the two senses apart. I've dug through the code, and it seems like switches are what I'm looking for, particularly "Adj|Noun". When I apply that switch to words that can be either a color or a plant, the resulting POS tag is correct most of the time. The problem is that I have also created custom POS tags and applied them to these words in the lexicon.

Here's an example:

const tags = {
  Orange: {
    isA: 'Color'
  },
  Purple: {
    isA: 'Color'
  },
  Color: {
    isA: 'Adjective',
  },
  ...
  Lavender: {
    isA: 'Plant'
  },
  Ginger: {
    isA: 'Plant'
  },
  Plant: {
    isA: 'Noun',
  }
}

const words = {
  lavender: ['Lavender', 'Purple'],
  ginger: ['Ginger', 'Orange'],
}

const model = {
  two: {
    switches: {
      lavender: 'Adj|Noun',
      ginger: 'Adj|Noun'
    }
  }
}

nlp.plugin({
  tags,
  words,
  model
})

If I run this against the following sentence:
This bouquet includes fresh ginger, lavender mums, and tulips.

"ginger" is tagged as: [ "Noun", "Singular" ]

"lavender" is tagged as: [ "Adjective", "Color", "Purple" ]

The base POS tags (Noun/Adjective) are absolutely correct, and that's awesome. The problem is that when "ginger" was detected as a Noun, it did not also receive the "Ginger" and "Plant" tags, whereas "lavender" received all the correct tags.

Now, if I reverse the order of tags defined for the words and run this against the same sentence:

const words = {
  ginger: ['Orange', 'Ginger'],
  lavender: ['Purple', 'Lavender']
}

"ginger" is tagged as: [ "Noun", "Plant", "Ginger", "Singular" ]

"lavender" is tagged as: [ "Noun", "Plant", "Lavender", "Singular" ]

"ginger" has all the correct tags, but now "lavender" is detected as a noun.

My guess is that when setTag encounters multiple tags for a word, the last one wins and the others are lost: https://github.com/spencermountain/compromise/blob/master/src/1-one/tag/methods/setTag.js#L93

Is there some other way to get compromise to recognize that a word can take different tags depending on context?

spencermountain (Maintainer) · 2022-10-03T13:07:14Z

hey Matt - this is a really good question.
Yeah - good catch with the internal lexicon-switch concept. It began as a small feature this winter and has become very useful, and I'd love to push it into the public API somehow.

The switches come with a bunch of clues defining neighbour words/tags to tip the scales. We'd need to define some way to curate those.

The good news is that you can produce the same effect with a series of match/tag statements. I would do something like this, for now:

//collect all the ambiguous terms
let ambig = doc.match('(lavender|ginger)')
// lookahead/behind for whatever clues
if( ambig.before('(fresh|diced|#Plant)$').found ){ //diced lavender
  ambig.tag('Plant')
} else {
  // ...
}

clever idea to also use the Noun/Adjective tags - 'a lavender couch' should be an adjective, and '2 tbs of lavender' should be a noun, but who knows. haha.
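
The 'a lavender couch' versus '2 tbs of lavender' intuition can be sketched without compromise at all. The following is a minimal plain-JavaScript illustration, not compromise's switch logic: the function name guessPos, the clue lists, and the tiny noun list are all hypothetical and chosen only for this example.

```javascript
// Hypothetical word lists, for illustration only (not taken from compromise).
const AMBIGUOUS = new Set(['lavender', 'ginger', 'lemon'])
const NOUN_CLUES_BEFORE = new Set(['of', 'fresh', 'diced'])
const KNOWN_NOUNS = new Set(['couch', 'mums', 'dress', 'wall'])

// Split a sentence into lowercase words. Punctuation is dropped,
// so a comma between two words is ignored by this sketch.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || []

// Guess 'Noun' or 'Adjective' for each ambiguous word from its neighbours.
function guessPos(text) {
  const words = tokenize(text)
  const guesses = {}
  words.forEach((word, i) => {
    if (!AMBIGUOUS.has(word)) return
    const prev = words[i - 1]
    const next = words[i + 1]
    if (NOUN_CLUES_BEFORE.has(prev)) {
      guesses[word] = 'Noun' // '2 tbs of lavender', 'fresh ginger'
    } else if (KNOWN_NOUNS.has(next)) {
      guesses[word] = 'Adjective' // 'a lavender couch'
    } else {
      guesses[word] = 'Unknown' // no clue fired
    }
  })
  return guesses
}

console.log(guessPos('a lavender couch')) // { lavender: 'Adjective' }
console.log(guessPos('2 tbs of lavender')) // { lavender: 'Noun' }
console.log(guessPos('This bouquet includes fresh ginger, lavender mums, and tulips.'))
// { ginger: 'Noun', lavender: 'Adjective' }
```

The rule order matters here: a phrase like 'fresh lavender mums' would be guessed as a noun, which is the kind of edge case a curated set of clues would need to settle.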

happy to go back/forth on this, sounds like a fun problem
cheers

msjonker (Author) · 2022-10-04T10:21:50Z

Hey Spencer,

Thanks for replying!

I think my understanding of applying multiple tags to a word may have been wrong, which led to part of my confusion. When applying multiple tags to a word, it seems they should be complementary (e.g. ['Noun', 'Singular']) rather than conflicting (e.g. ['Noun', 'Adjective']). Is that correct?
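
One way to picture the difference is a toy tag set in plain JavaScript. This is a sketch under a stated assumption (Noun and Adjective are mutually exclusive, so adding one evicts the other branch), not compromise's actual setTag code, and it ignores the "Adj|Noun" switch entirely. The names parents, conflicts, lineage and applyTags are hypothetical.

```javascript
// Toy tag tree mirroring the custom tags from this thread.
const parents = {
  Lavender: 'Plant', Ginger: 'Plant', Plant: 'Noun', Singular: 'Noun',
  Purple: 'Color', Orange: 'Color', Color: 'Adjective',
}
// Assumption for this sketch: these two root tags cannot coexist.
const conflicts = { Noun: 'Adjective', Adjective: 'Noun' }

// Walk up the tree: 'Lavender' -> ['Lavender', 'Plant', 'Noun'].
function lineage(tag) {
  const chain = []
  for (let t = tag; t; t = parents[t]) chain.push(t)
  return chain
}

// The last element of a lineage is its root tag.
const rootOf = (tag) => lineage(tag).pop()

// Apply tags in order; a tag whose root conflicts with tags
// already present evicts that whole branch first.
function applyTags(tagList) {
  let current = new Set()
  for (const tag of tagList) {
    const enemy = conflicts[rootOf(tag)]
    if (enemy) {
      current = new Set([...current].filter((t) => rootOf(t) !== enemy))
    }
    lineage(tag).forEach((t) => current.add(t))
  }
  return [...current].sort()
}

console.log(applyTags(['Noun', 'Singular'])) // [ 'Noun', 'Singular' ]
console.log(applyTags(['Lavender', 'Purple'])) // [ 'Adjective', 'Color', 'Purple' ]
console.log(applyTags(['Purple', 'Lavender'])) // [ 'Lavender', 'Noun', 'Plant' ]
```

In this toy model, complementary tags accumulate, while conflicting ones behave as "last one wins", which is consistent with the order-dependent results for "lavender" reported earlier in the thread.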

In stepping through the code, I see that I assumed the "Adj|Noun" switch was a little more magical than it currently is, which would explain why "lavender" switched POS above when its tags were defined in a different order.

I'll implement your suggestion and see how it goes.

Thanks again!
