This repository has been archived by the owner on May 30, 2019. It is now read-only.

add a confidence key #228

Open
gnardari opened this issue Mar 10, 2017 · 4 comments

Comments

@gnardari
Contributor

I'm trying to add a confidence key to duckling's output.

{:dim :number, :body "43", :value {:type "value", :value 43}, :start 0, :end 2, :confidence 1.0}

I'm using Math/exp to convert the log probability to a plain probability:

https://github.com/gnardari/duckling/blob/confidence/src/duckling/engine.clj#L236

I'm trying to normalize the probability with

P(d|c) / P(d), where
P(d) = P(d|c1) + P(d|c2) + ... + P(d|cn) and
P(d|c) = P(x1|c) . P(x2|c) ... P(xn|c) . P(c)

https://github.com/gnardari/duckling/blob/confidence/src/duckling/ml/naivebayes.clj#L21

but I couldn't get it right. Maybe someone here could help me.
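For reference, the normalization described above can be sketched in a few lines of Python. This is only an illustration, not the code in naivebayes.clj: the function name `posterior_from_log_scores` and the class labels are hypothetical, and it assumes each class score is the log of P(x1|c) . P(x2|c) ... P(xn|c) . P(c). It divides by the sum over classes in log space (the log-sum-exp trick) so that very negative scores do not underflow.

```python
import math

def posterior_from_log_scores(log_scores):
    """Normalize per-class log scores into probabilities that sum to 1.

    log_scores maps a class label to
    log P(x1|c) + ... + log P(xn|c) + log P(c)  (hypothetical input shape).
    """
    # Subtract the maximum before exponentiating so exp() never underflows
    # for the largest term (log-sum-exp trick).
    m = max(log_scores.values())
    log_total = m + math.log(sum(math.exp(v - m) for v in log_scores.values()))
    return {c: math.exp(v - log_total) for c, v in log_scores.items()}

# Hypothetical scores that would both underflow to 0.0 under a plain exp().
posterior = posterior_from_log_scores({"c1": -1000.0, "c2": -1001.0})
```

Note that the result is a confidence over the classes of a single classifier; it says nothing about how one parse compares with a different parse.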

@justinasvd
Contributor

You don't need to add a confidence key. It's enough simply not to remove the log-prob key that is already there (see select-winners in core.clj).

Besides, log-prob is numerically much nicer than plain probabilities, because products of probabilities like P(d|c) = P(x1|c) . P(x2|c) ... P(xn|c) . P(c) can underflow rather quickly.
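A minimal Python sketch of the underflow point, using made-up likelihoods (80 features, each with probability 1e-5; these numbers are assumptions for illustration, not taken from Duckling). The plain product collapses to exactly zero in double precision, while the sum of logs stays an ordinary finite number.

```python
import math

# Hypothetical per-feature likelihoods P(xi|c); values chosen only to
# make the underflow visible.
likelihoods = [1e-5] * 80

# Multiplying plain probabilities: the true value is 1e-400, which is
# below the smallest representable double, so the result becomes 0.0.
product = 1.0
for p in likelihoods:
    product *= p

# Adding log probabilities: the same quantity, kept in log space.
log_prob = sum(math.log(p) for p in likelihoods)
```

Once the product has become 0.0, every class looks equally impossible, whereas the log-space value can still be compared between classes.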

@gnardari
Contributor Author

I see your point, but IMO log probabilities are not as intuitive for API users as a value in [0,1].

@justinasvd
Contributor

justinasvd commented Mar 12, 2017

I think that those API users who would care about confidence and -- more importantly! -- know how to use it correctly would prefer plain log-prob.

Consider the sentence "Wake me up at five am tomorrow". Duckling yields these parses:

  1. number ("five"), log-prob: -0.14
  2. distance ("five"), log-prob: -2.31
  3. volume ("five"), log-prob: -2.20
  4. temperature ("five"), log-prob: -2.29
  5. time ("at five am tomorrow"), log-prob: -18.26

If you selected a parse simply by max(log-prob), or even by max(exp(log-prob)), you would have to conclude that the winning parse is #1 and that the user wants a number. This is certainly incorrect. So instead of using the log-prob of a whole parse, you would probably want another metric: some kind of measure of confidence per character. For instance, log-prob * exp(-(end - start)) favors longer parses: log-prob is negative, and the exponential factor pulls the score of a longer span closer to zero, so parse #5 would win.
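The comparison above can be checked with a short Python sketch. The log-prob values are the ones listed; the character offsets are not given in the thread, so they are derived here from the sentence itself, assuming :start is inclusive and :end is exclusive. The names `span` and `length_weighted` are hypothetical helpers, not part of Duckling.

```python
import math

sentence = "Wake me up at five am tomorrow"

# (dimension, matched text, log-prob) as listed in the example above.
parses = [
    ("number", "five", -0.14),
    ("distance", "five", -2.31),
    ("volume", "five", -2.20),
    ("temperature", "five", -2.29),
    ("time", "at five am tomorrow", -18.26),
]

def span(text):
    """Return (start, end) character offsets of text in the sentence."""
    start = sentence.index(text)
    return start, start + len(text)

def length_weighted(log_prob, start, end):
    """The suggested metric: log-prob * exp(-(end - start))."""
    return log_prob * math.exp(-(end - start))

# Winner by raw log-prob versus winner by the length-weighted metric.
by_log_prob = max(parses, key=lambda p: p[2])
by_weighted = max(parses, key=lambda p: length_weighted(p[2], *span(p[1])))
```

With these inputs, `by_log_prob` is the number parse and `by_weighted` is the time parse, which matches the argument that the raw log-prob alone picks the wrong winner.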

Moreover, anyone who wants to compute confidence would also have to take into account whether the parse is latent or not. More likely than not, you would want to disfavor latent parses.

Summa summarum: I don't think you can appease everyone by adding a :confidence key, so it would be best not to do it. A more constructive and more general approach would be to expose :log-prob and let people do whatever they want with it.

@gnardari
Contributor Author

gnardari commented Mar 18, 2017

Thanks for your feedback. I decided to use something close to what you suggested on my fork. Maybe someone from wit can comment on this issue, since it looks like the version of Duckling they run in production has a confidence key in the [0,1] interval.
