add a confidence key #228

gnardari · 2017-03-10T19:29:31Z

I'm trying to add a confidence key to duckling's output.

{:dim :number, :body "43", :value {:type "value", :value 43}, :start 0, :end 2, :confidence 1.0}

I'm using Math/exp to convert the log probability to a plain probability:

https://github.com/gnardari/duckling/blob/confidence/src/duckling/engine.clj#L236

I'm trying to normalize the probability with

P(d|c) / P(d), where
P(d) = P(d|c1) + P(d|c2) + ... + P(d|cn) and
P(d|c) = P(x1|c) . P(x2|c) ... P(xn|c) . P(c)

https://github.com/gnardari/duckling/blob/confidence/src/duckling/ml/naivebayes.clj#L21

but I couldn't get it right. Maybe someone here could help me.
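
One numerically stable way to do the normalization described above is to stay in log space and subtract the log of the summed scores (the log-sum-exp trick) before exponentiating. The following is a minimal sketch, assuming the per-class log scores log P(d|c) + log P(c) are already available. The function name and the example scores are hypothetical and are not taken from Duckling's naivebayes.clj.

```python
import math

def normalize_log_scores(log_scores):
    """Turn per-class joint log scores into posteriors that sum to 1."""
    # Subtract the maximum before exponentiating so exp() cannot underflow
    # for every class at once.
    m = max(log_scores.values())
    log_total = m + math.log(sum(math.exp(s - m) for s in log_scores.values()))
    return {c: math.exp(s - log_total) for c, s in log_scores.items()}

# Hypothetical log scores for two classes.
posteriors = normalize_log_scores({"true": -3.2, "false": -5.1})
```

The resulting values lie in [0,1] and sum to 1 across classes, even when the raw log scores are far too negative to exponentiate directly.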

justinasvd · 2017-03-10T23:05:16Z

You don't need to add the confidence key. It is enough simply not to remove the log-prob key that is already there (see select-winners in core.clj).

Besides, log-prob is numerically much nicer than plain probabilities, because products of probabilities like P(d|c) = P(x1|c) . P(x2|c) ... P(xn|c) . P(c) can underflow rather quickly.
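
The underflow is easy to reproduce. The sketch below uses made-up per-feature likelihoods (200 factors of 0.01, not Duckling data) to compare the direct product with the sum of logs in 64-bit floating point.

```python
import math

# Hypothetical per-feature likelihoods P(x_i|c); the values are illustrative.
probs = [0.01] * 200

# The true product is 1e-400, below the smallest positive float64,
# so the running product collapses to exactly 0.0.
product = math.prod(probs)

# The same quantity in log space stays a perfectly ordinary number.
log_sum = sum(math.log(p) for p in probs)
```

Once the product has collapsed to zero, all classes look equally impossible, whereas the log-space sums can still be compared.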

gnardari · 2017-03-11T12:32:51Z

I see your point, but log probabilities are not as intuitive for API users as [0,1] IMO.

justinasvd · 2017-03-12T09:53:23Z

I think that those API users who would care about confidence and -- more importantly! -- know how to use it correctly would prefer plain log-prob.

Consider a sentence "Wake me up at five am tomorrow". Duckling yields these parses:

1. number ("five"), log-prob: -0.14
2. distance ("five"), log-prob: -2.31
3. volume ("five"), log-prob: -2.20
4. temperature ("five"), log-prob: -2.29
5. time ("at five am tomorrow"), log-prob: -18.26

If you selected a parse simply by max(log-prob) or even max(exp(log-prob)), you would have to say that the winning parse is #1, and that the user wants to see a number. Certainly, this is incorrect. So instead of using log-prob of a whole parse, you would probably want to use another metric: some kind of measure of confidence per character. For instance, log-prob * exp(-(end - start)) would be a metric that would favor longer parses, and the parse #5 would then be winning.
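
The effect of that metric on the five parses above can be checked with a short sketch. The log-prob values are the ones quoted in this comment; the dictionary layout, the helper names, and the start/end offsets (computed here from the position of each body in the sentence) are assumptions for illustration and not Duckling's actual output format.

```python
import math

sentence = "Wake me up at five am tomorrow"

def span(body):
    """Character offsets of body within the sentence (end exclusive)."""
    start = sentence.index(body)
    return start, start + len(body)

parses = [
    {"dim": "number", "body": "five", "log_prob": -0.14},
    {"dim": "distance", "body": "five", "log_prob": -2.31},
    {"dim": "volume", "body": "five", "log_prob": -2.20},
    {"dim": "temperature", "body": "five", "log_prob": -2.29},
    {"dim": "time", "body": "at five am tomorrow", "log_prob": -18.26},
]
for p in parses:
    p["start"], p["end"] = span(p["body"])

def length_weighted(p):
    # log-prob * exp(-(end - start)): log-prob is negative, so longer
    # spans shrink the penalty towards zero.
    return p["log_prob"] * math.exp(-(p["end"] - p["start"]))

by_log_prob = max(parses, key=lambda p: p["log_prob"])
by_metric = max(parses, key=length_weighted)
```

Here by_log_prob is the number parse, while by_metric is the time parse, since its 19-character span outweighs its much lower log-prob.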

Moreover, anyone who wants to compute confidence would also have to take into account whether the parse is latent or not. More likely than not, you would want to disfavor latent parses.

Summa summarum: I don't think that you can please everyone by adding a :confidence key, so it would be best not to do it. A more constructive and more general approach would be to expose :log-prob and let people do whatever they want with it.

gnardari · 2017-03-18T18:23:44Z

Thanks for your feedback. I decided to use something close to what you suggested on my fork. Maybe someone from wit can comment on this issue, since it looks like their version of Duckling running in production has a confidence key in the [0,1] interval.
