
Chord decoder confidence #431

Open
hpx7 opened this issue May 13, 2019 · 7 comments

hpx7 commented May 13, 2019

The CRFChordRecognitionProcessor uses the Viterbi algorithm to compute the most likely chord sequence at the frame level. I'm wondering if there's a way, during decoding, to obtain per-frame confidence levels for the possible chords.

In particular, is there an easy way to obtain marginal probabilities per chord label from the CNNChordFeatureProcessor feature vectors? From a bit of research it looks like the forward-backward algorithm may be useful for this, but I'm not exactly sure how we would apply it here.

hpx7 commented May 25, 2019

So I guess we can achieve this in a frame-local way by using the learned emission probabilities in the CRF and applying a softmax function:

from madmom.models import CHORDS_CFCRF
from madmom.ml.crf import ConditionalRandomField
from madmom.features import CNNChordFeatureProcessor
import numpy as np

def softmax(x):
    # shift by the maximum for numerical stability before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

# load the pre-trained CRF and extract chord features from the audio file
crf = ConditionalRandomField.load(CHORDS_CFCRF[0])
feats = CNNChordFeatureProcessor()('audio-file.wav')

# per-frame chord "probabilities" from the observation potentials alone
probabilities = [softmax(np.dot(feat, crf.W)) for feat in feats]

However, if we wanted to take the transition probabilities etc. into account, we would probably need to apply the forward-backward algorithm. Is this something that could be added to the ConditionalRandomField class?

superbock (Collaborator) commented

Ping @fdlm since he is the chord/CRF guy.

fdlm commented May 27, 2019

Yes, this is correct. If you want to consider transition probabilities, you need the forward-backward algorithm, and yes, it could be added to the ConditionalRandomField class, if anyone is willing to implement it 😃

hpx7 commented May 28, 2019

I took a stab at a partial implementation (master...hpx7:patch-1), but I'm stuck on a few things I haven't been able to figure out yet:

  1. I'm not sure how to properly apply the bias and final potentials.
  2. I couldn't get it to return reasonable results until I realized that the transition probabilities in the CRF (self.A) were not normalized. After applying a softmax it started behaving more normally. Am I doing something wrong by normalizing here? Maybe this is a difference between HMMs and CRFs that I'm missing?
  3. The algorithm appears to suffer from underflow problems for long sequences. Is there something you did in the Viterbi implementation to avoid this? (I noticed the use of addition instead of multiplication, which was confusing to me; is that related somehow?)

Sorry for all the questions; I'm posting them as I work through all of this, since it's a new area for me.

fdlm commented Jun 15, 2019

  1. Add the bias at every step, and add the final potential at the last step.
  2. self.A is not a transition probability matrix, but a transition potential matrix. Potentials are not normalized in any way. The model behaving "more normally" might be a coincidence, but there is probably another bug in the code.
  3. Yes, it does. When computing the Viterbi path, you can operate in log-space (hence + and not *). For the forward/backward algorithm, you usually normalize at every step and keep track of the normalization constant to get the correct final answer in the end.

You can check out my CRF code for Theano here: https://github.com/fdlm/Spaghetti/blob/master/spaghetti/layers.py (_get_forward_output_for)
Maybe it helps you.
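
For reference, here is a minimal NumPy sketch of such a forward-backward pass over the madmom CRF parameters, computing per-frame chord marginals. It works in log-space with logsumexp rather than rescaling at every step, and it assumes that pi (initial), tau (final), c (bias), A (transition) and W (observation weights) are all log-potentials, with A[i, j] the potential for moving from state i to state j. The chord_marginals helper is hypothetical and untested, so treat it as an outline rather than a verified implementation:

import numpy as np
from scipy.special import logsumexp

from madmom.models import CHORDS_CFCRF
from madmom.ml.crf import ConditionalRandomField
from madmom.features import CNNChordFeatureProcessor

def chord_marginals(crf, feats):
    # observation log-potentials with the bias added at every step, shape (T, S)
    obs = np.dot(feats, crf.W) + crf.c
    num_frames, num_states = obs.shape
    log_alpha = np.empty((num_frames, num_states))
    log_beta = np.empty((num_frames, num_states))
    # forward pass: sum over previous states in log-space
    log_alpha[0] = crf.pi + obs[0]
    for t in range(1, num_frames):
        log_alpha[t] = obs[t] + logsumexp(
            log_alpha[t - 1][:, np.newaxis] + crf.A, axis=0)
    # backward pass: the final potential enters at the last frame
    log_beta[-1] = crf.tau
    for t in range(num_frames - 2, -1, -1):
        log_beta[t] = logsumexp(crf.A + obs[t + 1] + log_beta[t + 1], axis=1)
    # combine forward and backward messages and normalise per frame
    log_gamma = log_alpha + log_beta
    log_gamma -= logsumexp(log_gamma, axis=1, keepdims=True)
    return np.exp(log_gamma)

crf = ConditionalRandomField.load(CHORDS_CFCRF[0])
feats = CNNChordFeatureProcessor()('audio-file.wav')
marginals = chord_marginals(crf, feats)  # shape: (num_frames, num_states)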

hpx7 commented Aug 6, 2019

Thanks for the pointers @fdlm. I got some time to look into this more.

The only way I could figure out how to normalize at each step was to apply the softmax function (since we have to account for negative potentials). However, I'm not sure how to calculate a normalization constant with softmax and apply it to correct the answer at the end.

I've gotten rid of softmaxing self.A and am incorporating the bias and final potentials. The results look mostly reasonable now (comparable to the Viterbi output), but I'm not sure of the best way to verify whether the implementation is correct.

You can see my changes here: master...hpx7:patch-1. Is there anything you'd like to see before I open a PR?
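
As a rough frame-level sanity check (not a substitute for proper verification), one idea would be to compare the argmax of the per-frame marginals against madmom's Viterbi decoding. This assumes a helper like the chord_marginals sketch above and that ConditionalRandomField.process returns the Viterbi state path:

# the marginal argmax and the Viterbi path optimise different criteria, so
# perfect agreement isn't expected, but they should coincide on most frames
viterbi_path = crf.process(feats)        # assumed: array of state indices
marginals = chord_marginals(crf, feats)  # hypothetical helper sketched above
agreement = np.mean(marginals.argmax(axis=1) == viterbi_path)
print('agreement with Viterbi path: {:.1%}'.format(agreement))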

fdlm commented Sep 17, 2019

Hey, thanks for your work. The best way to verify that your implementation is correct is to test it against an existing one. Since the chord recognition model you're using was actually trained with the code here: https://github.com/fdlm/Spaghetti/blob/master/spaghetti/layers.py and only later translated to a pure Python/NumPy implementation, that would be the way to go. I understand that it's probably a lot of work, especially getting Theano running again :), but to be honest, I'm not willing to merge code that I'm not 100% sure is correct. I still believe there's something wrong with the softmaxing, but I don't have time to debug it.
