
Beat-aligned chords #403

Open
hpx7 opened this issue Dec 26, 2018 · 13 comments

@hpx7

hpx7 commented Dec 26, 2018

First of all, thank you for making such a great library!

I'm trying to implement some chordify functionality for myself, and one of the things I'd like to be able to do is to understand the chord progression of a song at the beat level (which chords occur on which beats).

I wrote a simple script using madmom's separate beat detection and chord recognition features. While it mostly works well, I noticed that the output often contains sequences like the following:

9.13 1.0 C#:min
9.68 2.0 C#:min
10.24 3.0 C#:min
10.8 4.0 C#:min

11.32 1.0 C#:min
11.89 2.0 E:maj
12.47 3.0 E:maj
13.04 4.0 E:maj

13.61 1.0 E:maj
14.16 2.0 B:maj
14.69 3.0 B:maj
15.24 4.0 B:maj

15.83 1.0 B:maj
16.41 2.0 A:maj
16.99 3.0 A:maj
17.52 4.0 A:maj

Of course, in a lot of modern Western music, the chord change often occurs on the downbeat. It looks like madmom's chord recognition tends to report chord changes too late (either that, or the downbeats come too early, but empirically the beat recognition seems to be correct). On US pop music, the downbeat chord is misclassified around 10% of the time.

I wonder if the same features that help madmom detect downbeats could be helpful in determining the chord intervals as well. I'm not the first to consider that downbeat detection and chord estimation might benefit from a common feature set.

Before digging deeper myself, I was curious to hear if you had any thoughts on this subject or suggestions for things I could try to improve the results.

@hpx7
Author

hpx7 commented Dec 26, 2018

In case it wasn't clear above, here is some more information that shows what's happening with my script.

chords:

9.300	11.700	C#:min
11.700	13.900	E:maj
13.900	16.200	B:maj

beats:

9.130	1
9.680	2
10.240	3
10.800	4
11.320	1
11.890	2
12.470	3
13.040	4
13.610	1

So the downbeat is reported to occur at 11.32 but the chord change to E:maj is reported to occur at 11.70 (which is closer to beat 2 than beat 1). Similarly, the downbeat is reported to occur at 13.61 but the chord change to B:maj is reported to occur at 13.90.

Similar results should be reproducible on any US pop song (and I suspect other types of music as well).

@superbock
Collaborator

If the chords are reproducibly late, an easy "solution" would be to simply shift the chord sequence by a fixed amount first.

Another — more sophisticated — solution would be to give the downbeats more weight to be more 'attractive' to chord changes, i.e. not to align the chords to the closest beat, but rather to the most likely (down-)beat position. You could implement something like this by only considering the downbeat positions for the coarse alignment (or shifting of the chord sequence) before doing the fine beat-level alignment.

IIRC, @fdlm did something like this for last year's MIREX. Probably he can share his code/approach.
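
For concreteness, the coarse-then-fine idea above could look something like this. This is my own untested sketch, not code from madmom or the MIREX submission; it assumes chords as (start, end, label) tuples and beats as (time, position) rows, matching the outputs shown earlier in this thread.

import numpy as np

def align_chords_to_beats(chords, beats):
    """Coarse alignment to downbeats, then fine snapping to the nearest beat."""
    beat_times = np.array([t for t, pos in beats])
    downbeat_times = np.array([t for t, pos in beats if pos == 1])
    change_times = np.array([start for start, _, _ in chords])

    # coarse step: find a single global shift that moves the chord changes
    # closest (on average) to their nearest downbeat
    candidate_shifts = np.linspace(-0.5, 0.5, 101)

    def mean_downbeat_error(shift):
        shifted = change_times + shift
        nearest = downbeat_times[np.argmin(
            np.abs(downbeat_times[:, None] - shifted[None, :]), axis=0)]
        return np.mean(np.abs(nearest - shifted))

    shift = min(candidate_shifts, key=mean_downbeat_error)

    # fine step: snap every (shifted) chord boundary to the nearest beat
    aligned = []
    for start, end, label in chords:
        new_start = beat_times[np.argmin(np.abs(beat_times - (start + shift)))]
        new_end = beat_times[np.argmin(np.abs(beat_times - (end + shift)))]
        aligned.append((new_start, new_end, label))
    return aligned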

@hpx7
Author

hpx7 commented Dec 26, 2018

Thanks for your response @superbock.

One interesting way to think about this is to use the obtained downbeats as an evaluator for the intervals returned by the chord estimator. Using a simple script, we can compute the difference between each estimated chord change time and the nearest reported downbeat time (we can treat the difference as an error).
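
Something along these lines (a minimal sketch of the measurement; it assumes chords and beats are the outputs of the madmom chord and downbeat pipelines as above):

import numpy as np

# chords: (start, end, label) tuples; beats: (time, position) rows
downbeats = np.array([t for t, pos in beats if pos == 1])

errors = []
for start, _, _ in chords[1:]:                       # skip the initial segment boundary
    nearest = downbeats[np.argmin(np.abs(downbeats - start))]
    errors.append(start - nearest)                   # positive = chord change reported late

errors = np.array(errors)
print('mean: %.2f s, stddev: %.2f s' % (errors.mean(), errors.std()))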

From my testing, the errors appear to follow a normal distribution with a mean of 0.11 s and a stddev of 0.20 s. Notably, there are several instances in which the chords are too early, but on average they are too late (so the proposed solution of simply shifting the chord estimates isn't likely to be very robust).

The variance doesn't bother me much but what's surprising is the shifted mean - I'm not sure why the chord estimator model has learned this bias. Again, all this is assuming the downbeats themselves are correct, which I have no reason to doubt at this time.

This bias leaves me wondering whether there is a flaw in the model. Instead of teaching the chord recognizer about downbeats (hard-coding features), I would have hoped that it would be able to learn the relationship from the data itself. Maybe there are other features we should be extracting?

As you can probably tell, I don't have an ML background, so happy to be told that I'm approaching this the wrong way.

@fdlm
Contributor

fdlm commented Dec 27, 2018

Interesting observation; I never investigated whether the model learned any kind of bias. One thing to consider is that the chord recognition algorithm uses 10 fps, while the downbeat tracking algorithm uses 100 fps, so we get some misalignment just because of that. If the mean misalignment is indeed 0.11 s (which is close to 1 frame at 10 fps), there might be a bug somewhere that produces an off-by-one error, but we'd really need to make sure that this is the case (ideally using downbeat ground truth).
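
Just to spell out the frame-rate arithmetic (nothing madmom-specific here):

chord_frame = 1.0 / 10      # chord recognition: 10 fps -> 0.1 s time resolution
beat_frame = 1.0 / 100      # downbeat tracking: 100 fps -> 0.01 s time resolution

print(chord_frame / 2)      # 0.05 s: worst-case offset from quantization alone
print(chord_frame)          # 0.10 s: what a one-frame (off-by-one) error would add,
                            # which is close to the observed 0.11 s mean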

Another reason for the shift might be that there is more "noise" at downbeats (e.g. cymbals) that masks the chord transition, and the model only predicts the chord change once it is quite confident about the new chord. Again, this needs to be investigated more thoroughly.

What I did for MIREX was just to run the beat tracker, and align the chord transitions to the next beat time. Results improved a tiny bit. I will merge the code for that into madmom at some point, but do not have an ETA at the moment.
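
For anyone who wants to try that in the meantime, the snapping itself is straightforward. A rough sketch (my own, not the actual MIREX code):

import numpy as np

def snap_to_next_beat(chords, beat_times):
    """Move every chord boundary to the next beat at or after it."""
    beat_times = np.asarray(beat_times)
    snapped = []
    for start, end, label in chords:
        i = np.searchsorted(beat_times, start)       # first beat >= start
        j = np.searchsorted(beat_times, end)         # first beat >= end
        new_start = beat_times[i] if i < len(beat_times) else start
        new_end = beat_times[j] if j < len(beat_times) else end
        if new_end > new_start:                      # drop segments collapsed to zero length
            snapped.append((new_start, new_end, label))
    return snapped

# beats[:, 0] holds the beat times returned by DBNDownBeatTrackingProcessor
# aligned = snap_to_next_beat(chords, beats[:, 0])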

I'm sure there is a way to combine the downbeat model and the chord model in some way that produces better predictions to begin with, but this is not something I want to work on right now.

@geraldoramos

geraldoramos commented Sep 29, 2020

After two years, I found this thread as I needed to implement a beats/chords solution. First of all, thanks for sharing your sample code.

My solution does not need perfect alignment, and the current script provided here did an amazing job for what I needed.

I made some modifications to save the chord/beat data to a JSON file, and it's working fairly well. The main issue is performance: it takes close to the song's duration to process it. Do you have any tips/suggestions on how to optimize it?

Here is my current script:

from madmom.features import (CNNChordFeatureProcessor, CRFChordRecognitionProcessor,
                             RNNDownBeatProcessor, DBNDownBeatTrackingProcessor)
import sys
import json

class Chord:
    def __init__(self, curr_beat_time, curr_beat, prev_chord):
        self.curr_beat_time = curr_beat_time
        self.curr_beat = curr_beat
        self.prev_chord = prev_chord

print('processing: ' + sys.argv[1] + ' ' + sys.argv[2])

audio_file_name = sys.argv[1]

# chord recognition: CNN features decoded by a CRF
chord_processor = CNNChordFeatureProcessor()
chord_decoder = CRFChordRecognitionProcessor()
chords = chord_decoder(chord_processor(audio_file_name))

# downbeat tracking: RNN activations decoded by a DBN
beat_processor = RNNDownBeatProcessor()
beat_decoder = DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)
beats = beat_decoder(beat_processor(audio_file_name))

chordsArray = []
chord_idx = 0
for beat_idx in range(len(beats) - 1):
    curr_beat_time, curr_beat = beats[beat_idx]

    # advance past all chords that start no later than half a beat after this beat
    while chord_idx < len(chords):
        chord_time, _, _ = chords[chord_idx]
        prev_beat_time, _ = (0, 0) if beat_idx == 0 else beats[beat_idx - 1]
        eps = (curr_beat_time - prev_beat_time) / 2
        if chord_time > curr_beat_time + eps:
            break
        chord_idx += 1

    # the chord active on this beat is the previous one in the list
    _, _, prev_chord = chords[chord_idx - 1]
    chordsArray.append(Chord(curr_beat_time, curr_beat, prev_chord))

class MyEncoder(json.JSONEncoder):
    def default(self, o):
        return o.__dict__

with open('/app/out/' + sys.argv[2], 'w') as outfile:
    # pass the encoder via `cls` so the file contains a JSON array,
    # not a JSON string of already-encoded JSON
    json.dump(chordsArray, outfile, cls=MyEncoder)

@superbock
Collaborator

First of all, it is important to determine which part is the culprit. I suspect the downbeat tracker, since it averages 8 RNNs, whereas the chord transcription should be relatively fast. So my recommendation is to use only a single network for downbeat tracking: just alter DOWNBEATS_BLSTM of RNNDownBeatProcessor in line 91 to contain only the network(s) that work best for you.
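
For example, something like the following might work (untested; it assumes DOWNBEATS_BLSTM is the list of model files imported into madmom.features.downbeats and that it is looked up when the processor is constructed; editing the source line mentioned above is the more reliable route):

import madmom.features.downbeats as downbeats

# assumption: keep only the first of the eight downbeat networks
downbeats.DOWNBEATS_BLSTM = downbeats.DOWNBEATS_BLSTM[:1]

beat_processor = downbeats.RNNDownBeatProcessor()
beat_decoder = downbeats.DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)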

Besides that, please make sure that numpy uses an optimized BLAS library.

@geraldoramos

Thanks a lot @superbock

@geraldoramos

@superbock I ran some tests and it looks like (weirdly) the chord detection is taking more time.

When running this:

import time

startChord = time.time()
chords = chord_decoder(chord_processor(audio_file_name))
endChord = time.time()
print(endChord - startChord)

startBeat = time.time()
beat_processor = RNNDownBeatProcessor()
beat_decoder = DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)
beats = beat_decoder(beat_processor(audio_file_name))
endBeat = time.time()
print(endBeat - startBeat)

For a 60-second song, the chord process takes 22.4 seconds and the beat process takes 9.6 seconds, for a total of ~32 seconds of execution. That is a bit shy of 2x real-time. I'm running this in a Docker container with 1 CPU and 8 GB of RAM. Running in a container limited to 4 CPUs made no difference, so I guess the process does not leverage multiple processors.

As you suggested, I made sure numpy was using OpenBLAS, and the total went from ~37 s to ~32 s, which is already a good improvement. Thanks a lot!

Any insights why chord recognition is taking more time than beats?

@superbock
Collaborator

Another thought: consider installing OpenCV. It has much faster convolutions, and madmom uses them if available.

@geraldoramos

@superbock Thanks!

To anyone interested in this, I'm also working on a solution to run chords and beats in parallel.

@superbock
Collaborator

This should be quite easy to accomplish by wrapping the two processors in a ParallelProcessor.
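
A rough sketch of what that could look like (hedged: this assumes ParallelProcessor accepts a list of processors plus a num_threads argument, and that both chains accept a filename, as the standalone processors in the script above do):

from madmom.processors import ParallelProcessor, SequentialProcessor
from madmom.features import (CNNChordFeatureProcessor, CRFChordRecognitionProcessor,
                             RNNDownBeatProcessor, DBNDownBeatTrackingProcessor)

# each chain is sequential on its own: feature extraction, then decoding
chord_chain = SequentialProcessor([CNNChordFeatureProcessor(),
                                   CRFChordRecognitionProcessor()])
beat_chain = SequentialProcessor([RNNDownBeatProcessor(),
                                  DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)])

# run both chains on the same input in parallel
parallel = ParallelProcessor([chord_chain, beat_chain], num_threads=2)
chords, beats = parallel('some_song.wav')   # placeholder file name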

@geraldoramos

geraldoramos commented Oct 3, 2020

@superbock This is how I did it, using Process from the multiprocessing library. It improved from ~35 s to ~22 s.

...
def generateChords():
    print('Generating chords')
    chord_processor = CNNChordFeatureProcessor()
    chord_decoder = CRFChordRecognitionProcessor()
    chords = chord_decoder(chord_processor(audio_file_name))
    np.save(chordsPath, np.array(chords))

def generateBeats():
    print('Generating beats')
    beat_processor = RNNDownBeatProcessor()
    beat_decoder = DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)
    beats = beat_decoder(beat_processor(audio_file_name))
    np.save(beatsPath, np.array(beats))

if __name__ == "__main__":
    thread1 = Process(target = generateChords)
    thread2 = Process(target = generateBeats)
    thread1.start()
    thread2.start()
    thread2.join()
    thread1.join()
    print('End of all threads')
...

PS: I installed libopencv-dev and python3-opencv, but it did not change the performance.

@superbock
Collaborator

Yes, this is one way to accomplish it. It might be possible that OpenCV does not improve performance on your machine, but to be sure, I suggest checking whether it is actually being used. It is imported in line 697 of ml.nn.layers, but depending on how you installed madmom, this module might be cythonized. If so, you can delete the .so/.dll file to check manually, and re-cythonize/re-install madmom afterwards.
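
A quick way to check both points from Python (this only inspects standard attributes; it does not assume anything madmom-specific beyond the module path mentioned above):

import cv2                                  # raises ImportError if the OpenCV bindings are missing
import madmom.ml.nn.layers as layers

print(cv2.__version__)
print(layers.__file__)                      # a .so/.pyd path means the module is cythonized;
                                            # a .py path means the pure-Python code (and its cv2 import) is used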
