Beat-aligned chords #403
Comments
In case it wasn't clear above, I've included some more information here that showcases what's happening with my script.

chords:

beats:

So the downbeat is reported to occur at
Similar results should be reproducible on any US pop song (and I suspect other types of music as well).
If the chords are reproducibly late, an easy "solution" would be to simply shift them by a fixed amount. Another, more sophisticated, solution would be to give the downbeats more weight so they are more 'attractive' to chord changes, i.e. not to align the chords to the closest beat, but rather to the most likely (down-)beat position. You could implement something like this by only considering the downbeat positions for the coarse alignment (or shifting of the chord sequence) before doing the fine beat-level alignment. IIRC, @fdlm did something like this for last year's MIREX. Probably he can share his code/approach.
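For anyone wanting to experiment with the coarse-then-fine idea, here is a rough sketch; it is not code from madmom, and it assumes `chords` is a sequence of (start, end, label) entries from the chord decoder and `beats` the N×2 array of (time, beat number) returned by the downbeat tracker:

```python
import numpy as np

def coarse_then_fine_align(chords, beats):
    """Sketch: shift all chord starts by the median offset to their nearest
    downbeat (coarse), then snap each shifted start to the closest beat (fine)."""
    beat_times = beats[:, 0]
    downbeat_times = beat_times[beats[:, 1] == 1]
    starts = np.array([c[0] for c in chords], dtype=float)

    # coarse: one global shift, estimated only from the downbeat positions
    nearest_db = downbeat_times[np.argmin(np.abs(downbeat_times[:, None] - starts[None, :]), axis=0)]
    shift = np.median(nearest_db - starts)
    shifted = starts + shift

    # fine: snap every shifted chord start to the closest beat
    snapped = beat_times[np.argmin(np.abs(beat_times[:, None] - shifted[None, :]), axis=0)]
    return [(float(t), str(c[2])) for t, c in zip(snapped, chords)]
```

A softer per-chord weighting of downbeats (instead of one global shift) would be a natural refinement of this sketch.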
Thanks for your response @superbock. One interesting way to think about this is to use the obtained downbeats as an evaluator for the intervals returned by the chord estimator. Using this simple script, we can compute the difference between the estimated chord change time returned by the chord estimator and the reported downbeat time (we can treat the difference as an error). From my testing, the errors appear to follow a normal distribution with a mean of 0.11 s and a stddev of 0.20 s. Notably, there are several instances in which the chords are too early, but on average they are too late (so the proposed solution of simply shifting the chord estimates isn't likely to be very robust). The variance doesn't bother me much, but what's surprising is the shifted mean; I'm not sure why the chord estimator model has learned this bias.

Again, all this is assuming the downbeats themselves are correct, which I have no reason to doubt at this time. The given bias leaves me wondering if there is a flaw with the model. Instead of teaching the chord recognizer about downbeats (hard-coding features), I would have hoped that it would be able to learn the relationship from the data itself. Maybe there are other features we should be extracting? As you can probably tell, I don't have an ML background, so I'm happy to be told that I'm approaching this the wrong way.
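The script linked in that comment isn't reproduced here, but measuring the offsets might look roughly like this (a sketch, not the original script; the file name `song.wav` is a placeholder and skipping the first segment is an assumption):

```python
import numpy as np
from madmom.features import (CNNChordFeatureProcessor, CRFChordRecognitionProcessor,
                             RNNDownBeatProcessor, DBNDownBeatTrackingProcessor)

def chord_downbeat_errors(audio_file):
    """Distance from each detected chord change to its nearest downbeat
    (positive = the chord change is reported later than the downbeat)."""
    chords = CRFChordRecognitionProcessor()(CNNChordFeatureProcessor()(audio_file))
    beats = DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)(RNNDownBeatProcessor()(audio_file))
    downbeats = beats[beats[:, 1] == 1, 0]
    # skip the very first segment, whose start time is simply 0.0
    change_times = np.array([c[0] for c in chords[1:]], dtype=float)
    nearest = downbeats[np.argmin(np.abs(downbeats[:, None] - change_times[None, :]), axis=0)]
    return change_times - nearest

errors = chord_downbeat_errors('song.wav')  # placeholder file name
print('mean: %.3f s, stddev: %.3f s' % (errors.mean(), errors.std()))
```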
Interesting observation, I never investigated whether the model learned any kind of bias. One thing to consider is that the chord recognition algorithm uses 10 fps, while the downbeat tracking algorithm uses 100 fps, so we get some misalignment just because of that. If the mean misalignment is indeed 0.11 s (which is close to 1 frame at 10 fps), there might be a bug somewhere that produces an off-by-one error, but we'd really need to make sure that this is the case (ideally, using downbeat ground truth). Another reason for the shift might be that there is more "noise" at downbeats (e.g. cymbals) that masks the chord transition, and the model only predicts the chord change once it is quite confident about the new chord. Again, this needs to be investigated more thoroughly.

What I did for MIREX was just to run the beat tracker and align the chord transitions to the next beat time. Results improved a tiny bit. I will merge the code for that into madmom at some point, but do not have an ETA at the moment. I'm sure there is a way to combine the downbeat model and the chord model that produces better predictions to begin with, but this is not something I want to work on right now.
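For reference, the "align each transition to the next beat" idea can be sketched in a few lines; this is not the MIREX code, just an illustration assuming `chords` holds (start, end, label) entries and `beat_times` is the sorted array of beat times:

```python
import numpy as np

def snap_to_next_beat(chords, beat_times):
    """Move every detected chord transition to the first beat at or after it."""
    snapped = []
    for start, end, label in chords:
        idx = np.searchsorted(beat_times, start)  # first beat not earlier than `start`
        new_start = float(beat_times[idx]) if idx < len(beat_times) else float(start)
        snapped.append((new_start, str(label)))
    return snapped
```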
After 2 years, I found this thread as I needed to implement a beats/chords solution. First of all, thanks guys for sharing your sample code. My solution does not need perfect alignment, and the current script provided here did an amazing job for what I needed. I did some modifications to save the chords/beat data to a JSON file, and it's working fairly well. The main issue is performance: it is taking close to the song duration to process it. Do you have any tips/suggestions on how to optimize it? Here is my current script:

```python
import sys
import json

from madmom.features import (CNNChordFeatureProcessor, CRFChordRecognitionProcessor,
                             RNNDownBeatProcessor, DBNDownBeatTrackingProcessor)


class Chord:
    def __init__(self, curr_beat_time, curr_beat, prev_chord):
        self.curr_beat_time = curr_beat_time
        self.curr_beat = curr_beat
        self.prev_chord = prev_chord


print('processing: ' + sys.argv[1] + ' ' + sys.argv[2])
audio_file_name = sys.argv[1]

# chord recognition: CNN features decoded by a CRF
chord_processor = CNNChordFeatureProcessor()
chord_decoder = CRFChordRecognitionProcessor()
chords = chord_decoder(chord_processor(audio_file_name))

# downbeat tracking: RNN activations decoded by a DBN, assuming 4 beats per bar
beat_processor = RNNDownBeatProcessor()
beat_decoder = DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)
beats = beat_decoder(beat_processor(audio_file_name))

chordsArray = []
chord_idx = 0
for beat_idx in range(len(beats) - 1):
    curr_beat_time, curr_beat = beats[beat_idx]
    # advance to the first chord that starts clearly after this beat
    while chord_idx < len(chords):
        chord_time, _, _ = chords[chord_idx]
        prev_beat_time, _ = (0, 0) if beat_idx == 0 else beats[beat_idx - 1]
        eps = (curr_beat_time - prev_beat_time) / 2
        if chord_time > curr_beat_time + eps:
            break
        chord_idx += 1
    # the chord sounding on this beat is the last one that started before it
    _, _, prev_chord = chords[max(chord_idx - 1, 0)]
    chordsArray.append(Chord(curr_beat_time, curr_beat, prev_chord))


class MyEncoder(json.JSONEncoder):
    def default(self, o):
        return o.__dict__


with open('/app/out/' + sys.argv[2], 'w') as outfile:
    # cls=MyEncoder serialises the Chord objects via their __dict__
    json.dump(chordsArray, outfile, cls=MyEncoder)
```
First of all, it is important to determine which part is the culprit. I suspect the downbeat tracker, since it averages 8 RNNs, whereas the chord transcription should be relatively fast. So my recommendation is to only use a single network for downbeat tracking; just alter
Besides that, please make sure that numpy uses an optimized BLAS library.
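One quick way to see which BLAS/LAPACK build numpy is actually linked against (plain numpy, nothing madmom-specific):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries numpy was built with; look for an optimized
# implementation such as OpenBLAS or MKL rather than the reference BLAS.
np.show_config()
```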
Thanks a lot @superbock
@superbock I ran some tests and it looks like (weirdly) the chord detection is taking more time. When running this:

```python
import time

# chord_processor / chord_decoder are defined as in the script above
startChord = time.time()
chords = chord_decoder(chord_processor(audio_file_name))
endChord = time.time()
print(endChord - startChord)

startBeat = time.time()
beat_processor = RNNDownBeatProcessor()
beat_decoder = DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)
beats = beat_decoder(beat_processor(audio_file_name))
endBeat = time.time()
print(endBeat - startBeat)
```

For a 60-second song, the chord process is taking
As you suggested, I made sure numpy was using OpenBLAS and it went from
Any insights into why chord recognition is taking more time than beat tracking?
Another thought: consider installing OpenCV; it has much faster convolutions and madmom uses them if available.
@superbock Thanks! To anyone interested in this, I'm also working on a solution to run
This should be quite easy to accomplish by wrapping the two processors in a
@superbock This is how I did it using ...

```python
# ... (the original imports and setup were elided; audio_file_name, chordsPath
# and beatsPath are assumed to be defined in the omitted part of the script)
from multiprocessing import Process

import numpy as np
from madmom.features import (CNNChordFeatureProcessor, CRFChordRecognitionProcessor,
                             RNNDownBeatProcessor, DBNDownBeatTrackingProcessor)


def generateChords():
    print('Generating chords')
    chord_processor = CNNChordFeatureProcessor()
    chord_decoder = CRFChordRecognitionProcessor()
    chords = chord_decoder(chord_processor(audio_file_name))
    np.save(chordsPath, np.array(chords))


def generateBeats():
    print('Generating beats')
    beat_processor = RNNDownBeatProcessor()
    beat_decoder = DBNDownBeatTrackingProcessor(beats_per_bar=[4], fps=100)
    beats = beat_decoder(beat_processor(audio_file_name))
    np.save(beatsPath, np.array(beats))


if __name__ == "__main__":
    # run chord recognition and downbeat tracking in two separate processes
    thread1 = Process(target=generateChords)
    thread2 = Process(target=generateBeats)
    thread1.start()
    thread2.start()
    thread2.join()
    thread1.join()
    print('End of all threads')
```

ps: Installed
Yes, this is one way to accomplish it. It might be that OpenCV does not improve performance on your machine, but to be sure, I suggest checking if it is used correctly. It is imported in line 697 of
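A quick sanity check, independent of the exact file referenced above: confirm that OpenCV is importable in the same environment madmom runs in, since madmom only uses it when the import succeeds.

```python
# If this import fails, madmom cannot use OpenCV's faster convolutions and
# falls back to its slower default implementation.
try:
    import cv2
    print('OpenCV available:', cv2.__version__)
except ImportError:
    print('OpenCV not installed in this environment')
```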
First of all, thank you for making such a great library!
I'm trying to implement some chordify functionality for myself, and one of the things I'd like to be able to do is to understand the chord progression of a song at the beat level (which chords occur on which beats).
I wrote a simple script using the separate beat detection and chord recognition features of madmom. While it works mostly great, I noticed there are often sequences like the following in the output:
Of course, in a lot of modern western music, the chord change often occurs on the downbeat. It looks like the madmom chord recognition has a bias towards reporting chord changes too late (either that, or the downbeats come too early, but empirically the beat recognition seems to be correct). On US pop music, the downbeat chord is misclassified around 10% of the time.
I wonder if the same features that help madmom detect downbeats could be helpful in determining the chord intervals as well. I'm not the first to consider that downbeat detection and chord estimation might benefit from a common feature set.
Before digging deeper myself, I was curious to hear if you had any thoughts on this subject or suggestions for things I could try to improve the results.