
Support for drums/percussion? #30

Open
tripathiarpan20 opened this issue Jul 10, 2022 · 4 comments
@tripathiarpan20

Hi!
Thanks for this amazing open-source work, I'm really enjoying using it. :)

I noticed that Basic Pitch works great on tracks with a single monophonic or polyphonic instrument, for most instruments; however, it is unable to encode drums at all.

I understand that MIDI encoding for drums/percussion instruments is somewhat different compared to the rest of the instruments, but are there any future plans to add support for percussion instruments?

@jugoodma

@tripathiarpan20 -- I found your comment interesting, so I took a short dive into the literature.

There's a niche, and interesting, sub-sub-field of Music Information Retrieval (MIR) called Automatic Drum Transcription (ADT). Here's a literature review of ADT. The authors of that review describe different "drum transcription tasks" -- drum-only transcription (DTD) and drum-plus-accompaniment transcription (DTM) seem particularly relevant.

If you want to "solve" drum encoding, you could look at some of the methods in the more recent papers referenced in that literature review and give them a try! Ref 80 appeared to have high-scoring metrics, but might not work for drum kits with more than a kick, snare, and hi-hat. The authors of ref 80 also have a GitHub repo and a demo site linked!

For another approach, you might find https://github.com/magenta/mt3 interesting/useful. Unfortunately, the related paper doesn't focus too heavily on drums, so you might find the mt3 model doesn't work that well for drum transcription.

Finally, perhaps we could make use of Facebook's demucs. This model is seemingly SOTA for demixing audio tracks, so we can use it to separate out the drums stem of a track. This turns a DTM task into a DTD task quite effectively (and thus, in my opinion, makes solving ADT easier). Unfortunately, this somewhat disregards the call-to-action in the NMP/basic-pitch paper -- to encourage low-resource models in future research. Maybe we can trim down the demucs model? Regardless, perhaps we could then train the NMP model on a drum-specific dataset, like E-GMD. We could then compose the architectures like so:

                demucs                   NMP(E-GMD)
original track -------> drum-only track -----------> drum-only MIDI
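That composition could be sketched like this. To be clear, the stage functions below are placeholders I'm inventing for illustration, not real demucs or basic-pitch APIs; in practice the first stage would call demucs and the second a retrained NMP model:

```python
def drum_transcription_pipeline(track_path, separate, transcribe):
    """Compose the two stages: demix the drum stem, then transcribe it.
    `separate` stands in for demucs, `transcribe` for NMP trained on E-GMD."""
    drum_stem = separate(track_path)   # original track -> drum-only track
    return transcribe(drum_stem)       # drum-only track -> drum-only MIDI

# placeholder stages that just rewrite file names, for illustration
separate = lambda p: p.replace(".wav", "_drums.wav")
transcribe = lambda p: p.replace(".wav", ".mid")

print(drum_transcription_pipeline("song.wav", separate, transcribe))
# song_drums.mid
```

The nice property of keeping the stages as separate functions is that either one can be swapped out (e.g. dropping demucs entirely, as speculated below) without touching the other.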

I'll give this a try and post the results. Luckily, since NMP is so light, it probably trains much faster than huge models. And who knows, maybe demucs isn't even needed. Or maybe this entire approach won't work! It's all part of the scientific method 😄

@rabitt rabitt added the question Further information is requested label Jul 20, 2022
@rabitt (Contributor) commented Jul 21, 2022

are there any future plans to add support for percussion instruments?

@tripathiarpan20 no plans at the moment, but I'll let you know if that changes. @jugoodma's comment is great, and points to some open-source drum transcription options. Here are two more open-source systems I'm aware of:
(1) "Increasing Drum Transcription Vocabulary Using Data Synthesis" by Cartwright et al. [paper] [code]
(2) "Towards Multi-Instrument Drum Transcription" by Vogl et al. [paper] [code]

@tripathiarpan20 (Author)

Hi @jugoodma and @rabitt,
Thank you for the amazing feedback!

To be frank, I'm not familiar with how the instrument class is predicted in the NMP pipeline, but if retraining Basic Pitch's architecture on a drum dataset for DTD works (along with devising suitable posteriorgram post-processing), I believe it would make the domain of instruments covered by this project truly whole (afaik).
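For drums, the posteriorgram post-processing is arguably simpler than for pitched instruments, since drum hits are onset-only events with no note duration to track. A minimal sketch of what that post-processing could look like (this is my own illustrative peak-picking, not Basic Pitch's actual post-processing): threshold one drum class's onset-probability track and keep local maxima.

```python
import numpy as np

def onsets_from_posteriorgram(post, threshold=0.5, frame_rate=100.0):
    """Pick drum onsets from a 1-D posteriorgram for one drum class.
    A frame is an onset if it exceeds `threshold` and is a local maximum.
    Returns onset times in seconds, assuming `frame_rate` frames/sec."""
    post = np.asarray(post, dtype=float)
    interior = post[1:-1]
    peaks = (interior > threshold) & (interior >= post[:-2]) & (interior > post[2:])
    frames = np.flatnonzero(peaks) + 1  # +1 to undo the interior offset
    return frames / frame_rate

print(onsets_from_posteriorgram([0.1, 0.9, 0.2, 0.1, 0.7, 0.8, 0.3]))
# [0.01 0.05]
```

Real systems usually add a minimum inter-onset distance and per-class thresholds, but the shape of the problem is the same.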

Good luck with the process, and keep us updated :D.
The DTD task seems to be the relevant one in the context of Basic Pitch (which deals with polyphonic recordings of a single instrument class). demucs shouldn't be required, given its high inference time and the availability of the E-GMD dataset, plus the option of rendering drum audio tracks with suitable soundfonts and label-preserving data augmentation.
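By "label-preserving" augmentation I mean transforms that change the waveform but leave the onset annotations valid, e.g. random gain or light additive noise (whereas time-stretching would shift the labels). A minimal numpy sketch, with a hypothetical `augment` helper:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(audio, onset_times):
    """Apply label-preserving augmentations: random gain and light noise.
    The onset annotations pass through unchanged because timing is untouched."""
    gain = rng.uniform(0.5, 1.5)
    noise = rng.normal(0.0, 0.005, size=audio.shape)
    augmented = np.clip(audio * gain + noise, -1.0, 1.0)
    return augmented, onset_times

audio = np.zeros(16000)              # one second of "audio" at 16 kHz
aug, labels = augment(audio, [0.25, 0.5])
print(aug.shape, labels)             # same length, same labels
```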

Elsewhere, I also tried demucs on Psychosocial (Slipknot) and then ran basic-pitch on the demixed drum track; that's how I eventually came to raise this issue/question. Although demucs performs amazingly well, its inference times are relatively high (it typically takes minutes).
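One mitigation worth noting: demucs has a two-stem mode that separates only the drums from everything else, which is cheaper than full four-stem separation. A sketch of the invocation, built as a string (flags as I understand the demucs CLI; worth double-checking against its README):

```python
import shlex

def demucs_drums_command(track_path, model="htdemucs"):
    """Build a demucs command that extracts only the drums stem.
    --two-stems=drums separates drums vs. the rest instead of all
    four stems (drums/bass/vocals/other); -n picks the model."""
    return f"python -m demucs --two-stems=drums -n {model} {shlex.quote(track_path)}"

print(demucs_drums_command("Psychosocial.wav"))
# python -m demucs --two-stems=drums -n htdemucs Psychosocial.wav
```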

Meanwhile, perhaps Spotify could develop a lightweight demixing model in the future, one that might benefit from end-to-end deep learning using the CQT for preprocessing (rather than the Mel spectrograms used in past demixing methods)?
It might be a bit of a stretch, as my understanding of spectrograms, past demixing models, and NMP has missing pieces.
I would especially like to hear @rabitt's thoughts on the feasibility of such a lightweight demixing model, and whether there would be any benefits to formulating it as an end-to-end (demixing + transcription) task.

Any feedback from anyone else is welcome too!

@rabitt rabitt assigned rabitt and bgenchel and unassigned rabitt Mar 24, 2023
@sslupsky
Copy link

@jugoodma Did you get around to attempting the retraining described above?

                demucs                   NMP(E-GMD)
original track -------> drum-only track -----------> drum-only MIDI
