pyin seems "slow", maybe add PWVD? #1798
I'm mainly interested in vocals, so maybe I'll check out this method from 2009 or any updates thereof...
Kind of, yes. pyin is inherently sequential, since it's doing a Viterbi decode over the frame-wise pitch likelihoods. This creates a pretty severe computational bottleneck that we've done our best to optimize around, but there's a limit to how far you can push it.
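To make the "inherently sequential" point concrete, here is a generic textbook Viterbi decoder in NumPy. It is an illustrative sketch, not librosa's implementation (librosa ships its own `librosa.sequence.viterbi`); the function name `viterbi_log` and the toy 3-state model are hypothetical. The per-frame update is vectorized over states, but the loop over frames cannot be, because each frame's scores depend on the previous frame's.

```python
import numpy as np


def viterbi_log(log_obs, log_trans, log_init):
    """Textbook Viterbi decode in the log domain (illustrative, not librosa's code).

    log_obs:   (n_states, n_frames) frame-wise log-likelihoods
    log_trans: (n_states, n_states), log_trans[i, j] = log P(state j | state i)
    log_init:  (n_states,) log initial state distribution
    """
    n_states, n_frames = log_obs.shape
    backptr = np.zeros((n_frames, n_states), dtype=int)
    score = log_init + log_obs[:, 0]

    # Sequential bottleneck: frame t cannot be scored until frame t-1 is done.
    # Each step is vectorized over states, but still costs O(n_states ** 2).
    for t in range(1, n_frames):
        cand = score[:, np.newaxis] + log_trans  # cand[i, j]: come from i, go to j
        backptr[t] = np.argmax(cand, axis=0)     # best predecessor for each j
        score = cand[backptr[t], np.arange(n_states)] + log_obs[:, t]

    # Backtrack from the best final state.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = np.argmax(score)
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path


# Toy example: 3 hypothetical "pitch" states, sticky transitions,
# and evidence that jumps from state 0 to state 2 halfway through.
obs = np.array([[0.8, 0.1, 0.1]] * 3 + [[0.1, 0.1, 0.8]] * 3).T  # shape (3, 6)
trans = np.full((3, 3), 0.05) + np.eye(3) * 0.85                   # 0.9 on the diagonal
path = viterbi_log(np.log(obs), np.log(trans), np.log(np.full(3, 1 / 3)))
print(path.tolist())  # [0, 0, 0, 2, 2, 2]
```

The sticky transition matrix is what makes the decode useful for pitch tracking (it smooths over noisy frames), and it is also what forces the frame-by-frame recursion.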
Nah, padding won't help here. Modern FFT implementations aren't that sensitive to input length, and as I mentioned before, the bottleneck is mostly the sequential decode. What can help is restricting the range of pitches you search over. Do you know anything about the plausible f0 range of your data? The smaller you can make that range, the fewer states the model will need, and the faster it will go.
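Some back-of-the-envelope arithmetic shows why the pitch range matters so much. The sketch below assumes a pYIN-style grid with 10 pitch bins per semitone (which is what `librosa.pyin`'s default `resolution=0.1` suggests), one voiced and one unvoiced copy of each pitch bin, and a dense transition matrix, so each decode step costs on the order of n_states² operations. These are modeling assumptions, and the helper `n_hmm_states` is hypothetical, made up for illustration.

```python
def n_hmm_states(midi_lo: int, midi_hi: int, bins_per_semitone: int = 10) -> int:
    """Hypothetical state count for a pitch HMM spanning midi_lo..midi_hi.

    Assumes bins_per_semitone pitch bins per semitone (endpoints included),
    each appearing once as a voiced and once as an unvoiced state.
    """
    n_pitch_bins = (midi_hi - midi_lo) * bins_per_semitone + 1
    return 2 * n_pitch_bins


C2, C5, C7 = 36, 72, 96  # MIDI note numbers

wide = n_hmm_states(C2, C7)    # 1202 states for C2 to C7
narrow = n_hmm_states(C2, C5)  # 722 states for C2 to C5
ratio = (narrow / wide) ** 2   # relative per-frame cost of a dense decode step
print(wide, narrow, round(ratio, 2))  # 1202 722 0.36
```

Under these assumptions, narrowing the search from C2 to C7 down to C2 to C5 cuts the per-frame decode work to roughly 36% of the original. The end-to-end speedup will be smaller, since the frame-wise pitch analysis scales only linearly with the number of bins. In librosa, the range is set through the `fmin` and `fmax` arguments of `librosa.pyin`, e.g. `fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C5')`.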
It's already parallelized / vectorized in all the places that made sense to do so.
CREPE is probably what you want, as long as your data is close enough in distribution to CREPE's training set.
Thanks for your thoughtful and detailed response, Brian! I'll try limiting my frequency range and/or try CREPE.
Oh, yeah, restricting the range to span C2 to C5 seems to have cut the execution time significantly (compared to C2 to C7)! :-) Great.
Hello @drscotthawley. For future reference, also consider Basic Pitch, presented here by @rabitt: https://www.youtube.com/watch?v=sGwiuGvHz0o
@lostanlen Thanks, but interestingly, it's because Basic Pitch wasn't detecting vocal pitches properly for me (cf. spotify/basic-pitch#103) that I'm looking at other options. Basic Pitch would leave large gaps where no f0 was predicted, even though a continuous, isolated vocal melody was present.
If you've got isolated vocals and just need to track a single f0, pyin ought to be quite good. If you don't have isolated vocals, you'd probably do well enough to run the mix through ht-demucs first and then track the estimated vocal line. Voicing estimates might suffer a bit, but the f0 should be pretty good.
Is your feature request related to a problem? Please describe.
Hi Brian et al.! librosa is amazing.
It seems like the pyin f0 estimator takes a "long" time: around 4 seconds for about 30 seconds of low-SR audio. All manner of spectrogram calculations and beat analyses can be completed in a fraction of a second, and then the f0 estimation just... slogs.
Is this...normal?
Describe the solution you'd like
I'd like it to run faster, somehow! Perhaps there are suggestions already that I'm missing? E.g., maybe I need to pad my waveform until it has a power-of-2 number of samples, or something like that?
I see that you already did some work on accelerating pyin and yin, but for me it's still taking longer than I'd prefer. (Not that I have competing code that's any faster, mind you!)
(I looked for a Python implementation but haven't found one yet.)
Describe alternatives you've considered
Um... not sure, but I'm looking around the internet. Maybe there are pretrained deep learning models that can run fast in inference mode on CPUs? Not sure. It seems the core of librosa is classic signal processing, run on the CPU.