pyin seems "slow", maybe add PWVD? #1798

Closed
drscotthawley opened this issue Jan 17, 2024 · 7 comments
Labels
question: Issues asking for help doing something

Comments

@drscotthawley

Is your feature request related to a problem? Please describe.

Hi Brian et al.! librosa is amazing.
It seems like the pyin f0 estimator takes a "long" time: around 4 seconds for about 30 seconds of low-sample-rate audio. Like, all manner of spectrogram calculations and beat analyses can be completed in a fraction of a second, then the f0 estimation just...slogs.
Is this...normal?

Describe the solution you'd like

I'd like it to run faster, somehow! Perhaps there are suggestions already that I'm missing? e.g. maybe I need to pad my waveform until it's got a power-of-2 number of samples or something?

I see that you already did some work on accelerating pyin and yin, but...to me it's still taking longer than I'd prefer. (Not that I have competing code that's any faster, mind you!)

  • Could it be parallelized somehow?
  • What about implementing a PWVD method, e.g. in "A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution", https://arxiv.org/abs/2210.15272?
    (I looked for a Python implementation but haven't found one yet.)

Describe alternatives you've considered

Um...not sure but I'm looking around and checking the internet. Maybe there are pretrained deep learning models that can run fast in inference mode on CPUs? ....Not sure. Seems the core of librosa is classic signal processing, run on the CPU.

Additional context

@drscotthawley
Author

I'm mainly interested in vocals, so maybe I'll check out this method from 2009 or any updates thereof...
https://www.aes.org/e-lib/browse.cfm?elib=15165
Really I'm a newb in this domain!

@bmcfee added the question label Jan 17, 2024
@bmcfee
Member

bmcfee commented Jan 17, 2024

> Like, all manner of spectrogram calculations and beat analyses can be completed in a fraction of a second, then the f0 estimation just...slogs.
> Is this...normal?

Kind of, yes. pyin is inherently sequential, since it's doing a Viterbi decode over the frame-wise pitch likelihoods. This creates a pretty severe computational bottleneck that we've done our best to optimize around, but there's a limit to how far you can push it.
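To make the sequential dependence concrete, here is a minimal Viterbi sketch in plain Python. It is an illustration only, not librosa's implementation: the function name `viterbi` and the toy two-state example are assumptions made up for this sketch.

```python
import math

def viterbi(log_obs, log_trans, log_init):
    """Most likely state path given per-frame log-likelihoods.

    log_obs[t][s]   : log-likelihood of state s at frame t
    log_trans[p][s] : log-probability of moving from state p to state s
    log_init[s]     : log-probability of starting in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptr = []
    # Frame t cannot be scored until frame t-1 is done, so this outer loop
    # is inherently sequential. Cost grows as n_frames * n_states**2.
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Toy example (hypothetical numbers): two "sticky" states, four frames.
stay, move = math.log(0.9), math.log(0.1)
log_trans = [[stay, move], [move, stay]]
log_init = [math.log(0.5)] * 2
obs = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
log_obs = [[math.log(p) for p in frame] for frame in obs]

path = viterbi(log_obs, log_trans, log_init)
# Per-frame argmax, for comparison: it ignores the transition model.
framewise = [max(range(2), key=lambda s: frame[s]) for frame in obs]
```

In the toy example, the third frame weakly favors the other state, but the decoded path can stay put because switching is penalized. That smoothing is the benefit of the decode, and the frame-to-frame dependence is the price.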

> I'd like it to run faster, somehow! Perhaps there are suggestions already that I'm missing? e.g. maybe I need to pad my waveform until it's got a power-of-2 number of samples or something?

Nah, padding won't help here - modern FFT implementations aren't so sensitive to that, and as I mentioned before, the bottleneck is mostly due to the sequential decode.

What can help here is if you can restrict the range of pitches that you're searching over. Do you know something about the plausible f0 range of your data? The smaller you can make that, the fewer output states the model will need, and the faster it will go.

> Could it be parallelized somehow?

It's already parallelized / vectorized in all the places that made sense to do so.

> Maybe there are pretrained deep learning models that can run fast in inference mode on CPUs?

CREPE is probably what you want, as long as your data is near-enough in distribution to CREPE's training set.

@drscotthawley
Author

Thanks for your thoughtful and detailed response, Brian! I'll try limiting my frequency range and/or try CREPE.
We can close this now or...whatever. Your call.

@drscotthawley
Author

drscotthawley commented Jan 17, 2024

Oh, yeah, restricting the range to span C2 to C5 seems to have cut the execution time significantly (compared to C2-to-C7)! :-) Great.
I'ma close this.
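For a rough sense of why narrowing the range helps, the sketch below counts candidate pitch bins for the two ranges mentioned above. It assumes 10 bins per semitone (which I believe matches pyin's default `resolution=0.1`, but treat that as an assumption), and `n_pitch_states` is a hypothetical helper, not a librosa function. The real decoder also tracks voiced/unvoiced states, and how run time scales with the state count depends on the implementation.

```python
def n_pitch_states(midi_min, midi_max, bins_per_semitone=10):
    """Number of candidate pitch bins between two MIDI notes, inclusive.

    bins_per_semitone=10 is an assumed resolution for illustration.
    """
    return (midi_max - midi_min) * bins_per_semitone + 1

# MIDI note numbers, using the convention C4 = 60.
C2, C5, C7 = 36, 72, 96

wide = n_pitch_states(C2, C7)    # five octaves
narrow = n_pitch_states(C2, C5)  # three octaves
ratio = narrow / wide            # fraction of states left after narrowing
```

With librosa installed, the narrower range would be passed as `librosa.pyin(y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C5'), sr=sr)`.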

@lostanlen
Contributor

Hello @drscotthawley. For future reference, also consider Basic Pitch
https://github.com/spotify/basic-pitch (Apache License)
It is faster than CREPE and has the advantage of accommodating multipitch estimation. It jointly predicts onsets, MIDI notes, and f0 contours.

Presented here by @rabitt https://www.youtube.com/watch?v=sGwiuGvHz0o

@drscotthawley
Author

@lostanlen Thanks, but interestingly, it's because Basic Pitch wasn't detecting vocal pitches properly for me (cf. spotify/basic-pitch#103) that I'm looking at other options. Basic Pitch would leave large gaps where no f0 was predicted even though there was a continuous, isolated vocal melody present.
I haven't actually compared the results for the different libraries yet, but so far librosa.pyin does predict continuous melody in a plausible way, unlike what I got with Basic Pitch.

@bmcfee
Member

bmcfee commented Jan 18, 2024

If you've got isolated vocals and just need to track a single f0, pyin ought to be quite good. If you don't have isolated vocals, you'd probably do well enough to run it through ht-demucs first and then track the estimated vocal line. Voicing estimates might suffer a bit, but the f0 should be pretty good.
