pyin seems "slow", maybe add PWVD? #1798

drscotthawley · 2024-01-17T01:33:27Z

Is your feature request related to a problem? Please describe.

Hi Brian et al.! librosa is amazing.
It seems like the pyin f0 estimator takes a "long" time, like around 4 seconds for about 30 seconds of low-SR audio. Like, all manner of spectrogram calculations and beat analyses can be completed in a fraction of a second, then the f0 estimation just...slogs.
Is this...normal?

Describe the solution you'd like

I'd like it to run faster, somehow! Perhaps there are suggestions already that I'm missing? e.g. maybe I need to pad my waveform until it's got a power-of-2 number of samples or something?

I see that you already did some work on accelerating pyin and yin, but...to me it's still taking longer than I'd prefer. (Not that I have competing code that's any faster, mind you!)

Could it be parallelized somehow?
What about implementing a PWVD method, e.g. in "A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution", https://arxiv.org/abs/2210.15272?
(I looked for a Python implementation but haven't found one yet.)

Describe alternatives you've considered

Um...not sure but I'm looking around and checking the internet. Maybe there are pretrained deep learning models that can run fast in inference mode on CPUs? ....Not sure. Seems the core of librosa is classic signal processing, run on the CPU.


drscotthawley · 2024-01-17T01:35:35Z

I'm mainly interested in vocals, so maybe I'll check out this method from 2009 or any updates thereof...
https://www.aes.org/e-lib/browse.cfm?elib=15165
Really I'm a newb in this domain!

bmcfee · 2024-01-17T02:03:05Z

> Like, all manner of spectrogram calculations and beat analyses can be completed in a fraction of a second, then the f0 estimation just...slogs.
> Is this...normal?

Kind of, yes. pyin is inherently sequential, since it's doing a Viterbi decode over the frame-wise pitch likelihoods. This creates a pretty severe computational bottleneck that we've done our best to optimize around, but there's a limit to how far you can push it.
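
To illustrate the sequential dependence, here is a generic textbook Viterbi decoder in NumPy. It is only a sketch and not librosa's actual implementation; the function name `viterbi_path` and its argument layout are made up for this example. The point is that frame `t` cannot be scored until frame `t - 1` is done, so the loop over frames cannot be run in parallel, and each step does work that grows with the number of states.

```python
import numpy as np


def viterbi_path(log_obs, log_trans, log_init):
    """Most likely state sequence for a hidden Markov model (illustrative only).

    log_obs:   (n_frames, n_states) frame-wise log-likelihoods
    log_trans: (n_states, n_states), entry [i, j] = log P(state j | state i)
    log_init:  (n_states,) log prior over the first state
    """
    n_frames, n_states = log_obs.shape
    score = np.empty((n_frames, n_states))
    backptr = np.zeros((n_frames, n_states), dtype=int)

    score[0] = log_init + log_obs[0]
    for t in range(1, n_frames):
        # Frame t depends on the finished scores of frame t - 1.
        # candidates[i, j]: best path ending in state i, then moving to j.
        candidates = score[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(candidates, axis=0)
        score[t] = candidates[backptr[t], np.arange(n_states)] + log_obs[t]

    # Trace the best path backwards from the final frame.
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(score[-1])
    for t in range(n_frames - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path


# Toy usage: two states, "sticky" transitions, one ambiguous middle frame.
log_obs = np.log(np.array([[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))
toy_path = viterbi_path(log_obs, log_trans, log_init)
```

In this dense form each frame costs on the order of `n_states ** 2` operations, which is why shrinking the state space pays off.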

> I'd like it to run faster, somehow! Perhaps there are suggestions already that I'm missing? e.g. maybe I need to pad my waveform until it's got a power 2 number of samples or something?

Nah, padding won't help here - modern FFT implementations aren't so sensitive to that, and as I mentioned before, the bottleneck is mostly due to the sequential decode.

What can help here is if you can restrict the range of pitches that you're searching over. Do you know something about the plausible f0 range of your data? The smaller you can make that, the fewer output states the model will need, and the faster it will go.

> Could it be parallelized somehow?

It's already parallelized / vectorized in all the places that made sense to do so.

> Maybe there are pretrained deep learning models that can run fast in inference mode on CPUs?

CREPE is probably what you want, as long as your data is near-enough in distribution to CREPE's training set.

drscotthawley · 2024-01-17T02:09:51Z

Thanks for your thoughtful and detailed response, Brian! I'll try limiting my frequency range and/or try CREPE.
We can close this now or...whatever. Your call.

drscotthawley · 2024-01-17T02:16:43Z

Oh, yeah, restricting the range to span C2 to C5 seems to have cut the execution time significantly (compared to C2-to-C7)! :-) Great.
I'ma close this.
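
A back-of-the-envelope check of why the narrower range helps. This assumes pyin's state count works the way I understand it from the librosa source: pitch bins at `resolution` semitones (default 0.1) between `fmin` and `fmax`, doubled for voiced and unvoiced states. The helper `pyin_state_count` is hypothetical and not part of librosa.

```python
import math


def pyin_state_count(fmin, fmax, resolution=0.1):
    """Estimated number of HMM states (assumed formula, not a librosa API)."""
    bins_per_semitone = math.ceil(1.0 / resolution)
    n_pitch_bins = math.floor(12 * bins_per_semitone * math.log2(fmax / fmin)) + 1
    # One voiced and one unvoiced state per pitch bin.
    return 2 * n_pitch_bins


c2 = 65.40639132514966  # C2 in Hz (A4 = 440 Hz, equal temperament)
narrow = pyin_state_count(c2, c2 * 2**3)  # C2 to C5: three octaves
wide = pyin_state_count(c2, c2 * 2**5)    # C2 to C7: five octaves
```

Under these assumptions the C2-to-C5 search uses 722 states against 1202 for C2-to-C7.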

lostanlen · 2024-01-18T08:32:29Z

Hello @drscotthawley. For future reference, also consider Basic Pitch
https://github.com/spotify/basic-pitch (Apache License)
It is faster than CREPE and has the advantage of accommodating multipitch estimation. It jointly predicts onsets, MIDI notes, and f0 contours.

Presented here by @rabitt https://www.youtube.com/watch?v=sGwiuGvHz0o

drscotthawley · 2024-01-18T08:42:23Z

@lostanlen Thanks but interestingly it's because Basic Pitch wasn't detecting vocal pitches properly for me (cf. spotify/basic-pitch#103) that I'm looking at other options. Basic Pitch would leave large gaps where no f0 was predicted even though there was a continuous, isolated vocal melody present.
I haven't actually compared the results for the different libraries yet, but so far librosa.pyin does predict continuous melody in a plausible way, unlike what I got with Basic Pitch.

bmcfee · 2024-01-18T14:27:08Z

If you've got isolated vocals and just need to track a single f0, pyin ought to be quite good. If you don't have isolated vocals, you'd probably do well enough to run it through ht-demucs first and then track the estimated vocal line. Voicing estimates might suffer a bit, but the f0 should be pretty good.

bmcfee added the "question" label (Issues asking for help doing something) Jan 17, 2024

drscotthawley closed this as completed Jan 17, 2024
