-
Notifications
You must be signed in to change notification settings - Fork 75
Transforms
Currently, the following basic digital transforms are implemented:
- Fast Fourier Transform (FFT)
- Short-Time Fourier Transform (STFT)
- Discrete Cosine Transform (type I-IV)
- Cepstral Transform
- Hilbert Transform
- Goertzel algorithm for evaluation of one particular FFT term
Note, the implementation of algorithms is far from optimized. Nevertheless, even on humble modern hardware the overall speed of transforms allows performing a lot of useful things without significant or noticeable delays. And who says that it's the last version of NWaves? 😃 There's always a place for optimizations, and corresponding PRs are very welcome.
The FFT code is conventional decimation radix-2 FFT algorithm (O(Nlog(N))). Hence, direct/inverse FFT requires the length of input data to be a power of 2, otherwise exception will be thrown.
Implementation of DCT I-IV transformers is straightforward (O(N^2)), since so far it's used only in MFCC-like algorithms where small array sizes are involved.
Other transforms are based on FFT, so their performance is conditioned by the performance of FFT.
For each transform there's a corresponding transformer object. Each transformer object has Direct()
and Inverse()
methods.
The only parameter in Fft
constructor is the size of FFT (that should be power of 2, as mentioned above). Both direct and inverse transforms are performed in-place on arrays of real and imaginary parts of samples. If the length of data differs from specified size of Fft
object, exception will be thrown.
var fft = new Fft(1024);
float[] real = signal.First(1024).Samples;
float[] imag = new float [1024];
// in-place FFT
fft.Direct(real, imag);
// ...do something with real and imaginary parts of the spectrum...
// in-place IFFT
fft.Inverse(real, imag);
In some cases it's better to work directly with real
and imag
arrays (see Phase Vocoder code). But often what we need is to compute some real-valued post-processed results of complex FFT, like magnitude spectrum or power spectrum:
var magnitudeSpectrum =
fft.MagnitudeSpectrum(signal[1000, 2024]);
var powerSpectrum =
fft.PowerSpectrum(signal.First(1024), normalize: false);
var logPowerSpectrum =
fft.PowerSpectrum(signal.Last(1024))
.Samples
.Select(s => Scale.ToDecibel(s))
.ToArray();
The second parameter defines whether the spectrum should be normalized by the size of FFT or not. By default power spectrum is normalized, and magnitude spectrum is not normalized.
Also, methods MagnitudeSpectrum
and PowerSpectrum
will automatically zero-pad or truncate samples to fit FFT size (so unlike Direct()
and Inverse()
methods there will be no exceptions in this case).
The magnitude/power spectrum has length (FFT size) / 2 + 1, i.e. it includes 0th coefficient (DC component) and does not contain the symmetric part of the spectrum.
Finally, methods MagnitudeSpectrum
and PowerSpectrum
have overloaded versions for float[]
arrays designed for efficient sequential spectral analysis of a signal (discussed below).
Fft.Shift
is quite conventional routine in DSP. It shifts the zero-frequency component of spectrum to the center of the spectrum. It can work with any even-sized array:
float[] array = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
Fft.Shift(array);
// array = { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 }
var stft = new Stft(800, 512, WindowTypes.Hamming, 1024);
var timefreq = stft.Direct(signal);
var reconstructed = stft.Inverse(timefreq);
The first parameter in the constructor is the window size (it may be less than FFT size and doesn't have to be a power of 2). The second parameter is the hop size. The third parameter is the type of a window function. The fourth parameter is the size of FFT that is calculated by default as the nearest power of 2 to window size.
The result of direct STFT has type List<Tuple<float[], float[]>>
, i.e. collection of spectra. Each spectrum is represented with two arrays containing real and imaginary parts of spectral components. This collection contains all necessary data for inverse STFT. Often this information is redundant, and we need to evaluate only the so called spectrogram of a signal which is essentially a sequence of power spectra. In this case we'd use the Spectrogram
method that returns simply the List<float[]>
:
var stft = new Stft(512, 256);
var spectrogram = stft.Spectrogram(signal);
13 well-known windowing functions are implemented in NWaves:
- Rectangular
- Triangular
- Hamming
- Blackman
- Hann
- Gaussian
- Kaiser
- Kaiser-Bessel derived
- Bartlett-Hann
- Lanczos
- Power of sine
- Flat-top
- Liftering
Window samples can be generated one of two ways:
// 1)
var blackmanWindow = Window.Blackman(512);
// 2)
var hannWindow = Window.OfType(WindowTypes.Hann, 512);
WindowTypes
is enum for window functions. Many algorithms in NWaves can accept parameters of this type.
All windows require length
parameter, and some of them have additional parameters:
var kaiserWindow1 = Window.Kaiser(512); // by default, alpha = 12
var kaiserWindow2 = Window.Kaiser(512, alpha: 9);
var kaiserWindow3 = Window.OfType(WindowTypes.Kaiser, 512, 15);
var kbdWindow1 = Window.Kbd(512); // by default, alpha = 4
var kbdWindow2 = Window.Kbd(512, alpha: 9);
var kbdWindow3 = Window.OfType(WindowTypes.Kbd, 512, 15);
var sineWindow1 = Window.PowerOfSine(512); // by default, alpha = 1.5
var sineWindow2 = Window.PowerOfSine(512, alpha: 0.5);
var sineWindow3 = Window.OfType(WindowTypes.PowerOfSine, 512, 2);
var lifterWindow1 = Window.Liftering(512); // by default, l = 22
var lifterWindow2 = Window.Liftering(512, l: 16);
var lifterWindow3 = Window.OfType(WindowTypes.Liftering, 512, 28);
There are several extension methods for applying windows:
var signal = new DiscreteSignal(8000, 128, 1f);
var samples = new float[128];
var window = Window.Blackman(128);
signal.ApplyWindow(window); // 1) window samples are pre-computed
samples.ApplyWindow(samples); // 2) window samples are pre-computed
signal.ApplyWindow(WindowTypes.Blackman); // 3) window samples will be created each time
samples.ApplyWindow(WindowTypes.Blackman); // 4) window samples will be created each time
var input = new float[20]; // fill it with values
var output = new float[12];
var dct1 = new Dct1(20, 12); // type I
dct1.Direct(input, output);
dct1.Inverse(output, input);
var dct2 = new Dct2(20, 12); // type II
dct2.Direct(input, output);
dct2.Inverse(output, input);
// normalized DCT-II (used in PNCC algorithm):
dct2.DirectN(input, output)
var dct3 = new Dct3(20, 12); // type III
dct3.Direct(input, output);
dct3.Inverse(output, input);
var dct4 = new Dct4(20, 12); // type IV
dct4.Direct(input, output);
dct4.Inverse(output, input);
var ct = new CepstralTransform(20, fftSize: 512);
var cepstrum = ct.Direct(signal);
var powerCepstrum = ct.Direct(signal, power: true);
The first parameter in the constructor of CepstralTransform
is the number of cepstral coefficients. The second parameter is the size of FFT that will be used to compute cepstrum.
Cepstrum is calculated according to formula:
Power cepstrum is calculated according to formula:
var ht = new HilbertTransform(1024);
var result = ht.Direct(doubleSamples);
HilbertTransform
class also provides method for computing complex analytic signal. Thus, previous line is equivalent to:
var result = ht.AnalyticSignal(doubleSamples).Imag;
By default HilbertTransformer works with double precision. This code is for floats:
var ht = new HilbertTransform(1024, doublePrecision: false);
var result = ht.AnalyticSignal(floatSamples).Item2;
Example of computing only one term (#2) of 8-size FFT using Goertzel algorithm:
float[] array = { 1, 2, 3, 4, 5, 6, 7, 8 };
Complex secondFftTerm = new Goertzel(8).Direct(array, 2);
In previous cases the result of each transform was a newly created object of the DiscreteSignal
class. If the sequence of blocks must be processed then it's better to work with reusable arrays in memory (all output and intermediate results will be stored in reusable arrays, i.e. there will be no unnecessary memory allocations):
var spectrum = new float[1024];
var cepstrum = new float[20];
fft.PowerSpectrum(signal[1000, 2024].Samples, spectrum);
// do something with spectrum
fft.PowerSpectrum(signal[2024, 3048].Samples, spectrum);
// do something with spectrum
fft.PowerSpectrum(signal[3048, 4072].Samples, spectrum);
// do something with spectrum
ct.Direct(signal[5000, 5512].Samples, cepstrum)
// do something with cepstrum
//...
Hence, the following code is the best for spectral analysis and widely used in many parts of NWaves library:
// for extension method FastCopyTo()
using NWaves.Utils;
var fft = new Fft(512);
var window = Window.Blackman(512);
var spectrum = new float[512];
var block = new float[512];
// loop:
for (var pos = 0; pos + fftSize < signal.Length; pos += hopSize)
{
// copy 512 samples to 'block' starting from index 'pos'
signal.Samples.FastCopyTo(block, 512, pos);
block.ApplyWindow(window);
fft.MagnitudeSpectrum(block, spectrum);
// post-process/visualize spectrum
}
In filter design and analysis we're dealing with doubles (64 bit). Therefore, 64-bit version of FFT was added to library. It can be used just like its 32-bit counterpart, however, only Direct()
and Inverse
methods are available:
// for extension method ToDoubles()
using NWaves.Utils;
var fft = new Fft64(512);
double[] real = signal.First(512).Samples.ToDoubles();
double[] imag = new double [512];
// in-place FFT
fft.Direct(real, imag);
// ...do something with real and imaginary parts of the spectrum...
// in-place IFFT
fft.Inverse(real, imag);