Uncertainties in Vector Fitting fit parameters #1000

Open
neutrinonerd3333 opened this issue Jan 18, 2024 · 11 comments


@neutrinonerd3333

Hi! I currently use the Vector Fitting functionality implemented in scikit-rf for extracting Q-factors of modes in spectra with a lot of background -- the generality of the functional form (arbitrary number of poles) is very convenient for fitting both signal and background features. (In contrast, skrf.qfactor is very sensitive to background, which in our system is very difficult to eliminate.)

Currently, there is no uncertainty estimate for the parameters returned from a Vector Fit. For our work we'd like to bound the error on the fitted Q-factors, so I'm wondering:

  1. How would one estimate this uncertainty? (Is there a straightforward way?)
  2. If this were possible, would it make sense to include this functionality in skrf.vectorFitting? It seems natural to provide uncertainty estimates on the results of a fit.

For (1), one idea I have is to take a Vector Fit result and use it as a starting guess for scipy.optimize.curve_fit (or another scipy.optimize routine) to "refine" the fit returned by Vector Fitting. Of course, one would not expect much refinement, but crucially the optimization would return a covariance matrix for the fit parameters. (This "second fit" step would also allow specifying a custom loss function for the fit -- since the uncertainties on VNA measurements can vary depending on signal strength, for example.)
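For concreteness, here's a rough sketch of what I have in mind (my own code, not anything in scikit-rf). It assumes a one-port trace `s11` measured at frequencies `freqs` (in Hz) and initial values taken from a finished fit, e.g. `vf.poles` and `vf.residues[0]` on a `skrf.VectorFitting` object; it treats every pole as one of a complex-conjugate pair and omits the constant and proportional terms for brevity:

```python
import numpy as np
from scipy.optimize import curve_fit

def pack(poles, residues):
    # flatten complex poles and residues into one real parameter vector
    return np.concatenate([poles.real, poles.imag, residues.real, residues.imag])

def unpack(params, n_poles):
    p = params[:n_poles] + 1j * params[n_poles:2 * n_poles]
    r = params[2 * n_poles:3 * n_poles] + 1j * params[3 * n_poles:]
    return p, r

def model(freqs, *params):
    # rational model built from complex-conjugate pole pairs, returned as
    # stacked [real, imag] so curve_fit sees a real-valued problem
    n_poles = len(params) // 4
    poles, residues = unpack(np.asarray(params), n_poles)
    s = 2j * np.pi * freqs
    terms = (residues / (s[:, None] - poles[None, :])
             + np.conj(residues) / (s[:, None] - np.conj(poles)[None, :]))
    h = terms.sum(axis=1)
    return np.concatenate([h.real, h.imag])

p0 = pack(vf.poles, vf.residues[0])           # initial guess from the vector fit
data = np.concatenate([s11.real, s11.imag])   # stack real and imaginary parts
popt, pcov = curve_fit(model, freqs, data, p0=p0)
perr = np.sqrt(np.diag(pcov))                 # 1-sigma uncertainties on all parameters
```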

I'd love to get feedback on this "second fit" idea and hear any other ideas for estimating uncertainties on Vector Fit params. Since I need to get some such uncertainty estimates for my work anyway, I'd also be interested in contributing my code back, should it be natural to include here.

@Vinc0110
Collaborator

Vinc0110 commented Jan 19, 2024

Hi Tony!
Welcome and congratulations on opening the thousandth issue on this project 🥇

Please clarify what you mean by uncertainty. I was initially thinking this is about random deviations between subsequent measurements (due to noise), resulting in deviations of the fitted models. But I'm assuming it is about the accuracy of the individual fits, i.e. the error between the fit and the data. If that's the case, are you aware of the function get_rms_error()?

Apparently, the various types of error calculations are often correlated or even equal:
https://stackoverflow.com/questions/40187517/getting-covariance-matrix-of-fitted-parameters-from-scipy-optimize-least-squares

In the final step of the vector fitting process in scikit-rf, the model residues are fitted using numpy.linalg.lstsq(), which also returns the sums of squared residuals. These are currently ignored in the code, but we could make them accessible for the user.

```python
# solve least squares and obtain results as stack of real part vector
# and imaginary part vector
x, residuals, rank, singular_vals = np.linalg.lstsq(
    np.vstack((A.real, A.imag)),
    np.hstack((freq_responses.real, freq_responses.imag)).transpose(),
    rcond=None)
```

Is this already what you need? It would also be possible to add new error functions other than rms error.

Regards,
Vincent

@neutrinonerd3333
Author

Hi Vincent — thanks for the thoughtful reply. I spent some time in the intervening weeks thinking about this (and reading up more on the vector fitting algorithm itself). With respect to your questions:

> Please clarify what you mean by uncertainty. I was initially thinking this is about random deviations between subsequent measurements (due to noise), resulting in deviations of the fitted models. But I'm assuming it is about the accuracy of the individual fits, i.e. the error between the fit and the data. If that's the case, are you aware of the function get_rms_error()?

To clarify, I am asking about the first thing you mentioned: I'm looking to quantify the level of variation I could expect in the fit parameters (poles, residues, …) were I to run the exact same fit on a new measurement of the same system.

> In the final step of the vector fitting process in scikit-rf, the model residues are fitted using numpy.linalg.lstsq(), which also returns the sums of squared residuals. These are currently ignored in the code, but we could make them accessible for the user.

This was an interesting lead, but after reading up on the vector fitting algorithm in more detail, I don't think this will provide quite what I need.

The interesting part: Since the residue fit procedure is just ordinary least-squares ($A\mathbf{x} = \mathbf{b} + \epsilon$), it would be straightforward to propagate uncertainty in the VNA measurements (in $\mathbf{b}$) into the estimator for the residues $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$, since everything is linear.
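As a sketch of what that linear propagation would look like (my notation, nothing in scikit-rf), assuming the stacked design matrix $A$ and a covariance matrix $\Sigma$ for the measurements $\mathbf{b}$ are in hand:

```python
import numpy as np

def ols_covariance(A, Sigma):
    # covariance of x_hat = inv(A^T A) A^T b when cov(b) = Sigma;
    # with i.i.d. noise (Sigma = sigma**2 * I) this reduces to sigma**2 * inv(A^T A)
    G = np.linalg.inv(A.T @ A) @ A.T   # the linear map taking b to x_hat
    return G @ Sigma @ G.T
```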

But this only gives the uncertainty on the residues, and only under the assumption that the poles are fixed/certain (at the locations given by the pole relocation procedure, by design). To get uncertainties on the poles (which I care about more anyway), we would need to propagate uncertainties through a matrix diagonalization (Appendix B of Gustavsen and Semlyen (1999)), which seems difficult. (And this pole uncertainty would also need to go into the $A$ matrix, where it would propagate to the uncertainty in $\hat{\mathbf{x}}$.)

Instead, I did have success with the "second fit" idea I mentioned before — I took the vector fitting results and used them as an initial guess for a nonlinear least squares curve fitting procedure (scipy.optimize.curve_fit), and got fit results and uncertainties that look reasonable to me so far. This resolves my proximate issue, but I'd be happy to discuss what I'm doing if there's any interest (though at a glance it does seem somewhat out of scope for inclusion in scikit-rf).

@Vinc0110
Collaborator

Vinc0110 commented Feb 11, 2024

OK, thanks for clarifying. It would certainly be cool if we could extend the existing code with uncertainty indicators for the fitted poles and residues. I must admit that I'm not a huge expert on the mathematical details, but my linear algebra skills were sufficient to take the algorithms from Gustavsen's papers and implement them in Python.

You mentioned the difficulty of propagating uncertainties through the eigenvalue computation step. I'm afraid it gets worse: there is an additional QR decomposition before the eigenvalue computation, used to reduce the size of the coefficient matrix for faster computation [1].
However, Gustavsen commented on the accuracy of the pole estimation, which can be analyzed via the singular values of the coefficient matrix; see section 6 in [2]. We already have those singular values available from numpy.linalg.lstsq(), so maybe that helps:

```python
x, residuals, rank, singular_vals = np.linalg.lstsq(A_fast, b, rcond=None)
```
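For instance (just a sketch; the current code does not expose this), those singular values give a quick conditioning diagnostic for the pole-identification system:

```python
# numpy.linalg.lstsq returns the singular values in decreasing order,
# so the condition number of the coefficient matrix is simply:
cond = singular_vals[0] / singular_vals[-1]
```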

Another idea: in [3], they propose weighting the individual data samples by their inverse variance, $w(s_k) = 1 / \sigma^2(s_k)$, which improves both the rate of convergence and the accuracy. I have already implemented weighting of the responses based on their 2-norm:

```python
# responses will be weighted according to their norm;
# alternative: equal weights with weight_response = 1.0
# or anti-proportional weights with weight_response = 1 / np.linalg.norm(freq_response)
weights_responses = np.linalg.norm(freq_responses, axis=1)
# weights_responses = np.ones(self.network.nports ** 2)
# weights_responses = 10 / np.exp(np.mean(np.log(np.abs(freq_responses)), axis=1))

# weight of extra equation to avoid trivial solution
weight_extra = np.linalg.norm(weights_responses[:, None] * freq_responses) / n_samples

# weights w are applied directly to the samples, which get squared during
# least-squares fitting; hence sqrt(w)
weights_responses = np.sqrt(weights_responses)
weight_extra = np.sqrt(weight_extra)
```

As you can see, I have also played around with different types of weights. It would be possible to have the user provide weights as a parameter for vector_fit().
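A hypothetical user-facing version (the weights parameter does not exist in the current API) could look like this, with the variance weighting of [3] supplied from the user's own noise estimate:

```python
# sigma: user-side estimate of the per-sample noise standard deviation,
# e.g. from repeated sweeps; shape (n_samples,). Hypothetical input.
weights = 1.0 / sigma**2  # variance weighting as proposed in [3]
vf.vector_fit(n_poles_real=0, n_poles_cmplx=4, weights=weights)  # 'weights' is hypothetical
```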

Finally, I'm also wondering why it is at all required to have uncertainty estimates on the poles. Aren't the poles just providing some sort of basis function, similar to providing a polynomial to be fitted to the data? Sure, the pole locations will be (slightly) different if you repeat the fit with new data of the same electrical network, due to noise. Still, given a reasonable set of poles placed at roughly the "right" positions, I would expect the residues to have a much stronger effect on the overall uncertainty. But I might be wrong here.

[1] D. Deschrijver, M. Mrozowski, T. Dhaene, D. De Zutter, "Macromodeling of Multiport Systems Using a Fast Implementation of the Vector Fitting Method", IEEE Microwave and Wireless Components Letters, vol. 18, no. 6, pp. 383-385, June 2008, DOI: https://doi.org/10.1109/LMWC.2008.922585

[2] B. Gustavsen, A. Semlyen, “Rational Approximation of Frequency Domain Responses by Vector Fitting”, IEEE Transactions on Power Delivery, vol. 14, no. 3, pp. 1052-1061, July 1999, DOI: https://doi.org/10.1109/61.772353

[3] F. Ferranti, Y. Rolain, L. Knockaert and T. Dhaene, "Variance Weighted Vector Fitting for Noisy Frequency Responses," in IEEE Microwave and Wireless Components Letters, vol. 20, no. 4, pp. 187-189, April 2010, DOI: https://doi.org/10.1109/LMWC.2010.2042546

Vinc0110 added the Feature Request label Feb 11, 2024
@Vinc0110
Collaborator

I'm wondering what kind of results you get from scipy.optimize.curve_fit(). Is there any significant change in the model parameters? Or does it more or less return the same parameters you got from vector_fit(), only with the extra uncertainty information? Out of curiosity, can you maybe share some of the results you get (or even the frequency responses that you are fitting)?

In case there is not much change in the poles and the variation mostly happens in the residues, what if you fit the poles only once for your network and then always work with the same set of poles, without having them re-fitted to the new data? For this to work, we would need to split the procedures in vector_fit() into something like fit_poles() and fit_residues().
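With the poles held fixed, the residue fit is a single linear least-squares solve. A sketch of such a hypothetical fit_residues() (assuming complex-conjugate pole pairs only and omitting constant/proportional terms):

```python
import numpy as np

def fit_residues(freqs, response, poles):
    # build the real-valued design matrix: for each conjugate pole pair,
    # one column multiplies Re(r) and one multiplies Im(r)
    s = 2j * np.pi * freqs
    cols = []
    for p in poles:
        cols.append(1.0 / (s - p) + 1.0 / (s - np.conj(p)))  # coefficient of Re(r)
        cols.append(1j / (s - p) - 1j / (s - np.conj(p)))    # coefficient of Im(r)
    A = np.stack(cols, axis=1)
    # stack real and imaginary parts so the unknowns are real
    A_ri = np.vstack([A.real, A.imag])
    b_ri = np.concatenate([response.real, response.imag])
    x, *_ = np.linalg.lstsq(A_ri, b_ri, rcond=None)
    return x[0::2] + 1j * x[1::2]  # reassemble complex residues
```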

@Vinc0110
Collaborator

Another idea: what about brute force? You take a number of $N$ individual measurements of your network to calculate statistical values such as mean and standard deviation. Instead of trying to map those values to the fitting parameters, what if you vector fit all of the individual measurements and then calculate mean and standard deviation of the fit parameters (poles, residues, ...)? Depending on the number of poles and the number of frequency samples, the computational load might still be acceptable.
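A sketch of that brute-force approach (assuming networks is a list of skrf.Network objects holding the $N$ repeated sweeps; the pole counts are placeholders):

```python
import numpy as np
import skrf

all_poles = []
for ntwk in networks:
    vf = skrf.VectorFitting(ntwk)
    vf.vector_fit(n_poles_real=0, n_poles_cmplx=4)
    # sort to align poles across fits; a naive matching that may need
    # more care if the poles move a lot between sweeps
    all_poles.append(np.sort_complex(vf.poles))

all_poles = np.array(all_poles)
pole_mean = all_poles.mean(axis=0)
pole_std_im = all_poles.imag.std(axis=0)  # spread of resonance frequencies
pole_std_re = all_poles.real.std(axis=0)  # spread of damping terms (linewidths)
```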

@neutrinonerd3333
Author

> Finally, I'm also wondering why it is at all required to have uncertainty estimates on the poles. Aren't the poles just providing some sort of basis function, similar to providing a polynomial to be fitted to the data? Sure, the pole locations will be (slightly) different if you repeat the fit with new data of the same electrical network, due to noise. Still, given a reasonable set of poles placed at roughly the "right" positions, I would expect the residues to have a much stronger effect on the overall uncertainty. But I might be wrong here.

My science goal here is to extract the $Q$ of various resonance features from VNA traces, and to put error bars on those $Q$ values. The original Gustavsen and Semlyen paper frames the fitting procedure as a way to approximate a full frequency response, with the fitted model as an end goal. I care about having an accurate model too, but only so that I can get a reliable estimate of some $Q$ — in a fitted model, the pole corresponding to the resonance of interest is ultimately all I use for science. (I'm not using skrf.qfactor or other similar routines because they don't seem to handle the sorts of backgrounds we have in our data well. The extra flexibility we get from Vector Fitting has been helpful for this.)

> However, Gustavsen commented on the accuracy of the pole estimation, which can be analyzed via the singular values of the coefficient matrix; see section 6 in [2]. We already have those singular values available from numpy.linalg.lstsq(), so maybe that helps…

This is interesting — is there a way to use the singular values to quantify the uncertainty on the fitted poles? (I don't have great intuition for singular value decomposition…)

> Another idea: in [3], they propose weighting the individual data samples by their inverse variance…

Nice! This is exactly what the sigma parameter of scipy.optimize.curve_fit lets you do as well. Restricted only to the problem of fitting the residues given fixed poles, I think the two methods would be equivalent.

> I'm wondering what kind of results you get from scipy.optimize.curve_fit(). Is there any significant change in the model parameters? Or does it more or less return the same parameters you got from vector_fit(), only with the extra uncertainty information?

The fitted parameters change very little, which is a good sign (there's barely any visible change when plotting the model responses). The extra uncertainty information looks reasonable as well and agrees with intuition. (For numerical stability, I had to work in frequency units where the average network frequency is 1, as the current vector_fit implementation does; with raw frequencies in Hz, I get unreasonably small uncertainty estimates. I'm considering further improving this by working in units where the frequency span is 1 and shifting pole imaginary parts so that the average network frequency sits at 0.)
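(Schematically, with refine() standing in for the whole curve_fit refinement step above (a hypothetical name), the rescaling works out to:)

```python
norm = np.mean(freqs)                  # mean network frequency
popt_n, pcov_n = refine(freqs / norm)  # hypothetical: the refinement fit in normalized units
popt = popt_n * norm                   # poles and residues both carry frequency units,
pcov = pcov_n * norm**2                # so best-fit values and covariance rescale accordingly
```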

> Out of curiosity, can you maybe share some of the results you get (or even the frequency responses that you are fitting)?

Yes! I'm making some plots and will get back to you on this after I'm done with them.

> In case there is not much change in the poles and the variation mostly happens in the residues, what if you fit the poles only once for your network and then always work with the same set of poles, without having them re-fitted to the new data? For this to work, we would need to split the procedures in vector_fit() into something like fit_poles() and fit_residues().

Could you elaborate? I'm not sure how this would give an estimate of the pole uncertainty.

> Another idea: what about brute force? You take a number of individual measurements of your network to calculate statistical values such as mean and standard deviation. Instead of trying to map those values to the fitting parameters, what if you vector fit all of the individual measurements and then calculate mean and standard deviation of the fit parameters (poles, residues, ...)? Depending on the number of poles and the number of frequency samples, the computational load might still be acceptable.

Estimating uncertainties Monte Carlo–style is an option, but would be difficult for us logistically: our apparatus is cryogenic, and we share the cryostat with others, so we have limited time to take data.

@neutrinonerd3333
Author

Here's an example of a Vector Fit together with a refined fit. (The panels are just different views of the same data.)

[Figure: vector fit together with the refined fit.]

The refined fit looks almost the same as the original fit. There is a slight improvement in rms error, partly due to shifts in the pole locations and partly due to the assumption of i.i.d. errors in my fit, which makes the optimization equivalent to minimizing rms error. This contrasts with the implicit assumption $\mathrm{var}(S(\omega)) \propto \frac{1}{S(\omega)}$ made by weighting observations by $S(\omega)$ as you mentioned above, which corresponds instead to minimizing $\chi^2 = \sum_k \frac{(S_\text{fit}(\omega_k) - S(\omega_k))^2}{\mathrm{var}\, S(\omega_k)}$.

The four fitted complex poles $p = \sigma + i\omega$ correspond to resonances with (angular) frequencies $\omega$ and linewidths (full-width half-max in the absence of a background) $\gamma = -2\sigma$ as follows:

| Frequency (GHz) | FWHM |
| --- | --- |
| 83.965(15) | 40(60) MHz |
| 83.9669(4) | 2.7(5) MHz |
| 83.9656211(10) | 50.6(1.9) kHz |
| 83.96614099(24) | 26.6(5) kHz |

The first two just help fit the background (so the large uncertainties in their linewidths are understandable), while the other two correspond to the obvious resonance features in the plot (and have correspondingly tight uncertainties in frequency and linewidth).

@Vinc0110
Collaborator

> This is interesting — is there a way to use the singular values to quantify the uncertainty on the fitted poles? (I don't have great intuition for singular value decomposition…)

Apparently, one can do all sorts of cool things with SVD. I found these lecture notes online, but they did not help me very much. It became once again obvious that I'm not a mathematician. Maybe they help you. See page 11, for example:
https://cims.nyu.edu/~cfgranda/pages/MTDS_spring19/notes/svd.pdf

I'm afraid I'm not very helpful here, but one last question:
What I still don't understand is why you cannot measure multiple sweeps with the network analyzer, e.g. $N=100$ or $N=1000$. Is the sweep time really so long that you cannot afford the time in the cryostat?
I was calling this the brute force approach before, but isn't that the default (and required?) method for error estimations in any measurement? Is it enough to analyze and fit one single noisy frequency sweep and trust the output (popt, pcov) from curve_fit()? What would you provide for sigma in curve_fit() if you only measured one sweep?

Again, with $N \gg 1$, you could easily vector fit every single sweep and calculate the standard deviation on the $N$ solutions for the poles. Obviously, the vector fitting can be done at a later time, just measure the frequency sweeps and leave the cryostat.

@neutrinonerd3333
Author

neutrinonerd3333 commented Mar 11, 2024

I think I agree with you now: you are right that the additional sweep time isn't infeasible. After ruminating a bit more carefully on this, I think the problem is a combination of the additional sweep time required (1, 2), the technical effort required to integrate this with our existing sequences (3, 4), and the time we would need to retake old measurements (5, 6).

  1. The sweep time cost is still substantial: we care about 10–15 modes at a time, and the individual sweep times can be long (we have low SNR and take measurements with IF bandwidth between 3 and 30 Hz).
  2. Our cooldowns are already long by nature: we don't know a priori where to expect the modes we want, so we take broadband data for 2.5 days and then manually analyze it to find our modes. The extra time from additional sweeps doesn't feel like a good use of the remaining limited fridge time, given that our existing data should already contain uncertainty information (e.g. sigma for curve_fit() can come from the vector-fit residuals, which look pretty Gaussian; see the sketch after this list).
  3. The added complexity to our cooldown sequence costs additional (human) time to implement and debug. (Not a problem asymptotically, but given very finite fridge time it's significant.)
  4. It's technically challenging to maintain the fridge at temperatures between 1 and 4 K, where we happen to want data often. So we start the (relatively quick) 4-to-1 cooldown sequence and take repeated measurements of our modes during this process. For technical cryostat reasons it would be difficult to fully automate this process.
  5. We have past data without $N\gg 1$ repeated measurements. We could go back and retake those measurements, but that would require extra effort to reconfigure our setup to old configurations and impose an opportunity cost: with the limited number of cooldowns we get (a few annually), there are other measurements we'd rather make.
  6. Sometimes uninteresting resonances in the broadband scan turn out to be interesting in retrospect. Then we run into problem (5) again.
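As referenced in item 2, here's a minimal sketch of estimating sigma from a single sweep's residuals (assuming s11 is the measured trace and model_response is the vector-fit model evaluated at the same frequencies):

```python
import numpy as np

residual = s11 - model_response  # complex residuals: data minus fitted model
# pooled standard deviation of real/imag parts, assuming roughly i.i.d. Gaussian noise
sigma_est = np.std(np.concatenate([residual.real, residual.imag]))
```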

@Vinc0110
Collaborator

Thanks for sharing the details, very interesting stuff! I did not expect such a long sweep time, but only a few Hz of bandwidth is pretty extreme... It would be cool if you could keep us updated how the new approach works out.

Regarding the vector fitting code, I'm not sure if there is much we can do about your measurement challenges. Maybe we can keep this issue open as a reminder for the feature request of user-specific sample weighting.

@neutrinonerd3333
Author

Your questions were still helpful — I actually had to think hard about why exactly we take our measurements the way we do now, and I think the answer is just that our circumstances are rather unique. (Though I'm sure there are optimizations to be made to our process if we thought hard enough about it.)

I think I have a solution that works well enough for now (the second fit), but since it seems out of scope for scikit-rf, I don't personally have any outstanding issues here. The user-specific weighting seems easy enough to implement, though.
