
AbstractGP and Kriging perform badly due to lack of hyperparameter optimisation #328

Open
st-- opened this issue Mar 29, 2022 · 4 comments

Comments

st-- (Contributor) commented Mar 29, 2022

If we can't improve this outright, it would at least be good to make it clear in the documentation, as the current state is quite confusing if you don't dive into the code and realize what's missing (e.g. see #251). This might be partially resolved by #224, but to be competitive with other packages such as mogp-emulator a lot more work is needed; the package doesn't work out of the box. (E.g. beyond hyperparameter optimisation, careful initialisation of the hyperparameters and priors on the parameters would also be required.)
Happy to add a more detailed explanation if required.

vikram-s-narayan (Contributor):

Yes. I will add this info to the documentation. Thank you!

vikram-s-narayan (Contributor) commented Apr 8, 2022

@st--

I'm planning on adding the following example to the documentation.

```julia
# This is a starter example for how to find optimal initial hyperparameters
# via random search with Hyperopt.jl.

using Surrogates
using AbstractGPs
using Hyperopt

sp(x) = sum(x .^ 2)
n_samples = 50
lower_bound = [-5.12, -5.12]
upper_bound = [5.12, 5.12]

xys = sample(n_samples, lower_bound, upper_bound, SobolSample())
zs = sp.(xys)
# Only one validation point is taken in this example; more points can give better results.
true_val = sp((0.0, 0.0))

function surrogate_err_min(kernelType, Σcandidate)
    candidate_gp_surrogate = AbstractGPSurrogate(xys, zs, gp = kernelType, Σy = Σcandidate)
    return candidate_gp_surrogate((0.0, 0.0)) - true_val
end

ho = @hyperopt for i = 100,
        sampler = RandomSampler(),
        a = [GP(SqExponentialKernel()), GP(Matern32Kernel()), GP(Matern52Kernel())],
        b = LinRange(0, 1, 100)
    @show surrogate_err_min(a, b)
end
```

Is this in line with your suggestion?

st-- (Contributor, issue author) commented Apr 14, 2022

Hi @vikram-s-narayan, just throwing Hyperopt.jl at it is definitely better than not optimising at all, but if I understand your example correctly, it makes a bunch of limiting assumptions:

  • it only optimises the noise variance, not the kernel hyperparameters (e.g. signal variance, lengthscale)
  • it treats the GP model the same way you would e.g. a neural network, where you have no guarantees on anything, and simply minimises the error on some validation points (NB: shouldn't your surrogate_err_min return MAE or RMSE (always >= 0) instead of the raw difference (which can be arbitrarily negative)? see the sketch after this list)
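
For illustration, a minimal sketch of that second point, reusing xys, zs, and true_val from the example above and keeping the same AbstractGPSurrogate call (only the return value changes):

```julia
# Sketch: return a non-negative error so the random search cannot "win"
# by driving the prediction arbitrarily far below the true value.
# Reuses xys, zs, and true_val from the example above.
function surrogate_err_min(kernelType, Σcandidate)
    candidate_gp_surrogate = AbstractGPSurrogate(xys, zs, gp = kernelType, Σy = Σcandidate)
    return abs(candidate_gp_surrogate((0.0, 0.0)) - true_val)  # absolute error at the single validation point
end
```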

For GPs as a surrogate model, it'd be great to actually treat them properly: e.g. you can optimise all hyperparameters using the marginal likelihood as the objective, which doesn't require any validation points at all; it works on the training points themselves! See e.g. https://juliagaussianprocesses.github.io/AbstractGPs.jl/stable/examples/1-mauna-loa/#Hyperparameter-Optimization
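
A rough sketch of what that could look like here (assuming the same xys/zs training data from the example above, and calling AbstractGPs.jl and Optim.jl directly rather than going through the Surrogates.jl wrapper; the kernel choice and log-parametrisation are just for illustration):

```julia
# Sketch: fit lengthscale, signal variance, and noise variance by maximising
# the log marginal likelihood on the training data alone, in the spirit of
# the linked AbstractGPs "Mauna Loa" example.
using AbstractGPs
using Optim

# Reuse the training data from the example above; AbstractGPs expects a
# vector of input vectors rather than a vector of tuples.
x_train = [collect(p) for p in xys]
y_train = zs

# theta = log.([lengthscale, signal variance, noise variance]);
# the log-parametrisation keeps all three positive during optimisation.
function negative_log_marginal_likelihood(theta)
    lengthscale, signal_var, noise_var = exp.(theta)
    kernel = signal_var * with_lengthscale(SqExponentialKernel(), lengthscale)
    f = GP(kernel)
    return -logpdf(f(x_train, noise_var), y_train)
end

theta0 = zeros(3)  # start from lengthscale = signal_var = noise_var = 1
result = Optim.optimize(negative_log_marginal_likelihood, theta0, NelderMead())
lengthscale_opt, signal_var_opt, noise_var_opt = exp.(Optim.minimizer(result))
```

The optimised values could then be passed back into the surrogate (or, better, the surrogate could run such an optimisation internally when it is constructed).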

(For more background reading, see these great tutorials: https://distill.pub/2019/visual-exploration-gaussian-processes/ and http://tinyurl.com/guide2gp)

st-- (Contributor, issue author) commented Apr 14, 2022

It'd be good to make clear to readers/users what the limitations/assumptions of the examples are, so when they try it out they know that bad performance might be due to these limitations of your implementation, rather than due to any issues with the underlying method. (Then there's an incentive to improve the implementation, instead of just walking away from it thinking it's useless!)
