FR: getParamsFromMethods - Could this be annotations? #47

Open · salamanders opened this issue Oct 14, 2016 · 8 comments

@salamanders
This seems perfect for annotations:

@TunableParameter(
  name="RBFKernel_Sigma",
  minValue=0.001,
  maxValue=2_000,
  startValue=1832,
  tunePriority=TunableParameterPriority.HIGH
)
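For reference, the annotation type behind that usage might be declared like this. Everything below is a hypothetical sketch (no such type exists in the library), with MEDIUM as the default priority to match Add-on 1 further down:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical priority levels for the tuner
    enum TunableParameterPriority { LOW, MEDIUM, HIGH }

    // Hypothetical annotation matching the proposed usage above
    @Retention(RetentionPolicy.RUNTIME) // visible via reflection at tuning time
    @Target({ElementType.METHOD, ElementType.FIELD})
    @interface TunableParameter {
        String name();
        double minValue();
        double maxValue();
        double startValue();
        TunableParameterPriority tunePriority() default TunableParameterPriority.MEDIUM;
    }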
@EdwardRaff (Owner)

I'm not sure annotations make sense, as sometimes the min/max values will depend on the data. The RBF Kernel is actually a good example of that if you look at the code (there is a guessSigma method that returns a distribution to search over for the value of Sigma).
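For concreteness, a sketch of that data dependence; the imports assume JSAT's actual packages and the static guessSigma(DataSet) referenced above, while the wrapper class is purely illustrative:

    import jsat.DataSet;
    import jsat.distributions.Distribution;
    import jsat.distributions.kernels.RBFKernel;

    public class SigmaGuess {
        // The sensible search range for sigma is a function of the data
        // (it depends on the scale of pairwise distances), so no fixed
        // min/max baked into an annotation could capture it.
        public static Distribution sigmaSearchRange(DataSet data) {
            return RBFKernel.guessSigma(data);
        }
    }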

I do like the idea of a tuning priority though. I think it will take some thought on how that should be integrated. Maybe just an extra parameter when auto-populating tunable values?

@salamanders (Author)

I was over-eager with the min/max, but doesn't the RandomSearch need some sort of constraints?

My high-level FR 0: the code in getParamsFromMethods(final Object obj, String prefix) uses string parsing of the method names, which seems more fragile than expressly tagging tunable parameters with an annotation, and makes it harder to search the code for tunable algorithms. This came after I got all excited about auto-tune, ran into "This model doesn't seem to have any easy to tune parameters" over and over, and then tried to figure out which algorithms were tunable.
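For context, the name-matching approach being described works roughly like the following. This is a simplified sketch of the general pattern, not JSAT's actual implementation:

    import java.lang.reflect.Method;
    import java.util.ArrayList;
    import java.util.List;

    public class MethodNameScan {
        // Pair up setFoo(x)/getFoo() methods by string-matching their
        // names; a typo or rename silently drops a tunable parameter,
        // which is the fragility the FR is about.
        public static List<String> findTunableNames(Object obj) {
            List<String> names = new ArrayList<>();
            Class<?> c = obj.getClass();
            for (Method setter : c.getMethods()) {
                String n = setter.getName();
                if (n.startsWith("set") && setter.getParameterCount() == 1) {
                    try {
                        c.getMethod("get" + n.substring(3)); // matching getter?
                        names.add(n.substring(3));
                    } catch (NoSuchMethodException ignored) {
                        // no getter -> not treated as a tunable parameter
                    }
                }
            }
            return names;
        }
    }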

Add-on 1: If a class is declaring parameters as tunable, the docs often say "you should bother adjusting A and B, don't worry about C unless you have a very odd case", which is the motivator for the priority ranking in the annotation. If the params are already declared w/ annotations, it feels like changing a (default MEDIUM) priority to LOW or HIGH would be easy.

Add-on 2: Sane boundaries also communicate knowledge AND could enhance the effectiveness of the RandomSearch. Should the abc value be from 0 to 1? 1 to 1000? 0 to 1 but really almost always 0 to 0.001? heckifIknow.

You must be doing something similar already; I was tracing getParam-getGuess-guessMethod-invoke but got a bit lost. Maybe one really can't guess without looking at the data?
But you do guess somewhere: baseLearner = new RandomDecisionTree(1, Integer.MAX_VALUE, 3, TreePruner.PruningMethod.NONE, 1e-15); seems like a decent guess, and that "testProportion=1e-15" came from some valuable knowledge in your head.

@EdwardRaff (Owner)

> but doesn't the RandomSearch need some sort of constraints?

RandomSearch needs a distribution, which may or may not just be uniform. The current framework always returns a distribution object by default. GridSearch then uses quantiles from that distribution.
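A minimal self-contained sketch of that distinction, using a hypothetical Distribution interface rather than JSAT's actual class:

    import java.util.Random;

    public class SearchFromDistribution {
        // A distribution generalizes a [min, max] boundary: it says both
        // where values may fall and which values are most plausible.
        interface Distribution {
            double sample(Random rnd);      // RandomSearch draws candidates
            double invCdf(double quantile); // GridSearch takes quantile points
        }

        // Example: log-uniform over [1e-3, 1e3], a common shape for scale
        // parameters like an RBF sigma
        static Distribution logUniform(double lo, double hi) {
            double logLo = Math.log(lo), logHi = Math.log(hi);
            return new Distribution() {
                public double sample(Random rnd) {
                    return Math.exp(logLo + rnd.nextDouble() * (logHi - logLo));
                }
                public double invCdf(double q) {
                    return Math.exp(logLo + q * (logHi - logLo));
                }
            };
        }

        public static void main(String[] args) {
            Distribution d = logUniform(1e-3, 1e3);
            Random rnd = new Random(42);
            System.out.println("random candidate: " + d.sample(rnd));
            for (int i = 1; i <= 3; i++) // grid points at quantiles 0.25, 0.5, 0.75
                System.out.println("grid point: " + d.invCdf(i / 4.0));
        }
    }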

> This came after I got all excited about auto-tune, ran into "This model doesn't seem to have any easy to tune parameters" over and over, and then tried to figure out which algorithms were tunable.

Any thoughts on how to make it easier to search? Some of it is compositional, though: any algorithm that uses the Kernel Trick will have different parameters depending on the kernel given.

> Sane boundaries also communicate knowledge AND could enhance the effectiveness of the RandomSearch. Should the abc value be from 0 to 1? 1 to 1000? 0 to 1 but really almost always 0 to 0.001? heckifIknow.

See above: RandomSearch doesn't need boundaries - it needs a distribution. And the values can change depending on the data. I try to note what the value range is in the documentation. I think if you are going to set them yourself, you should be reading up on the algorithm. Otherwise just trust the auto-fill defaults.

> You must be doing something similar already; I was tracing getParam-getGuess-guessMethod-invoke but got a bit lost. Maybe one really can't guess without looking at the data?

Depends on the algorithm :-/

> seems like a decent guess, and that "testProportion=1e-15" came from some valuable knowledge in your head.

That was actually an implementation detail because earlier versions didn't allow 0. It has nothing to do with what parameters you should try. In fact, for RF you should never make that value larger.

@salamanders (Author)

Understood. Hmmm... maybe it is as easy as two functions, or examples of how to emulate two functions, the facetiously named:

  1. willThisAlgoEverBeOptimizableRegardlessOfCurrentData()
  2. isThisAlgoOptimizableGivenTheDataSetAndKernelAndOtherStuffItCurrentlyHas()

Where #1 makes it easy to search the code, and #2 is the current "you don't know until you get there" reality.
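As a sketch, that split could be as small as the following hypothetical interface, with names taken from the list above:

    public interface OptimizableCheck {
        // #1: a static property of the algorithm itself -- easy to
        // grep the code base for implementations
        boolean willThisAlgoEverBeOptimizableRegardlessOfCurrentData();

        // #2: the current "you don't know until you get there" check;
        // may depend on the data set, the kernel, etc.
        boolean isThisAlgoOptimizableGivenTheDataSetAndKernelAndOtherStuffItCurrentlyHas();
    }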

@EdwardRaff (Owner)

Hmm. Are you more interested in finding the tunable parameters themselves, or just the algorithms that have some?

Could be just an annotation with no code meaning: "@Tunable". Just lets you know that the object has parameters to tune.
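Declaring such a marker would be nearly a one-liner (a sketch of the proposal above, not an existing type):

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Marker with no behavior: its presence just signals that the
    // class exposes parameters worth tuning.
    @Retention(RetentionPolicy.RUNTIME) // RUNTIME so it can also be checked reflectively
    @Target(ElementType.TYPE)
    public @interface Tunable { }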

@salamanders (Author)

That would be excellent. It is what I kinda assumed "implements Parameterized" indicated, but now I'm thinking I was reading too much into it.

@EdwardRaff (Owner)

Do you have a use case where you want/need these annotations at runtime, or purely to make it easier to search through the docs?

@salamanders (Author)

I wanted them at runtime to see if it was worth passing the algo through a RandomSearch to try to improve it.
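With RUNTIME retention on a marker like the @Tunable sketched above, that check is a single reflection call. A self-contained sketch:

    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;

    public class TuneGate {
        @Retention(RetentionPolicy.RUNTIME)
        @interface Tunable { } // same marker as sketched above

        // Gate: only hand the model to RandomSearch if it is marked tunable
        static boolean worthTuning(Object model) {
            return model.getClass().isAnnotationPresent(Tunable.class);
        }

        @Tunable static class SomeModel { }

        public static void main(String[] args) {
            System.out.println(worthTuning(new SomeModel())); // prints true
        }
    }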
