Implement Metadata for SMAC enabling Multi-Fidelity #771
base: main
Conversation
-        self.trial_info_map: Dict[ConfigSpace.Configuration, TrialInfo] = {}
+        self.trial_info_df: pd.DataFrame = pd.DataFrame(
+            columns=["Configuration", "Metadata", "TrialInfo", "TrialValue"]
+        )
Why replace this with a dataframe?
Because configurations are no longer a unique identifier if we support metadata.
An alternative would be to use the type:
Dict[Tuple[ConfigSpace.Configuration, pd.DataFrame], Tuple[TrialInfo, TrialValue]]
but I thought using a dataframe was simpler and more flexible.
Not certain you can actually use that Tuple as a Dict key (may not be hashable).
Tuples are hashable, but dataframes are not.
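
A minimal standalone demonstration of the hashability point (the values here are hypothetical, not from the PR):

import pandas as pd

trials = {}
# Hashing a tuple hashes each of its elements, and pandas DataFrames
# deliberately refuse to be hashed because they are mutable.
key = (("some-config", 1), pd.DataFrame({"budget": [9]}))
try:
    trials[key] = "trial-info"
except TypeError as err:
    print(err)  # 'DataFrame' objects are mutable, thus they cannot be hashed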
        )

        self.lock = threading.Lock()
Please add some comments regarding the need for the threading lock.
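
One way the requested comment might read (a sketch only; the stated reason for the lock is an assumption based on the multi-fidelity changes in this PR, not taken from the code):

# SMAC can run trials concurrently under some intensifiers, so serialize
# access to the shared bookkeeping state (trial_info_df) that both
# suggest() and register() mutate.
self.lock = threading.Lock()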
            [df_ctx.equals(ctx) for df_ctx in self.trial_info_df["Metadata"]]
        )

        # make a new entry
The comments could use some improvement. Make a new entry of what? And what is the first part doing such that this one now needs explanation?
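
For illustration, here is one way the flagged comments might be expanded (the control flow around the visible fragment is assumed, not taken from the PR):

# Check whether we already have a bookkeeping row whose stored metadata
# matches this context; DataFrame.equals() is used because metadata
# frames are not hashable and so cannot serve as dict keys.
matches = [df_ctx.equals(ctx) for df_ctx in self.trial_info_df["Metadata"]]
if not any(matches):
    # No existing (configuration, metadata) pair was found, so append a
    # new row to trial_info_df for this trial.
    ...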
             for _, _, _, metadata in observations]).reset_index(drop=True)
        return (configs, scores,
                contexts if len(contexts.columns) > 0 else None,
                metadatas if len(metadatas.columns) > 0 else None)

    def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
These Tuples are getting a little large and hard to read (recall a previous version of this PR where the order of them was mistakenly swapped at one point).
I think we discussed creating a NamedTuple or a small DataClass for them instead, so that the fields can be accessed by name, which would make this more readable.
If you want, I can do this in this PR or in a follow-up PR.
I think a predecessor PR would be better, much like we did with adding the metadata args and named args first.
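
For concreteness, a sketch of the suggested refactor (the class and field names are hypothetical, not from the PR):

from typing import NamedTuple, Optional

import pandas as pd

class Observations(NamedTuple):
    """Named replacement for the positional (configs, scores, contexts, metadata) tuple."""
    configs: pd.DataFrame
    scores: pd.DataFrame
    contexts: Optional[pd.DataFrame] = None
    metadata: Optional[pd.DataFrame] = None

obs = Observations(configs=pd.DataFrame(), scores=pd.DataFrame())
print(obs.scores)  # fields are accessed by name, so their order can no longer be silently swapped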
@pytest.mark.parametrize(('optimizer_type', 'verify', 'kwargs'), [
    # Enumerate all supported Optimizers
    *[(member, verify, {"seed": SEED, "facade": MFFacade, "intensifier": SuccessiveHalving, "min_budget": 1, "max_budget": 9})
      for member, verify in [(OptimizerType.SMAC, smac_verify_best)]],
This is a pretty weird pattern. If we wanted to add more types in the future, those kwargs wouldn't work.
What we've done elsewhere is:
- Retain the loop over OptimizerType in order to make sure that all optimizers get a test added (or explicitly excluded) whenever we add new optimizer backends.
- When necessary, within the body of the test, set additional parameters in a switch statement for a given optimizer type, e.g.:

if optimizer_type == OptimizerType.SMAC:
    verifier = smac_verify_best
elif optimizer_type == OptimizerType.FLAML:
    pytest.skip("TODO: FLAML Optimizer does not yet support metadata")  # though even here, I think we should test *something*
else:
    raise NotImplementedError(f"Missing test handler for OptimizerType {optimizer_type}")
I have moved the kwargs into the tuple, which fixes the kwargs problem.
I find switching over unsupported optimizers messy, since it means, in this case, that we should skip the entire test (as you have above) unless we are testing SMAC. There are other places where we test that using metadata throws exceptions, so without metadata support there is nothing to test here.
But when we add that functionality to another optimizer, it will be very easy to forget to enable the related testing support, so I'd rather add it now, as it will be easier to search for what we need to enable.
Won't it be the same either way? One requires adding a line; the other requires moving a line. I've made this more similar to the other tests, but I don't think it changes much either way.
Co-authored-by: Brian Kroth <[email protected]>
Enables changing the defaults in the SMAC optimizer to allow state passing (what SMAC internally calls "context", though that doesn't match MLOS's notion of context) and multi-fidelity tuning.
This required a few additional changes to metadata handling throughout the project.
Additionally, this PR includes a test case for multi-fidelity tuning.
This PR replaces the previous one (#751).
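
For illustration, the multi-fidelity setup exercised by the new test would look roughly like this (the SmacOptimizer constructor arguments beyond the kwargs shown in the test snippet above are assumptions):

import ConfigSpace as CS
from smac import MultiFidelityFacade as MFFacade
from smac.intensifier.successive_halving import SuccessiveHalving

from mlos_core.optimizers.bayesian_optimizers.smac_optimizer import SmacOptimizer

# A toy one-dimensional search space (hypothetical).
space = CS.ConfigurationSpace(seed=42)
space.add_hyperparameter(CS.UniformIntegerHyperparameter("x", lower=0, upper=100))

optimizer = SmacOptimizer(
    parameter_space=space,
    seed=42,
    # The new pass-through kwargs from this PR: select SMAC's
    # multi-fidelity facade and a successive-halving intensifier
    # with a budget range.
    facade=MFFacade,
    intensifier=SuccessiveHalving,
    min_budget=1,
    max_budget=9,
)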