Implement Metadata for SMAC enabling Multi-Fidelity #771

Status: Open. Wants to merge 28 commits into base: main.
Changes shown below are from 4 of the 28 commits.

Commits (28):
548af15
Implement metadata for multifidelity in SMAC
jsfreischuetz Jul 2, 2024
36ac67a
Merge branch 'main' into multi-fidelity
jsfreischuetz Jul 2, 2024
4cc133b
Merge branch 'main' into multi-fidelity
bpkroth Jul 2, 2024
5ab03c8
Merge branch 'main' into multi-fidelity
bpkroth Jul 3, 2024
bfd2a42
Update mlos_core/mlos_core/optimizers/README
jsfreischuetz Jul 8, 2024
16208f4
Update mlos_core/mlos_core/optimizers/README
jsfreischuetz Jul 8, 2024
938f8f0
Update mlos_core/mlos_core/optimizers/README
jsfreischuetz Jul 8, 2024
81d6d56
Update mlos_core/mlos_core/optimizers/README
jsfreischuetz Jul 8, 2024
bf2f3cc
Update mlos_core/mlos_core/optimizers/README
jsfreischuetz Jul 8, 2024
1686c7c
some comments
jsfreischuetz Jul 8, 2024
d263613
more comments for README
jsfreischuetz Jul 8, 2024
6766b8d
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jul 8, 2024
bae1763
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jul 8, 2024
53af62b
mergE
jsfreischuetz Jul 8, 2024
81e8bb0
Merge branch 'multi-fidelity' of github.com:jsfreischuetz/MLOS into m…
jsfreischuetz Jul 8, 2024
dcff9cc
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jul 8, 2024
50ef16c
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jul 8, 2024
c32bd67
Update mlos_core/mlos_core/optimizers/utils.py
jsfreischuetz Jul 8, 2024
2b15694
Update mlos_core/mlos_core/tests/optimizers/optimizer_metadata_test.py
jsfreischuetz Jul 8, 2024
574b8cc
Update mlos_core/mlos_core/optimizers/utils.py
jsfreischuetz Jul 8, 2024
3d4c055
Update mlos_core/mlos_core/optimizers/utils.py
jsfreischuetz Jul 8, 2024
41ee533
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jul 8, 2024
cfa936a
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jul 8, 2024
abd3eb6
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jul 8, 2024
e0ac571
comment
jsfreischuetz Jul 8, 2024
c1e0845
Merge branch 'multi-fidelity' of github.com:jsfreischuetz/MLOS into m…
jsfreischuetz Jul 8, 2024
9234599
comments
jsfreischuetz Jul 8, 2024
054fce3
Merge branch 'main' into multi-fidelity
jsfreischuetz Jul 10, 2024
2 changes: 2 additions & 0 deletions .cspell.json
@@ -43,6 +43,7 @@
"linalg",
"llamatune",
"matplotlib",
"metadatas",
"mlos",
"mloscore",
"mwait",
@@ -72,6 +73,7 @@
"sklearn",
"skopt",
"smac",
"Sobol",
"sqlalchemy",
"srcpaths",
"subcmd",
2 changes: 1 addition & 1 deletion mlos_bench/mlos_bench/optimizers/mlos_core_optimizer.py
@@ -199,7 +199,7 @@ def register(self, tunables: TunableGroups, status: Status,
return registered_score

def get_best_observation(self) -> Union[Tuple[Dict[str, float], TunableGroups], Tuple[None, None]]:
(df_config, df_score, _df_context) = self._opt.get_best_observations()
(df_config, df_score, _df_context, _df_metadata) = self._opt.get_best_observations()
if len(df_config) == 0:
return (None, None)
params = configspace_data_to_tunable_values(df_config.iloc[0].to_dict())
27 changes: 27 additions & 0 deletions mlos_core/mlos_core/optimizers/README
@@ -0,0 +1,27 @@
# Optimizers

This directory contains wrappers that integrate different optimizers into MLOS.
Each wrapper is implemented as a child class of the `BaseOptimizer` class defined in `optimizer.py`.

The main goal of these optimizers is to `suggest` configurations, possibly based on prior trial data, in order to find an optimum for some objective(s).
This process is driven through the `register` and `suggest` interfaces.

The following definitions are useful for understanding the implementation:

- `configuration`: a vector representation of a configuration of a system to be evaluated.
- `score`: the objective(s) associated with a configuration.
- `metadata`: additional information about the evaluation, such as the runtime budget used during evaluation.
- `context`: additional (static) information about the evaluation, used to extend the internal model that generates suggestions.
For instance, a descriptor of the VM size (vCore count and # of GB of RAM), and some descriptor of the workload.
The intent is to allow either sharing or indexing of trial info between "similar" experiments in order to make the optimization process more efficient for new scenarios.
> Note: This is not yet implemented.

The interface for these classes can be described as follows (see also the usage sketch after this list):

- `register`: takes a configuration, a score, and, optionally, metadata about the evaluation, and updates the model used for future suggestions.
- `suggest`: returns a new configuration to evaluate.
Some optimizers also return additional metadata alongside the suggestion, which should be passed back during the `register` phase.
This function can optionally take a context (not yet implemented) and an argument that forces it to return the default configuration.
- `register_pending`: registers a configuration and metadata pair as pending to the optimizer.
- `get_observations`: returns all observations reported to the optimizer as a tuple of (config, score, context, metadata) DataFrames.
- `get_best_observations`: returns the best observations as a tuple of (config, score, context, metadata) DataFrames.
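To make the interface above concrete, here is a minimal usage sketch of a suggest/register loop that round-trips metadata. It is only an illustration, not canonical documentation: the toy objective, the choice of SMAC as the backend, and the `score` column name are assumptions; see the tests under `mlos_core/mlos_core/tests/optimizers/` for authoritative usage.

```python
# Minimal usage sketch of the interface described above; assumptions noted in comments.
import ConfigSpace as CS
import pandas as pd

from mlos_core.optimizers import OptimizerFactory, OptimizerType


def objective(point: pd.DataFrame) -> pd.DataFrame:
    # Toy objective (assumed for illustration) to be minimized.
    return pd.DataFrame({"score": (point["x"] - 3.0) ** 2})


input_space = CS.ConfigurationSpace(seed=1234)
input_space.add_hyperparameter(CS.UniformFloatHyperparameter(name="x", lower=0.0, upper=5.0))

optimizer = OptimizerFactory.create(
    parameter_space=input_space,
    optimization_targets=["score"],
    optimizer_type=OptimizerType.SMAC,  # the metadata-aware backend in this PR
)

for _ in range(10):
    # suggest() returns a (config, metadata) pair; metadata may be None for some optimizers.
    config, metadata = optimizer.suggest()
    # Pass the suggestion's metadata back so the optimizer can identify the trial (e.g., its budget).
    optimizer.register(configs=config, scores=objective(config), metadata=metadata)

(configs, scores, contexts, metadatas) = optimizer.get_best_observations()
```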
250 changes: 220 additions & 30 deletions mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

Large diffs are not rendered by default.
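Since that large diff is not rendered, the following is only a rough sketch, with invented helper names, of how per-trial bookkeeping from SMAC (budget, instance, seed) could be round-tripped as a metadata DataFrame between `suggest` and `register`. It is not the PR's actual implementation.

```python
# Hypothetical sketch (helper names invented); NOT the smac_optimizer.py code from this PR.
# It illustrates carrying SMAC's TrialInfo fields as a one-row metadata DataFrame.
import pandas as pd
from ConfigSpace import Configuration
from smac.runhistory import TrialInfo


def trial_info_to_metadata(trial: TrialInfo) -> pd.DataFrame:
    """Flatten SMAC's per-trial bookkeeping into a one-row metadata DataFrame."""
    return pd.DataFrame([{"budget": trial.budget, "instance": trial.instance, "seed": trial.seed}])


def metadata_to_trial_info(config: Configuration, metadata: pd.Series) -> TrialInfo:
    """Rebuild the TrialInfo that SMAC expects when the result is reported back."""
    return TrialInfo(
        config=config,
        instance=metadata["instance"],
        seed=int(metadata["seed"]),
        budget=metadata["budget"],
    )
```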

59 changes: 41 additions & 18 deletions mlos_core/mlos_core/optimizers/optimizer.py
@@ -56,9 +56,9 @@ def __init__(self, *,
raise ValueError("Number of weights must match the number of optimization targets")

self._space_adapter: Optional[BaseSpaceAdapter] = space_adapter
self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]] = []
self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []
self._has_context: Optional[bool] = None
self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame]]] = []
self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []

def __repr__(self) -> str:
return f"{self.__class__.__name__}(space_adapter={self.space_adapter})"
@@ -98,7 +98,7 @@ def register(self, *, configs: pd.DataFrame, scores: pd.DataFrame,
"Mismatched number of configs and context."
assert configs.shape[1] == len(self.parameter_space.values()), \
"Mismatched configuration shape."
self._observations.append((configs, scores, context))
self._observations.append((configs, scores, context, metadata))
self._has_context = context is not None

if self._space_adapter:
@@ -197,26 +197,48 @@ def register_pending(self, *, configs: pd.DataFrame,
"""
pass # pylint: disable=unnecessary-pass # pragma: no cover

def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
def _get_observations(self, observations:
List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]]
) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
"""
Returns the observations as a triplet of DataFrames (config, score, context).
Returns the observations as a quad of DataFrames (config, score, context, metadata)
for a specific set of observations.

Parameters
----------
observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]]
Observations to run the transformation on

Returns
-------
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
A triplet of (config, score, context) DataFrames of observations.
observations: Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
A quad of (config, score, context, metadata) DataFrames of observations.
"""
if len(self._observations) == 0:
if len(observations) == 0:
raise ValueError("No observations registered yet.")
configs = pd.concat([config for config, _, _ in self._observations]).reset_index(drop=True)
scores = pd.concat([score for _, score, _ in self._observations]).reset_index(drop=True)
configs = pd.concat([config for config, _, _, _ in observations]).reset_index(drop=True)
scores = pd.concat([score for _, score, _, _ in observations]).reset_index(drop=True)
contexts = pd.concat([pd.DataFrame() if context is None else context
for _, _, context in self._observations]).reset_index(drop=True)
return (configs, scores, contexts if len(contexts.columns) > 0 else None)
for _, _, context, _ in observations]).reset_index(drop=True)
metadatas = pd.concat([pd.DataFrame() if metadata is None else metadata
for _, _, _, metadata in observations]).reset_index(drop=True)
return (configs, scores, contexts if len(contexts.columns) > 0 else None, metadatas if len(metadatas.columns) > 0 else None)

def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
Review thread:

Contributor (reviewer): These Tuples are getting a little large and hard to read (recall a previous version of this PR where the order of them was mistakenly swapped at one point).

Think we discussed creating a NamedTuple or small DataClass for them instead so that they can be accessed by name in order to make it more readable.

jsfreischuetz (Contributor, Author): If you want I can do this in this PR, or another follow up PR.

bpkroth (Contributor), Jul 10, 2024: I think a predecessor PR would be better. Much like we did with adding the metadata args and named args first.

"""
Returns the observations as a quad of DataFrames (config, score, context, metadata).

Returns
-------
observations: Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
A quad of (config, score, context, metadata) DataFrames of observations.
"""
return self._get_observations(self._observations)

def get_best_observations(self, *, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
def get_best_observations(self, *, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame],
Optional[pd.DataFrame]]:
"""
Get the N best observations so far as a triplet of DataFrames (config, score, context).
Get the N best observations so far as a quad of DataFrames (config, score, context, metadata).
Default is N=1. The columns are ordered in ASCENDING order of the optimization targets.
The function uses `pandas.DataFrame.nsmallest(..., keep="first")` method under the hood.

@@ -227,15 +249,16 @@

Returns
-------
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
A triplet of best (config, score, context) DataFrames of best observations.
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
A quad of (config, score, context, metadata) DataFrames of the best observations.
"""
if len(self._observations) == 0:
raise ValueError("No observations registered yet.")
(configs, scores, contexts) = self.get_observations()
(configs, scores, contexts, metadatas) = self.get_observations()
idx = scores.nsmallest(n_max, columns=self._optimization_targets, keep="first").index
return (configs.loc[idx], scores.loc[idx],
None if contexts is None else contexts.loc[idx])
None if contexts is None else contexts.loc[idx],
None if metadatas is None else metadatas.loc[idx])

def cleanup(self) -> None:
"""
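Picking up the review suggestion above, one possible shape for such a NamedTuple is sketched below purely for illustration; the class name and field names are assumptions, not part of this PR.

```python
# Hypothetical sketch of the NamedTuple suggested in the review thread above.
from typing import NamedTuple, Optional

import pandas as pd


class Observations(NamedTuple):
    """Named bundle for the (config, score, context, metadata) DataFrames."""

    configs: pd.DataFrame
    scores: pd.DataFrame
    contexts: Optional[pd.DataFrame] = None
    metadatas: Optional[pd.DataFrame] = None


# Fields are accessed by name, so the tuple order can no longer be silently swapped:
#     obs = Observations(configs=configs_df, scores=scores_df)
#     best_idx = obs.scores.nsmallest(1, columns=["score"]).index
```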
57 changes: 57 additions & 0 deletions mlos_core/mlos_core/optimizers/utils.py
@@ -0,0 +1,57 @@
#
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#
"""
Contains utils used for implementing the mlos_core optimizers
"""
import inspect
from typing import Any, Callable, Dict, List, Optional
import pandas as pd


def to_metadata(metadata: Optional[pd.DataFrame]) -> Optional[List[pd.Series]]:
"""
Converts a metadata DataFrame into a list of per-row metadata Series objects.

Parameters
----------
metadata : Optional[pd.DataFrame]
The DataFrame to convert to metadata.

Returns
-------
Optional[List[pd.Series]]
The list of per-row metadata Series, or None if no metadata was provided.
"""
if metadata is None:
return None
return [idx_series[1] for idx_series in metadata.iterrows()]


def filter_kwargs(function: Callable, **kwargs: Any) -> Dict[str, Any]:
"""
Filters arguments provided in the kwargs dictionary down to only those that are legal for
the called function.

Parameters
----------
function : Callable
The function for which we filter kwargs.
kwargs :
The kwargs to filter down to the target function's accepted arguments.

Returns
-------
dict
The kwargs with any arguments not accepted by the function filtered out.
"""
sig = inspect.signature(function)
filter_keys = [
param.name
for param in sig.parameters.values()
if param.kind == param.POSITIONAL_OR_KEYWORD
]
filtered_dict = {
filter_key: kwargs[filter_key] for filter_key in filter_keys & kwargs.keys()
}
return filtered_dict
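A quick, informal illustration of what these two helpers do; the callee and values below are made up for the example.

```python
# Informal usage sketch for the helpers above; the callee and values are invented.
import pandas as pd

from mlos_core.optimizers.utils import filter_kwargs, to_metadata


def tell(config, value, budget=None):  # hypothetical callee
    ...


# Keep only the kwargs that `tell` actually accepts ("instance" is dropped):
kwargs = filter_kwargs(tell, value=1.0, budget=9, instance="vm-small")
assert kwargs == {"value": 1.0, "budget": 9}

# Split a metadata DataFrame into one pd.Series per row (or None if no metadata was given):
metadata = pd.DataFrame({"budget": [1, 3, 9]})
rows = to_metadata(metadata)
assert rows is not None and len(rows) == 3 and rows[0]["budget"] == 1
```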
99 changes: 99 additions & 0 deletions mlos_core/mlos_core/tests/optimizers/optimizer_metadata_test.py
@@ -0,0 +1,99 @@
#
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
#
"""
Tests for Optimizers using Metadata.
"""

from typing import Callable

import logging
import pytest

import pandas as pd
import ConfigSpace as CS

from smac import MultiFidelityFacade as MFFacade
from smac.intensifier.successive_halving import SuccessiveHalving

from mlos_core.optimizers import (
OptimizerType, OptimizerFactory, BaseOptimizer)
from mlos_core.tests import SEED

_LOG = logging.getLogger(__name__)
_LOG.setLevel(logging.DEBUG)


def smac_verify_best(metadata: pd.DataFrame) -> bool:
"""
Function to verify if the metadata used by SMAC is in a legal state

Parameters
----------
metadata : pd.DataFrame
metadata returned by SMAC

Returns
-------
bool
True if the metadata that is returned is valid, False otherwise.
"""
max_budget = metadata["budget"].max()
if isinstance(max_budget, float):
return max_budget == 9
return False


@pytest.mark.parametrize(('optimizer_type', 'verify', 'kwargs'), [
# Enumerate all supported Optimizers
*[(member, verify, {"seed": SEED, "facade": MFFacade, "intensifier": SuccessiveHalving, "min_budget": 1, "max_budget": 9})
for member, verify in [(OptimizerType.SMAC, smac_verify_best)]],
Review thread:

Contributor (reviewer): This is a pretty weird pattern. If we wanted to add more types in the future those kwargs wouldn't work.

What we've done elsewhere is

1. retain the loop over OptimizerType in order to make sure that all optimizers get a test added (or explicitly excluded) whenever we add new optimizer backends.
2. When necessary, within the body of the test set additional parameters in a switch statement for a given optimizer type. e.g.,

if optimizer_type == OptimizerType.SMAC:
  verifier = smac_verify_best
elif optimizer_type == OptimizerType.FLAML:
  pytest.skip("TODO: FLAML Optimizer does not yet support metadata")  # though even here, I think we should test *something*
else:
  raise NotImplementedError(f"Missing test handler for OptimizerType {optimizer_type}")

jsfreischuetz (Contributor, Author): I have moved the kwargs into the tuple which fixes the kwargs problem.

I find switching over unsupported optimizers messy since it means, in this case, we should skip the entire test as you have above, unless we are testing SMAC. There are other places where we test to make sure that using metadata throws exceptions, so without metadata support there is nothing to test.

Contributor (reviewer): But when we add that functionality to another optimizer it will be very easy to forget to enable related testing support, so I'd rather add that now as it will be easier to search for what we need to enable.

jsfreischuetz (Contributor, Author): Won't it be the same either way? One requires adding a line, the other requires moving a line. I make this more similar to other tests, but I don't think it changes much either way.

])
def test_optimizer_metadata(optimizer_type: OptimizerType, verify: Callable[[pd.DataFrame], bool], kwargs: dict) -> None:
"""
Toy problem to test if metadata is properly being handled for each supporting optimizer
"""
max_iterations = 100

def objective(point: pd.DataFrame) -> pd.DataFrame:
# mix of hyperparameters, optimal is to select the highest possible
return pd.DataFrame({"score": point["x"] + point["y"]})

input_space = CS.ConfigurationSpace(seed=SEED)
# add a mix of numeric datatypes
input_space.add_hyperparameter(CS.UniformIntegerHyperparameter(name='x', lower=0, upper=5))
input_space.add_hyperparameter(CS.UniformFloatHyperparameter(name='y', lower=0.0, upper=5.0))

optimizer: BaseOptimizer = OptimizerFactory.create(
parameter_space=input_space,
optimization_targets=['score'],
optimizer_type=optimizer_type,
optimizer_kwargs=kwargs,
)

with pytest.raises(ValueError, match="No observations"):
optimizer.get_best_observations()

with pytest.raises(ValueError, match="No observations"):
optimizer.get_observations()

for _ in range(max_iterations):
config, metadata = optimizer.suggest()
assert isinstance(metadata, pd.DataFrame)

optimizer.register(configs=config, scores=objective(config), metadata=metadata)

(all_configs, all_scores, all_contexts, all_metadata) = optimizer.get_observations()
assert isinstance(all_configs, pd.DataFrame)
assert isinstance(all_scores, pd.DataFrame)
assert all_contexts is None
assert isinstance(all_metadata, pd.DataFrame)
assert smac_verify_best(all_metadata)

(best_configs, best_scores, best_contexts, best_metadata) = optimizer.get_best_observations()
assert isinstance(best_configs, pd.DataFrame)
assert isinstance(best_scores, pd.DataFrame)
assert best_contexts is None
assert isinstance(best_metadata, pd.DataFrame)
assert smac_verify_best(best_metadata)
@@ -102,19 +102,21 @@ def objective(point: pd.DataFrame) -> pd.DataFrame:
assert set(observation.columns) == {'main_score', 'other_score'}
optimizer.register(configs=suggestion, scores=observation)

(best_config, best_score, best_context) = optimizer.get_best_observations()
(best_config, best_score, best_context, best_metadata) = optimizer.get_best_observations()
assert isinstance(best_config, pd.DataFrame)
assert isinstance(best_score, pd.DataFrame)
assert best_context is None
assert best_metadata is None
assert set(best_config.columns) == {'x', 'y'}
assert set(best_score.columns) == {'main_score', 'other_score'}
assert best_config.shape == (1, 2)
assert best_score.shape == (1, 2)

(all_configs, all_scores, all_contexts) = optimizer.get_observations()
(all_configs, all_scores, all_contexts, all_metadata) = optimizer.get_observations()
assert isinstance(all_configs, pd.DataFrame)
assert isinstance(all_scores, pd.DataFrame)
assert all_contexts is None
assert all_metadata is None
assert set(all_configs.columns) == {'x', 'y'}
assert set(all_scores.columns) == {'main_score', 'other_score'}
assert all_configs.shape == (max_iterations, 2)