Enable Optimizer state and Multi-Fidelity passthrough in SMAC #751

Closed
wants to merge 26 commits
7f8a43b
minimal implementation of mutli-fidelity
jsfreischuetz May 22, 2024
8afb5f0
revert changes
jsfreischuetz May 23, 2024
08575af
revert
jsfreischuetz May 23, 2024
fcfca53
fix minor bug with logging
jsfreischuetz Jun 1, 2024
838c1db
undo formatting
jsfreischuetz Jun 1, 2024
7533b4e
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 3, 2024
3904020
merge
jsfreischuetz Jun 3, 2024
4ffff6c
Merge branch 'microsoft-main' into multifidleity
jsfreischuetz Jun 3, 2024
b7de120
merge
jsfreischuetz Jun 3, 2024
7278994
add checks back to optimizer
jsfreischuetz Jun 4, 2024
c79294a
add checks back
jsfreischuetz Jun 4, 2024
048269c
add checks back
jsfreischuetz Jun 4, 2024
019192a
update name of context to metadata, and add readme
jsfreischuetz Jun 5, 2024
88d63c1
update tests to also use correct terminology
jsfreischuetz Jun 5, 2024
4e36f28
Update mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimi…
jsfreischuetz Jun 6, 2024
3326ac9
Update mlos_core/mlos_core/optimizers/README.md
jsfreischuetz Jun 6, 2024
2399d3e
Update mlos_core/mlos_core/optimizers/README.md
jsfreischuetz Jun 6, 2024
1f210b5
Add context back to the register interface
jsfreischuetz Jun 6, 2024
87a5af9
Merge branch 'main' into multifidleity
motus Jun 7, 2024
48af70f
Apply suggestions from code review
bpkroth Jun 12, 2024
cd8deff
Merge branch 'main' into multifidleity
motus Jun 12, 2024
271a79b
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 12, 2024
98c7398
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 12, 2024
9726410
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 12, 2024
bf4602b
Update mlos_core/mlos_core/optimizers/optimizer.py
jsfreischuetz Jun 12, 2024
8d2a894
fix comments for python
jsfreischuetz Jun 14, 2024
1 change: 1 addition & 0 deletions .cspell.json
Original file line number Diff line number Diff line change
@@ -72,6 +72,7 @@
"sklearn",
"skopt",
"smac",
"SOBOL",
"sqlalchemy",
"srcpaths",
"subcmd",
2 changes: 1 addition & 1 deletion mlos_bench/mlos_bench/optimizers/mlos_core_optimizer.py
@@ -180,7 +180,7 @@ def suggest(self) -> TunableGroups:
tunables = super().suggest()
if self._start_with_defaults:
_LOG.info("Use default values for the first trial")
df_config = self._opt.suggest(defaults=self._start_with_defaults)
df_config, _ = self._opt.suggest(defaults=self._start_with_defaults)
self._start_with_defaults = False
_LOG.info("Iteration %d :: Suggest:\n%s", self._iter, df_config)
return tunables.assign(
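The one-line call-site change above shows the knock-on effect of the new API: `suggest()` now returns a `(config, metadata)` pair rather than a bare DataFrame, so callers must unpack both parts (or discard the metadata with `_`, as mlos_bench does here). A minimal sketch with a hypothetical stub optimizer, not mlos_core code:

```python
import pandas as pd

class StubOptimizer:
    """Hypothetical stand-in that mimics the new (config, metadata) contract."""

    def suggest(self, defaults: bool = False):
        config = pd.DataFrame({"x": [1.0]})          # single-row config frame
        metadata = pd.DataFrame({"budget": [10.0]})  # e.g. a fidelity budget
        return config, metadata                      # tuple, not a bare DataFrame

    def register(self, configurations, scores, metadata=None):
        # Round-tripped metadata lets a multi-fidelity optimizer match the
        # result to the budget/seed/instance it issued.
        self.last_metadata = metadata

opt = StubOptimizer()
# Old callers wrote `df_config = opt.suggest(...)`; they now unpack both parts.
df_config, df_metadata = opt.suggest(defaults=True)
scores = pd.DataFrame({"score": [0.5]})
opt.register(df_config, scores, metadata=df_metadata)
```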
27 changes: 27 additions & 0 deletions mlos_core/mlos_core/optimizers/README.md
@@ -0,0 +1,27 @@
# Optimizers

This directory contains wrappers for different optimizers to integrate into MLOS.
This is implemented through child classes of the `BaseOptimizer` class defined in `optimizer.py`.

The main goal of these optimizers is to `suggest` configurations, possibly informed by prior trial data, in order to find an optimum for some objective(s).
Clients drive this process through the `register` and `suggest` interfaces.

The following definitions are useful for understanding the implementation:
- `configuration`: a vector representation of a configuration of a system to be evaluated.
- `score`: the objective(s) associated with a configuration
- `metadata`: additional information about the evaluation, such as the runtime budget used during evaluation.
- `context`: additional (static) information about the evaluation used to extend the internal model used for suggesting samples.
For instance, a descriptor of the VM size (vCore count and # of GB of RAM), and some descriptor of the workload.
The intent being to allow either sharing or indexing of trial info between "similar" experiments in order to help make the optimization process more efficient for new scenarios.
> Note: This is not yet implemented.

The interface for these classes can be described as follows:

- `register`: this is a function that takes a configuration, a score, and, optionally, metadata about the evaluation to update the model for future evaluations.
- `suggest`: this function returns a new configuration for evaluation.

Some optimizers also return additional metadata from `suggest` that should be passed back during the register phase.
`suggest` can also optionally take context (not yet implemented), and an argument to force the function to return the default configuration.
- `register_pending`: registers a configuration and metadata pair as pending to the optimizer.
- `get_observations`: returns all observations reported to the optimizer as a 4-tuple of DataFrames (config, score, context, metadata).
- `get_best_observations`: returns the best observations as a 4-tuple of (config, score, context, metadata) DataFrames.
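Put together, the `suggest`/`register` loop described above looks roughly like the following. The `DemoOptimizer` class and `evaluate` function are illustrative placeholders, not part of mlos_core:

```python
import pandas as pd

class DemoOptimizer:
    """Toy optimizer following the interface sketched above."""

    def __init__(self):
        self._observations = []

    def suggest(self, defaults: bool = False):
        """Return a (configuration, metadata) pair of single-row DataFrames."""
        config = pd.DataFrame({"cache_mb": [64]})
        metadata = pd.DataFrame({"budget": [1]})  # e.g. a multi-fidelity budget
        return config, metadata

    def register(self, configurations, scores, metadata=None):
        """Record an evaluated configuration to inform future suggestions."""
        self._observations.append((configurations, scores, metadata))

def evaluate(config: pd.DataFrame) -> pd.DataFrame:
    # Placeholder objective function standing in for a real trial run.
    return pd.DataFrame({"latency_ms": [float(config["cache_mb"].iloc[0])]})

opt = DemoOptimizer()
for _ in range(3):
    config, metadata = opt.suggest()
    score = evaluate(config)
    opt.register(config, score, metadata=metadata)  # metadata round-trips
```

The key point is that any metadata handed out by `suggest` is passed back verbatim in `register`, which is what allows optimizer state (e.g. a budget) to survive the round trip.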
343 changes: 276 additions & 67 deletions mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

Large diffs are not rendered by default.

17 changes: 11 additions & 6 deletions mlos_core/mlos_core/optimizers/flaml_optimizer.py
Expand Up @@ -6,7 +6,7 @@
Contains the FlamlOptimizer class.
"""

from typing import Dict, List, NamedTuple, Optional, Union
from typing import Dict, List, NamedTuple, Optional, Tuple, Union
from warnings import warn

import ConfigSpace
@@ -86,7 +86,7 @@ def __init__(self, *, # pylint: disable=too-many-arguments
self._suggested_config: Optional[dict]

def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
context: Optional[pd.DataFrame] = None, metadata: Optional[pd.DataFrame] = None) -> None:
"""Registers the given configurations and scores.

Parameters
@@ -96,12 +96,15 @@ def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,

scores : pd.DataFrame
Scores from running the configurations. The index is the same as the index of the configurations.

context : None
Not Yet Implemented.
metadata : None
Not Yet Implemented.
"""
if context is not None:
warn(f"Not Implemented: Ignoring context {list(context.columns)}", UserWarning)
if metadata is not None:
warn(f"Not Implemented: Ignoring metadata {list(metadata.columns)}", UserWarning)
for (_, config), (_, score) in zip(configurations.astype('O').iterrows(), scores.iterrows()):
cs_config: ConfigSpace.Configuration = ConfigSpace.Configuration(
self.optimizer_parameter_space, values=config.to_dict())
@@ -112,7 +115,9 @@ def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
score=float(np.average(score.astype(float), weights=self._objective_weights)),
)

def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
def _suggest(
self, context: Optional[pd.DataFrame] = None
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""Suggests a new configuration.

Sampled at random using ConfigSpace.
@@ -130,10 +135,10 @@ def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
if context is not None:
warn(f"Not Implemented: Ignoring context {list(context.columns)}", UserWarning)
config: dict = self._get_next_config()
return pd.DataFrame(config, index=[0])
return pd.DataFrame(config, index=[0]), None

def register_pending(self, configurations: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
metadata: Optional[pd.DataFrame] = None) -> None:
raise NotImplementedError()

def _target_function(self, config: dict) -> Union[dict, None]:
99 changes: 66 additions & 33 deletions mlos_core/mlos_core/optimizers/optimizer.py
@@ -26,7 +26,7 @@ class BaseOptimizer(metaclass=ABCMeta):

def __init__(self, *,
parameter_space: ConfigSpace.ConfigurationSpace,
optimization_targets: List[str],
optimization_targets: Optional[Union[str, List[str]]] = None,
objective_weights: Optional[List[float]] = None,
space_adapter: Optional[BaseSpaceAdapter] = None):
"""
@@ -56,9 +56,11 @@ def __init__(self, *,
raise ValueError("Number of weights must match the number of optimization targets")

self._space_adapter: Optional[BaseSpaceAdapter] = space_adapter
self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]] = []
self._observations: List[Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []
Member:
this gets a bit unwieldy. Should we have a named tuple for _pending_observations, e.g.,

class PendingObservation(NamedTuple):
    """A named tuple representing a pending observation."""

    configurations: pd.DataFrame
    context: Optional[pd.DataFrame]
    meta: Optional[pd.DataFrame]

and do the same for _observations? (not sure we can inherit NamedTuples - most likely, we can't)

Member:
or, maybe, have _observed_configs, _observed_scores, _observed_contexts etc. and concatenate the dataframes instead of having a list of dataframes. I am pretty sure the schemas are the same from one _register call to the next
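For what it's worth, plain `typing.NamedTuple` classes indeed cannot be subclassed to add fields, but a separate 4-field variant for completed observations is easy to define. A sketch, with field names assumed rather than taken from this PR:

```python
from typing import NamedTuple, Optional

import pandas as pd

class Observation(NamedTuple):
    """A completed observation; context and metadata remain optional."""
    configurations: pd.DataFrame
    scores: pd.DataFrame
    context: Optional[pd.DataFrame]
    metadata: Optional[pd.DataFrame]

obs = Observation(
    configurations=pd.DataFrame({"x": [1]}),
    scores=pd.DataFrame({"score": [0.5]}),
    context=None,
    metadata=None,
)
# Named access replaces positional unpacking like `for config, _, _, _ in ...`:
best_score = obs.scores["score"].iloc[0]
```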

self._has_context: Optional[bool] = None
self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame]]] = []
self._pending_observations: List[Tuple[pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]] = []
self.delayed_config: Optional[pd.DataFrame] = None
self.delayed_metadata: Optional[pd.DataFrame] = None
Comment on lines +62 to +63
Member:
Suggested change
self.delayed_config: Optional[pd.DataFrame] = None
self.delayed_metadata: Optional[pd.DataFrame] = None
self._delayed_config: Optional[pd.DataFrame] = None
self._delayed_metadata: Optional[pd.DataFrame] = None

probably should be private, right?


def __repr__(self) -> str:
return f"{self.__class__.__name__}(space_adapter={self.space_adapter})"
Expand All @@ -69,7 +71,7 @@ def space_adapter(self) -> Optional[BaseSpaceAdapter]:
return self._space_adapter

def register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
context: Optional[pd.DataFrame] = None, metadata: Optional[pd.DataFrame] = None) -> None:
"""Wrapper method, which employs the space adapter (if any), before registering the configurations and scores.

Parameters
Expand All @@ -78,34 +80,40 @@ def register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
Dataframe of configurations / parameters. The columns are parameter names and the rows are the configurations.
scores : pd.DataFrame
Scores from running the configurations. The index is the same as the index of the configurations.

context : pd.DataFrame
Not Yet Implemented.
Not implemented yet.
metadata : pd.DataFrame
Implementation depends on instance (e.g., saved optimizer state to return).
"""
# Do some input validation.
assert set(scores.columns) == set(self._optimization_targets), \
"Mismatched optimization targets."
if type(self._optimization_targets) is str:
Contributor:
Suggested change
if type(self._optimization_targets) is str:
assert self._optimization_targets, "Missing or invalid optimization targets"
if type(self._optimization_targets) is str:

Contributor:
Also assert not empty (see also comment above about accepting None)

Contributor (author):
I don't think this makes sense given my comment above

Contributor:
OK, but separate PR for that one please

assert self._optimization_targets in scores.columns, "Mismatched optimization targets."
if type(self._optimization_targets) is list:
assert set(scores.columns) >= set(self._optimization_targets), "Mismatched optimization targets."
assert self._has_context is None or self._has_context ^ (context is None), \
"Context must always be added or never be added."
assert len(configurations) == len(scores), \
"Mismatched number of configurations and scores."
if context is not None:
assert len(configurations) == len(context), \
"Mismatched number of configurations and context."
if metadata is not None:
assert len(configurations) == len(metadata), \
"Mismatched number of configurations and metadata."
assert configurations.shape[1] == len(self.parameter_space.values()), \
"Mismatched configuration shape."
self._observations.append((configurations, scores, context))
self._observations.append((configurations, scores, context, metadata))
self._has_context = context is not None

if self._space_adapter:
configurations = self._space_adapter.inverse_transform(configurations)
assert configurations.shape[1] == len(self.optimizer_parameter_space.values()), \
"Mismatched configuration shape after inverse transform."
return self._register(configurations, scores, context)
return self._register(configurations, scores, metadata, context)
Contributor:
same here


@abstractmethod
def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
Contributor:
Suggested change
def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
def _register(self, *, configurations: pd.DataFrame, scores: pd.DataFrame,

Can force the args to be named to help avoid param ordering mistakes.

Contributor:
Same for the elsewhere (e.g., public methods and suggest), though this might be a larger API change that needs its own PR first in prepration for this one since callers will also be affected.

Member:
Agree. Let's fix _register now and update the public register in the next PR

context: Optional[pd.DataFrame] = None) -> None:
context: Optional[pd.DataFrame] = None, metadata: Optional[pd.DataFrame] = None) -> None:
"""Registers the given configurations and scores.

Parameters
@@ -114,13 +122,16 @@ def _register(self, configurations: pd.DataFrame, scores: pd.DataFrame,
Dataframe of configurations / parameters. The columns are parameter names and the rows are the configurations.
scores : pd.DataFrame
Scores from running the configurations. The index is the same as the index of the configurations.

context : pd.DataFrame
Not Yet Implemented.
Not implemented yet.
metadata : pd.DataFrame
Implementation depends on instance.
"""
pass # pylint: disable=unnecessary-pass # pragma: no cover

def suggest(self, context: Optional[pd.DataFrame] = None, defaults: bool = False) -> pd.DataFrame:
def suggest(
self, context: Optional[pd.DataFrame] = None, defaults: bool = False
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""
Wrapper method, which employs the space adapter (if any), after suggesting a new configuration.

@@ -136,13 +147,25 @@ def suggest(self, context: Optional[pd.DataFrame] = None, defaults: bool = False
-------
configuration : pd.DataFrame
Pandas dataframe with a single row. Column names are the parameter names.
metadata : pd.DataFrame
Pandas dataframe with a single row containing the metadata.
Column names are the budget, seed, and instance of the evaluation, if valid.
"""
if defaults:
configuration = config_to_dataframe(self.parameter_space.get_default_configuration())
self.delayed_config, self.delayed_metadata = self._suggest(context)

configuration: pd.DataFrame = config_to_dataframe(
self.parameter_space.get_default_configuration()
)
metadata = self.delayed_metadata
Contributor:
nit: when creating PRs - try to keep your changes smaller. It's easier to review and debug.
If the order of this one didn't really matter you could have left the first line alone and only added the two new ones

if self.space_adapter is not None:
configuration = self.space_adapter.inverse_transform(configuration)
else:
configuration = self._suggest(context)
if self.delayed_config is None:
configuration, metadata = self._suggest(metadata)
else:
configuration, metadata = self.delayed_config, self.delayed_metadata
self.delayed_config, self.delayed_metadata = None, None
assert len(configuration) == 1, \
"Suggest must return a single configuration."
assert set(configuration.columns).issubset(set(self.optimizer_parameter_space)), \
@@ -151,10 +174,12 @@ def suggest(self, context: Optional[pd.DataFrame] = None, defaults: bool = False
configuration = self._space_adapter.transform(configuration)
assert set(configuration.columns).issubset(set(self.parameter_space)), \
"Space adapter produced a configuration that does not match the expected parameter space."
return configuration
return configuration, metadata

@abstractmethod
def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
def _suggest(
self, context: Optional[pd.DataFrame] = None
) -> Tuple[pd.DataFrame, Optional[pd.DataFrame]]:
"""Suggests a new configuration.

Parameters
@@ -166,12 +191,16 @@ def _suggest(self, context: Optional[pd.DataFrame] = None) -> pd.DataFrame:
-------
configuration : pd.DataFrame
Pandas dataframe with a single row. Column names are the parameter names.

metadata : pd.DataFrame
Pandas dataframe with a single row containing the metadata.
Column names are the budget, seed, and instance of the evaluation, if valid.
"""
pass # pylint: disable=unnecessary-pass # pragma: no cover

@abstractmethod
def register_pending(self, configurations: pd.DataFrame,
context: Optional[pd.DataFrame] = None) -> None:
context: Optional[pd.DataFrame] = None, metadata: Optional[pd.DataFrame] = None) -> None:
"""Registers the given configurations as "pending".
That is it say, it has been suggested by the optimizer, and an experiment trial has been started.
This can be useful for executing multiple trials in parallel, retry logic, etc.
@@ -181,30 +210,34 @@ def register_pending(self, configurations: pd.DataFrame,
configurations : pd.DataFrame
Dataframe of configurations / parameters. The columns are parameter names and the rows are the configurations.
context : pd.DataFrame
Not Yet Implemented.
Not implemented yet.
metadata : pd.DataFrame
Implementation depends on instance.
"""
pass # pylint: disable=unnecessary-pass # pragma: no cover

def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
def get_observations(self) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
"""
Returns the observations as a triplet of DataFrames (config, score, context).
Returns the observations as a triplet of DataFrames (config, score, context, metadata).

Returns
-------
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
A triplet of (config, score, context) DataFrames of observations.
A triplet of (config, score, metadata) DataFrames of observations.
Member:
Suggested change
A triplet of (config, score, metadata) DataFrames of observations.
A 4-tuple of (config, score, context, metadata) DataFrames of observations.

(or, better yet, a NamedTuple)

"""
if len(self._observations) == 0:
raise ValueError("No observations registered yet.")
configs = pd.concat([config for config, _, _ in self._observations]).reset_index(drop=True)
scores = pd.concat([score for _, score, _ in self._observations]).reset_index(drop=True)
configs = pd.concat([config for config, _, _, _ in self._observations]).reset_index(drop=True)
scores = pd.concat([score for _, score, _, _ in self._observations]).reset_index(drop=True)
Member:
Yeah, let's have a List[Observation] NamedTuples - or, better yet, concatenate the dataframes right there in _register and forget about List and NamedTuple

contexts = pd.concat([pd.DataFrame() if context is None else context
for _, _, context in self._observations]).reset_index(drop=True)
return (configs, scores, contexts if len(contexts.columns) > 0 else None)
for _, _, context, _ in self._observations]).reset_index(drop=True)
metadatas = pd.concat([pd.DataFrame() if metadata is None else metadata
for _, _, _, metadata in self._observations]).reset_index(drop=True)
return (configs, scores, contexts, metadatas if len(metadatas.columns) > 0 else None)

def get_best_observations(self, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
def get_best_observations(self, n_max: int = 1) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]:
"""
Get the N best observations so far as a triplet of DataFrames (config, score, context).
Get the N best observations so far as a triplet of DataFrames (config, score, metadata).
Default is N=1. The columns are ordered in ASCENDING order of the optimization targets.
The function uses `pandas.DataFrame.nsmallest(..., keep="first")` method under the hood.

@@ -215,15 +248,15 @@ def get_best_observations(self, n_max: int = 1) -> Tuple[pd.DataFr

Returns
-------
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]
A triplet of best (config, score, context) DataFrames of best observations.
observations : Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame], Optional[pd.DataFrame]]
A triplet of best (config, score, context, metadata) DataFrames of best observations.
"""
if len(self._observations) == 0:
raise ValueError("No observations registered yet.")
(configs, scores, contexts) = self.get_observations()
(configs, scores, contexts, metadatas) = self.get_observations()
idx = scores.nsmallest(n_max, columns=self._optimization_targets, keep="first").index
return (configs.loc[idx], scores.loc[idx],
None if contexts is None else contexts.loc[idx])
None if contexts is None else contexts.loc[idx], None if metadatas is None else metadatas.loc[idx])

def cleanup(self) -> None:
"""