Split registering multiple configs into a separate function #804
Comments
Looks to me like we are using this functionality already:
It's especially useful for resuming an experiment and re-warming the optimizer with prior results.

Back to this change, which was really only meant for code readability improvements: your proposal of having a class called an

Note that we could pretty easily extend some of these helper functions for conversion to other data structure types in the future as well.

Some caveats: The

An alternative would be to explicitly change the received return type at all call sites, but that might be a bunch of work too. There are probably some other refinements that could happen as well. As I said, this is just a quick stub idea:

```python
"""Simple dataclass for storing observations from a hyperparameter optimization run."""

import itertools
from dataclasses import dataclass
from typing import Iterator, List, Optional

import pandas as pd


def _iter_rows(df: Optional[pd.DataFrame]) -> Iterator[Optional[pd.Series]]:
    """Yield each row of df as a Series, or None indefinitely if df is None."""
    if df is None:
        return itertools.repeat(None)
    # iterrows() yields (index, row) pairs; keep only the row Series.
    return (row for _, row in df.iterrows())


@dataclass(frozen=True)
class Observation:
    """Simple dataclass for storing a single Observation."""

    config: pd.Series
    score: pd.Series
    context: Optional[pd.Series] = None
    metadata: Optional[pd.Series] = None

    def __iter__(self) -> Iterator[Optional[pd.Series]]:
        """A not quite type correct hack to allow existing code to use the Tuple style
        return values.
        """
        # Note: this should be more efficient than using astuple(),
        # which makes deep copies.
        return iter((self.config, self.score, self.context, self.metadata))


@dataclass(frozen=True)
class Observations:
    """Simple dataclass for storing observations from a hyperparameter optimization
    run.
    """

    configs: pd.DataFrame
    scores: pd.DataFrame
    context: Optional[pd.DataFrame] = None
    metadata: Optional[pd.DataFrame] = None

    def __post_init__(self) -> None:
        assert len(self.configs) == len(self.scores)
        if self.context is not None:
            assert len(self.configs) == len(self.context)
        if self.metadata is not None:
            assert len(self.configs) == len(self.metadata)

    def __iter__(self) -> Iterator[Optional[pd.DataFrame]]:
        """A not quite type correct hack to allow existing code to use the Tuple style
        return values.
        """
        # Note: this should be more efficient than using astuple(),
        # which makes deep copies.
        return iter((self.configs, self.scores, self.context, self.metadata))

    def to_observation_list(self) -> List[Observation]:
        """Convert the Observations object to a list of Observation objects."""
        return [
            Observation(
                config=config,
                score=score,
                context=context,
                metadata=metadata,
            )
            for config, score, context, metadata in zip(
                _iter_rows(self.configs),
                _iter_rows(self.scores),
                _iter_rows(self.context),
                _iter_rows(self.metadata),
            )
        ]


def get_observations() -> Observations:
    """Get some dummy observations."""
    # Create some dummy data
    configs = pd.DataFrame(
        {
            "x": [1, 2, 3],
            "y": [4, 5, 6],
        }
    )
    scores = pd.DataFrame(
        {
            "score": [0.1, 0.2, 0.3],
        }
    )
    # Create an Observations object
    return Observations(configs=configs, scores=scores)


def test_observations() -> None:
    """Test the Observations dataclass."""
    # Create an Observations object
    observations = get_observations()
    observation = observations.to_observation_list()[0]

    # Print the Observations object
    print(observations)
    print(observations.configs, observations.scores)
    print(observation)
    print(observation.config, observation.score)

    # Or in tuple form using the __iter__ method:
    configs, scores, contexts, metadatas = get_observations()
    print(configs)
    print(scores)
    print(contexts)
    print(metadatas)

    config, score, context, metadata = get_observations().to_observation_list()[0]
    print(config)
    print(score)
    print(context)
    print(metadata)


if __name__ == "__main__":
    test_observations()
```
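The stub comment notes that these helpers could be extended to convert to other data structures. As a hedged sketch of one such direction, the hypothetical `from_observation_list` below goes the opposite way from `to_observation_list()`, stacking single observations back into DataFrames. The `Observation` class is trimmed to three fields only so the block is self-contained; the function name and the all-or-none rule for `context` are assumptions, not part of the stub.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import pandas as pd


@dataclass(frozen=True)
class Observation:
    """Trimmed copy of the Observation stub (metadata omitted for brevity)."""

    config: pd.Series
    score: pd.Series
    context: Optional[pd.Series] = None


def from_observation_list(
    observations: List[Observation],
) -> Tuple[pd.DataFrame, pd.DataFrame, Optional[pd.DataFrame]]:
    """Hypothetical inverse of to_observation_list(): stack rows back into DataFrames."""
    # Each Series becomes one row; reset_index() renumbers the rows 0..n-1.
    configs = pd.DataFrame([obs.config for obs in observations]).reset_index(drop=True)
    scores = pd.DataFrame([obs.score for obs in observations]).reset_index(drop=True)

    contexts = [obs.context for obs in observations]
    context: Optional[pd.DataFrame]
    if all(ctx is None for ctx in contexts):
        context = None
    elif any(ctx is None for ctx in contexts):
        raise ValueError("either all or none of the observations must have a context")
    else:
        context = pd.DataFrame(contexts).reset_index(drop=True)
    return configs, scores, context


observations = [
    Observation(config=pd.Series({"x": 1, "y": 4}), score=pd.Series({"score": 0.1})),
    Observation(config=pd.Series({"x": 2, "y": 5}), score=pd.Series({"score": 0.2})),
]
configs, scores, context = from_observation_list(observations)
print(configs)
print(scores)
```

Round-tripping through a helper like this would let callers that prefer one-row-at-a-time objects and callers that prefer whole DataFrames share the same storage.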
And tbh, that
Gonna claim that #852 handles this, or at least lays most of the groundwork to handle bulk registering more easily.
Currently, I am working on condensing the return values of `suggest` and the arguments of `register` into a class, which in many ways acts as a named tuple. Because of this, I have been looking into the structure of these classes, as per the comments on #771.
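Since such a class "in many ways acts as a named tuple", one point of comparison is Python's built-in `typing.NamedTuple`, which provides named fields and tuple unpacking without a custom `__iter__`. This is a minimal sketch only: the `Suggestion` name and its `config`/`metadata` fields are hypothetical and are not taken from mlos_core.

```python
from typing import NamedTuple, Optional

import pandas as pd


class Suggestion(NamedTuple):
    """Hypothetical container for suggest() results (field names are assumptions)."""

    config: pd.DataFrame
    metadata: Optional[pd.DataFrame] = None


suggestion = Suggestion(config=pd.DataFrame({"x": [1], "y": [4]}))

# Named access, as with a regular class...
print(suggestion.config)

# ...and tuple-style unpacking for existing call sites, with no extra __iter__ needed.
config, metadata = suggestion
```

A design difference worth weighing: a `NamedTuple` is a real tuple, so indexing and `len()` also work on it, whereas a frozen dataclass that only defines `__iter__` supports unpacking but not indexing or `len()`.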
With the current implementation, it is possible to register multiple configurations simultaneously using the `register` function; however, this feature does not seem to be used in the test cases or in mlos_bench. Additionally, it causes our observations to be a list of variable-length DataFrames.
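To illustrate the variable-length problem, the sketch below uses made-up data and assumes each `register` call appends whatever batch it received to a list, so the entries have different row counts. The `observation_batches` name is hypothetical and is not the actual attribute used in `optimizer.py`.

```python
from typing import List

import pandas as pd

# Hypothetical history when register() accepts batches:
# one DataFrame per register() call, each with a different number of rows.
observation_batches: List[pd.DataFrame] = [
    pd.DataFrame({"x": [1], "score": [0.1]}),
    pd.DataFrame({"x": [2, 3, 4], "score": [0.2, 0.3, 0.4]}),
    pd.DataFrame({"x": [5, 6], "score": [0.5, 0.6]}),
]

# Questions like "how many observations are there?" or "what was the 4th one?"
# require flattening the batches into a single DataFrame first.
all_observations = pd.concat(observation_batches, ignore_index=True)
print(len(observation_batches), len(all_observations))  # 3 6
```

Any code that reads the history has to flatten it like this first, which is the awkwardness the proposal aims to remove.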
I would propose that, instead of supporting bulk registration directly in the `register` function, we split this functionality out into an additional function (potentially called `bulk_register`) that calls into `register`. This would have the added benefit of removing the variable-length list of DataFrames in our observations, as seen in `optimizer.py`. This would then allow singular observations to be represented through an `Observation` object, turning the type on this object into:

`Observation` would be implemented as such:
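A minimal sketch of the proposed split, assuming a hypothetical `ToyOptimizer` whose `register` takes exactly one observation as `pd.Series` rows; the real optimizer classes and their signatures may differ. `bulk_register` validates the batch shapes and then calls `register` once per row.

```python
from typing import List, Optional, Tuple

import pandas as pd


class ToyOptimizer:
    """Hypothetical optimizer whose register() accepts exactly one observation."""

    def __init__(self) -> None:
        # One (config, score, context) entry per observation: no variable-length batches.
        self.observations: List[Tuple[pd.Series, pd.Series, Optional[pd.Series]]] = []

    def register(
        self, config: pd.Series, score: pd.Series, context: Optional[pd.Series] = None
    ) -> None:
        """Record a single observation (a real optimizer would also update its model)."""
        self.observations.append((config, score, context))

    def bulk_register(
        self,
        configs: pd.DataFrame,
        scores: pd.DataFrame,
        context: Optional[pd.DataFrame] = None,
    ) -> None:
        """Split a batch into rows and pass each one to register()."""
        if len(configs) != len(scores):
            raise ValueError("configs and scores must have the same number of rows")
        if context is not None and len(context) != len(configs):
            raise ValueError("context must have the same number of rows as configs")
        for i in range(len(configs)):
            self.register(
                configs.iloc[i],
                scores.iloc[i],
                context.iloc[i] if context is not None else None,
            )


optimizer = ToyOptimizer()
optimizer.bulk_register(
    configs=pd.DataFrame({"x": [1, 2, 3]}),
    scores=pd.DataFrame({"score": [0.1, 0.2, 0.3]}),
)
print(len(optimizer.observations))  # 3
```

One trade-off to keep in mind: an optimizer backend that can ingest a whole batch more efficiently than N single calls could still override `bulk_register` for that case.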