AX backed HPO component #570

Open
wants to merge 1 commit into
base: main
Contributor

@kurman kurman commented Jul 31, 2022

Initial TorchX Component for Hyper-parameter tuning (#510)

UX:

Exposes the grid_search and bayesian candidate selection strategies and requires an input file that defines the search space, e.g.:

{
  "params": {
    "p1": {
      "type": "float",
      "range": [
        "0.1",
        "1.0"
      ]
    },
    "p2": {
      "type": "str",
      "choice": [
        "sparse",
        "dense"
      ]
    }
  }
}
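For illustration, here is a minimal sketch of how a search-space file like the one above could be mapped onto the parameter dicts that Ax's Service API accepts (`type` of `range`/`choice`, with `bounds`/`values` and `value_type`). The helper name `to_ax_parameters`, the cast table, and the support for `"int"` are assumptions made for this example; the PR's actual parsing code is not shown here and may differ.

```python
import json
from typing import Any, Dict, List

# Casts for the "type" values in the example file.
# "int" is an assumption; the example only shows "float" and "str".
_CASTS = {"float": float, "int": int, "str": str}


def to_ax_parameters(spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert the search-space JSON into Ax-style parameter dicts (hypothetical helper)."""
    parameters = []
    for name, cfg in spec["params"].items():
        cast = _CASTS[cfg["type"]]
        if "range" in cfg:
            # Bounds arrive as strings in the example, so cast them first.
            low, high = (cast(v) for v in cfg["range"])
            parameters.append(
                {"name": name, "type": "range", "bounds": [low, high], "value_type": cfg["type"]}
            )
        elif "choice" in cfg:
            parameters.append(
                {
                    "name": name,
                    "type": "choice",
                    "values": [cast(v) for v in cfg["choice"]],
                    "value_type": cfg["type"],
                }
            )
        else:
            raise ValueError(f"parameter {name!r} needs a 'range' or a 'choice'")
    return parameters


EXAMPLE = json.loads(
    """
    {"params": {"p1": {"type": "float", "range": ["0.1", "1.0"]},
                "p2": {"type": "str", "choice": ["sparse", "dense"]}}}
    """
)

if __name__ == "__main__":
    for parameter in to_ax_parameters(EXAMPLE):
        print(parameter)
```

Casting up front keeps the string-encoded bounds from the JSON file out of the optimizer, which expects numeric bounds for range parameters.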

Further ideas: constraints on outputs and constraints on input pairs/combinations could be added next (depending on whether the underlying library supports them).

The selected candidates are only printed for now, until we get better tracking.

Implementation:

Uses the AX Client library (hence the limitation to sequential trials). We could migrate to AX's TorchXRunner + AX Scheduler, but that would first require getting the UX right for defining evaluation and metric output processing.
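The sequential limitation follows from the shape of the ask/tell loop. Below is a rough sketch of that loop, not the PR's actual runner. It assumes a client exposing `get_next_trial()` (returning parameters and a trial index) and `complete_trial(trial_index=..., raw_data=...)`, which is how Ax's `AxClient` is driven. `RandomSearchClient` and `run_sequential` are hypothetical stand-ins so the example runs without Ax installed.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


class RandomSearchClient:
    """Hypothetical stand-in that mimics the two AxClient calls used below."""

    def __init__(self, bounds: Dict[str, Tuple[float, float]], seed: int = 0) -> None:
        self._bounds = bounds
        self._rng = random.Random(seed)
        self._next_index = 0
        self.completed: Dict[int, dict] = {}

    def get_next_trial(self) -> Tuple[Params, int]:
        params = {name: self._rng.uniform(lo, hi) for name, (lo, hi) in self._bounds.items()}
        index = self._next_index
        self._next_index += 1
        return params, index

    def complete_trial(self, trial_index: int, raw_data: dict) -> None:
        self.completed[trial_index] = raw_data


def run_sequential(
    client, eval_fn: Callable[..., float], objective: str, num_trials: int
) -> List[Tuple[Params, float]]:
    """Ask for one candidate, evaluate it, report it back, then repeat."""
    results = []
    for _ in range(num_trials):
        params, trial_index = client.get_next_trial()
        value = eval_fn(**params)  # blocks until this trial finishes
        # (mean, SEM) tuple; SEM is None because it is not measured here.
        client.complete_trial(trial_index=trial_index, raw_data={objective: (value, None)})
        results.append((params, value))
    return results


if __name__ == "__main__":
    client = RandomSearchClient({"x1": (0.0, 1.0), "x2": (0.0, 1.0)})
    for params, value in run_sequential(client, lambda x1, x2: x1 + x2, "demo_metric", 3):
        print(params, value)
```

Because each trial is evaluated and reported before the next candidate is requested, trials cannot overlap in this loop; running them in parallel is what a scheduler-based design would add.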

Test plan:

  • Unit tests

Running locally:

python -m torchx.cli.main run -s local_cwd hpo.bayesian --eval_fn test_script:booth --objective booth_eval --hpo_params_file ./hpo_booth_params.json --hpo_trials 10 --hpo_maximize False
torchx 2022-07-31 20:19:18 INFO     loaded configs from /home/ubuntu/torchx/.torchxconfig
torchx 2022-07-31 20:19:20 INFO     Log directory not set in scheduler cfg. Creating a temporary log dir that will be deleted on exit. To preserve log directory set the `log_dir` cfg option
torchx 2022-07-31 20:19:20 INFO     Log directory is: /tmp/torchx_t11ek_1z
local_cwd://torchx/foo:booth-v1mnf6hxttv6w
torchx 2022-07-31 20:19:20 INFO     Waiting for the app to finish...
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Starting optimization with verbose logging. To disable logging, set the `verbose_logging` argument to `False`. Note that float values in the logs are rounded to 6 decimal points.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.utils.instantiation: Inferred value type of ParameterType.FLOAT for parameter x1. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.utils.instantiation: Inferred value type of ParameterType.FLOAT for parameter x2. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.utils.instantiation: Created search space: SearchSpace(parameters=[RangeParameter(name='x1', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x2', parameter_type=FLOAT, range=[0.0, 1.0])], parameter_constraints=[]).
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 0 with parameters {'x1': 0.21218, 'x2': 0.54938}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 0 with data: {'booth_eval': (48.576151, None)}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 1 with parameters {'x1': 0.7242, 'x2': 0.036728}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 1 with data: {'booth_eval': (50.823399, None)}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 2 with parameters {'x1': 0.490495, 'x2': 0.727069}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 2 with data: {'booth_eval': (36.393588, None)}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 3 with parameters {'x1': 0.187858, 'x2': 0.654356}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 3 with data: {'booth_eval': (46.048082, None)}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 4 with parameters {'x1': 0.792806, 'x2': 0.731653}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 4 with data: {'booth_eval': (29.70154, None)}.
foo:booth/0 [INFO 07-31 20:19:22] ax.service.ax_client: Generated new trial 5 with parameters {'x1': 0.872643, 'x2': 0.944132}.
foo:booth/0 [INFO 07-31 20:19:22] ax.service.ax_client: Completed trial 5 with data: {'booth_eval': (23.308695, None)}.
foo:booth/0 [INFO 07-31 20:19:22] ax.service.ax_client: Generated new trial 6 with parameters {'x1': 1.0, 'x2': 1.0}.
foo:booth/0 [INFO 07-31 20:19:22] ax.service.ax_client: Completed trial 6 with data: {'booth_eval': (20.0, None)}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Generated new trial 7 with parameters {'x1': 1.0, 'x2': 0.916478}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Completed trial 7 with data: {'booth_eval': (21.705324, None)}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Generated new trial 8 with parameters {'x1': 0.142224, 'x2': 0.94607}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Completed trial 8 with data: {'booth_eval': (38.86652, None)}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Generated new trial 9 with parameters {'x1': 0.51109, 'x2': 0.36712}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Completed trial 9 with data: {'booth_eval': (46.153399, None)}.
torchx 2022-07-31 20:19:25 INFO     Job finished: SUCCEEDED
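The `test_script` module referenced by `--eval_fn test_script:booth` is not shown in this description. A plausible sketch, assuming it implements the standard Booth function and receives the trial parameters as keyword arguments named `x1` and `x2` (both assumptions), would be:

```python
def booth(x1: float, x2: float) -> float:
    """Booth test function; its global minimum is 0 at (x1, x2) = (1, 3)."""
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2


if __name__ == "__main__":
    # Parameters of trial 6 from the log above.
    print(booth(1.0, 1.0))  # → 20.0
```

This formula is consistent with the logged values: trial 6 (`x1 = 1.0`, `x2 = 1.0`) reports `booth_eval` of 20.0, and trial 0 evaluates to roughly 48.576. With both parameters restricted to [0.0, 1.0], the minimum at (1, 3) lies outside the search space, which would explain why the best logged trial sits at the corner (1.0, 1.0).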

@facebook-github-bot added the CLA Signed label Jul 31, 2022

codecov bot commented Jul 31, 2022

Codecov Report

Merging #570 (746ae6b) into main (b051e3f) will decrease coverage by 0.13%.
The diff coverage is 92.30%.

@@            Coverage Diff             @@
##             main     #570      +/-   ##
==========================================
- Coverage   94.85%   94.71%   -0.14%     
==========================================
  Files          66       68       +2     
  Lines        4042     4185     +143     
==========================================
+ Hits         3834     3964     +130     
- Misses        208      221      +13     
| Impacted Files | Coverage Δ |
| --- | --- |
| torchx/components/hpo_runner.py | 91.93% <91.93%> (ø) |
| torchx/components/hpo.py | 94.73% <94.73%> (ø) |
| torchx/util/entrypoints.py | 89.28% <0.00%> (-10.72%) ⬇️ |
| torchx/specs/file_linter.py | 99.30% <0.00%> (+0.69%) ⬆️ |


@facebook-github-bot
Contributor

@kurman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
