AX backed HPO component #570

kurman · 2022-07-31T20:20:47Z

Initial TorchX Component for Hyper-parameter tuning (#510)

UX:

Exposes grid_search and bayesian candidate selection strategies and requires input that defines the search space, e.g.:

{
  "params": {
    "p1": {
      "type": "float",
      "range": [
        "0.1",
        "1.0"
      ]
    },
    "p2": {
      "type": "str",
      "choice": [
        "sparse",
        "dense"
      ]
    }
  }
}
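
For illustration, a search-space file like the one above could be translated into the parameter dictionaries accepted by Ax's Service API. This is a minimal sketch, not the component's actual code: the helper name `to_ax_parameters` is hypothetical, and mapping `range` to an Ax `range` parameter and `choice` to an Ax `choice` parameter is an assumption.

```python
import json
from typing import Any, Dict, List

# Casts for the "type" field of the search-space file (assumed set of types).
_CASTS = {"float": float, "int": int, "str": str}


def to_ax_parameters(search_space: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn the JSON search space into Ax-style parameter dicts."""
    ax_params = []
    for name, spec in search_space["params"].items():
        cast = _CASTS[spec["type"]]
        if "range" in spec:
            low, high = spec["range"]
            ax_params.append({
                "name": name,
                "type": "range",
                "value_type": spec["type"],
                # Bounds arrive as strings in the example file, so cast them.
                "bounds": [cast(low), cast(high)],
            })
        elif "choice" in spec:
            ax_params.append({
                "name": name,
                "type": "choice",
                "value_type": spec["type"],
                "values": [cast(v) for v in spec["choice"]],
            })
        else:
            raise ValueError(f"parameter {name!r} needs a 'range' or a 'choice'")
    return ax_params


EXAMPLE = json.loads("""
{
  "params": {
    "p1": {"type": "float", "range": ["0.1", "1.0"]},
    "p2": {"type": "str", "choice": ["sparse", "dense"]}
  }
}
""")

ax_parameters = to_ax_parameters(EXAMPLE)
```

The resulting list has the shape that `AxClient.create_experiment(parameters=...)` takes; whether the component builds it this way is not confirmed here.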

Further ideas: next, we could add constraints on the output and constraints on input pairs/combinations (depending on whether the underlying library supports that).

Candidate selection is just printed for now, until we get better tracking.

Implementation:

Uses the AX Client library (hence the limitation to sequential trials). We can migrate to AX's TorchXRunner + AX Scheduler; however, this will require getting the UX right for defining evaluation and metric output processing.
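
The sequential limitation comes from the ask/tell pattern: each candidate is requested, evaluated, and reported before the next one is generated. A rough sketch of that loop follows. `run_sequential` and `RandomClient` are hypothetical names; the loop only relies on `get_next_trial()` and `complete_trial(trial_index=..., raw_data=...)`, which `AxClient` provides, and the random stand-in exists only so the sketch runs without Ax installed.

```python
import random
from typing import Any, Callable, Dict, List, Tuple


def run_sequential(
    client: Any,
    eval_fn: Callable[..., float],
    objective: str,
    n_trials: int,
) -> List[Tuple[Dict[str, float], float]]:
    """Ask for one candidate, evaluate it, report the result, then repeat."""
    history = []
    for _ in range(n_trials):
        params, trial_index = client.get_next_trial()
        value = eval_fn(**params)
        client.complete_trial(trial_index=trial_index, raw_data={objective: value})
        history.append((params, value))
    return history


class RandomClient:
    """Stand-in (not part of Ax): samples uniformly within the given bounds."""

    def __init__(self, bounds: Dict[str, Tuple[float, float]], seed: int = 0) -> None:
        self._bounds = bounds
        self._rng = random.Random(seed)
        self._next_index = 0
        self.completed: Dict[int, Dict[str, float]] = {}

    def get_next_trial(self) -> Tuple[Dict[str, float], int]:
        params = {n: self._rng.uniform(lo, hi) for n, (lo, hi) in self._bounds.items()}
        index = self._next_index
        self._next_index += 1
        return params, index

    def complete_trial(self, trial_index: int, raw_data: Dict[str, float]) -> None:
        self.completed[trial_index] = raw_data


client = RandomClient({"x1": (0.0, 1.0), "x2": (0.0, 1.0)})
history = run_sequential(client, lambda x1, x2: x1 + x2, "booth_eval", n_trials=3)
```

Because trial N+1 is only requested after trial N completes, nothing here runs in parallel, which is the trade-off the Scheduler-based approach would remove.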

Test plan:

Unit tests

Running locally:

python -m torchx.cli.main run -s local_cwd hpo.bayesian --eval_fn test_script:booth --objective booth_eval --hpo_params_file ./hpo_booth_params.json --hpo_trials 10 --hpo_maximize False
torchx 2022-07-31 20:19:18 INFO     loaded configs from /home/ubuntu/torchx/.torchxconfig
torchx 2022-07-31 20:19:20 INFO     Log directory not set in scheduler cfg. Creating a temporary log dir that will be deleted on exit. To preserve log directory set the `log_dir` cfg option
torchx 2022-07-31 20:19:20 INFO     Log directory is: /tmp/torchx_t11ek_1z
local_cwd://torchx/foo:booth-v1mnf6hxttv6w
torchx 2022-07-31 20:19:20 INFO     Waiting for the app to finish...
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Starting optimization with verbose logging. To disable logging, set the `verbose_logging` argument to `False`. Note that float values in the logs are rounded to 6 decimal points.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.utils.instantiation: Inferred value type of ParameterType.FLOAT for parameter x1. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.utils.instantiation: Inferred value type of ParameterType.FLOAT for parameter x2. If that is not the expected value type, you can explicity specify 'value_type' ('int', 'float', 'bool' or 'str') in parameter dict.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.utils.instantiation: Created search space: SearchSpace(parameters=[RangeParameter(name='x1', parameter_type=FLOAT, range=[0.0, 1.0]), RangeParameter(name='x2', parameter_type=FLOAT, range=[0.0, 1.0])], parameter_constraints=[]).
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 0 with parameters {'x1': 0.21218, 'x2': 0.54938}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 0 with data: {'booth_eval': (48.576151, None)}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 1 with parameters {'x1': 0.7242, 'x2': 0.036728}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 1 with data: {'booth_eval': (50.823399, None)}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 2 with parameters {'x1': 0.490495, 'x2': 0.727069}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 2 with data: {'booth_eval': (36.393588, None)}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 3 with parameters {'x1': 0.187858, 'x2': 0.654356}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 3 with data: {'booth_eval': (46.048082, None)}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Generated new trial 4 with parameters {'x1': 0.792806, 'x2': 0.731653}.
foo:booth/0 [INFO 07-31 20:19:21] ax.service.ax_client: Completed trial 4 with data: {'booth_eval': (29.70154, None)}.
foo:booth/0 [INFO 07-31 20:19:22] ax.service.ax_client: Generated new trial 5 with parameters {'x1': 0.872643, 'x2': 0.944132}.
foo:booth/0 [INFO 07-31 20:19:22] ax.service.ax_client: Completed trial 5 with data: {'booth_eval': (23.308695, None)}.
foo:booth/0 [INFO 07-31 20:19:22] ax.service.ax_client: Generated new trial 6 with parameters {'x1': 1.0, 'x2': 1.0}.
foo:booth/0 [INFO 07-31 20:19:22] ax.service.ax_client: Completed trial 6 with data: {'booth_eval': (20.0, None)}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Generated new trial 7 with parameters {'x1': 1.0, 'x2': 0.916478}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Completed trial 7 with data: {'booth_eval': (21.705324, None)}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Generated new trial 8 with parameters {'x1': 0.142224, 'x2': 0.94607}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Completed trial 8 with data: {'booth_eval': (38.86652, None)}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Generated new trial 9 with parameters {'x1': 0.51109, 'x2': 0.36712}.
foo:booth/0 [INFO 07-31 20:19:23] ax.service.ax_client: Completed trial 9 with data: {'booth_eval': (46.153399, None)}.
torchx 2022-07-31 20:19:25 INFO     Job finished: SUCCEEDED
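
The contents of `test_script` are not shown above. As an assumption, the objective values in the log are consistent with the standard Booth function, so an evaluation function along the following lines would reproduce them. The signature (keyword arguments `x1` and `x2`, returning a plain float) is a guess at what the component expects.

```python
def booth(x1: float, x2: float) -> float:
    """Standard Booth function; its global minimum of 0 is at (1, 3)."""
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2


# Trial 6 in the log above: x1 = 1.0, x2 = 1.0 gives booth_eval = 20.0.
trial_6 = booth(1.0, 1.0)
```

Since the search space in the log is restricted to [0, 1] for both parameters, the global minimum at (1, 3) is out of reach, and the best attainable value is 20.0 at the corner (1, 1), which is what trial 6 found.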

codecov · 2022-07-31T20:26:48Z

Codecov Report

Merging #570 (746ae6b) into main (b051e3f) will decrease coverage by 0.13%.
The diff coverage is 92.30%.

@@            Coverage Diff             @@
##             main     #570      +/-   ##
==========================================
- Coverage   94.85%   94.71%   -0.14%     
==========================================
  Files          66       68       +2     
  Lines        4042     4185     +143     
==========================================
+ Hits         3834     3964     +130     
- Misses        208      221      +13

Impacted Files	Coverage Δ
torchx/components/hpo_runner.py	`91.93% <91.93%> (ø)`
torchx/components/hpo.py	`94.73% <94.73%> (ø)`
torchx/util/entrypoints.py	`89.28% <0.00%> (-10.72%)`	⬇️
torchx/specs/file_linter.py	`99.30% <0.00%> (+0.69%)`	⬆️


facebook-github-bot · 2022-08-03T17:48:14Z

@kurman has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

AX backed HPO component

746ae6b

facebook-github-bot added the CLA Signed label Jul 31, 2022
