You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In _check_operation_status() of AzureDeploymentService, debug logging relying on a JSON response from the management operation should be done after checking for response status code is 200.
In this case, we have been having cryptic JSON decoding errors that are hard to reproduce (trying to decode empty JSON). However looking more carefully into the stack trace, we can see that a 503 response caused this issue. 503 is a Service Unavailable issue from the compute-side API so that should be handled separately, but it would be good to surface the real error instead of JSON decoding issue.
2024-07-24 07:10:42,882 azure_deployment_services.py:238 _check_operation_status DEBUG Response: <Response [200]>
{
"startTime": "2024-07-24T06:52:41.3832595+00:00",
"status": "InProgress",
"name": "ced88d6f-259f-4fba-8636-376158009bba"
}
2024-07-24 07:10:42,883 azure_deployment_services.py:316 _wait_while DEBUG Sleep for: 6.11 of 20.00 s
2024-07-24 07:10:48,999 azure_auth.py:119 get_access_token DEBUG Time to renew the token: 940.00 sec.
2024-07-24 07:10:49,003 connectionpool.py:1052 _new_conn DEBUG Starting new HTTPS connection (1): management.azure.com:443
2024-07-24 07:12:40,005 connectionpool.py:546 _make_request DEBUG https://management.azure.com:443 "GET /subscriptions/<REDACTED>/providers/Microsoft.Compute/locations/eastus2/operations/<REDACTED> HTTP/1.1" 503 988
2024-07-24 07:12:40,081 base_scheduler.py:132 __exit__ WARNING Scheduler END :: SyncScheduler
Traceback (most recent call last):
File "/opt/conda/envs/mlos/lib/python3.11/site-packages/requests/models.py", line 971, in json
return complexjson.loads(self.text, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/mlos/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/mlos/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/mlos/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/run.py", line 29, in _main
scheduler_context.start()
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/schedulers/sync_scheduler.py", line 39, in start
self._run_schedule(is_warm_up)
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/schedulers/base_scheduler.py", line 243, in _run_schedule
self.run_trial(trial)
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/schedulers/sync_scheduler.py", line 56, in run_trial
(status, timestamp, results) = self.environment.run() # Block and wait for the final result.
^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/environments/composite_env.py", line 204, in run
(status, timestamp, metrics) = env_context.run()
^^^^^^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/environments/composite_env.py", line 204, in run
(status, timestamp, metrics) = env_context.run()
^^^^^^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/environments/composite_env.py", line 204, in run
(status, timestamp, metrics) = env_context.run()
^^^^^^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/environments/remote/remote_env.py", line 139, in run
(status, timestamp, output) = self._remote_exec(self._script_run)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/environments/remote/remote_env.py", line 176, in _remote_exec
(status, output) = self._remote_exec_service.get_remote_exec_results(output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/services/remote/azure/azure_vm_services.py", line 512, in get_remote_exec_results
(status, result) = self.wait_host_operation(config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/services/remote/azure/azure_vm_services.py", line 217, in wait_host_operation
return self._wait_while(self._check_operation_status, Status.RUNNING, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/services/remote/azure/azure_deployment_services.py", line 319, in _wait_while
(status, output) = func(params)
^^^^^^^^^^^^
File "/workspaces/MySQL-Autotuning/MLOS/mlos_bench/mlos_bench/services/remote/azure/azure_deployment_services.py", line 239, in _check_operation_status
json.dumps(response.json(), indent=2)
^^^^^^^^^^^^^^^
File "/opt/conda/envs/mlos/lib/python3.11/site-packages/requests/models.py", line 975, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The text was updated successfully, but these errors were encountered:
In
_check_operation_status()
ofAzureDeploymentService
, debug logging relying on a JSON response from the management operation should be done after checking for response status code is 200.In this case, we have been having cryptic JSON decoding errors that are hard to reproduce (trying to decode empty JSON). However looking more carefully into the stack trace, we can see that a 503 response caused this issue. 503 is a Service Unavailable issue from the compute-side API so that should be handled separately, but it would be good to surface the real error instead of JSON decoding issue.
The text was updated successfully, but these errors were encountered: