Support Azure VM Agent #2014

jsommr · 2024-04-07T04:21:25Z

It's pretty cool that Nanos runs on Azure. It's the only unikernel able to do that so far.

I'd like to put it in a VM Scale Set with autoscaling based on CPU and memory usage, but memory metrics are only available through an agent running in the VM.

Surely this is easier said than done, but at least now there's an issue for it.

Thanks for a great product!

francescolavra · 2024-04-27T08:27:08Z

Guest OS metrics such as memory usage are only available for a given VM or VM Scale Set (VMSS) if the Azure diagnostic extension has been enabled in that VM or VMSS. Otherwise, such metrics cannot be viewed in the Metrics section of the Azure portal and cannot be used in autoscale trigger rules, even if the VM is correctly sending metrics data. One reason for this is that metrics data is stored in tables in a given storage account, and the Azure portal needs to know which storage account is being used; it takes this information from the configuration settings of the diagnostic extension.

Enabling an extension (either via a deployment template for a VMSS, or via the Azure portal or the Azure client for a VM) requires communication with the VM agent running in the VM, which is instructed to download, install, configure, and run the extension; if no agent is running, enabling the extension fails, and guest OS metrics are not available. So to make this work on Nanos, we have to implement, in addition to the code that sends the metrics to a storage account, an agent that responds to requests to enable extensions (and possibly to requests for the configuration settings of a given extension).

This change adds a new "azure" klib that implements an Azure extension similar to the Linux Diagnostic Extension. The current implementation supports sending 4 types of memory metrics: available and used memory, each expressed both as a number of bytes and as a percentage of total memory.

This klib is configured in the manifest options via an "azure" tuple; the diagnostic functionalities are enabled and configured by inserting a "diagnostics" tuple with the following attributes:

- storage_account: the Azure storage account used to store the metrics data generated by the klib; the storage account must be located in the same region as the Azure instance
- storage_account_sas: Shared Access Signature (SAS) token for accessing the storage account; this token must have the permissions needed to create Azure storage tables and add table entities in the above storage account. SAS tokens for a given storage account can be generated, for example, via the Azure portal in the "Security + networking" menu.
- metrics: tuple that enables sending memory metrics; it can contain 2 optional attributes:
  - sample_interval: interval, in seconds, at which metrics data is collected (default: 15)
  - transfer_interval: interval, in seconds, at which metrics data is aggregated and sent to the storage account (default: 60)

Example snippet of Ops configuration file:

```
"ManifestPassthrough": {
  "azure": {
    "diagnostics": {
      "storage_account": "mystorageaccount",
      "storage_account_sas": "sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-05-22T14:50:28Z&st=2024-05-12T06:50:28Z&spr=https&sig=xxyyzz",
      "metrics": {"sample_interval": "15", "transfer_interval": "60"}
    }
  }
}
```

Aggregated memory metrics data consist of the number of samples, the minimum, maximum, last, and average value, and the sum of all values; these data are inserted in an Azure storage table (one entity per set of aggregated data).

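
The per-interval aggregation described in the PR can be sketched as follows. This is a hypothetical Python illustration, not the klib's actual code: the `MetricAggregate` type, its field names, and the `aggregate` function are assumptions and do not reflect the real column names of the Azure table entities. It only shows how the samples collected every `sample_interval` seconds collapse into one summary per `transfer_interval`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricAggregate:
    """Summary of one metric over a single transfer interval (hypothetical layout)."""
    count: int
    minimum: float
    maximum: float
    last: float
    average: float
    total: float


def aggregate(samples: List[float]) -> MetricAggregate:
    """Collapse the samples collected during one transfer interval.

    Samples are assumed to be in collection order, so the final element
    is the "last" value.
    """
    if not samples:
        raise ValueError("no samples collected in this interval")
    total = sum(samples)
    return MetricAggregate(
        count=len(samples),
        minimum=min(samples),
        maximum=max(samples),
        last=samples[-1],
        average=total / len(samples),
        total=total,
    )


if __name__ == "__main__":
    # With the default settings (sample every 15 s, transfer every 60 s),
    # each transfer interval covers 4 samples of, e.g., used memory in percent.
    used_percent = [41.0, 43.5, 42.0, 44.5]
    print(aggregate(used_percent))
```

With the defaults, one such aggregate per metric would be produced every 60 seconds, i.e. 4 entities per minute for the 4 memory metrics.
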
The name of the table is in the format "WADMetricsxxxxP10DV2Syyyymmdd", where xxxx is the transfer interval expressed in ISO 8601 duration format, and yyyymmdd represents the 10-day date interval to which the metrics refer (thus, a new table is created every 10 days). For example, a table named WADMetricsPT1MP10DV2S20240503 contains metrics data aggregated every minute ("PT1M" is the ISO 8601 representation of a 1-minute period) generated for a 10-day period starting on May 3, 2024.

By default, the Azure portal does not display these metrics in its charts; in order for metrics to be available in the portal, the Linux Diagnostic Extension must be enabled and configured in a running instance (this can be done in the "Diagnostic settings" section of the portal) to match the settings in the Nanos manifest options. More specifically, the storage account and the metric aggregation interval specified in the Azure diagnostic settings must match those specified in the manifest options.

Note: the Azure VM agent implemented in the cloud_init klib responds to requests to enable and configure the diagnostic extension, but does not actually apply the extension settings specified in the requests; instead, it always applies the settings from the manifest.

Closes #2014
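
The table naming scheme from the PR description can be sketched as below. This is a hypothetical Python helper, not code from Nanos or Azure; `iso8601_duration` and `metrics_table_name` are illustrative names. How the first day of each 10-day period is chosen is not described in the PR, so the sketch takes it as an input rather than guessing. It also formats any positive interval and does not check which intervals the Azure portal actually accepts.

```python
from datetime import date


def iso8601_duration(seconds: int) -> str:
    """Format a positive number of seconds as an ISO 8601 duration, e.g. 60 -> "PT1M"."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    result = "PT"
    if hours:
        result += f"{hours}H"
    if minutes:
        result += f"{minutes}M"
    if secs:
        result += f"{secs}S"
    return result


def metrics_table_name(transfer_interval: int, period_start: date) -> str:
    """Build a WADMetrics<interval>P10DV2S<yyyymmdd> table name.

    period_start is the first day of the 10-day period; computing it is
    left to the caller (assumption: not specified in the description).
    """
    return f"WADMetrics{iso8601_duration(transfer_interval)}P10DV2S{period_start:%Y%m%d}"


if __name__ == "__main__":
    # Reproduces the example table name from the description.
    print(metrics_table_name(60, date(2024, 5, 3)))  # WADMetricsPT1MP10DV2S20240503
```

With the default 60-second transfer interval the duration part is always "PT1M"; only the date suffix changes as new 10-day periods begin.
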

francescolavra · 2024-05-12T14:58:06Z

#2022 implements an Azure VM agent and a diagnostic extension that publishes memory metrics. Please note that the extension is enabled and configured via manifest options when creating a Nanos image; it sends metrics regardless of whether the diagnostic extension is enabled in the Azure portal, but in order for these metrics to show up in the charts you need to enable the extension (in fact, without enabling the extension you cannot even select guest OS metrics in the charts).
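
Since the extension is configured from manifest options when the image is built, the following hypothetical Python sketch shows how the `azure`/`diagnostics` tuple from the Ops `ManifestPassthrough` snippet in the PR description might be interpreted, applying the documented defaults (15 s sampling, 60 s transfer) when the `metrics` attributes are omitted. `DiagnosticsConfig` and `parse_diagnostics` are illustrative names, not part of Nanos or Ops, and this is not the klib's actual parsing code.

```python
import json
from dataclasses import dataclass
from typing import Optional

DEFAULT_SAMPLE_INTERVAL = 15    # seconds, documented default
DEFAULT_TRANSFER_INTERVAL = 60  # seconds, documented default


@dataclass
class DiagnosticsConfig:
    storage_account: str
    storage_account_sas: str
    sample_interval: Optional[int]    # None when the metrics tuple is absent
    transfer_interval: Optional[int]


def parse_diagnostics(manifest_passthrough: dict) -> Optional[DiagnosticsConfig]:
    """Return the diagnostics settings, or None if no diagnostics tuple is present."""
    diag = manifest_passthrough.get("azure", {}).get("diagnostics")
    if diag is None:
        return None
    metrics = diag.get("metrics")
    sample = transfer = None
    if metrics is not None:
        # Values appear as strings in the example config, so convert them to int.
        sample = int(metrics.get("sample_interval", DEFAULT_SAMPLE_INTERVAL))
        transfer = int(metrics.get("transfer_interval", DEFAULT_TRANSFER_INTERVAL))
    return DiagnosticsConfig(
        storage_account=diag["storage_account"],
        storage_account_sas=diag["storage_account_sas"],
        sample_interval=sample,
        transfer_interval=transfer,
    )


if __name__ == "__main__":
    passthrough = json.loads(
        '{"azure": {"diagnostics": {"storage_account": "mystorageaccount",'
        ' "storage_account_sas": "<sas-token>",'
        ' "metrics": {"transfer_interval": "60"}}}}'
    )
    print(parse_diagnostics(passthrough))
```

Treating a missing `metrics` tuple as "metrics disabled" follows the description of `metrics` as the tuple that enables sending memory metrics.
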

jsommr · 2024-05-12T18:06:03Z

Awesome! Thanks!

eyberg added the azure label Apr 8, 2024

francescolavra self-assigned this Apr 18, 2024

francescolavra mentioned this issue May 12, 2024

Azure: implement sending memory metrics via diagnostic extension #2022 (Merged)

jsommr closed this as completed May 12, 2024
