Support Azure VM Agent #2014

Closed
jsommr opened this issue Apr 7, 2024 · 3 comments · Fixed by #2022

jsommr commented Apr 7, 2024

It's pretty cool that Nanos runs on Azure. It's the only unikernel able to do that so far.

I'd like to put it in a VM Scale Set with autoscaling based on cpu and memory usage, but memory metrics are only available with an agent.

Surely this is easier said than done, but now there's an issue on it.

Thanks for a great product!

@eyberg eyberg added the azure label Apr 8, 2024
@francescolavra francescolavra self-assigned this Apr 18, 2024
@francescolavra
Member

Guest OS metrics such as memory usage are only available for a given VM or VM Scale Set (VMSS) if the Azure diagnostic extension has been enabled in that VM or VMSS. Otherwise, such metrics cannot be viewed in the Metrics section of the Azure portal and cannot be used in autoscale trigger rules, even if the VM is correctly sending metrics data. One of the reasons is that metrics data is stored in tables in a storage account, and the Azure portal learns which storage account is being used from the configuration settings of the diagnostic extension.
Enabling an extension (either via a deployment template for a VMSS, or via the Azure portal or the Azure client for a VM) requires communication with the VM agent running in the VM, which is instructed to download, install, configure, and run the extension. If no agent is running, enabling the extension fails, and guest OS metrics are not available. So if we want this to work on Nanos, we have to implement, in addition to the code that sends the metrics to a storage account, an agent that responds to requests to enable extensions (and possibly to requests for configuration settings of a given extension).

francescolavra added a commit that referenced this issue May 12, 2024
This change adds a new "azure" klib that implements an Azure
extension similar to the Linux Diagnostic extension.
The current implementation supports sending 4 types of memory
metrics (i.e. available and used memory, as both number of bytes
and percentage of total memory).
This klib is configured in the manifest options via an "azure"
tuple; the diagnostic functionalities are enabled and configured by
inserting a "diagnostics" tuple with the following attributes:
- storage_account: indicates the Azure storage account to be used
to store metrics data generated by the klib; the storage account
must be located in the same region as the region where the Azure
instance is deployed
- storage_account_sas: Shared Access Signature token for accessing
the storage account; this token must have proper permissions to
create Azure storage tables and add table entities in the above
storage account. SAS tokens for a given storage account can be
generated, for example, via the Azure portal in the
"Security + networking" menu.
- metrics: tuple that enables sending memory metrics; it can
contain 2 optional attributes:
  - sample_interval: interval expressed in seconds at which metrics
data is collected (default: 15)
  - transfer_interval: interval expressed in seconds at which
metrics data is aggregated and sent to the storage account
(default: 60)
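
As a rough illustration of the four memory metrics mentioned above (available and used memory, each as a byte count and as a percentage of total memory), here is a minimal Python sketch. The function name and the dictionary keys are hypothetical; they are not the metric names used by the klib or by the Linux Diagnostic extension.

```python
def memory_metrics(total_bytes: int, available_bytes: int) -> dict:
    """Derive the four memory metrics from total and available memory.

    The key names below are illustrative only (an assumption), not the
    names actually sent by the klib.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= available_bytes <= total_bytes:
        raise ValueError("available_bytes must be between 0 and total_bytes")
    used_bytes = total_bytes - available_bytes
    return {
        "available_bytes": available_bytes,
        "used_bytes": used_bytes,
        "available_percent": 100.0 * available_bytes / total_bytes,
        "used_percent": 100.0 * used_bytes / total_bytes,
    }
```

With the default intervals, one such set of values would be sampled every 15 seconds and aggregated every 60 seconds.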

Example snippet of Ops configuration file:
```
"ManifestPassthrough": {
  "azure": {
    "diagnostics": {
      "storage_account": "mystorageaccount",
      "storage_account_sas": "sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-05-22T14:50:28Z&st=2024-05-12T06:50:28Z&spr=https&sig=xxyyzz",
      "metrics": {"sample_interval": "15","transfer_interval": "60"}
    }
  }
}
```
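
The SAS token in the snippet above carries its allowed services (`ss`), permissions (`sp`) and expiry time (`se`) as query parameters, so a quick sanity check before building an image could look like the Python sketch below. This is a heuristic under stated assumptions: the helper `check_sas_token` is hypothetical, it only looks for the table service letter `t` and the permission letters `a` (add) and `c` (create), and it assumes `se` uses the same timestamp format as the example. It does not verify the signature or guarantee that Azure will accept the token.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def check_sas_token(sas: str, now: datetime) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    # parse_qs maps each key to a list of values; keep the first one.
    params = {key: values[0] for key, values in parse_qs(sas).items()}
    problems = []

    # The metrics are stored in Azure storage tables.
    if "t" not in params.get("ss", ""):
        problems.append("table service not allowed (ss lacks 't')")

    # Assumption: 'c' and 'a' are the letters needed to create tables
    # and add entities, as described in the commit message.
    permissions = params.get("sp", "")
    if "c" not in permissions:
        problems.append("create permission missing (sp lacks 'c')")
    if "a" not in permissions:
        problems.append("add permission missing (sp lacks 'a')")

    expiry = params.get("se")
    if expiry is None:
        problems.append("no expiry time (se) found")
    else:
        # Assumption: same format as in the example, e.g. 2024-05-22T14:50:28Z
        expires = datetime.strptime(expiry, "%Y-%m-%dT%H:%M:%SZ")
        if expires.replace(tzinfo=timezone.utc) <= now:
            problems.append("token expired")
    return problems
```

For example, `check_sas_token(token, datetime.now(timezone.utc))` returns an empty list when none of these checks fail.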

Aggregated memory metrics data consist of the number of samples,
the minimum, maximum, last, and average value, and the sum of all
values; these data are inserted in an Azure storage table (one
entity per aggregate). The name of the table is in the format
"WADMetricsxxxxP10DV2Syyyymmdd", where xxxx is the transfer
interval expressed in ISO 8601 format, and yyyymmdd is a
representation of the 10-day date interval to which the metrics
refer (thus, a new table is created every 10 days). For example, a
table named WADMetricsPT1MP10DV2S20240503 contains metrics data
aggregated every minute ("PT1M" is the ISO 8601 representation of a
1-minute period) generated for a 10-day period starting on May 3,
2024.
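
The aggregation and table naming described above can be sketched in Python as follows. This is not the klib's code (which is not shown here); the function names and dictionary keys are hypothetical. In particular, the alignment of the 10-day intervals is an assumption: counting days from 1601-01-01 happens to reproduce the example table name, but the actual rule is not stated in this commit message.

```python
from datetime import date, timedelta

# Assumption: 10-day intervals are counted from this date. It reproduces the
# example name WADMetricsPT1MP10DV2S20240503, but is not confirmed.
INTERVAL_EPOCH = date(1601, 1, 1)


def aggregate(samples: list) -> dict:
    """Summarize the samples collected during one transfer interval."""
    if not samples:
        raise ValueError("at least one sample is required")
    total = sum(samples)
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "last": samples[-1],
        "average": total / len(samples),
        "total": total,
    }


def iso8601_duration(seconds: int) -> str:
    """Format a positive number of seconds as an ISO 8601 period, e.g. PT1M."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    text = "PT"
    if hours:
        text += f"{hours}H"
    if minutes:
        text += f"{minutes}M"
    if secs:
        text += f"{secs}S"
    return text


def table_name(transfer_interval: int, day: date) -> str:
    """Build the table name for metrics generated on the given day."""
    days_since_epoch = (day - INTERVAL_EPOCH).days
    # Step back to the first day of the 10-day interval containing `day`.
    interval_start = day - timedelta(days=days_since_epoch % 10)
    period = iso8601_duration(transfer_interval)
    return f"WADMetrics{period}P10DV2S{interval_start:%Y%m%d}"
```

Under these assumptions, `table_name(60, date(2024, 5, 3))` yields the table name from the example, and the default settings (15-second samples, 60-second transfers) give four samples per aggregate.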

By default, the Azure portal does not display these metrics in its
charts; in order for metrics to be available in the portal, the
Linux Diagnostics Extension must be enabled and configured in a
running instance (this can be done in the "Diagnostic settings"
section in the portal) to match the settings in the Nanos manifest
options. More specifically, the storage account and the metric
aggregation interval specified in the Azure diagnostic settings
must match those specified in the manifest options.
Note: the Azure VM agent implemented in the cloud_init klib
responds to requests to enable and configure the diagnostic
extension, but does not actually apply the extension settings
specified in the requests; instead, it always applies the settings
from the manifest.

Closes #2014
francescolavra added a commit that referenced this issue May 12, 2024
@francescolavra
Member

#2022 implements an Azure VM agent and a diagnostic extension that publishes memory metrics. Please note that the extension is enabled and configured via manifest options when creating a Nanos image; it sends metrics regardless of whether the diagnostic extension is enabled in the Azure portal, but in order for these metrics to show up in the charts you need to enable the extension (in fact, without enabling the extension you cannot even select guest OS metrics in the charts).


jsommr commented May 12, 2024

Awesome! Thanks!

@jsommr jsommr closed this as completed May 12, 2024
francescolavra added a commit that referenced this issue May 27, 2024