Arbiter 3 is currently in beta. Feel free to play around with Arbiter, but expect to run into issues. When you encounter one, please open an issue on GitHub or, better yet, submit a PR. See CONTRIBUTING.md for details.
Arbiter 3 is a software system for monitoring and managing resource usage on HPC cluster login nodes. The successor to Arbiter2, it aims to be easier to configure and deploy.
Arbiter 3 is composed of three main components: the Prometheus time-series database (TSDB), the Python Arbiter service, and the cgroup-wardens running on login nodes. The wardens expose per-user usage over HTTPS, which Prometheus then ingests. The Arbiter service queries the TSDB for user usage data and creates violations when pre-defined policies are broken. Arbiter then sends RPCs, again over HTTPS, to the wardens affected by each policy violation, setting hard limits on user resources. Arbiter also evaluates the state of existing violations (to expire them when appropriate) and emails users about their violations.
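The evaluation step described above can be sketched roughly as follows. This is a minimal illustration, not Arbiter's actual code: the `evaluate` function, the threshold-based policy, and the sample unit and host names are all assumptions made for the example, and the TSDB query, warden RPCs, and emails are left out entirely.

```python
from datetime import datetime, timedelta


def evaluate(usage, threshold, penalty_duration, active, now):
    """Run one hypothetical evaluation pass and return the active violations.

    usage:  {(unit, host): cores_used}, standing in for a TSDB query result
    active: {(unit, host): expiration datetime} for current violations
    """
    # Expire violations whose expiration time has passed.
    active = {t: exp for t, exp in active.items() if exp > now}
    # Open a violation for each target over the threshold that has none yet.
    for target, cores in usage.items():
        if cores > threshold and target not in active:
            active[target] = now + penalty_duration
    return active


now = datetime(2024, 1, 1, 12, 0)
usage = {
    ("user-1000.slice", "login1"): 3.5,  # illustrative values only
    ("user-1001.slice", "login1"): 0.2,
}
active = evaluate(usage, 2.0, timedelta(minutes=30), {}, now)
```

In the real service, a new violation would additionally trigger an RPC to the affected warden and an email to the user.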
We model the Policies and Penalties of the Arbiter service as follows:
- A Property is a systemd property that can be set on a cgroup.
- A Limit is a value associated with one of those systemd Properties that will be set when a user is placed in Penalty
- A Penalty is a set of Limits (CPU and memory) and a duration that is applied to a user when a Policy with that Penalty is violated
- A Policy is a rule that a user can violate, which causes a Violation to be created with the Policy's respective Penalty
- A Violation is an instance of a Policy being violated on a Target (a unit-host pairing); it has an associated Penalty and Policy
- A Target is a user on a specific host whose usage Arbiter monitors and penalizes
erDiagram
Property{
string name
Operation operation
Type type
}
Limit{
string value
}
Penalty{
string name
time duration
float scale_factor
}
Policy{
string name
string query
time timewindow
time lookback
time grace_period
}
Violation{
int num_offense
datetime expiration
}
Target{
string unit
string host
}
Policy }|--o| Penalty : Enforces
Target |o--|{ Violation : On
Violation }|--o| Policy : From
Property |o--|{ Limit : For
Penalty }o--o{ Limit : Applies
Target }o--o{ Limit : Last-Applied
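For readers who prefer code to diagrams, the entities above could be mirrored with plain dataclasses along these lines. This is only a sketch under stated assumptions: the field names follow the diagram, but the Python types are guesses (`Operation` and `Type` are modeled as plain strings), the sample values are invented, and computing `expiration` as a start time plus the Penalty's duration is an assumption rather than documented behavior. `scale_factor` is stored but not interpreted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Property:
    name: str       # systemd property name, e.g. "CPUQuota"
    operation: str  # "Operation" in the diagram; str is an assumption
    type: str       # "Type" in the diagram; str is an assumption


@dataclass(frozen=True)
class Limit:
    property: Property
    value: str


@dataclass
class Penalty:
    name: str
    duration: timedelta
    scale_factor: float
    limits: List[Limit] = field(default_factory=list)


@dataclass
class Policy:
    name: str
    query: str
    timewindow: timedelta
    lookback: timedelta
    grace_period: timedelta
    penalty: Optional[Penalty] = None


@dataclass(frozen=True)
class Target:
    unit: str
    host: str


@dataclass
class Violation:
    target: Target
    policy: Policy
    num_offense: int
    expiration: datetime


# Illustrative values only; none of these come from a real configuration.
cpu = Property(name="CPUQuota", operation="example-op", type="example-type")
penalty = Penalty("example-penalty", timedelta(minutes=30), 1.0, [Limit(cpu, "100%")])
policy = Policy(
    "example-policy",
    "<PromQL expression>",
    timewindow=timedelta(minutes=15),
    lookback=timedelta(minutes=30),
    grace_period=timedelta(minutes=5),
    penalty=penalty,
)
target = Target(unit="user-1000.slice", host="login1")
started = datetime(2024, 1, 1, 12, 0)
violation = Violation(target, policy, num_offense=1, expiration=started + penalty.duration)
```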