-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative fix for #3805 by creating a snapshot and using it #3827
base: main
Are you sure you want to change the base?
Conversation
@grobinson-grafana A full mr and separating the snapshot-ing from the behavior change. The failing test seems unrelated. |
Awesome! Yes that test flakes a lot. It needs to be fixed 😞 |
bd37add
to
933c3af
Compare
Thanks once again for opening this @zecke! I'm going to review it this week! 👍 |
} | ||
} | ||
|
||
func (a *AlertSnapshot) Resolved() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is what I was expecting 👍
for _, a := range alerts { | ||
v := a.Alert | ||
// If the end timestamp is not reached yet, do not expose it. | ||
func (as AlertsSnapshot) Less(i, j int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can share the Less
function between AlertSlice
and AlertsSnapshot
? Then we can avoid duplication of the same code! 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized we do not need both AlertSlice
and AlertsSnapshot
. We can either a.) rename AlertSlice
to AlertsSnapshot
and change its definition to []*AlertSnapshot
or b.) delete AlertSlice
and make the following additional changes:
diff --git a/api/v2/api.go b/api/v2/api.go
index 6f034d25..958d1db9 100644
--- a/api/v2/api.go
+++ b/api/v2/api.go
@@ -290,7 +290,7 @@ func (api *API) getAlertsHandler(params alert_ops.GetAlertsParams) middleware.Re
continue
}
- alert := AlertToOpenAPIAlert(a, api.getAlertStatus(a.Fingerprint()), receivers)
+ alert := AlertToOpenAPIAlert(types.NewAlertSnapshot(a, now), api.getAlertStatus(a.Fingerprint()), receivers)
res = append(res, alert)
}
diff --git a/api/v2/compat.go b/api/v2/compat.go
index 112bfc4f..4b17b368 100644
--- a/api/v2/compat.go
+++ b/api/v2/compat.go
@@ -117,7 +117,7 @@ func PostableSilenceToProto(s *open_api_models.PostableSilence) (*silencepb.Sile
}
// AlertToOpenAPIAlert converts internal alerts, alert types, and receivers to *open_api_models.GettableAlert.
-func AlertToOpenAPIAlert(alert *types.Alert, status types.AlertStatus, receivers []string) *open_api_models.GettableAlert {
+func AlertToOpenAPIAlert(alert *types.AlertSnapshot, status types.AlertStatus, receivers []string) *open_api_models.GettableAlert {
startsAt := strfmt.DateTime(alert.StartsAt)
updatedAt := strfmt.DateTime(alert.UpdatedAt)
endsAt := strfmt.DateTime(alert.EndsAt)
diff --git a/dispatch/dispatch.go b/dispatch/dispatch.go
index 8bf7ad6a..4860a64e 100644
--- a/dispatch/dispatch.go
+++ b/dispatch/dispatch.go
@@ -199,7 +199,7 @@ func (d *Dispatcher) run(it provider.AlertIterator) {
// AlertGroup represents how alerts exist within an aggrGroup.
type AlertGroup struct {
- Alerts types.AlertSlice
+ Alerts types.AlertsSnapshot
Labels model.LabelSet
Receiver string
}
@@ -241,7 +241,7 @@ func (d *Dispatcher) Groups(routeFilter func(*Route) bool, alertFilter func(*typ
}
alerts := ag.alerts.List()
- filteredAlerts := make([]*types.Alert, 0, len(alerts))
+ filteredAlerts := make([]*types.AlertSnapshot, 0, len(alerts))
for _, a := range alerts {
if !alertFilter(a, now) {
continue
@@ -258,7 +258,7 @@ func (d *Dispatcher) Groups(routeFilter func(*Route) bool, alertFilter func(*typ
receivers[fp] = []string{receiver}
}
- filteredAlerts = append(filteredAlerts, a)
+ filteredAlerts = append(filteredAlerts, types.NewAlertSnapshot(a, now))
}
if len(filteredAlerts) == 0 {
continue
@@ -501,10 +501,10 @@ func (ag *aggrGroup) flush(notify func(...*types.AlertSnapshot) bool) {
alertsSlice := types.SnapshotAlerts(ag.alerts.List(), time.Now())
sort.Stable(alertsSlice)
- resolvedSlice := make(types.AlertSlice, 0, len(alertsSlice))
+ resolvedSlice := make(types.AlertsSnapshot, 0, len(alertsSlice))
for _, alert := range alertsSlice {
if alert.Resolved() {
- resolvedSlice = append(resolvedSlice, &alert.Alert)
+ resolvedSlice = append(resolvedSlice, alert)
}
}
diff --git a/store/store.go b/store/store.go
index 5de0c3ee..0ae0e853 100644
--- a/store/store.go
+++ b/store/store.go
@@ -116,7 +116,7 @@ func (a *Alerts) Set(alert *types.Alert) error {
// DeleteIfNotModified deletes the slice of Alerts from the store if not
// modified.
-func (a *Alerts) DeleteIfNotModified(alerts types.AlertSlice) error {
+func (a *Alerts) DeleteIfNotModified(alerts types.AlertsSnapshot) error {
a.Lock()
defer a.Unlock()
for _, alert := range alerts {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. The implication is that all callers of Groups including the API will work on a snapshot.
Signed-off-by: Holger Hans Peter Freyther <[email protected]>
Make sure that the alerts passed to notifyFunc are consistent regardless of retries and other delays. Signed-off-by: Holger Hans Peter Freyther <[email protected]>
This commit fixes a bug where firing alerts had zeroed EndsAt timestamps in notifications. This bug was introduced in another PR (prometheus#1686) to fix issues where alerts are resolved during a flush. (Commit message from George Robinson) Update the acceptance tests as the webhook now has a non-zero EndsAt timestamp for firing alerts. The models.GettableAlert doesn't havea `status` field which means the current test is unable to differentiate between firing and resolved alerts. Signed-off-by: Holger Hans Peter Freyther <[email protected]>
Change the AlertGroup to use the AlertsSnapshot and the Group call to take the snapshot time. Make changes to accomodate this.
Introduce types.AlertSnapshot and types.AlertsSnapshot that keeps track of the original dispatch time and uses it for the
Status()
andResolved()
method. This makes sure the notification is consistent across retries and/or other delays.Change the behavior to not zero EndsAt for firing alerts and update the acceptance tests.