Add metrics #158
base: master
docs/tasks/metrics.md (outdated)
### kube_oidc_proxy_http_client_requests
counter - {http status code, path, remote address}
The number of requests for incoming requests.
Suggested change:
- The number of requests for incoming requests.
+ The number of incoming requests.
docs/tasks/metrics.md (outdated)
### kube_oidc_proxy_http_server_requests
counter - {http status code, path, remote address}
The requests for outgoing server requests.
Suggested change:
- The requests for outgoing server requests.
+ The number of outgoing server requests.
cmd/app/options/app.go (outdated)
"Adress to serving metrics on at the /metrics path. An empty address will "+
"disable serving metrics.")
Worth mentioning that this can't conflict with the other addresses here?
Also the readiness probe is exposed as a port, but this is a full address. I think that makes sense despite the inconsistency.
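A minimal Go sketch of the check the reviewer is asking about, namely that the metrics address must not clash with the other listeners. `validateMetricsAddress`, its parameters and the wildcard rule are hypothetical, not the PR's code. Because the readiness probe is configured as a bare port, the sketch assumes it listens on every interface, so any metrics host on that port counts as a conflict.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// isWildcard reports whether host means "listen on all interfaces".
func isWildcard(host string) bool {
	return host == "" || host == "0.0.0.0" || host == "::"
}

// hostsOverlap reports whether two listeners on the same port would collide.
func hostsOverlap(a, b string) bool {
	return a == b || isWildcard(a) || isWildcard(b)
}

// validateMetricsAddress (hypothetical) rejects a metrics listen address that
// collides with the proxy serving address or the readiness probe port.
// An empty metrics address disables metrics, so it is always accepted.
func validateMetricsAddress(metricsAddr, proxyAddr string, probePort int) error {
	if metricsAddr == "" {
		return nil
	}
	mHost, mPort, err := net.SplitHostPort(metricsAddr)
	if err != nil {
		return fmt.Errorf("invalid metrics address %q: %w", metricsAddr, err)
	}
	pHost, pPort, err := net.SplitHostPort(proxyAddr)
	if err != nil {
		return fmt.Errorf("invalid proxy address %q: %w", proxyAddr, err)
	}
	if mPort == pPort && hostsOverlap(mHost, pHost) {
		return fmt.Errorf("metrics address %q conflicts with proxy address %q", metricsAddr, proxyAddr)
	}
	// The probe is given only as a port, so any host on that port collides.
	if mPort == strconv.Itoa(probePort) {
		return fmt.Errorf("metrics address %q conflicts with readiness probe port %d", metricsAddr, probePort)
	}
	return nil
}
```

The wildcard rule is deliberately conservative. IPv6 addresses need brackets (for example `[::]:80`), which `net.SplitHostPort` strips before the comparison.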
cmd/app/run.go (outdated)
return err
}
} else {
klog.Info("metrics listen address empty, disabling serving")
Suggested change:
- klog.Info("metrics listen address empty, disabling serving")
+ klog.Info("metrics listen address empty, disabling serving metrics")
pkg/proxy/handlers.go (outdated)
http.Error(rw, "Impersonation requests are disabled when using kube-oidc-proxy", http.StatusForbidden)
return

// No name given or available in oidc request
case errNoName:
klog.V(2).Infof("no name available in oidc info %s", r.RemoteAddr)
statusCode = http.StatusForbidden
klog.V(2).Infof("no name available in oidc info %s", remoteAddr)
http.Error(rw, "Username claim not available in OIDC Issuer response", http.StatusForbidden)
Suggested change:
- http.Error(rw, "Username claim not available in OIDC Issuer response", http.StatusForbidden)
+ http.Error(rw, "Username claim not available in OIDC Issuer response", statusCode)
pkg/proxy/handlers.go (outdated)
http.Error(rw, "Username claim not available in OIDC Issuer response", http.StatusForbidden)
return

// No impersonation configuration found in context
case errNoImpersonationConfig:
klog.Errorf("if you are seeing this, there is likely a bug in the proxy (%s): %s", r.RemoteAddr, err)
statusCode = http.StatusInternalServerError
klog.Errorf("if you are seeing this, there is likely a bug in the proxy (%s): %s", remoteAddr, err)
http.Error(rw, "", http.StatusInternalServerError)
Suggested change:
- http.Error(rw, "", http.StatusInternalServerError)
+ http.Error(rw, "", statusCode)
pkg/proxy/handlers.go (outdated)
http.Error(rw, "", http.StatusInternalServerError)
return

// Server or unknown error
default:
klog.Errorf("unknown error (%s): %s", r.RemoteAddr, err)
statusCode = http.StatusInternalServerError
klog.Errorf("unknown error (%s): %s", remoteAddr, err)
http.Error(rw, "", http.StatusInternalServerError)
Suggested change:
- http.Error(rw, "", http.StatusInternalServerError)
+ http.Error(rw, "", statusCode)
var statusCode int
if resp != nil {
statusCode = resp.StatusCode
}
So we will report `statusCode` as nil if there was an error in round-tripping? What will the client see in that case?
The client will see the error code that comes from the error handler if there is an error. If the response is nil, then we have to report the status code as 0.
I actually made a mistake here: if there is an error, we will be observing a client call twice (once here, and again in the error handler). I have now wrapped the client observation in an `err != nil` check. The server still needs to be observed here regardless.
Is there some wrapper we could make the observation for both the error case (with the real error code) and the success case? I guess we lose access to some of the labels if we do that?
Yeah exactly, the error handler code path isn't hit by most requests, so we can't make the observation there, and indeed we lose some info.
@@ -0,0 +1,218 @@
// Copyright Jetstack Ltd. See LICENSE for details.
package metrics
How is the prom client testing support? Is it possible to write tests that check that samples were recorded with certain labels set?
It is, but it is a bit of a string matching exercise. Happy to stick something in if you are keen...
https://github.com/jetstack/cert-manager/blob/master/pkg/metrics/certificates_test.go
I don't think it's critical, but I always prefer tests if they aren't too onerous.
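client_golang ships a `testutil` package (for example `testutil.CollectAndCompare`, which compares a collector against expected text exposition), and that is likely the tidiest route. To show what the "string matching exercise" amounts to without extra dependencies, here is a hypothetical Go helper that checks a text-format body for a sample with exact labels and value. The label names `code`, `path` and `remote_address` are assumptions; the PR's real label names may differ.

```go
package main

import (
	"bufio"
	"strings"
)

// hasSample reports whether a Prometheus text-format body contains a sample of
// metric name with exactly the labels in want and the given value.
// Simplification: only labelled samples are handled, and label values are
// assumed not to contain ',', '}' or escaped quotes.
func hasSample(body, name string, want map[string]string, value string) bool {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line, or a HELP/TYPE comment
		}
		open, end := strings.IndexByte(line, '{'), strings.IndexByte(line, '}')
		if open < 0 || end < open || line[:open] != name {
			continue
		}
		got := map[string]string{}
		for _, pair := range strings.Split(line[open+1:end], ",") {
			if kv := strings.SplitN(pair, "=", 2); len(kv) == 2 {
				got[kv[0]] = strings.Trim(kv[1], `"`)
			}
		}
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 || fields[0] != value || len(got) != len(want) {
			continue
		}
		match := true
		for k, v := range want {
			if got[k] != v {
				match = false
				break
			}
		}
		if match {
			return true
		}
	}
	return false
}
```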
Signed-off-by: JoshVanL <[email protected]>
buckets for request duration
Signed-off-by: JoshVanL <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: JoshVanL
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@@ -43,6 +44,10 @@ func (k *KubeOIDCProxyOptions) AddFlags(fs *pflag.FlagSet) *KubeOIDCProxyOptions
fs.IntVarP(&k.ReadinessProbePort, "readiness-probe-port", "P", 8080,
"Port to expose readiness probe.")

fs.StringVar(&k.MetricsListenAddress, "metrics-serving-address", "0.0.0.0:80",
"Address to serve metrics on at the /metrics path. An empty address will "+
"disable serving metrics. Cannot use the same address as proxy or probe.")
Is it possible to explicitly set `--metrics-serving-address=""` without it being defaulted to `0.0.0.0:80`?
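A quick way to check the question, sketched with Go's standard `flag` package as a stand-in for pflag (we expect pflag to treat `--name=` the same way for string flags, but that is worth confirming against pflag itself). The shell turns `--metrics-serving-address=""` into the argument `--metrics-serving-address=`, and the parser then stores the empty string rather than the default. `metricsAddressFromArgs` is a hypothetical helper.

```go
package main

import "flag"

// metricsAddressFromArgs parses args with a flag set mirroring the PR's
// --metrics-serving-address flag (default "0.0.0.0:80") and reports the
// resulting address and whether metrics serving would be enabled.
func metricsAddressFromArgs(args []string) (addr string, enabled bool, err error) {
	fs := flag.NewFlagSet("kube-oidc-proxy", flag.ContinueOnError)
	fs.StringVar(&addr, "metrics-serving-address", "0.0.0.0:80",
		"Address to serve metrics on at the /metrics path. An empty address will disable serving metrics.")
	if err = fs.Parse(args); err != nil {
		return "", false, err
	}
	// An explicitly empty value overrides the default and disables serving.
	return addr, addr != "", nil
}
```

One consequence worth noting: omitting the flag keeps the default, so judging by the chart snippet, leaving the flag out when `metrics.enabled` is false would still serve metrics on `0.0.0.0:80` unless the empty value is passed explicitly.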
return err
}
hooks.AddPreShutdownHook("Readiness Probe", readinessHandler.Shutdown)
nit: are spaces in hook names okay/normal?
@@ -74,6 +82,9 @@ spec:
{{- range $key, $value := .Values.extraArgs -}}
- "--{{ $key }}={{ $value -}}"
{{ end }}
{{- if and .Values.metrics.enabled }}
What is this `and` for? Should it be `if .Values.metrics.enabled`?
@@ -74,6 +82,9 @@ spec:
{{- range $key, $value := .Values.extraArgs -}}
- "--{{ $key }}={{ $value -}}"
{{ end }}
{{- if and .Values.metrics.enabled }}
- "--metrics-serving-address={{ .Values.metrics.address }}:{{ .Values.metrics.port }}"
{{ end }}
nit: I think we want this to be `{{- end }}` to avoid trailing whitespace, but I see that others don't use it and it isn't really essential to this PR :)
annotations:
  prometheus.io/path: /metrics
  prometheus.io/port: "80"
  prometheus.io/scrape: "true"
This manifest does not have `- "--metrics-serving-address={{ .Values.metrics.address }}:{{ .Values.metrics.port }}"` in it, but does enable scraping of the endpoint. Is it being generated from the Helm chart?
klog.Infof("serving readiness probe on %s/ready", ln.Addr())

if err := h.Serve(ln); err != nil {
klog.Errorf("failed to serve readiness probe: %s", err)
Again, from what I can tell we don't have any way to pass the error 'up' the stack if this fails? That means if the readiness probe listener fails, the pod will never be restarted (assuming the liveness probe handler doesn't fail).
This isn't a change from the previous behaviour as far as I can see, but may be worth looking into as a follow-up.
// Auth request and handle unauthed
info, ok, err := p.oidcRequestAuther.AuthenticateRequest(req)
if err != nil {
// Since we have failed OIDC auth, we will try a token review, if enabled.
p.metrics.IncrementOIDCAuthCount(false, remoteAddr, "")
If a user specifically isn't connecting with OIDC, is there any way to avoid that one user flooding the metric with failure counts?
If I wanted to use token passthrough for one client, I'd end up with something that looks like an error but isn't really one (from what I can tell).
Maybe a distinct 'number of passthroughs' metric? 🤷♂️
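A sketch of the distinct-outcome idea in Go: decide one outcome per request only after both OIDC and token passthrough have had a chance, so a client that deliberately uses passthrough is counted as a passthrough rather than an OIDC failure. The outcome names and `authCounter` (a map standing in for a Prometheus counter vector labelled by outcome) are hypothetical.

```go
package main

// authOutcome is a hypothetical label value for a single auth-outcome metric.
type authOutcome string

const (
	outcomeOIDC        authOutcome = "oidc"
	outcomePassthrough authOutcome = "token_passthrough"
	outcomeFailed      authOutcome = "failed"
)

// classifyAuth picks one outcome per request after both authenticators have
// run, so a failed OIDC attempt followed by a successful passthrough is not
// recorded as a failure.
func classifyAuth(oidcOK, passthroughEnabled, passthroughOK bool) authOutcome {
	switch {
	case oidcOK:
		return outcomeOIDC
	case passthroughEnabled && passthroughOK:
		return outcomePassthrough
	default:
		return outcomeFailed
	}
}

// authCounter stands in for a Prometheus counter vector labelled by outcome.
type authCounter map[authOutcome]int

func (c authCounter) inc(o authOutcome) { c[o]++ }
```

Only genuine failures then land in the `failed` series, and the number of distinct label values stays fixed at three.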
// Setup unauthed handler so that it is passed through the audit
unauthedHandler := audit.NewUnauthenticatedHandler(p.auditor, func(rw http.ResponseWriter, req *http.Request) {
_, remoteAddr := context.RemoteAddr(req)
bit of a nit, but why does `RemoteAddr` return the request as well? Does it modify it in any way? 🤔
Ah, I see:
clientAddress, ok := ctx.Value(clientAddressKey).(string)
if !ok {
clientAddress = xff.GetRemoteAddr(req)
req = req.WithContext(request.WithValue(ctx, clientAddressKey, clientAddress))
}
Not one for this PR, but that seems like non-obvious behaviour 😬 Is `xff.GetRemoteAddr(req)` particularly expensive? What about making this only accept the `req`, and then having a separate function `WithRemoteAddress(ctx context.Context, remoteAddr string) context.Context` or something? I don't quite see why we need to add that context back into the request, but perhaps I am missing something 😄 Is this to allow passing through a header containing the connecting client's address?
@@ -199,8 +186,26 @@ func (p *Proxy) RoundTrip(req *http.Request) (*http.Response, error) {
// Set up impersonation request.
rt := transport.NewImpersonatingRoundTripper(*conf, p.clientTransport)

req, remoteAddr := context.RemoteAddr(req)
serverDuration := time.Now()
clientDuration := context.ClientRequestTimestamp(req)
nit: you might want to consider adopting the `k8s.io/utils` 'clock' package sometime soon; this will be tough to test otherwise 😄
// Start clock on metrics
tokenReviewDuration := time.Now()
req, remoteAddr := proxycontext.RemoteAddr(req)
Just to confirm... do we want to be overwriting `req` here to include this remote address?
Pardon my ignorance, is this complete?
This PR adds a number of metrics to the proxy.
The PR should be split up fairly well per commit.
It would be good to have a bit of insight into whether these metrics look sane... It is tricky not to make the number of time series explode here.
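As a rough upper bound (a back-of-the-envelope estimate, not a measurement), a counter labelled with {http status code, path, remote address} can produce one time series per distinct combination of label values, where $C$, $P$ and $A$ are the sets of observed status codes, paths and remote addresses:

$$N_{\text{series}} \le |C| \cdot |P| \cdot |A|$$

With purely illustrative figures of 5 status codes, 40 paths and 2,000 client addresses, that is up to $5 \cdot 40 \cdot 2000 = 400{,}000$ series for a single counter. The remote address is the factor that grows with the number of clients, so it is likely the first label to reconsider.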
/assign @munnerz
/assign @simonswine
fixes #156