Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trickster v2.0.0-beta2 : Prometheus query_range calls resulting in proxy-only sometimes instead of cache hit #594

Open
sumitk23 opened this issue Jan 10, 2022 · 2 comments

Comments

@sumitk23
Copy link

We are trying to upgrade from Trickster v1.1 to v2.0.0-beta2 and facing the following issue. Hit rate has decreased and some of the calls are now being served as proxy-only. More details in logs below.

The same suite which used to give 100% hit rate in v1.1 is giving lesser hit rate in v.2.0 beta version.

Test setup :
Trickster is deployed in our K8s cluster and we tried running the test suite with different number of pod replicas and here are our findings. We are making 10 queries/sec. Test is run continuously for ~10 mins. Same query (test_metrics) which is already present in TSDB is made every time.
Backend used is Prometheus and config is such that recent 5 mins data is not cached, anything older than till 3 weeks is cached.

Screenshot 2022-01-10 at 11 50 15 AM

Fixed time range means test_metrics is queried for same fixed time range every time REST call is made ie. start & end query params is fixed for all calls.
Variable time range uses a dynamic time range ie. end as the currentTime and start as currentTime - 900 secs (15 mins ago) for each API call this should result in partial hit (phit) for Trickster.

Fixed time range means test_metrics is queried for same fixed time range every time REST call is made ie. start & end query params is fixed for all calls.

Variable time range uses a dynamic time range ie. end as the currentTime and start as currentTime - 900 secs (15 mins ago) for each API call this should result in partial hit (phit) for Trickster.
Trickster-hit-rates-diff-replica-counts

Panel Queries :
Hit Rate : sum(rate(trickster_proxy_requests_total{path=~"/api/.*", cache_status=~"hit|phit|nchit|rhit", job="m3-query-pod-name"}[2m])) / sum(rate(trickster_proxy_requests_total{path=~"/api/.*", job="m3-query-pod-name"}[2m]))

Proxy Only Rate : sum(rate(trickster_proxy_requests_total{path=~"/api/.*", cache_status=~".*proxy.*", job="m3-query-pod-name"}[2m])) / sum(rate(trickster_proxy_requests_total{path=~"/api/.*", job="m3-query-pod-name"}[2m]))

Trickster Configuration :

--- 
trickster-conf: 
  backends: 
    default: 
      max_ttl_ms: 1814400000
      origin_url: "http://127.0.0.1:8500"
      provider: prometheus
      timeseries_retention_factor: 4096
      timeseries_ttl_ms: 1814400000
  caches: 
    default: 
      index: 
        max_size_bytes: 5120000000
      provider: filesystem
  frontend: 
    listen_port: 8480
  metrics: 
    listen_port: 8481

Test suite logs with trickster response headers :

16:31:41.239 [DEBUG] i.g.h.e.t.HttpTxExecutor - Sending request=query uri=http://prometheus-url/api/v1/query_range?query=test_metrics&start=1640946196&end=1640947096&step=30s: scenario=Query, userId=5029
16:31:41.254 [DEBUG] i.g.h.c.i.HttpAppHandler - Read msg='DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
access-control-allow-origin: *
cache-control: s-maxage=1814400
content-type: application/json; charset=UTF-8
**x-trickster-result: engine=DeltaProxyCache; status=hit; ffstatus=off**
date: Fri, 31 Dec 2021 11:01:41 GMT
x-envoy-upstream-service-time: 1
transfer-encoding: chunked'

16:31:41.338 [DEBUG] i.g.h.e.t.HttpTxExecutor - Sending request=query uri=http://prometheus-url/api/v1/query_range?query=test_metrics&start=1640946196&end=1640947096&step=30s: scenario=Query, userId=5030
16:31:41.415 [DEBUG] i.g.h.c.i.HttpAppHandler - Read msg='DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
access-control-allow-origin: *
content-type: application/json
date: Fri, 31 Dec 2021 11:01:41 GMT
**x-trickster-result: engine=HTTPProxy; status=proxy-only**
x-envoy-upstream-service-time: 92
transfer-encoding: chunked'

Please let me know if their is something wrong/missing in config which is leading to some queries getting served as proxy-only instead of cache hit.

@PKhivasra
Copy link
Contributor

We have found that in trickster-v2, the reason for decreased hit rate and calls being served as proxy-only is because, all the pods don't use cache. It is also found that request it tracks is: /api/v1 instead of /api/v1/query_range.

Due to incorrect request tracking, the handler for /api/v1 used is Proxy-handler instead of Query-range handler.

Can you please let us know if this could be related to: https://github.com/trickstercache/trickster/pull/672/files where the paths is not being copied.

@PKhivasra
Copy link
Contributor

It is found that following is the cause of this issue:

When the default backend routes are registered, currently it is as per unordered map. This means that the routes can have "\" or "\api\v1\" before "api\v1\query_range". When the router matches the regex expression with the routes map, if partial routes appears first that will be considered as a match, which causes request to be "Proxy-Only".
Thus, with sorting we will ensure that routes will be a sorted array in the descending order of path lengths, meaning: "\api\v1\query_range" would appear before "api\v1" and "\" so that we have appropriate match.

Will raise a PR for the same. Please reach out in case any issues. Thanks!

PKhivasra added a commit to PKhivasra/trickster that referenced this issue Oct 10, 2023
…uest uri

When the default backend routes are registered, currently it is as per unordered map. This means that the routes can have "" or "\api\v1" before "api\v1\query_range". When the router matches the regex expression (Refs: https://gecgithub01.walmart.com/Telemetry/trickster-v2/blob/main/pkg/router/route.go#L40 https://gecgithub01.walmart.com/Telemetry/trickster-v2/blob/main/pkg/router/regexp.go#L323) with the routes map, if partial routes appears first that will be considered as a match, which causes request to be "Proxy-Only".
Thus, with sorting we will ensure that routes will be a sorted array in the descending order of path lengths, meaning: "\api\v1\query_range" would appear before "api\v1" and "" so that we have appropriate match.
Issue:trickstercache#594
PKhivasra added a commit to PKhivasra/trickster that referenced this issue Oct 10, 2023
…uest uri

When the default backend routes are registered, currently it is as per unordered map. This means that the routes can have "" or "\api\v1" before "api\v1\query_range". When the router matches the regex expression (Refs: https://gecgithub01.walmart.com/Telemetry/trickster-v2/blob/main/pkg/router/route.go#L40 https://gecgithub01.walmart.com/Telemetry/trickster-v2/blob/main/pkg/router/regexp.go#L323) with the routes map, if partial routes appears first that will be considered as a match, which causes request to be "Proxy-Only".
Thus, with sorting we will ensure that routes will be a sorted array in the descending order of path lengths, meaning: "\api\v1\query_range" would appear before "api\v1" and "" so that we have appropriate match.
Issue:trickstercache#594

Signed-off-by: PKhivasra <[email protected]>
jranson pushed a commit that referenced this issue Dec 2, 2023
…uest uri (#687)

* Test only

Signed-off-by: PKhivasra <[email protected]>

* Sort the paths in routes to ensure matcher picks nearest regex to request uri

When the default backend routes are registered, currently it is as per unordered map. This means that the routes can have "" or "\api\v1" before "api\v1\query_range". When the router matches the regex expression (Refs: https://gecgithub01.walmart.com/Telemetry/trickster-v2/blob/main/pkg/router/route.go#L40 https://gecgithub01.walmart.com/Telemetry/trickster-v2/blob/main/pkg/router/regexp.go#L323) with the routes map, if partial routes appears first that will be considered as a match, which causes request to be "Proxy-Only".
Thus, with sorting we will ensure that routes will be a sorted array in the descending order of path lengths, meaning: "\api\v1\query_range" would appear before "api\v1" and "" so that we have appropriate match.
Issue:#594

Signed-off-by: PKhivasra <[email protected]>

---------

Signed-off-by: PKhivasra <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants