This repository has been archived by the owner on Apr 19, 2024. It is now read-only.

v2.0.0. Performance Review #74

Open
1 of 2 tasks
thrawn01 opened this issue Oct 28, 2020 · 2 comments

Comments

@thrawn01
Contributor

thrawn01 commented Oct 28, 2020

Purpose

  • In production, we are seeing 300ms response times under very high request volumes (response times are normally in the 2-5ms range).
  • Profile the distribution of hit updates when using Behavior=GLOBAL. Compare our implementation with that of https://ipfs.io.

TODO

  • Profile running gubernators in production.
  • Profile GLOBAL update behavior
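
One possible way to profile a running Go service is to expose the standard `net/http/pprof` handlers on a separate, non-public listener. The following is only a sketch under that assumption: `newDebugMux`, `probe`, and the listen address in the comments are hypothetical names, not part of Gubernator.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux registers the standard pprof handlers on a dedicated mux so
// they are not exposed on the public API listener.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, block, mutex, goroutine
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile) // CPU profile
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// probe issues an in-memory GET against the mux and returns the HTTP status.
func probe(path string) int {
	rec := httptest.NewRecorder()
	newDebugMux().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// In a real deployment the mux would be served in its own goroutine, e.g.
	// http.ListenAndServe("127.0.0.1:6060", newDebugMux()); the address is an
	// assumption. Here the handlers are exercised in memory instead.
	fmt.Println(probe("/debug/pprof/"), probe("/debug/pprof/goroutine?debug=1"))
}
```

With such a listener in place, a 30-second CPU profile could be pulled with `go tool pprof http://127.0.0.1:6060/debug/pprof/profile?seconds=30` (again assuming that address).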
@thrawn01 thrawn01 added this to the v1.0.0 milestone Oct 28, 2020
@thrawn01 thrawn01 self-assigned this Oct 28, 2020
@thrawn01 thrawn01 changed the title 1.0.0. Performance Review v1.0.0. Performance Review Oct 28, 2020
@thrawn01 thrawn01 mentioned this issue Oct 28, 2020
9 tasks
@valer-cara

valer-cara commented Jan 27, 2021

Have you considered any optimizations in the use of FanOut in GetRateLimits (e.g., not fanning out for local cache hits)?

I've been trying to run a gub cluster taking in ~40-80k QPS, with ~5 items in the requests list of each request, and I keep hitting a ceiling (image below).

Screenshot_2021-01-27_20:07:39_540x347

I tried a number of things: load balancing with Envoy, various cluster sizes (from 1 to 5 machines of 16 cores each), etc. However, I wasn't able to saturate those machines, so I went hunting for blocking points. I initially thought it might be the global mutex on the cache and tried a sync.Map alternative, but to no avail.

I've taken some blocking profiles, and a significant amount of time is spent in FanOut/ChanRecv, even for requests handled locally (for remote requests that is expected).
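
For reference, a blocking profile like the ones mentioned above can be captured with Go's `runtime` and `runtime/pprof` packages. This is a minimal sketch, not the setup actually used here: `captureBlockProfile` and `blockingWorkload` are hypothetical helpers, and a sampling rate of 1 records every blocking event, which is likely too expensive for production.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"time"
)

// captureBlockProfile enables block profiling, runs the workload, and returns
// the accumulated block profile in pprof's text (debug=1) format.
func captureBlockProfile(workload func()) (string, error) {
	runtime.SetBlockProfileRate(1)       // sample every blocking event
	defer runtime.SetBlockProfileRate(0) // turn profiling off again

	workload()

	var buf bytes.Buffer
	if err := pprof.Lookup("block").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// blockingWorkload makes the calling goroutine wait on a channel receive,
// the kind of event that shows up as ChanRecv in a block profile.
func blockingWorkload() {
	ch := make(chan int)
	go func() {
		time.Sleep(20 * time.Millisecond)
		ch <- 1
	}()
	<-ch
}

func main() {
	out, err := captureBlockProfile(blockingWorkload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```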

As a quick and dirty WIP hack, I eliminated the FanOut for local cache hits (and disabled remote requests). I only tested this locally on a single instance, which made the most sense given that I stripped out all GetPeerRatelimit calls for the proof of concept. I was able to go from 25k QPS to 40k QPS, which indicates that a complete FanOut optimization is worth trying.

Not sure if there's light at the end of this tunnel, but that almost 2x increase in QPS on the local machine definitely caught my attention.
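
The idea can be sketched roughly as follows: answer cache hits inline and start a goroutine only for requests that must go to another peer. This is an illustration under simplified assumptions, not Gubernator's code: `Request`, `Response`, `localCache`, `getRateLimits`, and the `forward` callback are all hypothetical, and the hit accounting is reduced to a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// Request and Response are hypothetical stand-ins, not Gubernator's types.
type Request struct {
	Key  string
	Hits int64
}

type Response struct {
	Key       string
	Remaining int64
	Local     bool
}

// localCache is a mutex-guarded map of remaining hits per key.
type localCache struct {
	mu        sync.Mutex
	remaining map[string]int64
}

// hit applies the request if the key is cached locally and reports whether
// it was. The accounting is deliberately simplified: it clamps at zero.
func (c *localCache) hit(r Request) (Response, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	rem, ok := c.remaining[r.Key]
	if !ok {
		return Response{}, false
	}
	rem -= r.Hits
	if rem < 0 {
		rem = 0
	}
	c.remaining[r.Key] = rem
	return Response{Key: r.Key, Remaining: rem, Local: true}, true
}

// getRateLimits answers local hits inline and fans out only the misses.
// It returns the responses in request order and the number of forwarded requests.
func getRateLimits(c *localCache, reqs []Request, forward func(Request) Response) ([]Response, int) {
	out := make([]Response, len(reqs))
	var wg sync.WaitGroup
	forwarded := 0
	for i, r := range reqs {
		if resp, ok := c.hit(r); ok {
			out[i] = resp // fast path: no goroutine, no channel receive
			continue
		}
		forwarded++
		wg.Add(1)
		go func(i int, r Request) {
			defer wg.Done()
			out[i] = forward(r) // each goroutine writes a distinct index
		}(i, r)
	}
	wg.Wait()
	return out, forwarded
}

func main() {
	cache := &localCache{remaining: map[string]int64{"a": 10, "b": 5}}
	// forward is a placeholder for a call to the owning peer.
	forward := func(r Request) Response {
		return Response{Key: r.Key, Remaining: 100 - r.Hits}
	}
	reqs := []Request{{"a", 1}, {"remote", 2}, {"b", 1}}
	resps, n := getRateLimits(cache, reqs, forward)
	fmt.Println(resps, n)
}
```

In this sketch a batch consisting only of local hits never starts a goroutine or blocks on a channel, which is where the blocking profiles showed time being spent.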

@thrawn01
Contributor Author

Thank you for doing this analysis! (I kept seeing FanOut show up in my CPU profiles, but never followed up on it.) Avoiding fanout for local cache hits is a great optimization! My current optimization research is looking into how we can use GLOBAL behavior to avoid the network requests to owning peers, but it has stalled because work priorities are leaving me no free time for it. If you are interested, a PR with this optimization would be most welcome!

@thrawn01 thrawn01 changed the title v1.0.0. Performance Review v2.0.0. Performance Review Aug 20, 2021
@thrawn01 thrawn01 modified the milestones: v1.0.0, v2.0.0 Aug 20, 2021