Implement caching to avoid a ton of requests to crowdsec api #21

Open
aleksandarmomic opened this issue Mar 3, 2022 · 8 comments
@aleksandarmomic

aleksandarmomic commented Mar 3, 2022

Currently, every request is forwarded to crowdsec one by one, which is slow and resource intensive. In my setup I have additionally set up mariadb, and calling crowdsec on every request results in a call to the db. All of this could be avoided with a single json file of cached ip addresses on the bouncer's side, similar to how the cloudflare bouncer caches them.
This also results in pretty big mariadb binary logs.
A simple cache mechanism would save space and improve performance by reducing the load on the system. File-based caching (like json) would be enough, but redis would be awesome.

@fbonalair
Owner

Hi, thanks for the suggestion.

Yes, caching would definitely be a nice feature, plus the data are quite suitable to be cached.

I'm more concerned about the cache duration, since the bouncer does not own the data, nor can it be notified of changes.
I guess the ban duration is a first step? Though a scanner would then be allowed for far too long. Maybe also cache eviction after a number of calls?

For the cache location, I'm thinking of memory first: nothing is faster, and it's easier to garbage collect in case of a bug. Besides, disk I/O is a pain I don't want to get into...
Second would be Redis, well known and battle tested.
What do you think?
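The "eviction after a number of calls" idea above can be sketched quickly. This is a hypothetical illustration, not code from the bouncer: each cached verdict is dropped after it has served a fixed number of lookups, so a stale "not banned" answer cannot be reused forever.

```go
package main

import "fmt"

// hitCappedEntry is a hypothetical cache entry that is evicted after
// serving a fixed number of lookups, bounding how long a stale
// "clean" verdict can be reused.
type hitCappedEntry struct {
	banned   bool
	hitsLeft int
}

type hitCappedCache struct {
	maxHits int
	entries map[string]*hitCappedEntry
}

func newHitCappedCache(maxHits int) *hitCappedCache {
	return &hitCappedCache{maxHits: maxHits, entries: make(map[string]*hitCappedEntry)}
}

func (c *hitCappedCache) Put(ip string, banned bool) {
	c.entries[ip] = &hitCappedEntry{banned: banned, hitsLeft: c.maxHits}
}

// Get returns (banned, found). After maxHits lookups the entry is
// dropped, forcing the next request back to the LAPI.
func (c *hitCappedCache) Get(ip string) (bool, bool) {
	e, ok := c.entries[ip]
	if !ok {
		return false, false
	}
	e.hitsLeft--
	if e.hitsLeft <= 0 {
		delete(c.entries, ip)
	}
	return e.banned, true
}

func main() {
	c := newHitCappedCache(2)
	c.Put("203.0.113.7", false)
	_, found := c.Get("203.0.113.7") // hit 1: served from cache
	fmt.Println(found)
	_, found = c.Get("203.0.113.7") // hit 2: served, then evicted
	fmt.Println(found)
	_, found = c.Get("203.0.113.7") // gone: would re-query the LAPI
	fmt.Println(found)
}
```

This could be combined with a time-based TTL so an entry falls out on whichever limit is reached first.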

@fbonalair fbonalair self-assigned this Mar 4, 2022
@tinolin

tinolin commented Mar 4, 2022

Hello! Here's what I think:

  1. CacheTime: configurable via an environment variable
  2. Redis is the best fit for that.

@aleksandarmomic
Copy link
Author


@fbonalair
With a quick look at the crowdsec api, the values returned from the decisions endpoint include the decision duration and an "until" timestamp per decision. I believe those can be used to control the cache lifetime. This looks pretty promising for Redis as a cache, since it can control the lifetime per ip, which is not the case for in-memory or file-based caching, where you would have to handle the deletion of expired decisions yourself.

@fbonalair
Owner

@fbonalair With a quick look at the crowdsec api, the values returned from the decisions endpoint include the decision duration and the "until" timestamp per decision. I believe those can be used to control the cache lifetime.

That is what I was thinking of using for the cache lifetime. Though, I'm still worried about first offenders getting unrestricted access for the cache duration.
While looking for a solution, I will make the caching system opt-in and add a warning.

Hello! Here's what I think:

1. CacheTime: configurable via an environment variable

Depending on the caching solution, some of its parameters will be available through environment variables. Thanks for the suggestion.

@el-joseppe

Any updates on this issue?

I had to shut down my traefik-crowdsec-bouncer. My server would randomly become unresponsive, even over ssh, because the bouncer container got unstable and jammed up the cpu. My initial guess was that this is some sort of overload issue: too many requests, and therefore too many calls to the crowdsec LAPI via the bouncer middleware.

I read some of the documentation over at crowdsec and found that the official nginx bouncer has two operation modes:

  • Live mode (query the local API for each request, like the traefik-crowdsec-bouncer, right?)
  • Stream mode (poll the local API for new/old decisions every X seconds, in combination with a cache and a configurable CACHE_EXPIRATION parameter)

That sounds like a solid solution to me. Wouldn't that also be beneficial for the traefik bouncer, especially in a more demanding environment or with limited resources?

@mathieuHa

Hello,

I've been following this project for a while and wanted to contribute somehow.

I've implemented a local cache using the go-cache library.

It is configurable using 2 environment variables:

  • CROWDSEC_BOUNCER_ENABLE_LOCAL_CACHE - Enables a local in-memory cache. Defaults to false
  • CROWDSEC_DEFAULT_CACHE_DURATION - Configures the default duration of the cached data. Defaults to "4h00m00s"

When the cache is enabled, the first time an IP has to be checked, it is looked up in the local cache first.
This can produce 2 outcomes:

  • the IP was found (whether it was considered malicious or not) -> we can continue without asking crowdsec
  • the IP was not found -> we have to ask crowdsec and cache the result after the first request
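The cache-first flow with its two outcomes can be sketched like this. The names (`verdictCache`, `isBanned`, `queryLAPI`) are illustrative stand-ins, not the bouncer's actual types:

```go
package main

import "fmt"

// verdictCache is a stand-in for the local cache: it maps an IP to
// its last known verdict (true = banned).
type verdictCache map[string]bool

// isBanned sketches the cache-first lookup: consult the local cache,
// and only fall back to the LAPI on a miss, caching whatever answer
// comes back so the next request skips CrowdSec entirely.
func isBanned(ip string, cache verdictCache, queryLAPI func(string) bool) bool {
	// Outcome 1: IP found in cache (malicious or not), skip crowdsec.
	if banned, ok := cache[ip]; ok {
		return banned
	}
	// Outcome 2: cache miss, ask crowdsec once and cache the result.
	banned := queryLAPI(ip)
	cache[ip] = banned
	return banned
}

func main() {
	calls := 0
	fakeLAPI := func(ip string) bool {
		calls++ // count round-trips to the (fake) LAPI
		return ip == "192.0.2.1"
	}
	cache := verdictCache{}
	fmt.Println(isBanned("192.0.2.1", cache, fakeLAPI)) // miss: queries the LAPI
	fmt.Println(isBanned("192.0.2.1", cache, fakeLAPI)) // hit: served from cache
	fmt.Println(calls)                                  // only one LAPI call was made
}
```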

Cache invalidation is provided by the library: a background job removes every entry that is no longer valid from the cache.
This background job runs every 5 min (this could be made configurable), and the default cache validity is 4h, which can be overridden using CROWDSEC_DEFAULT_CACHE_DURATION.

I've got some ideas on how to implement a configurable redis version as well, to combine the cache with streaming mode, which could greatly improve performance.

What do you think about this?
@el-joseppe
@fbonalair

@mathieuHa

I've just finished working on the streaming mode; it works pretty well.

At startup it fetches all known banned IPs and caches them locally, and then every minute the local cache is updated with only the new information.
I used the robfig/cron library for the recurring job.

It can also be configured with env variables:

  • CROWDSEC_LAPI_ENABLE_STREAM_MODE - Enables streaming mode to pull decisions from the LAPI. Overrides CROWDSEC_BOUNCER_ENABLE_LOCAL_CACHE and enables it. Defaults to "true"
  • CROWDSEC_LAPI_STREAM_MODE_INTERVAL - Defines the interval between two calls to the LAPI. Defaults to "1m"
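The core of the stream-mode update described above (full state at startup, then deltas each interval) can be sketched without the cron wiring. The `decisionsStream` shape and function names here are illustrative, not the PR's actual code:

```go
package main

import "fmt"

// decisionsStream mirrors the general shape of a decisions stream
// response: new decisions to add and deleted ones to drop. The field
// names are illustrative.
type decisionsStream struct {
	New     []string // IPs with fresh ban decisions
	Deleted []string // IPs whose decisions expired or were removed
}

// applyStream folds one poll of the stream endpoint into the local
// ban cache; in the real bouncer this would run on a recurring job
// every CROWDSEC_LAPI_STREAM_MODE_INTERVAL.
func applyStream(cache map[string]bool, s decisionsStream) {
	for _, ip := range s.New {
		cache[ip] = true
	}
	for _, ip := range s.Deleted {
		delete(cache, ip)
	}
}

func main() {
	cache := map[string]bool{}
	// The first poll at startup carries the full set of current bans.
	applyStream(cache, decisionsStream{New: []string{"192.0.2.1", "198.51.100.2"}})
	// Later polls only carry the delta since the previous call.
	applyStream(cache, decisionsStream{Deleted: []string{"192.0.2.1"}})
	fmt.Println(cache["192.0.2.1"], cache["198.51.100.2"]) // unbanned, still banned
}
```

Between polls, requests are answered entirely from the local cache, which is what removes the per-request LAPI (and database) round-trip.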

I took the liberty of enabling it by default.
Any feedback is appreciated @fbonalair

@fbonalair
Owner

I took the liberty of reviewing only PR #33, since it's written on top of #32.
Anyway, many thanks for the work! I have left some comments as reviews.

About the default mode, a couple of thoughts:

  1. It is a breaking change: the service won't behave the same way as before. For people not pinning their container version, I prefer avoiding the "unwanted" change.
  2. Strictly speaking, the stream interval leaves an unknown malicious user free to do whatever they want during that time. I prefer users making that choice knowing the drawback.
  3. I would prefer to wait for feedback from users before making it the default mode. Defensive with security I am.

To prepare for a redis cache or other caches, it would be nice to externalize the cache logic into its own file / service / folder, and depending on the user's configuration, the right one would be initialized in bouncer.go. That was my rough start in the feat/cache branch.
Though it's not mandatory for a first cache implementation.
