Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REQUEST-912-DOS-PROTECTION paranoia level 2 sets ip.dos_block=1 outside of dos_burst_time_slice window #11

Open
jelsdon opened this issue Feb 2, 2021 · 24 comments
Assignees

Comments

@jelsdon
Copy link

jelsdon commented Feb 2, 2021

Describe the bug

I've experienced issues whereby dos-protection is kicking in, without the request count exceeding a threshold within the given time slice window. This appears when the client is slowly sending traffic regularly over a period of time.

Premise is that paranoid level 2 should take into account dos_burst_time_slice before blocking an address, and it is not.

I've tried to follow the logic through https://github.com/coreruleset/coreruleset/blob/v3.3/master/rules/REQUEST-912-DOS-PROTECTION.conf

  • and looking for validation or correction of my understanding.

With paranoia level set to 2, IP:DOS_COUNTER is able to exceed TX:DOS_COUNTER_THRESHOLD and set ip.dos_block=1 without consideration of account dos_burst_time_slice

  • id:912160 or id:912161 can reset ip:dos_counter.
  • id:912160 sets ip.dos_burst_counter=1 irrespective of time taken for ip.dos_burst_counter to reach tx.dos_counter_threshold
  • id:912171 with ip.dos_burst_counter set to 1, ip.dos_block is set to 1
  • id:912120 bocks request

With paranoia of 1 this is not an issue as id:912170, both id:912160 and id: 912161 and this flow takes into account tx.dos_burst_time_slice.

https://github.com/coreruleset/coreruleset/blob/v3.3/master/rules/REQUEST-912-DOS-PROTECTION.conf

Steps to reproduce

Set paranoia to 2 and send a slow rate of requests that reach tx.dos_counter_threshold within modsecurity's overall capture time, but less than dos_counter_threshold

Expected behaviour

I expect paranoia 2 to set ip.dos_block=1 only when ip:dos_counter has exceeded tx.dos_counter_threshold within a given tx.dos_burst_time_slice

Actual behaviour

ip.dos_block is set to 1 within id:912171 and traffic is blocked, without ip:dos_counter reaching tx.dos_counter_threshold within tx.dos_burst_time_slice

Additional context

Your Environment

  • CRS version: 3.3.0
  • Paranoia level setting: 2
@dune73
Copy link
Member

dune73 commented Apr 16, 2021

The plugin pull request is about to be merged. This will allow to take the DOS rules and transform them into an official CRS plugin as has been our plan for quite some time. I will also address this issue here when making that transformation.

Thank you for reporting and sorry for not responding earlier.

@jacen05
Copy link

jacen05 commented Apr 25, 2021

I think this affects Paranoia level 1 also.
The issue was already described here.

I've tried to raise the dos_counter_threshold, but it's just delaying the problem : it seems that dos_burst_time_slice does not work as intended. I'm also using CRS v3.3.0

For now I've disabled the rule, but I'll be happy to test a correction.

@dune73
Copy link
Member

dune73 commented Apr 25, 2021

Thank you for this complementary remark, @jacen05.

@dune73
Copy link
Member

dune73 commented Sep 3, 2021

Status: This is clearly on my turf and I am postponing this constantly while life interferes. Sorry. Yet I am none the wise and can not give a clear date, I'm behind on many fronts.

@diablodale
Copy link

diablodale commented Feb 28, 2022

related bug moved to coreruleset/coreruleset#2413 as requested.

@azurit
Copy link
Member

azurit commented Mar 1, 2022

@diablodale Thanks for digging into this! Can you be so kind and open a new issue? Thank you.

@theseion
Copy link
Contributor

theseion commented Apr 2, 2022

I think there is a misconception here w.r.t. how DOS blocking works (admittedly, it's not intuitive). TX:DOS_BURST_TIME_SLICE is only relevant for paranoia level 1, because it is applied to IP:DOS_BURST_COUNTER, not to IP:DOS_COUNTER. Hence, at paranoia level 2 the rules block immediately whenever a burst is detected.

It is possible that this was never the intention. To fix it I see two options:

  1. Make PL 2 depend on TX:DOS_BURST_TIME_SLICE:
    1. PL 1 should block at IP:DOS_BURST_COUNTER >= 3
    2. PL 2 should block at IP:DOS_BURST_COUNTER >= 2
  2. Expire IP:DOS_COUNTER after TX:DOS_BURST_TIME_SLICE, not IP:DOS_BURST_COUNTER (or both, for that matter).

Option two is probably the right way to go.

@dune73
Copy link
Member

dune73 commented Apr 2, 2022

Digging in my memory and reporting from family holidays in Calanca, GR, Switzerland, where I used the early morning hours to document and clean up the DOS rules, I can report the following:

  • The idea was indeed to see PL2 as strict sibling of the burst counter rule in PL1. I think I remember that it was intended to block regardless of burst counter.

I am not convinced this is correct or smart, but this is what I thought was the intention of the rules and one way to add a stricter sibling. A new concept back in the day.

Can you elaborate some more what option 2 would mean and how the behaviour would change.

If we would opt for options 1, then I think there is nothing wrong with this approach. Maybe add a PL3 rule retains the current PL2 behaviour: blocking on the first burst.

@theseion
Copy link
Contributor

theseion commented Apr 2, 2022

I think option 2 is semantically correct. It is not the burst that becomes invalid after a certain time. We actually want some way to define what we consider a burst to look like. I think that we should use two different timeout variables, one for counting requests toward a burst, and one to expire bursts.

What I'm struggling with is how to properly define a sliding window. Static windows would be easy to do but aren't optimal. In theory, an attacker could send requests to determine the window size and then bombard the service such that it never registers enough requests to go into blocking mode. With a sliding window, the attacker would at least have to wait for an entire window before resuming the attack.

However, as pointed out elsewhere, CRS isn't the proper tool to defend against serious DoS attacks, so maybe static windows are good enough.

With the current approach, almost every IP will trigger the first burst, since the request counter is only reset once the threshold has been reached. To retain the current behaviour, we would actually need block at PL1 after the first burst, however, I think it would be better to wait until we've identified a second burst. So, for both PL1 and PL2, adding a timeout to the request counter means that the rules will block later than they would now (one timeout window later) but I think that is actually a good thing and will reduce the number of FP's (I had a couple of issues with FP's and had to increase the timeout / disable the rules for certain IP's).

Technically speaking, we would introduce one new variable TX:DOS_COUNTER_TIME_SLICE which would be initialized to the same value as TX:DOS_BURST_TIME_SLICE by default. Then we'd add a new rule that would expire IP:DOS_COUNTER with the value of TX:DOS_COUNTER_TIME_SLICE if it is unset. When TX:DOS_COUNTER_THRESHOLD has been reached, we unset IP:DOS_COUNTER (instead of using "unset" as the initialization state we could also use the value 0 of the counter; this could be safer to use with the other rules).

To summarise, I think option 2 would be cleaner and semantically correct, however, it also introduces additional complexity into a system that is very hard to understand in its current state. Option 1 would certainly be the "simple" fix.

@theseion
Copy link
Contributor

theseion commented Apr 4, 2022

@dune73 I've created a proof of concept PR #3.

@github-actions
Copy link

github-actions bot commented Aug 3, 2022

This issue has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 14 days

@diablodale
Copy link

ping alive

@github-actions
Copy link

github-actions bot commented Dec 3, 2022

This issue has been open 120 days with no activity. Remove the stale label or comment, or this will be closed in 14 days

@diablodale
Copy link

ping alive

@fzipi
Copy link
Member

fzipi commented Dec 12, 2022

@diablodale Did you tested what @theseion created?

@diablodale
Copy link

diablodale commented Dec 12, 2022

hi @fzipi, I have not tested the proof of concept. I doubt I can provide solid testing feedback as I do not have deep firewall/attack experience to test it. I can read the code, yet that's not the same. :-/

@dune73
Copy link
Member

dune73 commented Dec 15, 2022

I sat down and looked at the proof of concept PR by @theseion linked above.

There are several issues and I eventually gave up on this.

Here is a list:

  • rule ID 9514110 used twice
  • dos-protection-before.conf, line 221, single quote not closed
  • dos-protection-config.conf, line 36, single quote not closed

When fixing these, apache starts.

But then comes the next problem:

The central rule 9514120 runs in phase 1 and thus before REQUEST-901 that initialized the IP collection. 9514120 references the IP collection that is only initialized afterwards. So this can not work and rules ought to be rearranged.

I think we would fare best if IP collection initialization would be removed from main CRS alltogether and the plugin should come with the init that can also be switched off via a config item. Explain in the README that you have to pay attention that not more than one plugin initializes the collection etc. Then it should work.

@theseion
Copy link
Contributor

I've updated the PR with fixes to the format issues you found. Do we have a resolution in collection initialisation by now? I vaguely recall that we had discussed something in a meeting.

@RedXanadu
Copy link
Member

@theseion Found it:

  • CRS no longer needs collections, but plugins may want to work with collections
    -> Decisions: CRS initializes them for the plugins, off by default

crs-setup.conf has (commented out) rule 900130 which contains the action setvar:tx.enable_default_collections=1 to enable collection initialisation.

@theseion
Copy link
Contributor

I just added the other issue to the list of discussions for tonight. initcol actions must run before plugin setup.

@fzipi
Copy link
Member

fzipi commented May 29, 2024

@theseion What's next here?

@theseion
Copy link
Contributor

We had decided that:

🔵 Decision: @theseion with help of @dune73 and @airween will try to check for initcol. If the collection is initialized by the plugin -> remove the 901 rule via ctl statement and do initcol yourself.

@fzipi
Copy link
Member

fzipi commented Jul 8, 2024

Should we move this one to https://github.com/coreruleset/dos-protection-plugin-modsecurity ? Or the ctl will be for the core instead?

@theseion
Copy link
Contributor

theseion commented Jul 8, 2024

Yes

@theseion theseion transferred this issue from coreruleset/coreruleset Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants