
Updated yesterday (2021-05-13), and it suddenly seems to be blocking Googlebot #157

Open
abclution opened this issue May 14, 2021 · 4 comments

abclution commented May 14, 2021

iptables -L

DROP all -- crawl-66-249-66-55.googlebot.com anywhere
DROP all -- crawl-66-249-66-53.googlebot.com anywhere
DROP all -- crawl-66-249-66-51.googlebot.com anywhere
DROP all -- crawl-66-249-66-55.googlebot.com anywhere
DROP all -- crawl-66-249-66-53.googlebot.com anywhere
DROP all -- crawl-66-249-66-51.googlebot.com anywhere

As described here: https://developers.google.com/search/docs/advanced/crawling/verifying-googlebot?visit_id=637564291852388228-1847563569&rd=1

From what I read, the best way to avoid blocking Googlebot now is to make sure the reverse DNS name of the IP is in the googlebot.com domain.
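A minimal sketch of that check in Python, assuming the reverse-then-forward procedure from Google's verification page: look up the PTR name, require it to end in an allowed domain, then resolve that name and require it to point back to the same IP. The suffix list `GOOGLE_SUFFIXES` and the helper names are assumptions for illustration; check the current Google page for the domains it lists. The resolvers are passed in as parameters so the logic can be exercised without real DNS.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Tuple

# Assumption: crawler domains as listed on Google's verification page.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP -> hostname (raises OSError if there is no record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> Iterable[str]:
    """A/AAAA lookup: hostname -> every address it resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_verified_crawler(
    ip: str,
    suffixes: Tuple[str, ...] = GOOGLE_SUFFIXES,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must be in an allowed
    domain AND that name must resolve back to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so it cannot be verified
    if not host.endswith(suffixes):
        return False
    try:
        addresses = forward(host)
    except OSError:
        return False
    # Without this step, anyone who controls the PTR record of their own
    # IP could simply name it "something.googlebot.com".
    wanted = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == wanted for a in addresses)
```

Called as `is_verified_crawler("66.249.66.55")` it performs live DNS lookups, so results depend on the resolver the server uses.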

I'm a bit new to the AUBBB, so I'm not 100% sure it's not something I did, but other than updating the global blacklist I didn't change anything, and it wasn't blocking Googlebot before. I've only been running AUBBB for a week and a half and I'm still learning.

I grabbed the logs, but nothing really stands out. I'm just resetting the ban for now and seeing if it happens again.

66.249.66.55 - - [02/May/2021:15:05:10 +0300] "GET /index.php/default/shop-by-brand/rotair/air-compressors.html HTTP/1.1" 403 407 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.130 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
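To see whether other Googlebot requests were also refused, a quick sketch can scan an access log for 403 responses whose user agent mentions Googlebot. It assumes Apache's "combined" log format, which the line above appears to use; the regex and function names are illustrative. The user-agent string can be faked by anyone, so matching IPs still need DNS verification before being whitelisted.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Apache "combined" format (assumed):
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    time: str
    request: str
    status: int
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Return the interesting fields of one log line, or None if it doesn't parse."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return Hit(m["host"], m["time"], m["request"], int(m["status"]), m["agent"])


def blocked_googlebot_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield requests that claim to be Googlebot but were answered with 403."""
    for line in lines:
        hit = parse_line(line)
        if hit is not None and hit.status == 403 and "googlebot" in hit.agent.lower():
            yield hit
```

Usage might look like `with open(path) as f: hits = list(blocked_googlebot_hits(f))`, where `path` is wherever your access log lives.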

Also, P.S. to Mitchell and the other contributors: amazing work! It saved my ass; my server was being destroyed by bots.

@abclution (Author)

iptables -L

Chain f2b-apacherepeatoffender (1 references)
DROP all -- crawl-66-249-64-151.googlebot.com anywhere
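`iptables -L` reverse-resolves each source address, which is why the bans show up as crawl-*.googlebot.com names (`iptables -L -n` prints the raw IPs instead). A small sketch for picking crawler hostnames out of such a listing, grouped by chain, so repeated bans stand out; the suffix list is an assumption, and the names only reflect what the PTR records claim.

```python
from typing import Optional, Set, Tuple

# Assumption: crawler domains worth flagging in the listing.
CRAWLER_SUFFIXES = (".googlebot.com", ".search.msn.com")


def crawler_bans(listing: str, suffixes=CRAWLER_SUFFIXES) -> Set[Tuple[Optional[str], str]]:
    """Return {(chain, source)} for DROP rules whose source looks like a crawler.

    Duplicate rules collapse into one entry because the result is a set.
    """
    bans = set()
    chain = None
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Chain" and len(fields) >= 2:
            chain = fields[1]  # e.g. f2b-apacherepeatoffender
        elif fields[0] == "DROP" and len(fields) >= 5:
            # Rule lines: target prot opt source destination
            source = fields[3].lower()
            if source.endswith(suffixes):
                bans.add((chain, source))
    return bans
```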

globalblacklist.conf contains the good_bot setting:
BrowserMatchNoCase "(?:\b)Googlebot(?:\b)" good_bot
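As a quick sanity check outside Apache, the same good_bot pattern can be tried against the user agent from the log line above. This sketch uses Python's `re` with `IGNORECASE` standing in for the "NoCase" part of `BrowserMatchNoCase`; that is an approximation, since Apache uses PCRE, though `\b` behaves the same way for a pattern this simple.

```python
import re

# Same pattern as the BrowserMatchNoCase line in globalblacklist.conf.
GOOD_BOT = re.compile(r"(?:\b)Googlebot(?:\b)", re.IGNORECASE)


def is_good_bot(user_agent: str) -> bool:
    """True if the user agent would get the good_bot variable under this rule."""
    return GOOD_BOT.search(user_agent) is not None
```

If it matches, the user-agent rule is doing its job, which points at something outside the blocker's user-agent check.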

I also manually added it to whitelist-domains:

SetEnvIfNoCase Referer ~*googlebot.com good_ref

And it's still being blocked. Why?

@mitchellkrogza (Owner)

I'd rather disable the repeatoffender jail, as it can be troublesome, and just rely on what the blocker does.


abclution commented Jun 10, 2021

Hi Mitchell, thanks for replying.

Yes, I realized after I updated that it was a problem with my F2B (Fail2Ban) setup, not with ABBB. Derp.

If I keep getting the issue, I'll give it a try, but the jail really helps a lot when botnets are attacking constantly.

For now, for better or worse, I grabbed the IPs for Bing/Google/Cloudflare (from the ABBB global list) and added them to jail.local like this:

[DEFAULT]

ignoreip = 108.177.0.0/17 172.217.0.0/16 173.194.0.0/16 2001:4860:4000::/36 203.208.60.0/24 207.126.144.0/20 209.85.128.0/17 216.239.32.0/19 216.58.192.0/19 2404:6800:4000::/36 2607:f8b0:4000::/36 2800:3f0:4000::/36 2a00:1450:4000::/36 2c0f:fb50:4000::/36 35.192.0.0/12 64.18.0.0/20 64.233.160.0/19 64.68.80.0/21 65.52.0.0/14 66.102.0.0/20 66.249.64.0/19 72.14.192.0/18 74.125.0.0/16 131.253.21.0/24 131.253.22.0/23 131.253.24.0/21 131.253.24.0/22 131.253.32.0/20 157.54.0.0/15 157.56.0.0/14 157.60.0.0/16 199.30.16.0/24 199.30.27.0/24 207.46.0.0/16 40.112.0.0/13 40.120.0.0/14 40.124.0.0/16 40.125.0.0/17 40.74.0.0/15 40.76.0.0/14 40.80.0.0/12 40.96.0.0/12 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22 141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 173.245.48.0/20 188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17 2400:cb00::/32 2405:8100::/32 2405:b500::/32 2606:4700::/32 2803:f800::/32 2a06:98c0::/29 2c0f:f248::/32
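fail2ban checks each offending IP against the ignoreip entries. The CIDR membership test itself can be sketched with Python's standard `ipaddress` module, using a few of the ranges above as sample data (the full list works the same way). The same module also shows that some entries are redundant: 131.253.24.0/22 lies entirely inside 131.253.24.0/21.

```python
import ipaddress

# A sample of the ignoreip ranges above; mixing IPv4 and IPv6 is fine.
IGNORE = ["66.249.64.0/19", "131.253.24.0/21", "131.253.24.0/22", "2001:4860:4000::/36"]

networks = [ipaddress.ip_network(n) for n in IGNORE]


def is_ignored(ip: str) -> bool:
    """True if ip falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address tested against an IPv6 network simply returns False.
    return any(addr in net for net in networks)


def collapse(nets):
    """Merge overlapping ranges; collapse_addresses needs one IP version at a time."""
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))
```

For example, `is_ignored("66.249.66.55")` covers the crawler from the log line, since 66.249.64.0/19 spans 66.249.64.0 to 66.249.95.255.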

Also, I followed the instructions here: https://serverfault.com/questions/561088/fail2ban-ignoreip-dns-host-example

I grabbed that script and used it as the ignorecommand for the apacherepeatoffender jail. In theory it should reverse-DNS the IP and possibly get F2B to play nice.

ALLOWED_HOSTS = [
".googlebot.com",
".search.msn.com",
".google.com"]
etc

I'll update if I find some success. :)

Ninja edit: I'm also planning to try this if the Python check doesn't work:

https://deeb.me/20180320/how-not-to-ban-googlebot

Thanks!



abclution commented Jun 11, 2021

The Python script from the linked Server Fault question wouldn't work for me as-is; it needed some small modifications.

I edited this script a bit later, as it had some issues.
It only seems to work with PTR-style hostnames that embed the IP address, like

crawl-66-249-64-157.googlebot.com

and I don't know enough about Python regular expressions to make it more flexible.


#!/usr/bin/env fail2ban-python
# Inspired by apache-fakegooglebot script
#
# Written in Python to reuse built-in Python batteries and not depend on
# presence of host and cut commands
# https://serverfault.com/questions/561088/fail2ban-ignoreip-dns-host-example
#
# Used as a fail2ban ignorecommand: exit status 0 tells fail2ban to ignore
# the IP, 1 means do not ignore it, 2 signals a bad argument.
import sys
import re
from fail2ban.server.ipdns import DNSUtils, IPAddr

ALLOWED_HOSTS = [
    ".googlebot.com",
    ".search.msn.com"]

# Matches PTR-style names such as crawl-66-249-64-157.googlebot.com and
# captures everything after the last "-<digits>" label as the domain.
HOST_RE = re.compile(r'.\S+(-\d+)(?P<domain>\.\S+)')


def process_args(argv):
    if len(argv) != 2:
        raise ValueError("Please provide a single IP as an argument. Got: %s\n"
                         % (argv[1:]))
    ip = argv[1]

    if not IPAddr(ip).isValid:
        raise ValueError("Argument must be a single valid IP. Got: %s\n"
                         % ip)
    return ip


def is_allowed_host(ip):
    # Reverse (PTR) lookup; ipToName returns None when there is no record
    host = DNSUtils.ipToName(ip)
    if not host:
        return False
    m = HOST_RE.match(host)
    if m is None:
        # Hostname is not in the crawl-<ip>.<domain> style
        return False
    return m.group('domain') in ALLOWED_HOSTS


if __name__ == '__main__':  # pragma: no cover
    try:
        ret = is_allowed_host(process_args(sys.argv))
    except ValueError as e:
        sys.stderr.write(str(e))
        sys.exit(2)

    sys.exit(0 if ret else 1)
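On the regex limitation mentioned above, one more flexible option is to skip the regex and compare the end of the hostname against each allowed suffix. This is a sketch only, not tested inside fail2ban; `allowed_suffix` is a hypothetical helper. Because each suffix starts with a dot, a name like evilgooglebot.com does not match .googlebot.com. Reverse DNS alone can be spoofed by whoever controls the PTR record, so Google's verification page also recommends resolving the name forward and confirming it points back to the same IP.

```python
from typing import Optional

ALLOWED_HOSTS = (".googlebot.com", ".search.msn.com")


def allowed_suffix(host: Optional[str]) -> Optional[str]:
    """Return the allowed suffix the hostname ends with, or None.

    Works for any hostname shape, not only crawl-<ip>.<domain> names.
    """
    if not host:
        return None
    host = host.rstrip(".").lower()  # tolerate a trailing root dot and mixed case
    for suffix in ALLOWED_HOSTS:
        if host.endswith(suffix):
            return suffix
    return None
```

Inside `is_allowed_host`, the regex match and domain lookup could then be replaced by `return allowed_suffix(host) is not None`.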


