Trim UA block list for false positives #189

Open
wants to merge 15 commits into
base: master
from

Conversation

yitzhaq

@yitzhaq yitzhaq commented Feb 5, 2023

After 1.5 months of using your UA block list in a production environment of a certain size, we had to make some local alterations to weed out the false positives we were encountering. I'm submitting them all here, each with an explanation and justification, as individual commits in case you feel they need to be cherry-picked.

Do note that we're not using the list "as intended": we consume the source list directly in HAProxy, where we do case-insensitive substring matching against the client's UA. If you feel there's a better (unified) way to make use of this data, such as using word boundaries or case sensitivity, I'm all ears.
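To illustrate why the matching mode matters, here is a minimal Python sketch comparing case-insensitive substring matching with a word-boundary match. It is not how HAProxy or this project evaluates the list. The entry `Ninja` and the UA `Ninja/2.0` are hypothetical placeholders chosen for illustration, not taken from the actual block list; the Teams UA is the one quoted in the commit message below.

```python
import re

def substring_hits(ua, entries):
    """Case-insensitive substring matching, as described for our HAProxy setup."""
    ua_lower = ua.lower()
    return [e for e in entries if e.lower() in ua_lower]

def boundary_hits(ua, entries):
    """Match an entry only when it is not embedded in a longer alphanumeric token."""
    hits = []
    for e in entries:
        # Lookarounds reject matches that have a letter or digit on either side.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(e) + r"(?![A-Za-z0-9])"
        if re.search(pattern, ua, flags=re.IGNORECASE):
            hits.append(e)
    return hits

# Hypothetical block list entry (an assumption, not from the real list).
entries = ["Ninja"]

# UA string quoted in this pull request.
teams_ua = "MicrosoftNinja/1.0+Teams/1.0+(ExchangeServicesClient/0.0.0.0)+SkypeSpaces/1.0a$*+"
# Hypothetical UA where the entry appears as a standalone token.
standalone_ua = "Ninja/2.0"

print(substring_hits(teams_ua, entries))      # substring mode flags the Teams UA
print(boundary_hits(teams_ua, entries))       # boundary mode does not
print(boundary_hits(standalone_ua, entries))  # boundary mode still flags the standalone token
```

Under these assumptions, the substring mode flags the Teams agent because the entry is embedded in `MicrosoftNinja`, while the boundary mode flags only the standalone token. Whether word boundaries suit every entry in the list is a separate question.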

Thanks for all your hard work!

yitzhaq added 15 commits February 5, 2023 23:30
…icrosoft Teams. Their agent uses UA strings like this:

MicrosoftNinja/1.0+Teams/1.0+(ExchangeServicesClient/0.0.0.0)+SkypeSpaces/1.0a$*+
This breaks MS Teams integration with MS Exchange, possibly more.
…e of robot) "UptimeRobot", which is supposed to be limited - not blocked
… block legitimate WebDAV clients correctly identifying as such. WebDAV is a very widely used official HTTP standard, and is the foundation of popular solutions like Nextcloud and ownCloud.
…ent stable release), and the version codename string is used as part of UA for several of its packages, such as Python's pip
…, in terms of features such as accessibility. If you're receiving requests from them, it's likely because you asked (and paid) them to.
…to archive digital content and create a library for current and future generations. They are good guys, and if one person is annoyed by their work not strictly adhering to robots.txt, let that person block it locally, rather than cause this level of collateral damage for a good cause. Closes mitchellkrogza#87
…ighting cybercrime on a plethora of different fronts. When they do mine technical data, it's for a good cause, that benefits all of us and makes the Internet a safer place. Their good efforts should not be blocked.
…rability scanning. It doesn't do harm, it merely discovers weaknesses in your setup, which makes it possible for you to fix them. Hiding from it doesn't make you more secure - on the contrary.