RSS - scrapers and bots issue #163
Comments
For a start, the referrer shows as Google. Can you search Google for the full link to your RSS file to see if it got indexed? If so, request its removal and block indexing of it, then also deny access to it via robots.txt, or disable the RSS feed completely, as feeds are heavily abused.
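If the feed is kept, a minimal robots.txt sketch would look like the following (the `/rss.xml` path is taken from the logged requests in this thread; adjust it to your actual feed URL):

```
# robots.txt — sketch only; path assumed from the access-log entries
User-agent: *
Disallow: /rss.xml
```

Note that only well-behaved crawlers honor robots.txt; abusive bots must still be blocked at the server or firewall level.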
I believe I will disable RSS completely, as many bots do not respect robots.txt directives.
I did too, as it's highly abused by meaningless content-scraping sites and bots that fill pages with the scraped content and serve ads on those pages.
I get a lot of these requests in my logs:

```
HTTP/1.1" 200 26701 "http://www.google.co.uk/url?sa=t&source=web&cd=1" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0"
GET /rss.xml HTTP/1.1" 200 375883 "http://www.google.co.uk/url?sa=t&source=web&cd=1" "PHP/7.4"
```

They crawl each post on my site every day, producing fake stats, consuming resources, and ruining SEO.
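One server-side option is to refuse feed requests from clients that announce themselves as script libraries, such as the `PHP/7.4` agent in the log above. A sketch, assuming nginx (an equivalent is possible in Apache with mod_rewrite; the list of agents is illustrative, not exhaustive):

```nginx
# Sketch: deny the feed to common scraper user agents.
# "PHP" matches the "PHP/7.4" agent seen in the access log.
location = /rss.xml {
    if ($http_user_agent ~* "(PHP|python-requests|curl|wget)") {
        return 403;
    }
}
```

Sophisticated scrapers spoof browser user agents, so this only stops the lazy ones; it pairs well with rate limiting or fail2ban.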
What I believe is that these requests are produced by some sort of scraping program. My XML RSS feed is the most abused part of the site, as scrapers and bots try to crawl its contents every day. I found that a program called 'Full-Text RSS' from fivefilters[.]org can do exactly this:
(from https://forum.fivefilters.org/t/question-in-what-cases-does-debug-rawhtml-return-nothing/1006)
As I said, XML RSS feeds are the most abused by scrapers, bots, and fake bots.
I have the Ultimate Bad Bot Blocker, CSF firewall, and fail2ban configured. Each day I try to harden the configuration, but I keep finding these bloodsuckers in my logs. Please help me stop them.
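Since fail2ban is already in place, one approach is a dedicated filter for feed scrapers. A sketch, assuming the standard combined access-log format with the client IP first; the filter and jail names are hypothetical and the regex must be adjusted to the actual log layout:

```ini
# /etc/fail2ban/filter.d/rss-scrapers.conf — hypothetical filter name
# Matches feed requests made with a PHP client, as seen in the logs above.
[Definition]
failregex = ^<HOST> .* "GET /rss\.xml HTTP.*".*"PHP/
ignoreregex =
```

```ini
# /etc/fail2ban/jail.local — hypothetical jail; tune paths and limits
[rss-scrapers]
enabled  = true
port     = http,https
filter   = rss-scrapers
logpath  = /var/log/nginx/access.log
maxretry = 3
bantime  = 86400
```

You can verify the regex against a real log with `fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/rss-scrapers.conf` before enabling the jail.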