robots.txt honeypot

2025-04-08


"robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit." (Wikipedia)

robots.txt lets website owners control which crawlers may access their site and which parts of it they may visit.
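For example, a robots.txt like the following keeps every crawler out of one directory and blocks a specific bot entirely (the path and bot name here are just for illustration):

```
User-agent: *
Disallow: /admin/

User-agent: BadBot
Disallow: /
```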

However, robots.txt is purely advisory, and some bots simply ignore its rules.

To catch these bots, I created a honeypot that serves a robots.txt file disallowing all crawling.
Any bot that ignores this rule and requests a page anyway is automatically reported to AbuseIPDB.
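To make the idea concrete, here is a minimal sketch in Python using Flask and requests. It is not the actual project code: the trap route, the environment variable name, and the choice of AbuseIPDB report category are assumptions for illustration.

```python
# Minimal robots.txt honeypot sketch (not the actual project code).
# Assumes Flask and requests are installed and ABUSEIPDB_API_KEY is set.
import os

import requests
from flask import Flask, Response, request

app = Flask(__name__)

# Disallow everything for every crawler.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"
ABUSEIPDB_API_KEY = os.environ.get("ABUSEIPDB_API_KEY", "")


@app.route("/robots.txt")
def robots():
    # Well-behaved crawlers read this first and should then stay away.
    return Response(ROBOTS_TXT, mimetype="text/plain")


def report_to_abuseipdb(ip: str, path: str) -> None:
    # AbuseIPDB report endpoint; category 19 is "Bad Web Bot".
    requests.post(
        "https://api.abuseipdb.com/api/v2/report",
        headers={"Key": ABUSEIPDB_API_KEY, "Accept": "application/json"},
        data={
            "ip": ip,
            "categories": "19",
            "comment": f"Ignored robots.txt and requested {path}",
        },
        timeout=10,
    )


@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def trap(path):
    # Any request other than /robots.txt means the client ignored the
    # disallow-all rule, so its IP gets reported.
    report_to_abuseipdb(request.remote_addr, request.path)
    return Response("Forbidden", status=403)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

In a real deployment the reporting would also need to account for proxies (so the client IP is taken from the right header) and for rate limits on the AbuseIPDB API.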

If you want to check it out, the project is available on GitHub and is licensed under the MIT License.

GitHub repo