robots.txt is the filename used for implementing the Robots Exclusion Protocol, a standard used by websites to indicate to visiting web crawlers and other web robots which portions of the website they are allowed to visit.
— Wikipedia
robots.txt allows website owners to control which crawlers can access their site and how they can access it.
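For example, a robots.txt that lets most crawlers in but keeps a hypothetical crawler named BadBot out of a /private/ directory looks like this (the bot name and path are illustrative):

```
User-agent: BadBot
Disallow: /private/

User-agent: *
Disallow:
```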
However, some bots don't respect the robots.txt rules.
To detect this, I created a honeypot that serves a robots.txt file disallowing all crawling.
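A minimal sketch of the idea, assuming a small Flask app (the actual project may be structured differently): serve a disallow-all robots.txt and record every client that requests any other path anyway.

```python
# Sketch only: serve "Disallow: /" and flag clients that crawl past it.
import logging

from flask import Flask, Response, request

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


@app.route("/robots.txt")
def robots():
    # A well-behaved crawler fetches this, sees "Disallow: /", and stops.
    return Response(ROBOTS_TXT, mimetype="text/plain")


@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def trap(path):
    # Any crawler that reaches this handler ignored the disallow-all rule.
    ip = request.headers.get("X-Forwarded-For", request.remote_addr)
    logging.info("robots.txt violation from %s (%s)", ip, request.user_agent)
    return Response("Forbidden", status=403)


if __name__ == "__main__":
    app.run()
```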
Bots that ignore this rule are automatically reported to AbuseIPDB.
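The reporting step could then look roughly like this, hooked into the trap handler above and using AbuseIPDB's v2 /report endpoint; the API key, abuse category, and comment text are illustrative placeholders, not necessarily what the project uses.

```python
# Rough sketch of reporting an offending IP to AbuseIPDB.
import requests

ABUSEIPDB_API_KEY = "your-api-key"  # placeholder


def report_to_abuseipdb(ip: str, user_agent: str) -> None:
    """Report an IP that ignored the disallow-all robots.txt."""
    response = requests.post(
        "https://api.abuseipdb.com/api/v2/report",
        headers={"Key": ABUSEIPDB_API_KEY, "Accept": "application/json"},
        data={
            "ip": ip,
            "categories": "19",  # 19 = "Bad Web Bot" in AbuseIPDB's category list
            "comment": f"Ignored robots.txt disallow-all rule (User-Agent: {user_agent})",
        },
        timeout=10,
    )
    response.raise_for_status()
```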
If you want to check it out, the project is available on GitHub and is licensed under the MIT License.