Robots.txt

Robots all the way down

We found out that at least 65% of this resource-consuming traffic we get for the website is coming from bots, a disproportionate amount given the overall pageviews from bots are about 35% of the total. This high usage is also causing constant disruption for our Site Reliability team, who has to block overwhelming traffic from such crawlers before it causes issues for our readers. I do wonder about the future of the internet.

Example of AI not honoring robots.txt

The AI scraper (I can only assume that's what they are) scourge continued, and intensified in the last week. This time they were hitting pagure.io really quite hard. Robots.txt is only an honor system, and this is one of many postings calling that out. Quote citation: Kevin, “Mid March infra bits 2025”, 2025-03-15 17:52, https://www.scrye.com/blogs/nirik/posts/2025/03/15/mid-march-infra-bits-2025/
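
To make the "honor system" point concrete, here is a minimal sketch of what a polite crawler is supposed to do before fetching a page, using Python's standard urllib.robotparser. The robots.txt content, the bot name ExampleAIBot, and the example.org URLs are all made up for illustration, as is the may_fetch helper. Nothing here forces a crawler to run this check, which is exactly the problem.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one (made-up) AI bot entirely,
# and keep everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler performs voluntarily.
    The server cannot enforce it; a scraper that skips this step
    (or lies about its user agent) simply fetches the page anyway.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeBrowser"):
        for url in ("https://example.org/page", "https://example.org/private/x"):
            print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

The rules only matter if the client chooses to ask. That is why the sites quoted above end up blocking traffic on the server side instead of relying on robots.txt.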