Wiki under heavy load by new wave of scrapers

Baptiste Jonglez baptiste at bitsofnetworks.org
Thu Apr 17 15:01:26 PDT 2025


Hello,

The wiki has been under heavy load for a few days because of a new kind of
scraper (thank you, dear LLM companies).

Requests come from a huge number of residential IP addresses,
predominantly from Brazil but also from many other countries.

The requests use legitimate-looking User-Agent strings, but these are very
likely made up (alongside the common ones, there is dubious stuff like
Windows 98, MacOS PowerPC, Internet Explorer 6...).

As a result, this traffic is extremely difficult to rate-limit or block.
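
To put a number on the dubious User-Agents, here is a minimal sketch in
Python that flags the obviously obsolete ones. The marker list and the
function names are assumptions made for illustration, not something taken
from the wiki setup. It only catches the implausible strings, so it does
not help with the bulk of requests that use common User-Agents.

```python
import re

# Assumed markers of obsolete platforms (illustrative, not exhaustive):
# Windows 98, PowerPC Macs and Internet Explorer 6.
OBSOLETE_MARKERS = re.compile(
    r"Windows 98|Win98|PPC Mac OS X|MSIE 6\.0",
    re.IGNORECASE,
)


def is_implausible_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent names an obsolete platform or browser."""
    return OBSOLETE_MARKERS.search(user_agent) is not None


def implausible_share(user_agents) -> float:
    """Return the fraction of User-Agent strings flagged as implausible."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    flagged = sum(1 for ua in agents if is_implausible_user_agent(ua))
    return flagged / len(agents)
```

Feeding the User-Agent column of an access log to implausible_share() gives
a rough idea of how much of the traffic is made up this way.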

I'm pretty certain that the people behind these residential IPs are being
paid to serve as proxies for LLM companies' scraping, precisely to make the
traffic very hard to block.

This looks related: https://community.openai.com/t/tips-experience-how-i-used-residential-proxies-to-collect-training-data-for-ai/1230577

Ideas welcome...

Baptiste



More information about the openwrt-adm mailing list