Wiki under heavy load by new wave of scrapers
Baptiste Jonglez
baptiste at bitsofnetworks.org
Thu Apr 17 15:01:26 PDT 2025
Hello,
The wiki has been under heavy load for a few days because of a new kind of
scraper (thank you, dear LLM companies).
Requests come from a huge number of residential IP addresses,
predominantly from Brazil but also from many other countries.
The requests use legitimate-looking User-Agent strings, but these are very
likely made up (among the usual ones, there is dubious stuff like Windows 98,
MacOS PowerPC, Internet Explorer 6...).
As a result, this traffic is extremely difficult to rate-limit or block.
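
To get a rough idea of how much of the traffic carries such implausible
User-Agent strings, something like the sketch below could be run over the
access log. It is only an illustration: it assumes the web server writes the
common "combined" log format (User-Agent as the last quoted field), and the
marker list and the function name count_obsolete_agents are made up for the
example, not taken from the wiki's actual setup.

```python
import re
from collections import Counter

# Hypothetical markers for long-obsolete clients. This list is an assumption
# for illustration; it is not derived from the wiki's real logs.
OBSOLETE_MARKERS = ("Windows 98", "PPC Mac OS", "MSIE 6.0")

# Assumption: "combined" log format, where the User-Agent is the last
# double-quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_obsolete_agents(lines):
    """Count how many log lines mention each obsolete marker in the User-Agent."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # line does not end with a quoted field, skip it
        agent = match.group(1)
        for marker in OBSOLETE_MARKERS:
            if marker in agent:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_agents.py < access.log
    for marker, hits in count_obsolete_agents(sys.stdin).most_common():
        print(f"{hits:8d}  {marker}")
```

Since the User-Agent values are most likely random picks from a list, this
would only measure the obviously bogus share. It does not give a reliable way
to block the rest, which uses perfectly ordinary strings.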
I'm pretty certain that the people behind these residential IPs are being
paid to serve as proxies for LLM companies' scraping, precisely to make the
traffic very hard to block.
This looks related: https://community.openai.com/t/tips-experience-how-i-used-residential-proxies-to-collect-training-data-for-ai/1230577
Ideas welcome...
Baptiste