Wiki under heavy load by new wave of scrapers

Christian Marangi (Ansuel) ansuelsmth at gmail.com
Thu Apr 17 18:21:46 PDT 2025


On Fri, Apr 18, 2025 at 00:01, Baptiste Jonglez
<baptiste at bitsofnetworks.org> wrote:
>
> Hello,
>
> The wiki has been under heavy load for a few days because of a new kind of
> scraper (thank you, dear LLM companies).
>
> Requests come from a huge number of residential IP addresses,
> predominantly from Brazil but also from many other countries.
>
> The requests use legitimate-looking User-Agent strings, but these are very
> likely made up (among the usual ones there is dubious stuff like Windows 98,
> Mac OS on PowerPC, Internet Explorer 6...).
>
> As a result, this traffic is extremely difficult to rate-limit or block.
>
> I'm pretty certain that the people behind these residential IPs are being
> paid to act as proxies for LLM companies' scraping, precisely to make the
> traffic very hard to block.
>
> This looks related: https://community.openai.com/t/tips-experience-how-i-used-residential-proxies-to-collect-training-data-for-ai/1230577
>
> Ideas welcome...
>

From what I've noticed, services like Bootlin Elixir and even some Linux
branches are starting to use Anubis.

We might consider using it too, and serving static pages for indexing.
The overload is caused by bots accessing dynamic pages; static pages
should not cause nearly as much load.
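As far as I understand, Anubis works by having the browser solve a small
SHA-256 proof-of-work puzzle before the real page is served. Below is a rough
sketch of that idea, not Anubis's actual code: the names DIFFICULTY,
issue_challenge, solve and verify are hypothetical, and "difficulty" is assumed
to mean the number of leading zero hex digits in the hash.

```python
import hashlib
import secrets

# Hypothetical difficulty: number of leading zero hex digits required.
DIFFICULTY = 4


def issue_challenge() -> str:
    """Server side: hand the client a random, unpredictable challenge."""
    return secrets.token_hex(16)


def solve(challenge: str, difficulty: int = DIFFICULTY) -> int:
    """Client side: brute-force a nonce so that sha256(challenge + nonce)
    starts with `difficulty` zero hex digits (about 16**difficulty tries)."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int = DIFFICULTY) -> bool:
    """Server side: checking a submitted solution costs a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = issue_challenge()
    nonce = solve(challenge)
    print(challenge, nonce, verify(challenge, nonce))
```

The point is the asymmetry: one visitor pays a fraction of a second once, while
a scraper hitting thousands of dynamic pages pays it over and over, and the
check does not rely on the IP address or User-Agent at all, which is exactly
what the residential proxies make useless.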



More information about the openwrt-adm mailing list