Wiki under heavy load by new wave of scrapers
Ted Hess
thess at kitschensync.net
Sun Apr 20 08:58:26 PDT 2025
Hi all -
Yes, it has been frustrating lately trying to chase down and deal with
the most recent DDoS attacks. I briefly tried rate-limiting the entire
site for all. I'm not sure I was strict enough - it didn't alleviate the
load while still providing reasonable service. I removed it but was
thinking of trying something slightly different. BTW - It takes 19-20
URL fetches to bring up the home page, so some sort of burst allowance
must be made.
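To make the burst idea concrete, here is a rough token-bucket sketch in
Python. It is only an illustration, not what is running on the server:
the class name, the 2 requests/second refill rate and the burst size of
20 are all assumptions, with the burst picked to roughly cover the
19-20 fetches of one home page load. In practice one bucket would be
kept per client (for example per IP), which this sketch leaves out.

```python
import time


class TokenBucket:
    """Hypothetical per-client limiter: steady refill rate plus a burst."""

    def __init__(self, rate, burst, now=None):
        self.rate = float(rate)      # tokens added per second (assumed value)
        self.burst = float(burst)    # bucket capacity, i.e. the burst allowance
        self.tokens = float(burst)   # start full so a first page load succeeds
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True if one request may pass, consuming one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill for the time that has passed, never above the capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage: a full page load (20 fetches) passes at once, the 21st is refused.
bucket = TokenBucket(rate=2, burst=20, now=0.0)
results = [bucket.allow(now=0.0) for _ in range(21)]
```

With these numbers a normal visitor can load a page in one go and then
continue at a slow trickle, while a client that keeps hammering is held
to the refill rate. It does not help much when the load is spread over
a very large number of distinct addresses.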
I don't think Anubis is the ideal solution - It will be one more
configuration profile to manage and keep up-to-date. Additionally, it
puts a large requirement on the browser/client capability which is
probably something we don't need to add to the list of things we will need
to respond to from unsatisfied users. I'm willing to try it anyway -- if
you all think it would help.
The only other solution(s) that come to mind are more costly.
/ted
On 4/17/2025 6:01:26 PM, "Baptiste Jonglez"
<baptiste at bitsofnetworks.org> wrote:
>Hello,
>
>The wiki has been under heavy load for a few days because of a new kind of
>scraper (thank you dear LLM companies).
>
>Requests come from a huge number of residential IP addresses,
>predominantly from Brazil but also from many other countries.
>
>The requests use legitimate-looking User-Agent strings, but these are very
>likely made up (among classical ones, there is dubious stuff like Windows 98,
>MacOS PowerPC, Internet Explorer 6...)
>
>As a result, this traffic is extremely difficult to rate-limit or block.
>
>I'm pretty certain that the people behind these residential IPs are being
>paid to serve as proxies for LLM companies' scraping, precisely to make the
>traffic very hard to block.
>
>This looks related: https://community.openai.com/t/tips-experience-how-i-used-residential-proxies-to-collect-training-data-for-ai/1230577
>
>Ideas welcome...
>
>Baptiste