Removing outdated releases from downloads.o.o

Etienne Champetier champetier.etienne at gmail.com
Sun Jan 7 02:35:21 PST 2024


On Sat, Jan 6, 2024 at 17:05, Christian Marangi (Ansuel)
<ansuelsmth at gmail.com> wrote:
>
> On Sat, Jan 6, 2024 at 11:57, Etienne Champetier
> <champetier.etienne at gmail.com> wrote:
> >
> > (resending in text only)
> >
> > On Sat, Jan 6, 2024 at 00:13, Baptiste Jonglez
> > <baptiste at bitsofnetworks.org> wrote:
> > >
> > > Hi,
> > >
> > > On 03-01-24, Paul Spooren wrote:
> > > > Hi, I’m currently checking what is needed to remove mirror-01/02 and replace them with a machine that is both faster and cheaper.
> > > >
> > > > While we store all old releases (e.g. the 5-year-old 17.01), all links on the website point to archive.o.o. I suggest that we serve only snapshots and supported releases from the main download server, and serve all other releases via the archive server only.
> > > >
> > > > This would break very old forum links, but for those cases we could either say that such old releases shouldn’t be used in the first place, or add a forwarding rule to the archive server.
> > >
> > > I agree that the current amount of storage is becoming unmanageable.
> > > I put some storage numbers for mirror operators on [1]; the TL;DR is that
> > > we generate around 700 GB of new release files per year, and the figure
> > > is increasing. This is really a lot.
> > >
> > > However, changing the model needs some thinking, around several axes:
> > >
> > > - determining relevant classes of data.  We could have three logical
> > >   repositories: snapshots, active releases, old releases.  They might
> > >   still technically be hosted on the same server, but they would be easier
> > >   to separate.
> > >
> > > - catering for mirror operators: currently some operators decide to mirror
> > >   only part of the data for space reasons, but it's not consistent.  The
> > >   classes of data above would help: we could ask every mirror to carry at
> > >   least "active releases" as a common denominator for consistency.
> > >
> > > - delivery: keeping stable URLs over time, ensuring HTTP stays available,
> > >   and taking into account limited clients like uclient-fetch (TLS
> > >   configuration, redirects).
> > >
> > > - disaster recovery: backups, recovery
> > >
> > > - server cost vs. reliability vs. stability over time.  My feeling is that
> > >   you can get only two out of three.  For instance, donations or free
> > >   credits usually don't last for a long time.  The current download server
> > >   is mostly reliable and stable, but expensive.  And so on.
> > >
> > > From here, there are many solutions: put everything on object storage and
> > > be done with it; keep an expensive and reliable server for active releases
> > > only and rely on the community for the rest of the data; manage a second
> > > cheaper server for the rest as part of the OpenWrt infra; use object
> > > storage for the rest.
> > >
> > > I would go with the lowest maintenance option whenever possible, like a
> > > beefy VM for active releases and snapshots, and move to object storage for
> > > the rest (not Amazon, of course, but an EU provider such as Infomaniak [2]).
> > > Then have a frontend configuration such as the current CDN that would
> > > transparently proxy to the right place.
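A minimal sketch of the "three logical repositories + frontend proxy" idea above, assuming a hypothetical URL layout where snapshots live under `/snapshots/` and releases under `/releases/<version>/`. The set of supported release series and the backend names (`main-vm`, `object-storage`) are hypothetical placeholders, not the actual OpenWrt configuration. It classifies a request path and picks where the frontend would proxy it.

```python
from enum import Enum
from typing import Optional

# Hypothetical set of release series still considered "active";
# the real list would come from the project's support policy.
SUPPORTED_SERIES = {"23.05", "22.03"}


class DataClass(Enum):
    SNAPSHOT = "snapshot"
    ACTIVE_RELEASE = "active release"
    OLD_RELEASE = "old release"


# Hypothetical backends: a "beefy VM" for hot data, object storage for the rest.
BACKENDS = {
    DataClass.SNAPSHOT: "main-vm",
    DataClass.ACTIVE_RELEASE: "main-vm",
    DataClass.OLD_RELEASE: "object-storage",
}


def classify(path: str) -> Optional[DataClass]:
    """Map a request path to one of the three logical repositories.

    Assumes /snapshots/... and /releases/<version>/... (e.g. /releases/23.05.2/).
    Returns None for paths outside both trees.
    """
    parts = [p for p in path.split("/") if p]
    if parts and parts[0] == "snapshots":
        return DataClass.SNAPSHOT
    if len(parts) >= 2 and parts[0] == "releases":
        # "23.05.2" -> "23.05", "17.01.7" -> "17.01"
        series = ".".join(parts[1].split(".")[:2])
        if series in SUPPORTED_SERIES:
            return DataClass.ACTIVE_RELEASE
        return DataClass.OLD_RELEASE
    return None


def backend_for(path: str) -> str:
    """Pick the backend the frontend would transparently proxy to.

    Unrecognized paths (index pages, keys, ...) fall back to the main VM.
    """
    data_class = classify(path)
    if data_class is None:
        return "main-vm"
    return BACKENDS[data_class]


print(backend_for("/releases/17.01.7/targets/ar71xx/generic/"))  # object-storage
print(backend_for("/releases/23.05.2/targets/ath79/generic/"))   # main-vm
```

In a real deployment this mapping would more likely live in the CDN or reverse-proxy configuration; Python is used here only to make the decision logic explicit. Keeping the classification separate from the backend table means a release moving from "active" to "old" is a one-line change to the supported set, with URLs staying stable.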
> >
> > There are object storage services with zero egress cost (at least for now);
> > this gives you pretty predictable costs and much better reliability.
> > The best known is probably Cloudflare R2.
>
> This is what we currently use for our ccache cache in the GitHub CI tests:
> 200 GB on S3 (Cloudflare R2), and it costs only $1 a month.

Nice! I see we just upload the full cache at the end and download it at the start.
Was https://github.com/mozilla/sccache considered? (I've never used it.)

> > There are also even cheaper object storage providers that have agreements with CDNs
> > for zero egress cost, like Backblaze with Cloudflare / Fastly / ...
>
> Does Fastly also offer zero egress cost?

I don't think so. The agreement is between the object storage and the CDN, so the
CDN itself would still have a cost. The point is more that if some CDN offers a
discount or sponsorship for open source, we might be able to pair it with a cheap
object storage and get an even lower price, but R2 might already be cheap enough
($0.015 / GB-month).
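A back-of-the-envelope sketch of what the numbers quoted in this thread imply, assuming the ~700 GB/year growth figure stays constant (it is reportedly increasing, so this underestimates) and the R2 storage price of $0.015 per GB-month. Egress is taken as free and per-request charges are ignored. The starting archive size is a hypothetical parameter, since the current total was not stated here.

```python
# Assumptions taken from the thread, not authoritative figures:
PRICE_PER_GB_MONTH = 0.015  # USD, R2 storage price quoted above
GROWTH_GB_PER_YEAR = 700    # new release files per year, quoted above


def monthly_storage_cost(stored_gb: float) -> float:
    """Storage-only cost in USD per month (egress assumed free, requests ignored)."""
    return stored_gb * PRICE_PER_GB_MONTH


def projected_size_gb(initial_gb: float, years: int) -> float:
    """Stored size after `years` of linear growth at GROWTH_GB_PER_YEAR."""
    return initial_gb + GROWTH_GB_PER_YEAR * years


# One year of new release files alone: 700 GB * $0.015 = $10.50 per month.
print(f"{monthly_storage_cost(GROWTH_GB_PER_YEAR):.2f}")  # prints 10.50

# Hypothetical starting size of 1000 GB, projected two years out.
size = projected_size_gb(1000, 2)
print(f"{size:.0f} GB -> ${monthly_storage_cost(size):.2f}/month")
```

Plugging in the real current size, and a growth rate that rises over time, would turn this into an actual budget estimate.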

>
> >
> > What are the current costs, storage needs and bandwidth usage?
> >


