Conclusions from CVE-2024-3094 (libxz disaster)

Sun Mar 31 10:06:43 PDT 2024

> On 31 March 2024 at 18:46, Daniel Golle <daniel at makrotopia.org> wrote:
> 
> On Sun, Mar 31, 2024 at 12:05:03PM +0200, Thibaut wrote:
>> 
>>> On 31 March 2024 at 01:07, Elliott Mitchell <ehem+openwrt at m5p.com> wrote:
>>> 
>>>> Normally upstream publishes release tarballs that are different than the
>>>> automatically generated ones in GitHub. In these modified tarballs, a
>>>> malicious version of build-to-host.m4 is included to execute a script
>>>> during the build process.
>>> 
>>> So the malicious source code was part of all tarballs, but only the
>>> tarballs with the modified `build-to-host.m4` would trigger the malicious
>>> payload.
>>> 
>>> So obtaining GitHub's tarballs which came directly from the Git
>>> repository *does* avoid the breach.
>> 
>> https://git.tukaani.org/?p=xz.git;a=commitdiff;h=f9cf4c05edd14dedfe63833f8ccbe41b55823b00
>> 
>> Let’s not fool ourselves into thinking that using the upstream-provided repo instead of upstream-provided tarballs is inherently safer. With an adversarial upstream, *nothing* is safe anyway.
> 
> Just using git checkouts (or **reproducible** tarballs generated from a
> repo's git-ref, i.e. tag or commit) by itself of course doesn't help
> much.

*nod*

> But for myself, maintaining a medium 2-digit number of packages, using
> git checkouts (or **reproducible** tarballs generated from git
> checkouts) would mean that I can at least be sure that the git
> commits I've been seeing and the diff between version tags **would
> really correspond to the content of tarball**, without having to put
> extra work just into that (which imho nobody does).

I believe most maintainers don’t even look at the commits. Happy to be proven wrong, though.
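
For what it's worth, the "does the tarball match the repo" check can be mechanized, so it would not have to cost extra work per package. Below is a minimal sketch in Python, assuming the release tarball has a single top-level folder and that a checkout of the matching tag is available locally. The helper names (`tarball_digests`, `tree_digests`, `compare`) and the demo file names are hypothetical, not existing OpenWrt tooling.

```python
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path


def tarball_digests(tar_path, strip_components=1):
    """Map each regular file in the tarball to its SHA-256, minus the top-level folder."""
    digests = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if member.isfile():
                rel = "/".join(member.name.split("/")[strip_components:])
                digests[rel] = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
    return digests


def tree_digests(root):
    """Map each file in a checkout to its SHA-256, skipping the .git directory."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    }


def compare(tar_path, checkout):
    """Report files that exist on one side only, or whose content differs."""
    tar_d, tree_d = tarball_digests(tar_path), tree_digests(checkout)
    return {
        "only_in_tarball": sorted(tar_d.keys() - tree_d.keys()),
        "only_in_checkout": sorted(tree_d.keys() - tar_d.keys()),
        "modified": sorted(k for k in tar_d.keys() & tree_d.keys() if tar_d[k] != tree_d[k]),
    }


# Demo with made-up content: the tarball carries one file the checkout lacks.
with tempfile.TemporaryDirectory() as tmp:
    checkout = Path(tmp) / "checkout"
    (checkout / "src").mkdir(parents=True)
    (checkout / "src" / "main.c").write_bytes(b"int main(void) { return 0; }\n")

    tar_path = Path(tmp) / "demo-1.0.tar"
    with tarfile.open(tar_path, "w") as tar:
        for name, data in {
            "demo-1.0/src/main.c": b"int main(void) { return 0; }\n",
            "demo-1.0/m4/extra.m4": b"# not in git\n",
        }.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    report = compare(tar_path, checkout)

print(report)
```

With autotools projects the report will be noisy, since `make dist` legitimately adds generated files, so somebody still has to read it. That is rather my point: the tooling is the easy part, the attention is not.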

> I've never claimed that this alone is the solution, but if we are
> already used to
> 
> a) the content of a release tarball not matching the git repo
>   (because of `make dist` autotools nonsense, for example),
> b) the hash of such tarball being different depending on who generates
>   it with subtle difference such as the folder name,
> c) people all the time "fix" PKG_MIRROR_HASH without anyone having
>   any option to validate the cause for the "wrong" hash in the first
>   place.
> 
> Then the added security of PKG_HASH and esp. PKG_MIRROR_HASH is very
> small. Too small, if you ask me.

ACK.
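
To illustrate point b): the sketch below (Python, not what our download scripts actually do) builds a tar archive with all metadata pinned, so that the same file set always yields the same hash, and then shows that merely renaming the top-level folder changes it. The function name `make_tarball` and the file contents are made up for the example.

```python
import hashlib
import io
import tarfile


def make_tarball(files, top_dir):
    """Build an uncompressed tar with sorted entries and fixed metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=f"{top_dir}/{name}")
            info.size = len(data)
            info.mtime = 0            # no build-time timestamps
            info.mode = 0o644         # fixed permissions
            info.uid = info.gid = 0   # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256_hex(blob):
    return hashlib.sha256(blob).hexdigest()


files = {
    "configure.ac": b"AC_INIT([demo], [1.0])\n",
    "src/main.c": b"int main(void) { return 0; }\n",
}

same_a = sha256_hex(make_tarball(files, "demo-1.0"))
same_b = sha256_hex(make_tarball(files, "demo-1.0"))
renamed = sha256_hex(make_tarball(files, "demo-v1.0"))

print("identical inputs, identical hash:", same_a == same_b)
print("renamed top-level folder, identical hash:", same_a == renamed)
```

Compression adds its own variables (compressor version, level, embedded timestamps) and would have to be pinned as well, so this only covers the archive layer.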

> And other than the complex
> social/economical/political problems which lead to something like the
> xz backdoor (out of question: those are the bigger problems), that's a
> technical problem we could quite easily improve **and it would have
> been sufficient to prevent the attack** in this case.
> 
> There is a reason the attacker(s) went to great lengths to move
> the official mirror site of the project, change the PGP key and hide
> the key piece of the exploit in the tarballs they generated (and
> signed) instead of in a git commit. This is not by chance.

You could chalk it up to extra precaution, but they could conceivably have achieved the same result more openly, given the level of control they had over the repo and the level of scrutiny they received (essentially none).

> What we need is "Reproducible Source/Release Tarballs", not as a
> solution to all our problems, but as a **pre-condition** which
> currently isn't met for obvious reasons.

ACK.

> Hence I'm still arguing that the lesser resource use of downloading
> Github archive/codeload/release tarballs is not worth the loss of
> integrity and audit-trail of git.

ACK.

> Yes, I know SHA-1 is outdated, but in the context of git it's not so
> easy to add lots of random padding which would be required to generate
> a hash collision, which has yet to be seen even for contexts with
> much more freedom than the narrow syntax of a git diff (and commit
> message). So sure, it's not perfect, but it's better than nothing.
> 
> And while release tarballs (being *deliberately* different from the content
> of the source repo at their corresponding tag for things like an added
> VERSION or ChangeLog file or stuff like that which is information the
> build process could otherwise learn from .git) have some small arguable
> value, hard or impossible to reproduce Github-generated tarballs really
> do NOT have any value. They are an obstacle, and lure people into bad
> practices such as all those "Fix PKG_MIRROR_HASH" commits which become
> the norm (and should really not).

ACK.

I agree with all that, although earlier you seemed to claim that using GH-generated tarballs would somehow be better than using upstream-provided tarballs, and here it seems you’re reversing course :)

My point is: neither is secure, and neither is using a git checkout secure *by itself*: if upstream is compromised, and nobody pays attention, it’s *still* game over.

And given the number of packages in our packages repo and the corresponding available manpower, I frankly doubt using one or the other would have made any difference here, as I expect the average maintainership level to be between 0 and 1 in your list.

>> And even when upstream repo isn’t entirely under adversarial control, a bad actor can sneak stuff in:
>> https://github.com/libarchive/libarchive/commit/6110e9c82d8ba830c3440f36b990483ceaaea52c
> 
> I've seen that, and by itself it does not present a security risk in
> the context libarchive is intended to be used.

I think you might be a tad overassertive here.

Someone already disagrees with you:
https://github.com/libarchive/libarchive/pull/1609#issuecomment-2028388707

BTW, if you haven’t followed:
https://github.com/libarchive/libarchive/issues/2103

Cheers,
T