[PATCH 1/4] locking/atomic/x86: Silence intentional wrapping addition

Kees Cook keescook at chromium.org
Thu Apr 25 10:39:20 PDT 2024


On Thu, Apr 25, 2024 at 11:17:52AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 24, 2024 at 04:20:20PM -0700, Kees Cook wrote:
> 
> > > This is arse-about-face. Signed stuff wraps per -fno-strict-overflow.
> > > We've been writing code for years under that assumption.
> > 
> > Right, which is why this is going to take time to roll out. :) What we
> > were really doing with -fno-strict-overflow was getting rid of undefined
> > behavior. That was really really horrible; we don't need the compiler
> > hallucinating.
> 
> Right, but that then got us well defined semantics for signed overflow.

Yes, and this gets us to the next step: disambiguation for general
users. It's good that we have a well-defined overflow resolution strategy,
but our decades of persistent wrap-around flaws in the kernel show
that many devs (even experienced ones) produce code with unexpected and
unwanted (to the logic of the code) wrap-around. So we have to find a
way to distinguish wrapping and non-wrapping operations or types up
front and in a clear way.

> 
> > > You want to mark the non-wrapping case.
> > 
> > What we want is lack of ambiguity. Having done these kinds of things in
> > the kernel for a while now, I have strong evidence that we get much better
> > results with the "fail safe" approach, but start by making it non-fatal.
> > That way we get full coverage, but we don't melt the world for anyone
> > that doesn't want it, and we can shake things out over a few years. For
> > example, it has worked well for CONFIG_FORTIFY, CONFIG_UBSAN_BOUNDS,
> > KCFI, etc.
> 
> The non-fatal argument doesn't have bearing on the mark warp or mark
> non-wrap argument though.

This gets at the strategy of refactoring our code to gain our unambiguous
coverage. Since we can't sanely have a flag-day, we have to go piecemeal,
and there will continue to be places where the coverage was missed, and
so we want to progress through marking wrapping cases without BUGing the
kernel. (We don't care about catching non-wrapping -- the exceptional
condition is hitting an overflow.)

> > The riskier condition is having something wrap when it wasn't expected
> > (e.g. allocations, pointer offsets, etc), so we start by defining our
> > regular types as non-wrapping, and annotate the wrapping types (or
> > specific calculations or functions).
> 
> But but most of those you mention are unsigned. Are you saying you're
> making all unsigned variables non-wrap by default too? That's bloody
> insane.

We have a mix (and a regular confusion even in core code) where "int"
gets passed around even though at one end or another of a call chain
it's actually u32 or u16 or whatever. Regardless, yes, the next step
after signed overflow mitigation would be unsigned overflow mitigation,
and as you suggest, it's much more tricky.

> > For signed types in particular, wrapping is overwhelmingly the
> > uncommon case, so from a purely "how much annotations is needed"
> > perspective, marking wrapping is also easiest. Yes, there are cases of
> > expected wrapping, but we'll track them all down and get them marked
> > unambiguously. 
> 
> But I am confused now, because above you seem to imply you're making
> unsigned non-wrap too, and there wrapping is *far* more common, and I
> must say I hate this wrapping_add() thing with a passion.

Yes, most people are not a fan of the wrapping_*() helpers, which is why
I'm trying to get a typedef attribute created. But again, to gain the
"fail safe by default" coverage, we have to start with the assumption
that the default is non-wrapping, and mark those that aren't. (Otherwise
we're not actually catching unexpected cases.) And no, it's not going
to be over-night. It's taken almost 5 years to disambiguate array bounds
and we're still not done. :)

> > One thing on the short list is atomics, so here we are. :)
> 
> Well, there are wrapping and non-wrapping users of atomic. If only C had
> generics etc.. (and yeah, _Generic doesn't really count).

Non-wrapping users of atomics should be using refcount_t, which is
our non-wrapping atomic type. But regardless, atomics are internally
wrapping, yes?

Anyway, I suspect this whole plan needs wider discussion. I will write
up a more complete RFC that covers my plans, including the rationale for
why we should adopt this in a certain way. (These kinds of strategic RFCs
don't usually get much traction since our development style is much more
"show the patches", so that's why I have been just sending patches. But
since it's a pretty big topic, I'll give it a shot...)

-- 
Kees Cook



More information about the linux-arm-kernel mailing list