[PATCH v2 0/6] ARM branch predictor hardening

Tue Jan 16 10:23:35 PST 2018

On Tue, Jan 16, 2018 at 09:11:54AM -0800, Florian Fainelli wrote:
> On 01/10/2018 09:16 AM, Marc Zyngier wrote:
> > On 10/01/18 16:50, Nishanth Menon wrote:
> >> On 01/08/2018 12:55 PM, Marc Zyngier wrote:
> >>> This small series implements some basic BP hardening by invalidating
> >>> the BTB on CPUs that are known to be susceptible to aliasing attacks.
> >>>
> >>> These patches are closely modelled against what we do on arm64,
> >>> although simpler as we can rely on an architected instruction to
> >>> perform the invalidation. The notable exception is Cortex-A15, where
> >>> BTB invalidation behaves like a NOP, and the only way to shoot the
> >>> predictor down is to invalidate the icache *and* to have ACTLR[0] set
> >>> to 1 (which is a secure-only operation).
> >>>
> >>
> >>
> >> btw, just wanted to understand if we had any reasons as to why
> >> we aren't tagging these for stable? Yes, I am aware of Greg's comments
> >> in [1], but the v7 series impacts a heck of a lot of existing products 
> >> and is not that extensive to cause too much of a pain is it?
> >>
> >> OR, am I missing something else?
> >>
> >> [1] http://www.kroah.com/log/blog/2018/01/06/meltdown-status/
> > 
> > This is a work in progress. It is not ready for being merged yet. It can
> > be backported to stable after being merged into mainline.
> 
> When do you expect to post a v3 of these patches? Happy to test anything
> and report back the results. As Russell pointed out earlier, his test
> cases against these patches + adding special casing for the Brahma-B15
> did not result in any improvement for his "spectre" or "meltdown" test
> cases...

I'm not expecting these patches to have an effect on my test cases.

The "spectre" test case doesn't involve crossing a privilege boundary,
but illustrates the effect of speculation causing the cache contents to
be manipulated, and that manipulation to be readable.

If you care about that exact case, then the fix is to basically fix *all*
the software you're running to have additional mitigations in place.  For
example, in spectre, the victim function is as follows, and the branch
that we're manipulating is at "6:".

   0:   4b07            ldr     r3, [pc, #28]   ; (20 <victim_function+0x20>)
   2:   681a            ldr     r2, [r3, #0]
   4:   4282            cmp     r2, r0
   6:   d90a            bls.n   1e <victim_function+0x1e>
   8:   1818            adds    r0, r3, r0
   a:   4a06            ldr     r2, [pc, #24]   ; (24 <victim_function+0x24>)
   c:   f890 0104       ldrb.w  r0, [r0, #260]  ; 0x104
  10:   7811            ldrb    r1, [r2, #0]
  12:   eb03 2340       add.w   r3, r3, r0, lsl #9
  16:   f893 32a4       ldrb.w  r3, [r3, #676]  ; 0x2a4
  1a:   400b            ands    r3, r1
  1c:   7013            strb    r3, [r2, #0]
  1e:   4770            bx      lr
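
For reference, the disassembly above looks functionally consistent with
the victim function from the published Spectre proof of concept.  A rough
C equivalent is sketched below; the array sizes, initial contents and
memory layout are my assumptions for illustration and will not match the
offsets in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes and contents only (assumed, not taken from the listing). */
unsigned int array1_size = 16;
uint8_t array1[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
uint8_t array2[256 * 512];
uint8_t temp = 0;	/* stops the compiler discarding the second load */

void victim_function(size_t x)
{
	/* The bounds check is the conditional branch at "6:". */
	if (x < array1_size) {
		/*
		 * First load: array1[x].  Second load: an array2 entry whose
		 * address depends on the first value ("lsl #9" is "* 512").
		 * If the branch is mis-predicted with x out of range, both
		 * loads may still run speculatively and leave a cache footprint.
		 */
		temp &= array2[array1[x] * 512];
	}
}
```

Architecturally an out-of-range x does nothing here; the interesting
part is only what the speculated loads leave behind in the cache.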

The kernel can't do anything to mitigate that.  The only way it can be
mitigated is by changing the code so that when the branch is mis-predicted
and speculation continues past the branch, the subsequent loads cannot be
used to give information away.  One way to do that would be to ensure
that 'r0' can't be outside the legal range of values by doing something
like:

	it	ls
	movls	r0, #0

Surprisingly, this seems to work - quite why it does, I'm not entirely
sure, because if the branch is speculated not to be taken, then surely
the movls should also be speculated not to be needed.  That doesn't
appear to be the case, as adding the above code prevents 'spectre'
working (tested on A9, A72).

This needs compiler support, and means that _all_ software needs to be
rebuilt to add such mitigations to the generated code.
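
To give an idea of what such a mitigation might look like at the C
level, here is a minimal sketch that forces an out-of-range index to
zero with arithmetic rather than a second branch.  The names index_mask(),
guarded_read() and table[] are hypothetical, and the code assumes
size <= LONG_MAX and that right-shifting a negative long is an arithmetic
shift (true for gcc and clang, but implementation-defined in C).

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns all-ones when idx < size and zero
 * otherwise, without a conditional branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift.
 */
static inline unsigned long index_mask(unsigned long idx, unsigned long size)
{
	return (unsigned long)(~(long)(idx | (size - 1UL - idx)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

static uint8_t table[16];	/* hypothetical data being protected */

uint8_t guarded_read(unsigned long idx)
{
	if (idx >= 16)
		return 0;

	/* Even if the branch above is mis-predicted, idx is forced to 0. */
	idx &= index_mask(idx, 16);
	return table[idx];
}
```

Note that nothing stops an optimising compiler from working out that
the mask is always all-ones after the bounds check and deleting it,
which is exactly why this needs proper compiler support rather than
hand-written C.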

"meltdown" is more about retrieving data across a privilege boundary.
Unfortunately, my Juno platform has died again, so I can't test there,
but everywhere else the results agree with what ARM Ltd has published
(indicating that the tested CPUs are not vulnerable.)  It would be nice
to test a vulnerable CPU so that I know that the program works!

"meltdown-3a" is a variation on "meltdown" which implements ARM Ltd's
specific variant 3a, and this I've proven to work, again in agreement
with ARM Ltd's published CPU table.  The program takes and handles the
SIGILL generated by the MRC instruction to avoid being terminated, and
then uses the cache side-channel attack to recover the value read from
the *system control register*.  This one is
pretty hard to mitigate without severely impacting performance - a BTB
flush is not the answer, because the speculation has already occurred.
The BTB really doesn't have anything to do with this attack.
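
The SIGILL handling part can be sketched roughly as follows.  This is
not the actual test program: try_faulting_access() and on_sigill() are
hypothetical names, and raise(SIGILL) stands in for the faulting MRC so
that the sketch runs on any POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover;

/* Jump back to the recovery point instead of letting the process die. */
static void on_sigill(int sig)
{
	(void)sig;
	siglongjmp(recover, 1);
}

/*
 * Returns 1 if SIGILL was taken and recovered from, 0 if the "faulting"
 * code ran to completion, and -1 if the handler could not be installed.
 */
int try_faulting_access(void)
{
	struct sigaction sa, old;
	int caught;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigill;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGILL, &sa, &old) != 0)
		return -1;

	if (sigsetjmp(recover, 1) == 0) {
		/* Stand-in for the MRC; the real program faults here. */
		raise(SIGILL);
		caught = 0;
	} else {
		/* Execution resumes here; the cache would be probed next. */
		caught = 1;
	}

	sigaction(SIGILL, &old, NULL);
	return caught;
}
```

Passing 1 as the second argument to sigsetjmp() restores the signal
mask on the way out, so the sequence can be repeated once per bit.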

Possibilities for mitigating this would be to have the undefined
instruction exception handler check whether it's an instruction accessing
CP15, parse the following code, and flush any associated cache lines that
may have been speculatively loaded, but that's going to be pretty horrid
to do, and I'd also expect it to be unreliable (who knows what an attacker
would use to bypass that...)
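
Just the "is it a CP15 read" check might look something like the sketch
below.  is_cp15_mrc() is a hypothetical helper, not existing kernel
code; it matches the MRC bit pattern (bits[27:24] = 1110, bit[20] = 1,
bit[4] = 1) with the coprocessor field bits[11:8] equal to 15.  It
ignores the condition field, so it will also match MRC2, and says
nothing about the much harder "parse the following code" step.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if insn is an MRC (coprocessor-to-register
 * move) that targets coprocessor 15.
 *
 *   bits[27:24] = 1110   coprocessor register transfer
 *   bit[20]     = 1      read (MRC) rather than write (MCR)
 *   bits[11:8]  = 1111   coprocessor number 15
 *   bit[4]      = 1      register transfer, not a data operation
 */
bool is_cp15_mrc(uint32_t insn)
{
	return (insn & 0x0f100f10u) == 0x0e100f10u;
}
```

For example, "mrc p15, 0, r0, c1, c0, 0" assembles to 0xee110f10 and
matches, while "fmrs r0, fpscr" (0xeef10a10) uses coprocessor 10 and
does not.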

The good news is that, from what I can tell, variant 3a only affects CP15
and not the others - I've tried reading the VFP, and that's not readable
using this technique.  Trying:

	mrc     p15, 0, %0, c1, c0, 0 @ System Control Register
	fmrs	%0, fpscr
	lsr     %0, %0, %[bit]
	and     %0, %0, #1
	lsl     %0, %0, %[shift]
	ldr     %0, [%[target], %0]

results in %0 apparently having the System Control Register value, not
the FPSCR value.  The observed behaviour is as if the fmrs instruction
does not exist.  Why is this significant?  It means the VFP/Neon
registers are not speculatively readable, and that's important for
cross-thread/task security.
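
For anyone not used to reading the asm, the last four instructions of
that sequence select one bit of the value and turn it into a cache
footprint.  A rough C rendering, with hypothetical function names and
the same roles for bit, shift and target as the asm operands:

```c
#include <stddef.h>
#include <stdint.h>

/* lsr / and / lsl: offset is 0 when the bit is clear, 1 << shift when set. */
size_t probe_offset(uint32_t value, unsigned int bit, unsigned int shift)
{
	return (size_t)((value >> bit) & 1u) << shift;
}

/* ldr: touch target at that offset so the line is pulled into the cache. */
void touch(const volatile uint8_t *target, uint32_t value,
	   unsigned int bit, unsigned int shift)
{
	(void)target[probe_offset(value, bit, shift)];
}
```

Timing accesses to target at offset 0 and at offset 1 << shift afterwards
tells you which of the two lines was loaded, and hence the value of that
one bit.  Repeating for each bit position recovers the whole register.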

Over the last few days, I turned my attention to other aspects of the
revelations from Google, and then I became ill for the third time
in the last four months - with yesterday being sufficiently bad that
I was unable to productively do very much.

I was really concerned when the news was announced on the 3rd January,
but today I'm less concerned about 32-bit ARM CPU cores that aren't
vulnerable to variant 3 from a kernel perspective.

I think much of the mitigation needs to come from the compiler side,
especially for many of the spectre-based attacks.  Yes, there's going
to be a few things in the kernel assembly that need to be changed -
which needs review, and that hasn't happened yet afaik (there's only
a limited amount of resource!)

Variant 3 CPUs are much more serious, and are harder to mitigate
without severely impacting performance - since speculatively read
data in the kernel is observable from userspace, if the cache lines
exist, then we'd have to assume that they're readable.  Thankfully
with a PIPT cache, we need the translation to occur, so rather than
needing to flush the data cache, we could unmap the kernel pages and
flush the _entire_ TLB on exit from the kernel.  Not as drastic as
an entire data cache flush, but certainly not a zero cost operation.

My overall feeling today is that 32-bit ARM comes out of this really
quite well (apart from variant 3 CPUs) - yes, there's a few issues
but nothing like the problems that x86 developers are having!

So, practically, what does it mean (ignoring variant 3 CPUs for the
time being)?

1. We need updated compilers.
2. We need a few fixes to certain kernel code paths.
3. We need to ensure all future kernel builds that need to be secure
   against this are using the updated compiler.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up