[GIT PULL] RISC-V updates for v7.0

Thu Feb 26 13:04:19 PST 2026

Hi Peter,

Responses inline.

On Thu, Feb 26, 2026 at 02:23:42PM +0100, Peter Zijlstra wrote:
>On Wed, Feb 18, 2026 at 05:57:45PM -0800, Deepak Gupta wrote:
>
>> x86 doesn't have any equivalent BTI bit in PTEs to mark code pages. IIRC, it
>> does have mechanism where a bitmap has to be prepared and each entry in bitmap
>> encodes whether a page is legacy code page (without `endbr64`) or a modern code
>> page (with `endbr64`). And CPU will consult this bitmap to suppress the fault.
>
>So; all of this is only ever relevant for programs that are mixing CFI
>and !CFI code. If a program has no CFI, all good. If a program is all
>CFI enabled, also all good.
>
>If it starts mixing things, then you get to be 'creative'.
>
>Now the thing is, if you start to do that you need to deal with both
>forward CFI (BTI) and backward CFI (shadow-stack) #CF exceptions. That
>bitmap, that can only deal with BTI, but doesn't help with shadow
>stack, so its useless.
>
>My proposal was to ignore that whole bitmap; that's dead hardware, never
>used. Instead use a software PTE bit, like ARM has, and simply eat the
>#CF look at PTE and figure out what to do.

IIRC, arm has hardware PTE bit saying this is a guarded page. That can be kept
in ITLB as part of virt addr translation during instruction fetch. So whenever
indir_call --> target happens, if target translation was already in ITLB, CPU
already knows whether to suppress the fault or not, without going to kernel.

In x86 case, using a software PTE bit would be different. There will be a fault
always and kernel won't be able to make a decision on what to do. It'll need
some delegating authority to make that decision. That delegating authority can
be a signal handler in userspace which may need a bitmap/auxilliary data
structure of sort to make that decision whether target address is a taken target
or should not be taken.

So decision point is either

  - do a software bitmap or
  - hardware bitmap (legacy interworking bitmap)
    (both will be slow).

OR

  Just don't allow/support that configuration to enable CFI. And put onus on
  workload owner to do the work to enable the feature.

Sidenote: I wish we were able to convince someone certain in Redmond to give a
sw bit back and this all would have been nicer. Given there wasn't a lot of
traction from open source for this feature, it was mostly a redmond driven
feature.

>
>Yes, this is 'slow', but my claim is that this doesn't matter. There are
>2 ways out of this slow-ness:
>
> - fully disable CFI for your program (probably not the thing you want,
>   but a quick fix, and not really less secure than partial CFI anyway).
>
> - fully enable CFI for your program (might be a bit of work).
>
>The whole mixed thing is a transition state where userspace doesn't have
>its ducks in a row. It will go away.

I have spent 8 years defining features to kill class of low-level exploits back
at Intel. And then next 6 years in places where software is deployed on these
CPUs. 

I am a security engineer and would have loved to get these features enabled.
But in all honesty, I am yet to see anyone at these places (hyperscalars)
willing to give up an ounce of perf budget (1-2% demands discussion and strong
justification) for enabling just the shadow stack feature.

So my advise would be not to care about enabling path where there is a perf hit.

Keep it simple
- Enable when all binaries have feature awareness.
- Disable when there is one binary with no feature awareness.

>
>