[RFC PATCH v2 20/21] x86: Add support for CONFIG_CFI_CLANG
Kees Cook
keescook at chromium.org
Mon May 16 15:59:41 PDT 2022
On Mon, May 16, 2022 at 08:30:47PM +0200, Peter Zijlstra wrote:
> On Mon, May 16, 2022 at 10:15:00AM -0700, Sami Tolvanen wrote:
> > On Mon, May 16, 2022 at 2:54 AM Peter Zijlstra <peterz at infradead.org> wrote:
> > >
> > > On Fri, May 13, 2022 at 01:21:58PM -0700, Sami Tolvanen wrote:
> > > > With CONFIG_CFI_CLANG, the compiler injects a type preamble
> > > > immediately before each function and a check to validate the target
> > > > function type before indirect calls:
> > > >
> > > >   ; type preamble
> > > >   __cfi_function:
> > > >     int3
> > > >     int3
> > > >     mov <id>, %eax
> > > >     int3
> > > >     int3
> > > >   function:
> > > >     ...
> > >
> > > When I enable CFI_CLANG and X86_KERNEL_IBT I get:
> > >
> > > 0000000000000c80 <__cfi_io_schedule_timeout>:
> > >      c80:       cc                      int3
> > >      c81:       cc                      int3
> > >      c82:       b8 b5 b1 39 b3          mov    $0xb339b1b5,%eax
> > >      c87:       cc                      int3
> > >      c88:       cc                      int3
> > >
> > > 0000000000000c89 <io_schedule_timeout>:
> > >      c89:       f3 0f 1e fa             endbr64
> > >
> > >
> > > That seems unfortunate. Would it be possible to get an additional
> > > compiler option to suppress the endbr for all symbols that get a __cfi_
> > > preamble?
> >
> > What's the concern with the endbr? Dropping it would currently break
> > the CFI+IBT combination on newer hardware, no?
>
> Well, yes, but also that combination isn't very interesting. See,
>
> https://lore.kernel.org/all/20220420004241.2093-1-joao@overdrivepizza.com/T/#m5d67fb010d488b2f8eee33f1eb39d12f769e4ad2
>
> and the patch I did down-thread:
>
> https://lkml.kernel.org/r/YoJKhHluN4n0kZDm@hirez.programming.kicks-ass.net
>
> If we have IBT, then FineIBT is a much better option than kCFI+IBT.

I'm still not convinced about this, but I'm on the fence.

Cons:
- FineIBT does callee-based hash verification, which means any
  attacker-constructed memory region just has to have an endbr and nops at
  "shellcode - 9". KCFI would need the region to have the hash at
  "shellcode - 6" and an endbr at "shellcode". However, that hash is well
  known, so it's not much protection.
- Potential performance hit due to making an additional "call" outside
  the cache lines of both caller and callee.

Pros:
- FineIBT can be done without read access to the kernel text, which will
  be nice in the exec-only future.

I'd kind of like the "dynamic FineIBT conversion" to be a config option,
at least at first. We could at least do performance comparisons between
them.

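
To make the "- 9" and "- 6" offsets concrete, here is a small Python
sketch (not kernel code) that rebuilds the 9-byte preamble from the objdump
quoted above and reads the hash back the way the caller's cmpl would. The
helper names are made up for illustration; the byte values are the ones in
the quoted disassembly.

```python
import struct

INT3 = b"\xcc"
MOV_EAX_IMM32 = b"\xb8"  # opcode byte of `mov $imm32, %eax`


def kcfi_preamble(type_hash: int) -> bytes:
    """int3; int3; mov $hash, %eax; int3; int3 (9 bytes in total)."""
    return INT3 * 2 + MOV_EAX_IMM32 + struct.pack("<I", type_hash) + INT3 * 2


def hash_before(memory: bytes, func: int) -> int:
    """What `cmpl $hash, -6(%r11)` reads: the 4 bytes at func - 6."""
    return struct.unpack_from("<I", memory, func - 6)[0]


type_hash = 0xB339B1B5                 # value from the quoted objdump
preamble = kcfi_preamble(type_hash)
endbr64 = bytes.fromhex("f30f1efa")
memory = preamble + endbr64            # `function:` starts right after the preamble
func = len(preamble)                   # offset of `function:` in this buffer
fineibt_entry = func - 9               # start of the preamble, where FineIBT would land

print(preamble.hex(" "))               # cc cc b8 b5 b1 39 b3 cc cc
print(hex(hash_before(memory, func)))  # 0xb339b1b5
```

The mov immediate starts 3 bytes into the preamble, which is why the hash
sits at "function - 6" while the preamble itself starts at "function - 9".
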
> Removing that superfluous endbr also shrinks the whole thing by 4 bytes.
>
> So I'm fine with the compiler generating working code for that
> combination; but please get me an option to suppress it in order to save
> those pointless bytes. All this CFI stuff is enough bloat as it is.

So, in the case of "built for IBT but running on a system without IBT",
no rewrite happens, and no endbr is present (i.e. address-taken
functions have endbr emission suppressed)?

Stock kernel build:

function:
    [normal code]

caller:
    call __x86_indirect_thunk_r11

IBT kernel build:

function:
    endbr
    [normal code]

caller:
    call __x86_indirect_thunk_r11

CFI kernel build:

__cfi_function:
    [int3/mov/int3 preamble]
function:
    [normal code]

caller:
    cmpl \hash, -6(%r11)
    je .Ltmp1
    ud2
.Ltmp1:
    call __x86_indirect_thunk_r11

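
The caller-side check in the CFI build above can be modelled roughly as
follows. This is a Python sketch only: memory is a flat bytearray, the
addresses and the second hash (0x11111111) are hypothetical, and CfiTrap
stands in for the ud2.

```python
import struct


class CfiTrap(Exception):
    """Stands in for the ud2 the caller executes on a hash mismatch."""


def place_function(memory: bytearray, func: int, type_hash: int) -> None:
    """Write the 9-byte preamble so that it ends exactly at `func`."""
    preamble = b"\xcc\xcc\xb8" + struct.pack("<I", type_hash) + b"\xcc\xcc"
    memory[func - len(preamble):func] = preamble


def kcfi_indirect_call(memory: bytearray, target: int, expected_hash: int) -> str:
    # cmpl \hash, -6(%r11)
    found = struct.unpack_from("<I", memory, target - 6)[0]
    if found != expected_hash:       # je .Ltmp1 not taken
        raise CfiTrap(hex(found))    # ud2
    return "call"                    # .Ltmp1: call __x86_indirect_thunk_r11


memory = bytearray(64)
place_function(memory, 16, 0xB339B1B5)   # hash from the quoted objdump
place_function(memory, 48, 0x11111111)   # hypothetical function of another type

print(kcfi_indirect_call(memory, 16, 0xB339B1B5))  # call
```

Calling kcfi_indirect_call(memory, 48, 0xB339B1B5) raises CfiTrap, since
the caller reads the target's text and finds a different hash. This is the
read access to kernel text that the FineIBT variant avoids.
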
CFI+IBT kernel build:

__cfi_function:
    [int3/mov/int3 preamble]
function:
    endbr
    [normal code]

caller:
    cmpl \hash, -6(%r11)
    je .Ltmp1
    ud2
.Ltmp1:
    call __x86_indirect_thunk_r11

CFI+IBT+FineIBT kernel build:

__cfi_function:
    [int3/mov/int3 preamble]
function:
    /* no endbr emitted */
    [normal code]

caller:
    cmpl \hash, -6(%r11)
    je .Ltmp1
    ud2
.Ltmp1:
    call __x86_indirect_thunk_r11

at boot, if IBT is detected:
- replace __cfi_function with:
    endbr
    call __fineibt_\hash
- replace caller with:
    movl \hash, %r10d
    sub $9, %r11
    nop2
    call *%r11
- inject all the __fineibt_\hash elements via module_alloc()
  __fineibt_\hash:
    xor \hash, %r10
    jz 1f
    ud2
  1: ret
    int3

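
For comparison, the rewritten (FineIBT) flow above can be modelled in the
same way. Again this is only a Python sketch under assumptions: the
address 0x1009 is hypothetical, the set of endbr landing pads is
represented as a dict keyed by address, Trap stands in for both the IBT
fault and the ud2, and the xor is modelled as a 32-bit comparison (the
operand width here is an assumption, not something the listing settles).

```python
class Trap(Exception):
    """Stands in for an IBT fault (no endbr at target) or the thunk's ud2."""


class Function:
    def __init__(self, name: str, type_hash: int):
        self.name = name
        self.type_hash = type_hash   # selects which __fineibt_\hash thunk is called


def fineibt_thunk(callee_hash: int, r10: int) -> None:
    # __fineibt_\hash: xor \hash, %r10 ; jz 1f ; ud2 ; 1: ret
    if (r10 ^ callee_hash) & 0xFFFFFFFF:
        raise Trap("hash mismatch")


def fineibt_indirect_call(landing_pads: dict, r11: int, caller_hash: int) -> str:
    r10 = caller_hash & 0xFFFFFFFF    # movl \hash, %r10d
    r11 -= 9                          # sub $9, %r11
    fn = landing_pads.get(r11)        # call *%r11: target must start with endbr
    if fn is None:
        raise Trap("no endbr at target")
    fineibt_thunk(fn.type_hash, r10)  # call __fineibt_\hash from the preamble
    return fn.name                    # returns into the function body


FUNC = 0x1009                         # hypothetical address of `function:`
pads = {FUNC - 9: Function("io_schedule_timeout", 0xB339B1B5)}

print(fineibt_indirect_call(pads, FUNC, 0xB339B1B5))  # io_schedule_timeout
```

Unlike the kCFI model, the caller never reads the target's bytes: the
comparison happens on the callee side, which is what makes this variant
workable with execute-only text.
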
--
Kees Cook