[PATCH v3 2/2] lkdtm: Add Shadow Call Stack tests

Fri Mar 4 06:34:30 PST 2022

On 3/3/22 10:42, Kees Cook wrote:
> On Wed, Mar 02, 2022 at 11:43:39PM -0800, Dan Li wrote:
>> Add tests for SCS (Shadow Call Stack) based
>> backward CFI (as implemented by Clang and GCC).
> 
> Cool; thanks for writing these!
> 
>> +lkdtm-$(CONFIG_LKDTM)		+= scs.o
> 
> I'd expect these to be in cfi.c, rather than making a new source file.
> 

Got it.

>> +static noinline void lkdtm_scs_clear_lr(void)
>> +{
>> +	unsigned long *lr = (unsigned long *)__builtin_frame_address(0) + 1;
>> +
>> +	asm volatile("str xzr, [%0]\n\t" : : "r"(lr) : "x30");
> 
> Is the asm needed here? Why not:
> 
> 	unsigned long *lr = (unsigned long *)__builtin_frame_address(0) + 1;
> 
> 	*lr = 0;
> 

Yeah, this one looks better, with lr declared "volatile" so the store
isn't optimized away.

>> +
>> +/*
>> + * This tries to call a function protected by Shadow Call Stack,
>> + * which corrupts its own return address during execution.
>> + * Due to the protection, the corruption will not take effect
>> + * when the function returns.
>> + */
>> +void lkdtm_CFI_BACKWARD_SHADOW(void)
> 
> I think these two tests should be collapsed into a single one.
> 

It seems that selftests/lkdtm/run.sh currently has no cross-line
matching. Suppose we put these two tests into one function and could
make noscs_set_lr _survivable_ (as in your example).

Then the only thing we could match in tests.txt is
"CFI_BACKWARD_SHADOW ok: scs takes effect."

But if the test output is:
XPASS: Unexpectedly survived lr corruption without scs?
ok: scs takes effect.

it would still be matched as a pass, even though it may not be a real
one: the XPASS line means the xxx_set_lr function itself doesn't work.

>> +{
>> +#ifdef CONFIG_ARM64
>> +	if (!IS_ENABLED(CONFIG_SHADOW_CALL_STACK)) {
>> +		pr_err("FAIL: kernel not built with CONFIG_SHADOW_CALL_STACK\n");
>> +		return;
>> +	}
>> +
>> +	pr_info("Trying to corrupt lr in a function with scs protection ...\n");
>> +	lkdtm_scs_clear_lr();
>> +
>> +	pr_err("ok: scs takes effect.\n");
>> +#else
>> +	pr_err("XFAIL: this test is arm64-only\n");
>> +#endif
> 
> This is slightly surprising -- we have no detection when a function has
> its non-shadow-stack return address corrupted: it just _ignores_ the
> value stored there. That seems like a missed opportunity for warning
> about an unexpected state.
> 

Yes.
I actually tried adding a check in the plugin before each function
returns, calling a callback when a mismatch is found. But since almost
every function has to be instrumented, the performance overhead rose
from <3% to ~20% (a rough estimate; it could probably still be
optimized).

>> +}
>> +
>> +/*
>> + * This tries to call a function not protected by Shadow Call Stack,
>> + * which corrupts its own return address during execution.
>> + */
>> +void lkdtm_CFI_BACKWARD_SHADOW_WITH_NOSCS(void)
>> +{
>> +#ifdef CONFIG_ARM64
>> +	if (!IS_ENABLED(CONFIG_SHADOW_CALL_STACK)) {
>> +		pr_err("FAIL: kernel not built with CONFIG_SHADOW_CALL_STACK\n");
>> +		return;
> 
> Other tests try to give some hints about failures, e.g.:
> 
> 		pr_err("FAIL: cannot change for SCS\n");
> 		pr_expected_config(CONFIG_SHADOW_CALL_STACK);
> 
> Though, having the IS_ENABLED in there makes me wonder if this test
> should instead be made _survivable_ on failure. Something like this,
> completely untested:
> 
> 
> #ifdef CONFIG_ARM64
> static noinline void lkdtm_scs_set_lr(unsigned long *addr)
> {
> 	unsigned long **lr = (unsigned long **)__builtin_frame_address(0) + 1;
> 	*lr = addr;
> }
> 
> /* Function with __noscs attribute overwrites its return address. */
> static noinline void __noscs lkdtm_noscs_set_lr(unsigned long *addr)
> {
> 	unsigned long **lr = (unsigned long **)__builtin_frame_address(0) + 1;
> 	*lr = addr;
> }
> #endif
> 
> 
> void lkdtm_CFI_BACKWARD_SHADOW(void)
> {
> #ifdef CONFIG_ARM64
> 
> 	/* Verify the "normal" condition of LR corruption working. */
> 	do {
> 		/* Keep label in scope to avoid compiler warning. */
> 		if ((volatile int)0)
> 			goto unexpected;
> 
> 		pr_info("Trying to corrupt lr in a function without scs protection ...\n");
> 		lkdtm_noscs_set_lr(&&expected);
> 
> unexpected:
> 		pr_err("XPASS: Unexpectedly survived lr corruption without scs?!\n");
> 		break;
> 
> expected:
> 		pr_err("ok: lr corruption redirected without scs.\n");
> 	} while (0);
> 
> 
> 	do {
> 		/* Keep label in scope to avoid compiler warning. */
> 		if ((volatile int)0)
> 			goto good_scs;
> 
> 		pr_info("Trying to corrupt lr in a function with scs protection ...\n");
> 		lkdtm_scs_set_lr(&&bad_scs);
> 
> good_scs:
> 		pr_info("ok: scs takes effect.\n");
> 		break;
> 
> bad_scs:
> 		pr_err("FAIL: return address rewritten!\n");
> 		pr_expected_config(CONFIG_SHADOW_CALL_STACK);
> 	} while (0);
> #else
> 	pr_err("XFAIL: this test is arm64-only\n");
> #endif
> }
> 

Thanks for the example, Kees :)
With a small modification, this code works correctly with Clang 12.
But to make sure it always behaves correctly, I think we may need to add
__attribute__((optnone)) to it, because at -O2 the result doesn't seem
to be very stable (as your example in the next email shows).

> And we should, actually, be able to make the "set_lr" functions be
> arch-specific, leaving the test itself arch-agnostic....
> 

I'm not sure I understand correctly: do you mean we should remove the
"#ifdef CONFIG_ARM64" from lkdtm_CFI_BACKWARD_SHADOW?

If so, we might not be able to distinguish failures caused by an
unsupported platform (XFAIL) from those caused by the feature not being
enabled (or not working properly).

Thanks,
Dan.